[PATCH] fix possible overflow in pointer arithmetics strnlen()

newer
[PATCH] features.h: error out when...

Marcus Haehnel

7 Jan 2025 7 Jan '25

12:09 p.m.

From: Frank Mehnert <frank.mehnert@kernkonzept.com> It is undefined behavior to compare two pointers belonging to different objects. This includes the case where the addition overflows. Clang-20 seems to follow this rule more eagerly and optimizes away the old test. Fix the test by performing the addition on uintptr_t values rather than on on char pointers. See also https://github.com/llvm/llvm-project/issues/121909. Signed-off-by: Marcus Haehnel <marcus.haehnel@kernkonzept.com> --- libc/string/generic/strnlen.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libc/string/generic/strnlen.c b/libc/string/generic/strnlen.c index 4d4cde84f..82d4122ec 100644 --- a/libc/string/generic/strnlen.c +++ b/libc/string/generic/strnlen.c @@ -29,15 +29,17 @@ '\0' terminator is found in that many characters, return MAXLEN. */ size_t strnlen (const char *str, size_t maxlen) { - const char *char_ptr, *end_ptr = str + maxlen; + const char *char_ptr, *end_ptr; const unsigned long int *longword_ptr; unsigned long int longword, himagic, lomagic; if (maxlen == 0) return 0; - if (__builtin_expect (end_ptr < str, 0)) + if (__builtin_expect ((uintptr_t)str + maxlen < (uintptr_t)str, 0)) end_ptr = (const char *) ~0UL; + else + end_ptr = str + maxlen; /* Handle the first few characters by reading one character at a time. Do this until CHAR_PTR is aligned on a longword boundary. */ -- 2.47.1

Show replies by date

Kjetil Oftedal

7 Jan 7 Jan

12:29 p.m.

Hi, Can you point to where in the C standard this is treated as undefined behavior? Best regards, Kjetil Oftedal On Tue, 7 Jan 2025 at 13:10, Marcus Haehnel <marcus.haehnel@kernkonzept.com> wrote:

...

From: Frank Mehnert <frank.mehnert@kernkonzept.com>

It is undefined behavior to compare two pointers belonging to different objects. This includes the case where the addition overflows. Clang-20 seems to follow this rule more eagerly and optimizes away the old test.

Fix the test by performing the addition on uintptr_t values rather than on on char pointers.

See also https://github.com/llvm/llvm-project/issues/121909.

Signed-off-by: Marcus Haehnel <marcus.haehnel@kernkonzept.com> --- libc/string/generic/strnlen.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libc/string/generic/strnlen.c b/libc/string/generic/strnlen.c index 4d4cde84f..82d4122ec 100644 --- a/libc/string/generic/strnlen.c +++ b/libc/string/generic/strnlen.c @@ -29,15 +29,17 @@ '\0' terminator is found in that many characters, return MAXLEN. */ size_t strnlen (const char *str, size_t maxlen) { - const char *char_ptr, *end_ptr = str + maxlen; + const char *char_ptr, *end_ptr; const unsigned long int *longword_ptr; unsigned long int longword, himagic, lomagic;

if (maxlen == 0) return 0;

- if (__builtin_expect (end_ptr < str, 0)) + if (__builtin_expect ((uintptr_t)str + maxlen < (uintptr_t)str, 0)) end_ptr = (const char *) ~0UL; + else + end_ptr = str + maxlen;

/* Handle the first few characters by reading one character at a time. Do this until CHAR_PTR is aligned on a longword boundary. */ -- 2.47.1

_______________________________________________ devel mailing list -- devel@uclibc-ng.org To unsubscribe send an email to devel-leave@uclibc-ng.org

Frank Mehnert

12:38 p.m.

There is a longer discussion on stack overflow: https://stackoverflow.com/questions/6702161/pointer-comparisons-in-c-are-the... See the remarks about "comparison of pointers belonging to different objects". See also the remark in the LLVM ticket from 'nikic' about the overflow being the same case of undefined behavior. IMO this makes sense: If we add a huge offset to a pointer triggering an overflow, the compiler may indeed assume that the result does not point to the same object as the original pointer. Kind regards, Frank On Dienstag, 7. Januar 2025 13:29:33 MEZ Kjetil Oftedal wrote:

...

Hi,

Can you point to where in the C standard this is treated as undefined behavior?

Best regards, Kjetil Oftedal

On Tue, 7 Jan 2025 at 13:10, Marcus Haehnel <marcus.haehnel@kernkonzept.com> wrote:

...
From: Frank Mehnert <frank.mehnert@kernkonzept.com>

It is undefined behavior to compare two pointers belonging to different objects. This includes the case where the addition overflows. Clang-20 seems to follow this rule more eagerly and optimizes away the old test.

Fix the test by performing the addition on uintptr_t values rather than on on char pointers.

See also https://github.com/llvm/llvm-project/issues/121909.

Signed-off-by: Marcus Haehnel <marcus.haehnel@kernkonzept.com> --- libc/string/generic/strnlen.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libc/string/generic/strnlen.c b/libc/string/generic/strnlen.c index 4d4cde84f..82d4122ec 100644 --- a/libc/string/generic/strnlen.c +++ b/libc/string/generic/strnlen.c @@ -29,15 +29,17 @@ '\0' terminator is found in that many characters, return MAXLEN. */ size_t strnlen (const char *str, size_t maxlen) { - const char *char_ptr, *end_ptr = str + maxlen; + const char *char_ptr, *end_ptr; const unsigned long int *longword_ptr; unsigned long int longword, himagic, lomagic;

if (maxlen == 0) return 0;

- if (__builtin_expect (end_ptr < str, 0)) + if (__builtin_expect ((uintptr_t)str + maxlen < (uintptr_t)str, 0)) end_ptr = (const char *) ~0UL; + else + end_ptr = str + maxlen;

/* Handle the first few characters by reading one character at a time. Do this until CHAR_PTR is aligned on a longword boundary. */ -- 2.47.1

_______________________________________________ devel mailing list -- devel@uclibc-ng.org To unsubscribe send an email to devel-leave@uclibc-ng.org

-- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224 Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Kjetil Oftedal

1 p.m.

Hi, How can the compiler in this case assume that endptr and str do not point to the same array object? ISO C99 6.5,8 Relational operators 5) ... "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined" How does the compiler at compile time know that endptr which is equivalent is to str[maxlen] is not within the range of string declared by str? Or in the terms of the above standard, how does it know that endptr is not Q+1? Best regards, Kjetil Oftedal On Tue, 7 Jan 2025 at 13:38, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...

There is a longer discussion on stack overflow:

https://stackoverflow.com/questions/6702161/pointer-comparisons-in-c-are-the...

See the remarks about "comparison of pointers belonging to different objects".

See also the remark in the LLVM ticket from 'nikic' about the overflow being the same case of undefined behavior. IMO this makes sense: If we add a huge offset to a pointer triggering an overflow, the compiler may indeed assume that the result does not point to the same object as the original pointer.

Kind regards,

Frank

On Dienstag, 7. Januar 2025 13:29:33 MEZ Kjetil Oftedal wrote:

...
Hi,

Can you point to where in the C standard this is treated as undefined behavior?

Best regards, Kjetil Oftedal

On Tue, 7 Jan 2025 at 13:10, Marcus Haehnel <marcus.haehnel@kernkonzept.com> wrote:

...
From: Frank Mehnert <frank.mehnert@kernkonzept.com>

It is undefined behavior to compare two pointers belonging to different objects. This includes the case where the addition overflows. Clang-20 seems to follow this rule more eagerly and optimizes away the old test.

Fix the test by performing the addition on uintptr_t values rather than on on char pointers.

See also https://github.com/llvm/llvm-project/issues/121909.

Signed-off-by: Marcus Haehnel <marcus.haehnel@kernkonzept.com> --- libc/string/generic/strnlen.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libc/string/generic/strnlen.c b/libc/string/generic/strnlen.c index 4d4cde84f..82d4122ec 100644 --- a/libc/string/generic/strnlen.c +++ b/libc/string/generic/strnlen.c @@ -29,15 +29,17 @@ '\0' terminator is found in that many characters, return MAXLEN. */ size_t strnlen (const char *str, size_t maxlen) { - const char *char_ptr, *end_ptr = str + maxlen; + const char *char_ptr, *end_ptr; const unsigned long int *longword_ptr; unsigned long int longword, himagic, lomagic;

if (maxlen == 0) return 0;

- if (__builtin_expect (end_ptr < str, 0)) + if (__builtin_expect ((uintptr_t)str + maxlen < (uintptr_t)str, 0)) end_ptr = (const char *) ~0UL; + else + end_ptr = str + maxlen;

/* Handle the first few characters by reading one character at a time. Do this until CHAR_PTR is aligned on a longword boundary. */ -- 2.47.1

_______________________________________________ devel mailing list -- devel@uclibc-ng.org To unsubscribe send an email to devel-leave@uclibc-ng.org

-- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

_______________________________________________ devel mailing list -- devel@uclibc-ng.org To unsubscribe send an email to devel-leave@uclibc-ng.org

Frank Mehnert

1:38 p.m.

Hi Kjetil, On Dienstag, 7. Januar 2025 14:00:16 MEZ Kjetil Oftedal wrote:

...

How can the compiler in this case assume that endptr and str do not point to the same array object?

ISO C99 6.5,8 Relational operators 5) ... "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined"

How does the compiler at compile time know that endptr which is equivalent is to str[maxlen] is not within the range of string declared by str? Or in the terms of the above standard, how does it know that endptr is not Q+1?

[...]

The compiler does not need to know! Let's consider the pointer 'x', the offset 'offs' and y = x + offset with y < x (because offset is huge). So if y < (x + offset) then y points to a different object by definition. Hence, the compiler is allowed to remove this test. Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224 Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Kjetil Oftedal

1:54 p.m.

On Tue, 7 Jan 2025 at 14:39, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...

Hi Kjetil,

On Dienstag, 7. Januar 2025 14:00:16 MEZ Kjetil Oftedal wrote:

...
How can the compiler in this case assume that endptr and str do not point to the same array object?

ISO C99 6.5,8 Relational operators 5) ... "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined"

How does the compiler at compile time know that endptr which is equivalent is to str[maxlen] is not within the range of string declared by str? Or in the terms of the above standard, how does it know that endptr is not Q+1?

[...]

The compiler does not need to know!

Let's consider the pointer 'x', the offset 'offs' and

y = x + offset

with y < x (because offset is huge).

So if y < (x + offset) then y points to a different object by definition. Hence, the compiler is allowed to remove this test.

Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Correspondingly y = x + offset with y < x (Offset is small E.g 1). So if y < (x + offset) and offset is 1, then the Q+1 rule holds, and the compiler is not allowed to remove this test as it is well defined. This cannot be assumed at compile time, thus the test needs to be done to respect the requirements of the C standards. Best regards, Kjetil Oftedal

Frank Mehnert

2:33 p.m.

On Dienstag, 7. Januar 2025 14:54:54 MEZ Kjetil Oftedal wrote:

...

On Tue, 7 Jan 2025 at 14:39, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
Hi Kjetil,

On Dienstag, 7. Januar 2025 14:00:16 MEZ Kjetil Oftedal wrote:

...
How can the compiler in this case assume that endptr and str do not point to the same array object?

ISO C99 6.5,8 Relational operators 5) ... "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined"

How does the compiler at compile time know that endptr which is equivalent is to str[maxlen] is not within the range of string declared by str? Or in the terms of the above standard, how does it know that endptr is not Q+1?

[...]

The compiler does not need to know!

Let's consider the pointer 'x', the offset 'offs' and

y = x + offset

with y < x (because offset is huge).

So if y < (x + offset) then y points to a different object by definition. Hence, the compiler is allowed to remove this test.

Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Correspondingly y = x + offset with y < x (Offset is small E.g 1).

So if y < (x + offset) and offset is 1, then the Q+1 rule holds, and the compiler is not allowed to remove this test as it is well defined.

This cannot be assumed at compile time, thus the test needs to be done to respect the requirements of the C standards.

I think you mix things up: The error here is the pointer arithmetic, not the comparison. With offset=1 and x=0xffffffff'fffffff, y will 0, so y does not belong to the same object as x. There is no other value of x which could lead to a result y<x with offset=1! The comparison tests the wrap around and adapts end_ptr in that case. Again: If there was a wrap around, then the pointer arithmetic operation was wrong (undefined behavior). Hence, we can remove the test plus the code which is executed if the test succeeds. Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224 Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Kjetil Oftedal

2:55 p.m.

On Tue, 7 Jan 2025 at 15:33, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...

On Dienstag, 7. Januar 2025 14:54:54 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 14:39, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
Hi Kjetil,

On Dienstag, 7. Januar 2025 14:00:16 MEZ Kjetil Oftedal wrote:

...
How can the compiler in this case assume that endptr and str do not point to the same array object?

ISO C99 6.5,8 Relational operators 5) ... "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined"

How does the compiler at compile time know that endptr which is equivalent is to str[maxlen] is not within the range of string declared by str? Or in the terms of the above standard, how does it know that endptr is not Q+1?

[...]

The compiler does not need to know!

Let's consider the pointer 'x', the offset 'offs' and

y = x + offset

with y < x (because offset is huge).

So if y < (x + offset) then y points to a different object by definition. Hence, the compiler is allowed to remove this test.

Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Correspondingly y = x + offset with y < x (Offset is small E.g 1).

So if y < (x + offset) and offset is 1, then the Q+1 rule holds, and the compiler is not allowed to remove this test as it is well defined.

This cannot be assumed at compile time, thus the test needs to be done to respect the requirements of the C standards.

I think you mix things up: The error here is the pointer arithmetic, not the comparison.

With offset=1 and x=0xffffffff'fffffff, y will 0, so y does not belong to the same object as x. There is no other value of x which could lead to a result y<x with offset=1!

The comparison tests the wrap around and adapts end_ptr in that case.

Again: If there was a wrap around, then the pointer arithmetic operation was wrong (undefined behavior). Hence, we can remove the test plus the code which is executed if the test succeeds.

Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Hi, I see the point. I just don't agree with the conclusion. As I see it the LLVM group is claiming that if there is any possiblity for UB, even if it is dependent on input arguments, then it is always UB, and the compiler can remove code as it sees fit. (Which will be a nightmare for security code) And this is what I disagree with. If the code have defined behaviour by the C standard, for some input arguments, then it must treat the code as being well behaved and omit code as such. If I have some special machine where I can use every single byte of the address space.. (For a 32-bit machine that could provide the code with the ability to declare a 4GB continuous string/char array) And in this case should not str < end_of_str and str+X < end_of_str always be true, even with overflow wraparounds? Best regards, Kjetil Oftedal

Frank Mehnert

3:18 p.m.

On Dienstag, 7. Januar 2025 15:55:28 MEZ Kjetil Oftedal wrote:

...

On Tue, 7 Jan 2025 at 15:33, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
[...]

With offset=1 and x=0xffffffff'fffffff, y will 0, so y does not belong to the same object as x. There is no other value of x which could lead to a result y<x with offset=1!

The comparison tests the wrap around and adapts end_ptr in that case.

Again: If there was a wrap around, then the pointer arithmetic operation was wrong (undefined behavior). Hence, we can remove the test plus the code which is executed if the test succeeds.

[...]

I see the point. I just don't agree with the conclusion. As I see it the LLVM group is claiming that if there is any possiblity for UB, even if it is dependent on input arguments, then it is always UB, and the compiler can remove code as it sees fit. (Which will be a nightmare for security code)

I admit I thought so some time ago but in my opinion you do a wrong conclusion: There is a test in the code. The result of the test can _only_ be true if the code before the test triggered undefined behavior. In other words, if the test succeeded, then we can be sure that the test compared two pointers belonging to different objects -- which is undefined behavior. Therefore, the compiler is fine removing the test because the compiler does not support the case where the test result is true. If the test result is false, everything is fine, both objects may or may not point to different objects, but the compiler assumes that they do. But: If the test result is false, end_ptr does not need an adaption, therefore that line can be optimized out. TBH, I'm still convinced that the compiler is right. :-) Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224 Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Kjetil Oftedal

3:45 p.m.

On Tue, 7 Jan 2025 at 16:19, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...

On Dienstag, 7. Januar 2025 15:55:28 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 15:33, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
[...]

With offset=1 and x=0xffffffff'fffffff, y will 0, so y does not belong to the same object as x. There is no other value of x which could lead to a result y<x with offset=1!

The comparison tests the wrap around and adapts end_ptr in that case.

Again: If there was a wrap around, then the pointer arithmetic operation was wrong (undefined behavior). Hence, we can remove the test plus the code which is executed if the test succeeds.

[...]

I see the point. I just don't agree with the conclusion. As I see it the LLVM group is claiming that if there is any possiblity for UB, even if it is dependent on input arguments, then it is always UB, and the compiler can remove code as it sees fit. (Which will be a nightmare for security code)

I admit I thought so some time ago but in my opinion you do a wrong conclusion:

There is a test in the code.

The result of the test can _only_ be true if the code before the test triggered undefined behavior.

In other words, if the test succeeded, then we can be sure that the test compared two pointers belonging to different objects -- which is undefined behavior.

Therefore, the compiler is fine removing the test because the compiler does not support the case where the test result is true.

If the test result is false, everything is fine, both objects may or may not point to different objects, but the compiler assumes that they do. But: If the test result is false, end_ptr does not need an adaption, therefore that line can be optimized out.

TBH, I'm still convinced that the compiler is right. :-)

Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Let us flip the variables around and review: E.g. if ( str < end_ptr ) instead of if ( end_ptr < str ) Should the compiler then always evaluate the clause to true? end_ptr might still be lower than str due to overflow in the artihmetic. Best regards, Kjetil Oftedal

Frank Mehnert

4:09 p.m.

On Dienstag, 7. Januar 2025 16:45:03 MEZ Kjetil Oftedal wrote:

...

On Tue, 7 Jan 2025 at 16:19, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
On Dienstag, 7. Januar 2025 15:55:28 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 15:33, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
[...]

With offset=1 and x=0xffffffff'fffffff, y will 0, so y does not belong to the same object as x. There is no other value of x which could lead to a result y<x with offset=1!

The comparison tests the wrap around and adapts end_ptr in that case.

Again: If there was a wrap around, then the pointer arithmetic operation was wrong (undefined behavior). Hence, we can remove the test plus the code which is executed if the test succeeds.

[...]

I see the point. I just don't agree with the conclusion. As I see it the LLVM group is claiming that if there is any possiblity for UB, even if it is dependent on input arguments, then it is always UB, and the compiler can remove code as it sees fit. (Which will be a nightmare for security code)

I admit I thought so some time ago but in my opinion you do a wrong conclusion:

There is a test in the code.

The result of the test can _only_ be true if the code before the test triggered undefined behavior.

In other words, if the test succeeded, then we can be sure that the test compared two pointers belonging to different objects -- which is undefined behavior.

Therefore, the compiler is fine removing the test because the compiler does not support the case where the test result is true.

If the test result is false, everything is fine, both objects may or may not point to different objects, but the compiler assumes that they do. But: If the test result is false, end_ptr does not need an adaption, therefore that line can be optimized out.

[...]

Let us flip the variables around and review: E.g. if ( str < end_ptr ) instead of if ( end_ptr < str )

Should the compiler then always evaluate the clause to true? end_ptr might still be lower than str due to overflow in the artihmetic.

Not sure what you mean by "then always evaluate the clause to true". Are you asking if the compiler may assume that (str < end_ptr) is always true because only this is "defined behavior"? No, the compiler cannot assume that. But this is no contradiction to what I said before. The compiler does the following steps for the original problem, (end_ptr < str): 1. Generate code for the test. 2. The test result can either be true or false. The compiler generates code for both cases. 3. No code needs to be generated for the case "false". 4. Generate code for the case "true". 5. Optimize the code. The compiler observes that the "false" case can only happen if the previous calculation triggered undefined behavior (str and end_ptr point to different objects). In your case (str < end_ptr), the compiler still needs to generate code for the test because the compiler must assume that both pointers (str and end_ptr) belong to the same object, therefore the test is valid. In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects! Kind regards, Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224 Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Kjetil Oftedal

4:38 p.m.

On Tue, 7 Jan 2025 at 17:09, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...

On Dienstag, 7. Januar 2025 16:45:03 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 16:19, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
On Dienstag, 7. Januar 2025 15:55:28 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 15:33, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
[...]

With offset=1 and x=0xffffffff'fffffff, y will 0, so y does not belong to the same object as x. There is no other value of x which could lead to a result y<x with offset=1!

The comparison tests the wrap around and adapts end_ptr in that case.

Again: If there was a wrap around, then the pointer arithmetic operation was wrong (undefined behavior). Hence, we can remove the test plus the code which is executed if the test succeeds.

[...]

I see the point. I just don't agree with the conclusion. As I see it the LLVM group is claiming that if there is any possiblity for UB, even if it is dependent on input arguments, then it is always UB, and the compiler can remove code as it sees fit. (Which will be a nightmare for security code)

I admit I thought so some time ago but in my opinion you do a wrong conclusion:

There is a test in the code.

The result of the test can _only_ be true if the code before the test triggered undefined behavior.

In other words, if the test succeeded, then we can be sure that the test compared two pointers belonging to different objects -- which is undefined behavior.

Therefore, the compiler is fine removing the test because the compiler does not support the case where the test result is true.

If the test result is false, everything is fine, both objects may or may not point to different objects, but the compiler assumes that they do. But: If the test result is false, end_ptr does not need an adaption, therefore that line can be optimized out.

[...]

Let us flip the variables around and review: E.g. if ( str < end_ptr ) instead of if ( end_ptr < str )

Should the compiler then always evaluate the clause to true? end_ptr might still be lower than str due to overflow in the artihmetic.

Not sure what you mean by "then always evaluate the clause to true". Are you asking if the compiler may assume that (str < end_ptr) is always true because only this is "defined behavior"?

No, the compiler cannot assume that. But this is no contradiction to what I said before.

The compiler does the following steps for the original problem, (end_ptr < str):

1. Generate code for the test. 2. The test result can either be true or false. The compiler generates code for both cases. 3. No code needs to be generated for the case "false". 4. Generate code for the case "true". 5. Optimize the code. The compiler observes that the "false" case can only happen if the previous calculation triggered undefined behavior (str and end_ptr point to different objects).

In your case (str < end_ptr), the compiler still needs to generate code for the test because the compiler must assume that both pointers (str and end_ptr) belong to the same object, therefore the test is valid.

In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects!

Kind regards,

Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Is this true for all cases for strings?

...

In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects!

--- const char* str = "Hello World!" const char* sub_str = strstr(str, "Wor"); strnlen(sub_str, X); --- For some values of X end_ptr is still pointing to a valid array entry for the original "Hello World" string, even if it is not within the "World!" substring. Is it then not pointing to a valid array of objects? (E.g if X is the unsigned representation of -1,-2,...-6) Best regards, Kjetil Oftedal

Frank Mehnert

5:07 p.m.

On Dienstag, 7. Januar 2025 17:38:52 MEZ Kjetil Oftedal wrote:

...

On Tue, 7 Jan 2025 at 17:09, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
On Dienstag, 7. Januar 2025 16:45:03 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 16:19, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
On Dienstag, 7. Januar 2025 15:55:28 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 15:33, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
[...]

With offset=1 and x=0xffffffff'fffffff, y will 0, so y does not belong to the same object as x. There is no other value of x which could lead to a result y<x with offset=1!

The comparison tests the wrap around and adapts end_ptr in that case.

Again: If there was a wrap around, then the pointer arithmetic operation was wrong (undefined behavior). Hence, we can remove the test plus the code which is executed if the test succeeds.

[...]

I see the point. I just don't agree with the conclusion. As I see it the LLVM group is claiming that if there is any possiblity for UB, even if it is dependent on input arguments, then it is always UB, and the compiler can remove code as it sees fit. (Which will be a nightmare for security code)

I admit I thought so some time ago but in my opinion you do a wrong conclusion:

There is a test in the code.

The result of the test can _only_ be true if the code before the test triggered undefined behavior.

In other words, if the test succeeded, then we can be sure that the test compared two pointers belonging to different objects -- which is undefined behavior.

Therefore, the compiler is fine removing the test because the compiler does not support the case where the test result is true.

If the test result is false, everything is fine, both objects may or may not point to different objects, but the compiler assumes that they do. But: If the test result is false, end_ptr does not need an adaption, therefore that line can be optimized out.

[...]

Let us flip the variables around and review: E.g. if ( str < end_ptr ) instead of if ( end_ptr < str )

Should the compiler then always evaluate the clause to true? end_ptr might still be lower than str due to overflow in the artihmetic.

Not sure what you mean by "then always evaluate the clause to true". Are you asking if the compiler may assume that (str < end_ptr) is always true because only this is "defined behavior"?

No, the compiler cannot assume that. But this is no contradiction to what I said before.

The compiler does the following steps for the original problem, (end_ptr < str):

1. Generate code for the test. 2. The test result can either be true or false. The compiler generates code for both cases. 3. No code needs to be generated for the case "false". 4. Generate code for the case "true". 5. Optimize the code. The compiler observes that the "false" case can only happen if the previous calculation triggered undefined behavior (str and end_ptr point to different objects).

In your case (str < end_ptr), the compiler still needs to generate code for the test because the compiler must assume that both pointers (str and end_ptr) belong to the same object, therefore the test is valid.

In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects!

[...]

Is this true for all cases for strings?

...
In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects!

--- const char* str = "Hello World!" const char* sub_str = strstr(str, "Wor");

strnlen(sub_str, X); ---

For some values of X end_ptr is still pointing to a valid array entry for the original "Hello World" string, even if it is not within the "World!" substring. Is it then not pointing to a valid array of objects? (E.g if X is the unsigned representation of -1,-2,...-6)

OK, I think you have a point. Basically this derives into the question if adding a huge offset to a pointer is the same as subtracting a small offset from a pointer. So far I couldn't find any C++ rule related to this problem. Kind regards, Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224 Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Kjetil Oftedal

5:18 p.m.

On Tue, 7 Jan 2025 at 18:07, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...

On Dienstag, 7. Januar 2025 17:38:52 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 17:09, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
On Dienstag, 7. Januar 2025 16:45:03 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 16:19, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
On Dienstag, 7. Januar 2025 15:55:28 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 15:33, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote: > [...] > > With offset=1 and x=0xffffffff'fffffff, y will 0, so y does not belong to > the same object as x. There is no other value of x which could lead to a > result y<x with offset=1! > > The comparison tests the wrap around and adapts end_ptr in that case. > > Again: If there was a wrap around, then the pointer arithmetic operation > was wrong (undefined behavior). Hence, we can remove the test plus the code > which is executed if the test succeeds.

[...]

I see the point. I just don't agree with the conclusion. As I see it the LLVM group is claiming that if there is any possiblity for UB, even if it is dependent on input arguments, then it is always UB, and the compiler can remove code as it sees fit. (Which will be a nightmare for security code)

I admit I thought so some time ago but in my opinion you do a wrong conclusion:

There is a test in the code.

The result of the test can _only_ be true if the code before the test triggered undefined behavior.

In other words, if the test succeeded, then we can be sure that the test compared two pointers belonging to different objects -- which is undefined behavior.

Therefore, the compiler is fine removing the test because the compiler does not support the case where the test result is true.

If the test result is false, everything is fine, both objects may or may not point to different objects, but the compiler assumes that they do. But: If the test result is false, end_ptr does not need an adaption, therefore that line can be optimized out.

[...]

Let us flip the variables around and review: E.g. if ( str < end_ptr ) instead of if ( end_ptr < str )

Should the compiler then always evaluate the clause to true? end_ptr might still be lower than str due to overflow in the artihmetic.

Not sure what you mean by "then always evaluate the clause to true". Are you asking if the compiler may assume that (str < end_ptr) is always true because only this is "defined behavior"?

No, the compiler cannot assume that. But this is no contradiction to what I said before.

The compiler does the following steps for the original problem, (end_ptr < str):

1. Generate code for the test. 2. The test result can either be true or false. The compiler generates code for both cases. 3. No code needs to be generated for the case "false". 4. Generate code for the case "true". 5. Optimize the code. The compiler observes that the "false" case can only happen if the previous calculation triggered undefined behavior (str and end_ptr point to different objects).

In your case (str < end_ptr), the compiler still needs to generate code for the test because the compiler must assume that both pointers (str and end_ptr) belong to the same object, therefore the test is valid.

In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects!

[...]

Is this true for all cases for strings?

...
In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects!

--- const char* str = "Hello World!" const char* sub_str = strstr(str, "Wor");

strnlen(sub_str, X); ---

For some values of X end_ptr is still pointing to a valid array entry for the original "Hello World" string, even if it is not within the "World!" substring. Is it then not pointing to a valid array of objects? (E.g if X is the unsigned representation of -1,-2,...-6)

OK, I think you have a point.

Basically this derives into the question if adding a huge offset to a pointer is the same as subtracting a small offset from a pointer. So far I couldn't find any C++ rule related to this problem.

Kind regards,

Frank -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Hi, I think it is purely academic as this point though :) The change is probably fine. Just a bit annoyed that clang forces changes to the code in a lot of projects, instead of clang accepting defacto standard patterns. Best regards, Kjetil Oftedal

Frank Mehnert

8 Jan 8 Jan

10:42 a.m.

Hi Kjetil, On Dienstag, 7. Januar 2025 18:18:14 MEZ Kjetil Oftedal wrote:

...

On Tue, 7 Jan 2025 at 18:07, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
On Dienstag, 7. Januar 2025 17:38:52 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 17:09, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

[...]

Is this true for all cases for strings?

...
In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects!

--- const char* str = "Hello World!" const char* sub_str = strstr(str, "Wor");

strnlen(sub_str, X); ---

For some values of X end_ptr is still pointing to a valid array entry for the original "Hello World" string, even if it is not within the "World!" substring. Is it then not pointing to a valid array of objects? (E.g if X is the unsigned representation of -1,-2,...-6)

OK, I think you have a point.

Basically this derives into the question if adding a huge offset to a pointer is the same as subtracting a small offset from a pointer. So far I couldn't find any C++ rule related to this problem.

Hi,

I think it is purely academic as this point though :)

The change is probably fine. Just a bit annoyed that clang forces changes to the code in a lot of projects, instead of clang accepting defacto standard patterns.

The LLVM ticket [1] was updated and Andrew Pinski noted that the the Linux kernel, -fwrapv-pointer was added to keep the "defacto standard patterns". So compiling with -fno-strict-overflow (which included -fwrapv-pointer) would most likely make my fix superfluous. So it's basically about pointer arithmetics warp around. But I hope there is agreement now that my change makes the code "more correct". Kind regards Frank [1] https://github.com/llvm/llvm-project/issues/121909 -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224 Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Kjetil Oftedal

12:57 p.m.

On Wed, 8 Jan 2025 at 11:42, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...

Hi Kjetil,

On Dienstag, 7. Januar 2025 18:18:14 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 18:07, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

...
On Dienstag, 7. Januar 2025 17:38:52 MEZ Kjetil Oftedal wrote:

...
On Tue, 7 Jan 2025 at 17:09, Frank Mehnert <frank.mehnert@kernkonzept.com> wrote:

[...]

Is this true for all cases for strings?

...
In the original case (end_ptr < str), the compiler knows for sure that both pointers belong to different objects!

--- const char* str = "Hello World!" const char* sub_str = strstr(str, "Wor");

strnlen(sub_str, X); ---

For some values of X end_ptr is still pointing to a valid array entry for the original "Hello World" string, even if it is not within the "World!" substring. Is it then not pointing to a valid array of objects? (E.g if X is the unsigned representation of -1,-2,...-6)

OK, I think you have a point.

Basically this derives into the question if adding a huge offset to a pointer is the same as subtracting a small offset from a pointer. So far I couldn't find any C++ rule related to this problem.

Hi,

I think it is purely academic as this point though :)

The change is probably fine. Just a bit annoyed that clang forces changes to the code in a lot of projects, instead of clang accepting defacto standard patterns.

The LLVM ticket [1] was updated and Andrew Pinski noted that the the Linux kernel, -fwrapv-pointer was added to keep the "defacto standard patterns". So compiling with -fno-strict-overflow (which included -fwrapv-pointer) would most likely make my fix superfluous.

So it's basically about pointer arithmetics warp around.

But I hope there is agreement now that my change makes the code "more correct".

Kind regards

Frank

[1] https://github.com/llvm/llvm-project/issues/121909 -- Dr.-Ing. Frank Mehnert, frank.mehnert@kernkonzept.com, +49-351-41 883 224

Kernkonzept GmbH. Sitz: Dresden. Amtsgericht Dresden, HRB 31129. Geschäftsführer: Dr.-Ing. Michael Hohmuth

Sure. I enjoyed the discussion. There is no objection from me on the change. Best regards, Kjetil Oftedal

351

Age (days ago)

352

Last active (days ago)

List overview

Download

15 comments

3 participants

participants (3)

Frank Mehnert
Kjetil Oftedal
Marcus Haehnel

[PATCH] fix possible overflow in pointer arithmetics strnlen()

tags

participants (3)