Hi Rich, using ptrdiff_t is (alas) the right thing to do: pointer subtraction returns that type, and if the result doesn't fit, you get UB. And ptrdiff_t is a signed type.
Assume you successfully allocate an array "arr" with "sz" elements, where "sz" is of type "size_t". Then "arr + sz" is a valid expression (meaning the same as "&arr[sz]"), because it's OK to compute a pointer one past the last element of an array (but not to dereference it). Next you might be tempted to write "arr + sz - arr" (meaning the same as "&arr[sz] - &arr[0]"), and expect it to produce "sz", because it is valid to compute the element offset difference between two "pointers into an array or one past it". However, that difference is always signed, and if "sz" does not fit into "ptrdiff_t", you get UB from the pointer subtraction.
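One way to see the hazard concretely; the names here are just illustrative:

    #include <stddef.h>
    #include <stdlib.h>

    void demo(size_t sz)
    {
        char *arr = malloc(sz);
        if (arr == NULL)
            return;

        char *end = arr + sz;    /* one past the end: valid to form, not to dereference */
        ptrdiff_t n = end - arr; /* undefined behavior if sz > PTRDIFF_MAX */
        (void)n;
        free(arr);
    }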
Given that the C standard (or even POSIX, AIUI) doesn't relate ptrdiff_t and size_t to each other, we need to restrict array element counts, before allocation, with two limits:
- nelem <= (size_t)-1 / sizeof(element_type)
- nelem <= PTRDIFF_MAX
(I forget which standard header #defines PTRDIFF_MAX; surprisingly, it is not <limits.h>.)
In general, neither condition implies the other. However, once you have enforced both, you can store the element count as either "size_t" or "ptrdiff_t".
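As a sketch, a pre-allocation check enforcing both limits might look like this (the helper name is made up; PTRDIFF_MAX lives in <stdint.h>):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical helper: allocate nelem elements of elem_size bytes each,
       rejecting counts that violate either limit above. */
    void *checked_calloc(size_t nelem, size_t elem_size)
    {
        if (elem_size == 0 || nelem > (size_t)-1 / elem_size)
            return NULL;              /* total byte count would not fit in size_t */
        if (nelem > PTRDIFF_MAX)
            return NULL;              /* later pointer subtraction could overflow ptrdiff_t */
        return calloc(nelem, elem_size);
    }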
From one of his other blog posts, "Guidelines for computing sizes and subscripts":
https://nullprogram.com/blog/2024/05/24/
https://www.youtube.com/watch?v=wvtFGa6XJDU
I still don't understand how these arguments make sense for new code. Naturally, sizes should be unsigned because they represent values which cannot be negative. If you do pointer/size arithmetic, the only solution to avoid overflows is to overflow-check and range-check before computation.
You cannot even check whether a signed size has gone negative to detect an overflow, because signed overflow is undefined!
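For example, checking before the computation instead of after it, as a sketch (the helper name is made up):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: add two sizes, reporting failure instead of wrapping. */
    bool add_sizes(size_t a, size_t b, size_t *out)
    {
        if (a > SIZE_MAX - b)
            return false;    /* a + b would wrap around */
        *out = a + b;
        return true;
    }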
The remaining argument, from what I can tell, is that comparisons between signed and unsigned sizes are bug-prone. There is, however, a dedicated compiler warning (e.g. -Wsign-compare in GCC and Clang) that resolves this instantly.
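The kind of bug that warning is aimed at looks roughly like this:

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        int i = -1;
        size_t n = 10;

        /* i is converted to a huge unsigned value, so the comparison is false. */
        if (i < n)
            puts("smaller, as expected");
        else
            puts("surprise: -1 compares greater than 10");
        return 0;
    }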
It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.
Given this, I can't understand the justification. I'm currently using unsigned sizes. If you have anything contradicting, please comment :^)
C offers a different solution to the problem in Annex K of the standard. It provides a type `rsize_t`, which like `size_t` is unsigned, and has the same bit width, but where `RSIZE_MAX` is recommended to be `SIZE_MAX >> 1` or smaller. You perform bounds checking as `<= RSIZE_MAX` to ensure that a value used for indexing is not in the range that would be considered negative if converted to a signed integer. A negative value provided where `rsize_t` is expected would fail the check `<= RSIZE_MAX`.
IMO, this is a better approach than using signed types for indexing, but AFAIK, it's not included in GCC/glibc or gnulib. It's an optional extension and you're supposed to define `__STDC_WANT_LIB_EXT1__` to use it.
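A sketch of how that check might be used, with a fallback definition for implementations that don't ship Annex K (the fallback value follows the recommended `SIZE_MAX >> 1`):

    #define __STDC_WANT_LIB_EXT1__ 1
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    #ifndef RSIZE_MAX                      /* fallback when Annex K is absent */
    typedef size_t rsize_t;
    #define RSIZE_MAX (SIZE_MAX >> 1)
    #endif

    /* Reject any count that would look negative if reinterpreted as signed. */
    bool valid_count(rsize_t n)
    {
        return n <= RSIZE_MAX;
    }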
I don't know if any compiler actually supports it. It came from Microsoft and was submitted for standardization, but ISO made some changes from Microsoft's own implementation.
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1173.pdf#p...
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1225.pdf
> It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.
You can, since the number of bits is the same. The mapping of pointer bits to signed integer bits will mean that you can't then do arithmetic on the resulting integers and get meaningful results, but the behavior of such shenanigans is already unspecified with no guarantees other than you can get an integer out of a pointer and then convert it back later.
But also, semantically, what does it even mean to convert a single pointer to a size? A size of an object is naturally defined as the count of chars between two pointers, one pointing at the beginning of the object, the other at its end. Which is to say, a size is a subset of pointer difference that just happens to always be non-negative. So long as the implementation guarantees that, for every object, that non-negative difference will fit in a signed integer of the appropriate size, it seems reasonable to reflect this in the types.
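Concretely, that reading of a size as a non-negative pointer difference is just:

    #include <stddef.h>

    /* The size of a buffer as the distance between its two delimiting pointers. */
    ptrdiff_t buffer_size(const char *begin, const char *end)
    {
        return end - begin;   /* non-negative when both point into the same object, begin first */
    }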
"Naturally, sizes should be unsigned because they represent values which cannot be unsigned."
Unsigned types in C have modular arithmetic; I think they should be used only when that is what you need, or perhaps when you absolutely need the full range.
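The modular behaviour in question, in one line:

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    int main(void)
    {
        size_t n = 0;
        assert(n - 1 == SIZE_MAX);   /* unsigned arithmetic wraps modulo 2^N rather than trapping */
        return 0;
    }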
I don't know either.
int somearray[10];
int *new_ptr = somearray + signed_value;
or
int element = somearray[signed_value];
This seems almost criminal to how my brain does logic/C code.
The only thing I could think of is this:
int *p = somearray; p += 11; p[-1] // index set to somearray[10] ??
If I saw my CPU execute that, I'd want it to please stop. I'd want my compiler to shout at me like a little child, and be mean until I do better.
-Wall -Wextra -Wpedantic <-- that should flag, I think, any of these weird practices.
As you stated though, I'd be keen to learn why I am wrong!
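For what it's worth, one place a negative index is legitimate is relative to an interior pointer, e.g.:

    #include <assert.h>

    int main(void)
    {
        int somearray[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int *p = somearray + 5;   /* points at element 5, well inside the array */

        assert(p[-1] == 4);       /* same element as somearray[4]; the index is relative to p */
        return 0;
    }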
> It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.
Why?
By the definition of ptrdiff_t, ISTM the size of any object allocated by malloc cannot be out of bounds of ptrdiff_t, so I'm not sure how you can have a useful size_t that uses the sign bit?
Pointer arithmetic that could overflow would probably involve a heap and therefore be less likely to require a relative, negative offset. Just use the addresses and errors you get from allocation.
Stroustrup believes that signed should be preferred to unsigned even for values that can’t be less than zero: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...
Skeeto and Stroustrup are a bit confused about valid index types. They prefer signed, which will lead to overflows on negative values, but has the advantage of using only half of the valid range, so there's more heap for the rest. Very confused.