Comment by poly2it

7 months ago

I still don't understand how these arguments make sense for new code. Naturally, sizes should be unsigned because they represent values which cannot be unsigned. If you do pointer/size arithmetic, the only solution to avoid overflows is to overflow-check and range-check before computation.

You cannot even check the signedness of a signed size to detect an overflow, because signed overflow is undefined!

The remaining argument from what I can tell is that comparisons between signed and unsigned sizes are bug-prone. There is however, a dedicated warning to resolve this instantly.

It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.

Given this, I can't understand the justification. I'm currently using unsigned sizes. If you have anything contradicting, please comment :^)

22 comments

poly2it

sparkie 7 months ago

C offers a different solution to the problem in Annex K of the standard. It provides a type `rsize_t`, which like `size_t` is unsigned, and has the same bit width, but where `RSIZE_MAX` is recommended to be `SIZE_MAX >> 1` or smaller. You perform bounds checking as `<= RSIZE_MAX` to ensure that a value used for indexing is not in the range that would be considered negative if converted to a signed integer. A negative value provided where `rsize_t` is expected would fail the check `<= RSIZE_MAX`.

IMO, this is a better approach than using signed types for indexing, but AFAIK, it's not included in GCC/glibc or gnulib. It's an optional extension and you're supposed to define `__STDC_WANT_LIB_EXT1__` to use it.

I don't know if any compiler actually supports it. It came from Microsoft and was submitted for standardization, but ISO made some changes from Microsoft's own implementation.

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1173.pdf#p...

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1225.pdf

poly2it 7 months ago

This is an interesting middle ground. As ncruces pointed out in a sibling comment, the sign bit in a pointer cannot be set without contradicting the ptrdiff_t type. That makes this seem like a reasonable approach to storing sizes.

int_19h 7 months ago

> It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.

You can, since the number of bits is the same. The mapping of pointer bits to signed integer bits will mean that you can't then do arithmetic on the resulting integers and get meaningful results, but the behavior of such shenanigans is already unspecified with no guarantees other than you can get an integer out of a pointer and then convert it back later.

But also, semantically, what does it even mean to convert a single pointer to a size? A size of an object is naturally defined as the count of chars between two pointers, one pointing at the beginning of the object, the other at its end. Which is to say, a size is a subset of pointer difference that just happens to always be non-negative. So long as the implementation guarantees that for no object that non-negative difference will always fit in a signed int of the appropriate size, it seems reasonable to reflect this in the types.

ncruces 7 months ago

The correct integer type to use to store pointers (the only one specified to actually work) is uintptr_t, not size_t.

foldr 7 months ago

Stroustrup believes that signed should be preferred to unsigned even for values that can’t be less than zero: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...

poly2it 7 months ago
I've of course read his argument before, and I think it might be more applicable to C++. I exclusively program in C, and in that regard, the relevant aspects as far as I can tell wouldn't be clearly in favour of a signed type. I also think his discussion on iterator signedness mixes issues with improper bounds checking and attributes it to the size type signedness. What remains I cannot see justify using the a signed type other than "just because". I'm not sure it's applicable to C.
- uecker 7 months ago
  
  I also prefer signed types in C for sizes and indices. You can screen for overflow bugs easily using UBSan (or use it to prevent exploitation).

sim7c00 7 months ago

I dont know either.

int somearray[10];

new_ptr = somearray + signed_value;

element = somearray[signedvalue];

this seems almost criminal to how my brain does logic/C code.

The only thing i could think of is this:

somearray+=11; somearray[-1] // index set to somearray[10] ??

if i'd see my CPU execute that i'd want it to please stop. I'd want my compiler to shout at me like a little child, and be mean until i do better.

-Wall -Wextra -Wextra -Wpedantic <-- that should flag i think any of these weird practices.

As you stated tho, i'd be keen to learn why i am wrong!

windward 7 months ago
In the implementation of something like a deque or merge sort, you could have a variable that represents offsets from pointers but which could sensibly be negative. C developers culturally aren't as particular about theoretical correctness of types as developers in some other languages - there's a lot of implicit casting being used - so you'll typically see an `int` used for this. If you do wish to bring some rigidity to your type system, you may argue that this value is distinct from a general integer which could be used for any arithmetic and definitely not just a pointer. So it should be a signed pointer difference.
Arrays aren't the best example, since they are inherently about linear, scalar offsets, but you might see a negative offset from the start of a (decayed) array in the implementation of an allocator with clobber canaries before and after the data.
- sim7c00 7 months ago
  
  thanks a lot for this. definitely gonna look at these sorts to see whats going on :D interesting!
mandarax8 7 months ago
Any kind of relative/offset pointers require negative pointer arithmetic. https://www.gingerbill.org/article/2020/05/17/relative-point...
- poly2it 7 months ago
  
  I don't think you can make such a broad statement and be correct in all cases. Negative pointer arithmetic is not by itself a reason to use signed types, except if you are:
  1. Certain your added value is negative.
  2. Checking for underflows after computation, which you shouldn't.
  The article was interesting.

ncruces 7 months ago

> It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.

Why?

By the definition of ptrdiff_t, ISTM the size of any object allocated by malloc cannot be out of bounds of ptrdiff_t, so I'm not sure how can you have a useful size_t that uses the sign bit?

uecker 7 months ago

"Naturally, sizes should be unsigned because they represent values which cannot be unsigned."

Unsigned types in C have modular arithmetic, I think they should be used exclusively when this is needed, or maybe if you absolutely need the full range.

TuxSH 7 months ago
So do signed types (guaranteed to be 2's complement since C23, though all sane targets have been using that representation for a long time).
Two's complement encodes -x as ~x + 1 = 2^n - x = -x (mod 2^n) and can therefore be mixed with unsigned for (+, -, *, &, |, ^, ~, <<).
> I think they should be used exclusively when this is needed
The opposite: signed type usage should be kept to a minimum because signed type (and pointer) overflow is UB and will get optimized as such.
- uecker 7 months ago
  
  For signed types you can can also get a run-time trap on overflow, making it safe to use. Bug caused by unsigned wraparound are extremely hard to find.
enqk 7 months ago
yeah unsigned are really about opting to perform modular arithmetic, or for mapping hardware registers.
C is weakly typed, the basic types are really not to maintain invariants or detect their violation
- AlotOfReading 7 months ago
  
  If you need saturating arithmetic, C23 adds that as well with a new header: https://en.cppreference.com/w/c/header/stdckdint

windward 7 months ago

Pointer arithmetic that could overflow would probably involve a heap and therefore be less likely to require a relative, negative offset. Just use the addresses and errors you get from allocation.

poly2it 7 months ago

Yes, but there are definitely cases where this doesn't apply, for example when deriving an offset from a user pointer. As such this is not a universal solution.

user____name 7 months ago

FWIW If you don't care about portability and target x64 you only get 48 bits worth of virtual address in the MMU.