Comment by messe

4 months ago

> Another interesting choice in this project is to make lengths signed:

There are good reasons for this choice in C (and C++) due to broken integer promotion and casting rules.

See: "Subscripts and sizes should be signed" (Bjarne Stroustrup) https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0...

As a nice bonus, it means that ubsan traps on overflow (unsigned overflows just wrap).

45 comments

messe

uecker 4 months ago

I do not agree that the integer promotion or casting (?) rules are broken in C. That some people make mistakes because they do not know them is a different problem.

The reason you should make length signed is that you can use the sanitizer to find or mitigate overflow as you correctly observe, while unsigned wraparound leads to bugs which are basically impossible to find. But this has nothing to do with integer promotion and wraparound bugs can also create bugs in - say - Rust.

messe 4 months ago
I meant implicit casting, but I guess that really falls under promotion in most cases where it's relevant here (I'm on a train from Aarhus to Copenhagen right now to catch a flight, and I've slept considerably less than usual, so apologies if I'm making some slight mistakes).
The issues really arise when you mix signed/unsigned arithmetic and end up promoting everything to signed unexpectedly. That's usually "okay", as long as you're not doing arithmetic on anything smaller than an int.
As an aside, if you like C enough to have opinions on promotion rules then you might enjoy the programming language Zig. It's around the same level as C, but with much nicer ergonomics, and overflow traps by default in Debug/ReleaseSafe optimization modes. If you want explicit two's complement overflow it has +%, *% and -% variants of the usual arithmetic operations, as well as saturating +|, *|, -| variants that clamp to [minInt(T), maxInt(T)].
EDIT to the aside: it's also true if you hate C enough to have opinions on promotion rules.
- uecker 4 months ago
  
  I prefer C to Zig. IMHO all the successor languages throw out the baby with the bathwater and add unnecessary complexity. But Zig is much better than Rust, but, still, I would never use it for a serious project.
  The "promoting unexpectedly" is something I do not think happens if you know C well. At least, I can't remember ever having a bug because of this. In most cases the promotion prevents you from having a bug, because you do not get unexpected overflow or wraparound because your type is too small.
  Mixing signed and unsigned is problematic, but I see issues mostly in code from people who think they need to use unsigned when they shouldn't because they heard signed integers are dangerous. Recently I saw somebody "upgrading" a C code basis to C++ and also changing all loop variables to size_t. This caused a bug which he blamed on working on the "legacy C code" he is working on, although the original code was just fine. In general, there are compiler warnings that should catch issues with sign for conversions.
  
  2 replies →
- jacquesm 4 months ago
  
  Yes, this is one of the more subtle pitfalls of C. What helps is that in most contexts the value of 2 billion is large enough that a wraparound would be noticed almost immediately. But when it isn't then it can lead to very subtle errors that can propagate for a long time before anything goes off the rails that is noticed.
OneLessThing 4 months ago
It's interesting to hear these takes. I've never had problems catching unsigned wrap bugs with plain old memory sanitizers, though I must admit to not having a lot of experience with ubsan in particular. Maybe I should use it more.
- uecker 4 months ago
  
  GCC's sanitizer does not catch unsigned wraparound. But the bigger problem is that a lot of code is written where it assumes that unsigned wraps around and this is ok. So you you would use a sanitizer you get a lot of false positives. For signed overflow, one can always consider this a bug in portable C.
  Of course, if you consistently treat unsigned wraparound as a bug in your code, you can also use a sanitizer to screen for it. But in general I find it more practical to use signed integers for everything except for modular arithmetic where I use unsigned (and where wraparound is then expected and not a bug)
- jacquesm 4 months ago
  
  I've had some fun reviewing some very old code I wrote (1980's) to see what it looked like to me after such a long time of gaining experience. It's not unlike what the OP did here, it reads cleanly but I can see many issues that escaped my attention at the time. I always compared C with a very fast car: you can take some corners on two wheels but if you make a habit of that you're going to end up in a wall somewhere. That opinion has not changed.
  
  25 replies →
01HNNWZ0MV43FF 4 months ago

> That some people make mistakes because they do not know them is a different problem.
We can argue til we're blue in the face that people should just not make any mistakes, but history is against us - People will always make mistakes.
That's why surgeons are supposed to follow checklists and count their sponges in and out
Sukera 4 months ago

Could you expand on how these wraparound bugs happen in Rust? As far as I know, integer overflow panics (i.e. aborts) your code when compiled in debug mode, which I think is often used for testing.
bringbart 4 months ago

>while unsigned wraparound leads to bugs which are basically impossible to find.
What?
unsigned sizes are way easier to check, you just need one invariant:
if(x < capacity) // good to go
Always works, regardless how x is calculated and you never have to worry about undefined behavior when computing x. And the same invariant is used for forward and backward loops - some people bring up i >= 0 as a problem with unsigned, but that's because you should use i < n for backward loops as well, The One True Invariant.

kstenerud 4 months ago

Yup, unsigned math is just nasty.

Actually, unchecked math on an integer is going to be bad regardless of whether it's signed or unsigned. The difference is that with signed integers, your sanity check is simple and always the same and requires no thought for edge cases: `if(index < 0 || index > max)`. Plus ubsan, as mentioned above.

My policy is: Always use signed, unless you have a specific reason to use unsigned (such as memory addresses).

bringbart 4 months ago
unsigned is easier: 'if(index >= max)' and has fewer edge cases because you don't need to worry about undefined behavior when computing index.
- int_19h 4 months ago
  
  Just because it's UB doesn't mean it's not a problem, though. If you do unsigned arithmetics but don't account for the possibility of wraparound on overflow, the resulting value is well-defined, but it does you no good if you then try to e.g. index using it and cause a buffer overflow.
lelanthran 4 months ago
> The difference is that with signed integers, your sanity check is simple and always the same and requires no thought for edge cases: `if(index < 0 || index > max)`
Wait, what? How is that easier than `if (index > max)`?
- kstenerud 4 months ago
  
  Because if max is a calculated value, it could silently wrap around and leave index to cause a buffer overflow.
  Or if index is counting down, a calculated index could silently wrap around and cause the same issue.
  And if both are calculated and wrap around, you'll have fun debugging spooky action at a distance!
  If both are signed, that won't happen. You probably do have a bug if max or index is calculated to a negative value, but it's likely not an exploitable one.
  
  1 reply →

accelbred 4 months ago

If using C23, _BitInt allows for integer types without promotion.

user____name 4 months ago

I just put assertions to check the ranges of all sizes and indices upon function entry, doubles as documentation, and I mostly don't have to worry about signedness as a result.