
Comment by uecker

1 day ago

Having them available is not the issue, using them for sizes and indices is what causes a lot of tricky bugs.

I find it the opposite. Unsigned integers are intuitive, while signed integers are unintuitive and cause a lot of tricky bugs, especially in languages where signed overflow is undefined behavior.

It's pretty rare to have values that can be negative but are still integers, at least in the work I do. The most common cases I encounter are approximations of something related to log probability, such as various scores in dynamic programming and graph algorithms.

Most of the time, when you deal with integers, you need special handling to avoid negative values. Once you get used to thinking about unsigned integers, you quickly develop robust ways of avoiding situations where the values would be negative.

  • It is interesting that you find unsigned integers more intuitive. My experience (with students, but also analyses of CVEs, which give plenty of evidence) is that the opposite is true: signed integers in C are a model of the integers, which have a nice mathematical structure that people learn in elementary school. Yes, this breaks down on overflow, but for that you have to reach very high numbers, and there is very good tooling to debug it. In contrast, unsigned integers in C are modulo arithmetic, which people learn at university, if at all, and get wrong all the time, and the errors are mostly subtle and very difficult to find automatically.

    You are right that you often need to constrain an integer to be non-negative or positive, but usually not during arithmetic; rather, at certain points in the logic of a program. And then, in my experience, it is better expressed as an assertion.

    • The fixed-size signed types of C etc. are no more the actual integers, and no less modular arithmetic, than the fixed-size unsigned types. They both implement the exact same modular arithmetic for +, -, and *. Only for other operations (ordering comparisons, or / and %, which in turn are defined in C in terms of the ordering structure, i.e. rounding toward zero) do they differ. And in either case, that ordering structure is not one commonly encountered outside the context of fixed-size computer arithmetic.

      Ask an ordinary person what 3 * (1/3) or 3 * (-1/3) should come to, and they aren't going to say any of the results that you get in C for either signed or unsigned int types.

    • Modulo arithmetic is taught to American students as clock arithmetic in elementary school as well. Signed integers are better described as truncated 2-adics, which are definitely an advanced topic. It's basically a Greenland/Iceland situation. The complicated, difficult one is given a friendly name to trap naive programmers.


    • I work in bioinformatics. The numbers are typically large enough that you either have to think about numeric limits all the time if you use 32-bit integers (or bit-packed arrays), or you end up wasting (tens of) gigabytes with 64-bit integers.

      I've also done a lot of succinct data structures, data compression, and things like that. When you manipulate the binary representation directly, it's easier to connect representation to unsigned semantics than to signed semantics.

      Unsigned integers are usually integers modulo 2^n, which gives them a convenient algebraic structure. Whether you find that intuitive or not probably depends on your education. From my perspective, abstract algebra and discrete mathematics are things you learn in the first year of your CS degree.


    • In Germany I learned modular arithmetic at something like 13 years old, in high school. I also find unsigned much more intuitive, and in my code I only use sized signed types when I absolutely have to, which is not often. Only in languages that offer automatic bigints, like Python and Haskell, do I just use plain ints.

Why does an unsigned type for sizes or indices fare worse than a signed type? When do I want the -247th element in an array? When do I have a block that is -10 bytes in size?

  • Because doing subtraction on sizes/indices is common, and signed handles the case where you subtract below 0, while unsigned yields unintuitive results; i.e., unsigned fails silently. For example, looping to the 2nd-to-last item in an array, or getting the index before a given index.

    The source of confusion is that unsigned is a terrible name. Unsigned does not mean non-negative: it's 100% valid to assign a negative value to an unsigned, it just fails silently.

    If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile time and at runtime.

    • > The source of confusion is that unsigned is a terrible name. Unsigned does not mean non-negative. Its 100% complete valid to assign a negative value to an unsigned, it just fails silently.

      C’s implicit casts are tripping you up. Unsigned ints can’t be negative, but C will happily let you assign a negative signed int to an unsigned int variable; the moment it is assigned, it ceases to be negative. In serious programming languages this implicit conversion is forbidden: you have to cast explicitly.

      > For example, looping to the 2nd to last item in an array or getting the index before the given index.

      I don’t understand what you mean here, can you clarify?

      > If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile and runtime.

      Unsigned integers are the compile time side of the coin, but yes you may want to take care to enforce it at runtime as well, though this typically implies a performance penalty that most don’t want to pay.


  • The reason is not that you want a negative index or size, but that you want the computation of the index to be correct and errors to be obvious. Both turn out to be easier with signed types.

  • There are (rare) times when you want negative array indices. C lets you index in both directions from a pointer to the middle of an array; that's why array indexing is signed in C. Some libc ctype lookup tables do this. For sizing there is no strong case for negatives, other than to shoehorn sizes into signed operations.

  • > When do I want the -247th element in an array?

    You never want any element of an array, except elements within the range [0, array_length). Anything outside of that is undefined behavior.

    I think people tend to overthink this. A function which takes an index argument should simply return a result when the index is within the valid range, and an error if it's outside of it (regardless of whether it's too low or too high). It doesn't particularly matter whether the integer is signed.

    If you aren't storing 2^64 elements in your array (which you probably aren't - most systems don't even support addressing that much memory) then the only thing unsigned gets you is a bunch of footguns (like those described in the OP article).