
Comment by uecker

13 hours ago

It is interesting that you find unsigned integers more intuitive. My experience (with students, but also analyses of CVEs, which give plenty of evidence) is that the opposite is true: signed integers in C model the integers, which have a nice mathematical structure that people learn in elementary school. Yes, this breaks down on overflow, but for that you have to reach very high numbers, and there is very good tooling to debug it. In contrast, unsigned integers in C are modular arithmetic, which people learn at university, if at all, and get wrong all the time, and the errors are mostly subtle and very difficult to find automatically.

You are right that you often need to constrain an integer to be non-negative or positive, but usually not during arithmetic, only at certain points in the logic of a program. There, in my experience, the constraint is better expressed as an assertion.

The fixed-size signed types of C etc are no more the actual integers, and no less modular arithmetic, than the fixed-size unsigned types. They both implement the exact same modular arithmetic for +, -, and *. It's only for other operations (ordering comparisons, or / and % which in turn are defined in C in terms of ordering structure, as in rounding towards zero) where they differ. And in either case, that ordering structure is not one commonly encountered anywhere outside of the context of fixed size computer arithmetic.

Ask an ordinary person what 3 * (1/3) or 3 * (-1/3) should come to, and they aren't going to say any of the results that you get in C for either signed or unsigned int types.

  • Signed integers in C do not implement modular arithmetic. You can trust me on this.

I work in bioinformatics. The numbers are typically large enough that you either have to think about numeric limits all the time if you use 32-bit integers (or bit-packed arrays), or you end up wasting (tens of) gigabytes with 64-bit integers.

I've also done a lot of succinct data structures, data compression, and things like that. When you manipulate the binary representation directly, it's easier to connect representation to unsigned semantics than to signed semantics.

Unsigned integers are usually integers modulo 2^n, which gives them a convenient algebraic structure. Whether you find that intuitive or not probably depends on your education. From my perspective, abstract algebra and discrete mathematics are things you learn in the first year of your CS degree.

  • Signed ints are also integers modulo 2^n, as concerns +, -, and *. Both unsigned and signed ints have the exact same modular arithmetic structure, for +, -, and *. It is only for other operations (ordering comparisons, or / and %) where they differ, and on these operations, neither signed nor unsigned ints have any convenient algebraic structure commonly encountered elsewhere.

Modulo arithmetic is taught to American students as clock arithmetic in elementary school as well. Signed integers are better described as truncated 2-adics, which are definitely an advanced topic. It's basically a Greenland/Iceland situation. The complicated, difficult one is given a friendly name to trap naive programmers.

  • Signed ints are no more and no less truncated 2-adics than unsigned ints. Signed ints and unsigned ints are isomorphic as rings; they are both the ring of integers modulo some 2^n. It is only on other operations (ordering comparisons, or the / and % of C) where they differ, and in neither case do these operations have anything to do with 2-adics.

    • It's a much more useful correspondence than just an isomorphism. Take your base-2 2-adic integer. Truncate it to whatever bitwidth you want. That's the two's complement representation of that number. Unsigned pops out as a special case, obviously.

      I've mostly seen this used by cryptographers to prove results over arbitrary bitwidths, since you can prove it over the p-adics and deal with truncation/division separately.

In Germany I learned modular arithmetic at around 13 years old, in high school. I also find unsigned much more intuitive, and in my code I only use sized signed types when I absolutely have to, which is not often. Only in languages that offer automatic big ints, like Python and Haskell, do I just use plain ints.