Comment by jart

7 hours ago

It's honestly not that difficult to be rigorous. The things you mentioned in the blog post are pretty obvious forms of degenerate practices once you get used to seeing them. The best way to make your argument would be to bring up pointer overflow being UB.

What's great about undefined behavior is that the C language doesn't require you to care. You can play fast and loose as much as you want. You can even use implicit types and yolo your app, writing C that more closely resembles JavaScript, just like traditional K&R C devs did back in the day under an ILP32 model. Then you add the rigor later if you care about it.

For most stuff, like an experiment, we obviously don't care, but when I do, I can usually one shot a file without any UB (which I check by reading the assembly output after building it with UBSAN) except there's just one thing that I usually can't eliminate, which is the compiler generating code that checks for pointer overflow. Because that's just such a ridiculous concept on modern machines which have a 56 bit address space. Maybe it mattered when coding for platforms like i8086. I've seen almost no code that cares about this. I have to sometimes, in my C library. It's important that functions like memchr() for example don't say `for (char *p = data, *e = data + size; p<e; ...` and instead say `for (size_t i = 0; i < size; ++i) ...data[i]...`. But these are just the skills you get with mastery, which is what makes it fun.

Oh speaking of which, another fun thing everyone misses is the pitfalls of vectorization. You have to venture off into UB land in order to get better performance. But readahead can get you into trouble if you're trying to scan something like a string that's at the end of a memory page, where the subsequent page isn't mapped. My other favorite thing is designing code in such a way that the stack frame of any given function never exceeds 4096 bytes, and using alloca in a bounded way that pokes pages if it must be exceeded.

If you want to have a fun time experiencing why the UB rules are as tricky as they are, try writing your own malloc() function that uses shorts and having it be on the stack, so you can have dynamic memory in a signal handler.
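A rough sketch of two of the points above, in case it helps: an index-based memchr() that never forms `data + size`, and a check for whether a vector-sized read would cross into the next page. The names `memchr_indexed` and `chunk_stays_in_page`, the 4096-byte page size, and the 16-byte chunk are all assumptions for illustration, not anything taken from cosmopolitan.

```c
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_PAGE 4096u  /* assumption: 4 KiB pages */
#define ASSUMED_CHUNK 16u   /* assumption: 16-byte vector loads */

/* Index-based scan: only addresses of bytes that are actually visited are
   ever computed, so a bogus oversized n never produces an end pointer. */
static void *memchr_indexed(const void *data, int c, size_t n) {
  const unsigned char *p = data;
  unsigned char needle = (unsigned char)c;
  for (size_t i = 0; i < n; ++i) {
    if (p[i] == needle) return (void *)(p + i);
  }
  return NULL;
}

/* Returns nonzero if an ASSUMED_CHUNK-byte read starting at addr stays
   inside the page that contains addr. */
static int chunk_stays_in_page(uintptr_t addr) {
  uintptr_t offset = addr & (ASSUMED_PAGE - 1);
  return offset <= ASSUMED_PAGE - ASSUMED_CHUNK;
}
```

The page check only says the read should not fault on the next page; reading past the end of the object is still outside what the C standard defines, which is the "UB land" trade mentioned above.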

> For most stuff, like an experiment, we obviously don't care, but when I do, I can usually one shot a file without any UB (which I check by reading the assembly output after building it with UBSAN)

Does this depend on the project, or part of a project? I'm wondering how far that scales. I don't know how labor intensive it is. Maybe you can just look at the output and see that nothing funny is happening?

> It's honestly not that difficult to be rigorous.

Ok, let's try it. I pointed GPT 5.5 at the smallest part of cosmopolitan that I could find in two seconds, net/finger. 299 lines.

describesyn.c:66: q + 13 constructs a pointer that can point well beyond the array plus one element.

C23 6.5.6p9:

> If the pointer operand and the result do not point to elements of the same array object or one past the last element of the array object, the behavior is undefined
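To make the quoted rule concrete, here is a minimal sketch of checking the remaining length before advancing a pointer, so the result is never more than one past the end. `advance_checked` is a hypothetical helper, not code from describesyn.c, and it assumes `q` and `end` point into the same array with `q <= end`.

```c
#include <stddef.h>

/* Advance q by k elements only if the result stays within [q, end], where
   end is one past the last element of the same array. Returns NULL when
   the advance would go further than that. */
static const char *advance_checked(const char *q, const char *end, size_t k) {
  /* end - q is well defined because both point into the same array. */
  if ((size_t)(end - q) < k) return NULL;
  return q + k;
}
```

The point is that the comparison happens on lengths, before `q + k` is ever evaluated, rather than forming the pointer first and comparing it against `end` afterwards.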

Now… you may be trolling, but I do feel like this disproves your assertion. Not you, not me, not Theo de Raadt, can avoid UB.

> the compiler generating code that checks for pointer overflow.

Do you need to check for that specifically? What pointer are you constructing that is neither pointing at a valid, correctly aligned object (not UB), nor exactly one past the last element of an array?

Do you mean for the latter, in case you have an array that ends on the maximum expressible pointer address?

I'm a bit unclear on what you mean by "pointer overflow". From mentioning 56 bit address spaces I'm guessing you mean like the pointer wrapped, not what I pointed to in cosmopolitan, above?

Ok, to be clear that it's not just that one type, if you forgive that one:

net/http/base32.c:64: reads sc[0] even if sl=0. I assume this is never called with sl=0, so could be fine.

net/http/ssh.c:355: pointer address underflow? Should that be `e - lp`?

net/http/ssh.c:209/229: double destroy of key. can this code path have non-null members, meaning double free? Looks like it, since line 207 does the parsing and checks that parse worked.

net/http/ssh.c:123: uses memset, which assumes that it sets member variable pointers to NULL (per my post, depending on that means depending on UB), and later these pointers are given to free(), so that's UB.
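For what it's worth, a sketch of the alternative I have in mind for the memset and double-destroy items. `struct key` and its members are made up for illustration and are not the struct in ssh.c. Assigning from `{0}` gives real null pointers by the language rules, whereas memset only gives all-bits-zero (the two coincide on mainstream platforms).

```c
#include <stdlib.h>

struct key {            /* hypothetical struct, not the one in ssh.c */
  char *name;
  unsigned char *blob;
  size_t bloblen;
};

static void key_init(struct key *k) {
  /* Initializer semantics: pointer members become null pointers, whatever
     bit pattern the platform uses for them. */
  *k = (struct key){0};
}

static void key_destroy(struct key *k) {
  free(k->name);   /* free(NULL) is a no-op */
  free(k->blob);
  key_init(k);     /* resets members so a second key_destroy() is harmless */
}
```

Resetting the members inside the destroy function is one way to make an accidental second destroy on the same path a no-op instead of a double free.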

I won't look deeper into net/http, but presenting just the possibly incorrect remaining comments from jippity:

  - ssh.c:211 and parsecidr.c:44: length-taking APIs use unbounded strstr() / strchr(), so explicit n with non-NUL-terminated input can read beyond the buffer.

  - tokenbucket.c:77 and tokenbucket.c:92: x >> (32 - c) is UB for c == 0 and for out-of-range c.

  - isacceptablehost.c:68: long numeric host labels can overflow signed int b before the function eventually rejects/accepts the host.
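For the shift and signed-overflow items, a minimal sketch of what a defined version could look like. Both helpers are hypothetical and I have not checked what the real code intends. In particular, treating a shift by 32 or more as yielding 0 is an assumed convention, and `add_digit_checked` assumes a non-negative accumulator and a digit in 0..9.

```c
#include <limits.h>
#include <stdint.h>

/* Right shift that is defined for every n. Assumed convention: shifting a
   32-bit value by 32 or more yields 0. */
static uint32_t shr32_checked(uint32_t x, unsigned n) {
  return n < 32 ? x >> n : 0;
}

/* Appends one decimal digit to *b. Returns 0 on success, or -1 (leaving *b
   unchanged) if the result would not fit in an int. */
static int add_digit_checked(int *b, int digit) {
  if (*b > (INT_MAX - digit) / 10) return -1;
  *b = *b * 10 + digit;
  return 0;
}
```

With something like `shr32_checked(x, 32 - c)`, the `c == 0` case gives 0 under the assumed convention, and an out-of-range `c` wraps to a large unsigned count that also gives 0 rather than UB.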