Comment by proto_lambda
3 years ago
Lying requires intent. This was a mistake, something that humans are well-known for making, and if the compiler is designed to assume otherwise, it borders on useless in the real world.
A compiler can’t know why you fucked up, it can’t even know that you fucked up, because UBs are just ways for it to infer and propagate constraints.
If an optimising C compiler can’t rely on UBs not happening, its potential is severely cut down due to the dearth of useful information provided by C’s type system.
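To make the constraint-propagation point concrete, here is a minimal sketch in C. The function names are made up for illustration. Because signed overflow is undefined, an optimiser is allowed to assume `x + 1` never wraps and may reduce the first function to a constant; whether a particular compiler actually does so depends on its version and flags. The second function asks the same question without relying on overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example: relies on signed overflow, which is UB.
   An optimiser may assume x + 1 does not overflow and is then
   free to treat this as `return true`. */
bool next_is_greater_ub(int x) {
    return x + 1 > x;
}

/* The same question asked without overflowing: defined for every int. */
bool next_is_greater_defined(int x) {
    return x < INT_MAX;
}
```

Calling `next_is_greater_ub(INT_MAX)` is the case where the two can disagree, and the language gives no guarantee about what happens there.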
> A compiler can’t know why you fucked up, it can’t even know that you fucked up, because UBs are just ways for it to infer and propagate constraints.
To be honest, that's just how compiler writers interpret UB these days.
It's perfectly possible (in principle) to use much more sophisticated static and dynamic analysis to recover much of what C compilers just assume. You don't have to restrict yourself to what C's type system provides.
(For an example of what's possible, have a look at all the great techniques employed to make JavaScript as fast as possible. They have basically no static types to work with at all.)
> For an example of what's possible, have a look at all the great techniques employed to make JavaScript as fast as possible. They have basically no static types to work with at all.
I’m sure people will be very happy with a C JIT. That’s definitely what they use C for.
JIT-ed code is full of runtime type and range assertions which bail if the compiler’s assumptions are incorrect.
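A rough sketch of what such guards look like, written as plain C rather than as generated machine code. Everything here (the `Value` type, `add_speculative`, `add_generic`) is a hypothetical illustration and not taken from any real JavaScript engine: the fast path speculates that both operands are small integers, and bails to a generic slow path if either the type guard or the range guard fails.

```c
#include <limits.h>

typedef enum { TAG_INT, TAG_DOUBLE } Tag;

typedef struct {
    Tag tag;
    union { int i; double d; } as;
} Value;

Value int_value(int i)       { Value v = { .tag = TAG_INT,    .as.i = i }; return v; }
Value double_value(double d) { Value v = { .tag = TAG_DOUBLE, .as.d = d }; return v; }

/* Generic slow path: handles any combination of tags by widening to double. */
static Value add_generic(Value a, Value b) {
    double x = (a.tag == TAG_INT) ? (double)a.as.i : a.as.d;
    double y = (b.tag == TAG_INT) ? (double)b.as.i : b.as.d;
    return double_value(x + y);
}

/* Specialised fast path: assumes int + int with no overflow,
   and bails out as soon as either assumption turns out to be wrong. */
Value add_speculative(Value a, Value b) {
    /* Type guard. */
    if (a.tag != TAG_INT || b.tag != TAG_INT)
        return add_generic(a, b);

    int x = a.as.i, y = b.as.i;

    /* Range guard: checked before the addition, so no overflow ever occurs. */
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y))
        return add_generic(a, b);

    return int_value(x + y);
}
```

The difference from C's model is that the assumptions are checked at runtime and a failed check has a defined fallback, instead of being taken on faith.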
C has always considered that the programmer knows what they are doing. Programs are assumed correct unless proven invalid.
This is -- or at least was -- a feature, not a bug. You can implement any valid program, but you can also implement some invalid programs.
I know the OP mentioned Rust, but it's a valid comparison: if you don't invoke "unsafe" then all your behaviour is well-defined. But the trade-off is that Rust will only let you implement a subset of valid programs unless you invoke "unsafe", which might be better termed "assumed correct".
That's C for you. If you want something saner, use Rust or Haskell or Python or even Java or Go or.. almost any other language that's not C or C++.
These days the whole point of C is this Faustian bargain: trading sanity for speed.
Writing Rust and Haskell for sanity is not something I would agree with. Maybe for their language characteristics, but reading those makes me want to jump out of the window.
I have the same issue with Java and Go. (Which I brought up as well.)
Yet, they still compare favourably with C in this regard. Almost anything does.
I think Rust doesn't allow integer overflow either, unless you specifically use the wrapping_* operations. Probably the same kind of thing will also happen to Rust.
Rust either wraps or panics on integer overflow depending on the build mode (by default, debug builds panic and release builds wrap). Either way it’s defined behaviour.
And yet C is still the dominant language. Undefined behavior is actually the reason why: any defined behavior is expensive to implement in the compiler and possibly incurs a cost at runtime. The language design intentionally trades programmer’s sanity for ease of implementation.
Not sure that's actually the reason in practice?
For a real prominent counterexample: the Linux kernel is intentionally programmed in a C dialect (defined by a myriad of GCC compiler flags) that removes a lot of UB.
If they craved the Faustian bargain of UB for speed, they could immediately move in that direction by dropping some GCC options.
Legacy reasons, most embedded devs and UNIX clones won't use anything else.
In many other domains, other languages have taken their place, and this will keep on going, even if it takes a couple of generations, or government cybersecurity mandates, to make it happen.
> The language design intentionally trades programmer’s sanity for ease of implementation.
There's nothing "easy" at all about UB-exploiting performance optimizations in modern C compilers, and "ease of implementation" is absolutely not why those optimization passes have been included. In fact, the easiest thing for the compiler to do, when it sees an int * int operation, is to emit an IMUL assembly instruction (or the equivalent for your CPU architecture) and not worry about deleting overflow checking code. Which is what C compilers did before the extent of UB exploitation became excessive.
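For readers who haven't seen the pattern being described, a minimal sketch follows; the function names are hypothetical. The first function checks for overflow after multiplying, so on overflowing inputs the UB has already happened and the compiler is permitted to assume the check can never fail (whether it actually removes it depends on the compiler and flags). The second checks before multiplying, using only operations that are defined for all inputs.

```c
#include <limits.h>
#include <stdbool.h>

/* Check-after-the-fact: a * b is UB on overflow, so an optimiser
   may treat the test below as always passing and drop it. */
bool mul_checked_after(int a, int b, int *out) {
    int r = a * b;
    if (a != 0 && r / a != b)
        return false;
    *out = r;
    return true;
}

/* Check-before: rejects every overflowing pair without ever
   performing an overflowing operation. */
bool mul_checked_before(int a, int b, int *out) {
    if (a > 0) {
        if (b > 0) { if (a > INT_MAX / b) return false; }
        else       { if (b < INT_MIN / a) return false; }
    } else if (a < 0) {
        if (b > 0) { if (a < INT_MIN / b) return false; }
        else       { if (b < INT_MAX / a) return false; }
    }
    /* a == 0 falls through: the product is 0 and cannot overflow. */
    *out = a * b;
    return true;
}
```

GCC and Clang also offer `__builtin_mul_overflow` for this, which avoids the branch ladder.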
I agree on the top compilers, but there are dozens if not hundreds of architectures with their own proprietary C compilers maintained by a dinosaur and a couple of intern dino chicks, if they’re lucky. I postulate any other language wouldn’t be implemented at all, or would be defanged to a C level of (non)safety anyway, in a way similar to mrustc.