Comment by bestouff

11 hours ago

The problem with UB is not really that it may crash on some architecture. The real problem is that the compiler assumes UB never happens, so if you write code with UB anyway, the compiler (and especially the optimizer) is allowed to translate it into anything that's convenient for its happy path. And sometimes that "anything" can be really unexpected (like removing big chunks of code).

inkysigma 10 hours ago

One example along these lines is that every function must either terminate or have a side effect. I don't think this one has bitten me yet, but I could completely see how you accidentally write some kind of infinite loop or recursion and the function gets deleted. Also, bonus points for tail recursion: the bug might only show up at a higher optimization level if nothing hit the infinite loop during debugging.

marcosdumay 25 minutes ago

There is that famous example where, when you write an infinite loop as the last thing in your main, a function that you never called runs instead.
account42 8 hours ago
Infinite loop without side effects == program stuck, not responding to user input and not outputting anything. That's not something a useful program will ever want to do.
- Certhas 8 hours ago
  
  Not true, C++ made it so trivial infinite loops are not UB because it turns out they do have legitimate uses.
  https://lists.isocpp.org/std-proposals/2020/05/1322.php
  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p28...
  
- xigoi 8 hours ago
  
  The problem is when you accidentally write an infinite loop. In a different language, you run the code, see that it gets stuck and fix it. In C, the compiler may delete the function, making it hard to realize what is happening.
  
- zarzavat 8 hours ago
  
  https://9p.io/sources/plan9/sys/src/libc/9sys/abort.c
  
1718627440 7 hours ago
That's only true in C++ though, not in C.
- dzaima 7 hours ago
  
  C does allow unconditional infinite loops (e.g. "while (1) { }" isn't UB), but it is still UB if the controlling expression isn't constant (e.g. "while (two < 10) { }" is UB if two is a variable less than 10).
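A minimal sketch of the distinction described above, assuming the C11 wording (6.8.5) under which a loop with a non-constant controlling expression and no I/O, volatile access, or atomic operations "may be assumed by the implementation to terminate". All three function names are hypothetical.

```c
#include <stdbool.h>

/* Constant controlling expression: the loop must be kept as written,
   so this function never returns. */
void spin_forever(void) {
    while (1) { }
}

/* Non-constant controlling expression and an empty body: if two < 10
   the loop never ends, and the implementation may assume that it does.
   Only safe to call with two >= 10. */
void spin_if_small(int two) {
    while (two < 10) { }
}

/* Reading a volatile object is a side effect, so this loop is not
   covered by the termination assumption. It returns once *stop is true. */
void wait_for_flag(volatile bool *stop) {
    while (!*stop) { }
}
```

Calling `spin_if_small(10)` simply returns, because the condition is false on entry. The problematic case is a call such as `spin_if_small(2)`, where what the compiled program does can differ between optimization levels.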

eru 10 hours ago

Yes, a crash is about the most benign UB: at least it's highly visible.

In worse scenarios, your programme will silently continue with garbage, or format your hard disk or give attackers the key to the kingdom.

1718627440 7 hours ago

Yes, that is a problem, but it is also the most useful feature of UB and the reason for it. People who suggest just defining it or making it unspecified miss that the compiler being able to remove whole parts of a program is the point. When I write code that is UB for certain inputs, it is because I do not intend the program to have any behaviour for these inputs. I do want the compiler to optimize those away, or to do whatever falls out of the behaviour of the other, defined cases. It is deeply satisfying to add some conditions that trigger log strings and see that the strings do not occur in the binary, because they can only be reached via UB.
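A small sketch of the kind of code this describes, assuming a hypothetical helper named `div_with_late_check`. The log branch can only be reached after the division has already divided by zero, so an optimizing compiler is allowed (not required) to drop both the branch and its string.

```c
#include <stdio.h>

/* Hypothetical helper: divides first, checks afterwards. */
int div_with_late_check(int a, int b) {
    int q = a / b;   /* undefined behaviour when b == 0 */

    if (b == 0) {
        /* Only reachable once the division above has invoked UB, so the
           compiler may treat this branch as dead and remove it. */
        fputs("divide by zero\n", stderr);
    }
    return q;
}
```

For defined inputs the function behaves as ordinary integer division, e.g. `div_with_late_check(7, 2)` yields 3. Whether the string survives in the binary depends on the compiler and optimization level; moving the check before the division makes the branch reachable without UB and therefore something the compiler has to keep.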

rando1234 9 hours ago

The point in the article that 'It's not about optimisations' really got my attention. I've previously done some work where we wrote an analysis pass under the assumption that it executed last in the transformation pipeline and this was needed for correctness. The assumption was that since no further optimisations happened it was safe. Now I'm not so sure...

account42 8 hours ago

That's a feature, not a problem.

anilakar 10 hours ago

Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).

Another commenter suggested using LLMs, but I disagree. Having clangd emit warning squiggles for unchecked operations (like signed addition) would be a good start.
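For illustration, this is roughly what a "checked" signed addition looks like when written by hand in portable C. The helper name `add_checked` is hypothetical; GCC and Clang also offer `__builtin_add_overflow`, and C23 adds `ckd_add` in `<stdckdint.h>`, which serve the same purpose.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
   false and leaves *out untouched if the sum would overflow int. */
bool add_checked(int a, int b, int *out) {
    /* The range test is done before adding, so the overflowing addition
       itself is never evaluated. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

A caller would write something like `if (!add_checked(x, y, &sum)) { /* handle overflow */ }`. A warning on plain `x + y` would essentially nudge code toward this shape.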

flohofwoe 10 hours ago
> Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).
Dead code elimination is essential for performance, especially when using templates (this is basically what enables the fabled "zero cost abstraction" because complex template code may generate a lot of 'inactive' code which needs to be removed by the optimizer).
The actual issue is that the compiler is free to eliminate code paths after UB, but that's also not trivial to fix (and some optimizations are actually enabled by manually injecting UB, like `__builtin_unreachable()`, which can make a measurable difference in the right places).
- peterfirefly 5 hours ago
  
  > free to eliminate code paths after UB
  before.
- 1718627440 7 hours ago
  
  > The actual issue is that the compiler is free to eliminate code paths after UB
  Note that the compiler can also omit code paths before UB, as UB is a property of the whole program, not just of a single statement.
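As a rough illustration of "manually injecting UB", here is a sketch using `__builtin_unreachable()`, which is a GCC/Clang extension rather than standard C. The enum and function names are made up for the example.

```c
/* Hypothetical example type. */
enum shape { SHAPE_SQUARE, SHAPE_TRIANGLE };

int corner_count(enum shape s) {
    switch (s) {
    case SHAPE_SQUARE:   return 4;
    case SHAPE_TRIANGLE: return 3;
    }
    /* A promise to the compiler that no other value ever arrives here.
       Actually reaching this point is undefined behaviour, which is what
       lets the optimizer skip a range check or a default path. */
    __builtin_unreachable();
}
```

With valid arguments the function is fully defined: `corner_count(SHAPE_SQUARE)` returns 4. Passing any other value breaks the promise, and the compiler is then free to have generated code that does anything, including code that misbehaves before the switch is reached.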
amoss 10 hours ago

Dead code elimination is run multiple times, including after other optimizations. So code that is not initially dead may become dead after propagating other information. Converting dead code into an error condition would make most generic code that is specialized for a particular context illegal.
gpderetta 8 hours ago
Consider:

  enum op_t { add, mul };

  int exec(op_t op, int a, int b) {
      if (op == add) { return a + b; }
      if (op == mul) { return a * b; }
  }

  c = exec(add, a, b);

Should the compiler be prevented from inlining exec, constant-propagating op, and removing the mul branch? What about if a and b are constants and the addition itself is optimized away?
4gotunameagain 10 hours ago

This is trickier than it initially seems. Using preprocessor directives to include or exclude swaths of code is a very common thing, and implementing a compiler error as you described would break the building of countless C codebases.