Comment by myrmidon
1 day ago
I honestly feel that "uninitialized by default" is strictly a mistake, a relic from the days when C was basically cross-platform assembly language.
Zero-initialized-by-default for everything would be an extremely beneficial tradeoff IMO.
Maybe with a __noinit attribute or some such for the few cases where you don't need a variable to be initialized AND the compiler is too stupid to optimize the zero-initialization away on its own.
This would not even break existing code, just lead to a few easily fixed performance regressions. But it would make it significantly harder to introduce undefined and difficult-to-spot behavior by accident, because very often code assumes zero-initialization and gets it purely by chance, and this is most likely to happen in exactly the edge cases that aren't covered by tests under a memory sanitizer (if you even have those).
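A sketch of how the opt-out could look, with hypothetical __noinit syntax (defined away as a no-op here so the snippet compiles today):

    #include <stddef.h>
    #include <string.h>

    #define __noinit  /* hypothetical attribute; no-op placeholder for now */

    static void fill(unsigned char *dst, size_t n) { memset(dst, 0xab, n); }

    void demo(void) {
        int counter;                       /* would start at 0 by default */
        unsigned char buf[4096] __noinit;  /* opted out: fill() overwrites it anyway */
        fill(buf, sizeof buf);
        (void)counter;
    }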
GCC now supports -ftrivial-auto-var-init=[zero|uninitialized|pattern] for stack variables: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
For malloc, you could use a custom allocator, or replace all the calls with calloc.
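For example, a minimal demo of the zero mode (works on recent GCC and Clang; the file name is mine):

    /* Compile with: gcc -O2 -ftrivial-auto-var-init=zero demo.c */
    #include <stdio.h>

    int main(void) {
        int x;              /* no explicit initializer */
        printf("%d\n", x);  /* prints 0 under =zero; garbage (and UB) without it */
        return 0;
    }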
Very nice, did not know about this!
The only problem with vendor extensions like this is that you can't really rely on them, so you're still kinda forced to keep all the (redundant) zero initialization; solving it at the language level is much nicer. Maybe with C2030...
There are many low-level devices where initialization is very expensive. It may mean that you need two passes through memory instead of one, making whatever code you are running twice as slow.
I would argue that these cases are pretty rare, and you could always get nominal performance back with the __noinit hint, but I think this would seldom even be needed.
If you have instances of zero-initialized structs where you set individual fields after the initialization, all modern compilers will already elide the dead stores in the typical cases anyway, and data of relevant size that is supposed to stay uninitialized for long is rare and, in my opinion, a bit of an anti-pattern.
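For example (a sketch; any optimizing compiler at -O1 or above should treat the {0} stores as dead here):

    struct point { int x, y; };

    struct point make_point(int x, int y) {
        struct point p = {0};  /* defensive zero-init */
        p.x = x;               /* every field is overwritten immediately, so */
        p.y = y;               /* the zeroing stores are dead and get elided */
        return p;
    }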
Ok, those developers can use a compiler flag. We need defaults that work better for the vast majority.
Then why are you using C? :P
2 replies →
meh, the compiler can almost always eliminate the spurious default initialization, because it can prove that the first use of the variable is the real initialization. The only time the redundant initialization will be emitted by an optimizing compiler is when it can't prove it's redundant.
I think the better reason to not default initialize as a part of the language syntax is that it hides bugs.
If the developer's intent is that the correct initial state is 0, they should just explicitly initialize to zero. If they haven't, then they must intend that the correct initial state is the dynamic one in their code, and the compiler silently slipping in a 0 in cases the programmer overlooked is a missed opportunity to detect a bug due to the programmer under-specifying the program.
In recent years I've come to rely on this non-initialization idiom. Both because as code paths change the compiler can warn for simple cases, and because running tests under Valgrind catches it.
It only works for simple variables, where initialising to 0 is counterproductive anyway because you lose a useful compiler warning (about use of an uninitialised variable).
The main case is arrays. There it's often impossible to prove whether some part of the array is used before initialisation, so there is no warning. It becomes a tradeoff: potentially costly initialisation (arrays can be very big) versus potentially reading random values other than 0.
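For example (a sketch; read_input stands in for an opaque function in another translation unit):

    #include <stddef.h>

    /* Fills some prefix of dst, returns how many entries it wrote. */
    size_t read_input(int *dst, size_t cap);

    long sum_input(void) {
        int buf[4096];                  /* 16 KiB: mandatory zeroing means one
                                           full extra pass over the array */
        size_t n = read_input(buf, 4096);
        long sum = 0;
        for (size_t i = 0; i < n; i++)  /* only the written prefix is read, */
            sum += buf[i];              /* but the compiler cannot prove it */
        return sum;
    }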
1 reply →
C++26 has everything initialized by default. The value is not specified, though; implementations are encouraged to use something weird so that use before explicit initialization can be detected.
Devil's advocate: this would be unacceptable for os kernels and super performance critical code (e.g. hft).
> this would be unacceptable for os kernels
Depends on the boundary. I can give a non-Linux, microkernel example (one that was/is shipped on tens of millions of devices):
- prior to 11.0, Nintendo 3DS kernel SVC (syscall) implementations did not clear output parameters, leading to extremely trivial leaks. Unprivileged processes could retrieve kernel-mode stack addresses easily, making exploit code much easier to write; example here: https://github.com/TuxSH/universal-otherapp/blob/master/sour...
- Nintendo started clearing all temporary registers on the Switch kernel at some point (iirc x0-x7 and some more); on the 3DS they never did that, and you can leak kernel object addresses quite easily (iirc by reading r2). This made an entire class of use-after-free and arbwrite bugs easier to exploit (call SvcCreateSemaphore 3 times, get the sema kernel object address, use one of the now-patched exploits that can cause a double-decref on the KSemaphore, call SvcWaitSynchronization, profit)
more generally:
- uncleared padding in structures + copy to user = infoleak
so one at least ought to be careful when crossing privilege boundaries
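The pattern, sketched (copy_to_user stands in for whatever user-copy primitive your kernel has):

    #include <stdint.h>
    #include <string.h>

    struct reply {
        uint8_t  kind;   /* 3 bytes of padding typically follow this member */
        uint32_t value;
    };

    int copy_to_user(void *udst, const void *ksrc, size_t n);  /* stand-in */

    int handle_query(void *udst, uint32_t value) {
        struct reply r;
        memset(&r, 0, sizeof r);  /* without this, the padding bytes carry
                                     whatever was on the kernel stack */
        r.kind  = 1;
        r.value = value;
        return copy_to_user(udst, &r, sizeof r);
    }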
No, just throw the __noinit attribute at every place where it's needed.
You probably would not even need it in a lot of instances because the compiler would elide lots of dead stores (zeroing) even without hinting.
Would you rather have an HFT trade go correctly and a few nanoseconds slower, or a few nanoseconds faster but with some edge-case bugs related to variable initialisation?
You might claim that you can have both, but bugs are more inevitable in the uninitialised-by-default scenario. I doubt that variable initialisation is the thing that would slow down HFT; I would posit it is things like network latency that would dominate.
> Would you rather have an HFT trade go correctly and a few nanoseconds slower, or a few nanoseconds faster but with some edge-case bugs related to variable initialisation?
As someone who works in the HFT space: it depends. How frequently and how bad are the bad-trade cases? Some slop happens. We make trade decisions with hardware _without even seeing an entire packet coming in on the network_. Mistakes/bad trades happen. Sometimes it results in trades that don't go our way or missed opportunities.
Just as important as "can we do better?" is "should we do better?". Queue priority at the exchange matters. Shaving nanoseconds is how you get a competitive edge.
> I would posit is it things like network latency that would dominate.
Everything matters. Everything is measured.
edit to add: I'm not saying we write software that either has or relies upon uninitialized values. I'm just saying that in such a hypothetical, it's not a cut-and-dried "do the right thing (correct according to the language spec)" decision.
2 replies →
It is acceptable enough for Windows, Android and macOS, which have been doing it for at least the last five years.
That is the usual fearmongering whenever security improvements are made to C and C++.
The same OS kernel that zeros out pages before handing them back to me?
This is arguing in bad faith. Just because the kernel does that doesn't mean it does so everywhere else.
Zero initializing often hides real and serious bugs, however. Say you have a function with an internal variable LEN that ought to get set to some dynamic length that internal operations will run over. Changes to the code introduce a path which skips the setting of LEN. Current compilers will (very likely) warn you about the potentially uninitialized use, valgrind will warn you (assuming the case gets triggered), and failing all that the program will potentially crash when some large value ends up in LEN-- alerting you to the issue.
Compare with default zero init: The compiler won't warn you, valgrind won't warn you, and the program won't crash. It will just be silently wrong in many cases (particularly for length/count variables).
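Concretely, a sketch of that failure mode (the function and the kind values are made up):

    #include <stddef.h>

    void process(const char *data, size_t len);  /* defined elsewhere */

    void handle(const char *data, int kind, size_t payload_len) {
        size_t len;  /* deliberately no initializer */
        if (kind == 1)
            len = payload_len;
        else if (kind == 2)
            len = payload_len / 2;
        /* kind == 3 was added later and nobody set len: today the compiler
           (-Wmaybe-uninitialized) and Valgrind can both flag this; with
           implicit zero-init, process() just silently sees len == 0 */
        process(data, len);
    }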
Generally, attention to exploit safety can push us in directions that are bad for program correctness. There are many places where exploit safety is important, but also many cases where it's irrelevant. For security it's generally 'safe' if a program erroneously shuts down or does less than it should, but that is far from true for software generally.
I prefer this behavior: use of an uninitialized variable is an error which the compiler will warn about; however, in code where the compiler cannot prove the variable is never used uninitialized, the behavior is implementation defined and can include trapping on use, initializing to zero, or initializing to ~0 (the complement of zero) or another likely-to-crash pattern. The developer may annotate with _noinit, which makes any use UB and avoids the cost of inserting a trap or ~0 initialization. ~0 init will usually fail, but seldom silently, so hopefully at least any user reports will be reproducible.
Like restrict, _noinit is a potential footgun, but its usage would presumably be quite rare and confined to carefully maintained performance-critical code. Code using _noinit, like code using restrict, is at least still more maintainable than assembly.
This approach preserves the compiler's ability to detect programmer error, and lets the implementation pick the preferred way to handle the remaining errors. In some contexts it's preferable to trap cleanly or crash reliably (init to ~0 or an explicit trap); in others it's better to be silently wrong (init to 0).
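Half of this already exists as tooling rather than language: the pattern mode of -ftrivial-auto-var-init mentioned upthread fills locals with a poison value (the exact byte pattern is compiler-specific) so stray uses tend to fail loudly instead of looking valid:

    /* Compile with: cc -O2 -ftrivial-auto-var-init=pattern demo.c */
    #include <stdio.h>

    int main(void) {
        int x;                          /* no initializer */
        printf("%08x\n", (unsigned)x);  /* prints the compiler's poison pattern,
                                           not 0, so misuse is hard to miss */
        return 0;
    }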
Since C99 lets you declare variables wherever, it is often easy to just declare a variable where it is first set, and that's probably best, of course... when you can.
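For example (compute_len is a placeholder):

    #include <stddef.h>

    size_t compute_len(const char *s);  /* placeholder */

    void c99_style(const char *input) {
        /* declared at the point of first assignment: there is no window in
           which an uninitialized (or misleadingly zeroed) value can be read */
        size_t len = compute_len(input);
        (void)len;  /* ... use len ... */
    }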