GCC 15.1

1 day ago (gcc.gnu.org)

> {0} initializer in C or C++ for unions no longer guarantees clearing of the whole union (except for static storage duration initialization), it just initializes the first union member to zero. If initialization of the whole union including padding bits is desirable, use {} (valid in C23 or C++) or use -fzero-init-padding-bits=unions option to restore old GCC behavior.

This is going to silently break so much existing code, especially union based type punning in C code. {0} used to guarantee full zeroing and {} did not, and step by step we've flipped the situation to the reverse. The only sensible thing, in terms of not breaking old code, would be to have both {0} and {} zero initialize the whole union.

I'm sure this change was discussed in depth on the mailing list, but it's absolutely mind-boggling to me.
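
For code that needs every byte cleared regardless of which compiler or version builds it, one option is to skip the initializer question and clear the object explicitly. This is only a minimal sketch: `union foo`, `foo_clear`, and `count_nonzero_bytes` are made-up names for illustration, and it assumes that all-bits-zero is an acceptable "zero" for every member, which holds on mainstream platforms but is not promised by the standard for floating-point or pointer members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example type: the first member is smaller than the union. */
union foo {
    float  f;
    double d;
};

/* Count the bytes of an object that are not zero. */
size_t count_nonzero_bytes(const void *obj, size_t size) {
    const unsigned char *bytes = obj;
    size_t count = 0;
    assert(obj != NULL);
    for (size_t i = 0; i < size; i++) {
        if (bytes[i] != 0) {
            count++;
        }
    }
    return count;
}

/* Zero every byte of the union, independent of {0} / {} semantics. */
void foo_clear(union foo *u) {
    assert(u != NULL);
    memset(u, 0, sizeof *u);
}
```

After `foo_clear(&u)`, `count_nonzero_bytes(&u, sizeof u)` should report zero on any compiler, whereas the result after `union foo u = {0};` on an automatic variable depends on the compiler version and flags.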

  • Fun fact: GCC decided to adopt Clang's (old) behavior at the same time Clang decided to adopt GCC's (old) behavior.

    So now you have this matrix of behaviors:

    * Old GCC: Initializes whole union.
    * New GCC: Initializes first member only.
    * Old Clang: Initializes first member only.
    * New Clang: Initializes whole union.

    • That's funny and sad at the same time.

      And it shows a deeper problem: even though they are willing to align behavior with each other, they failed to communicate and discuss what the best approach would be. That's a bit tragic, IMO.

    • Since having multiple compilers is often touted as an advantage, how often do situations like what you're describing happen compared to the opposite — when a second compiler surfaces bugs in one's application or the other compiler?

  • This was my instinct too, until I got this little tickle in the back of my head that maybe I remembered that Clang was already acting like this, so maybe it won't be so bad. Notice 32-bit wzr vs 64-bit xzr:

        $ cat union.c && clang -O1 -c union.c -o union.o && objdump -d union.o
        union foo {
            float  f;
            double d;
        };
    
        void create_f(union foo *u) {
            *u = (union foo){0};
        }
    
        void create_d(union foo *u) {
            *u = (union foo){.d=0};
        }
    
        union.o: file format mach-o arm64
    
        Disassembly of section __TEXT,__text:
    
        0000000000000000 <ltmp0>:
               0: b900001f      str wzr, [x0]
               4: d65f03c0      ret
    
        0000000000000008 <_create_d>:
               8: f900001f      str xzr, [x0]
               c: d65f03c0      ret

    • Ah, I can confirm what I see elsewhere in the thread: this is no longer true in Clang. That first clang was Apple Clang 17 (who knows what version that actually is), and here is Clang 20:

          $ /opt/homebrew/opt/llvm/bin/clang-20 -O1 -c union.c -o union.o && objdump -d union.o
      
          union.o: file format mach-o arm64
      
          Disassembly of section __TEXT,__text:
      
          0000000000000000 <ltmp0>:
                 0: f900001f      str xzr, [x0]
                 4: d65f03c0      ret
      
          0000000000000008 <_create_d>:
                 8: f900001f      str xzr, [x0]
                 c: d65f03c0      ret

  • > This is going to silently break so much existing code

    The code was already broken. It was undefined behavior.

    That's a problem with C and its undefined behavior minefields.

    • GCC has long been known to define undefined behavior in C unions. In particular, type punning in unions is undefined behavior under the C and C++ standards, but GCC (and Clang) define it.

    • When you have a big system many people rely on you generally try to look for ways to keep their code working - not look for the changes you’re contractually allowed to make.

      GCC probably has a better justification than “we are allowed to”.

    • Undefined in the standard doesn't mean undefined in GCC. Type-punning through unions has always been a special case that GCC has taken care with beyond the standard.
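
      For anyone unsure what "type punning through a union" looks like in practice, here is a rough sketch. It assumes `float` is a 32-bit IEEE 754 type; `union float_bits` and both function names are invented for this example. GCC documents the union form as supported, and the `memcpy` form is the alternative that is also valid in C++.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <string.h>

      /* Assumption: float is 32-bit IEEE 754, as on mainstream targets. */
      union float_bits {
          float    f;
          uint32_t u;
      };

      /* Read a float's bit pattern by writing one member and reading another. */
      uint32_t bits_via_union(float value) {
          union float_bits pun;
          pun.f = value;
          return pun.u;
      }

      /* Same result via memcpy, which does not rely on union semantics. */
      uint32_t bits_via_memcpy(float value) {
          uint32_t bits;
          assert(sizeof bits == sizeof value);
          memcpy(&bits, &value, sizeof bits);
          return bits;
      }
      ```

      Both members here have the same size and the union is always written before it is read, so this pattern does not depend on how `{0}` treats the rest of the union.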

  • I thought that {} should always initialize everything regardless of whether or not there is anything in between the braces, and that {0} should only be valid if the first member is a numeric or pointer type (but otherwise has the same effect as {} with nothing in between). I thought that would make more sense, wouldn't it?

    (If you write {} with multiple values when initializing a union, then it should be an error unless all of the values are the same and all of the corresponding members (the first few if you do not explicitly specify which ones) are of the same type as each other.)

    • C never had {} until C23. In C {0} was the only way to explicitly zero-initialize a structure in a generic manner. It works because in C initializer lists are applied to members as-if nested structures are flattened out lexically.

      However, a long time ago C++ went in a completely different direction with initializer lists, and gcc and clang started emitting warnings (in C mode) about otherwise perfectly valid C code, thus the adoption of C++'s {} for C23. {0} is still technically valid C23, though, as well as valid C89, C90, C99, and C11. In fact, reading both C23 and C89 I'm struck by how little the language has changed:

      C89 3.5.7p16:

      > If the aggregate contains members that are aggregates or unions, or if the first member of a union is an aggregate or union, the rules apply recursively to the subaggregates or contained unions. If the initializer of a subaggregate or contained union begins with a left brace, the initializers enclosed by that brace and its matching right brace initialize the members of the subaggregate or the first member of the contained union. Otherwise, only enough initializers from the list are taken to account for the members of the first subaggregate or the first member of the contained union; any remaining initializers are left to initialize the next member of the aggregate of which the current subaggregate or contained union is a part.

      C23 6.7.10p21:

      > If the aggregate or union contains elements or members that are aggregates or unions, these rules apply recursively to the subaggregates or contained unions. If the initializer of a subaggregate or contained union begins with a left brace, the initializers enclosed by that brace and its matching right brace initialize the elements or members of the subaggregate or the contained union. Otherwise, only enough initializers from the list are taken to account for the elements or members of the subaggregate or the first member of the contained union; any remaining initializers are left to initialize the next element or member of the aggregate of which the current subaggregate or contained union is a part.
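
      To make the "flattened out lexically" rule concrete, here is a small sketch. The struct and function names are hypothetical, and compilers may warn about the missing inner braces in the first function even though the code is valid.

      ```c
      #include <assert.h>

      struct inner { int a; int b; };
      struct outer { struct inner in; int c; };

      /* Inner braces omitted: 1 and 2 are consumed by in.a and in.b,
       * and the remaining 3 initializes the next member, c. */
      struct outer make_flat(void) {
          struct outer o = {1, 2, 3};
          assert(o.in.a == 1 && o.in.b == 2 && o.c == 3);
          return o;
      }

      /* {0} explicitly initializes in.a; members without an initializer
       * are initialized as if they had static storage duration, i.e. to zero. */
      struct outer make_zero(void) {
          struct outer o = {0};
          return o;
      }
      ```

      This is why `{0}` works for any struct whose first scalar (after flattening) accepts `0`. For a union, the same rule only reaches the first member, which is where the GCC 15 change comes in.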

  • I honestly feel that "uninitialized by default" is strictly a mistake, a relic from the days when C was basically cross-platform assembly language.

    Zero-initialized-by-default for everything would be an extremely beneficial tradeoff IMO.

    Maybe with a __noinit attribute or somesuch for the few cases where you don't need a variable to be initialized AND the compiler is too stupid to optimize the zero-initialization away on its own.

    This would not even break existing code, just lead to a few easily fixed performance regressions, but it would make it significantly harder to introduce undefined and difficult to spot behavior by accident (because very often code assumes zero-initialization and gets it purely by chance, and this is also most likely to happen in the edge cases that might not be covered by tests under memory sanitizer if you even have those).

    • There are many low-level devices where initialization is very expensive. It may mean that you need two passes through memory instead of one, making whatever code you are running twice as slow.

    • C++26 has everything initialized by default. The value is not specified, though. Implementations are encouraged to use something weird to detect use before explicit initialization.

    • Zero initializing often hides real and serious bugs, however. Say you have a function with an internal variable LEN that ought to get set to some dynamic length that internal operations will run over. Changes to the code introduce a path which skips the setting of LEN. Current compilers will (very likely) warn you about the potentially uninitialized use, valgrind will warn you (assuming the case gets triggered), and failing all that, the program will potentially crash when some large value ends up in LEN, alerting you to the issue.

      Compare with default zero init: The compiler won't warn you, valgrind won't warn you, and the program won't crash. It will just be silently wrong in many cases (particularly for length/count variables).

      Generally, the attention to exploit safety can sometimes push us in directions that are bad for program correctness. There are many places where exploit safety is important, but also many cases where it's irrelevant. For security it's generally 'safe' if a program erroneously shuts down or does less than it should, but that is far from true for software generally.

      I prefer this behavior: Use of an uninitialized variable is an error which the compiler will warn about. However, in code where the compiler cannot prove that it is not used, the compiler's behavior is implementation defined and can include trapping on use, initializing to zero, or initializing to ~0 (the complement of zero) or another likely-to-crash pattern. The developer may annotate with _noinit, which makes any use UB and avoids the cost of inserting a trap or ~0 initialization. ~0 init will usually fail but seldom in a silent way, so hopefully at least any user reports will be reproducible.

      Similar to RESTRICT, _noinit is a potential footgun, but its usage would presumably be quite rare and only in carefully maintained performance-critical code. Code using _noinit, like code using RESTRICT, is at least still more maintainable than assembly.

      This approach preserves the compiler's ability to detect programmer error, and lets the implementation pick the preferred way to handle the remaining error. In some contexts it's preferable to trap cleanly or crash reliably (init to ~0 or explicit trap), in others it's better to be silently wrong (init 0).

      Since C99 lets you declare variables wherever you like, it is often easy to just declare a variable where it is first set, and that's probably best, of course, when you can.
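
      A rough sketch of the "silently wrong" case described above, with an explicit `= 0` standing in for zero-init-by-default. The enum, the function name, and the scenario of a forgotten mode are all invented for illustration.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical modes for how much of the buffer to process. */
      enum mode { MODE_ALL, MODE_HALF, MODE_QUARTER };

      /* Sums a prefix of data. MODE_QUARTER was added later and nobody
       * remembered to set len for it, so len quietly stays 0. */
      long sum_prefix(const int *data, size_t n, enum mode m) {
          size_t len = 0;  /* stands in for zero-initialized-by-default */
          assert(data != NULL);
          if (m == MODE_ALL)  len = n;
          if (m == MODE_HALF) len = n / 2;
          /* missing: if (m == MODE_QUARTER) len = n / 4; */

          long sum = 0;
          for (size_t i = 0; i < len; i++) {
              sum += data[i];
          }
          return sum;  /* 0 for MODE_QUARTER: wrong, but no warning and no crash */
      }
      ```

      With `len` left uninitialized instead, a compiler would likely flag the MODE_QUARTER path, which is the trade-off being argued here.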

  • Do distros have tooling to deal with this type of change?

    I imagine it would be very useful to be able to search through all the C/C++ source files for all the packages in the distro in a semantic manner, so that it understands typedefs and preprocessor macros etc. The search query for this change would be something like "find all union types whose first member is not its largest member, then find all lines of code where that type is initialized with `{0}`".
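
    Short of distro-wide tooling, a single project could at least guard its own unions at compile time. This is only a sketch: `MEMBER_FILLS_UNION` and the two union types are made-up names, and the check only compares sizes, so it says nothing about padding inside a struct member.

    ```c
    #include <assert.h>

    /* Hypothetical helper: true when the named member is as large as the
     * whole type, so initializing that member touches every byte of it. */
    #define MEMBER_FILLS_UNION(type, member) \
        (sizeof(((type *)0)->member) == sizeof(type))

    union small_first { float f; double d; };  /* first member is smaller */
    union large_first { double d; float f; };  /* first member is largest */

    /* Compile-time guard for a union that the code initializes with {0}. */
    static_assert(MEMBER_FILLS_UNION(union large_first, d),
                  "first member must cover the whole union");
    ```

    The same assertion applied to `union small_first` and its member `f` would fail to compile, which is exactly the kind of type the search query above is looking for.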

    • As a retired Gentoo developer, I can say not really as far as I know. There could be static analysis tools that can find this, but I am not aware of anyone who runs them on the entire distribution.

  • > This is going to silently break so much existing code

    How much code actually uses unions this way?

    > especially union based type punning in C code

    I have never done type punning via the GNU C compiler extension in a way that would break because of this. I always assign a value to it and then get out the value from a new type. Do you know of any code that does things differently to be affected by this?

    • > How much code actually uses unions this way?

      I see this change caused Mbed-TLS to start failing its test suite when compiled with GCC 15: https://github.com/Mbed-TLS/mbedtls/issues/9814 (kinda scary since it's a security library). Hopefully other projects with less rigorous test suites aren't using {0} in that way. The GitHub issue mentions that Clang tried a similar optimization a while ago and backed it out after user complaints, so maybe the same thing will happen with GCC.

    • I would guess a lot. People aren't intimately familiar with the standard, and people are lazy when it comes to writing boilerplate like initialization code. And up until now, it just worked, so even a good test suite wouldn't catch it.

      EDIT: I initially mentioned type punning for arithmetic, but this compiler change wouldn't affect that

  • I'm skeptical of the claim that this change will "silently break so much existing code". For it to change the behavior of code, the first member would have to be smaller than other members, someone would have to use this construct to initialize union objects, and it would have to affect the behavior. In any case, it's standard for the Fedora, Ubuntu, and Debian developers to go through all the packages and test with new GCC versions before they come out, so that issues are fixed before the new compiler is released.

  • lol this is exactly the kind of stuff I expect from C or C++ haha it's kinda insane people just decide to do this amidst all the talk about correctness/safety.

  • There is no reason to use a union unless you're doing some C stuff; in which case just use C.

  • I have to say, I've read the discussion this generated, and it's a bit scary how no one seems to know whether type punning through unions is undefined or not in C. Or rather, my conclusion from reading it all is that many people are wrong and that it is defined behavior, but some of the people who are wrong about it are actual GCC compiler developers, so it can't be too easy to be right.

  • using UNION was always considered sketchy IMHO. This is trivia for security exploiters?

    • No. This is how sum types are implemented.

      And from a runtime perspective it’s going to be a struct with perhaps more padding. You’ll need more details about your specific threat model to explain why that’s bad.

  • I feel like once a language is standardized (or reaches 1.0), that's it. You're done. No more changes. You wanna make improvements? Try out some new ideas? Fine, do that in a new language.

    I can deal with the footguns if they aren't cheekily mutating over the years. I feel like in C++ especially we barely have the time to come to terms with the unintended consequences of the previous language revision before the next one drops a whole new load of them on us.

    • > If the size of the new type is larger than the size of the last-written type, the contents of the excess bytes are unspecified (and may be a trap representation). Before C99 TC3 (DR 283) this behavior was undefined, but commonly implemented this way.

      https://en.cppreference.com/w/c/language/union

      > When initializing a union, the initializer list must have only one member, which initializes the first member of the union unless a designated initializer is used(since C99).

      https://en.cppreference.com/w/c/language/struct_initializati...

      → = {0} initializes the first union variant, and bytes outside of that first variant are unspecified. Seems like GCC 15.1 follows the 26 year old standard correctly. (not sure how much has changed from C89 here)

    • Programming languages are products; that is like saying you want to keep using vi 1.0.

      Maybe C should have stopped at K&R C from UNIX V6; at least that would have spared the world from having it adopted outside UNIX.

    • > I feel like once a language is standardized (or reaches 1.0), that's it. You're done. No more changes. You wanna make improvements? Try out some new ideas? Fine, do that in a new language.

      Thank goodness this is not how the software world works overall. I'm not sure you understand the implications of what you ask for.

      > if they aren't cheekily mutating over the years

      You're complaining about languages mutating, then mention C++ which has added stuff but maintained backwards compatibility over the course of many standards (aside from a few hiccups like auto_ptr, which was also short lived), with a high aversion to modifying existing stuff.

    • It's careless development. Why think something through in advance when you can fix it later? It works so well for Microsoft, Google and lately Apple. /s

      The release cycle of a piece of software says a lot about its quality. "Move fast, break things" has become the new development process.

Really excited about #embed support:

> C: #embed preprocessing directive support.

> C++: P1967R14, #embed (PR119065)

See also:

https://news.ycombinator.com/item?id=32201951 - Embed is in C23 (2022-07-23)

  • I'd really like an `std::embed<...>` that would be a consteval function (IIRC there is a proposal for this, but I don't know its status). The less pre-processor stuff going on, the less there is to worry about; the syntax would end up much cleaner, and you could create your own wrapper functions.

"C++ Modules have been greatly improved."

It would be nice to know what these great improvements actually are.

  • Later in the article, it mentions:

        Improved experimental support for C++23, including:
    
            std and std.compat modules (also supported for C++20).
    

    From https://developers.redhat.com/articles/2025/04/24/new-c-feat...:

        The next major version of the GNU Compiler Collection (GCC), 15.1, is expected to be released in April or May 2025.
    
        GCC 15 greatly improved the modules code. For instance, module std is now supported (even in C++20 mode).

  • In GCC 14, C++ modules were unusable (incomplete, full of bugs, no std modules, etc). I haven't tried 15 yet but if that changed, then it definitely qualifies for a "great improvement".

    • Still no std modules, but otherwise likely usable. Modules are ready for early adopters to use and to start writing the books on what you should do. (Not how to do it; those books are mostly written, though not in print. Rather what you should do, as in: was import std a good idea, or should containers and algorithms have been split, or maybe something I haven't thought of.)