Comment by derriz
1 year ago
Sane defaults should be table stakes for toolchains but C++ has "history".
All significant C++ code bases and projects I've worked on have had tens of lines (if not whole screens) of compiler and linker options - a maintenance nightmare, particularly the parts related to optimization. This stuff is so brittle: who knows with which release of the compiler or linker a particular combination of optimization flags was actually beneficial? How do you regression test it? So everyone is afraid to touch it.
Other compiled languages have similar issues, but none to the extent of C++, in my experience.
> Sane defaults should be table stakes for toolchains but C++ has "history".
Yes, it has. By "history" you actually mean "production software that is expected to not break just because someone upgrades a compiler". Yes, C++ does have a lot of that.
> All significant C++ code-bases and projects I've worked on have had 10s of lines (if not screens) of compiler and linker options - a maintenance nightmare particularly with stuff related to optimization.
No, not really. That is definitely not the norm, at all. I can tell you as a matter of fact that release builds of some production software - even household names - are built with only a couple of basic custom compiler flags, such as specifying the exact version of the target language.
Moreover, if your project uses a build system such as CMake and your team is able to spend 5 minutes reading an onboarding guide on modern CMake, you don't even need to set compiler flags directly. You set a few high-level target properties and you never look at it ever again.
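A minimal sketch of what that looks like (the target and dependency names here are made up for illustration):

```cmake
# Hypothetical target: express intent via properties, let CMake derive the flags.
add_executable(app main.cpp)

# Pin the exact target language version instead of passing -std=c++20 by hand.
target_compile_features(app PRIVATE cxx_std_20)
set_target_properties(app PROPERTIES CXX_EXTENSIONS OFF)

# Dependencies carry their own usage requirements (include dirs, defines).
target_link_libraries(app PRIVATE fmt::fmt)  # assumed third-party dependency
```

The point is that none of this mentions a compiler-specific flag; the same file works across GCC, Clang, and MSVC.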
> Yes, it has. By "history" you actually mean "production software that is expected to not break just because someone upgrades a compiler". Yes, C++ does have a lot of that.
I disagree. Disproportionately often in my career, random C and C++ code bases have failed to build because some new warning was introduced. And this is precisely because the default compiler options are so bad that a lot of projects turn on -Wall, -Wextra and -Werror.
Also, the way undefined behavior is exploited means that you don't really know whether software that worked fine 10 years ago will actually work fine today, unless you have exhaustive tests.
> I disagree. Disproportionately often in my career, random C and C++ code bases have failed to build because some new warning was introduced. And this is precisely because the default compiler options are so bad that a lot of projects turn on -Wall, -Wextra and -Werror.
There is nothing to disagree with. It is a statement of fact that there is production software that is not expected to break just because someone upgrades a compiler. This is not up for debate. Setting flags like -Werror is not even relevant, because that is an explicit choice of development teams, and one which is strongly discouraged beyond local builds.
> Also the way undefined behavior is exploited means that you don't really know of your software that worked fine 10 years ago will actually work fine today, unless you have exhaustive tests.
No, not really. There are only two scenarios with UB: either you unwittingly invoked UB and thus introduced an error, or you purposely used a feature of your specific compiler+OS+hardware combination that assigns a known behavior to something the standard leaves undefined.
The latter involves a ton of due diligence and pinning down your particular platform, particularly the compiler version.
So either you don't know what you're doing, or you are very well aware and very specific about what you're doing.
4 replies →
I've rarely seen more than a handful of compiler options, even on very large codebases.
If anything, there are tonnes of options people should be using more.
The problem with all these hardening options, though, is that they noticeably reduce performance.
> The problem with all these hardening options, though, is that they noticeably reduce performance.
Yep. What I would really like is two lists: one for debug/checked mode and one for release.
It's because the UB must be continuously exploited by compilers for that extra 1% perf gain.
I've been eyeing Zig recently. It makes a lot of choices straightforward yet explicit, e.g. you choose between four optimisation strategies: debug, safety, size, perf. Individual programs/libraries can have a default or force one (for the whole program or a compilation unit), but it's customary to delegate that choice to the person actually building from source.
Even simpler story with Go. It's been designed by people who favour correctness over performance, and most compiler flags (like -race, -asan, -clobberdead) exist to help debug problems.
I've been observing a lot of people complain about declining software quality; yearly update treadmills delivering unwanted features and creating two bugs for each one fixed. Simplicity and correctness still seem to be a niche thing; I salute everyone who actually cares.
> It's because the UB must be continuously exploited by compilers for that extra 1% perf gain.
Your framing of a compiler exploiting UB in programs to gain performance has an undeserved negative connotation. The fact is, programs are mathematical structures/arguments, and if any single step in the program's code or execution is wrong, no matter how small, it can render the whole program invalid. Drawing from math analogies where one wrong step leads to an absurd conclusion:
* https://en.wikipedia.org/wiki/All_horses_are_the_same_color
* https://en.wikipedia.org/wiki/Principle_of_explosion
* https://proofwiki.org/wiki/False_Statement_implies_Every_Sta...
* https://en.wikipedia.org/wiki/Mathematical_fallacy#Division_...
Back to programming, hopefully this example will not be controversial: If a program contains at least one write to an arbitrary address (e.g. `*(char*)0x123 = 0x456;`), the overall behavior will be unpredictable and effectively meaningless. In this case, I would fully agree with a compiler deleting, reordering, and manipulating code as a result of that particular UB.
You could argue that C shouldn't have been designed so that reading out of bounds is UB. Instead, it should read some arbitrary value without crashing or cleanly segfault at that instruction, with absolutely no effects on any surrounding code.
You could argue that C/C++ shouldn't have made it UB to dereference a null pointer for reading, but I fully agree that dereferencing a null pointer for a method call or writing a field must be UB.
Another analogy in programming is, let's forget about UB. Let's say you're writing a hash table in Java (in the normal safe subset without using JNI or Unsafe). If you get even one statement wrong in the data structure implementation, there still might be arbitrarily large consequences like dropping values when you shouldn't, miscounting how many values exist, duplicating values when you shouldn't, having an incorrect state that causes subtle failures far in the future, etc. The consequences are not as severe and pervasive as UB at the language level, but it will still result in corrupt data and/or unpredictable behavior for the user of that library code, which can in turn have arbitrarily large consequences. I guess the only difference compared to C/C++ UB is that for C/C++, there is more "spooky action at a distance", where some piece of UB can have very non-local consequences. But even incorrect code in safe Java can produce large consequences, maybe just not as large on average.
I am not against compilers "exploiting" UB for performance gain. But these are the ways forward that I believe in, for any programming language in general:
* In the language specification, reduce the number of cases/places that are undefined. Not only does it reduce the chances of bad things happening, but it also makes the rules easier to remember for humans, thus making it easier to avoid triggering these cases.
* Adding to that point, favor compile-time errors over run-time UB. For example, reading from an uninitialized local variable is a compile error in Java but UB in C. Rust's whole shtick about lifetimes and borrowing is one huge transformation of run-time problems into compile-time problems.
* Overwhelmingly favor safety by default. For example, array accesses should be bounds-checked using the convenient operator like `array[index]`, whereas the unsafe unchecked version should be something obnoxious and ugly like `unsafe { array.get_unchecked(index) }`. Make the safe way easy and make the unsafe way hard - the exact opposite of C/C++.
* Provide good (and preferably complete) sanitizer tools to check that UB isn't triggered at run time. C/C++ did not have these for the first few decades of their lives, and you were flying blind when triggering UB.
> Your framing of a compiler exploiting UB in programs to gain performance, has an undeserved negative connotation. The fact is, programs are mathematical structures/arguments, and if any single step in the program code or execution is wrong, no matter how small, it can render the whole program invalid.
You're failing to understand the problem domain, and consequently you're oblivious to how UB is actually a solution to problems.
There are two sides to UB: the one associated with erroneous programs, because clueless developers unwittingly do things that the standard explicitly states lead to unknown and unpredictable behavior, and the one which leads to valid programs, because developers knowingly adopted an implementation that specifies exactly what behavior they should expect from doing things that the standard leaves as UB.
Somehow, those who mindlessly criticize UB only parrot the simplistic take on it, the "nasal demons" blurb. They don't even stop to think about what undefined behavior is, or why a programming language specification would purposely leave specific behavior undefined instead of unspecified or even implementation-defined. They do not understand what they are discussing and don't invest a moment trying to understand why things are the way they are, and what problems are solved by them. They just parrot clichés.
14 replies →
I mean, if you emit compiler commands from any build system, they're going to be completely illegible due to the number of -L, -l, -I, -i, -D flags, which are mostly generated by things like pkg-config and your build configuration.
There aren't many optimization flags that people get fine-grained with; the exception is floating point, because -ffast-math alone is extremely inadvisable.
It goes even further.
Technically, compilers can choose to turn undefined behavior into implementation-defined behavior instead. But they don't.
That's kind of also how C++ std::span wound up without bounds checks in practice. And my_arr.at(i) just isn't really being used by anybody.
Seems very user-hostile to me.
-ffast-math and -Ofast are inadvisable on principle:
Tl;dr: python gevent messes up your x87 float registers (yes.)
https://moyix.blogspot.com/2022/09/someones-been-messing-wit...
I disagree with "on principle." There are flaws in the design of IEEE 754 and omitting strict adherence for the purposes of performance is fine, if not required for some applications.
For example, recursive filters (even the humble averaging filter) will suffer untold pain without enabling DAZ/FTZ mode.
fwiw, the linked issue has been remedied in recent compilers, and it isn't a Python problem, it's a GCC problem. Even so, if your algorithm requires subnormal numbers, then for the love of numeric stability, guard your scopes and set the MXCSR register accordingly!
7 replies →
"what kind of math does the compiler usually do without this funsafemath flag? Sad dangerous math?"
1 reply →