Comment by codexb

5 months ago

That's basically compilers these days. It used to be that you could try and optimize your code, inline things here and there, but these days, you're not going to beat the compiler optimization.

18 comments

codexb

kragen 5 months ago

That is a meme that people repeat a lot, but it turns out to be wrong:

https://web.archive.org/web/20150213004932/http://x264dev.mu...

kragen 5 months ago

I guess I should mention that https://news.ycombinator.com/item?id=9397169 does kind of disagree, saying, "If GCC didn't beat an expert at optimizing interpreter loops, it was because they didn't file a bug and give us code to optimize," but his actual example is the CPython interpreter loop, which is light-years from the kind of hand-optimized assembly interpreter Mike Pall's post is talking about, and moreover it wasn't feeding an interpreter loop to GCC but rather replacing interpretation with run-time compilation. Mostly what he disagrees about is the same thing Regehr disagrees about: whether there's enough code in the category of "not worth hand-optimizing but still runs often enough to matter", not whether you can beat a compiler by hand-optimizing your code. On the contrary, he brings up whole categories of code where compilers can't hope to compete with hand-optimization, such as numerical algorithms where optimization requires sacrificing numerical stability. mpweiher's comment in response discusses other scenarios where compilers can't hope to compete, like systems-level optimization.
It's worth reading the comments by haberman and Mike Pall in the HN thread there where they correct Berlin about LuaJIT, and kjksf also points out a number of widely-used libraries that got 2–4× speedups over optimized C by hand-optimizing the assembly: libjpeg-turbo, Skia, and ffmpeg. It'd be interesting to see if the intervening 10 years have changed the situation, because GCC and LLVM have improved in that time, but I doubt they've improved by even 50%, much less 300%.

ryao 5 months ago

Meanwhile, GCC will happily implement bsearch() without cmov instructions and the result will be slower than a custom implementation on which it emits cmov instructions. I do not believe anyone has filed a bug report specifically about the inefficient bsearch(), but the bug report I filed a few years ago on inefficient code generation for binary search functions is still open, so I see no point in bothering:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110001

Binary searches on OpenZFS B-Tree nodes are faster in part because we did not wait for the compiler:

https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56d7a...

Eliminating comparator function overhead via inlining is also a part of the improvement, which we would not have had because the OpenZFS code is not built with LTO, so even if the compiler fixes that bug, the patch will still have been useful.

godelski 5 months ago

Definitely not true.

You aren't going to beat the compiler if you have to consider a wide range of inputs and outputs but that isn't a typical setting and you can actually beat them. Even in general settings this can be true because it's still a really hard problem for the compiler to infer things you might know. That's why C++ has all those compiler hints and why people optimize with gcc flags other than -O.

It's often easy to beat Blas in matrix multiplication if you know some conditions on that matrix. Because Blas will check to find the best algo first but might not know (you could call directly of course and there you likely won't win, but you're competing against a person not the compiler).

Never over estimate the compiler. The work the PL people do is unquestionably useful but they'll also be the first to say you can beat it.

You should always do what Knuth suggested (the often misunderstood "premature optimization" quote) and get the profile.

nerdralph 5 months ago

I was disappointed with how bad avr-gcc was at optimizing code. Here's one of several examples I found. https://nerdralph.blogspot.com/2015/03/fastest-avr-software-...

matheusmoreira 5 months ago

These days optimizing compilers are your number one enemy.

They'll "optimize" your code by deleting it. They'll "prove" your null/overflow checks are useless and just delete them. Then they'll "prove" your entire function is useless or undefined and just "optimize" it to a no-op or something. Make enough things undefined and maybe they'll turn the main function into a no-op.

In languages like C, people are well advised to disable some problematic optimizations and explicitly force the compiler to assume some implementation details to make things sane.

ryao 5 months ago
If they prove a NULL check is always false, it means you have dead code.
For example:
if (p == NULL) return; if (p == NULL) doSomething();
It is safe to delete the second one. Even if it is not deleted, it will never be executed.
What is problematic is when they remove something like memset() right before a free operation, when the memset() is needed to sanitize sensitive data like encryption keys. There are ways of forcing compilers to retain the memset(), such as using functions designed not to be optimized out, such as explicit_bzero(). You can see how we took care of this problem in OpenZFS here:
https://github.com/openzfs/zfs/pull/14544
- matheusmoreira 5 months ago
  
  They just think you have lots of dead code because of silly undefined behavior nonsense.
  char *allocate_a_string_please(int n) { if (n + 1 < n) return 0; // overflow return malloc(n + 1); // space for the NUL }
  This code seems okay at first glance, it's a simple integer overflow check that makes sense to anyone who reads it. The addition will overflow when n equals INT_MAX, it's going to wrap around and the function will return NULL. Reasonable.
  Unfortunately, we cannot have nice things because of optimizing compilers and the holy C standard.
  The compiler "knows" that signed integer overflow is undefined. In practice, it just assumes that integer overflow cannot ever happen and uses this "fact" to "optimize" this program. Since signed integers "cannot" overflow, it "proves" that the condition always evaluates to false. This leads it to conclude that both the condition and the consequent are dead code.
  Then it just deletes the safety check and introduces potential security vulnerabilities into the software.
  They had to add literal compiler builtins to let people detect overflow conditions and make the compiler actually generate the code they want it to generate.
  Fighting the compiler's assumptions and axioms gets annoying at some point and people eventually discover the mercy of compiler flags such as -fwrapv and -fno-strict-aliasing. Anyone doing systems programming with strict aliasing enabled is probably doing it wrong. Can't even cast pointers without the compiler screwing things up.
  
  8 replies →
- Dylan16807 5 months ago
  
  If they prove a check is always false, it means you have dead code or you made a mistake.
  It is very very hard to write C without mistakes.
  When not-actually-dead code gets removed, the consequences of many mistakes get orders of magnitudes worse.