Comment by kjksf
10 years ago
You seem to be talking past DJB.
Libjpeg-turbo is 2x-4x faster than libjpeg because it's written in assembly.
Skia, ffmpeg and many other libraries that deal with graphics have hand-written assembly that provides a 2x-4x speedup over what a C compiler can do (for the specific routines).
DJB has lots of experience using assembly to speed up numeric-heavy (crypto) code.
Mike Pall is actually talking real numbers when he compares his assembly Lua interpreter vs. a C-based interpreter.
There are plenty of other examples to support DJB's thesis. No amount of protesting and trying to knock his arguments on the edges will change that.
You even agree with him: "But that is usually a programming language limitation, and not a 'compilers don't do this' problem."
From a practical point of view, whether the language or its compiler is to blame is an academic distinction.
After protesting so much, you essentially confirm his main point: on some code a C compiler can't beat assembly written by an expert and, at least according to DJB, things are getting worse, not better, i.e. C compilers are getting comparatively worse at optimizing for current hardware than an expert in assembly.
"There are plenty of other examples to support DJB's thesis. No amount of protesting and trying to knock his arguments on the edges will change that."
I didn't knock it around the edges; I said it's flat-out wrong. The fact that some code, and there are plenty of examples, has hot regions does not mean that performance-critical code for most people does.
As I said, we have thousands of performance-critical apps, and the profiles keep getting flatter anyway. ffmpeg has not climbed the list over time, even if you disable the assembly versions, etc.
"You even agree with him: 'But that is usually a programming language limitation, and not a "compilers don't do this" problem.' From a practical point of view, whether the language or its compiler is to blame is an academic distinction."
It's not academic when one proposed solution is hand-coding performance-critical parts in a way that changes the semantics of the program.
If you tell the compiler it can violate the semantics of the programming language, it will happily do the same thing.
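To make that concrete with one example (my choice of illustration, not something named in this thread): floating-point addition is not associative, so a C compiler following the language rules has to produce the result of adding a float array strictly left to right. A hand-written SIMD loop typically keeps several partial sums instead, which changes the rounding and can change the answer. GCC and Clang will do the same reordering once you allow it, for example with -ffast-math. The sketch below imitates a 4-lane reduction in plain C; the function names are made up.

```c
#include <assert.h>

/* Strict left-to-right sum: the evaluation order the C source specifies. */
static float sum_in_order(const float *x, int n) {
    assert(n >= 0);
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four independent partial sums, the shape a 4-lane SIMD reduction has.
 * Same numbers, different association, so rounding can differ. */
static float sum_four_lanes(const float *x, int n) {
    assert(n >= 0);
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i;
    for (i = 0; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] += x[i + k];
    float s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s += x[i];
    return s;
}
```

With x = {1e30f, 1, 1, 1, -1e30f, 1, 1, 1}, sum_in_order returns 3 and sum_four_lanes returns 6. In the ordered sum the first three 1s are absorbed by 1e30f; in the lane version 1e30f and -1e30f cancel in lane 0 before anything is lost.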
ffmpeg only writes inner loops in assembly, and doesn't suffer too much. There would be a speed gain in a few situations from writing more of it in assembly, but it's just not worth it when it would make the code so much harder to change.
But inside inner loops, you don't need to change code, and assembly can be easier to read and write than C. Have you seen Intel's C SIMD intrinsics? They're so ugly!
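For anyone who hasn't seen them, here is roughly what the intrinsics look like. This is only a sketch of a saturating byte add using the SSE2 intrinsics from <emmintrin.h>; add_saturate_u8 is a made-up name, not anything from ffmpeg, and the scalar loop covers the tail and builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = min(a[i] + b[i], 255) for n bytes. */
static void add_saturate_u8(uint8_t *dst, const uint8_t *a,
                            const uint8_t *b, size_t n) {
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per iteration; unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, and the whole job when SSE2 is not available. */
    for (; i < n; i++) {
        unsigned s = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}
```

The casts on every load and store, and the type suffix baked into each function name, are most of what people mean by "ugly".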
> C compiler are getting comparatively worse at optimizing for current hardware than an expert in assembly.
It seems to be Intel's opinion that optimizing compilers for specific desktop CPUs is not worth it. GCC and LLVM have machine-specific models for Atom CPUs, but almost nothing for desktops, even though Intel engineers work on those compilers.
They're probably right, since most people are running generic binaries.