← Back to context

Comment by mikemike

10 years ago

Thank you for taking the time to perform these tests!

One thing that people advocating FDO often forget: this is statically tuning the code for a specific use case. Which is not what you want for an interpreter that has many, many code paths and is supposed to run a wide variety of code.

You won't get a 30% FDO speedup in any practical scenario. It does little for most other benchmarks and it'll pessimize quite a few of them, for sure.

Ok, so feed it with a huge mix of benchmarks that simulate typical usage. But then the profile gets flatter and FDO becomes much less effective.

Anyway, my point still stands: a factor of 1.1x - 1.3x is doable. Fine. But we're talking about a 3x speedup for my hand-written machine vs. what the C compiler produces. And that's only a comparatively tiny speedup you get from applying domain-specific knowledge. Just ask the people writing video codecs about their opinion on C vector intrinsics sometime.

I write machine code, so you don't have to. The fact that I have to do it at all is disappointing. Especially from my perspective as a compiler writer.

But DJB is of course right: the key problem is not the compiler. We don't have a source language that's at the right level to express our domain-specific knowledge while leaving the implementation details to the compiler (or the hardware).

And I'd like to add: we probably don't have the CPU architectures that would fit that hypothetical language.

See my ramblings about preserving programmer intent, I made in the past: http://www.freelists.org/post/luajit/Ramblings-on-languages-...

> One thing that people advocating FDO often forget: this is statically tuning the code for a specific use case.

Yes, I meant to mention this but forgot. The numbers I posted are a best-case example for FDO, because the FDO is specific to the one benchmark I'm testing.

> Ok, so feed it with a huge mix of benchmarks that simulate typical usage. But then the profile gets flatter and FDO becomes much less effective.

True, though I think one clear win with FDO is helping the compiler tell fast-paths from slow-paths in each opcode. I would expect this distinction to be relatively universal regardless of workload.

The theory of fast-path/slow-path optimization would say that fast-paths are much more common than slow-paths. Otherwise optimizing fast-paths would be useless because slow-paths would dominate.

The fact that a compiler can't statically tell a fast-path from a slow-path is a big weakness. FDO does seem like it should be able to help mitigate that weakness.