← Back to context

Comment by haberman

10 years ago

> If his statement about what he thinks the advantages are is accurate , and what we do, i'm 100% positive that GCC + FDO either already does, or could be tuned without a ton of work, to beat Mike Pall for this kind of loop.

I love the gauntlet-throwing fire in this statement, and I only wish that it was practical to arrange for a showdown on this to actually happen. Especially since it would do nothing but improve the state of the art on both sides, and be a great case study in the capabilities and limitations of compilers.

Unfortunately I think it is impractical to actually have an apples-to-apples showdown because of the fact that the two interpreters use different byte-code. Writing a pure-C interpreter for LuaJIT would be non-trivial.

> I love the gauntlet-throwing fire in this statement, and I only wish that it was practical to arrange for a showdown on this to actually happen.

Yes, but it's already available in a simpler case, as eluded to above and in discussion on the preview of this talk. There wasn't an answer to the question about which compiler is competitive with OpenBLAS on the linpack benchmark in Fortran (dominated by the triply-nested matrix multiplication loop).

A typical computational chemistry program of the type that consumes many cycles on HPC systems might use three types of kernel in assembler for good reason, e.g. OpenBLAS, FFTW, and ELPA (scalable eigenvalues). This is unfortunate, but at least a few people doing the work can have a large benefit; many thanks to them.