Comment by immibis

4 days ago

Indeed, scheduling instructions into parallel-compatible aligned blocks is menial work that's usually best done by a machine; each CPU has different preferences, so it only works well if the machine knows which kind of CPU the code will actually run on.
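To make that concrete, here is a rough sketch (the `axpy` name and the whole function are my own illustration, not taken from any particular library). The loop is written as plain scalar code with no intrinsics or manual unrolling, so the vector width and instruction order are left to the compiler:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: y[i] += a * x[i], written as a plain scalar loop.
// Nothing here is CPU-specific; the compiler decides how to vectorize and
// schedule it for whatever target it is told about.
void axpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both inputs must have the same length
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

With GCC or Clang, building this with something like `-O3 -march=native` tells the compiler to target the CPU of the build machine, while a generic target such as `-march=x86-64` limits it to the baseline instruction set. The source stays the same either way; only the generated code differs, and how much that helps depends on the compiler and the CPU.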

Eigen certainly uses a bunch of optimizations, including SIMD, and it also provides algorithm-level things like FFTs and matrix decompositions.