
Comment by epolanski

1 year ago

I remember a series of lectures from an Intel engineer that went into how difficult it is to write assembly code for x86. He basically stated that the number of cases where you can really write code that is faster than what a compiler would produce is close to none.

Essentially, people think they are writing low-level code, but in reality that's not how CPUs interpret that code. He explained how writing assembly by hand pretty much always kills performance (at least on modern x86).

That's for the random "I know asm so it must be faster" case.

If you know it really well, have already optimized everything at the algorithmic level, and have code that can benefit from SIMD, 10x is real.
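For a rough idea of what "code that can benefit from SIMD" tends to look like, here is a minimal sketch in C. The function name `saxpy_sketch` is made up for illustration, and whether a given compiler actually vectorizes the loop depends on the compiler, the optimization level, and the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i] over n elements.
 * `restrict` promises the compiler that x and y do not overlap,
 * which removes one common obstacle to auto-vectorization.
 * Each iteration is independent of the others, so several
 * elements can be processed per SIMD instruction. */
void saxpy_sketch(size_t n, float a,
                  const float *restrict x, float *restrict y) {
    /* Precondition: non-empty input needs valid pointers. */
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Loops with this shape (contiguous arrays, no cross-iteration dependency, no aliasing) are the ones where SIMD has room to pay off, whether it comes from the compiler or from hand-written intrinsics.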

You have to consider that modern CPUs don't execute code in order; they execute it out of order and speculatively, across multiple instruction pipelines.
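A small sketch of why that matters, in C rather than assembly. Both functions below are hypothetical illustrations: they add up the same array, but the second splits the work into four independent dependency chains, which gives an out-of-order core more additions it can keep in flight at once. Compilers generally won't make this transformation for floating point on their own unless told they may reorder the math (e.g. with `-ffast-math`), because it changes the order of the additions.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: every add has to wait for the previous add,
 * so the loop is a single serial dependency chain. */
double sum_serial(const double *x, size_t n) {
    assert(n == 0 || x != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Four accumulators: the four adds in each iteration do not
 * depend on each other, so they can overlap in the pipelines. */
double sum_4way(const double *x, size_t n) {
    assert(n == 0 || x != NULL);
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    /* Handle the leftover 0 to 3 elements. */
    for (; i < n; i++)
        a0 += x[i];
    return (a0 + a1) + (a2 + a3);
}
```

Note that the two versions can differ in the last bits for general floating-point input, since the additions happen in a different order. How much faster the second one is, if at all, depends on the CPU and the compiler, so measure it.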

I've used Intel's icc compiler and profiler tools iteratively. A toolchain like Intel's can be made to profile cache misses, pipeline utilization, branches, and stalls, and supposedly improve the generated code in the next compilation.

The assembly programmer has to consider those factors. Sure would be nice to have a computer check those things!

In the old days, we only worried about cycle counts, wait states, and number of instructions.

That's assembly by people who learned it in 1990. Intel very much does want you writing assembly for their processors, and in many ways the only way to push them hard is by doing so.