Comment by simias

9 years ago

Optimizing for size is easier because there is exactly one metric to consider: how many bytes your instructions take.

When optimizing for speed you have to consider many factors: the relative speed of each instruction, cache behavior (cache line size, associativity, number of levels, relative speed of each level...), pipelining, branch prediction, prefetching, whether moving your data to SIMD registers is worth it, what to inline and what not to, what to unroll and what not to, constraint solving to optimize things that can be computed or asserted statically, etc.

And that's for a single processor! There are myriad CPUs the end user could be using.
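To make the "what to unroll" trade-off concrete, here is a minimal sketch in C. The function names `sum_simple` and `sum_unrolled` are made up for illustration. Both compute the same sum. The first is the one you'd want when counting bytes. The second costs more instruction bytes and might be faster on some CPUs, but whether it actually is depends on the compiler (which may unroll or vectorize the simple loop on its own) and on the microarchitecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Compact version: one add per iteration, small code footprint. */
uint64_t sum_simple(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Manually unrolled by 4 with independent accumulators.
 * More instruction bytes, fewer loop branches, and four add chains
 * that do not depend on each other. Whether that pays off is not
 * something you can tell from the source alone. */
uint64_t sum_unrolled(const uint32_t *data, size_t n)
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
        d += data[i + 3];
    }

    uint64_t total = a + b + c + d;

    /* Handle the 0 to 3 leftover elements. */
    for (; i < n; i++)
        total += data[i];

    return total;
}
```

For size you can just compare the bytes each version compiles to. For speed you'd have to benchmark on every CPU you care about, and the answer may differ from one to the next.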

  • x86 and x64 are the only two that someone could be using on desktop, right?

    • The instruction set makes much less of a difference than the actual microarchitecture. For an extreme example, see Pentium 4 vs Core. Code that runs fast on one can perform dramatically differently on the other.

      The only time the ISA really influences optimization is with unusual ones like IA-64/Itanium. Otherwise, optimizing for e.g. a modern Xeon vs a POWER8 is not terribly different.
