Comment by simias
9 years ago
Optimizing for size is easier because there is exactly one metric to consider: how many bytes your instructions take.
When optimizing for speed you have to consider many factors: the relative speed of each instruction, cache behavior (cache line size, associativity, number of levels, relative speed of each level...), pipelining, branch prediction, prefetching, whether moving your data to SIMD registers is worth it, what to inline and what not to inline, what to unroll and what not to unroll, constraint solving to optimize things that can be computed or asserted statically, and so on.
And that's for a single processor! There are myriad CPUs the end user could be using.
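To make the cache-behavior point concrete, here is a minimal sketch in C. The two functions (hypothetical names, not from any library) compute the same sum over a row-major matrix and differ only in traversal order. Their results are identical, but on large inputs the column-order walk tends to be slower because each access lands on a different cache line. How much slower depends on the CPU and on whether the compiler interchanges the loops, which is exactly the kind of thing a size-only metric never has to care about.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only. */

/* Sum a rows x cols matrix stored in row-major order, walking memory
   sequentially so consecutive accesses tend to share a cache line. */
long sum_row_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same sum, but walking column by column: each access jumps `cols`
   elements ahead, so on large matrices it tends to miss the cache more. */
long sum_column_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Timing both on a matrix much larger than your last-level cache, on a couple of different machines, is a quick way to see how far the gap moves between CPUs.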
x86 and x64 are the only two that someone could be using on desktop, right?
Timing rules can be very different even between different models of the same processor, let alone between different ranges (i3 vs i7) or generations (Skylake, etc.). An example: https://gmplib.org/~tege/x86-timing.pdf
Different microarchitectures can have big differences in performance between different instructions.
The instruction set makes much less of a difference than the actual microarchitecture. For an extreme example, see Pentium 4 vs Core: code that runs fast on one could perform dramatically differently on the other.
The only time ISA really influences optimization is for unique ones like IA-64/Itanium. Otherwise, optimizing for e.g. a modern Xeon vs a POWER8 is not terribly different.
How many cores? How big is each cache?
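Neither answer is known when the code is compiled; it can only be asked on the machine that ends up running it. A minimal sketch in C, assuming a Unix-like system: `sysconf` is a real call, but `_SC_NPROCESSORS_ONLN` is a common extension rather than base POSIX, and `_SC_LEVEL1_DCACHE_SIZE` is a glibc extension that may be missing or report nothing useful. The function names are hypothetical.

```c
#include <unistd.h>

/* Number of cores currently online, or 1 if the query fails.
   _SC_NPROCESSORS_ONLN is a widespread extension (Linux, macOS, BSD). */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* L1 data cache size in bytes, or 0 when the platform cannot say.
   _SC_LEVEL1_DCACHE_SIZE is assumed to exist only on glibc. */
long l1_data_cache_bytes(void)
{
#ifdef _SC_LEVEL1_DCACHE_SIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    return n > 0 ? n : 0;
#else
    return 0;
#endif
}
```

A compiler tuning for speed has to guess at these values or target a specific CPU model, whereas code size is the same number everywhere.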