Comment by simias

9 years ago

Optimizing for size is easier because you have exactly one metric to consider: how many bytes your instructions take.

When optimizing for speed you have to consider many factors: the relative speed of each instruction, cache behavior (cache line size, associativity, number of levels, relative speed of each level...), pipelining, branch prediction, prefetching, whether moving your data to SIMD registers is worth it, what to inline and what not to inline, what to unroll and what not to unroll, constraint solving to optimize things that can be computed or asserted statically, etc.
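
To make the "what to unroll" trade-off concrete, here is a minimal sketch in C. The function names `sum_rolled` and `sum_unrolled4` are made up for illustration and are not from any library. Both return the same sum. The second one takes more bytes of code, and whether it actually runs faster depends on the compiler and the microarchitecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Compact version: one add per iteration, fewest instruction bytes. */
uint64_t sum_rolled(const uint32_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4 with independent accumulators. This is larger code
 * that gives the CPU four dependency chains to overlap; any speedup is
 * an assumption that has to be measured on the target machine. */
uint64_t sum_unrolled4(const uint32_t *a, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

An optimizing compiler may well unroll or vectorize the first version on its own at higher optimization levels, so the size metric is easy to check (count the bytes) while the speed metric needs benchmarking per target.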

TazeTSchnitzel 9 years ago

And that's for a single processor! There are myriad CPUs the end user could be using.

ReverseCold 9 years ago
x86 and x64 are the only two that someone could be using on desktop, right?
- pjc50 9 years ago
  
  Timing rules can be very different even between different models of the same processor, let alone between different ranges (i3 vs i7) or generations (Skylake, etc.). An example: https://gmplib.org/~tege/x86-timing.pdf
  
- Retr0spectrum 9 years ago
  
  Different microarchitectures can have big differences in performance between different instructions.
- PeCaN 9 years ago
  
  The instruction set makes much less of a difference than the actual microarchitecture. For an extreme example, see Pentium 4 vs Core: code that runs fast on one could perform dramatically differently on the other.
  The only time ISA really influences optimization is for unique ones like IA-64/Itanium. Otherwise, optimizing for e.g. a modern Xeon vs a POWER8 is not terribly different.
  
- gjjrfcbugxbhf 9 years ago
  
  How many cores? How big is each cache?