Timing rules can be very different even between different models of the same processor, let alone between different ranges (i3 vs i7) or generations (Skylake, etc.). An example: https://gmplib.org/~tege/x86-timing.pdf
I don't think there are any differences between an i3 and i7 of the same generation in terms of instruction timings.
True, caches are a whole different story though.
Different microarchitectures can have big differences in performance between different instructions.
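If you want to see this on your own machine, here is a rough sketch (not a rigorous benchmark). It runs the same number of 64-bit multiply-adds twice: once as a single dependency chain, which is bound by latency, and once as four independent chains, which can overlap and so run closer to throughput. The function names, the constants and the choice of `clock()` are all my own assumptions for illustration. The compiler is free to unroll or reorder the loops, so treat the numbers as a hint only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Arbitrary odd constants (hypothetical choice, any odd multiplier works). */
#define MUL 6364136223846793005ULL
#define ADD 1442695040888963407ULL

/* One long dependency chain: every multiply waits for the previous
   result, so the run time is dominated by instruction latency. */
uint64_t chain_dependent(uint64_t seed, uint64_t iters) {
    uint64_t x = seed;
    for (uint64_t i = 0; i < iters; i++) {
        x = x * MUL + ADD;
    }
    return x;
}

/* Four independent chains doing the same total number of multiply-adds.
   The CPU can overlap them, so the run time is closer to throughput. */
uint64_t chain_independent(uint64_t seed, uint64_t iters) {
    assert(iters % 4 == 0); /* keep the total work equal to chain_dependent */
    uint64_t a = seed, b = seed + 1, c = seed + 2, d = seed + 3;
    for (uint64_t i = 0; i < iters / 4; i++) {
        a = a * MUL + ADD;
        b = b * MUL + ADD;
        c = c * MUL + ADD;
        d = d * MUL + ADD;
    }
    return a ^ b ^ c ^ d;
}

/* Time both variants with clock() (CPU time) and print the results.
   Printing the computed values keeps the compiler from dropping the loops. */
void report_timings(uint64_t iters) {
    clock_t t0 = clock();
    uint64_t r1 = chain_dependent(1, iters);
    clock_t t1 = clock();
    uint64_t r2 = chain_independent(1, iters);
    clock_t t2 = clock();

    printf("dependent:   %.3f s (result %llu)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, (unsigned long long)r1);
    printf("independent: %.3f s (result %llu)\n",
           (double)(t2 - t1) / CLOCKS_PER_SEC, (unsigned long long)r2);
}
```

Call `report_timings(400000000ULL)` from a `main` and run the same binary on two different machines. The gap between the two lines depends on the multiplier's latency and throughput, which is exactly the kind of thing that changes from one microarchitecture to the next.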
The instruction set makes much less of a difference than the actual microarchitecture. For an extreme example, see Pentium 4 vs Core. Code that runs fast on one can perform dramatically differently on the other.
The only time the ISA really influences optimization is with unusual ones like IA-64/Itanium. Otherwise, optimizing for e.g. a modern Xeon vs a POWER8 is not terribly different.
Why was this flagkilled?
I tend to have opinions that run contrary to Y Combinator's business interests. Combined with me being entirely too provocative sometimes, my comments are (understandably) flagkilled by default.
How many cores? How big is each cache?