Comment by MobiusHorizons
2 days ago
> like there is an inherent flaw with the x86-64 ISA that means a chip that provides it can never be competitive with ARM on power consumption.
This is only one of many factors, but I know that high-performance instruction decoding doesn't scale nearly as well on x86-64 as it does on ARM, because of the variable-width instructions. Any reasonable-performance OoO core needs to read multiple instructions ahead in order for the other OoO tricks to work. x86-64 is typically limited to about 5 instructions, and the complexity and power required to do that do not scale linearly: x86-64 instructions can be anywhere from 1 to 15 bytes long, which makes it very hard to guess where the second instruction starts before the first has been decoded. ARM cores have at most 2 widths to deal with, and with ARMv8 I think there is only one, leading to cores like the M1's Firestorm that can read 8 instructions ahead in a single cycle. Intel's E-cores are able to read 3 instructions at each of two different addresses (6 total, just not sequential), which can help the core look at predicted branches but doesn't help as much in fast, optimized code with fewer branches.
So at the low end of performance, where mobile gaming sits, you really need an OoO core in order to keep up, but ARM has a big leg up for that use case because of the instruction encoding.
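To make the boundary-finding problem concrete, here is a rough Python sketch. It is a toy model, not real decoder logic: the "first byte holds the instruction length" encoding and both function names are assumptions made up for illustration (real x86-64 length decoding depends on prefixes, opcode, ModRM and so on). It only shows the data dependency: fixed-width start offsets can all be computed independently, while each variable-width start depends on the length of the instruction before it.

```python
def fixed_width_starts(code: bytes, width: int = 4) -> list[int]:
    """Start offsets for a fixed-width ISA.

    Every offset is simply i * width, so a wide decoder can find all of
    them at once without looking at the instruction bytes.
    """
    return list(range(0, len(code), width))


def variable_width_starts(code: bytes) -> list[int]:
    """Start offsets for a toy variable-width ISA.

    Hypothetical encoding (NOT real x86-64): the first byte of each
    instruction stores its total length in bytes, from 1 to 15.
    """
    starts = []
    offset = 0
    while offset < len(code):
        length = code[offset]
        if not 1 <= length <= 15:
            raise ValueError(f"bad length {length} at offset {offset}")
        starts.append(offset)
        # The next start is unknown until this length has been read:
        # this serial chain is what makes wide decode expensive.
        offset += length
    return starts


# Four toy instructions of lengths 1, 3, 2 and 5 bytes.
toy_stream = bytes([1, 3, 0, 0, 2, 0, 5, 0, 0, 0, 0])
print(fixed_width_starts(bytes(16)))      # offsets known up front
print(variable_width_starts(toy_stream))  # offsets found one after another
```

As far as I understand it, real hardware works around the serial chain with things like speculative decode at several byte offsets, predecode bits, or a cache of already-decoded instructions, which is where the extra complexity and power go.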
> x86-64 is typically limited to about 5 instructions
Intel's Lion Cove decodes 8 instructions per cycle and can retire 12. Intel Skymont's triple decoder can even do 9 instructions per cycle, and that's without a micro-op cache.
AMD's Zen 5, on the other hand, has a 6K cache for instruction decoding, allowing for 8 instructions per cycle, but still only a 4-wide decoder for each hyper-thread.
And yet AMD is still ahead of Intel in both performance and performance per watt. So maybe this whole instruction decode thing is not as important as people are saying.