Comment by babypuncher

2 days ago

My problem with this take is that it takes ARM > x86 as some kind of given, like there is an inherent flaw with the x86-64 ISA that means a chip that provides it can never be competitive with ARM on power consumption.

We've already seen Intel and AMD narrow the gap considerably, in part by adopting designs pioneered by ARM manufacturers, such as hybrid big.LITTLE-style core layouts.

Another aspect that I think gets forgotten in the Steam Deck conversation is that AMD's graphics performance is well ahead of Qualcomm's, and that is extremely important for a gaming device. I'm willing to bet that the next Steam Deck goes with another custom AMD chip, but the generation after that is more of a question mark.

RISC-V is another wildcard that could end up threatening ARM's path to total dominance.

> My problem with this take is that it takes ARM > x86 as some kind of given, like there is an inherent flaw with the x86-64 ISA that means a chip that provides it can never be competitive with ARM on power consumption.

It's a distinction without a difference. x86 is not currently competitive in anything smaller than a laptop. Even in laptops, the only reasons ARM hasn't eaten the market are that Microsoft is uninterested and Apple doesn't tell the Joker where it gets its wonderful toys.

Market forces are at play here, exactly like they were in the 90s with Intel's massive gains. ARM is making money hand over fist while x86 is getting squeezed. There will come a time when it won't make economic sense to invest in x86, technical merits be damned.

> like there is an inherent flaw with the x86-64 ISA that means a chip that provides it can never be competitive with ARM on power consumption.

This is only one of many factors, but high-performance instruction decoding doesn't scale nearly as well on x86-64 as it does on ARM, because x86-64 instructions are variable width. Any reasonably fast OoO core needs to read multiple instructions ahead for the other OoO tricks to work. x86-64 is typically limited to about 5 instructions, and the complexity and power required to do that do not scale linearly: an x86-64 instruction can be anywhere from 1 to 15 bytes long, which makes it very hard to guess where the second instruction starts before the first has been decoded. Arm cores have at most 2 widths to deal with, and with ARMv8 I think there is only one, which leads to cores like the M1's Firestorm that can read 8 instructions ahead in a single cycle. Intel's E-cores can read 3 instructions at each of two different addresses (6 total, just not sequential), which helps the core look down predicted branches but doesn't help as much in fast, optimized code with fewer branches.

So at the low end of performance, where mobile gaming sits, you really need an OoO core in order to keep up, and ARM has a big leg up for that use case because of its instruction encoding.
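
To make the boundary-finding point concrete, here is a rough toy model in Python. It is not real x86-64 or ARM decoding: the "variable" format is a made-up encoding in which the first byte of each instruction stores its total length (1 to 15), and the function names are just illustrative. The only thing it is meant to show is that fixed-width start offsets can all be computed independently, while variable-width start offsets form a chain where each one depends on the one before it.

```python
# Toy model only: this is NOT real x86-64 or ARM decoding.
# Assumption: in the made-up variable-width format, the first byte of each
# instruction holds that instruction's total length in bytes (1..15).


def fixed_width_starts(code: bytes, width: int = 4) -> list[int]:
    """Start offsets for a fixed-width encoding.

    Every offset is a multiple of `width`, so each one can be computed
    without looking at any other instruction. In hardware terms, N decoders
    could each be pointed at their slot at the same time.
    """
    return list(range(0, len(code), width))


def variable_width_starts(code: bytes) -> list[int]:
    """Start offsets for the toy variable-width encoding.

    The start of instruction k+1 is only known after the length of
    instruction k has been read, so this loop is a serial dependency chain.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = code[pc]  # toy rule: first byte is the length
        if not 1 <= length <= 15:
            raise ValueError(f"bad toy length {length} at offset {pc}")
        pc += length
    return starts


if __name__ == "__main__":
    fixed = bytes(16)  # four 4-byte instructions
    variable = bytes([1, 3, 0, 0, 2, 0, 15] + [0] * 14)
    print(fixed_width_starts(fixed))
    print(variable_width_starts(variable))
```

Real wide x86 front ends attack this with things like speculative decoding at many byte offsets, predecode or length-marking bits, and micro-op caches rather than a literal serial loop, which is where the extra complexity and power mentioned above come from.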

  • > x86-64 is typically limited to about 5 instructions

    Intel's Lion Cove decodes 8 instructions per cycle and can retire 12. Intel Skymont's triple decoder can even do 9 instructions per cycle, and that's without a micro-op cache.

    AMD's Zen 5, on the other hand, has a 6K-entry op cache for decoded instructions, allowing for 8 instructions per cycle, but still only a 4-wide decoder for each SMT thread.

    And yet AMD is still ahead of Intel in both performance and performance per watt. So maybe this whole instruction decode thing is not as important as people are saying.

> like there is an inherent flaw with the x86-64 ISA that means a chip that provides it can never be competitive with ARM on power consumption.

It doesn't matter if there's an inherent, fundamental flaw in the ISA, if Intel can't, for whatever reason(s), develop an x86 chip that actually beats ARM on performance per watt in a broadly-applicable way.