Comment by mort96

4 days ago

I don't understand what the difference is between "an ARM chip with native x86 translation" and a dual-ISA x86 and ARM chip.

And I don't understand why you'd want a dual-ISA x86 and ARM chip rather than just an x86 chip. You wouldn't get whatever CPU front-end simplicity advantages ARM offers, since your front-end would be significantly more complex and consume significantly more transistors than a normal x86 chip's. And I don't think there's a market of people who want ARM for compatibility reasons; any Windows software which supports ARM also supports x86.

What they could do is release an ARM chip with a slightly extended ISA that adds the select features which are difficult to emulate in software, such as loads and stores with the memory ordering guarantees x86 provides but ARM doesn't. Apple does this AFAIK, and it's one part of why Rosetta 2 is so good. But any ARM CPU maker could do this.
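
To make the memory-ordering point concrete, here is a toy Python model of the classic message-passing test: one thread stores data and then a flag, another loads the flag and then the data. It is only a sketch under loud assumptions. The op encoding, the `may_swap` rule, and the model names "tso" and "weak" are all made up for illustration, and the model ignores store forwarding and other real hardware details. It only shows which reorderings each kind of model permits.

```python
from itertools import permutations

# Hypothetical op encoding: ("st", address, value) or ("ld", address, register).
WRITER = [("st", "data", 1), ("st", "flag", 1)]
READER = [("ld", "flag", "r1"), ("ld", "data", "r2")]

def may_swap(model, earlier, later):
    """May `later` be performed before `earlier` (in program order)?"""
    if earlier[1] == later[1]:
        return False  # same address: never reordered in this toy model
    if model == "tso":
        # x86-style TSO: only a later load may pass an earlier store.
        return earlier[0] == "st" and later[0] == "ld"
    return True  # "weak": any two accesses to different addresses may swap

def thread_orders(model, prog):
    """All orders in which one thread's ops may take effect."""
    orders = []
    for perm in permutations(range(len(prog))):
        ok = all(
            may_swap(model, prog[perm[j]], prog[perm[i]])
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if perm[i] > perm[j]  # this pair was swapped relative to program order
        )
        if ok:
            orders.append([prog[k] for k in perm])
    return orders

def interleavings(a, b):
    """All merges of two sequences that keep each sequence's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def outcomes(model):
    """Set of (r1, r2) results reachable under the given toy model."""
    results = set()
    for w in thread_orders(model, WRITER):
        for r in thread_orders(model, READER):
            for schedule in interleavings(w, r):
                mem, regs = {"data": 0, "flag": 0}, {}
                for kind, addr, arg in schedule:
                    if kind == "st":
                        mem[addr] = arg
                    else:
                        regs[arg] = mem[addr]
                results.add((regs["r1"], regs["r2"]))
    return results
```

In this model, `outcomes("weak")` contains `(1, 0)` (flag seen, data stale) while `outcomes("tso")` does not. Translated x86 code relies on that outcome being impossible, so without hardware TSO support a translator has to add barriers or ordered load/store instructions to rule it out.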

I think the core question is whether hardware-accelerated translation could be meaningfully faster than software like Rosetta 2/Prism while avoiding the full dual-ISA complexity you're describing. Rather than literally implementing both instruction sets, it might be more like an ARM chip with specialized translation units and the extended ISA features you mentioned (memory ordering, etc.).

Intel's unique position with x86 IP could make this feasible where others can't, but whether the engineering effort is worth it for what might be a short-term market advantage is debatable.

Fujitsu and Nvidia also implement (at least) TSO.

https://threedots.ovh/blog/2021/02/cpus-with-sequential-cons...

  • Denver does it because it was originally supposed to be an x86 CPU. Nvidia couldn't reach a patent licensing agreement with Intel, and since decode was happening entirely in software anyway, they pivoted it into being the first available AArch64 CPU.

    • Well, it has a simple hardware decoder for what would normally be the first stage of the jit.

I wonder if ARM instructions could be translated to Intel’s uOps. Then everything except that translation could be shared. And, since programs consist entirely of one type of instruction for the most part, we could imagine that the chip should be able to stick to just doing one type of translation for the duration of a program run, rather than having to figure it out for each instruction.

I’m not saying I want this, but it might be surprisingly not totally impractical.
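
A rough sketch of the shape of that idea, in Python: two front-end decoders that emit the same micro-op format, with the decoder picked once per program rather than per instruction. Everything here is hypothetical (the micro-op tuple layout, the `decode_x86` and `decode_arm` helpers, the tiny two-instruction "ISA" subsets); real uOp formats are not public and are far richer than this.

```python
# Hypothetical shared micro-op format: (operation, destination, source_a, source_b).

def decode_x86(text):
    # Two-operand form: "add eax, ebx" means eax = eax + ebx.
    op, rest = text.split(None, 1)
    dst, src = [s.strip() for s in rest.split(",")]
    return [(op, dst, dst, src)]

def decode_arm(text):
    # Three-operand form: "add x0, x1, x2" means x0 = x1 + x2.
    op, rest = text.split(None, 1)
    dst, a, b = [s.strip() for s in rest.split(",")]
    return [(op, dst, a, b)]

DECODERS = {"x86": decode_x86, "arm": decode_arm}

def execute(uops, regs):
    # Shared back end: it only ever sees micro-ops, never the source ISA.
    alu = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}
    for op, dst, a, b in uops:
        regs[dst] = alu[op](regs[a], regs[b])
    return regs

def run(mode, program, regs):
    decode = DECODERS[mode]  # chosen once for the whole program run
    for line in program:
        execute(decode(line), regs)
    return regs
```

For example, `run("x86", ["add eax, ebx"], {"eax": 2, "ebx": 3})` and `run("arm", ["add x0, x1, x2"], {"x0": 0, "x1": 2, "x2": 3})` both go through the same `execute` path. The hard parts the sketch skips are exactly the ones raised elsewhere in the thread: flags, memory ordering, and the cost of a second decoder.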

A chunk of what you'd want (x86 ALU flag generation) seems to need an extension that is incompatible with most of the ARM architecture licenses, which don't allow custom extensions to user-visible space. Apple is special here for reasons that probably aren't replicable.
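
For a sense of what flag generation involves, here is a minimal Python sketch of the status flags a 32-bit x86 ADD sets. The function name `x86_add32_flags` and the dict layout are just illustrative. ARM's NZCV flags cover the sign, zero, carry, and overflow cases for an add, but PF and AF have no NZCV counterpart, so a translator without hardware help has to compute them with extra instructions whenever the x86 code might read them.

```python
def x86_add32_flags(a, b):
    """Return (result, flags) for a 32-bit x86 ADD of a and b."""
    mask = 0xFFFFFFFF
    a &= mask
    b &= mask
    full = a + b
    res = full & mask
    flags = {
        "CF": full > mask,                                # unsigned carry out of bit 31
        "ZF": res == 0,                                   # result is zero
        "SF": bool(res >> 31),                            # sign bit of the result
        "OF": bool((~(a ^ b) & (a ^ res)) >> 31 & 1),     # signed overflow
        "PF": bin(res & 0xFF).count("1") % 2 == 0,        # even parity of the low byte
        "AF": bool((a ^ b ^ res) & 0x10),                 # carry out of bit 3
    }
    return res, flags
```

Subtraction adds another wrinkle: x86 sets CF on borrow, while ARM's carry flag after a subtract has the opposite sense, so even the flags that do map across need care.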

> I don't understand what the difference is between "an ARM chip with native x86 translation" and a dual-ISA x86 and ARM chip.

Look at Apple's Rosetta 2 for an example. M-series Apple Silicon has special undocumented modes that mirror x86 architectural quirks that don't usually exist in ARM, in order to support AOT-translated machine code. The chip doesn't support x86 instructions, but it has the amenities to support x86 code. That could be what "native x86 translation" meant?

  • That's what I suggested in my comment's last paragraph. I don't think that counts as "an ARM chip with native x86 translation", but really the only person who can say whether that's what dlojudice meant is dlojudice.