Comment by pizlonator
7 hours ago
Folks like to say that, but that's not what's happening.
The key difference is: what is an instruction set? Is it a Turing-complete thing with branches, calls, etc.? Or is it just data flow instructions (math, compares, loads and stores, etc.)?
X86 CPUs handle branching in the frontend using speculation. They predict where the branch will go and issue data flow instructions from that branch destination, along with a special "verify that I branched to the right place" instruction, which is basically just the compare portion of the branch. ARM CPUs do the same thing. In both X86 and ARM CPUs, the data flow instructions that the CPU actually executes look different from the original instruction set (they're lower level and have more registers).
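Roughly, in C pseudocode (a toy model, not real microarchitecture; predict_target, issue_dataflow_from, and flush_pipeline_and_refetch are made-up stand-ins):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t addr_t;

addr_t predict_target(addr_t branch_pc);        /* branch predictor (assumed) */
void   issue_dataflow_from(addr_t pc);          /* speculative issue (assumed) */
void   flush_pipeline_and_refetch(addr_t pc);   /* recovery (assumed) */

/* The branch itself becomes a compare ("verify") that never redirects the
 * data flow stream; it only triggers recovery when the prediction was wrong. */
void handle_branch(addr_t branch_pc, addr_t taken_target, addr_t fallthrough,
                   bool condition /* produced by ordinary compare data flow */)
{
    addr_t predicted = predict_target(branch_pc);
    issue_dataflow_from(predicted);             /* keep the machine busy */

    addr_t actual = condition ? taken_target : fallthrough;
    if (actual != predicted)                    /* the "verify" micro-op */
        flush_pipeline_and_refetch(actual);
}
```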
This means that there is no need to translate branch destinations. There's never a place in the CPU that has to take a branch destination (an integer address in virtual memory) in your X86 instruction stream and work out what the corresponding branch destination is in the lower-level data flow stream. This is because the data flow stream doesn't branch; it only speculates.
On the other hand, a DBT has to have a story for translating branch destinations, and it does have to target a full instruction set that does have branching.
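A minimal sketch of what "having a story for branch destinations" means in a DBT, assuming a hypothetical guest-PC-to-host-code map (lookup_translation, translate_trace, and install_translation are illustrative, not any particular DBT's API):

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t guest_addr_t;
typedef void (*host_code_t)(void);

host_code_t lookup_translation(guest_addr_t pc);              /* hash map, assumed */
host_code_t translate_trace(guest_addr_t pc);                 /* JIT compile, assumed */
void        install_translation(guest_addr_t pc, host_code_t code);

/* Called whenever guest control flow reaches a destination we haven't
 * already chained a host branch to. */
host_code_t resolve_branch_target(guest_addr_t guest_pc)
{
    host_code_t host = lookup_translation(guest_pc);
    if (host == NULL) {
        host = translate_trace(guest_pc);       /* compile starting at guest_pc */
        install_translation(guest_pc, host);
    }
    return host;                                /* caller jumps here */
}
```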
That said, I don't know what the Transmeta CPUs did. Maybe they had a low-level instruction set that had all sorts of hacks to help the translation layer avoid the problems of branch destination translation.
> That said, I don't know what the Transmeta CPUs did. Maybe they had a low-level instruction set that had all sorts of hacks to help the translation layer avoid the problems of branch destination translation.
Fixed guest branches just get turned into host branches and work like normal.
Indirect guest branches would get translated through a hardware jump address cache that was structured kind of like TLB tag lookups are.
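Something like this, as a hedged software model of a direct-mapped jump-target cache indexed and tagged the way a TLB is (the sizes and layout here are illustrative, not what Transmeta actually built):

```c
#include <stdint.h>
#include <stddef.h>

#define JTC_BITS  10
#define JTC_SIZE  (1u << JTC_BITS)

typedef uint64_t guest_addr_t;
typedef void (*host_code_t)(void);

struct jtc_entry {
    guest_addr_t tag;       /* full guest PC of the branch target */
    host_code_t  host;      /* translated host code for that PC */
};

static struct jtc_entry jtc[JTC_SIZE];

host_code_t slow_lookup_or_translate(guest_addr_t pc);   /* fallback, assumed */

host_code_t indirect_branch(guest_addr_t guest_target)
{
    struct jtc_entry *e = &jtc[guest_target & (JTC_SIZE - 1)];
    if (e->tag == guest_target && e->host != NULL)
        return e->host;                          /* hit: jump straight there */

    host_code_t host = slow_lookup_or_translate(guest_target);
    e->tag  = guest_target;                      /* fill on miss */
    e->host = host;
    return host;
}
```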
Thank you for sharing!
> Fixed guest branches just get turned into host branches and work like normal.
How does that work in the case of self-modifying code, or skewed execution (where the same x86 instruction stream has two totally different interpretations based on what offset you start at)?
Skewed execution just produces different traces. There's no requirement that basic blocks not partially overlap with other basic blocks. You want that anyway for optimization reasons, even without skewed execution.
Self-modifying code is handled with MMU traps on the writes and invalidation of the relevant traces. It is very much a slow path, though. Ideally, heavily self-modifying code stays in the interpreter and doesn't thrash in and out of the compiler.
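In sketch form (all helper functions here are hypothetical; this is just the shape of the slow path, assuming 4 KiB guest pages):

```c
#include <stdint.h>

typedef uint64_t guest_addr_t;

void write_protect_guest_page(guest_addr_t page);     /* MMU setup, assumed */
void unprotect_guest_page(guest_addr_t page);
void invalidate_traces_in_page(guest_addr_t page);    /* drop stale translations */

void on_translation_created(guest_addr_t guest_pc)
{
    /* Any page we translated code from must trap on writes. */
    write_protect_guest_page(guest_pc & ~0xFFFull);
}

void on_write_fault(guest_addr_t fault_addr)
{
    guest_addr_t page = fault_addr & ~0xFFFull;
    invalidate_traces_in_page(page);   /* stale translations must never run */
    unprotect_guest_page(page);        /* let the guest store complete */
    /* The next execution of that code falls back to the interpreter and
     * may eventually be re-translated. */
}
```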