Comment by croemer
1 year ago
That's super impressive. I remember being astonished that the x86 executable of Python running through Rosetta 2 on my M1 was just a factor of 2 slower than the native version.
QEMU was something like 5-10x slower than native, IIRC.
QEMU probably had to account for differences in memory models. A fork with that handling removed might be able to catch up easily.
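A minimal, hypothetical sketch of what "accounting for memory models" can mean. x86 guarantees load->load, load->store and store->store ordering (TSO), while AArch64 is weakly ordered. A translator targeting a weakly ordered host can keep that ordering by lowering guest loads to acquire loads (`ldar`) and guest stores to release stores (`stlr`). The function name, the op names and the fixed registers are illustrative assumptions, not QEMU's or Rosetta 2's actual code.

```python
# Hypothetical sketch: lowering a guest x86 MOV load/store to AArch64
# while preserving x86's stronger (TSO) ordering on a weakly ordered host.

def lower_mov(kind: str, host_has_tso: bool) -> list[str]:
    """Return AArch64 assembly strings for one guest load or store.

    kind: "load" or "store" (hypothetical IR op names).
    host_has_tso: True if the host enforces x86-like ordering in hardware.
    """
    if kind not in ("load", "store"):
        raise ValueError(f"unknown op: {kind}")
    if host_has_tso:
        # Ordering is already enforced by hardware: plain accesses suffice.
        return ["ldr x0, [x1]"] if kind == "load" else ["str x0, [x1]"]
    # Weakly ordered host: an acquire load keeps later accesses after it,
    # a release store keeps earlier accesses before it, which together
    # preserve the orderings x86 promises (and slightly more).
    return ["ldar x0, [x1]"] if kind == "load" else ["stlr x0, [x1]"]


print(lower_mov("load", host_has_tso=False))   # ['ldar x0, [x1]']
print(lower_mov("store", host_has_tso=True))   # ['str x0, [x1]']
```

The stronger instructions generally cost more than plain `ldr`/`str`, which is one reason a translator that can skip this work may run faster.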
QEMU loses a bit from being a generic translator instead of being specialized for x86->ARM like Rosetta 2, Box64 or FEXEmu. For example, it does a lot of spilling, even though x86 has far fewer registers than aarch64.
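To illustrate the spilling point, here is a hypothetical sketch translating `add rax, rbx` two ways. A specialized x86-64->AArch64 translator can pin each of the 16 x86-64 general-purpose registers to its own AArch64 register, since AArch64 has 31. A translator that keeps guest registers in a memory struct pays for loads and stores around the operation. The register assignments (x0..x15, base pointer in x20) are arbitrary assumptions, and the spilling version is a deliberately extreme simplification, not how QEMU's register allocator actually works. Flags are ignored.

```python
# Hypothetical sketch: static register mapping vs. keeping guest
# registers in memory, for the x86-64 instruction `add rax, rbx`.

X86_GPRS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
            "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Specialized approach: each guest GPR lives permanently in a host register
# (x0..x15 here, an illustrative choice).
STATIC_MAP = {reg: f"x{i}" for i, reg in enumerate(X86_GPRS)}


def translate_add_static(dst: str, src: str) -> list[str]:
    d, s = STATIC_MAP[dst], STATIC_MAP[src]
    return [f"add {d}, {d}, {s}"]


# Extreme generic approach: guest registers live in a CPU-state struct
# (base pointer assumed in x20, 8 bytes per register), so every use
# loads from memory and every result is stored back.
def translate_add_spilling(dst: str, src: str) -> list[str]:
    d_off = X86_GPRS.index(dst) * 8
    s_off = X86_GPRS.index(src) * 8
    return [
        f"ldr x9, [x20, #{d_off}]",   # load guest dst
        f"ldr x10, [x20, #{s_off}]",  # load guest src
        "add x9, x9, x10",            # the actual work
        f"str x9, [x20, #{d_off}]",   # spill result back
    ]


print(translate_add_static("rax", "rbx"))    # ['add x0, x0, x3']
print(len(translate_add_spilling("rax", "rbx")))  # 4
```

One host instruction versus four, three of them memory accesses, for the same guest instruction.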
Flags are also tricky, though they're pretty well optimized. In the end, the main issue with them is also spilling, but QEMU's generic architecture makes it expensive to handle, for example, consecutive jump instructions.
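For context on why flags are tricky, here is a simplified sketch of lazy flag evaluation, the general technique of recording the last flag-setting operation and its operands instead of computing EFLAGS after every instruction. QEMU's x86 frontend uses a scheme along these lines, but the class, op names and the restriction to 32-bit ADD/SUB with only ZF and CF are assumptions made for illustration.

```python
# Simplified sketch of lazy flag evaluation for 32-bit x86 ADD/SUB.
# Flags are derived on demand from the recorded op, not stored eagerly.

from dataclasses import dataclass

MASK32 = 0xFFFF_FFFF


@dataclass
class LazyFlags:
    op: str = "none"   # last flag-setting op: "add" or "sub"
    src1: int = 0
    src2: int = 0
    result: int = 0

    def record(self, op: str, a: int, b: int) -> int:
        """Do a 32-bit add/sub and remember its inputs; no flags computed."""
        a, b = a & MASK32, b & MASK32
        if op == "add":
            res = (a + b) & MASK32
        elif op == "sub":
            res = (a - b) & MASK32
        else:
            raise ValueError(f"unsupported op: {op}")
        self.op, self.src1, self.src2, self.result = op, a, b, res
        return res

    def zf(self) -> bool:
        # Zero flag: result was zero.
        return self.result == 0

    def cf(self) -> bool:
        # Carry flag: unsigned carry out (add) or unsigned borrow (sub).
        if self.op == "add":
            return self.result < self.src1
        if self.op == "sub":
            return self.src1 < self.src2
        return False


flags = LazyFlags()
flags.record("add", 0xFFFF_FFFF, 1)   # wraps to 0
taken_jz = flags.zf()                 # True: JZ would be taken
taken_jc = flags.cf()                 # True: JC/JB would be taken
```

Each conditional jump re-derives its flag from the recorded state. If consecutive jumps end up in separate translated blocks, that recorded state presumably has to survive the block boundary, which for a generic translator tends to mean storing it to memory.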
I found this blog post reverse engineering Rosetta 2 translated code: https://dougallj.wordpress.com/2022/11/09/why-is-rosetta-2-f...