Comment by fooblaster
13 hours ago
How does FEX deal with the fact that the memory model on Arm is weak while x86 uses total store ordering (TSO)? It seems like it would need to hammer performance by putting memory barriers everywhere to handle all cases. Perhaps FEX only works when there are well-defined mutexes it can gain visibility into? Anyone know?
Looks like they do expensive, conservative TSO emulation by default, but they're able to piggyback on compiler work that Microsoft did to make newer Windows x86 binaries easier to emulate. Since MSVC 2019, the compiler annotates the executable with metadata that tells an emulator when TSO is or isn't needed for correctness.
https://fex-emu.com/FEX-2510/
FEX also has settings which weaken or disable TSO altogether, favoring performance over correctness. You wouldn't want to rely on those for anything important but a game possibly crashing isn't the end of the world.
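To make the trade-off concrete, here is a toy sketch of the idea, not FEX's actual code. It assumes a translator that lowers each guest x86 load or store to a single ARM64 instruction, choosing the load-acquire/store-release forms (`ldar`/`stlr`) when TSO has to be preserved and plain `ldr`/`str` when it doesn't. `GuestOp`, `lower`, and `translate` are names made up for illustration, and a real emulator may use different instructions or explicit `dmb` barriers instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuestOp:
    """One guest x86 memory access (hypothetical, simplified)."""
    kind: str  # "load" or "store"
    reg: str   # host register holding the value, e.g. "w1"
    addr: str  # host register holding the address, e.g. "x0"


def lower(op: GuestOp, tso: bool) -> str:
    """Pick one ARM64 instruction for one guest memory access.

    tso=True  -> ordered forms (ldar/stlr), slower but safe for TSO code
    tso=False -> plain forms (ldr/str), faster but may reorder
    """
    if op.kind == "load":
        mnemonic = "ldar" if tso else "ldr"
    elif op.kind == "store":
        mnemonic = "stlr" if tso else "str"
    else:
        raise ValueError(f"unknown op kind: {op.kind}")
    return f"{mnemonic} {op.reg}, [{op.addr}]"


def translate(ops, tso=True):
    """Lower a sequence of guest accesses under one ordering policy."""
    return [lower(op, tso) for op in ops]


if __name__ == "__main__":
    # Producer side of a message-passing pattern: write data, then set a flag.
    producer = [GuestOp("store", "w1", "x0"), GuestOp("store", "w2", "x3")]
    print(translate(producer, tso=True))   # ['stlr w1, [x0]', 'stlr w2, [x3]']
    print(translate(producer, tso=False))  # ['str w1, [x0]', 'str w2, [x3]']
```

With the plain stores, another core is allowed to see the flag write before the data write, which x86 never permits. That's why the weakened settings can break code that was correct on real x86 hardware.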
So that optimization only works on executables produced by MSVC? Are those annotations documented and/or produced by other compilers?
No.
It would be nice to see more Arm chips adopt the approach Apple took for Rosetta 2, which fixes this problem. Basically, Apple's chips can be switched into a TSO mode, along with a few other minor tweaks, and that makes x86 code run much, much faster.
I think that's right: there is no better way than just adding barriers. On Apple hardware it can probably make use of the special memory ordering mode, but on normal ARM64 there's probably nothing it can do.