Comment by anematode
11 hours ago
Ah, that's a great hypothesis. I wonder, then, how it works with x86 emulation on ARM. IIRC, atomic ops on ARM fault if the address isn't naturally aligned... but I guess the runtime could intercept that and handle it slowly.
ARM macs apparently have some kind of specific handling in place for this when a process is running with x86_64 compatibility, but it’s not publicly documented anywhere that I can see.
XNU has this oddity: https://github.com/apple-oss-distributions/xnu/blob/f6217f89...
Redacted from open source XNU, but exists in the closed source version
Is it actually redacted, or just a leftover stub from a feature implemented in silicon instead of software? Isn't the x86 memory order compatibility done at hardware level?
An emulated x86 atomic instruction wouldn’t need to use atomic instructions on ARM.
Why not?
They don’t have to match.
As an example, what about a divide instruction. A machine without an FPU can emulate a machine that has one. It will legitimately have to run hundreds/thousands of instructions to emulate a single divide instruction, it will certainly take longer.
Thats OK, just means the emulation is slower doing that than something like add that the host has a native instruction for. In ‘emulator time’ you still only ran one instruction. That world is still consistent.
3 replies →