Comment by anematode
13 hours ago
Ah, that's a great hypothesis. I wonder, then, how it works with x86 emulation on ARM. IIRC, atomic ops on ARM fault if the address isn't naturally aligned... but I guess the runtime could intercept that and handle it slowly.
ARM Macs apparently have some kind of specific handling in place for this when a process is running in x86_64 compatibility mode, but it’s not publicly documented anywhere that I can see.
XNU has this oddity: https://github.com/apple-oss-distributions/xnu/blob/f6217f89...
Redacted from open source XNU, but exists in the closed source version
Is it actually redacted, or just a leftover stub from a feature implemented in silicon instead of software? Isn't the x86 memory order compatibility done at hardware level?
An emulated x86 atomic instruction wouldn’t need to use atomic instructions on ARM.
Why not?
They don’t have to match.
As an example, take a divide instruction. A machine without an FPU can emulate a machine that has one. It may legitimately have to run hundreds or thousands of instructions to emulate a single divide instruction, and it will certainly take longer.
That's OK; it just means the emulation is slower at that than at something like add, which the host has a native instruction for. In ‘emulator time’ you still only ran one instruction. That world is still consistent.
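To make the point about atomics concrete, here is a minimal sketch of how an emulator could give a guest `lock xadd` its atomicity without using any host atomic instruction. Everything in it is hypothetical (`GuestMemory`, `lock_xadd32`, `hammer`) and it is not how Rosetta or any real emulator is known to work. It assumes every guest access that needs atomicity goes through the same emulator lock; a real multi-threaded emulator would also have to deal with plain guest loads and stores that bypass it.

```python
import struct
import threading


class GuestMemory:
    """Toy guest RAM (hypothetical; not modeled on any real emulator)."""

    def __init__(self, size):
        self.ram = bytearray(size)
        # A single emulator-side lock stands in for the guest's LOCK prefix.
        self._lock = threading.Lock()

    def load32(self, addr):
        # Plain 32-bit little-endian load; alignment is irrelevant here.
        return struct.unpack_from("<I", self.ram, addr)[0]

    def lock_xadd32(self, addr, value):
        """Emulate a guest `lock xadd` on a 32-bit word; returns the old value.

        The read-modify-write takes several host operations, but the guest
        only ever observes it as one indivisible instruction.
        """
        with self._lock:
            old = struct.unpack_from("<I", self.ram, addr)[0]
            new = (old + value) & 0xFFFFFFFF  # wrap like a 32-bit register
            struct.pack_into("<I", self.ram, addr, new)
            return old


def hammer(mem, addr, count):
    """Run `count` emulated atomic increments at `addr`."""
    for _ in range(count):
        mem.lock_xadd32(addr, 1)
```

With four threads each calling `hammer(mem, 62, 1000)` on a 128-byte `GuestMemory`, no increments are lost, even though address 62 is unaligned and the word would straddle a 64-byte line boundary on real hardware. It is slow, but as with the divide example, the guest's view stays consistent.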