Comment by toast0

11 hours ago

> For the life of me I don't understand why you'd ever want to do an atomic operation that's not naturally aligned, let alone one split across cache lines....

I assume they force-packed their structure and it ended up poorly aligned. x86 doesn't fault on unaligned access, and Windows doesn't detect and punish split locks, so while proper alignment would probably perform better, it might not be a meaningful improvement on the majority of the machines running the program.
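
To make the packed-struct hypothesis concrete, here is a minimal C sketch of how packing can leave a 32-bit field off its natural boundary, and how to test whether an access straddles a cache line. The struct `packed_record`, both helper functions, and the 64-byte line size are assumptions for illustration only; nothing here is taken from the program being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed line size, typical for current x86 parts */

/* Hypothetical layout: packing removes the padding that would normally
   place `counter` on a 4-byte boundary. */
#pragma pack(push, 1)
struct packed_record {
    uint8_t  tag;      /* offset 0 */
    uint32_t counter;  /* offset 1 when packed, 4 with default padding */
};
#pragma pack(pop)

/* True if an access of `size` bytes at `addr` touches two cache lines. */
static bool splits_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE);
    return (addr % CACHE_LINE) + size > CACHE_LINE;
}

/* True if `addr` is a multiple of the access size (natural alignment). */
static bool naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size > 0);
    return addr % size == 0;
}
```

With this layout each record is 5 bytes, so in an array that starts on a cache-line boundary, element 12 begins at byte 60 and its `counter` covers bytes 61 through 64. A locked operation on that field would cross the line boundary, which is the split-lock case.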


anematode 11 hours ago

Ah, that's a great hypothesis. I wonder, then, how it works with x86 emulation on ARM. IIRC, atomic ops on ARM fault if the address isn't naturally aligned... but I guess the runtime could intercept that and handle it slowly.
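
A rough sketch of what "handle it slowly" could look like, purely as an assumption and not how any real emulation layer is known to work: take the host atomic when the address is naturally aligned, otherwise serialize behind a global lock and do a plain read-modify-write. The names `emulated_fetch_add_u32` and `slow_path_lock` are hypothetical, the `__atomic_*` builtins are GCC/Clang-specific, and the fault interception itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical global spinlock guarding every misaligned operation. */
static bool slow_path_lock;

/* Hypothetical helper: fetch-and-add on a 32-bit value at any address.
   Returns the value that was stored before the addition. */
static uint32_t emulated_fetch_add_u32(void *addr, uint32_t delta)
{
    assert(addr != NULL);

    if ((uintptr_t)addr % sizeof(uint32_t) == 0) {
        /* Fast path: naturally aligned, so use the host atomic directly. */
        return __atomic_fetch_add((uint32_t *)addr, delta, __ATOMIC_SEQ_CST);
    }

    /* Slow path: spin until the lock is acquired. */
    while (__atomic_test_and_set(&slow_path_lock, __ATOMIC_ACQUIRE)) {
        /* busy-wait */
    }

    uint32_t old;
    memcpy(&old, addr, sizeof old);          /* unaligned-safe load */
    uint32_t updated = old + delta;
    memcpy(addr, &updated, sizeof updated);  /* unaligned-safe store */

    __atomic_clear(&slow_path_lock, __ATOMIC_RELEASE);
    return old;
}
```

The lock only orders slow-path operations against each other. An aligned fast-path access that overlaps the same bytes is not excluded, so a real runtime would need something stronger than this.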

omcnoe 9 hours ago
ARM Macs apparently have some kind of specific handling in place for this when a process is running in x86_64 compatibility mode, but it’s not publicly documented anywhere that I can see.
- my123 8 hours ago
  
  XNU has this oddity: https://github.com/apple-oss-distributions/xnu/blob/f6217f89...
  Redacted from open source XNU, but exists in the closed source version
  
BobbyTables2 11 hours ago
An emulated x86 atomic instruction wouldn’t need to use atomic instructions on ARM.
- dooglius 11 hours ago
  
  Why not?
  