Comment by repiret

25 days ago

> CPUs fetch data from memory in fixed-size blocks of so-many bytes, and performance degrades when data is misaligned.

A memory bus supports memory transactions of various sizes, with the largest size supported being a function of how many data lines there are. The following two statements are true of every memory bus with which I'm familiar, and I probably every bus in popular use: (1) only power-of-two sizes are supported; (2) only aligned transactions are supported.

Arm, x86, and RISC-V are relatively unique among the multitude of CPU architectures in that if they are asked to make an unaligned memory transaction, they will compose that transaction from multiple aligned transactions. Or maybe service it in cache and it never has to hit a memory bus.

Most CPU architectures, including PPC, MIPS, Sparc, and ColdFire/68k, will raise an exception when asked to perform a misaligned memory transaction.

The tradition of aligning data originated when in popular CPU architectures, if you couldn't assume that data was aligned, you would need to use many CPU instructions to simulate misalinged access in software. It continued in compilers for Arm and x86 because even though those CPUs could make multiple bus transactions in response to a single mis-aligned memory read, that takes time and so it was much slower.

I don't know for sure, but I would expect that on modern x86 and high performance Arm, the performance penalty is quite small, if there's any at all.

1 comment

repiret

Joker_vD 25 days ago

It's small, but not unnoticeable... depending on the exact size of the workload and the amount of computation per element. In fact, for huge arrays it may be beneficial to have structs packed if that leads to less memory traffic.

[0] https://jordivillar.com/blog/memory-alignment

[1] https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...

[2] https://lemire.me/blog/2025/07/14/dot-product-on-misaligned-...