
Comment by inkyoto

1 year ago

Frankly, there is no advantage to compressed instructions in a high-performance CPU core, as a misaligned instruction can span a memory page boundary, which will generate a memory fault, potentially a TLB flush, and, if the memory page is not resident in memory, an I/O operation. That is much worse than crossing a cache line, and it is a double whammy when both occur simultaneously.

One suggested solution has been filling in the gaps with NOPs, but then the compiler would have to track page alignment, which would not work anyway if a system supports pages of varying sizes (ordinary vs. huge pages).

The best solution is perhaps to ignore compressed instructions when targeting high-performance cores and confine their use to where they belong: power-efficient or low-performance microcontrollers.

Page crossing affects a minuscule fraction of cases: with 4096-byte pages and 100% non-compressed instructions (but still somehow misaligned 50% of the time), it affects only one in 2048 instructions.
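To spell out where that number comes from, here is a back-of-envelope sketch using the same assumptions (4096-byte pages, all instructions 4 bytes wide, and a 50% chance that the stream is misaligned at any given boundary):

```python
# 1024 four-byte instructions fit in a 4096-byte page; only the instruction in
# the last slot can straddle the page boundary, and it does so only when the
# stream is misaligned there, which the assumption above puts at 50%.
PAGE_SIZE = 4096
INSN_SIZE = 4

insns_per_page = PAGE_SIZE // INSN_SIZE            # 1024
expected_straddles_per_page = 0.5                  # last slot, half the time
rate = expected_straddles_per_page / insns_per_page
print(f"about 1 in {1 / rate:.0f} instructions crosses a page")  # 1 in 2048
```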

The possibility of I/O is in no way exclusive to compressed instructions. If the page-crossing instruction were padded to avoid the boundary, the second page would still need to be faulted in anyway. All that matters is the number of pages the code occupies, which is just code size.

The only case that actually has a chance of mattering is crossing cache lines.
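To put a rough number on how often that even happens, here is a small simulation. The parameters are my own assumptions rather than measurements: 64-byte cache lines and an even 50/50 mix of 2-byte and 4-byte instructions laid out back to back.

```python
import random

# Count how often an instruction straddles a 64-byte cache line in a random
# mixed stream of 2- and 4-byte instructions (the 50/50 mix is an assumption;
# real compressed code will have a different ratio).
random.seed(0)
LINE_SIZE = 64
N = 1_000_000

offset = 0
crossings = 0
for _ in range(N):
    size = random.choice((2, 4))
    if offset // LINE_SIZE != (offset + size - 1) // LINE_SIZE:
        crossings += 1          # this instruction spans two cache lines
    offset += size

print(f"about 1 in {N / crossings:.0f} instructions crosses a cache line")
```

With those assumptions it comes out to roughly one instruction in sixty, and only a 4-byte instruction starting two bytes before a line boundary can ever cross.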

And I would imagine high-performance cores would have some internal instruction buffer anyway, for doing cross-fetch-block instruction fusion and whatnot.

> One suggested solution has been filling in the gaps with NOPs, but then the compiler would have to track page alignment, which would not work anyway if a system supports pages of varying sizes (ordinary vs. huge pages).

If it's done in the linker, then tracking pages sounds pretty doable.

You don't need to care about multiple page sizes. If you pad at the minimum page size, or even at 1 KB boundaries, that's a minuscule number of NOPs.
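A toy model of that padding pass, just to gauge the size overhead (again my own assumptions: 2-byte compressed NOPs, padding at 1 KB boundaries, and an even 2/4-byte instruction mix):

```python
import random

# Whenever the next instruction would straddle a 1 KB boundary, emit one
# 2-byte NOP first so the instruction starts on the far side of the boundary,
# then report how much of the final image the padding takes up.
random.seed(0)
BOUNDARY = 1024
N = 1_000_000

offset = 0
pad_bytes = 0
for _ in range(N):
    size = random.choice((2, 4))
    if offset // BOUNDARY != (offset + size - 1) // BOUNDARY:
        offset += 2             # one compressed NOP lands us on the boundary
        pad_bytes += 2
    offset += size

print(f"padding overhead: {100 * pad_bytes / offset:.3f}% of code bytes")
```

With those numbers the padding comes out to well under a tenth of a percent of the code bytes, which is the point: even at 1 KB granularity the NOPs are noise.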