
Comment by shihab

12 hours ago

> For example, NEON ... can hold up to 32 128-bit vectors to perform your operations without having to touch the "slow" memory.

Something I recently learnt: the actual number of physical registers in modern x86 CPUs is significantly larger, even for 512-bit SIMD. Zen 5 CPUs actually have 384 vector registers, 384 * 512 bits = 24 KB!

This is true, but if you run out of the 32 register names you’ll still need to spill to memory. The large register file exists, among other things, so that multiple instructions can execute in parallel.

  • They’re used by the internal register renamer/allocator: if it sees you storing a result to memory and then reusing the same named register for a new result, it allocates a new physical register, so your instruction doesn’t stall waiting for the previous write to go through.
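
To make the renaming point concrete, here is a toy sketch in Python. It is not how any real core is built: the `rename` function, the `(dest, sources)` instruction tuples, and the 8-entry physical file are all made up for illustration, and it never frees a physical register (real hardware recycles them as instructions retire). It only shows that each write to a named register gets a fresh physical one.

```python
def rename(instructions, num_physical=8):
    """Toy register renamer (illustrative only, not a real CPU model).

    Each instruction is (dest, sources), where dest is an architectural
    register name or None (e.g. a store), and sources is a list of names.
    """
    free = list(range(num_physical))  # free list of physical registers
    mapping = {}                      # architectural name -> physical register
    renamed = []
    for dest, sources in instructions:
        phys_sources = []
        for name in sources:
            # A name read before any write is backed by a fresh register.
            if name not in mapping:
                mapping[name] = free.pop(0)
            phys_sources.append(mapping[name])
        phys_dest = None
        if dest is not None:
            # Every write gets a new physical register, so reusing a name
            # does not wait on earlier readers or writers of that name.
            phys_dest = free.pop(0)
            mapping[dest] = phys_dest
        renamed.append((phys_dest, phys_sources))
    return renamed


program = [
    ("v0", []),            # v0 = load a
    (None, ["v0"]),        # store v0
    ("v0", []),            # v0 = load b   (same name, new value)
    ("v1", ["v0", "v0"]),  # v1 = v0 + v0
]
renamed = rename(program)
print(renamed)
```

The store reads physical register 0 while the second load writes physical register 1, so the two uses of the name `v0` no longer depend on each other. The program still only ever names `v0` and `v1`, which is why running out of the 32 names forces a spill no matter how large the physical file is.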


In the register file or named registers?

And the critical matrix tiling size is often set by SRAM capacity, so the unified L3 cache.