Comment by zeusk
13 hours ago
They’re used by the internal register renamer/allocator so if it sees you’re storing the results to memory then reusing the named register for a new result - it will allocate a new physical register so your instruction doesn’t stall for the previous write to go through.
I do not understand what you want to say.
The register renamer allocates a new physical register when you attempt to write the same register as a previous instruction, as otherwise you would have to wait for that instruction to complete, and you would also have to wait for any instructions that would want to read the value from that register.
When you store a value into memory, the register renamer does nothing, because you do not attempt to modify any register.
The only optimization is that if a following instruction attempts to read the value stored in the memory, that instruction does not wait for the previous store to complete, in order to be able to load the stored value from the memory, but it gets the value directly from the store queue. But this has nothing to do with register renaming.
Thus if your algorithm has already used all the visible register numbers, and you will still need in the future all the values that occupy the registers, then you have to store one register into the memory, typically in the stack, and the register renamer cannot do anything to prevent this.
This is why Intel will increase the number of architectural general-purpose registers of x86-64 from 16 to 32, matching Arm Aarch64 and IBM POWER, with the APX ISA extension, which will be available in the Nova Lake desktop/laptop CPUs and in the Diamond Rapids server CPUs, which are expected by the end of this year.
Register renaming is a typical example of the general strategy that is used when shared resources prevent concurrency: the shared resources must be multiplied, so that each concurrent task uses its private resource.