Comment by adrian_b
3 hours ago
I do not understand what you want to say.
The register renamer allocates a new physical register when you attempt to write the same register as a previous instruction, as otherwise you would have to wait for that instruction to complete, and you would also have to wait for any instructions that would want to read the value from that register.
When you store a value into memory, the register renamer does nothing, because you do not attempt to modify any register.
The only optimization is that if a following instruction attempts to read the value stored in the memory, that instruction does not wait for the previous store to complete, in order to be able to load the stored value from the memory, but it gets the value directly from the store queue. But this has nothing to do with register renaming.
Thus if your algorithm has already used all the visible register numbers, and you will still need in the future all the values that occupy the registers, then you have to store one register into the memory, typically in the stack, and the register renamer cannot do anything to prevent this.
This is why Intel will increase the number of architectural general-purpose registers of x86-64 from 16 to 32, matching Arm Aarch64 and IBM POWER, with the APX ISA extension, which will be available in the Nova Lake desktop/laptop CPUs and in the Diamond Rapids server CPUs, which are expected by the end of this year.
Register renaming is a typical example of the general strategy that is used when shared resources prevent concurrency: the shared resources must be multiplied, so that each concurrent task uses its private resource.
> When you store a value into memory, the register renamer does nothing, because you do not attempt to modify any register.
you are of course correct about everything. But the extreme pendant in me can't avoid pointing out that there are in fact a few mainstream CPUs[1] that can rename memory to physical registers, at least in some cases. This is done explicitly to mitigate the cost of spilling. edit: this is different from the store-forwarding optimization you mentioned.
[1] Ryzen for example: https://www.agner.org/forum/viewtopic.php?t=41