Comment by brucedawson
5 years ago
Making the instruction not speculatable would indeed be a hardware change, which there was not time for. So that was not an option.
And let's say they did that. All other loads and prefetches are initiated in the early stages of the pipeline, while execution is still speculative. I think they would need new logic at a later stage of the pipeline, just for this instruction, in order to initiate a "late prefetch". That is potentially a lot of extra transistors and wires. You would also end up with a prefetch instruction that doesn't start prefetching until potentially dozens (or more) of cycles later. At that point, using xdcbt instead of dcbt may just make your code run slower.
What about, then, an xdcbt that is seen in a context where it is known early on that it will definitely be executed, a context where it is not speculative? Well, there really is no such context. Practically speaking, there are so many branches that when an instruction is decoded there is almost always a conditional branch ahead of it in the pipeline. And, architecturally speaking, any earlier instruction could trigger an exception, which would stop execution flow from reaching the xdcbt. Pipelines are really, really deep.
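A toy model may make the tradeoff concrete. This is only a sketch: the stage numbers, the `Outcome` type, and the `run_prefetch` function are all invented for illustration and do not describe any real CPU. It walks one prefetch instruction, sitting behind a predicted branch, through a made-up pipeline and records whether the prefetch was sent to memory and whether the instruction ever committed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stage numbers for a toy pipeline (assumptions, not real hardware).
PREFETCH_ISSUE_STAGE = 3    # where loads/prefetches are normally initiated
BRANCH_RESOLVE_STAGE = 10   # where the earlier branch is finally resolved
COMMIT_STAGE = 20           # where an instruction becomes "real"


@dataclass
class Outcome:
    prefetch_issued: bool        # did the prefetch reach the memory system?
    committed: bool              # did the instruction survive to commit?
    issued_at: Optional[int]     # stage at which the prefetch went out, if any


def run_prefetch(mispredicted: bool,
                 issue_stage: int = PREFETCH_ISSUE_STAGE) -> Outcome:
    """Walk one prefetch behind a predicted branch through the toy pipeline."""
    prefetch_issued = False
    issued_at = None
    for stage in range(1, COMMIT_STAGE + 1):
        if stage == BRANCH_RESOLVE_STAGE and mispredicted:
            # The branch was mispredicted, so the instruction is squashed.
            # Anything it already sent to memory cannot be taken back.
            return Outcome(prefetch_issued, False, issued_at)
        if stage == issue_stage:
            prefetch_issued = True
            issued_at = stage
    return Outcome(prefetch_issued, True, issued_at)


if __name__ == "__main__":
    # Early issue, wrong path: the side effect happens anyway.
    print(run_prefetch(mispredicted=True))
    # Issue deferred to commit, wrong path: safe, nothing is sent.
    print(run_prefetch(mispredicted=True, issue_stage=COMMIT_STAGE))
    # Issue deferred to commit, right path: safe, but the prefetch starts late.
    print(run_prefetch(mispredicted=False, issue_stage=COMMIT_STAGE))
```

In this model an early-issued prefetch on a mispredicted path still goes out even though the instruction never commits, while deferring the issue to the commit stage avoids that at the cost of starting the prefetch many stages later.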
TL;DR - On heavily pipelined CPUs (even in-order ones) you don't know for sure that an instruction is "real" until it is time to commit its results, and that is way too late for a "prefetch".