← Back to context

Comment by hmpc

4 hours ago

This is explicitly called out in the post as well as the Intel instruction manual. Every codebase I've ever seen that reads the TSC either issues an LFENCE or uses RDTSCP.

In my benchmarks RDTSCP has a slight advantage, despite the slower latency on paper, because it doesn't fully serialise the instruction stream (later instructions can start executing, unlike with LFENCE). Whether the ECX clobber will outweigh that will depend on the situation.