Comment by mananaysiempre

12 hours ago

> Many common [x86] instructions translate to single "RISCy" instruction for the internal microarchitecture

And then there are read-modify-write instructions, which on modern CPUs need two address-generation μops in addition to the load one, the store one, and the ALU one. So the underlying load-store architecture is very visible.

There’s also the part where we’ve trained ourselves out of using the more CISCy parts of x86 like ENTER, BOUND, or even LOOP, because they’ve been slow for ages, and thus they stay slow.

2 comments

mananaysiempre

p_l 11 hours ago

Even many of the more complex instructions often can translate into surprisingly short sequences - all sorts of loop structures have now various kinds of optimizations including instruction fusion that probably would not be necessary if we didn't stop using higher level LOOP constructs ;-)

But for example REP MOVS now is fused into equivalent of using SSE load-stores (16 bytes) or even AVX-512 load stores (64 bytes).

And of course equivalent of LEA by using ModRM/SIB prefixes is pretty much free with it being AFAIK handled as pipeline step

monocasa 7 hours ago

There's levels of microcode.

It's not too uncommon for each pipeline stage or so to have their own uop formats as each stage computes what it was designed to and culls what later stages don't need.

Because of this it's not that weird to see both a single rmw uops at, says the initial decode and microcode layer, that then gets cracked into the different uops for the different functional units later on.