← Back to context

Comment by Joker_vD

1 year ago

x86 instruction lengths range from 1 to 15.

> a line of cache and 4 byte instructions you could start decoding 32 instructions in parallel

In practice, ARM processors decode up to 4 instructions in parallel; so do Intel and AMD.

Apple's m1 chips are 8 wide. and AMD and Intel's newest chips are also doing more fancy things than 4 wide

  • Any reading resources? I’d love to learn better the techniques they’re using to get better parsllelism. The most obvious solution I can imagine is that they’d just try to brute force starting to execute every possible boundary and rely on it either decoding an invalid instruction or late latching the result until it got confirmed that it was a valid instruction boundary. Is that generally the technique or are they doing more than even that? The challenge with this technique of course is that you risk wasting energy & execution units on phantom stuff vs an architecture that didn’t have as much phantomness potential in the first place.