Comment by Joker_vD
1 year ago
x86 instruction lengths range from 1 to 15 bytes.
> a line of cache and 4 byte instructions you could start decoding 32 instructions in parallel
In practice, ARM processors decode up to 4 instructions in parallel; so do Intel and AMD.
Apple's M1 chips are 8 wide, and AMD's and Intel's newest chips are also doing fancier things than 4-wide decode.
Any reading resources? I'd love to learn more about the techniques they're using to get better parallelism. The most obvious solution I can imagine is that they just brute-force it: start executing at every possible boundary and rely on it either decoding an invalid instruction or late-latching the result until it's confirmed to be a valid instruction boundary. Is that generally the technique, or are they doing more than that? The challenge with this technique, of course, is that you risk wasting energy and execution units on phantom instructions, compared with an architecture that doesn't have as much potential for phantom work in the first place.
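To make the brute-force idea concrete, here is a minimal sketch in Python. It is a toy model, not a description of how any real x86 decoder works: `toy_length` is a made-up rule that derives a 1 to 15 byte length from the first byte alone, and the function names and the sample `window` are hypothetical. Phase 1 guesses a length at every byte offset independently (the part hardware could do in parallel); phase 2 walks from a known-good start and keeps only the guesses that landed on real boundaries.

```python
def toy_length(first_byte: int) -> int:
    """Hypothetical length rule: 1..15 bytes, chosen by the first byte only.
    Real x86 length decoding depends on prefixes, opcode, ModRM, and more."""
    return (first_byte % 15) + 1


def speculative_lengths(window: bytes) -> list[int]:
    # Phase 1: every offset is treated as a possible instruction start.
    # Each entry is independent of the others, so this step is parallelizable.
    return [toy_length(b) for b in window]


def confirm_boundaries(lengths: list[int], start: int = 0) -> list[int]:
    # Phase 2: follow the chain from a known-good start offset.
    # Only offsets reached by this walk were real instruction boundaries.
    boundaries = []
    offset = start
    while offset < len(lengths):
        boundaries.append(offset)
        offset += lengths[offset]
    return boundaries


# Hypothetical 12-byte fetch window.
window = bytes([2, 0xAA, 0xBB, 0, 1, 0xCC, 4, 1, 2, 3, 4, 0])
lengths = speculative_lengths(window)
boundaries = confirm_boundaries(lengths)
wasted = len(window) - len(boundaries)  # speculative decodes that were thrown away

print("boundaries:", boundaries)
print("wasted speculative decodes:", wasted)
```

In this toy window most of the speculative work is discarded, which is the energy concern raised above. Note that phase 2 is still a serial chain in this sketch; how real designs shorten or avoid that chain is exactly the part that isn't covered here.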
https://chipsandcheese.com/2024/08/14/amds-ryzen-9950x-zen-5... is a pretty good overview of the microarchitecture. I don't think they say how they get there, because trade secrets.