Comment by buildbot
5 months ago
The PA-RISC processors are really cool - the PA-8900 has a 64MB external shared L2, and 768KB each of I & D cache per core. That's more I & D cache than any modern processor I am aware of, and more last level cache than basically anything but an AMD X3D or the last 5-10 years of X86 server chips. It's of course much slower than any modern cache.
The older PA-8500 has an article with a die shot at https://ardent-tool.com/CPU/docs/MPR/19971117/111505.pdf and the chip was about 75% cache even back then. ("Fast SRAM With Integrated CPU" is extremely accurate, lol.)
> The PA-RISC processors are really cool
I'm glad someone else thinks so!
There's a very interesting vid about the design of the ISA here https://www.youtube.com/watch?v=C53tGHzp1PI and I think it's pretty clear they learned from early MIPS/SPARC. It's a shame it got abandoned in the Itanium push.
The Alpha was also a performance king in that era, but tbh I don't have the same nostalgia for it, although clearly it was executed very well.
(author) You can also count me in that list - working on a PA-RISC system was my first job out of college. I found the ISA very clean and they were strong performers. How HP got the wrong idea about VLIW, I'll never understand.
That's cool that you worked on a PA-RISC system professionally! The ISA seems clean, and the later superscalar designs were very advanced for their time.
I think an updated PA-RISC design would be awesome for modern workloads: huge caches with prefetch, a good branch predictor, 8-10-wide dispatch, and some sort of vector extension. A mix of AMD Zen+X3D and Apple ARM. To be fair, the ISA doesn't really matter much these days; any core with similar features would probably perform well.
There's always someone who thinks VLIW or something similar is a good idea. So far it has proven tricky for general-purpose CPUs, and even for some parallel designs.
* 100% personal opinion, I've never actually worked on HW design directly *
> 768KB I & D cache per core
PA-RISC has almost always had large L1 caches (which used to be off-chip) and usually no L2 cache.
I know this bit of trivia, but I don't know the technical reasons/trade-offs for it.