Comment by aseipp
5 days ago
Knights Landing is a major outlier; the cores there were extremely small and had very few resources dedicated to them (e.g. 2-wide decode) relative to the vector units, so of course that will dominate. You aren't going to see 40% of the die dedicated to vector register files on anything looking like a modern, wide core. The entire vector unit (with SRAM) will be in the ballpark of the cumulative L1/L2 SRAM; a 512-bit register is only a single 64-byte cache line, after all.
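To make the cache-line comparison concrete, here's a rough back-of-envelope sketch. The physical-register count of 192 is an assumption in the ballpark for a modern out-of-order core, not a measured figure:

```python
# Back-of-envelope sizing of the AVX-512 register file (rough public
# figures, not die measurements).
ZMM_BYTES = 512 // 8          # one 512-bit register = 64 bytes = one cache line
ARCH_REGS = 32                # zmm0..zmm31 architectural registers
PHYS_REGS = 192               # assumed physical vector registers for renaming

arch_state = ARCH_REGS * ZMM_BYTES    # 2 KiB of architectural state
phys_file = PHYS_REGS * ZMM_BYTES     # 12 KiB physical register file

l1d = 32 * 1024               # typical per-core L1D
l2 = 1024 * 1024              # typical per-core L2

print(f"architectural vector state: {arch_state} B")        # 2048 B
print(f"physical register file:     {phys_file // 1024} KiB")  # 12 KiB
print(f"L1D + L2 SRAM:              {(l1d + l2) // 1024} KiB") # 1056 KiB
```

Even with generous assumptions the register file SRAM is one to two orders of magnitude smaller than the per-core caches.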
Also, the Knights Landing/Knights Mill implementation is completely different from modern AVX-512; it's Skylake-SP and Ice Lake on Intel, and Zen 4 on AMD, that introduced the modern implementations.
True! But even if only 20% of the die area goes to AVX-512 in larger cores, that makes a big difference for high-core-count CPUs.
That would be like having a 50-core CPU instead of a 64-core CPU in the same space. For these cloud-native CPU designs, everything that takes significant die area translates to reduced core count.
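The core-count trade-off above is just a linear area scaling. A quick sketch, using the 20% figure from the comment (an assumption for illustration, not a measured die fraction):

```python
# Hypothetical core-count trade-off: if some fraction of each core's area
# were spent on AVX-512, how many cores fit in the same die budget?
baseline_cores = 64           # cores without the vector units
avx512_area_fraction = 0.20   # assumed per-core AVX-512 share (from the comment)

# Same total area, each core grown by the AVX-512 fraction:
cores_with_avx512 = baseline_cores * (1 - avx512_area_fraction)
print(cores_with_avx512)      # 51.2, i.e. roughly the "50 vs 64" comparison
```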
You're still grossly overestimating the area required for AVX-512. For example, on AMD's Zen 4, the entire FPU has been estimated at 25% of the core+L2 area, and that's including AVX-512. If you look at the extra area required for AVX-512 vs 256-bit AVX2, as a fraction of total die area including L3 cache and the interconnect between cores, it's definitely not going to be a double-digit percentage.
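Taking the 25% FPU estimate and diluting it step by step makes the point. The AVX-512 increment and the core+L2 share of the die below are illustrative assumptions, not measured figures:

```python
# Illustrative dilution math; only the 25% FPU estimate comes from the
# thread, the other two fractions are assumptions for the sketch.
fpu_of_core_l2 = 0.25   # entire FPU as a fraction of core+L2 (estimate above)
avx512_delta = 0.40     # assumed: extra FPU area for 512-bit vs 256-bit AVX2
core_l2_of_die = 0.60   # assumed: core+L2 share of the die (rest is L3, IO, fabric)

avx512_of_die = fpu_of_core_l2 * avx512_delta * core_l2_of_die
print(f"{avx512_of_die:.1%}")  # 6.0% -- single digits, not 20%
```

Even with a deliberately pessimistic 40% increment for the 512-bit datapaths, the whole-die cost lands in the mid single digits.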