Comment by aseipp
5 days ago
Knights Landing is a major outlier; the cores there were extremely small and had very few resources dedicated to them (e.g. 2-wide decode) relative to the vector units, so of course that will dominate. You aren't going to see 40% of the die dedicated to vector register files on anything looking like a modern, wide core. The entire vector unit (with SRAM) will be in the ballpark of the cumulative L1/L2 SRAM; a 512-bit register is only a single 64-byte cache line, after all.
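To make the cache-line comparison concrete, here's a rough back-of-envelope sketch. The physical-register count of 192 is an assumption in the ballpark for a modern out-of-order core, not a measured figure:

```python
# Back-of-envelope sizing of the AVX-512 register file (rough public
# figures, not die measurements).
ZMM_BYTES = 512 // 8          # one 512-bit register = 64 bytes = one cache line
ARCH_REGS = 32                # zmm0..zmm31 architectural registers
PHYS_REGS = 192               # assumed physical vector registers for renaming

arch_state = ARCH_REGS * ZMM_BYTES    # 2 KiB of architectural state
phys_file = PHYS_REGS * ZMM_BYTES     # 12 KiB physical register file

l1d = 32 * 1024               # typical per-core L1D
l2 = 1024 * 1024              # typical per-core L2

print(f"architectural vector state: {arch_state} B")        # 2048 B
print(f"physical register file:     {phys_file // 1024} KiB")  # 12 KiB
print(f"L1D + L2 SRAM:              {(l1d + l2) // 1024} KiB") # 1056 KiB
```

Even with generous assumptions the register file SRAM is one to two orders of magnitude smaller than the per-core caches.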
Also, the Knights Landing/Knights Mill implementation is completely different from modern AVX-512; it's Skylake-SP and Ice Lake on Intel, and Zen 4 on AMD, that introduced the modern implementations.
True! But even if only 20% of the die area goes to AVX-512 in larger cores, that makes a big difference for high-core-count CPUs.
That would be like having a 50-core CPU instead of a 64-core CPU in the same space. For these cloud-native CPU designs, everything that takes significant die area translates to reduced core count.
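The core-count trade-off above is just a linear area scaling. A quick sketch, using the 20% figure from the comment (an assumption for illustration, not a measured die fraction):

```python
# Hypothetical core-count trade-off: if some fraction of each core's area
# were spent on AVX-512, how many cores fit in the same die budget?
baseline_cores = 64           # cores without the vector units
avx512_area_fraction = 0.20   # assumed per-core AVX-512 share (from the comment)

# Same total area, each core grown by the AVX-512 fraction:
cores_with_avx512 = baseline_cores * (1 - avx512_area_fraction)
print(cores_with_avx512)      # 51.2, i.e. roughly the "50 vs 64" comparison
```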
You're still grossly overestimating the area required for AVX-512. For example, on AMD's Zen 4, the entire FPU has been estimated at 25% of the core+L2 area, and that's including AVX-512. If you look at the extra area required for AVX-512 vs 256-bit AVX2, as a fraction of total die area including L3 cache and the interconnect between cores, it's definitely not going to be a double-digit percentage.
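Taking the 25% FPU estimate and diluting it step by step makes the point. The AVX-512 increment and the core+L2 share of the die below are illustrative assumptions, not measured figures:

```python
# Illustrative dilution math; only the 25% FPU estimate comes from the
# thread, the other two fractions are assumptions for the sketch.
fpu_of_core_l2 = 0.25   # entire FPU as a fraction of core+L2 (estimate above)
avx512_delta = 0.40     # assumed: extra FPU area for 512-bit vs 256-bit AVX2
core_l2_of_die = 0.60   # assumed: core+L2 share of the die (rest is L3, IO, fabric)

avx512_of_die = fpu_of_core_l2 * avx512_delta * core_l2_of_die
print(f"{avx512_of_die:.1%}")  # 6.0% -- single digits, not 20%
```

Even with a deliberately pessimistic 40% increment for the 512-bit datapaths, the whole-die cost lands in the mid single digits.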