Comment by mlyle

3 months ago

The software side is limited, somewhat intrinsically: there tend to be a lot of things we want to do in order, so Amdahl's law wins.
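To put rough numbers on the Amdahl's law point, here is a minimal sketch. It assumes the idealized textbook model, where a fixed fraction of the work parallelizes perfectly and the rest is strictly serial. The function name and the 0.9 fraction are illustrative assumptions, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0.0 to 1.0).
    cores: number of cores the parallel share is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial work takes the same time no matter how many cores exist;
    # only the parallel share shrinks as cores are added.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% parallel, 10% must happen in order.
    for cores in (2, 8, 64, 1024):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With a 10% serial share, the speedup can never reach 10x however many cores you add, which is the sense in which the in-order part wins.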

And even when you aren't intrinsically limited by that, optimal placement doesn't reduce contention that much (assuming you're not ping-ponging a single cache line between cores on every operation or something dumb like that).

But the hardware side is limited too: we're not getting transistors that quickly anymore, and we don't want cores much smaller than an Intel E-core. Even if we stack in 3D, all that net wafer area is not cheap and isn't getting cheaper quickly.