Comment by tliltocatl

1 year ago

Larrabee was something like that; it didn't take off.

IMHO, the real issue is cache coherence. GPUs are spared from doing a lot of extra work by relaxing coherence guarantees quite a bit.

Regarding the vendor situation - that's basically how most of computing hardware is, save for the PC platform. And this exception is due to Microsoft successfully commoditizing their complements (which caused quite some woe on the software side back then).

Is cache coherence a real issue, absent cache contention? AIUI, cache coherence protocols are sophisticated enough that they should readily adapt to workloads where the same physical memory locations are mostly not accessed concurrently except in pure "read only" mode. So even with a single global address space, it should be possible to make this work well enough if the programs are written as if they were running on separate memories.
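To make the intuition concrete, here is a toy model (an illustration only, not any real protocol) of an invalidation-based scheme in the MESI family: reads put a line in a shared state at no ongoing cost, and only a write forces messages to the other sharers. The `Line` class and its method names are made up for this sketch.

```python
# Toy invalidation-based coherence model (MESI-like), illustration only.
# Real protocols track Exclusive/Modified states, directories, queues, etc.

class Line:
    def __init__(self):
        self.sharers = set()    # cache IDs currently holding the line
        self.invalidations = 0  # coherence messages caused by writes

    def read(self, cache_id):
        # Shared state: a new reader invalidates nobody
        self.sharers.add(cache_id)

    def write(self, cache_id):
        # A writer must invalidate every other sharer before modifying
        self.invalidations += len(self.sharers - {cache_id})
        self.sharers = {cache_id}

line = Line()
for c in range(8):
    line.read(c)               # 8 cores read the same line concurrently
print(line.invalidations)      # -> 0: pure read-only sharing is "free"

line.write(0)
print(line.invalidations)      # -> 7: the first write pays for all sharers
```

This matches the question's premise: if programs are written as if they ran on separate memories (little write sharing), steady-state coherence traffic on those lines is near zero.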

  • It is because cache coherence requires extra communication to make sure that the caches stay coherent. There are cute strategies for reducing the traffic, but ultimately you need to broadcast reservations out to all of the other cache-coherent nodes, so there's an N^2 scaling at play.
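The N^2 claim can be sketched with back-of-envelope arithmetic (assumed snooping-style broadcast, with a made-up `writes_per_core` parameter): if each of N cores broadcasts its write misses/upgrades to the other N-1 caches, total coherence messages grow quadratically in N.

```python
# Back-of-envelope scaling for broadcast (snooping-style) coherence:
# every write miss or upgrade from one core is seen by all N-1 others.

def broadcast_messages(n_cores, writes_per_core):
    # total messages = senders * requests each * receivers per request
    return n_cores * writes_per_core * (n_cores - 1)

for n in (2, 8, 64, 512):
    print(n, broadcast_messages(n, writes_per_core=1000))
# Doubling the core count roughly quadruples total coherence traffic.
```

Directory-based protocols avoid the blind broadcast by tracking sharers per line, but the directory state and point-to-point invalidations are exactly the "extra work" GPUs mostly skip.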

I miss, not exactly Larrabee, but what it could have become. I want just an insane number of very fast, very small cores with their own local memory.

In this field, nothing usually takes off on the first attempt, so that's just a reason to ask "what's different this time?" on each following attempt.