Comment by dekhn

1 year ago

There are many intertwined issues here. One of the reasons we can't have a good parallel computer is that you need to get a large number of people to adopt your device for development purposes, and they in turn need a large community of people who can run their code. Great projects die all the time because a slightly worse but more ubiquitous technology prevents new approaches from flowering. There are economies of scale that feed back into ever-improving iterations of existing systems.

Simply porting existing successful codes from CPU to GPU can be a major undertaking, and if there aren't any experts who can write something that drives immediate sales, a project can die on the vine.
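To give a feel for the gap: below is a rough sketch porting a one-line CPU loop (saxpy) to CUDA. The kernel, sizes, and launch geometry are purely illustrative, not any particular production code; even this toy version sprouts allocation, host/device copies, launch tuning, and synchronization that the CPU loop never needed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CPU version: one line of logic.
void saxpy_cpu(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// GPU version: the same logic, plus all the machinery in main() below.
__global__ void saxpy_gpu(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device allocations and staging copies: no CPU equivalent.
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    int block = 256;                      // launch geometry is a tuning
    int grid = (n + block - 1) / block;   // decision the CPU never asked for
    saxpy_gpu<<<grid, block>>>(n, 2.0f, dx, dy);
    cudaDeviceSynchronize();

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);         // expect 4.0

    cudaFree(dx); cudaFree(dy);
    delete[] hx; delete[] hy;
    return 0;
}
```

And that's before restructuring real data layouts and algorithms to expose enough parallelism, which is where the actual expert effort goes.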

See, for example, the Cray MTA: https://en.wikipedia.org/wiki/Cray_MTA. When I was first asked to try this machine, it was pitched as "run a million threads; the system will context-switch between threads when they block on memory and run them when the memory is ready". It never really made it on its own as a supercomputer, but lots of the ideas made it into GPUs.
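For a sense of how that pitch survives in today's GPUs, here's a sketch (the kernel and names are mine, purely illustrative, not anything from the MTA): launch vastly more threads than there are physical cores, and the hardware scheduler swaps in ready warps whenever others stall on a memory load.

```cuda
#include <cuda_runtime.h>

__global__ void gather(const float* src, const int* idx, float* dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // This dependent load is high-latency; while this warp waits on
        // memory, the SM runs other resident warps. Latency is hidden by
        // thread oversubscription rather than by big caches -- the MTA idea.
        dst[i] = src[idx[i]];
    }
}

int main() {
    const int n = 1 << 24;   // ~16M threads on hardware with ~thousands of cores
    float *src, *dst; int *idx;
    cudaMallocManaged(&src, n * sizeof(float));
    cudaMallocManaged(&dst, n * sizeof(float));
    cudaMallocManaged(&idx, n * sizeof(int));
    for (int i = 0; i < n; ++i) { src[i] = i; idx[i] = (i * 2654435761u) % n; }

    gather<<<(n + 255) / 256, 256>>>(src, idx, dst, n);
    cudaDeviceSynchronize();

    cudaFree(src); cudaFree(dst); cudaFree(idx);
    return 0;
}
```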

AMD and others have explored the idea of moving the GPU closer to the CPU by placing it directly onto the same memory crossbar. Instead of the GPU connecting to the PCI Express controller, it gets dropped into a socket just like a CPU.
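The programming-model payoff of that arrangement is that staging copies disappear: CPU and GPU dereference the same pointer. CUDA's managed memory only approximates this on discrete cards, but as a sketch of the idea (allocation size and kernel are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1024;
    int* data;
    cudaMallocManaged(&data, n * sizeof(int));  // one allocation, one address space
    for (int i = 0; i < n; ++i) data[i] = i;    // CPU writes it directly

    increment<<<(n + 255) / 256, 256>>>(data, n);  // GPU updates it in place
    cudaDeviceSynchronize();

    printf("data[1] = %d\n", data[1]);          // CPU reads the result: 2
    cudaFree(data);
    return 0;
}
```

On a true shared-memory part this is the native model rather than an emulation, which is a big part of the appeal of putting the GPU in a socket.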

I've found the best strategy is to target my development at what high-end consumers will be buying in 2 years. This is similar to many games, which launch with terrible performance on the fastest commercially available card, then run great 2 years later when the next gen of cards arrives ("Can it run Crysis?").