Comment by Smaug123
6 hours ago
By the way, have you seen Cerebras? It doesn't go as far as what you described: it has loads of cores and RAM, but you still load the weights onto it as software, and for large models they need to be streamed into the chip. It is a whole wafer, though.
Cerebras is essentially a whole lot of on-chip SRAM, basically a ton more L1/L2 cache, which is what increases throughput.
They're pretty supply-constrained right now, though, and their production costs seem prohibitive.
The interesting players at the moment are from Toronto: Taalas (print the model onto the silicon) and Tenstorrent (hardware based on dataflow programming).
There is a huge downside to weights being modifiable: you need full multipliers (not simply adders), plus SRAM to store those weights.
I suspect for equal performance, that's probably a 5x increase in silicon area (and therefore cost).
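To make the multipliers-vs-adders point concrete, here's a rough sketch of why a fixed weight is cheaper. This is my own illustration, not a description of how any vendor actually lays out silicon, and the function names are made up. If the weight is a known constant, multiplying by it reduces to one shifted copy of the input per set bit in the weight, summed with adders. A modifiable weight needs a general multiplier because the set bits aren't known in advance. The sketch assumes non-negative integer weights.

```python
def shift_add_terms(weight: int) -> list[int]:
    """Return the positions of the set bits in a non-negative constant weight."""
    if weight < 0:
        raise ValueError("this sketch only handles non-negative weights")
    return [bit for bit in range(weight.bit_length()) if (weight >> bit) & 1]


def multiply_by_fixed_weight(x: int, weight: int) -> int:
    """Compute x * weight using only shifts and additions."""
    total = 0
    for bit in shift_add_terms(weight):
        # Shifting by a constant amount is just wiring in hardware,
        # so the only real cost here is the additions.
        total += x << bit
    return total


# weight = 10 is 0b1010, so x * 10 becomes (x << 1) + (x << 3): one adder.
print(multiply_by_fixed_weight(7, 10))
```

The number of set bits in the weight bounds how many adders you need, which is a loose intuition for where the area saving would come from. It says nothing about whether 5x is the right figure.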