Comment by touisteur

1 year ago

Seems to me there's a trend toward explicitly distributed architectures: a network of small-SRAM cores, each with some SIMD, explicit high-bandwidth message passing between them, and maybe a few specialized blocks (tensor cores, FFT units...). Looking at Tenstorrent, Cerebras, even Kalray, accelerators outside the CUDA/GPU world seem to be converging a bit. We're going to need a whole lot of tooling, hopefully relatively 'meta'.
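
To make the programming model concrete, here's a toy sketch (hypothetical, not any vendor's actual SDK) of what "small private memory per core, explicit message passing, no shared address space" looks like: each core only touches its own scratchpad and communicates by send/receive.

```python
from queue import Queue
from threading import Thread

# Toy model of the architecture pattern described above: each "core" has
# only its own local scratchpad and talks to neighbors solely via explicit
# messages -- there is no shared address space. All names here are made up;
# real toolchains (Tenstorrent's, Cerebras's, Kalray's) differ substantially.

class Core:
    def __init__(self, kernel, inbox, outbox):
        self.local_sram = []      # small private scratchpad
        self.kernel = kernel      # per-element compute kernel ("SIMD" stand-in)
        self.inbox = inbox        # receive channel
        self.outbox = outbox      # send channel

    def run(self):
        while True:
            msg = self.inbox.get()            # explicit receive
            if msg is None:                   # end-of-stream sentinel
                self.outbox.put(None)         # forward shutdown downstream
                return
            self.local_sram = [self.kernel(x) for x in msg]  # compute locally
            self.outbox.put(self.local_sram)  # explicit send to next core

# Two-stage pipeline across two "cores": scale, then offset.
q0, q1, q2 = Queue(), Queue(), Queue()
cores = [Core(lambda x: x * 2, q0, q1), Core(lambda x: x + 1, q1, q2)]
threads = [Thread(target=c.run) for c in cores]
for t in threads:
    t.start()

q0.put([1, 2, 3])   # inject a tile of data
q0.put(None)        # signal end of stream
result = q2.get()   # -> [3, 5, 7]
for t in threads:
    t.join()
```

The point of the sketch is only that dataflow and placement become the programmer's problem, which is exactly why cross-vendor 'meta' tooling would matter.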