Comment by imtringued

1 year ago

The problem isn't address space or program counters. It's that each processor needs its own instruction memory in SRAM, or else an extremely efficient multi-ported memory backing a shared instruction cache.

GPUs get around this limitation by executing identical instructions over multiple threads.
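
To make that concrete, here is a toy model of the idea, not how any real GPU is built. The names `run_simt` and `PROGRAM` and the two-opcode instruction set are invented for illustration. The point is only that the instruction fetch count depends on the program length, not on the number of lanes.

```python
# Toy SIMT model: one instruction stream, many lanes of data.
# run_simt, PROGRAM, and the "add"/"mul" opcodes are hypothetical.

def run_simt(program, lanes):
    """Apply each instruction to every lane, counting one fetch per instruction."""
    fetches = 0
    for op, operand in program:
        fetches += 1  # a single fetch/decode is shared by all lanes
        if op == "add":
            lanes = [x + operand for x in lanes]
        elif op == "mul":
            lanes = [x * operand for x in lanes]
        else:
            raise ValueError(f"unknown op: {op}")
    return lanes, fetches


PROGRAM = [("add", 1), ("mul", 2)]
result, fetches = run_simt(PROGRAM, [0, 1, 2, 3])
print(result, fetches)
```

Going from 4 lanes to 4000 leaves `fetches` unchanged, which is why the instruction memory cost stays manageable. If every lane ran its own program, the fetch count would scale with the lane count instead.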

Instructions are the problem. You need an architecture that just operates on data flows, all in parallel and all at once, like an FPGA but without all the fiddly special-sauce parts.
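
A rough sketch of what I mean by operating on data flows, assuming made-up stages `scale` and `offset`: the computation is a fixed wiring of stages, and there is no instruction stream to fetch at all. Python generators only model the wiring; they run one element at a time, whereas in hardware every stage would be busy on every cycle.

```python
# Toy dataflow model: the "program" is the wiring between stages.
# scale, offset, and pipeline are hypothetical names for illustration.

def scale(stream, k):
    """Stage that multiplies every incoming value by k."""
    for x in stream:
        yield x * k


def offset(stream, b):
    """Stage that adds b to every incoming value."""
    for x in stream:
        yield x + b


def pipeline(stream):
    """Fixed wiring: input -> scale by 3 -> offset by 1 -> output."""
    return offset(scale(stream, 3), 1)


out = list(pipeline(range(4)))
print(out)
```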