Comment by mikewarot

1 year ago

Most practical parallel computing hardware has queues to handle the mismatch in compute speed when various CPUs run different algorithms on parts of the data.

Eliminating CPU-bound compute and running everything truly in parallel eliminates the need for the queues and all the related hardware/software complexity.

Imagine a sea of LUTs (look up tables), that are all clocked and only connected locally to their neighbors. The programming for this, even as a virtual machine, allows for exploration of a virtually infinite design space of hardware with various tradeoffs in speed, size, cost, reliability, security, etc. The same graph could be refactored to run on anything in that design space.
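A minimal sketch of what such a virtual machine might look like, under assumed semantics: a 2D grid of 1-bit cells, each updated on every global clock tick by a 16-entry look-up table indexed by its four von Neumann neighbors. All names here (`SeaOfLUTs`, `step`, the XOR-like example rule) are illustrative, not from the comment.

```python
class SeaOfLUTs:
    """Assumed model: a toroidal grid of 1-bit cells, each driven by one LUT."""

    def __init__(self, width, height, lut):
        self.w, self.h = width, height
        self.lut = lut  # 16 output bits, one per 4-neighbor input pattern
        self.state = [[0] * width for _ in range(height)]

    def step(self):
        """One global clock tick: every cell reads its neighbors' *old* values."""
        nxt = [[0] * self.w for _ in range(self.h)]
        for y in range(self.h):
            for x in range(self.w):
                n = self.state[(y - 1) % self.h][x]
                e = self.state[y][(x + 1) % self.w]
                s = self.state[(y + 1) % self.h][x]
                w = self.state[y][(x - 1) % self.w]
                idx = (n << 3) | (e << 2) | (s << 1) | w
                nxt[y][x] = self.lut[idx]
        self.state = nxt

# Example rule: output 1 iff exactly one neighbor is 1, so a single
# seed bit spreads outward one cell per clock tick.
xor_lut = [1 if bin(i).count("1") == 1 else 0 for i in range(16)]
sea = SeaOfLUTs(8, 8, xor_lut)
sea.state[4][4] = 1
sea.step()
```

Because every cell is updated from the previous state, the whole grid advances in lockstep; retargeting the same cell graph to a wider or narrower grid is what gives the design-space flexibility described above.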

> Most practical parallel computing hardware has queues to handle the mismatch in compute speed when various CPUs run different algorithms on parts of the data.

> Eliminating CPU-bound compute and running everything truly in parallel eliminates the need for the queues and all the related hardware/software complexity.

Modern parallel scheduling systems still have "queues" to manage these concerns; they're just handled in software, with patterns like "work stealing" describing what happens when unexpected mismatches in execution time arise. Even your "sea of LUTs (look up tables), that are all clocked and only connected locally to their neighbors" has queues, only the queue is called a "pipeline", and a mismatch in execution speed shows up as "pipeline bubbles" and "stalls". You can't really avoid these issues.
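The "work stealing" pattern mentioned above can be sketched with just the standard library: each worker pops tasks from its own deque, and when it runs dry it steals from the opposite end of another worker's deque. The names (`Worker`, `steal`, `run`) and the simplified two-worker termination check are illustrative assumptions, not a production scheduler.

```python
import collections
import random
import threading

class Worker:
    def __init__(self, tasks):
        self.deque = collections.deque(tasks)
        self.lock = threading.Lock()
        self.done = []

    def pop_local(self):
        # Owner pops from the LIFO end for cache locality.
        with self.lock:
            return self.deque.pop() if self.deque else None

    def steal(self):
        # Thieves steal from the FIFO end to reduce contention.
        with self.lock:
            return self.deque.popleft() if self.deque else None

def run(workers):
    def loop(me):
        while True:
            task = me.pop_local()
            if task is None:
                # Local queue empty: try to steal from a random victim.
                victims = [w for w in workers if w is not me]
                task = random.choice(victims).steal() if victims else None
            if task is None:
                return  # simplified termination: one failed steal and we quit
            me.done.append(task)

    threads = [threading.Thread(target=loop, args=(w,)) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# One worker starts overloaded; the idle one steals from it.
a = Worker(range(100))
b = Worker([])
run([a, b])
```

The point is that the mismatch in execution speed never disappears: it just moves from a hardware FIFO into the software deques and the stealing protocol.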