Comment by zozbot234

1 year ago

> Most practical parallel computing hardware had queues to handle the mismatch in compute speed for various CPUs to run different algorithms on part of the data.

> Eliminating the CPU bound compute, and running everything truly in parallel eliminates the need for the queues and all the related hardware/software complexity.

Modern parallel scheduling systems still have "queues" to manage these concerns; they're just handled in software, with patterns like "work stealing" that deal with unexpected mismatches in execution time. Even your "sea of LUTs (look up tables), that are all clocked and only connected locally to their neighbors" has queues; the queue is just called a "pipeline", and a mismatch in execution speed shows up as "pipeline bubbles" and "stalls". You can't really avoid these issues.
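To make "work stealing" concrete, here's a minimal sketch (my own toy illustration, not any particular runtime's implementation): each worker drains its own deque from the front, and when it runs dry it steals a task from the back of another worker's deque, so an uneven static partition still finishes without idle workers.

```python
import random
import threading
from collections import deque

def work_stealing_run(tasks, n_workers=4):
    """Run `tasks` (callables) across workers; idle workers steal from busy ones."""
    queues = [deque() for _ in range(n_workers)]
    # Deal tasks round-robin, simulating a (possibly uneven) static partition.
    for i, task in enumerate(tasks):
        queues[i % n_workers].append(task)

    lock = threading.Lock()   # coarse lock keeps the toy example simple
    results = []

    def worker(wid):
        while True:
            task = None
            with lock:
                if queues[wid]:
                    task = queues[wid].popleft()   # own queue: take from the front
                else:
                    # Steal from the back of a random non-empty victim's queue.
                    victims = [q for q in queues if q]
                    if victims:
                        task = random.choice(victims).pop()
            if task is None:
                return  # every queue is empty: nothing left to do or steal
            results.append(task())  # list.append is atomic under CPython's GIL

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Real schedulers (Cilk, Rayon, TBB) use lock-free per-worker deques instead of one global lock, but the queueing structure is the same: the "queue" never goes away, it just moves into software.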