Comment by zozbot234

1 month ago

If you're running vastly different processes in different ALU lanes, the single master "program" that comprises them all is effectively an interpreter. And then it's hard to have the exact same control flow lead to vastly different effects in different processes, especially once you account for branches. Sharing one control flow across lanes works well for inference batches, since those are essentially straight-line processing, but for not much else.
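
To make the interpreter point concrete, here is a rough toy model in Python. It is only a sketch under my own assumptions: the opcode table `OPS`, the function `run_lockstep`, and the "one issue step per distinct opcode per round" cost model are all made up for illustration and don't correspond to any real GPU or ISA. Each lane gets its own little program, but there is only one shared control flow, so in every round the master loop has to issue once for each distinct opcode that any lane wants, with the other lanes masked off.

```python
# Hypothetical toy model of lanes sharing one control flow.
# None of these names correspond to a real hardware or library API.
OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}


def run_lockstep(programs, inputs):
    """Run one program per lane under a single shared control flow.

    Returns (results, issue_steps, utilization), where utilization is the
    fraction of lane slots that did useful work across all issue steps.
    """
    lanes = len(programs)
    acc = list(inputs)
    pc = [0] * lanes
    issue_steps = 0
    useful = 0

    while any(pc[i] < len(programs[i]) for i in range(lanes)):
        # Snapshot the opcode each still-active lane wants this round.
        current = [
            programs[i][pc[i]] if pc[i] < len(programs[i]) else None
            for i in range(lanes)
        ]
        # The master "program" is an interpreter: it walks the opcode table
        # and issues once per opcode that at least one lane needs.
        for op, fn in OPS.items():
            mask = [c == op for c in current]
            if not any(mask):
                continue
            issue_steps += 1
            for i in range(lanes):
                if mask[i]:  # lanes that are masked off sit idle this step
                    acc[i] = fn(acc[i])
                    pc[i] += 1
                    useful += 1

    utilization = useful / (issue_steps * lanes) if issue_steps else 1.0
    return acc, issue_steps, utilization


if __name__ == "__main__":
    # Same straight-line program in every lane (the "inference batch" case).
    uniform = [["inc", "dbl", "inc"]] * 4
    print(run_lockstep(uniform, [0, 1, 2, 3]))

    # Different programs per lane: the lanes spend most slots masked off.
    divergent = [
        ["inc", "inc", "inc"],
        ["dbl", "dbl", "dbl"],
        ["neg", "inc", "dbl"],
        ["dbl", "neg", "inc"],
    ]
    print(run_lockstep(divergent, [1, 1, 1, 1]))
```

In this model the uniform batch finishes in 3 issue steps with every lane busy, while the divergent case needs 8 issue steps for the same 12 instructions, so only 37.5% of lane slots do useful work. Real hardware is more subtle than this, but the shape of the cost is the same: the more the lanes disagree about what to do next, the more of the machine sits idle.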