Comment by kig

16 hours ago

"We leverage APIs like CUDA streams to avoid blocking the GPU while the host processes requests," so I'm guessing it would let the other GPU threads go about their lives while that one waits for the ACK from the CPU.

I once wrote a prototype async IO runtime for GLSL (https://github.com/kig/glslscript). It used a shared memory buffer and spinlocks: the GPU would write "hey do this" into the IO buffer, go about doing other stuff until it needed the results, then spinlock to wait for the results to arrive from the CPU. I remember this being a total pain, as you need to be aware of how PCIe DMA works on some level: writes can land out of order, so seeing your spinlock int flip doesn't mean the rest of the memory write has finished arriving.