Comment by the__alchemist
7 days ago
Et tu, GPU?
I am, bluntly, sick of async taking over Rust ecosystems. Embedded and web/HTTP have already fallen. I'm optimistic this won't take hold in GPU; we'll see. Async splits the ecosystem. I see it as the biggest threat to Rust staying a useful tool.
I use Rust on the GPU for the following: 3D graphics via WGPU, cuFFT via FFI, custom kernels via Cudarc, and ML via Burn and Candle. Thankfully these are all async-free.
I don't see the utility of async on the GPU.
> Async splits the ecosystem. I see it as the biggest threat to Rust staying a useful tool.
Someone somewhere convinced you there is an async coloring problem. That person was wrong: async is an inherent property of some operations. Adding it as a type-level construct gives visibility to those inherent behaviors, and with that, more freedom in how you compose them.
It'd be interesting to see a setup where there's only async and you have to specify when you actually want to block on a result.
flip the colouring problem on its head
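A minimal sketch of that flipped world: every operation is exposed as an `async fn`, and blocking becomes a visible, deliberate call at the use site. The `block_on` here is a toy single-future executor built on std only (a real program would reach for something like `futures::executor::block_on` or pollster); `gpu_reduce` is a hypothetical stand-in for a GPU operation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing: enough for a busy-polling toy executor.
fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    // SAFETY: every vtable entry is a no-op, so the null data pointer is never used.
    unsafe { Waker::from_raw(raw()) }
}

/// Busy-polls a single future to completion: the one explicit
/// "I actually want to block on this result" point in the program.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // SAFETY: `fut` is a local that is never moved after being pinned.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            Poll::Pending => std::thread::yield_now(),
        }
    }
}

// In an async-only world, even a "plain" computation is surfaced as a future...
async fn gpu_reduce(xs: &[u32]) -> u32 {
    xs.iter().sum()
}

fn main() {
    // ...and blocking is opt-in, spelled out at the call site.
    let total = block_on(gpu_reduce(&[1, 2, 3, 4]));
    assert_eq!(total, 10);
    println!("total = {total}");
}
```

The coloring inverts: instead of async callers being unable to call sync code cheaply, sync callers pay a visible `block_on` toll, so blocking is what stands out in review.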
For Graphistry at least, I care less about the surface syntax (async/await) and more about getting GPU-side work stealing, dynamic task scheduling, etc. Our code is written at a much higher level, so these are primitives needed by our runtime, not by most of our developers & users. Imagine SQL, Cypher, etc. on GPUs, and our implementations of those being able to use the GPU-side libraries when coordinating 1M+ threads.
Is it the performance benefits? Or being able to write concurrent code much more expressively? Though I suppose the latter might imply the former.
Performance.
Our code looks like pure pandas (fancier SQL) wrapped as an HTTP service (Arrow instead of JSON), so the expressivity is more of a step backwards. We already did the work of turning awkward irregular code into relational pipelines that GPUs love.
Our problems are:
- Multi-tenancy. Our users time-share GPUs, so when many GPU tasks big & small arrive, we want them co-scheduled across the many GPUs & their many cores. GPUs are already more cost-effective per watt than CPUs, but we think we can gain 2x+ here, which is significant.
- Constant overheads. One job can be deep, with many operations, so round-tripping each step of the control plane (think each SQL subexpression) between CPU and GPU is silly and adds up. Small jobs are dominated by embarrassing overheads that preclude certain use cases. We are considering CPU hot paths to avoid this, but would rather just fix the GPU path.
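The overhead argument can be made concrete with a back-of-the-envelope cost model. All numbers below are hypothetical, chosen only to illustrate the shape of the problem: if every control-plane step pays a fixed CPU<>GPU round-trip cost, that cost is paid N times, whereas a pipeline scheduled GPU-side pays it roughly once.

```rust
// Cost (in microseconds) when each of `steps` operations round-trips
// through the CPU control plane: the fixed overhead is paid every step.
fn round_trip_cost_us(steps: u32, per_step_overhead_us: f64, compute_us: f64) -> f64 {
    steps as f64 * (per_step_overhead_us + compute_us)
}

// Cost when the whole pipeline is dispatched once and coordinated
// GPU-side: one launch overhead, then pure compute.
fn fused_cost_us(steps: u32, launch_overhead_us: f64, compute_us: f64) -> f64 {
    launch_overhead_us + steps as f64 * compute_us
}

fn main() {
    // Hypothetical small job: 50 subexpressions, 20us of launch/sync
    // overhead per round-trip, 5us of actual GPU compute per step.
    let (steps, overhead, compute) = (50, 20.0, 5.0);
    let naive = round_trip_cost_us(steps, overhead, compute); // 50 * 25 = 1250us
    let fused = fused_cost_us(steps, overhead, compute); // 20 + 250 = 270us
    assert!(fused < naive);
    println!("round-trip: {naive}us, fused: {fused}us");
}
```

The deeper the job and the smaller each step, the more the naive path is dominated by the constant term, which is why small jobs suffer most.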