Comment by lmeyerov
7 days ago
For Graphistry at least, I care less about the surface syntax (async/await) and more about getting GPU-side work stealing, dynamic task scheduling, etc. Our code is written at a much higher level, so these are primitives needed by our runtime, not by most of our developers & users. Imagine SQL, Cypher, etc. on GPUs, and our implementations of those being able to use the GPU-side libraries when coordinating 1M+ threads.
> Is it the performance benefits? Or being able to write concurrent code much more expressively? Though I suppose the latter might imply the former.
Performance.
Our code looks like pure pandas (fancier SQL) wrapped as an HTTP service (Arrow instead of JSON), so the added expressivity is more of a step backwards for us. We already did the work of turning awkward irregular code into the relational pipelines that GPUs love.
Our problems are:
- Multi-tenancy. Our users time-share GPUs, so when many GPU tasks arrive, big & small, we want them co-scheduled across the many GPUs & their many cores. GPUs are already more cost-effective per watt than CPUs, but we think we can get 2x+ here, which is significant.
- Constant overheads. One job can be deep, with many operations, so round-tripping each control-plane step (think each SQL subexpression) between CPU and GPU is silly, and the cost adds up. Small jobs are dominated by these embarrassing overheads, which precludes certain use cases. We are thinking of adding CPU hot paths to avoid this, but would rather just fix the GPU path.
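To make the constant-overheads point concrete, here is a toy cost model (all numbers are hypothetical, chosen only to show the shape of the problem, not measured from our system): when every step of a deep plan bounces control back to the CPU, a fixed per-step round-trip cost swamps the actual kernel time for small jobs, whereas keeping the whole plan GPU-side pays that cost roughly once.

```python
# Toy cost model of per-step CPU<->GPU control-plane round trips.
# All millisecond figures below are hypothetical illustrations.

def stepwise_job_ms(n_steps: int, kernel_ms: float, round_trip_ms: float = 0.5) -> float:
    """Total time when every step round-trips control through the CPU."""
    return n_steps * (kernel_ms + round_trip_ms)

def fused_job_ms(n_steps: int, kernel_ms: float, round_trip_ms: float = 0.5) -> float:
    """Total time if the whole plan stays GPU-side: one round trip."""
    return n_steps * kernel_ms + round_trip_ms

# A deep plan of tiny steps, e.g. many small SQL subexpressions:
stepwise = stepwise_job_ms(200, kernel_ms=0.1)  # ~120 ms, mostly overhead
fused = fused_job_ms(200, kernel_ms=0.1)        # ~20.5 ms, mostly real work

overhead_fraction = (stepwise - 200 * 0.1) / stepwise  # ~0.83
```

The ~6x gap here is an artifact of the made-up constants, but the structure is the real point: round-trip cost scales with the number of steps, so deep plans of small operations are exactly the jobs it hurts most.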