← Back to context

Comment by winwang

8 days ago

Is it the performance benefits? Or being able to write concurrent code much more expressively? Though I suppose the latter might imply the former.

Performance.

Our code looks like pure pandas (fancier SQL) wrapped as HTTP service (arrow instead of json), so the expressivity is more of a step backwards. We already did the work of turning awkward irregular code into relational pipelines that GPUs love.

Our problems are:

- Multi-tenancy. Our users get to time share GPUs, so when getting many GPU tasks big & small, we want them co-scheduled across the many GPUs & their many cores. GPUs are already more cost effective per Watt than CPUs, but we think we can 2x+ here, which is significant.

- Constant overheads. One job can be deep, with many operations, so round-tripping each step of the control plane, think each SQL subexpression, CPU<>GPU is silly and adds up. Small jobs are dominated by embarrassing overheads that are precluding certain use cases. We are thinking of doing CPU hot paths to avoid this, but rather just fix the GPU path.