Comment by vajrabum
3 days ago
The platforms I've seen live on top of Kubernetes, so I'm afraid it is possible. nvidia-docker, all the CUDA libraries and drivers, NCCL, vLLM, ... Large-scale distributed training and inference are complicated beasties, and the orchestration for them is too.