Comment by alfalfasprout
2 days ago
The infra does get pretty complex when you're trying to train a SOTA LLM. People assume it's as simple as loading up the architecture and a dataset and using something like Ray. But a lot goes into designing the dataset, the eval pipelines, and the training approach, maximizing the use of your hardware, dealing with cross-node latency, recovering from errors, etc.
But it's good to have more and more players in this space.