Comment by otabdeveloper4
5 days ago
> meager hardware
Qwen was made on a cluster about that size.
And this is before anybody ever thought about optimizing the training process. (Currently it's just PyTorch analyst-as-coder slop, with extremely overprovisioned quantizations, etc.)