Comment by nico
2 months ago
> Why did it cost only $6? Because they used a small model and hardly any data.
> After sifting their dataset of 56K examples down to just the best 1K, they found that the core 1K is all that’s needed to achieve o1-preview performance on a 32B model. Adding data didn’t raise performance at all.
> 32B is a small model; I can run that on my laptop. They used 16 NVIDIA H100s for 26 minutes per training run, which works out to around $6 in compute.
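The $6 figure can be sanity-checked with a quick back-of-envelope calculation. The per-GPU-hour rental rate below is an assumption (the comment doesn't state one); roughly $0.85/hour is plausible for discounted H100 capacity and makes the numbers line up:

```python
# Back-of-envelope check of the ~$6 training-cost claim:
# 16 H100s running for 26 minutes, at an assumed rental rate.
NUM_GPUS = 16
MINUTES = 26
RATE_PER_GPU_HOUR = 0.85  # assumed USD/hour, not stated in the comment

gpu_hours = NUM_GPUS * MINUTES / 60          # total GPU-hours consumed
cost = gpu_hours * RATE_PER_GPU_HOUR         # estimated dollar cost

print(f"{gpu_hours:.2f} GPU-hours -> ${cost:.2f}")
# -> 6.93 GPU-hours -> $5.89
```

At on-demand cloud rates (closer to $2-3 per H100-hour) the same run would cost more like $15-20, so the $6 figure implies cheap spot or reserved capacity.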