Comment by janalsncm

3 months ago

> even the smartest people make hundreds of tiny experiments

This is the most important point, and why DeepSeek’s cheaper training matters.

And if you check the R1 paper, they have a section for “things that didn’t work”. Each of those would normally be a paper of its own, but because their training was so cheap and streamlined, they could try a bunch of things.