Comment by suddenlybananas
8 hours ago
>Training on LLM outputs leads to catastrophic collapse. Every outlet led with this. But no-one read the fine print: they were testing on small toy models and were re-training on everything that came out. Of course it's gonna fail. L3 / phi / gpt-oss models showed that you can absolutely train on synthetic datasets and have great results.
You're conflating two very different things. Training on synthetic data once is very different from cyclically training models on their own data. It has nothing to do with model size.
Perhaps I worded it poorly. My main point was that articles focus on the wrong thing. Most coverage of that paper was "Using LLM-generated data leads to CATASTROPHIC collapse", without reading the fine print.
> [...] cyclically training models on their own data. It has nothing to do with model size.
Of course it does. GRPO is basically "training models on their own data". You sample, you check against a known truth, you adapt the weights. Repeat. And before GRPO there was RLAIF, which showed improving scores across three "stages" of generate - select - re-train, with diminishing returns after the third stage but no catastrophic collapse.
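To make that loop concrete, here's a minimal toy sketch of the sample / check-against-truth / reweight cycle. It is not real GRPO on an LLM: the "model" is just a categorical distribution over candidate answers, and every name in it is invented for illustration.

    # Toy sketch of "sample, check against a known truth, adapt the weights".
    # NOT real GRPO: the "model" is a categorical distribution over candidate
    # answers to one question, and all names here are made up for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    candidates = np.array([40, 41, 42, 43, 44])  # answers to "what is 6 * 7?"
    ground_truth = 42                            # verifiable reward signal
    logits = np.zeros(len(candidates))           # stand-in for model weights
    group_size, lr = 8, 0.5

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    for step in range(20):
        probs = softmax(logits)
        idx = rng.choice(len(candidates), size=group_size, p=probs)  # 1. sample a group
        rewards = (candidates[idx] == ground_truth).astype(float)    # 2. score vs. known truth
        adv = rewards - rewards.mean()                               # 3. group-relative advantage
        for i, a in zip(idx, adv):                                   # 4. adapt the weights
            grad = -probs                 # gradient of log softmax at the sampled index
            grad[i] += 1.0
            logits = logits + lr * a * grad

    print("P(correct) after training:", softmax(logits)[2])

The diminishing returns show up even in this sketch: once most of a sampled group is already correct, the group-relative advantages go to zero and the updates vanish.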
My main point was about articles cherry-picking catchy phrases, not about criticising the research. We need the research. But we also need good articles that aren't written just because negativity sells.
cheeky edit: see this thread [1]. I know Slashdot has fallen a lot over the last few years, but I skimmed the root comments, and not one addresses the "toy model" problem. Everyone reads the title and reinforces their own biases. That's the main problem I was trying to address.
1 - https://slashdot.org/story/25/08/11/2253229/llms-simulated-r...
If you have a ground truth that you're comparing to, that's not training on your own data.
"Training on synthetic data one time is very different than cyclically training models on their own data.", but every one with even a modicum of understanding of feedback knows that cyclic training on its own output will end in tears; it's bordering on a tautologic inverse.
Is there an actual general principle or theorem or anything that you can link on this? I'm skeptical because these "model collapse" ideas sound vaguely technical and intuitive, but mostly seem to be based on observations of what happened to occur with current LLMs. It gets bandied about like it's the most obvious thing, but the support mostly seems to be… pseudo-technical vibes.