Comment by tankenmate
8 hours ago
"Training on synthetic data one time is very different than cyclically training models on their own data.", but every one with even a modicum of understanding of feedback knows that cyclic training on its own output will end in tears; it's bordering on a tautologic inverse.
Is there an actual general principle or theorem or anything that you can link on this? I’m skeptical because these “model collapse” ideas sound vaguely technical and intuitive, but mostly seem to be based on observations about things that happened to happen with current LLMs. It gets bandied about like it is the most obvious thing, but the support mostly seems to be… pseudo-technical vibes.