
Comment by codr7

8 days ago

No worries, this won't last long.

Once the algorithms predominantly feed on their own shit, the bazillion-dollar clown party is over.

This has been debunked (to me) here: https://simonwillison.net/2024/Dec/31/llms-in-2024/#syntheti...

  • That seems to say only that synthetic data makes up a larger share of training data today than in the past. The newer OpenAI models are known to hallucinate more, and Claude 4 seems great but not multiples better. That makes me think the effect of synthetic data is at best a net zero, though it still remains to be seen.

  • "Debunked" is a bit too strong. He quotes the phi-4 report saying that synthetic data is easier for the LLM to digest. A bit like feeding broiler chickens other dead chickens.

    Maybe one day we will have organic LLMs guaranteed to be fed only human-generated content.

    • Not at the rate people are giving up their creativity to serve the machine.

Even supposing the purported "model collapse" does occur, it wouldn't destroy the LLMs we already have, which are clearly already capable of fooling humans. I don't see the clown party being over, just reaching a stable equilibrium.
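
For what it's worth, the collapse argument itself is easy to sketch as a toy: a "model" that just memorizes token frequencies and then generates the next generation's training corpus from its own output can only lose diversity, because a token that drops out of one generation can never come back. This is a cartoon with arbitrary numbers (vocabulary size, Zipf-like weights, corpus size), not a claim about how real training pipelines behave:

    import random
    from collections import Counter

    # Toy "model": memorize the empirical token frequencies of the
    # current corpus, then sample the next generation's corpus from them.
    # Once a token's count hits zero it has zero probability forever,
    # so the number of distinct tokens can only go down.
    random.seed(42)

    VOCAB_SIZE = 1000     # arbitrary illustrative numbers
    CORPUS_SIZE = 10_000

    # Generation 0: "human" data with a long-tailed (Zipf-like) distribution.
    vocab = range(VOCAB_SIZE)
    zipf_weights = [1.0 / (rank + 1) for rank in range(VOCAB_SIZE)]
    corpus = random.choices(vocab, weights=zipf_weights, k=CORPUS_SIZE)

    for generation in range(10):
        counts = Counter(corpus)
        print(f"gen {generation}: {len(counts)} distinct tokens survive")
        tokens, freqs = zip(*counts.items())
        # "Train" on the current corpus, then generate the next one
        # entirely from the model's own output; no fresh human data enters.
        corpus = random.choices(tokens, weights=freqs, k=CORPUS_SIZE)

The open question, and what this thread is really arguing about, is how much fresh human-written data keeps entering the mix to offset that one-way erosion.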

  • Exactly. It logically can't occur, even on the flawed assumptions of the people who say this. Just freeze all training data at 2024 or keep the existing models; the worst-case scenario is that the models plateau.

    • So how much did you invest in AI?

      Because you're not making much sense.

      You're saying it's not a problem because people will happily keep using LLMs that don't contain any new information after 2024.

      Really?