Comment by vanschelven

8 days ago

This has been debunked (to me) here: https://simonwillison.net/2024/Dec/31/llms-in-2024/#syntheti...

That seems to say only that synthetic data makes up a larger share of training data for today's models than it did in the past. The newer OpenAI models are known to hallucinate more. Claude 4 seems great, but not dramatically better. That makes me think the net effect of synthetic data is zero at best. That remains to be seen, though.

"Debunked" is a bit too strong. He quotes the Phi-4 report's claim that synthetic data is easier for an LLM to digest. A bit like feeding broiler chickens other dead chickens.

Maybe one day we will have organic LLMs guaranteed to be fed only human generated content.

  • Not at the rate people are giving up their creativity to serve the machine.