That seems to say only that synthetic data makes up a larger part of models' training today than in the past. The newer OpenAI models are known to hallucinate more. Claude 4 seems great, but not multiples better. Makes me think the effect of synthetic data is at best net zero. That still remains to be seen, though.
Debunked is a bit too strong. He quotes the phi-4 report saying that synthetic data is easier for the LLM to digest. A bit like feeding broiler chickens other dead chickens.
Maybe one day we will have organic LLMs guaranteed to be fed only human generated content.
Disagreeing with something is not debunking.
Not at the rate people are giving up their creativity to serve the machine.