Comment by jrmg

11 hours ago

I have the same worry about LLMs in general. I know that 'model collapse' seems to be an unfashionable idea, but when the internet's just full of garbage (soon?), what are we going to train these things on?

The labs have moved away from relying on raw web text and now train on verifiable synthetic data (e.g. math, games, code) to improve general reasoning.
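The "verifiable" part is the key idea: examples are generated together with a ground truth that a program can check, so the training signal never depends on scraped (possibly garbage) text. A minimal sketch of what that looks like for arithmetic; the names and structure here are my own illustration, not any lab's actual pipeline:

```python
import random

def make_arithmetic_example(rng: random.Random) -> dict:
    """Generate one synthetic example whose answer can be checked
    programmatically, so no human-written labels are needed."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    op = rng.choice(["+", "-", "*"])
    question = f"What is {a} {op} {b}?"
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return {"question": question, "answer": answer}

def verify(example: dict, model_output: str) -> bool:
    """Reward signal: the model's output counts as correct iff it
    matches the programmatically computed ground truth."""
    try:
        return int(model_output.strip()) == example["answer"]
    except ValueError:
        return False

rng = random.Random(0)
dataset = [make_arithmetic_example(rng) for _ in range(3)]
for ex in dataset:
    # A correct answer always verifies; a wrong one never does.
    assert verify(ex, str(ex["answer"]))
    assert not verify(ex, str(ex["answer"] + 1))
```

The same pattern scales to code (run the tests) and games (check who won): the checker, not the internet, supplies the ground truth.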