Comment by jrmg

11 hours ago

I have the same worry about LLMs in general. I know that 'model collapse' seems to be an unfashionable idea, but when the internet's just full of garbage (soon?), what are we going to train these things on?

The labs have moved away from relying on raw web text and now train on verifiable synthetic data (e.g. math, games, code) to improve general reasoning.
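The "verifiable" part is the key idea: examples are generated together with a ground truth that a program can check, so the training signal never depends on scraped (possibly garbage) text. A minimal sketch of what that looks like for arithmetic; the names and structure here are my own illustration, not any lab's actual pipeline:

```python
import random

def make_arithmetic_example(rng: random.Random) -> dict:
    """Generate one synthetic example whose answer can be checked
    programmatically, so no human-written labels are needed."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    op = rng.choice(["+", "-", "*"])
    question = f"What is {a} {op} {b}?"
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return {"question": question, "answer": answer}

def verify(example: dict, model_output: str) -> bool:
    """Reward signal: the model's output counts as correct iff it
    matches the programmatically computed ground truth."""
    try:
        return int(model_output.strip()) == example["answer"]
    except ValueError:
        return False

rng = random.Random(0)
dataset = [make_arithmetic_example(rng) for _ in range(3)]
for ex in dataset:
    # A correct answer always verifies; a wrong one never does.
    assert verify(ex, str(ex["answer"]))
    assert not verify(ex, str(ex["answer"] + 1))
```

The same pattern scales to code (run the tests) and games (check who won): the checker, not the internet, supplies the ground truth.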