Comment by andy12_

3 months ago

Reinforcement learning is not done with random data found on the internet; it's done with curated high-quality labeled datasets. Although there have been approaches that try to apply reinforcement learning to pre-training[1] (to learn in an unsupervised way a predict-the-next-sentence objective), as far as I know it doesn't scale.