Comment by ianm218

4 hours ago

Reinforcement learning has become a huge portion of the compute used during training runs [1], and synthetic data is letting us get a lot more mileage out of existing data. Additionally, lots of new, high-quality data is being created and collected each day. I think the "running out of data" story was pretty poorly reported by mainstream media.

[1] https://www.dwarkesh.com/p/dario-amodei-2