Comment by rmuratov

5 months ago

How did we use "all the data"? New knowledge appears on the internet every day: new scientific articles and videos are published.

lend000 5 months ago

At the speed AI is moving, we've effectively used it all; the high-quality data you need to make smarter models is coming in at a trickle. We're not getting 10^5 Principia Mathematicas published every day. Maybe I just don't have the vision to understand it, but it seems like AI-generated synthetic data shouldn't be able to train a model smarter than whatever produced that data. I can imagine synthetic data being useful for making models more efficient (that's what distilled models are, after all), but not for pushing the frontier.