> now that we've reached peak data?
A) that's not clear
B) now we have "reasoning" models that can be used to analyse the data, create n rollouts for each data piece, and "argue" for / against / neutral on every piece of data going into the model. Imagine having every page of a "short story book" plus the 10 best "how to write" books, and doing n x n on them (a rough sketch of what that could look like follows after this comment). Huge compute, but basically infinite data as well.
We went from "a bunch of data" to "even more data" to "basically everything we got" to "ok, maybe use a previous model to sort through everything we got and only keep quality data" to "ok, maybe we can augment some data with synthetic datasets from tools etc" to "RL goes brrr" to (point B from above) "let's mix the data with quality sources on best practices".
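To make point B above concrete, here is a minimal sketch of one way the "n rollouts, argue for / against / neutral" idea could work as a synthetic-data generator. This is an illustration of the idea only, not anything a lab has published; call_reasoning_model is a hypothetical stub standing in for whatever model or API you would actually call.

    from typing import Callable, Dict, List

    STANCES = ["for", "against", "neutral"]

    def call_reasoning_model(prompt: str) -> str:
        # Hypothetical stub: replace with a real reasoning-model call.
        return f"[model output for: {prompt[:40]}...]"

    def augment_passage(passage: str,
                        style_guides: List[str],
                        n_rollouts: int = 3,
                        model: Callable[[str], str] = call_reasoning_model) -> List[Dict]:
        """Cross one passage with every style guide and every stance,
        producing n_rollouts critiques per combination."""
        out = []
        for guide in style_guides:
            for stance in STANCES:
                for _ in range(n_rollouts):
                    prompt = (f"Using the advice in '{guide}', argue {stance} "
                              f"the quality of this passage:\n\n{passage}")
                    out.append({"passage": passage,
                                "guide": guide,
                                "stance": stance,
                                "critique": model(prompt)})
        return out

    if __name__ == "__main__":
        guides = ["On Writing", "The Elements of Style"]
        samples = augment_passage("It was a dark and stormy night.", guides, n_rollouts=2)
        print(len(samples), "synthetic training examples from one sentence")

Every passage fans out into (guides x stances x rollouts) new text, which is where the "huge compute, but basically infinite data" trade-off comes from.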
Are you basically saying synthetic data and having a bunch of models argue with each other to distill the most agreeable of their various outputs solves the issue of peak data?
Because from my vantage point, those have not given step changes in AI utility the way crunching tons of data did. They have only improved things incrementally.
A) We are out of Internet-scale-for-free data. Of course, the companies deploying LLM-based systems at massive scale are ingesting a lot of human data from their users, which they are seeking to use to further improve their models.
B) Has learning through "self-play" (like with AlphaZero etc) been demonstrated to work for improving LLMs? What is the latest key research on this?
Certainly the models have orders of magnitude more data available to them than the smartest human being who ever lived does/did. So we can assume that if the goal is "merely" superhuman intelligence, data is not a problem.
It might be a constraint on the evolution of godlike intelligence, or AGI. But at that point we're so far out in bong-hit territory that it will be impossible to say who's right or wrong about what's coming.
> Has learning through "self-play" (like with AlphaZero etc) been demonstrated to work for improving LLMs?
My understanding (which might be incorrect) is that this amounts to RLHF without the HF part, and is basically how DeepSeek-R1 was trained. I recall reading about OpenAI being butthurt^H^H^H^H^H^H^H^H concerned that their API might have been abused by the Chinese to train their own model.
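For what "RLHF without the HF part" roughly means in code, here is a toy sketch: the reward comes from a programmatic verifier (checking an arithmetic answer) instead of human preference labels, and advantages are normalized within a group of samples for the same prompt, which is the GRPO-style trick reportedly used for DeepSeek-R1. The "policy" below is just a softmax over four candidate answers so the sketch stays runnable; it is not anyone's actual training code.

    import math, random

    def verifier_reward(question, answer):
        # Verifiable reward: 1 if the arithmetic is right, 0 otherwise.
        a, b = question
        return 1.0 if answer == a + b else 0.0

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        return [e / s for e in exps]

    # Candidate "completions" for the prompt "what is 3 + 4?"
    candidates = [5, 6, 7, 8]
    logits = [0.0, 0.0, 0.0, 0.0]   # the trainable "policy"
    question = (3, 4)
    lr, group_size = 0.5, 8

    for step in range(200):
        probs = softmax(logits)
        # Sample a group of completions for the same prompt.
        idxs = random.choices(range(len(candidates)), weights=probs, k=group_size)
        rewards = [verifier_reward(question, candidates[i]) for i in idxs]
        # Group-relative advantage: reward minus the group mean.
        baseline = sum(rewards) / len(rewards)
        for i, r in zip(idxs, rewards):
            adv = r - baseline
            # REINFORCE-style update: push up the log-prob of above-average samples.
            for j in range(len(logits)):
                grad = (1.0 if j == i else 0.0) - probs[j]
                logits[j] += lr * adv * grad / group_size

    print("P(correct answer 7) after training:",
          round(softmax(logits)[candidates.index(7)], 3))

No human labels appear anywhere; the only feedback signal is whether the sampled answer verifies.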
The thing is, people were already claiming a year or two ago that we'd reached peak data and that progress would stall since there was no more high-quality human-written text available. Turns out they were wrong, and if anything progress accelerated.
The progress has come from all kinds of things. Better distillation of huge models into small ones. Tool use. Synthetic data (which is not leading to model collapse as theorized). Reinforcement learning.
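As a quick illustration of the distillation point, here is a minimal sketch of the classic knowledge-distillation loss (Hinton-style): the small student is trained to match the big teacher's temperature-softened output distribution. The logits below are made up for the example; in practice they come from running both models on the same token.

    import math

    def softmax(logits, temperature=1.0):
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        exps = [math.exp(x - m) for x in scaled]
        s = sum(exps)
        return [e / s for e in exps]

    def distillation_loss(teacher_logits, student_logits, temperature=2.0):
        """KL(teacher || student) on temperature-softened distributions."""
        p = softmax(teacher_logits, temperature)   # soft teacher targets
        q = softmax(student_logits, temperature)   # student predictions
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    # Made-up logits over a tiny 4-token vocabulary.
    teacher = [4.0, 1.5, 0.5, -2.0]
    good_student = [3.8, 1.4, 0.6, -1.9]
    bad_student = [0.0, 0.0, 0.0, 0.0]

    print("loss, well-distilled student:", round(distillation_loss(teacher, good_student), 4))
    print("loss, untrained student:     ", round(distillation_loss(teacher, bad_student), 4))

The soft targets carry more information per example than hard labels, which is part of why distilled small models punch above their weight.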
I don't know exactly where the progress over the next year will come from, but it seems hard to believe that we'll suddenly hit a wall on all of these methods at the same time and discover no new techniques. If progress had slowed down over the last year, a nearby wall would be a reasonable hypothesis, but it hasn't.
I'm loving it, can't wait to deploy this stuff locally. The mainframe will be replaced by commodity hardware, and OpenAI will go down the path of IBM unless they reinvent themselves.
> people claimed already a year or two ago that we'd reached peak data and progress would stall
The claim was that we'd reached peak data (which, yes, we did) and that progress would have to come from new models or approaches. Everything you described has made incremental changes, not step changes. Incremental changes are effectively stalled progress. Even this model has no proof and no release behind it.
There is also a huge realm of private/commercial data that has not been absorbed by LLMs yet. I think there is far more private/commercial data than public data.
We're so far from peak data that we've barely even scratched the surface, IMO.
What changed from this announcement?
> “We’ve achieved peak data and there’ll be no more,” OpenAI’s former chief scientist told a crowd of AI researchers.