
Comment by NitpickLawyer

4 days ago

> now that we've reached peak data?

A) That's not clear.

B) Now we have "reasoning" models that can be used to analyse the data, create n rollouts for each data piece, and "argue" for / against / neutral on every piece of data going into the model. Imagine having every page of a "short story book" + the 10 best "how to write" books, and doing n x n on them. Huge compute, but basically infinite data as well.
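
A rough sketch of what that rollout-and-critique augmentation could look like. Everything here is illustrative: `generate` is a stand-in for whatever model API you would call, and the stance prompts and choice of n are made up, not a described pipeline.

```python
# Hypothetical sketch of the "n rollouts per data piece" idea above.
# `generate` stands in for a call to a reasoning model; the stance
# prompts and the value of n are illustrative, not a real pipeline.
from itertools import product

STANCES = ["argue for", "argue against", "take a neutral view of"]

def generate(prompt: str) -> str:
    """Placeholder for one rollout from a reasoning model."""
    raise NotImplementedError

def augment(pages: list[str], guides: list[str], n: int = 3) -> list[dict]:
    synthetic = []
    # The "n x n" part: every story page crossed with every how-to excerpt
    for page, guide in product(pages, guides):
        for stance in STANCES:
            for _ in range(n):  # n rollouts per (page, guide, stance) triple
                prompt = (f"Using this writing advice, {stance} the passage.\n"
                          f"Advice: {guide}\nPassage: {page}")
                synthetic.append({"prompt": prompt,
                                  "response": generate(prompt)})
    return synthetic
```

The point being: the synthetic set grows multiplicatively in pages x guides x stances x rollouts, which is where the "huge compute, basically infinite data" trade-off comes from.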

We went from "a bunch of data" to "even more data" to "basically everything we've got" to "OK, maybe use a previous model to sort through everything we've got and keep only the quality data" to "OK, maybe we can augment some data with synthetic datasets from tools etc." to "RL goes brrr" to (point B above) "let's mix the data with quality sources on best practices".

Are you basically saying synthetic data and having a bunch of models argue with each other to distill the most agreeable of their various outputs solves the issue of peak data?

Because from my vantage point, those have not produced step changes in AI utility the way crunching tons of data did. They have only incrementally improved things.

A) We are out of Internet-scale free data. Of course, the companies deploying LLM-based systems at massive scale are ingesting a lot of human data from their users, which they are seeking to use to further improve their models.

B) Has learning through "self-play" (as with AlphaZero etc.) been demonstrated to work for improving LLMs? What is the latest key research on this?

  • Certainly the models have orders of magnitude more data available to them than the smartest human being who ever lived does/did. So we can assume that if the goal is "merely" superhuman intelligence, data is not a problem.

    It might be a constraint on the evolution of godlike intelligence, or AGI. But at that point we're so far out in bong-hit territory that it will be impossible to say who's right or wrong about what's coming.

    > Has learning through "self-play" (as with AlphaZero etc.) been demonstrated to work for improving LLMs?

    My understanding (which might be incorrect) is that this amounts to RLHF without the HF part, and is basically how DeepSeek-R1 was trained. I recall reading about OpenAI being butthurt^H^H^H^H^H^H^H^H concerned that their API might have been abused by the Chinese to train their own model.
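
    Concretely, my reading of "RLHF without the HF" is reward-from-a-verifier rather than reward-from-human-preferences. A loosely GRPO-shaped sketch, where `sample` and `update` are placeholder stubs rather than any real training API:

    ```python
    # Sketch of RL with verifiable rewards: the reward signal comes from a
    # programmatic check, not from human preference labels. Loosely shaped
    # like GRPO; `sample` and `update` are placeholders, not a real API.

    def sample(prompt: str) -> str:
        """Placeholder: draw one completion from the current policy."""
        raise NotImplementedError

    def update(prompt: str, rollouts: list[str],
               advantages: list[float]) -> None:
        """Placeholder: reweight each rollout's log-likelihood by its advantage."""
        raise NotImplementedError

    def verify(answer: str, reference: str) -> float:
        """Verifiable reward: 1.0 if the final answer checks out, else 0.0."""
        return 1.0 if answer.strip() == reference.strip() else 0.0

    def rl_step(prompt: str, reference: str, k: int = 8) -> None:
        rollouts = [sample(prompt) for _ in range(k)]      # k completions
        rewards = [verify(r, reference) for r in rollouts]
        baseline = sum(rewards) / k                        # group-mean baseline
        advantages = [r - baseline for r in rewards]       # no learned critic
        update(prompt, rollouts, advantages)
    ```

    No human in the loop: the "feedback" is just whether the answer passes a check, which is what makes it look like self-play-ish RL rather than RLHF.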

    • Superhuman capability within the tasks that are well represented in the dataset, yes. But if one takes the view that intelligence is the ability to solve novel problems (ref. F. Chollet), then the amount of data alone might not take us to superhuman intelligence, at least not without new breakthroughs in the construction of models or systems.

      R1 managed to replicate a model on the level of one they had access to. But as far as I know, they did not improve on its predictive performance? They did improve on inference time, but that is another thing. The ability to replicate a model is well demonstrated and has been common practice for some years already; see teacher-student distillation.
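
      For reference, the teacher-student distillation loss mentioned here is typically a temperature-softened KL term. A minimal PyTorch sketch, where the temperature and reduction are conventional choices, not anything specific to R1:

      ```python
      # Minimal teacher-student distillation loss: the student is trained
      # to match the teacher's softened output distribution via KL divergence.
      import torch.nn.functional as F

      def distill_loss(student_logits, teacher_logits, T: float = 2.0):
          s = F.log_softmax(student_logits / T, dim=-1)   # student log-probs
          t = F.softmax(teacher_logits / T, dim=-1)       # teacher probs
          # T^2 rescales gradients to stay comparable across temperatures
          return F.kl_div(s, t, reduction="batchmean") * (T * T)
      ```

      That machinery can copy a teacher's behaviour, but by construction it cannot exceed the teacher on what the teacher already does, which is the point above.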