Comment by ACCount37

1 month ago

No. There are no architectural changes and no "second runtime learning algorithm". There's just the good old in-context learning that all LLMs get from pre-training. RLVR is a training stage that pressures the LLM to take advantage of it on real tasks.
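To be concrete about what "RLVR" means here, this is a toy sketch of the loop, with a stub policy and a trivial verifier standing in for a real LLM and a PPO/GRPO-style update (all names here are made up for illustration, not anyone's actual training code):

    # Toy sketch of the RLVR idea: sample answers, score them with a
    # programmatic verifier, and reinforce the ones that check out.
    # The "policy" is a stub; in practice it would be an LLM, and the
    # update would be a PPO/GRPO-style gradient step, not shown here.
    import random

    def verify(problem, answer):
        # Verifiable reward: 1 if the answer matches the known result, else 0.
        return 1.0 if answer == problem["target"] else 0.0

    def sample_answer(problem):
        # Stand-in for sampling a chain of thought + answer from the model.
        return random.choice([problem["target"], "wrong"])

    def rlvr_step(problems, samples_per_problem=4):
        rewards = []
        for p in problems:
            group = [sample_answer(p) for _ in range(samples_per_problem)]
            scores = [verify(p, a) for a in group]
            baseline = sum(scores) / len(scores)         # group-mean baseline
            advantages = [s - baseline for s in scores]  # what a GRPO-like update would weight by
            rewards.extend(scores)
            # A real implementation would compute a policy-gradient loss here.
        return sum(rewards) / len(rewards)

    problems = [{"prompt": "2+2=?", "target": "4"}, {"prompt": "3*3=?", "target": "9"}]
    print("mean verified reward:", rlvr_step(problems))

The point being: the reward comes from checking outputs on real tasks, so the pressure is on the model to use the in-context learning it already has, not on any new runtime learning machinery.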

"Runtime continual learning algorithm" is an elusive target of questionable desirability - given that we already have in-context learning, and "get better at SFT and RLVR lmao" is relatively simple to pull off and gives kickass gains in the here and now.

I see no reason why "behave in a more human/animal-like self-motivated agentic fashion" can't be obtained from more RLVR, if that's what you want to train your LLMs for.

I'm not sure what you are saying. There are LLMs as they exist today, and there are any number of changes one could propose to make to them.

The less you change, the more they stay the same. If you just add "more" RLVR (perhaps for a new domain - maybe chemistry vs math or programming?), then all you will get is an LLM that is better at acing chemistry reasoning benchmarks.

  • I'm saying that the kind of changes you propose aren't being made by anyone, and might generally not be worth making, because "better RLVR" is an easier and better pathway to actual cross-domain performance gains.

    If you could stabilize the kind of mess you want to make, you could put that effort into better RL objectives and get more return.

    • The mainstream LLM crowd aren't making these sorts of major changes yet, although some like DeepMind (the OG pushers of RL for AGI!) do acknowledge that a few more "transformer level" breakthroughs are necessary to reach what THEY are calling AGI, and others like LeCun are calling for more animal-like architectures.

      Anyways, regardless of who is or isn't currently trying to move beyond LLMs, it should be pretty obvious what the problems are with trying to apply RL more generally, and what it would result in if successful, if that were the only change you made.

      LLMs still have room to get better, but they will forever be LLMs, not brains, unless someone puts in the work to make that happen.

      You started this thread talking about "AGI", without defining what you meant by that, and are now instead talking about "cross-domain performance gains". This is exactly why it makes no sense to talk about AGI without defining what you mean by it, since I think we are talking about completely different things.
