Comment by ACCount37

21 hours ago

By now, I subscribe to "you're just training them wrong".

Pre-training a base model on text datasets teaches that model a lot, but it doesn't teach it to be good at agentic or long-horizon tasks.

That's why there's a capability gap there - one that companies have to close "in post" (post-training) with things like RLVR (reinforcement learning with verifiable rewards).
