← Back to context

Comment by jeremyjh

3 hours ago

I think that’s mostly because they get so much more of that reinforcement learning - since it is so economical. I dont know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for awhile yet.

RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.