Comment by manmal
5 days ago
I have no proof, but these deep thinking modes feel to me like an orchestrator agent plus sub-agents, with the orchestrator RL'd to just keep going instead of being conditioned to stop ASAP.