Comment by code_biologist
2 months ago
What would a superior control approach be? It's not clear to me how to get an LLM to be an LLM if you're not doing stochastic next-token prediction. Given that, the model itself is going to know best how to traverse its own concept space. The R1 chain-of-thought training encourages and develops exactly that capability. Still, you want that chain of thought to terminate and not navel-gaze endlessly.
So how do you externally prod it to think more when it does terminate? Replacing thought termination with a linguistic signifier of continued reasoning plus novel realization seems like a charmingly simple, principled, and general approach to continue traversing concept space.
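The idea can be sketched as a small generation loop: whenever the model emits its end-of-thought marker, strip it and append a continuation cue word instead. Everything here is illustrative — `toy_generate`, the marker string, and the cue word are stand-ins, not any particular model's API.

```python
END_OF_THOUGHT = "</think>"  # marker the model emits when it stops reasoning (assumed)
CONTINUE_CUE = "Wait"        # linguistic signifier of continued reasoning (assumed)

def toy_generate(prompt):
    """Stand-in for an LLM call: returns text ending in the end-of-thought marker."""
    return " ...some reasoning... " + END_OF_THOUGHT

def think_with_budget(prompt, min_continuations=2):
    """Each time the model tries to terminate its chain of thought,
    replace the terminator with a cue word to prod further reasoning."""
    trace = prompt
    for _ in range(min_continuations):
        out = toy_generate(trace)
        if out.endswith(END_OF_THOUGHT):
            # Swap termination for a continuation cue and keep generating.
            trace += out[: -len(END_OF_THOUGHT)] + CONTINUE_CUE
        else:
            trace += out
    # After the budget is spent, let the model terminate naturally.
    trace += toy_generate(trace)
    return trace

result = think_with_budget("Q: why is the sky blue?\n")
print(result.count(CONTINUE_CUE))  # → 2 forced continuations
```

The loop forces a minimum amount of reasoning without changing the model at all; control lives entirely in the decoding harness.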