Comment by cakealert
2 months ago
With a pretrained LLM, most of the work is already done. RL just steers the model into a 'thinking' mode. There is enough signal for that to work, and for RL's sample inefficiency not to matter.
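For concreteness, here is a minimal sketch of what that steering loop could look like, assuming a REINFORCE-style objective with a verifiable reward (the model choice and reward_fn are stand-ins, not anyone's actual pipeline):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")   # stand-in pretrained LM
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)

    def reward_fn(trace: str) -> float:
        # hypothetical verifier: did the sampled trace reach the answer?
        return 1.0 if "42" in trace else 0.0

    prompt = "Q: What is 6 * 7? Think step by step.\nA:"
    ids = tok(prompt, return_tensors="pt").input_ids

    for _ in range(8):
        with torch.no_grad():  # sample a 'thinking' trace from the policy
            seq = model.generate(ids, do_sample=True, max_new_tokens=64,
                                 pad_token_id=tok.eos_token_id)[0]
        r = reward_fn(tok.decode(seq[ids.shape[1]:]))
        # re-score the sampled trace with gradients enabled
        logp = torch.log_softmax(model(seq.unsqueeze(0)).logits[0], dim=-1)
        rows = torch.arange(ids.shape[1] - 1, seq.shape[0] - 1)
        trace_logp = logp[rows, seq[ids.shape[1]:]].sum()
        loss = -r * trace_logp  # REINFORCE: push up rewarded traces
        opt.zero_grad(); loss.backward(); opt.step()

The point of the sketch is the shape of the signal: the pretrained model already produces coherent traces, so even a sparse scalar reward is enough to shift probability mass toward the ones that work.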
The downside is that you are limiting the model to thinking in the same language it outputs. An argument could be made that this is not how all humans think. I know that I rarely think in language or even images; concepts ("concept" probably isn't even the right word) mix and transform, and often I don't even bother making the final transformation into language, just action.
I strongly agree; in fact, I think what best matches the thought process is something like the multiset tree/forest workspace approach suggested by Marcolli, Chomsky, and Berwick: a Hopf algebra that can be externalized into (non-planar) embeddings of linearized strings, or alternatively into semantic manifolds.
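For anyone who hasn't seen that work, here is a compressed LaTeX paraphrase of the core structure as I understand it (the notation is mine and may be lossy; the real definitions are in their paper on the mathematical structure of syntactic Merge):

    % Workspaces are multisets (forests) of syntactic objects,
    % F = T_1 \sqcup ... \sqcup T_n, each T_i a binary rooted tree
    % with leaves labeled by lexical items.
    \[
      F \cdot F' = F \sqcup F'   % product: workspace union
    \]
    \[
      \Delta(F) = \sum_{v} F_v \otimes F/F_v   % coproduct: extract an
                                               % accessible term, keep the
                                               % quotient workspace
    \]
    \[
      \mathfrak{M}_{S,S'} = \sqcup \circ (B_{S,S'} \otimes \mathrm{id}) \circ \Delta
    \]
    % Merge: the coproduct exposes copies of S and S', the grafting
    % operator B_{S,S'} joins them under a new root, and the product
    % folds the result back into the workspace.

Externalization is then a separate map that forgets the multiset structure and picks a linear order, which (as I read it) is where the choice of planar embedding for an inherently non-planar tree comes in.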