Comment by vessenes
3 months ago
I’ve been thinking a bit about this lately - reasoning in latent space - especially because it looks like that’s what R1-Zero does: the researchers mention that its <think> sections switch back and forth between Chinese and English, while the <say> sections are coherent.
The paper raises a few more questions than it answers, though.
Do they hard-code a certain set of CoT token types upfront to train on? And while the results are good, they are not ‘great’ - other methods seem to provide better outcomes, based on the paper’s own charts.
The interpretability does not seem ‘strong’ to me either - they train decoders on the latent-space encodings and essentially guess at what must be going on from text prompts.
That said, this is a fairly sweet ‘hack’ in my mind - training hidden layers to do the reasoning. I guess I’m skeptical that it’s the way forward, though. It feels like until your CoT token can signal that it needs more thinking time, you’re stuck without extensibility / deep thinking when needed.
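To make the idea concrete, here is a minimal toy sketch of what “reasoning in latent space” tends to mean: instead of decoding a token at every reasoning step, the model feeds its own hidden state back in as the next input embedding, and only the final state is decoded into an answer. Everything here (the model, names, and the fixed `n_latent_steps` budget) is my own hypothetical illustration, not the paper’s implementation - note how the fixed step count is exactly the “can’t ask for more thinking time” limitation above.

```python
# Toy sketch of latent-space (continuous) chain-of-thought.
# Assumption: a fixed number of "thinking" steps where the hidden state
# is recycled as the next input, with no token decoding in between.
import torch
import torch.nn as nn

class TinyLatentReasoner(nn.Module):
    def __init__(self, vocab_size=100, d_model=64, n_latent_steps=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.core = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.n_latent_steps = n_latent_steps  # fixed "thinking budget"

    def forward(self, input_ids):
        x = self.embed(input_ids)              # (B, T, D): embed the prompt
        out, h = self.core(x)                  # encode the prompt
        latent = h.transpose(0, 1)             # (B, 1, D): last hidden state
        # Latent reasoning: recycle the hidden state as the next input,
        # never projecting to vocabulary tokens in between.
        for _ in range(self.n_latent_steps):
            latent, h = self.core(latent, h)
        # Only the final latent state is decoded into an answer distribution.
        return self.lm_head(latent.squeeze(1))  # (B, vocab_size)

model = TinyLatentReasoner()
logits = model(torch.randint(0, 100, (2, 10)))  # batch of 2 toy prompts
print(logits.shape)                              # torch.Size([2, 100])
```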
Overall, very cool. Probably not “the future”. More research in latent space reasoning would be very welcome.