Comment by scribu
3 months ago
Would be curious to know how this stacks up against Coconut [1], which also uses latent space for reasoning.
Definitely curious; this looks very similar to Coconut, even down to the CoT encoding process in Figure 2. They go into a lot more detail, though. Seems like parallel innovation.
I wonder whether even the models that emit thinking tokens actually do most of the work in latent space, so that the difference is only superficial.
I'm behind on reading, but don't all models use continuous embeddings to represent reasoning?
I believe the "continuous" in Coconut means that the CoT itself lives in the continuous latent space, instead of in the output tokens (see Fig. 1): the model's last hidden state is fed back as the next input embedding rather than being decoded into a token.
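To make the distinction concrete, here's a minimal sketch of the continuous-CoT idea. This is not Coconut's actual code: the model name, the fixed number of latent steps, and the loop structure are all illustrative assumptions; Coconut trains its models specially for this.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical stand-in model; works because GPT-2's hidden size equals
# its embedding size, so hidden states are shape-compatible as inputs.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Question: what is 17 * 24?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

# Discrete CoT would decode a token each step and re-embed it.
# Continuous CoT skips the vocabulary: the last hidden state is
# appended directly as the next input embedding.
num_latent_steps = 4  # assumption: a fixed budget of latent "thoughts"
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        last_state = out.hidden_states[-1][:, -1:, :]  # final position
        inputs_embeds = torch.cat([inputs_embeds, last_state], dim=1)

# After the latent steps, one would switch back to ordinary token
# decoding to emit the visible answer.
```

The point of the sketch is just the feedback path: the reasoning steps never pass through a softmax over the vocabulary, which is what distinguishes this from models that merely *compute* in continuous embeddings but still commit to a discrete token at every step.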