Comment by scribu · 3 months ago

Would be curious to know how this stacks up against Coconut [1] which also uses latent space for reasoning.

[1] https://arxiv.org/abs/2412.06769

Definitely curious. This looks very similar to Coconut, even down to the CoT encoding process in Figure 2. They go into a lot more detail, though; it seems like parallel innovation.

I wonder whether even the models that do emit thinking tokens actually do most of the work in latent space, so that the difference is only superficial.

I'm behind on my reading, but don't all models use continuous embeddings to represent reasoning?

  • I believe the "continuous" in Coconut means that the CoT lives in the continuous latent space, rather than in output tokens (see Fig. 1); a rough sketch of the difference is below.
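
    To make that concrete, here is a rough sketch of the distinction using a stock GPT-2 from Hugging Face transformers (an illustration, not the Coconut code): a token-space CoT decodes a token and re-embeds it at each step, while a Coconut-style continuous CoT skips decoding and feeds the last hidden state straight back in as the next input embedding.

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2")
        model.eval()

        prompt = "Question: 2 + 3 * 4 = ?"
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        embeds = model.get_input_embeddings()(input_ids)  # (1, seq_len, hidden)

        with torch.no_grad():
            out = model(inputs_embeds=embeds, output_hidden_states=True)

            # Token-space CoT: decode a discrete token, then re-embed it for the
            # next step; this is the visible "thinking token".
            next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)
            token_cot = torch.cat(
                [embeds, model.get_input_embeddings()(next_token)], dim=1
            )

            # Continuous CoT (Coconut-style): skip decoding entirely and append
            # the last hidden state as the next input embedding.
            last_hidden = out.hidden_states[-1][:, -1:, :]  # (1, 1, hidden)
            latent_cot = torch.cat([embeds, last_hidden], dim=1)

        # Either sequence can be fed back through the model for another step;
        # only the token-space variant produces human-readable intermediate tokens.

    The point of the latent variant is that the "thought" is never collapsed into a discrete token, so nothing readable is emitted during those steps.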