Comment by antirez
3 months ago
Cool, but isn't this encoding a potentially very long thinking process into a fixed embedding? Intuitively should not work as well.
But don't words have a fixed-size embedding too? Causal models produce a sequence of attended word embeddings, only about 300 dimensions per word, so it seems counterintuitive that you could compress an entire reasoning chain into a single vector that small.
That's already the case with visible text. There's an embedding inside the model as it spits out the next token.
Sure, but with visible text the model has multiple thought tokens in its context when it processes the next token.
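(A minimal sketch of the capacity gap this thread is pointing at; the dimensions are illustrative assumptions, not figures from the paper:)

    import numpy as np

    d = 300    # assumed embedding size per token
    T = 2000   # assumed length of a long reasoning chain, in tokens

    # With a visible chain of thought, attention can read every hidden
    # state: T * d floats are available when predicting the next token.
    visible_context = np.random.randn(T, d)
    print(visible_context.size)   # 600000 floats

    # Collapsing the whole chain into one fixed embedding leaves d floats,
    # no matter how long the reasoning was (mean pooling as a stand-in
    # for whatever compression the paper actually uses).
    compressed = visible_context.mean(axis=0)
    print(compressed.size)        # 300 floats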