Comment by antirez
3 months ago
Cool, but isn't this encoding a potentially very long thinking process into a fixed embedding? Intuitively should not work as well.
But don't words have a fixed-size embedding too? Causal models produce a sequence of attended word embeddings, only about 300 dimensions per word, so it seems counterintuitive that you could compress an entire reasoning chain into a single vector that small.
That's already the case with visible text. There's an embedding inside the model as it spits out the next token.
Sure, but with visible text the model has multiple thought tokens in its context when it processes the next token.
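(A minimal sketch of the capacity gap this thread is pointing at; the dimensions are illustrative assumptions, not figures from the paper:)

    import numpy as np

    d = 300    # assumed embedding size per token
    T = 2000   # assumed length of a long reasoning chain, in tokens

    # With a visible chain of thought, attention can read every hidden
    # state: T * d floats are available when predicting the next token.
    visible_context = np.random.randn(T, d)
    print(visible_context.size)   # 600000 floats

    # Collapsing the whole chain into one fixed embedding leaves d floats,
    # no matter how long the reasoning was (mean pooling as a stand-in
    # for whatever compression the paper actually uses).
    compressed = visible_context.mean(axis=0)
    print(compressed.size)        # 300 floats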