Comment by pona-a
3 months ago
But don't words have a fixed-size embedding? Causal models produce a sequence of attended word embeddings. It's only about 300 dimensions per word, so it seems counterintuitive that you can compress an entire reasoning chain into such a small vector.
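(A minimal sketch of the point the comment raises, assuming the Hugging Face `transformers` library and GPT-2, neither of which the comment names: each token's hidden state is indeed a fixed-size vector, but the model carries the chain in the whole sequence of such vectors, which grows with input length.)

```python
# Sketch: per-token representations are fixed-size; the sequence is not.
# Assumes: pip install transformers torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tokenizer("Let's reason step by step.", return_tensors="pt")
outputs = model(**inputs)

# Shape is (batch, sequence_length, hidden_size), e.g. (1, 7, 768):
# every token gets a 768-dim vector, and the "reasoning chain" lives
# across the full sequence of these vectors, not inside any single one.
print(outputs.last_hidden_state.shape)
```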