Comment by froobius
3 months ago
(Just to expand on that, it's true not just for the first token. There's a lot of computation, including potentially planning ahead, before each token is output.)
That's why saying "it's just predicting the next word" is a misguided take.