Comment by majormajor
2 hours ago
> Complicated-enough LLMs also are absolutely doing a lot more than "just trying to predict the next word", as Anthropic's papers investigating the internals of trained models show - there's a lot more decision-making going on than that.
Are there newer architectures that actually predict tokens out of order or the like, or is this a case of immense internal model state tracking that's still used to drive the prediction of a single next token, one at a time?
(Wrapped in a variety of tooling/prompts/meta-prompts to further shape what sorts of paragraphs are produced compared to ye olden days of the gpt3 chat completion api.)
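For context on the "one at a time" part of the question, here's a minimal sketch of plain greedy autoregressive decoding, using Hugging Face's GPT-2 purely as a stand-in (the model choice and token count are arbitrary, and this says nothing about what any newer frontier model does internally): the model re-reads the whole prefix, updates its internal state, and still emits exactly one next token per step.

```python
# Sketch of a vanilla autoregressive decoding loop (greedy), not anyone's
# production setup. Assumes the `transformers` and `torch` packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The internals of trained models show", return_tensors="pt").input_ids
for _ in range(20):                      # generate 20 tokens, one per step
    logits = model(ids).logits           # forward pass over the full prefix
    next_id = logits[0, -1].argmax()     # single most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```

However much machinery sits inside that forward pass (or in the tooling wrapped around it), the loop itself still only ever commits to the next token.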