Comment by ryougi
6 hours ago
>it’s disingenuous to say the inference is on the next token because it’s actually not; it’s in the model’s parameter space across a set of nonlinear activation functions, then effectively projected into the token. The idea that it’s predictive of the token isn’t actually the case; it really is a much more complex and more semantic relationship
Do you, or anyone reading, have any worthwhile links that make a strong case for this (that there is a stronger semantic relationship than simple next-token prediction)? I would like to read more about this.