Comment by quickthrower2
2 years ago
> I get more understanding out of "it's a kind of kernel smoothing" than "it's as though an associative array were continuous", that doesn't mean everyone will, or even that many people will. (My educational trajectory was weird.) But the sheer opacity of this literature is I think a real problem. (Cf. Phuong and Hutter 2022.)
As a non-ML person but a programmer, the key, value, query concepts made more sense to me. But I admit I don’t fully get why it works, other than “lots of neurons training on how every combo of tokens relate to each other.”
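For what it's worth, here is a minimal sketch of how the two readings can line up, assuming plain dot-product similarity followed by a softmax. The names `soft_lookup` and `temperature` and the toy keys and values are made up for illustration and do not come from any library or from the quoted text. With a very low temperature the lookup behaves roughly like an ordinary dictionary hit on the best-matching key; with a query that matches several keys equally it returns an average of their values, which is the smoothing reading.

```python
import math

def soft_lookup(query, keys, values, temperature=1.0):
    """Return the softmax-weighted average of `values`, weighted by query-key similarity."""
    # Dot-product similarity between the query and each key (hypothetical choice of kernel).
    scores = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
    # Softmax over the scores, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the value vectors, one output per value dimension.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Toy "associative array": two keys, each mapped to a one-dimensional value.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]

# Low temperature: nearly all weight lands on the first key, like an exact lookup.
sharp = soft_lookup([1.0, 0.0], keys, values, temperature=0.01)

# Query equally similar to both keys: the result is the average of both values.
blended = soft_lookup([1.0, 1.0], keys, values)
```

In a real transformer the queries, keys, and values are learned projections of the token embeddings, which this sketch leaves out entirely.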