Comment by benreesman
2 years ago
word2vec is a very cool result, but suggestive at best for how a modern LLM works. in fact the king + country - queen + capital vector math you’re referring to is pretty much a direct consequence of excluding nonlinearities from the SGD-driven matrix factorization process.
not everything done via SGD is a ReLU :)
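
to make the linearity point concrete, here is a toy sketch. everything in it is an assumption for illustration: the vectors are hand-made (not trained word2vec embeddings), and the attribute names and the `offset` / `analogy` helpers are hypothetical. it only shows that when word vectors are plain sums of attribute directions, analogy arithmetic works out exactly.

```python
# Toy illustration only: hand-made vectors, NOT trained word2vec embeddings.
# Assumption: each word vector is a plain sum of attribute directions,
# i.e. a purely linear model with no nonlinearity anywhere.
ATTRS = {
    "royal":  (1.0, 0.0, 0.0),
    "male":   (0.0, 1.0, 0.0),
    "female": (0.0, 0.0, 1.0),
}

def combine(*names):
    """Sum the named attribute vectors component-wise."""
    return tuple(sum(ATTRS[n][i] for n in names) for i in range(3))

EMB = {
    "king":  combine("royal", "male"),
    "queen": combine("royal", "female"),
    "man":   combine("male"),
    "woman": combine("female"),
}

def offset(a, b, c):
    """Compute EMB[a] - EMB[b] + EMB[c] component-wise."""
    return tuple(x - y + z for x, y, z in zip(EMB[a], EMB[b], EMB[c]))

def analogy(a, b, c):
    """Return the word whose vector is nearest (squared Euclidean) to the offset."""
    target = offset(a, b, c)
    return min(EMB, key=lambda w: sum((p - q) ** 2 for p, q in zip(EMB[w], target)))

if __name__ == "__main__":
    print(analogy("king", "man", "woman"))
```

because the shared "royal" component survives the subtraction untouched, the offset lands exactly on the queen vector here. real embeddings are noisier, so this is only a cartoon of the argument.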