Comment by benreesman

2 years ago

word2vec is a very cool result, but it's suggestive at best of how a modern LLM works. in fact, the king + country - queen + capital vector math you’re referring to is pretty much a direct consequence of excluding nonlinearities from the SGD-driven matrix factorization process.

not everything done via SGD is a ReLU :)
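
to make the linearity point concrete, here is a toy sketch (not word2vec itself, and nothing here is trained). it assumes each word vector is just a sum of a few latent directions; the names `royal`, `male`, `female`, the 8 dimensions and the `nearest` helper are all made up for illustration. under that purely linear assumption the analogy arithmetic falls out by construction:

```python
import numpy as np

# Hypothetical latent directions, drawn at random for illustration only.
# In a real model these would be learned; here they are assumptions.
rng = np.random.default_rng(0)
royal, male, female = rng.normal(size=(3, 8))

# Purely linear composition: every word is a sum of latent directions,
# with no nonlinearity applied anywhere.
vocab = {
    "king": royal + male,
    "queen": royal + female,
    "man": male,
    "woman": female,
}

def nearest(vec):
    """Return the vocab word with the highest cosine similarity to vec."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(vocab, key=lambda word: cosine(vec, vocab[word]))

# (royal + male) - male + female == royal + female, up to float rounding.
target = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(target))  # queen
```

the offsets cancel only because everything is additive. put a nonlinearity between the latent directions and the final vectors and that cancellation is no longer guaranteed.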