Comment by nerdponx

19 days ago

The difference is that Word2Vec "learned" these relationships auto-magically from the patterns of surrounding words in the contexts where they appear in written text. Don't forget that this was a revolutionary result at the time, and the techniques involved were novel. Word2Vec is, in many ways, the foundation of modern LLMs.

I can't edit my own post, but there are two other big differences between the Prolog example and the Word2Vec example.

1. The W2V example is approximate. Not "fuzzy" in the sense of fuzzy logic: I mean that Man, Woman, Queen, and King are all essentially just arrows pointing in different directions in a high-dimensional space. Summing vectors is roughly like averaging their directions, so "King - Man" is a kind of anti-average, and "King - Man + Woman" then averages that intermediate thing with "Woman", which just so happens to land on a direction very close to that of "Queen" (see the sketch after this list). This is, again, entirely emergent from the algorithm and the training data. It's also probably a non-representative, cherry-picked example, but other commenters have gone into detail about that and it's not the point I'm trying to make.

2. In addition to requiring hand-crafted rules, any old-school logic programming system has to run some kind of unification or backtracking algorithm to obtain a solution. Meanwhile, here we have vector arithmetic, which is probably one of the fastest things you can do on modern computing hardware, not to mention being linear in time and space. Not a big deal in this example, but it could be quite a big deal in bigger applications.
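To make point 1 concrete, here's a rough numpy sketch. The vectors below are hand-made toy values purely for illustration (real Word2Vec embeddings are learned from text and live in hundreds of dimensions), but the mechanics are the same: the "analogy" is literally a subtraction, an addition, and a nearest-neighbour lookup by cosine similarity.

```python
import numpy as np

# Toy, hand-made "embeddings" for illustration only -- real Word2Vec
# vectors are learned from a corpus, not written down by hand.
vocab = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.1, 0.2]),
    "woman": np.array([0.1, 0.1, 0.9, 0.2]),
    "apple": np.array([0.0, 0.1, 0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: how closely two "arrows" point in the same direction.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "King - Man + Woman": plain vector arithmetic, no rules, no search.
target = vocab["king"] - vocab["man"] + vocab["woman"]

# Nearest neighbour by cosine similarity, excluding the query words themselves.
candidates = {w: v for w, v in vocab.items() if w not in ("king", "man", "woman")}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(best)  # "queen" with these toy vectors
```

Note there's no unification or backtracking anywhere in that lookup: it's a handful of additions, multiplications, and comparisons, which is the point of item 2.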

And yes, you could have some kind of ML/AI system emit a Prolog program or equivalent, but again, that's a totally different topic.