Comment by marcinzm

2 years ago

> Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it

So it's not high-quality web pages but web pages that people talk about a lot, which is expected, since no one has an oracle that says what high quality is. The embeddings are merely a proxy for, and a generalization of, "how links are talked about on the Internet." That can be gamed at scale, just like every other signal any popular search engine has been based on.