
Comment by marcinzm

2 years ago

How is that different from keywords? Embeddings aren't magic; they're just a representation of page content. Content is trivial to game since it's controlled by the website owner.

edit: From my quick QA, the results are also not that great. Searching for "what is the best mouse to buy" returns links to buy random mice rather than review summaries or online discussions of mice. One of the recommended queries, "Here is a great fun concert in San Francisco", returns some really bizarre results in non-English languages that have nothing to do with either SF or concerts.

edit2: Also, Google has been using LLMs as part of their search since at least 2018, so it's definitely not just keyword matching there.

Yup, definitely still gameable, but if the model learns what high-quality content looks like and which high-quality webpages exist (which it does), then the only way to game it would be to be great :)

For your search, I would recommend turning autoprompt off and searching for something like "Here is a great summary of the best computer mice to use:".

Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to phrase your query the way someone would describe a link right before sharing it.
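To make the querying advice concrete, here is a toy sketch of ranking links by how they are described, assuming the index stores the text people write around each link. It is not the actual model: `toy_embed` is a hypothetical bag-of-words stand-in for a learned embedding, and the URLs and link descriptions are made up for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for a learned embedding model:
    # a sparse bag-of-words vector of lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def as_link_prompt(topic: str) -> str:
    # Phrase the query the way someone might introduce a link before sharing it.
    return f"Here is a great summary of {topic}:"


def rank(query: str, link_texts: dict[str, str]) -> list[str]:
    # link_texts maps each URL to the (hypothetical) text people used when sharing it.
    q = toy_embed(query)
    return sorted(link_texts, key=lambda url: cosine(q, toy_embed(link_texts[url])), reverse=True)


if __name__ == "__main__":
    contexts = {
        "https://example.com/shop": "Buy cheap mice now, free shipping",
        "https://example.com/review": "Here is a great summary of the best computer mice to use",
    }
    print(rank(as_link_prompt("the best computer mice to use"), contexts))
```

A query phrased like a link introduction lines up with how the review link was described, so it ranks first. A learned embedding generalizes beyond exact word overlap, but the incentive is the same: match the way people talk about the link.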

  • > Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it

    So it's not high-quality web pages but web pages that people talk about a lot, which is expected since no one has an oracle that says what high quality is. The embeddings are merely a proxy for, and generalization of, "how links are talked about on the Internet." That can be gamed at scale just like every other signal any popular search engine has been based on.