Comment by jeffreyw128

2 years ago

The issue with traditional search engines is that keyword-first algorithms are extremely gameable.

Try https://search.metaphor.systems - it's fully neural embeddings-based search. No keywords, only an embedding of what the actual content of a webpage is.
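For anyone wondering what "an embedding of the content" means mechanically, here is a minimal sketch. It is not Metaphor's actual pipeline: the page vectors are made-up three-dimensional numbers standing in for the output of a real neural encoder, the `.example` hostnames are placeholders, and `cosine` and `search` are hypothetical helper names. The point is only that ranking is nearest-neighbor by vector similarity, with no keyword matching step.

```python
import math

# Hypothetical, hand-written vectors. A real system would get these
# from a neural encoder run over each page; these numbers are invented.
PAGE_EMBEDDINGS = {
    "video-downloader.example": [0.9, 0.1, 0.0],
    "mouse-reviews.example": [0.1, 0.9, 0.1],
    "sf-concerts.example": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, pages, top_k=2):
    """Return the top_k page names ranked by similarity to the query vector."""
    ranked = sorted(
        pages,
        key=lambda name: cosine(query_embedding, pages[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Assumed embedding of a query about downloading videos (also invented).
query = [0.8, 0.2, 0.1]
print(search(query, PAGE_EMBEDDINGS))
```

In this toy setup the downloader page ranks first simply because its vector points in nearly the same direction as the query vector.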

So in the mentioned example of searching for Youtube downloaders, with Metaphor you'll get only Youtube downloaders (https://search.metaphor.systems/search?q=This%20is%20the%20b...)

Full disclosure - I work there :p

How is that different from keywords? Embeddings aren't magic, they're just page content. Content is trivial to game since it's controlled by the website owner.

edit: From my quick QA, the results are also not that great. Searching for "what is the best mouse to buy" leads to links to buy random mice rather than review summaries or online discussions about mice. One of the recommended queries, "Here is a great fun concert in San Francisco", leads to some really bizarre results in non-English languages that have nothing to do with either SF or concerts.

edit2: Also, Google has been using LLMs as part of their search since at least 2018, so it's definitely not just keyword matching there.

  • Yup, definitely still gameable, but if the model learns what high-quality content looks like and which webpages are high quality (which it does), then the only way to game it would be to be great :)

    For your search - I would recommend turning autoprompt off and searching something like "Here is a great summary of the best computer mice to use:".

    Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it

    • > Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it

      So it's not high-quality web pages but web pages that people talk about a lot, which is expected since no one has an oracle that says what high quality is. The embeddings are merely a proxy for, and a generalization of, "how links are talked about on the Internet." That can be gamed at scale just like every other signal any popular search engine has been based on.

The first result vtubego.com is a 144MB downloader app. The page contains "Pricing Plans Lorem ipsum dolor sit amet, placerat verterem luptatum phaedrum vis, impetus mandamus id vix fabulas vim." above its 3 paid plans (there is no free plan).

I haven't installed the downloader app, so I'm not sure if it lets me download YouTube videos for free.

The second result "ytder.com" is a redirect to "https://poperblocker.com/edge/" which seems to be a browser extension for Microsoft Edge that protects the user from the Holy See. I'm not using Edge and I'm trying to download a YouTube video.

The third result download-video.net says that it can download videos from a list of sites. YouTube is not in the list, but let's try anyway. If you put "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text box and click "download" you get "500 SyntaxError: Unexpected token '<', ""

At this point I gave up, but please let me know if any of the results work.

This is excellent!

Definitely excited to see how it holds up to daily use.

So far it gave me exactly what I wanted at the top for all of my test queries that were well formed.

As for asking “ignorant” questions, both your service and the goog failed where phind gave me an actionable starting point (after a prodding follow-up question: https://www.phind.com/search?cache=hmul4znpn7y4ei6qa64fosmc )

“max-height like css property for top and left”

Unsure if this sort of thing is even a goal of your project, but you won over a new user.

Wish you and your team all the best.

> with Metaphor you'll get only Youtube downloaders

I clicked into the top 5 results and none of them were real YouTube downloaders that worked, so I clicked the next 5 and finally got one single (really slow) downloader that worked. That's 1 out of the top 10 results.

https://getthatvideo.com/ is the first result for downloading YouTube videos. Seems super sus (especially since the site doesn’t load).

Auto-prompted to: "Here's a helpful website for downloading YouTube videos:"

Also, this result is horrible:

“What does it mean if someone is not covered in nfl football?”

>it's fully neural embeddings-based search. No keywords, only an embedding of what the actual content of a webpage is.

What prevents websites from gaming their embedding? Switching to a similarity search doesn't prevent the results from being gamed.
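To make the concern concrete, here is a rough sketch under loud assumptions: `toy_embed` is a hashed bag-of-words stand-in, not a neural encoder and not anything Metaphor is known to use, and the page texts are invented. It only illustrates that when a vector is computed from page text, whoever edits the text can pull the vector toward a target query.

```python
import math
import zlib

DIM = 64  # arbitrary size for the toy vector


def toy_embed(text):
    """Toy stand-in for an encoder: hash each word into one of DIM buckets."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


query = "helpful website for downloading youtube videos"

# Invented page texts: the second is the first plus copy aimed at the query.
honest_page = "browser extension that blocks popups in edge"
gamed_page = honest_page + " helpful website for downloading youtube videos"

honest_score = cosine(toy_embed(query), toy_embed(honest_page))
gamed_score = cosine(toy_embed(query), toy_embed(gamed_page))

print(f"honest page similarity: {honest_score:.3f}")
print(f"gamed page similarity:  {gamed_score:.3f}")
```

A learned model that judges quality may be harder to fool than this toy, but the attack surface is the same: the site owner controls the input to the embedding function.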

How do you deal with dynamically/contextually generated content? And how about paywalls and login-required content?

  • We do our best to get the right content.

    For paywalls/login - we play pretty straight, always obey robots.txt, etc.
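How Metaphor's crawler checks robots.txt isn't stated here, so the following is only a sketch of what "obey robots.txt" can look like using Python's standard `urllib.robotparser`. The robots.txt contents, the `site.example` URLs, the `example-crawler` user agent, and the `allowed` helper are all made up for illustration. The rules are parsed from an in-memory string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt for a hypothetical site with member-only and paywalled areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /paywall/
"""


def allowed(url, user_agent="example-crawler"):
    """Return True if the rules in ROBOTS_TXT permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("https://site.example/blog/post"))
print(allowed("https://site.example/members/feed"))
```

A crawler following this approach would skip any URL for which `allowed` returns `False` instead of trying to get around the login or paywall.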