Comment by marginalia_nu
2 years ago
While I've made huge improvements to the algo recently, I do think Marginalia Search got a bit lucky with the sample queries, as it is still IMO far more hit and miss than many alternatives, but that also speaks for how hard evaluating search quality is.
Its efficacy is also strongly dependent on understanding that it's a keyword search engine with no semantic understanding.
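To make "keyword search with no semantic understanding" concrete, here is a toy sketch. It is not Marginalia's actual implementation; the documents, `build_index`, and `search` are all made up for illustration. It only shows that a pure keyword engine returns documents containing the literal query terms and nothing else.

```python
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace (no stemming, no synonyms).
    return text.lower().split()


def build_index(docs):
    # Inverted index: term -> set of ids of the documents containing that term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # AND semantics: a document matches only if it contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents, purely for illustration.
docs = {
    1: "how to block ads with ublock origin",
    2: "adblock alternatives for firefox",
    3: "download videos with yt-dlp",
}
index = build_index(docs)

print(search(index, "ublock origin"))  # matches the document with both words
print(search(index, "ad blocker"))     # matches nothing: neither literal term appears
```

The second query is the failure mode to keep in mind: the engine has no idea that "ad blocker" and "ublock origin" are related, so the query has to use the words the page itself uses.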
> Its efficacy is also strongly dependent on understanding that it's a keyword search engine with no semantic understanding.
Good. I love keyword search.
"Semantic understanding" can be so biased and ... just shady sometimes.
It's tricky though. I think a lot of people think they want raw keyword search, but what they really want is a search experience that makes intuitive sense.
If you lean too much into embeddings and so on, it's easy to get errors that don't make sense to a human being. It's extremely frustrating when you experience "I typed X, why am I getting results about Y?!"
That said, I think there's a sweet spot with some magic, where it genuinely just makes search better. But it's like perfume: if it's immediately obvious that it's there, it's probably a fair bit too much.
Keyword search leads to things like every website stuffing meaningless words into its meta tags so that it would get picked up by AltaVista.
Not if it's done right. Source: I made my own search engine.
> [...] but that also speaks for how hard evaluating search quality is.
Would you be able to share some of your personal highlights regarding this?
I've partially kept up-to-date with the DIY, non-corporate search space (YaCy and friends). I'd love to understand a bit more behind the engineering decisions made when creating a search engine; it seems like a very hard problem to solve.
P.S. Marginalia is a very impressive piece of work, overall -- I've heard nothing but positive remarks from users on here. I've been meaning to try it for a while, but time constraints have... well, constrained, thus far.
I just tested Marginalia and it was completely unable to lead me to a Wikipedia or IMDb page when searching for "driver ryan gosling" and variations. It just listed lots of random articles.
That... is kind of the point of this particular search engine.
> This is an independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren't aware of in favor of the sort of sites you probably already knew existed.
Honestly I understand it well enough that I see it is surprisingly hard, but not enough to have good solutions...
Just my feedback after trying to finally get to what it is exactly.
I tried to find Marginalia on DDG; it's not on the first page. Google has it after some garbage. If I go to marginalia.nu I get an SSL error. search.marginalia.nu works.
If I search on Marginalia for duckduckgo, the first link is somewhat relevant but is about the app; all the other links are related to DDG but of curious relevance.
If I search for ublacklist, mentioned above, I do not see anything directly relevant.
Hmm, what's your browser? I renewed the cert today... Only thing I can think of is that it might not like a wildcard cert for the bare marginalia.nu domain.
Safari doesn't like https://marginalia.nu. Probably because *.marginalia.nu is not valid for the base domain. Add the bare domain as a Subject Alternative Name.
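For anyone wondering why the wildcard doesn't cover the bare domain, here is a rough sketch of the matching rule. It is a simplified version of the leftmost-label wildcard matching that TLS clients generally apply, not any browser's actual code; `matches_hostname` and `cert_covers` are hypothetical helper names. The wildcard stands in for exactly one label, so it can never match a name with fewer labels.

```python
def matches_hostname(pattern, hostname):
    # Simplified rule: "*" may only replace the whole leftmost label,
    # and it stands for exactly one label.
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard label must correspond to a non-empty label;
        # the remaining labels must match exactly.
        return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def cert_covers(san_entries, hostname):
    # A certificate covers a hostname if any of its SAN entries matches.
    return any(matches_hostname(entry, hostname) for entry in san_entries)


print(cert_covers(["*.marginalia.nu"], "search.marginalia.nu"))           # True
print(cert_covers(["*.marginalia.nu"], "marginalia.nu"))                  # False
print(cert_covers(["*.marginalia.nu", "marginalia.nu"], "marginalia.nu")) # True
```

So a cert with only the wildcard entry works for search.marginalia.nu but not for marginalia.nu; listing both names as SAN entries covers both.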
Hi, your encyclopedia experiment(?) is also very inspiring. I really think it works, it makes it much easier to read the articles.
Firefox on Android.
I notice you completely avoid the question on how a single developer can do so well ;)
I do think that search has gotten much worse but my ability to know the magic words like “ublock origin” instead of “Adblock” and “yt-dlp” instead of “download YouTube” and phrase my search has gotten better.
We’ve all been doing prompt engineering against the Internet-wide LLM that is the spam houses.
> I notice you completely avoid the question on how a single developer can do so well ;)
As much as I enjoy the notion of somehow being a 10,000X developer, it's probably mostly that modern search is a filtering problem, and Marginalia Search does filtering fairly well.