Comment by hootz

10 hours ago

That's why Kagi is the only subscription I don't actively think about cancelling. For the love of god, keep me away from Google and all of THAT. If Kagi goes down the same path, I'll self-host something or just return to monkey and use link indexes and the favorites list + the native search of websites.

The boss man got a few of us Kagi gift subscriptions/credits earlier this year, after we'd been talking about wanting to try it. Before that I used Ecosia, which I also considered pretty good, but Kagi compared to everything else is just night and day.

I'd been pretty sceptical about Kagi, feeling that it was a bit too expensive, that it perhaps relied too much on other companies' indexes, and that I'd spend too much time checking how many searches I had left. After getting the subscription I just don't want to go back; the price is perfectly reasonable for the value. Being able to just search again, without sorting through junk, spam and ads, and just getting the pages I want and need is amazing.

Honestly, it's a slightly weird feeling to look at the results from Kagi and notice it found exactly what you were looking for.

Once my gifted credits run out, that is going to be an easy renewal for me. I do not want to go back, even if I think Ecosia is a good option.

  • It's amazing how clear the manipulation and enshittification of Google's results are when you search the same thing with Kagi or even just another random search engine. Ecosia seems cool too, will keep an eye on it in case anything happens with Kagi.

Self-hosting a web search engine is probably quite a feat, though.

  • It's actually not that hard now, once you get useful content. When I worked on Search (~2009ish), the primary index was called 4BBase, because it was the top 4 billion webpages (actually more like 5.5B during my time, but it had been around for a few years). A typical webpage is about 100 KB, and HTML compresses by 80-90%, so you're looking at 10-20 KB per page. For ~5 billion pages, the index would take about 50-100 TB.

    Even after the recent AI run-up, disk prices are about $20/TB for a 20 TB drive, so you can store this index on 3-5 hard disks that will cost you about $1,200-2,000. For self-hosted use you don't need to serve queries in 50 ms, so you don't need to put the whole thing in RAM like Google did; you can serve off disk.

    Elasticsearch uses basically the same data structures and gives you the same infrastructure that Google's late-2000s search stack did, and is actually more advanced in some respects (like ad hoc queries, debuggability, and updatability), so software isn't much of an issue.

    The big missing piece, which can't really be replicated today, is the huge web of authentic hyperlinks. Google was so good at search because many humans effectively "tagged" a given webpage with a series of short, descriptive words and phrases: the anchor text of their links to it. When someone searched for a page, Google could mine this huge treasure trove of backlinks to identify exactly what the page was good for, even if those search terms never appeared on the page. SEO and link farms kinda killed this, as did the rise of social media walled gardens, so the Google of 2009 basically wouldn't work today anyway. Maybe if you pulled old versions of Common Crawl or archive.org you could reconstruct it, but the relevant pages are often offline today anyway.

    • If an ex-Googler compares Elasticsearch to the old company's stack, then it must be something good.

  • You can self-host Marginalia [1] or Hister [2], for example. Takes up some space, but it's totally doable. Your biggest problem (assuming you have disk space) will be crawling.

    [1]: https://github.com/MarginaliaSearch/MarginaliaSearch

    [2]: https://github.com/asciimoo/hister

    • Emphasis on "doable".

      At least for a more general-purpose web search, it requires dedicated hardware, which is pretty costly. Marginalia's production server cost about $20k back when RAM and SSDs were cheap. Before that it ran on $5k of PC hardware, but that was very limiting.

      So no data center, but at the same time, not everyone has that sort of cash to throw around.
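
The storage estimate in the ex-Googler's comment can be checked with a few lines of Python. This is a back-of-envelope sketch that uses only the figures quoted in the thread (about 5 billion pages, ~100 KB of raw HTML per page, 80-90% compression, 20 TB drives at roughly $20/TB). The function names are made up for illustration, and real index overhead (postings lists, metadata, replication) is ignored.

```python
import math

PAGES = 5_000_000_000   # ~5B pages ("4BBase", closer to 5.5B in practice)
RAW_PAGE_KB = 100       # typical raw HTML page, per the thread
DRIVE_TB = 20           # one 20 TB hard disk
PRICE_PER_TB = 20       # USD per TB, per the thread


def compressed_kb(raw_kb: int, saving_pct: int) -> int:
    """Per-page size after compression that saves `saving_pct` percent."""
    return raw_kb * (100 - saving_pct) // 100


def index_size_tb(pages: int, page_kb: int) -> float:
    """Total size in decimal TB (1 TB = 10**9 KB)."""
    return pages * page_kb / 10**9


def drives_and_cost(size_tb: float) -> tuple[int, int]:
    """Whole drives needed to hold `size_tb`, and their total price in USD."""
    drives = math.ceil(size_tb / DRIVE_TB)
    return drives, drives * DRIVE_TB * PRICE_PER_TB


if __name__ == "__main__":
    for saving in (90, 80):
        page_kb = compressed_kb(RAW_PAGE_KB, saving)
        size = index_size_tb(PAGES, page_kb)
        drives, cost = drives_and_cost(size)
        print(f"{saving}% saving: {page_kb} KB/page, {size:.0f} TB, "
              f"{drives} x {DRIVE_TB} TB drives, ${cost}")
```

Running it reproduces the numbers in the comment: 10-20 KB per page, 50-100 TB in total, and 3-5 drives costing about $1,200-2,000.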
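
The backlink point in the same comment says that the words other people use when linking to a page describe that page, even if the page itself never contains them. That idea can be sketched as a tiny anchor-text index. This is a toy under loudly stated assumptions: the link list and the example.org URLs are hypothetical, and ranking is just a count of matching anchor terms. It is not how Google actually weighted links (PageRank and many other signals are ignored).

```python
from collections import Counter, defaultdict

# Hypothetical link graph: (linking page, target URL, anchor text).
# Note that the target pages' own content is never consulted.
LINKS = [
    ("blog-a", "https://example.org/signal-notes", "fourier transform tutorial"),
    ("forum-b", "https://example.org/signal-notes", "great fft tutorial"),
    ("wiki-c", "https://example.org/signal-notes", "fourier transform explained"),
    ("blog-d", "https://example.org/cooking", "bread recipe"),
    ("blog-e", "https://example.org/fft-lib", "fft library"),
]


def build_anchor_index(links):
    """Map each anchor-text term to a Counter of target URL -> number of links using it."""
    index = defaultdict(Counter)
    for _source, target, anchor in links:
        for term in set(anchor.lower().split()):
            index[term][target] += 1
    return index


def search(index, query):
    """Rank target URLs by the summed anchor-term counts over the query terms."""
    scores = Counter()
    for term in query.lower().split():
        scores.update(index.get(term, Counter()))
    return [url for url, _score in scores.most_common()]


ANCHOR_INDEX = build_anchor_index(LINKS)

if __name__ == "__main__":
    print(search(ANCHOR_INDEX, "fft tutorial"))
```

Here "fft tutorial" ranks the signal-notes page first purely because three other pages linked to it with those words. This is also why link farms that fabricate anchor text could poison the signal.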