Comment by direwolf20

18 hours ago

I hope they cache search results to further reduce the number of calls to Google.

And Marginalia Search was not mentioned? Marginalia Search says they are licensing their index to Kagi. Perhaps it's counted under "Our own small-web index" which is highly misleading if true.

There is a practical limit: we can't cache results for too long, because search engine users are particularly sensitive to stale data, especially around current events. Without a holistic and reliable way to know when the cache ought to be invalidated, our caching is mostly focused on mitigating "abuse", e.g., one person or a group of people spamming the same search in a short timespan; there is no sense in repeating all those upstream calls.
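As a rough illustration of that kind of short-lived, burst-absorbing cache (a sketch only, not Kagi's actual code), the idea could look like this in Python. The class name `ShortTTLCache`, the `fetch` callback, and the 30-second window are all assumptions made up for the example.

```python
import time
from typing import Callable, Dict, List, Tuple


class ShortTTLCache:
    """Hypothetical sketch: share one upstream call across identical
    searches that arrive within a short window."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        ttl_seconds: float = 30.0,  # assumed value; short to avoid stale results
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps normalized query -> (time stored, results).
        # Unbounded here for brevity; a real cache would also evict.
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def search(self, query: str) -> List[str]:
        # Normalize case and whitespace so trivially different
        # spellings of the same query share a cache entry.
        key = " ".join(query.lower().split())
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the upstream call
        results = self._fetch(query)
        self._entries[key] = (now, results)
        return results
```

Because entries expire after a fixed, short interval, no invalidation logic is needed; the trade-off is that the cache only helps with bursts of repeated queries, not with long-term call volume.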

Most "cost-saving engineering" goes into finding cases and heuristics where we only need a subset of sources, so we can omit calls in the first place without compromising quality. For example, we probably don't need to fire all of our sources to service a query like "youtube" or "facebook".
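A minimal sketch of such a heuristic, assuming a hard-coded set of navigational queries, might look like the following. The source names and the `select_sources` function are hypothetical and purely illustrative; the real heuristics are not described in this thread.

```python
from typing import List

# Hypothetical source names, invented for this example.
ALL_SOURCES = ["upstream_a", "upstream_b", "own_index", "small_web"]

# Queries where the user almost certainly wants one well-known site.
# The two entries are the examples given in the comment above.
NAVIGATIONAL_QUERIES = {"youtube", "facebook"}


def select_sources(query: str) -> List[str]:
    """Return the subset of sources worth calling for this query."""
    normalized = query.strip().lower()
    if normalized in NAVIGATIONAL_QUERIES:
        # Assumption: a single cheap source can answer these well.
        return ["own_index"]
    # Everything else fans out to all sources.
    return list(ALL_SOURCES)
```

Only exact matches are treated as navigational here, so a query like "youtube api quota" still goes to every source.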

Marginalia data is physically consolidated into the same infra that we use for small web results in our SERP, alongside other small-scale sources beyond those two. That line refers specifically to https://kagi.com/smallweb (https://github.com/kagisearch/smallweb).

  • To me, a lot of problems with "building a search engine" don't seem to be problems with "building a search engine," they seem to be problems with "building a Google."

    Nobody said a search engine needs to have fresh data, for example. Nor has anybody said a search engine needs to index the entire web. Yet these are two things every search engine tries to do, and then they usually fail to compare with Google.

    To put it another way, the reason TikTok succeeded against YouTube is exactly that TikTok wasn't trying to be a YouTube.

    • I don't think TikTok "succeeded" compared to YouTube? TikTok succeeded in popularizing short-form video, but I'd argue that's a different product. YouTube is still king for long-form video.

      While there might be arguments for building a different product (and LLM-based search like Perplexity is trying it), there appears to be enough demand for a "good Google" that Kagi is trying to address.

The index is not necessarily the code, but the dataset. IMO it would be better to be more open about the technical stack, but this doesn't feel dishonest to me.