
Comment by susam

4 hours ago

You can check the feed list here: <https://github.com/kagisearch/smallweb/blob/main/smallweb.tx...>. I can see that your RSS URL is listed there.

But it currently does not appear in the search results here: <https://kagi.com/smallweb/?search=zahlman>. The reason appears to be this:

"If the blog is included in small web feed list (which means it has content in English, it is informational/educational by nature and it is not trying to sell anything) we check for these two things to show it on the site: • Blog has recent posts (<7 days old) [...]"

(Source: https://github.com/kagisearch/smallweb#criteria-for-posts-to...)
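The "recent posts (<7 days old)" criterion quoted above amounts to a date filter over a feed's items. The sketch below is illustrative only and is not Kagi's actual implementation. It assumes an RSS 2.0 feed whose `<item>` elements carry RFC 822 `<pubDate>` values. The function name `recent_items` and the sample feed are hypothetical, and Atom's `<published>`/`<updated>` elements (RFC 3339 dates) are not handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


# Hypothetical helper, not Kagi's code: keeps items whose RSS 2.0 <pubDate>
# falls within the last max_age_days days relative to `now`.
def recent_items(rss_xml: str, now: datetime, max_age_days: int = 7) -> list[str]:
    """Return titles of feed items published less than max_age_days before now."""
    root = ET.fromstring(rss_xml)
    cutoff = now - timedelta(days=max_age_days)
    titles = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if pub is None:
            continue  # no date metadata, so the item cannot be judged recent
        published = parsedate_to_datetime(pub)
        if published.tzinfo is None:
            # Dates given as "-0000" parse as naive; assume UTC for comparison.
            published = published.replace(tzinfo=timezone.utc)
        if published > cutoff:
            titles.append(item.findtext("title", default=""))
    return titles


# Usage with a made-up two-item feed.
feed = """<rss version="2.0"><channel>
<item><title>New post</title><pubDate>Mon, 10 Jun 2024 08:00:00 GMT</pubDate></item>
<item><title>Old post</title><pubDate>Mon, 20 May 2024 08:00:00 GMT</pubDate></item>
</channel></rss>"""
now = datetime(2024, 6, 12, tzinfo=timezone.utc)
print(recent_items(feed, now))  # ['New post']
```

A site whose newest item is older than the cutoff yields an empty list, which is one plausible reason an otherwise listed feed would not show up in the results.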

Why would you only include blogs in your small web index? Surely that is only a tiny fraction of what is out there?

I can't think of a single blog that I read these days (small or not), yet there are loads of small "old school" sites out there that are still going strong.

  • > Why would you only include blogs in your small web index?

    I am not associated with this project, so this would be a question for the project maintainer. As far as I understand, the project relies on RSS/Atom feeds to fetch new posts and display them in the search results. I believe this is an easier problem to solve than running a full-blown web crawler.

    However, as far as I know, Kagi does have its own full-blown crawler, so I am not entirely sure why they could not use it to present the Small Web search results. Perhaps they rely on the date metadata in RSS feeds to determine whether a post was published within the last seven days? But having worked on an open-source web crawler myself many years ago, I know that a crawler can determine this too if it crawls frequently enough.

    So yes, I think you have a good point, and only the project maintainer can provide a definitive answer.
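The crawler-based alternative mentioned in the reply can be sketched without any feed metadata. If a crawler revisits a site often enough, it can record when it first saw each URL and treat that as an approximate publication date. This is a minimal sketch under that assumption, not how Kagi or any particular crawler works. `FirstSeenIndex`, `record_crawl`, and `recent` are hypothetical names, and the estimate can lag the real publication date by up to one crawl interval.

```python
from datetime import datetime, timedelta, timezone


class FirstSeenIndex:
    """Hypothetical index that approximates publication dates by first-crawl time."""

    def __init__(self) -> None:
        self.first_seen: dict[str, datetime] = {}

    def record_crawl(self, urls: list[str], crawled_at: datetime) -> None:
        # Only the first sighting of a URL is kept; later crawls do not overwrite it.
        for url in urls:
            self.first_seen.setdefault(url, crawled_at)

    def recent(self, now: datetime, max_age_days: int = 7) -> list[str]:
        # A URL counts as recent if it first appeared less than max_age_days ago.
        cutoff = now - timedelta(days=max_age_days)
        return sorted(url for url, seen in self.first_seen.items() if seen > cutoff)


def day(d: int) -> datetime:
    # Convenience helper for the example: a UTC date in June 2024.
    return datetime(2024, 6, d, tzinfo=timezone.utc)


index = FirstSeenIndex()
index.record_crawl(["https://example.com/", "https://example.com/old-page"], day(1))
index.record_crawl(
    ["https://example.com/", "https://example.com/old-page", "https://example.com/new-page"],
    day(10),
)
print(index.recent(now=day(12)))  # ['https://example.com/new-page']
```

One caveat of this design: on the very first crawl of a site, every page looks new, so a real crawler would need a baseline crawl (or other signals) before trusting first-seen dates. It also misses posts that are updated in place at an existing URL.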