Comment by zahlman
4 hours ago
Does this concept of "personal blog" include people periodically sharing, say, random knowledge on technical topics? Or is it specifically people writing about their day-to-day lives?
How would I check if my site is included?
You can check: <https://github.com/kagisearch/smallweb/blob/main/smallweb.tx...>. I can see that your RSS URL is listed there.
But it currently does not appear in the search results here: <https://kagi.com/smallweb/?search=zahlman>. The reason appears to be this:
"If the blog is included in small web feed list (which means it has content in English, it is informational/educational by nature and it is not trying to sell anything) we check for these two things to show it on the site: • Blog has recent posts (<7 days old) [...]"
(Source: https://github.com/kagisearch/smallweb#criteria-for-posts-to...)
Why would you only include blogs in your small web index? That must be a minute fraction of what is out there?
I can't think of a single blog that I read these days (small or not), yet there are loads of small "old school" sites out there that are still going strong.
> Why would you only include blogs in your small web index?
I am not associated with this project, so this would be a question for the project maintainer. As far as I understand, the project relies on RSS/Atom feeds to fetch new posts and display them in the search results. I believe this is an easier problem to solve than running a full-blown web crawler.
However, as far as I know, Kagi does have its own full-blown crawler, so I am not entirely sure why they could not use it to present the Small Web search results. Perhaps they rely on date metadata in RSS feeds to determine whether a post was published within the last seven days? But having worked on an open source web crawler myself many years ago, I know that a crawler can determine this too if it crawls frequently enough.
So yes, I think you have a good point, and only the project maintainer can provide a definitive answer.
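For illustration, here is a minimal Python sketch of the kind of date-metadata check I am guessing at above. It is not the project's actual code: the sample feed, the `recent_titles` helper and the seven-day default are my own assumptions, and it only reads RSS 2.0 `pubDate` values, not Atom.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Hypothetical sample feed, not taken from any real blog.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Fresh post</title>
      <pubDate>Wed, 03 Jan 2024 12:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Old post</title>
      <pubDate>Fri, 01 Dec 2023 12:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def recent_titles(rss_text, now, max_age=timedelta(days=7)):
    """Return titles of RSS items published within max_age before now."""
    root = ET.fromstring(rss_text)
    titles = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if pub is None:
            continue  # no date metadata, so freshness cannot be judged
        published = parsedate_to_datetime(pub)  # RFC 822 style date
        age = now - published
        if timedelta(0) <= age < max_age:
            titles.append(item.findtext("title", default=""))
    return titles


# Fixed "now" so the example is reproducible.
now = datetime(2024, 1, 5, tzinfo=timezone.utc)
print(recent_titles(SAMPLE_RSS, now))
```

A crawler without feed dates would have to infer the publication time from when it first saw the page, which is why crawl frequency matters in that case.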