Comment by dbrereton

3 years ago

> This is definitely very cool as I've been looking for something like this since technorati (which was originally a blog search engine).

Technorati was one of the inspirations here so that's great to hear.

> Would love to hear details about how you created the database, the infrastructure, etc if it's not a trade secret. Kudos on the launch!

Sure, it's actually fairly simple! The search backend itself is running on Typesense [0], which was very quick and easy to set up.
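For anyone curious what that kind of setup might look like, here is a minimal sketch of a Typesense collection schema and a search query, written as plain Python dicts. The collection name `posts`, the field names, and the `score` field are assumptions for illustration, not the site's actual schema.

```python
# Hypothetical schema: collection and field names are illustrative only.
def build_posts_schema():
    """Return a Typesense-style collection schema for blog posts."""
    return {
        "name": "posts",
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "content", "type": "string"},
            {"name": "url", "type": "string"},
            {"name": "published_at", "type": "int64"},
            {"name": "score", "type": "int32"},
        ],
        # Typesense requires the default sorting field to be numeric.
        "default_sorting_field": "score",
    }


def build_search_params(query):
    """Return search parameters: match on text first, then on the assumed score."""
    return {
        "q": query,
        "query_by": "title,content",
        "sort_by": "_text_match:desc,score:desc",
    }
```

With the official `typesense` Python client, dicts like these would be passed to `client.collections.create(...)` and `client.collections["posts"].documents.search(...)`.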

Due to the way ranking is calculated, I can actually avoid doing any real web crawling (though I may add that soon to help increase the index size). Ranking is based on submissions to online communities, so all I really need are those submissions.
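To illustrate the idea of ranking from submissions alone, here is a rough sketch. The actual scoring rule isn't described in this thread, so summing points per post URL is purely an assumption, as are the `url` and `points` keys.

```python
from collections import defaultdict


def score_posts(submissions):
    """Sum community points per post URL (assumed scoring rule, not the real one)."""
    scores = defaultdict(int)
    for sub in submissions:
        # A submission with no recorded points contributes nothing.
        scores[sub["url"]] += sub.get("points", 0)
    return dict(scores)


def rank_posts(submissions):
    """Return post URLs ordered from highest to lowest total score."""
    scores = score_posts(submissions)
    return sorted(scores, key=lambda url: scores[url], reverse=True)
```

A post submitted to several communities accumulates points from each, so no crawl of the blog itself is needed to order results.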

Using the Reddit, HN and Twitter APIs, I search for any submissions related to any blogs in the database, and those submissions give me the post URLs.

Once I have the post URLs, I just need to request those specific URLs to get the post data.
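The two steps above could be sketched roughly like this: given submissions already pulled from a community API, keep only the URLs that belong to known blogs. The submission shape (a dict with a `url` key) and the helper names are hypothetical.

```python
from urllib.parse import urlparse


def normalize_domain(url):
    """Lowercase the host and drop a leading 'www.' so domains compare equal."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def post_urls_for_blogs(submissions, blog_domains):
    """Return unique submitted URLs whose domain is in the blog database."""
    known = {domain.lower() for domain in blog_domains}
    seen = set()
    urls = []
    for sub in submissions:
        url = sub.get("url")
        # Skip text-only submissions and URLs already collected.
        if not url or url in seen:
            continue
        if normalize_domain(url) in known:
            seen.add(url)
            urls.append(url)
    return urls
```

Each URL returned can then be requested individually to get the post data, so only pages that were actually submitted somewhere are ever fetched.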

Then there are scripts for things like content extraction, inflation calculation, currency conversion etc.

All of those scripts are in Python.

The frontend is a simple React app built with Next.js. All pages are statically generated.

Let me know if there are any more questions!

[0] https://typesense.org/

Any plans on open-sourcing the code? I'm not sure if your intention is to build a business using it (or, if you were, using AGPLv3 might help prevent third-parties from unfairly competing with you), but I'm sure a number of people would be interested in trying to run this on their own hardware, building their own personal index, hacking on it to add features they find interesting for themselves, or otherwise just learning something by taking a look under the hood (I'm probably in this category myself).

  • This is not a business, and I would like to open source it, but it would probably be better for everyone if I wait until I clean up my garbage code, which will take some time.

I tried searching "Will Smith" and was expecting hundreds of blog posts about his Oscar thing, but all the results are about programming jobs and Joe Biden. I even changed the date range to the past week, but still the same.

Any idea why?

  • Only 900 blogs are currently indexed, and they're mostly tech, business or politics, so I wouldn't be surprised if none of them have written about the Will Smith situation.

    I am working on increasing the number of blogs significantly, but please bear with my modest index in the meantime.