
Comment by nedrocks

19 days ago

Years ago I was building a search engine from scratch (back when that was a viable business plan). I was responsible for the crawler.

I built it as a distributed set of 10 machines, each able to make ~1k queries per second. I would generally spread domains across the fleet as disparately as possible, both to balance load across our machines and to avoid concentrating traffic on any one site.
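The setup described above boils down to two pieces: a consistent domain-to-machine assignment, and a per-domain rate limit so no single site gets hammered. A minimal sketch of both ideas (the names, the 10-machine constant, and the 1-second interval are illustrative assumptions, not the original implementation):

```python
import hashlib
import time
from collections import defaultdict

NUM_MACHINES = 10  # fleet size mentioned in the comment


def machine_for(domain):
    """Deterministically assign a domain to one crawler machine by hashing."""
    digest = hashlib.sha1(domain.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_MACHINES


class DomainRateLimiter:
    """Allow at most one request per `min_interval` seconds per domain."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self.last_hit = defaultdict(float)  # domain -> last request timestamp

    def ready(self, domain, now=None):
        """Return True (and record the hit) if the domain may be fetched now."""
        if now is None:
            now = time.monotonic()
        if now - self.last_hit[domain] >= self.min_interval:
            self.last_hit[domain] = now
            return True
        return False
```

Hashing keeps every URL from a given domain on the same machine, which is what makes a purely local rate limiter sufficient: no cross-machine coordination is needed to stay polite to a single site.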

Inevitably I'd end up crashing someone's site, even though we respected robots.txt, rate limited, and so on. I still remember the angry mail we'd get and how hard we tried to honor it.

18 years later and so much has changed.