Comment by marginalia_nu
2 days ago
You limit the crawl time or number of requests per domain for all domains, and set the limit proportional to how important the domain is.
There's a ton of these types of of things online, you can't e.g. exhaustively crawl every wikipedia mirror someone's put online.
No comments yet
Contribute on Hacker News ↗