Comment by what

1 day ago

You expect the developers of a crawler to look at every site they crawl and develop a specialized crawler for them? That’s fine if you’re only crawling a handful of sites, but absolutely insane if you’re crawling the entire web.

Isn't the point of AI that it's good at understanding content written for humans? Why can't the scrapers run the homepage through an LLM to detect that?

I'm also not sure why we should be prioritizing the needs of scraper writers over human users and site operators.

If you are crawling the entire web, you should respect robots.txt and not fetch anything it disallows. Full stop.
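
For what it's worth, checking robots.txt before fetching is a one-liner with Python's standard-library urllib.robotparser. A minimal sketch (the crawler name and URLs are placeholders, not anything from this thread):

```python
from urllib.robotparser import RobotFileParser

# Load and parse the site's robots.txt once per host.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Skip any URL the site disallows for this user agent.
url = "https://example.com/some/page"
if rp.can_fetch("MyCrawler/1.0", url):
    pass  # fetch the page
else:
    pass  # respect the disallow rule and move on
```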