Comment by Blackthorn

2 days ago

If it means it makes your own content safe when you deploy it on a corner of your website: mission accomplished!

>If it means it makes your own content safe

Not really? As mentioned by others, such tarpits are easily mitigated by using a priority queue. For instance, crawlers can prioritize external links over internal links, which means if your blog post makes it to HN, it'll get crawled ahead of the tarpit. If it's discoverable and readable by actual humans, AI bots will be able to scrape it.
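To make the mitigation concrete, here is a minimal sketch of a crawl frontier built on a priority queue, where cross-domain links outrank same-domain ones so a tarpit's endless self-links sink to the bottom of the queue. The `Frontier` class and its priority scheme are hypothetical illustrations, not any real crawler's implementation:

```python
import heapq
from urllib.parse import urlparse

# Hypothetical crawl frontier: external links get higher priority
# than internal links, so a same-domain tarpit can't starve the queue.
class Frontier:
    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so equal priorities pop in FIFO order

    def add(self, url, referrer):
        # Priority 0 (crawled first) for cross-domain links,
        # priority 1 for links that stay on the referrer's domain.
        external = urlparse(url).netloc != urlparse(referrer).netloc
        priority = 0 if external else 1
        heapq.heappush(self._heap, (priority, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

f = Frontier()
f.add("https://example.com/tarpit/page1", "https://example.com/")
f.add("https://news.ycombinator.com/item", "https://example.com/")
print(f.pop())  # the external HN link is crawled before the tarpit page
```

With a frontier like this, a blog post that picks up external links (e.g. from HN) gets fetched long before the crawler wades into the site's own link maze.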

[flagged]

  • You've got to be seriously AI-drunk to equate letting your site be crawled by commercial scrapers with "contributing to humanity".

    Maybe you don't want your stuff to get thrown into the latest Silicon Valley commercial operation without getting paid for it. That seems like a valid position to take. Or maybe you just don't want Claude's ridiculously badly behaved scraper to chew through your entire budget.

    Regardless, scrapers that don't follow rules like robots.txt will pretty quickly discover why those rules exist in the first place, as they receive increasing amounts of garbage.