Comment by james2doyle

2 months ago

This is just using robots.txt and asking "pretty please, don’t scrape me".

Here is an article (from TODAY) about the case where Perplexity is being accused of ignoring robots.txt: https://www.theverge.com/news/839006/new-york-times-perplexi...

If you think a robots.txt is the answer to stopping the billion-dollar AI machine from scraping you, I don’t know what to say.

5 comments

Aeolun 2 months ago

If someone has a robots.txt, and I want to request their page in an automated way, should I open the browser to do it instead of issuing a curl request? What if I ask Claude to fetch the page for me?

kentm 2 months ago

Respect the robots.txt and don’t do it?
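For the automated case, checking first is not much work. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the `example.com` URLs, and the `may_fetch` helper are all made up for illustration; a real client would download the site's actual robots.txt rather than hard-code one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch).
# A real client would fetch it from https://<host>/robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/article"),
        ("curl", "https://example.com/article"),
        ("curl", "https://example.com/private/notes"),
    ]:
        print(agent, url, may_fetch(EXAMPLE_ROBOTS_TXT, agent, url))
```

This only tells you what the site asked for. Nothing in it stops a client that skips the check, which is the point being argued above.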

cpncrunch 2 months ago

Yes, I was referring to legitimate companies, and Perplexity doesn't seem to be one of those.

albedoa 2 months ago

Oh for sure. When he wrote of the AI companies that are "stealing/crawling/hammering", you thought he meant the legitimate ones that do honor robots.txt. That makes sense.

cpncrunch 2 months ago

Actually, it looks like all the major ones do honour robots.txt, including Perplexity. They seemingly get around it using Google SERPs, so they're not actually crawling or hammering the site servers (or even Cloudflare).
https://www.ailawandpolicy.com/2025/10/anti-circumvention-re...