Comment by james2doyle

2 months ago

This is just using robots.txt and asking "pretty please, don’t scrape me".

Here is an article (from TODAY) about the case where Perplexity is being accused of ignoring robots.txt: https://www.theverge.com/news/839006/new-york-times-perplexi...

If you think a robots.txt is the answer to stopping the billion-dollar AI machine from scraping you, I don’t know what to say.
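
To make the "pretty please" point concrete, here is a minimal sketch of how a well-behaved client would consult robots.txt, using Python's standard `urllib.robotparser`. The robots.txt content, the `my-bot` user agent, the example.com URLs, and the `is_allowed` helper are all made up for illustration. The thing to notice is that the check runs entirely on the client side: nothing on the server enforces it, and a scraper that never calls it is not blocked by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The client decides whether to ask, and whether to honor the answer.
    for url in ("https://example.com/public", "https://example.com/private/page"):
        print(url, is_allowed(ROBOTS_TXT, "my-bot", url))
```

A polite crawler would skip the fetch when `is_allowed` returns False; an impolite one just skips the check.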

If someone has a robots.txt and I want to request their page in an automated way, should I open a browser to do it instead of issuing a curl request? What if I ask Claude to fetch the page for me?

Yes, I was referring to legitimate companies, and Perplexity doesn't seem to be one of those.