Comment by majorchord

3 days ago

> AI companies use residential proxies

Source: Cloudflare

https://blog.cloudflare.com/perplexity-is-using-stealth-unde...

Perplexity's defense is that they're not doing it for training/KB-building crawls but for answering dynamic user queries, and that this is apparently better.

  • I do not see the words "residential" or "proxy" anywhere in that article... or any other text that might imply they are using those things. And personally... I don't trust crimeflare at all. I think they and their MITM-as-a-service have done even more, and more lasting, damage to the global Internet and user privacy in general than all AI/LLMs combined.

    However, if this information is accurate... perhaps site owners should allow AI/bot user agents but respond with different content (or maybe a 404?) instead, to try to prevent it from making multiple requests with different UAs.
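A minimal sketch of the idea above, assuming the server only sees the User-Agent header. The token list and the function names `is_declared_ai_bot` and `choose_response` are illustrative assumptions, not any particular server's API, and this only catches bots that identify themselves honestly.

```python
# Illustrative substrings of self-declared AI crawler user agents (assumed list).
AI_BOT_TOKENS = ("gptbot", "perplexitybot", "claudebot", "ccbot")


def is_declared_ai_bot(user_agent):
    """Return True if the User-Agent string contains a known bot token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in AI_BOT_TOKENS)


def choose_response(user_agent):
    """Return (status, body): a stub page for declared bots, the real page otherwise."""
    if is_declared_ai_bot(user_agent):
        # Answer successfully with different content instead of blocking,
        # so the client has less reason to retry with another UA.
        return 200, "This content is not available to automated agents."
    return 200, "<html>full page</html>"
```

Returning 404 instead of the stub page is a one-line change in `choose_response`; which of the two discourages retries better is an open question here.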

    • I had 500,000 residential IPs make 1-4 requests each in the past couple of days.

      These had the same user agent (latest Safari), but previously the agent has been varied.

      Blocking this shit is much more complicated than any blocking necessary before 2024.

      The data is available for free download in bulk (it's a university) and this is advertised in several places, including the 429 response, the HTML source and the API documentation, but the AI people ignore this.
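For illustration, a 429 response that advertises the bulk download could be built roughly like this. It is only a sketch: the function name `build_rate_limit_response`, the default retry interval, and the example URL are assumptions, not the actual setup described above.

```python
def build_rate_limit_response(bulk_url, retry_after_seconds=3600):
    """Return (status, headers, body) for a 429 that points at the bulk download."""
    body = (
        "429 Too Many Requests\n"
        f"This data is available for free bulk download at {bulk_url}\n"
        "Please use the bulk download instead of crawling.\n"
    )
    headers = {
        # Retry-After is the standard header for telling clients when to come back.
        "Retry-After": str(retry_after_seconds),
        "Content-Type": "text/plain; charset=utf-8",
    }
    return 429, headers, body


# Hypothetical URL, used only for the example.
status, headers, body = build_rate_limit_response("https://example.edu/bulk")
```

As noted above, this only helps with clients that actually read the response.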

  • Well yes it is better. It's a page load triggered by a user for their own processing.

    If web security worked a little differently, the requests would likely come from the user's browser.