Comment by WesolyKubeczek

3 months ago

You should read more. AI companies use residential proxies and mask their user agents with legitimate browser ones, so good luck blocking that.



rnhmjoj 3 months ago

Which companies are we talking about here? In my case the traffic was similar to what was reported here[1]: these are crawlers from Google, OpenAI, Amazon, etc. They are really idiotic in behaviour, but at least they report themselves correctly.

[1]: https://pod.geraspora.de/posts/17342163

nemothekid 3 months ago
OpenAI/Anthropic/Perplexity aren't the bad actors here. If they are, they are relatively simple to block - why would you implement an Anubis PoW MITM Proxy, when you could just simply block on UA?
I get the sense many of the bad actors are simply poor copycats that are poorly building LLMs and are scraping the entire web without a care in the world.
- rnhmjoj 3 months ago
  
  > why would you implement an Anubis PoW MITM Proxy, when you could just simply block on UA?
  That's in fact what I was asking: I've only seen traffic from these kinds of companies and I've easily blocked them without an annoying PoW scheme.
  I have yet to see any of these bad actors and I'm interested in knowing who they actually are.
  
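For reference, "block on UA" as discussed above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual setup: the token list and the function name `is_declared_ai_crawler` are assumptions made for the example, and the tokens should be checked against each vendor's current documentation.

```python
# Substrings that self-identifying AI crawlers are assumed to include in
# their User-Agent header. Illustrative only; verify against vendor docs.
DECLARED_AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Amazonbot")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the known crawler tokens.

    The match is a case-insensitive substring test, so it only catches
    crawlers that announce themselves honestly.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in DECLARED_AI_CRAWLER_TOKENS)
```

A check like this does nothing against a crawler that sends an ordinary browser user agent, which is the case the top-level comment is worried about.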

majorchord 3 months ago

> AI companies use residential proxies

Source?

Macha 3 months ago
Source: Cloudflare
https://blog.cloudflare.com/perplexity-is-using-stealth-unde...
Perplexity's defense is that they're not doing it for training/KB-building crawls but to answer dynamic queries, and this is apparently better.
- ranger_danger 3 months ago
  
  I do not see the words "residential" or "proxy" anywhere in that article... or any other text that might imply they are using those things. And personally... I don't trust crimeflare at all. I think they and their MITM-as-a-service have done even more/lasting damage to the global Internet and user privacy in general than all AI/LLMs combined.
  However, if this information is accurate... perhaps site owners should allow AI/bot user agents but respond with different content (or maybe a 404?) instead, to try to prevent it from making multiple requests with different UAs.
  
- Dylan16807 3 months ago
  
  Well yes it is better. It's a page load triggered by a user for their own processing.
  If web security worked a little differently, the requests would likely come from the user's browser.
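
ranger_danger's suggestion above (accept declared AI/bot user agents but answer them with a 404 instead of the real page) could look something like the WSGI middleware below. It is a hedged sketch only: `AI_UA_TOKENS`, `not_found_for_ai_agents`, and the `site` app are hypothetical names invented for the example, and whether a 404 actually stops a crawler from retrying with a different UA is not something this code can show.

```python
# Lowercase substrings assumed to appear in declared AI crawler user agents.
# Illustrative list; not taken from the thread.
AI_UA_TOKENS = ("gptbot", "claudebot", "perplexitybot")


def not_found_for_ai_agents(app):
    """Wrap a WSGI app so that declared AI user agents receive a 404."""

    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in AI_UA_TOKENS):
            body = b"Not Found\n"
            start_response(
                "404 Not Found",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everyone else gets the real application response.
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    body = b"Hello\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


wrapped = not_found_for_ai_agents(site)
```

The wrapped app can be served with any WSGI server, for example `wsgiref.simple_server.make_server("", 8000, wrapped)` from the standard library.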