Comment by skybrian

2 months ago

As far as I know, Google respects robots.txt and doesn't obfuscate their crawlers, so you can easily block them if you want. It seems like an important distinction?
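
To make "easily block them" concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the example.com URL are all hypothetical, made up for illustration. It only shows how a crawler that chooses to cooperate would interpret the rules; nothing here enforces anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("Googlebot", "https://example.com/page"))
print(is_allowed("SomeOtherBot", "https://example.com/page"))
```

A crawler that never runs a check like this, or that sends a different user agent, is unaffected by the file.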

11 comments


Nextgrid 2 months ago

Google can afford to respect robots.txt because it has a monopoly on search and nobody would consider actually blocking them in said robots.txt anyway.

SerpApi doesn't have that privilege.

skybrian 2 months ago

Some domains do block Google, often partially. There are some statistics here:
https://radar.cloudflare.com/ai-insights#ai-user-agents-foun...

xnx 2 months ago

Google has respected robots.txt from the start.

bitpush 2 months ago

But SerpApi is not scraping websites; it is sending malicious requests to google.com.
- Nextgrid 2 months ago
  
  SerpApi is scraping Google. The "maliciousness" of the requests is a matter of perspective. Of course Google considers it malicious; that doesn't necessarily make it true.

throw-12-16 2 months ago

robots.txt is not a legally binding document; nobody actually needs to respect it.

immibis 2 months ago

There's no law that says you have to do that. It used to be a sensible thing to do, in the early internet. In the current internet, obeying robots.txt is a self-handicap and you shouldn't do it.

DDoS remains illegal regardless of robots.txt.

skybrian 2 months ago

It's rather odd to use words like "should" when you're advocating for disrespecting other people's wishes. There are sometimes reasons not to cooperate, but it seems like a good default.
- immibis 2 months ago
  
  The web is now hostile. If you're starting a search engine, everyone else has written a robots.txt that bans you from starting a search engine. You either ignore that, or you abandon your plan to make a search engine.
  