Comment by skybrian
2 months ago
As far as I know, Google respects robots.txt and doesn't obfuscate their crawlers, so you can easily block them if you want. It seems like an important distinction?
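For anyone unsure what "blocking them" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` agent name, the `example.com` URL, and the `is_allowed` helper are all made up for illustration. It only shows how a crawler that chooses to honor robots.txt would read the rules, not how Google's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow Googlebot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a rule-respecting crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, is_allowed(agent, "https://example.com/page"))
```

This only works against crawlers that identify themselves honestly and check the file, which is the distinction being drawn above.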
Google can afford to respect robots.txt because it has a monopoly on search and nobody would consider actually blocking them in said robots.txt anyway.
SerpApi doesn't have that privilege.
Some domains do block Google, often partially. There are some statistics here:
https://radar.cloudflare.com/ai-insights#ai-user-agents-foun...
Google has respected robots.txt from the start.
But SerpApi is not scraping websites; it is sending malicious requests to google.com.
SerpApi is scraping Google. The "maliciousness" of the requests is a matter of perspective. Of course Google considers it malicious; that doesn't necessarily make it true.
robots.txt is not a legally binding document; nobody actually needs to respect it.
There's no law that says you have to do that. It used to be a sensible thing to do, in the early internet. In the current internet, obeying robots.txt is a self-handicap and you shouldn't do it.
DDoS remains illegal regardless of robots.txt.
It's rather odd to use words like "should" when you're advocating for disrespecting other people's wishes. There are sometimes reasons not to cooperate, but it seems like a good default.
The web is now hostile. If you're starting a search engine, everyone else has written a robots.txt that bans you from starting a search engine. You either ignore that, or you abandon your plan to make a search engine.