
Comment by bluGill

2 months ago

Google publishes how to control their bot: with robots.txt. They then obey those instructions. Google also makes some effort not to use all your bandwidth. Google isn't perfect, but they are at least making a "good faith" effort to be nice, and this does count in court. Overall, most will agree that what Google does to let people find their website is worth the things that Google is doing.

You can of course argue a lot of edge cases if you really want. For the most part I want to say "it isn't worth the argument". In some cases I will take your side if I really have to think about it, but in general the system Google has been using mostly works and is mostly an acceptable compromise.

But their robots are enabled by default. So it is a form of unsolicited scraping. If I spam millions of email addresses without asking for permission but provide a link to an opt-out form, am I the good guy?

  • At this point everyone knows about robots.txt, so if you didn't opt out, that is your own fault. Opting out of everyone at once is easy, and you get fine-grained control if you want it.

    Also most people would agree they are fine with being indexed in general. That is different from email spam where people don't want it.

    • Looking at SerpApi clients, looks like most companies would agree they are fine with scraping Google. That is different from having your website content stolen and summarized by AI on Google search, which people don't want.
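
For anyone who hasn't looked at the robots.txt opt-out mentioned above, here is a rough sketch of the two cases: blocking every crawler at once, and finer-grained rules for one bot. The file contents, the example.com URLs, and the "OtherBot" name are made up for illustration, and Python's standard-library parser only approximates how any real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of every crawler at once.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical fine-grained version: Googlebot is kept out of /private/,
# every other crawler may fetch anything.
FINE_GRAINED = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "OtherBot"):
        for path in ("/blog", "/private/page"):
            url = "https://example.com" + path
            print(agent, path,
                  "block-all:", can_fetch(BLOCK_ALL, agent, url),
                  "fine-grained:", can_fetch(FINE_GRAINED, agent, url))
```

Note this only tells you what a well-behaved crawler is supposed to do; nothing in the file itself enforces it.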


Who says robots.txt is legally binding? Where's the Sherman Antitrust analysis? I'm more confused than before.

What's nice about scraping all the content for their own good while killing off websites left and right? Google needs to be sued also.

Along with all the other AI companies out there, they've committed the biggest theft in human history.