Comment by Animats

1 year ago

It's time for a lawyer letter. See the Computer Fraud and Abuse Act prosecution guidelines.[1] In general, the US Justice Department will not consider any access to open servers that's not clearly an attack to be "unauthorized access". But,

"However, when authorizers later expressly revoke authorization—for example, through unambiguous written cease and desist communications that defendants receive and understand—the Department will consider defendants from that point onward not to be authorized."

So, you get a lawyer to write an "unambiguous cease and desist" letter. You have it delivered to Amazon by either registered mail or a process server, as recommended by the lawyer. Probably both, plus email.

Then you wait and see if Amazon stops.

If they don't stop, you can file a criminal complaint. That will get Amazon's attention.

[1] https://www.justice.gov/jm/jm-9-48000-computer-fraud


Aurornis 1 year ago

> Then you wait and see if Amazon stops.

That’s if the requests are actually coming from Amazon, which seems very unlikely given some of the details in the post (rotating user agents, residential IPs, seemingly not interpreting robots.txt). The Amazon bot should come from known Amazon IP ranges and respect robots.txt. An Amazon engineer confirmed it in another comment: https://news.ycombinator.com/item?id=42751729

The blog post mentions things like changing user agent strings, ignoring robots.txt, and residential IP blocks. If the only thing that matches Amazon is the “AmazonBot” User Agent string but not the IP ranges or behavior then lighting your money on fire would be just as effective as hiring a lawyer to write a letter to Amazon.

sigmoid10 1 year ago
I wonder how the author hasn't reached this conclusion. The official Amazon Crawler docs literally tell you how to distinguish between legit Amazonbots and malicious copycats via DNS lookup: https://developer.amazon.com/amazonbot
- YetAnotherNick 1 year ago
  
  Why would someone copycat amazonbot?
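
The DNS check sigmoid10 mentions is usually done as a forward-confirmed reverse DNS lookup: resolve the client IP to a hostname, check that the hostname sits under the crawler's domain, then resolve that hostname forward and confirm it maps back to the same IP. A minimal sketch, assuming the verified suffix is `crawl.amazonbot.amazon` (check the linked Amazon page for the current value); the function and parameter names are hypothetical, and the resolvers are injectable so the logic can be tried without network access:

```python
import socket
from typing import Callable, Iterable

# Assumption: suffix used for illustration; confirm it against the Amazonbot docs.
ASSUMED_SUFFIX = ".crawl.amazonbot.amazon"


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP (raises OSError on failure)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> set[str]:
    """Return every address the hostname resolves to (IPv4 and IPv6)."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_crawler(
    ip: str,
    suffix: str = ASSUMED_SUFFIX,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a single client IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so the claim cannot be verified
    if not hostname.endswith(suffix):
        return False  # PTR points somewhere other than the crawler's domain
    try:
        # Anyone can set a PTR record for their own IP, so the forward
        # lookup must map the hostname back to the same address.
        return ip in forward(hostname)
    except OSError:
        return False
```

A request whose User-Agent says "Amazonbot" but whose IP fails this check matches the user agent string only, which is the situation Aurornis describes above.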
  

xena 1 year ago

Honestly, I figure that being on the front page of Hacker News like this is more than shame enough to get a human from the common sense department to read and respond to the email I sent politely asking them to stop scraping my git server. If I don't get a response by next Tuesday, I'm getting a lawyer to write a formal cease and desist letter.

Aurornis 1 year ago

Someone from Amazon already responded: https://news.ycombinator.com/item?id=42751729

> If I don't get a response by next Tuesday, I'm getting a lawyer to write a formal cease and desist letter.

Given the details, I wouldn’t waste your money on lawyers unless you have some information other than the user agent string.
gazchop 1 year ago

No one gives a fuck in this industry until someone turns up with bigger lawyers. This is behaviour which is written off with no ethical concerns as ok until that bigger fish comes along.
Really puts me off it.
amarcheschi 1 year ago

It's computer science, nothing changes on corpo side until they get a lawyer letter.
And even then, it's probably not going to be easy.
DrBenCarson 1 year ago
Lol you really think an ephemeral HN ranking will make change?
- JadeNB 1 year ago
  
  It did yesterday!
  https://news.ycombinator.com/item?id=42740516
- usefulcat 1 year ago
  
  It's not unheard of. But neither would I count on it.
- xena 1 year ago
  
  There's only one way to find out!

Reason077 1 year ago

Do we need a "robots must respect robots.txt" law?

genter 1 year ago

Corporations expect average people to read and abide by a ten thousand line EULA, yet it's too much work for them to respect a trivially parseable text file.
WhyNotHugo 1 year ago
If we did, bot authors would comply by just changing their User-Agent to something different that’s not expressly forbidden.
(Disallowing * isn’t usually an option since it makes you disappear from search engines).
- danaris 1 year ago
  
  Any such law would absolutely have to include a requirement that all bots use at least some common element in their User-Agent strings to identify themselves as bots.
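
To make the two points above concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the URL, and the bot name `RenamedBot` are made up for illustration. It shows how little code a crawler needs to honor the file, and also how a rule that names one bot stops applying once that bot sends a different User-Agent:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
url = "https://git.example.com/some/repo"  # placeholder URL

# The named bot is disallowed everywhere.
print(rules.can_fetch("Amazonbot", url))

# The same crawler under another name falls through to the "*" group.
print(rules.can_fetch("RenamedBot", url))
```

The first call returns `False` and the second returns `True`, which is the loophole WhyNotHugo describes: per-bot rules only bind bots that identify themselves honestly.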

aitchnyu 1 year ago

What's a process server in this context?

andreareina 1 year ago

Someone who delivers notice of legal stuff, to provide proof that so-and-so did receive these documents.