Comment by neilv
16 hours ago
Can demonstrable ignoring of robots.txt help the cases of copyright infringement lawsuits against the "AI" companies, their partners, and customers?
16 hours ago
Can demonstrable ignoring of robots.txt help the cases of copyright infringement lawsuits against the "AI" companies, their partners, and customers?
Probably not copyright infringement. But it is probably (hopefully?) a violation of CFAA, both because it is effectively DDoSing you, and they are ignoring robots.txt.
Maybe worth contacting law enforcement?
Although it might not actually be Amazon.
Big thing worth asking here. Depending on what 'amazon' means here (i.e. known to be Amazon specific IPs vs Cloud IPs) it could just be someone running a crawler on AWS.
Or, folks failing the 'shared security model' of AWS and their stuff is compromised with botnets running on AWS.
Or, folks that are quasi-spoofing 'AmazonBot' because they think it will have a better not-block rate than anonymous or other requests...
From the information in the post, it sounds like the last one to me. That is, someone else spoofing an Amazonbot user agent. But it could potentially be all three.
On what legal basis?
In the UK, the Computer Misuse Act applies if:
* There is knowledge that the intended access was unauthorised
* There is an intention to secure access to any program or data held in a computer
I imagine US law has similar definitions of unauthorized access?
`robots.txt` is the universal standard for defining what is unauthorised access for bots. No programmer could argue they aren't aware of this, and ignoring it, for me personally, is enough to show knowledge that the intended access was unauthorised. Is that enough for a court? Not a goddamn clue. Maybe we need to find out.
> `robots.txt` is the universal standard
Quite the assumption, you just upset a bunch of alien species.
5 replies →
robots.txt isn't a standard. It is a suggestion, and not legally binding AFAIK. In US law at least a bot scraping a site doesn't involve a human being and therefore the TOS do not constitute a contract. According to the Robotstxt organization itself: “There is no law stating that /robots.txt must be obeyed, nor does it constitute a binding contract between site owner and user, but having a /robots.txt can be relevant in legal cases.”
The last part basically means the robots.txt file can be circumstantial evidence of intent, but there needs to be other factors at the heart of the case.
I wind up in jail for ten years if I download an episode of iCarly; Sam Altman inhales every last byte on the internet and gets a ticker tape parade. Make it make sense.
Terms of use contract violation?
Robots.txt is completely irrelevant. TOU/TOS are also irrelevant unless you restrict access to only those who have agreed to terms.
good thought but zippy chance this holds up in Court