Comment by saaaaaam
4 days ago
Sorry, I don’t understand the point you’re making. I know that these are publicly available. The point I was making, drawing on the parent comment, is that if using legitimately obtained books to train LLMs has been deemed fair use under copyright law, then a similar assessment might apply to this sort of ingestion.
Content being publicly available does not necessarily mean it’s free of copyright control: the justification for using the reviews to train an LLM would rest on fair use, meaning that the use is not an infringement of copyright. But if the publisher has terms that forbid scraping, that may undermine the fair use argument, if that argument is predicated on the content being legitimately obtained. I’m not a lawyer, but it’s quite easy to see how “books can be used for LLM training under fair use, but not if you pirate them” extends to “content on the web can be used for LLM training under fair use, but not if you’ve breached the terms set out by the publisher”.