Comment by freetime2
1 day ago
I'm aware of the fact that LLMs can reproduce IP used in training data, and I consider the example NYT article in your link to be "a very cut-and-dry case" of copyright infringement. Commercial AI companies especially should be held liable for damages if they can't or won't implement effective guardrails to prevent this from happening.
I'm somewhat optimistic this problem can be solved, though, with filters and usage policies. YouTube, another platform with basically unlimited potential for copyright infringement, has managed to implement a system that is good enough at preventing infringement to keep lawsuits at bay.
It's also not clear if that's what Yomiuri Shimbun is alleging here. In their 2023 "Opinion on the Use of News Content by Generative AI" [1] they give this example:
> Newspaper companies have long provided databases containing past newspaper pages and articles for a fee, and in recent years, they have also sold article data for AI development. If AI imports large quantities of articles, photos, images, and other data from news organizations’ digital news sites without permission, commercial AI services for third parties developing it could conflict with the existing database sales market and “unreasonably prejudice the interests of the copyright owner” (Article 30-4 of the Act). Also, even if all or part of a particular article communicates nothing further than facts and hardly constitutes a copyright, many contents deserve legal protection because of the effort and cost invested by the newspaper companies. Even if an AI collects and uses only the factual part, it does not mean it will always be legal.
So they are basically arguing that the 2018 amendment, which allows copyrighted works to be used to train AI models without permission from the copyright holder, does not apply here because the use "would unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation". [2]
... which I think is a much more nuanced argument. I don't think we can just lump all of these cases together and say "it's infringement" or "it's fair use" without actually considering the details of each case, or the specific laws in each country.