Comment by JoshTriplett
6 days ago
> but the plagiarism
This entire section reads like, oddly, the reverse of the "special pleading" argument that I usually see from artists. Instead of "Oh, it's fine for other fields, but for my field it's a horrible plagiarism machine", it's the reverse: "Oh, it's a problem for those other fields, but for my field get over it, you shouldn't care about copyright anyway".
I'm all for eliminating copyright. The day I can ignore the license on every single piece of proprietary software as I see fit, I'll be all for saying that AIs should be able to do the same. What I will continue to complain about is the asymmetry: individual developers don't get to violate individual licenses, but oh, if we have an AI slurp up millions of codebases and ignore their licenses, that's fine.
No. No, it isn't. If you want to ignore copyright, abolish it for everyone. If it still applies to everyone else, it should still apply to AIs. No special exceptions for mass-scale Open Source license violations.
I think where tptacek is right, though, is that if we're going to hold this position without hypocrisy, then we need to respect copyright as long as it exists. He's right that many of us have not done that; it's been very common to violate copyright for mere entertainment. If we want the licenses of our own work to be respected, then we need to extend that respect to others as well, regardless of the size of the copyright holder.
There are things that "modulate" this. Violating copyright is never right, of course, but scale and purpose matter. Taking others' creative output, unlicensed, for large-scale commercial gain is about the worst.
The whataboutism of that section was odd. The only non-handwavy argument presented is that, due to the scale of LLM training, models' output should be treated the way US law treats typeface forms, i.e. as something copyright does not apply to.
That's an interesting comparison, as typeface plagiarism became rampant beginning in the 70s, when more accurate photo reproduction made it trivial. This was a problem for designers trying to make a livelihood, which ITC sought to mitigate with better up-front payments (IIRC from U&lc's coverage) to incentivize quality typeface creation.
There's a distinction, though, between literal plagiarism and mere inspiration from elements. US copyright law doesn't protect typeface forms against either, but ironically it does allow copyright for the code used in font files.
I've seen OpenAI's o3-mini (their reasoning model) suggest verbatim code and comments that I found on GitHub predating LLMs by years. It seems that the more times the same code and comments appear online, the more likely this is to occur. I'd imagine there are studies looking into how widely and how often this occurs, and how much of it is considered fair use.