Comment by omnimus

6 days ago

It’s more a question of how it does what it does: by making a statistical model out of the work of the humans it now aims to replace.

I think graphic designers would be a lot less angry if AIs were trained on licensed work… that’s how the system worked up until now, after all.

I don't think most artists would be any less angry and scared if AI was trained on licensed work. The rhetoric would just shift from mostly "they're breaching copyright!" to more of the "machine art is soulless and lacks true human creativity!" line.

I have a lot of artist friends but I still appreciate that diffusion models are (and will be with further refinement) incredibly useful tools.

What we're seeing is just the commoditisation of an industry, in the same way it has happened many, many times before throughout the industrial era and since.

  • It actually doesn't matter how they would feel. Under the currently accepted copyright framework, if the works were licensed they couldn't do much about it. But right now they can be upset because suddenly the new normal is massive copyright violation. It's very clear that without the massive amount of unlicensed work, LLMs simply wouldn't work well. The AI industry is just trying to run with it, hoping nobody will notice.

    • It isn’t clear that there’s any infringement going on at all, except in cases where AI output reproduces copyrighted content, or content that is sufficiently close to copyrighted content to constitute a derivative work. For example, if you told an LLM to write a Harry Potter fanfic, that would be infringement - fanfics are actually infringing derivative works that usually get a pass because nobody wants to sue their fanbase.

      It’s very unlikely that simply training an LLM on “unlicensed” work constitutes infringement. It could possibly be that the model itself, when published, would represent a derivative work, but it’s unlikely that most output would be, unless specifically prompted to be.

I get where you're coming from, but given that LLMs are trained on every available written word regardless of license, there's no meaningful distinction. Companies training LLMs for programming and writing show the same disregard for copyright as they do for graphic design. Therefore, graphic designers aren't owed special consideration that the author is unwilling to extend to anybody else.

  • Of course I think the same about text, code, sound, or any other LLM output. The author is wrong if they are unwilling to apply the same measure to everything. The fact that this is the new normal for everything does not make it right.

FWIW, Adobe makes a lot of noise about how their specific models were indeed trained on only licensed work. Not sure if that really matters, however.

  • Yes, Adobe and Shutterstock/Getty might be in a position to do this.

    But there is a reason why nobody cares about Adobe AI and everybody uses Midjourney…

I like this argument, but it does somewhat apply to software development as well! The only real difference is that the bulk of the "licensed work" the LLMs are consuming to learn to generate code happened to use some open source license that didn't specifically exclude use of the code as training data for an AI.

For some of the freer licenses this might mostly be just a lack-of-attribution issue, but in the case of stronger licenses like GPL/AGPL, I'd argue that training a commercial AI codegen tool (which is then used to generate commercial closed-source code) on licensed code is against the spirit of the license, even if it's not against the letter of the license (probably mostly because the license authors didn't predict the future we now live in).