Comment by barbazoo
9 hours ago
> Using unowned training data (e.g., celebrity faces, copyrighted art)
How would one ever know that the GenAI output is not influenced or based on copyrighted content.
Getty and Adobe offer models that were trained only on images that they have the rights to. Those models might meet Netflix’s standards?
I kind of wonder if that even works.
If you take a model trained on Getty and ask it for Indiana Jones or Harry Potter, what does it give you? These things are popular enough that they're likely to be present in any large set of training data, either erroneously or because some specific works incorporated them in a way that was licensed or fair use for those particular works even if it isn't in general.
And then when it conjures something like that by description rather than by name, how are you any better off than with something trained on random social media? It's not like you get to make unlicensed AI Indiana Jones derivatives just because Getty has a photo of Harrison Ford.
I work in this space. In traditional diffusion-based regimes (paired image and text), one can absolutely scan the captions to remove all occurrences of Indiana Jones. Likewise, Adobe Stock has content moderation that ensures (up to the limits of human moderation) no dirty content. To the model, it is a world without Indiana Jones.
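A minimal sketch of that kind of caption-based filtering, assuming a paired (image, caption) dataset and a hypothetical denylist of franchise terms; a real pipeline would use a much larger curated list plus the human moderation the comment mentions:

```python
import re

# Hypothetical denylist; real pipelines maintain far larger curated lists.
BLOCKED_TERMS = ["indiana jones", "harry potter", "mickey mouse"]

# Compile word-boundary patterns so e.g. "jones" alone is not flagged.
PATTERNS = [re.compile(r"\b" + re.escape(t) + r"\b") for t in BLOCKED_TERMS]

def is_clean(caption: str) -> bool:
    """Return True if no blocked term appears in the caption."""
    lowered = caption.lower()
    return not any(p.search(lowered) for p in PATTERNS)

def filter_pairs(pairs):
    """Keep only (image_path, caption) pairs whose caption passes the check."""
    return [(img, cap) for img, cap in pairs if is_clean(cap)]
```

Note that this is exactly where the "by description rather than by name" objection bites: a caption like "a man in a fedora with a whip" sails through, so name filtering alone cannot guarantee the model never reconstructs the character.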
It comes down to who is liable for the edge cases, I suspect. Adobe will compensate the end user if they get sued for using a Firefly-generated image (probably up to some limit).
Getting sued occasionally is a cost of doing business in some industries. It’s about risk mitigation rather than risk elimination.
Adobe Firefly absolutely has a Spider-Man problem.
I think it would be very, very difficult, almost impossible, to create a dataset for training an image generator that doesn't contain any copyrighted material you don't have the rights to. The obvious stuff like Mickey Mouse or Superman you could filter out with some other tool, but so many ridiculous things can be copyrighted (depictions of buildings, tattoos), and there are crowd shots and pictures of cities with ads in the background, that I don't know how you could do it. I'm sure even Adobe's stock library has a lot of violations like that.
Whistleblowers, corporate leaks, output resembling copyrighted content, etc. Basically it's the same situation as companies that unlawfully use licensed code as their own (e.g., without respecting the GPL).
Netflix could also use or provide their own TV/movie productions as training data.
Lionsgate tried that and found that even their entire archive wasn't nearly enough to produce a useful model: https://www.thewrap.com/lionsgate-runway-ai-deal-ip-model-co... and https://futurism.com/artificial-intelligence/lionsgate-movie...
This amuses me.
Consumers have long wanted a single place to access all content. Netflix was probably the closest anyone ever got, and even then it had regional difficulties. As competitors rose, they stopped licensing their content to Netflix, and Netflix is now arguably just another face in the crowd.
Now they want to go and leverage AI to produce more content and bam, stung by the same bee. No one is going to license their content for training, if the results of that training will be used in perpetuity. They will want a permanent cut. Which means they either need to support fair use, or more likely, they will all put up a big wall and suck eggs.
Maybe now all that product placement is finally coming back to haunt them.