Comment by Lerc

4 months ago

I think it is generally accepted that AI should not reproduce works that others hold the rights to. I think most people developing AI consider it a failure mode when it reproduces a copy of its training data.

It remains an open question whose responsibility it is to avoid distributing infringing AI output: the developer's or the user's. Legally, it is unclear because there are few cases to provide precedent in such a new situation. Morally, I think AI developers do consider it their duty to keep such behaviour to a minimum, but they also believe the benefits of AI are significant enough that it would be unreasonable to block access to them just because failure modes exist.

As for being "slopped up": that is a weird phrasing in itself, but I gather you are trying to repurpose the term "slop" to add a pejorative tone to your words. I'm not really a fan of "slop" as a term for AI output, because it is used specifically for AI output. If it were used as a blanket term for low-effort, mass-generated content, that would be reasonable, but when it seems to apply only to AI it implies prejudice. Turning it into a verb describing input strips away everything meaningful about the term, leaving only the prejudice. Just go with "slurped up".

That brings us to what training actually is: reading. There is no requirement for attribution to read something. There is no requirement for attribution to learn from something. The restrictions on reproduction exist to protect your particular expression of the ideas. The ideas themselves are not copyrightable; this is widely recognised, legally and morally. Scholars have written volumes on why this should be so, and on how bad the alternative would be: a world where people could own ideas themselves. Imagine the wealth imbalance that exists in today's world, then extend that imbalance from money to the very ideas you use to express yourself.

AI should not reproduce your work on terms you have not agreed to. You have a valid complaint when it does that. My concern is that people appear to be extending their claims to suggest that they control the right to be learned from. That is not true, right, or moral.

> That brings us to what training actually is, Reading. There is no requirement for attribution to read something.

> My concern is that people appear to be extending their claims to suggest that they control the right to be learned from.

Some would claim that training is not actually reading/learning but embedding/encoding. This view leads to arguments like the following:

If I were to take his work and gzip it, does that mean I should be able to use it?

Why? Because this is an automated system. You are anthropomorphizing it without justification.

Not to mention the usual copyright arguments, such as "If I memorize his code and type it into my computer by hand, can I use it then? What if I only remember 90%? 80%? What if I just change the variable names?"

This isn't as cut and dried as you make it out to be, in my humble opinion.

  • I think your points were mostly addressed in my post. Your issue is with reproduction. I'm not sure whether there is any legal ruling on encoding without reproduction. I would expect hashes to be safe.

    As for the percentage changed, that is not a factor in whether a reproduction should be allowed, but in whether something counts as a reproduction at all. That should be a task for domain experts, and I think historically it has been; even Philip Yorke did that.

    It is certainly not cut and dried. I generally argue precisely that point when it comes to the details, but the guiding principles are clear.

    Ideas are not copyrightable.

    The laws involved are supposed to benefit society as a whole.

    Patents provide some protection in the area of ideas. While heavily misused, they are intended to incentivise the development needed to make things actually work, granting some exclusivity as a reward. A pure idea should not be considered enough (though it often is); what is protected ought to be the application of the idea.

    Copyright is intended to promote creativity by offering a means to generate revenue from a creative work. The goal was to encourage more works (or at least avoid inhibiting them) by rewarding those who produce them.

    There are issues today with laws being influenced to benefit a minority. Many IP laws were shaped this way. I think the solution is to advocate and work towards laws that benefit society as a whole. Unfortunately, people seem to see the imbalance, and it normalizes the view that you should leverage the notion of intellectual property to get something for yourself (or your tribe).

    The world is changing. When the notion of copyright was first expressed, there was no way to create a thing and then sell a million copies of it with no extra effort. When mass production, and then mass media, made this possible, they allowed more creative works by spreading the cost of each work across many buyers. It was never the intention for copyright to enable a few people to get wealthy from a few popular works. Once the dominance of those few popular works made it harder for other creatives to make things of their own, copyright law started acting against the principles it was founded upon.
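The technical distinction this exchange turns on, between an encoding that still permits exact reproduction (the gzip example) and one that does not (the hash example), can be shown with a short Python sketch. gzip is lossless, so the compressed bytes decompress to an exact copy of the input; a SHA-256 digest is a fixed 32-byte summary from which the input cannot be reconstructed. The sample text is a hypothetical placeholder, and the sketch only illustrates these two endpoints; it makes no claim about where model training falls between them.

```python
import gzip
import hashlib

# Hypothetical stand-in for someone's work; any bytes would do.
work = b"Some original creative work. " * 20

# gzip is a lossless encoding: decompressing yields the exact original bytes,
# so the compressed form still carries a complete copy of the work.
compressed = gzip.compress(work)
restored = gzip.decompress(compressed)
print(restored == work)

# SHA-256 maps input of any length to 32 bytes; there is no inverse function,
# so the digest cannot be turned back into the work.
digest = hashlib.sha256(work).digest()
print(len(digest))
```

Running it prints `True` and then `32`: the gzip round trip reproduces the work exactly, while the digest length stays the same no matter how long the input is.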