Comment by Bjorkbat

6 months ago

Personally I think a more effective analogy would be if someone used a textbook and created an online course / curriculum effective enough that colleges stop recommending the purchase of said textbook. It's honestly pretty difficult to imagine a movie having a meaningful impact on the sale of textbooks since they're required for high school / college courses.

So here's the thing: I don't think a textbook author going against a purveyor of online courseware has much of a chance, nor do I think they should, because they probably lack meaningful proof that their works contributed to the creation of the courseware. Would I feel differently if the textbook author could prove in court that a substantial amount of their material contributed to the creation of the courseware, and when I say "prove" I mean they had receipts to prove it? I think that's where things get murky. If you can actually prove that your works made a meaningful contribution to the thing that you're competing against, then maybe you have a point. The tricky part is defining "meaningful". An individual author doesn't make a meaningful contribution to the training of an LLM, but a large number of popular and/or prolific authors can.

You bring up a good point: interpretation of fair use is difficult. But at the end of the day I really don't think we should abolish copyright and IP altogether. I think it's a good thing that creative professionals have some security in knowing that they have legal protections against having to "compete against themselves".

> An individual author doesn't make a meaningful contribution to the training of an LLM, but a large number of popular and/or prolific authors can.

That's a point I normally use to argue against authors being entitled to royalties on LLM outputs. An individual author's marginal contribution to an LLM is essentially nil, and their work could be removed from the training set with no meaningful impact on the model. It's only the accumulation of a very large number of works that turns into a capable LLM.

  • Yeah, this is something I find kind of tricky. I definitely believe that AI companies should get permission from rightsholders to train on their works, but actually compensating them for their works seems pointless. To make the royalties worthwhile you'd have to raise the cost per query to an absolutely absurd level.

    • The amounts are not the only problem; there's no good way to measure how much each input in the training data contributed to a given output. I wouldn't be surprised if it turns out it's fundamentally impossible.

      Paying everyone a flat rate per query is probably the only way you could do it; any other approach is either going to be contested as unfair in some way, or will be too costly to implement. But then, a flat rate is only fair if it covers everyone in proportion to their contribution, and that share gets diluted by the portion of training data that's not obviously attributable, like Internet comments, Wikipedia, public domain material, or internally generated data, so I doubt authors would see any meaningful royalties from this anyway. The only thing it would do is make LLMs much more expensive for society to use.

> it's a good thing that creative professionals have some security in knowing that they have legal protections

This argument would make sense if it applied across the board, but it's impossible (and pretty ridiculous) to enforce in basically anything except very narrow types of media.

Let's say I come up with a groundbreaking workout routine. Some guy in the gym watches me for a while, adopts it, then goes on to become some sort of bodybuilding champion. I wouldn't be entitled to a portion of his winnings, that would be ridiculous.

Let's say I come up with a cool new fashion style. Someone sees my posts on insta and starts dressing similarly, then ends up with a massive following and starts making money in a modelling career. I wouldn't be entitled to a portion of their income, that would be ridiculous.

And yet, for some reason, media is special.