Comment by Bjorkbat

6 months ago

Yeah, this is something I find kind of tricky. I definitely believe that AI companies should get permission from rightsholders to train on their works, but actually compensating them seems pointless. To make the royalties worthwhile you'd have to raise the cost per query to an absolutely absurd level.

The amounts are not the only problem; there's no good way to measure how much each input in the training data contributed to a given output. I wouldn't be surprised if that turns out to be fundamentally impossible.

Paying everyone a flat rate per query is probably the only way you could do it; any other approach is either going to be contested as unfair in some way, or will be too costly to implement. But then, a flat rate is only fair if it covers everyone in proportion to their contribution, and the authors' share gets diluted by the portion of training data that's not obviously attributable, like Internet comments or Wikipedia or public domain stuff or internally generated data. So I doubt authors would see any meaningful royalties from this anyway. The only thing it would do is make LLMs much more expensive for society to use.
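
To put rough shape on the dilution argument, here's a back-of-the-envelope sketch. Every number in it is made up purely for illustration (the royalty per query, the query volume, the attributable share, and the author count are all assumptions, not real figures), and `per_author_payout` is just a hypothetical helper that splits the pool evenly:

```python
def per_author_payout(royalty_per_query, queries, attributable_share, num_authors):
    """Evenly split the attributable part of a flat per-query royalty pool.

    All inputs are hypothetical; this only illustrates the dilution effect.
    """
    pool = royalty_per_query * queries          # total royalties collected
    author_pool = pool * attributable_share     # part that goes to identifiable authors
    return author_pool / num_authors            # even split per author


# Hypothetical inputs, NOT real figures:
payout = per_author_payout(
    royalty_per_query=0.001,      # assumed: a tenth of a cent per query
    queries=1_000_000_000,        # assumed: one billion queries
    attributable_share=0.25,      # assumed: a quarter of the data is attributable
    num_authors=10_000_000,       # assumed: ten million rightsholders
)
print(f"payout per author: ${payout:.3f}")
```

With these made-up inputs each author ends up with a few cents per billion queries. Different assumptions will move that number, but the pool is always divided twice: once by the attributable share, then again by the number of authors.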