
Comment by ben_w

7 months ago

> (I suppose it depends on whether you see the training process more like model learning, vs. more like model being a derived work. The latter feels absurd to me.)

It can be anywhere on the continuum between them; and the rules need to account for the gap between what happens by default and what is considered (economically and/or morally) desirable, which need not be a linear function of position on that continuum.

The least creative AI model possible simply returns the nearest match in the training set verbatim (e.g. Google search). A toy sketch of that end of the continuum is below.
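Something like this, if you want it concrete (bag-of-words and cosine similarity are just stand-ins here; no real search engine works this crudely):

```python
from collections import Counter
import math

# Toy "training set": the model is literally just the corpus.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "copyright law predates the printing press",
    "non-euclidean geometry dropped the parallel postulate",
]

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def least_creative_model(query):
    # Zero generalisation: hand back the nearest training example verbatim.
    query_bag = bag_of_words(query)
    return max(corpus, key=lambda doc: cosine(query_bag, bag_of_words(doc)))

print(least_creative_model("who invented copyright for the printing press?"))
# -> "copyright law predates the printing press"
```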

The most creative model possible can construct, from the training data, a coherent set of vectors spanning the n-dimensional space of concepts in that data, including hypothesising about missing implicit dimensions, the way we figured out non-Euclidean geometry by asking "we can't prove this bit; what if it's wrong?" (sketched loosely below).
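Loosely sketching that end too, if you squint and treat "concepts" as embedding vectors: an SVD recovers an orthonormal set of vectors spanning the space the data occupies, and the zero-variance directions are the candidate "missing implicit dimensions" (the data, sizes, and threshold here are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training data": 100 points that only ever occupy a 3-D subspace
# of a 5-D embedding space, i.e. two dimensions are implicit and unused.
basis = rng.normal(size=(3, 5))
data = rng.normal(size=(100, 3)) @ basis

# SVD recovers an orthonormal set of vectors spanning the concept space.
_, singular_values, components = np.linalg.svd(data - data.mean(axis=0))

spanned = singular_values > 1e-8
print("dimensions the data actually spans:", spanned.sum())        # 3

# The leftover directions are the "what if this axis existed?" hypotheses,
# analogous to dropping the parallel postulate to see what follows.
implicit_dimensions = components[~spanned]
print("candidate implicit dimensions:", len(implicit_dimensions))  # 2
```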

I don't know where any given LLM is on this continuum, only that it's certainly not at either end.

Economically, I think we were already far beyond the point where copyright helps actual productivity (as opposed to rent extraction) even 50 years ago: easy mass production left us with a small number of massive hits each year, at the expense of most creative people earning almost nothing. More recently, micro-payments and subscriptions (models like Patreon, or YouTube ads) have let a lot of small creators back into the market. But even then, copyright rules are often ignored as "fair use" (even when it isn't fair use), abused to attack rivals, or simply processed automatically (I think Tom Scott had an example of his own videos being claimed by someone else?).

But people don't only care about money; they also care about morals, and a lot of people are very upset that human creativity is now SaaS.