Comment by rpdillon

8 hours ago

This is a strawman, in the sense that it is not accurate to think of AI models as a compressed form of their training data, since the lossiness is so high. One of the insights from the trial is that LLMs are particularly poor at reproducing original texts (60 tokens was the longest reproduction found in the trial, IIRC). This is taken into account when considering fair use under the fourth fair use factor: how the work impacts the market for the original work. It's hard to argue that LLMs are replacing long-form text works when they have so much trouble actually reproducing them.

There's a whole related topic here in the realm of news (since it's shorter form), but news also has a much shorter half-life. Not sure what I think there yet.