Comment by heavyset_go

8 days ago

The same should apply to LLMs. If you're going to train on the sum total of all of humanity's creative work, from the beginning of history into perpetuity, and train on the sum total of all current intellectual property, the result should exist for the public's education, research and benefit.

It would also be in the spirit of the fair use doctrine's first and fourth considerations:

> 1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

> 2. the nature of the copyrighted work;

> 3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

> 4. the effect of the use upon the potential market for or value of the copyrighted work.

If that doesn't happen, increasing amounts of information and human creativity will be siloed and never made publicly accessible in a way that they can be consumed and reproduced as slop.