Comment by voidUpdate

6 hours ago

My preference is that if you need terabytes of data to train an LLM, that data should be used in accordance with its copyright and with the consent of the copyright holder, not just hoovered up from wherever you can find a few more bytes.
