Comment by plaidfuji

6 hours ago

It’s the exact same mental gymnastics that cause people to accuse model providers of large-scale plagiarism.

That is to say, not that much gymnastics. Like a cartwheel at most.

I don't really agree with those guys either.

The reason is fairly straightforward: there's no alternative if you need the dataset.

Copyright law makes it a huge amount of effort to get even an incomplete version.

And use in LLMs is transformative, so it would fall under fair use. The only reason they're in trouble with the courts at the moment, from my understanding, is that they pirated the content instead of, idk, ripping it from Libby.

Anna's Archive aren't filing the serial numbers off the epubs they redistribute. Rightfully or wrongly distributed, the attribution is crystal clear.