Comment by literalAardvark
10 hours ago
I don't really agree with those guys either.
The reason is fairly straightforward: there's no alternative if you need the dataset.
Copyright law makes it a huge amount of effort to get even an incomplete version.
And use in LLMs is transformative, so it would fall under fair use. From my understanding, the only reason they're in trouble with the courts at the moment is that they pirated the content instead of, idk, ripping it from Libby.