Comment by ern_ave
7 hours ago
My guess is that AI training is the main issue.
Data that you can prove was generated by humans is now exceedingly valuable... and most of that comes from the days before LLMs. The situation is a bit like how steel manufactured before the nuclear age is valuable.
But why would people train on excerpts from Google Books when whole books can be downloaded on libgen and such?
Google Books is much bigger than libgen.
Copyright reasons?
Both are copyright violations.