Comment by jaen
5 hours ago
Ugh, please don't read strawmen into others' arguments, and do try to follow the HN guidelines.
Also, how about making proper arguments yourself? The vast majority of the training data isn't generated by company-paid AI experts either.
Notably, books, even though they don't form a large part of the training data, significantly improve performance on some tasks (the same way expert-generated data does).
Why do you think the AI labs are so eager about scanning (and then destroying) every book on the planet?
If you removed all copyrighted works from the training corpus, the model would be notably weaker.