Comment by GeoAtreides
9 hours ago
he doesn't have solid points, he conflates fair use with free use (?), ignores thousands of years of attribution history, and equates normal human to human learning with corporate LLMs training on original content (without consent). Great presentation, like you said, to cover the logical defects.
I did say "free use" instead of "fair use," yeah. That's my mistake, thank you for the correction. If I could edit my original comment, I would, mea culpa. Typos happen.
I see. I must congratulate you on your rhetorical prowess, it's nice seeing a professional at work.
Fair use of training data hasn’t yet been settled in court. People here are treating it like it has been. But no amount of wishful thinking or moral arguments will change a verdict saying it’s fine for training data to be used as it has been.
Until that question is settled, it’s disingenuous to dismiss his points out of hand as conflating fair use or ignoring consent.
Even beyond that, the initial legal opinion we do have did in fact point to training being fair use: https://www.reuters.com/legal/litigation/anthropic-wins-key-...
However, I don't feel comfortable suggesting that this is settled just yet; one district judge's opinion does not mean that future cases won't disagree, and we may at some point get explicit legislation one way or the other.
I think the court dropped the ball here. On the one hand, I think they were right that using existing works--copyrighted or otherwise--to train a model was transformative fair use. On the other hand, Anthropic and others trained their models on illicit copies of the works; they (more often than not) didn't pay the copyright holders.
There's a doctrine in Fourth Amendment law called "fruit of the poisonous tree." The general rule is that prosecutors don't get to present evidence in a criminal trial that they gained unlawfully. It's excluded. The jury never gets to see it even if it provides incontrovertible evidence of guilt. The point is to discourage law enforcement from violating the rights of the accused during the investigative process, and to push them to obtain a warrant as the Amendment requires.
It seems to me that the same logic ought to be applied to these companies. They want to make money by building the best models they can. That's fine! They should be able to use all the source data they can legitimately obtain to feed their training process. But if they refuse to do so and resort to piracy, they mustn't be allowed to claim that they then used it fairly in the transformative process.
I was just enumerating some of the issues with the '''solid''' points OP made. Actually addressing them would take too long and be an exercise in futility, here, on HN, in April 2026. Why would I put in the effort, for my comment to be flagged and sent to the void? Or worse, persisted forever and used for training without my consent?
And yes, you are right, the legal and moral question of fair use in training data hasn't been settled yet; we agree here.