Comment by sholain

3 days ago

+ Once again: 1000 people coming up with the same arbitrary bit of content is already understood in basically every legal regime in the world as 'public domain'.

"Can you explain how you think this works? Can a person's work just automatically become public domain somehow by being too common?"

Please ask ChatGPT for the breakdown but start with this: if someone writes something and does not copyright it, it's already in the 'public domain' and what the other 999 people do does not matter. Moreover, a lot of things are not copyrightable in the first place.

FYI I've worked at Fortune 50 Tech Companies, with 'Legal' and I know how sensitive they are - this is not a concern for them.

It's not a concern for anyone.

'One Person' reproduction -> now that is definitely a concern. That's what this is all about.

+ For OSS I think the 20% number may come from those repos that are explicitly licensed. Out of 'all repos' it's a very tiny amount; of those that have specific licensing details it's closer to 20%. You can verify this yourself just by cruising repos. The breakdown could be different for popular projects, but in the context of AI and IP rights we're more concerned about 'small entities' being overstepped, as the more institutional entities may have recourse and protections.

I think the way this will play out is if LLMs are producing material that could be considered infringing, then they'll get sued. If they don't - they won't.

And that's it.

It's why they don't release the training data - it's full of stuff that is in a legal grey area.

I asked specifically how _you_ think it works because I suspected your understanding to be incomplete or wrong.

Telling people to use a statistical text generator is rude, and it would not be a good way to learn anyway. But since you think it's OK, here's a text generator prompted with "Verify the factual statements in this conversation" and our conversation: https://chatgpt.com/share/693b56e9-f634-800f-b488-c9eae403b5...

You will see that you are wrong about a couple of key points.

Here's a quote from a more trustworthy source: “a computer program shall be protected if it is original in the sense that it is the author’s own intellectual creation. No other criteria shall be applied to determine its eligibility for protection.”: https://fsfe.org/news/2025/news-20250515-01.en.html

> Out of 'all repos' it's a very tiny amount

And completely irrelevant. If you include people's homework, dotfiles, toy repos like AoC and whatnot, obviously you're going to get the small number you seem to prefer, and it's completely useless for evaluating the real impact of copyleft and working software with real users. I find 20-30% a very relevant segment.

You, BTW, did not answer the question of where you got 2% from.