Comment by heavyset_go

2 days ago

> This is a position that seems to be as unenforceable as AI can't be trained on code whose copyright owners have not given consent.

This depends on what courts find, at least one non-precedent setting case found model training on basically everyone's IP without permission to be fair use. If it's fair use, consent isn't needed, licenses don't matter and the only way to prevent training on your content is to withhold it and gate it behind contracts that forfeit your clients' rights to fair use.

But that is beside the point, even if what you claim was the case, my point is that AI output isn't property. It's not property whether its training corpus was full of licensed or unlicensed content. This is what the US Copyright Office determined.

If you include AI output in your product, that part of it isn't yours. It isn't anybody's, so anyone can copy it and anyone can do whatever they want with it, including the AI/cloud providers you allowed your code to get slurped up to as context to LLMs.

You want to own your IP, you don't want to say "we own 30% of the product we wrote, but 70% of it is non-property that anyone can copy/use/sell with no strings attached, even open source licenses". This matters if you're building a business or open source project. If you include AI code in your open source project, that part of the project isn't covered by your license. LLMs can't sign CLAs and they can't produce intellectual property that can be licensed or owned. The more of your project that is developed by AI, the more it is not yours, and the more of it cannot be covered by your open source license of choice.

> This is what the US Copyright Office determined.

There are hundreds of countries in the world. Whatever the "US Copyright Office" determines, applies to only one of them.