Comment by breve

10 hours ago

What are AI companies doing to respect open source licenses and copyright?

I'm sure they train their models on open source software, so how do I know that LLM-generated code doesn't reproduce substantial chunks of, for example, GPL-licensed code? If there are indeed GPL violations, what are AI companies doing to police themselves?

I wonder if open source licenses will start to include "not to be used for LLM training" clauses.

> I wonder if open source licenses will start to include "not to be used for LLM training" clauses

As if the LLM trainers would care. They've ignored every single license and copyright policy out there because "fair transformative use". That claim is being litigated in various jurisdictions, and the chaotic side of me really wants to see what happens if somewhere like the UK or California decides that training an LLM on pirated copyrighted material is not fair use, and that the rights holders have to be compensated.