
Comment by cycomanic

3 months ago

> Very true. Just the other day, another “copyright is bad” post on the front page. Today it's copyright is good, because otherwise people might get some use of material in LLMs.
>
> Considering this is hacker news, it seems to be such an odd dichotomy. Sometimes it feels like anti-hacker news. The halcyon days of 2010 are long gone. Now we apparently need to be angry at all tech.
>
> LLMs are amazing and I wish they could train on anything and everything. LLMs are the smartphone to the fax machines of Google search.

Sorry, this is such a (purposefully?) naive take. In reality the views are much more nuanced. For one, open source/free software doesn't exist without copyright. Then there is the whole issue that these companies use vast amounts of copyrighted material to train their models, arguing that all of this is fair use. But on the other hand they lock their models behind walls, disallow training on them, keep the training methods and data selection secret...

This tends to be what people disagree with. It feels very much like different rules for thee and for me. Just imagine how outraged Sam Altman would be if someone leaked the code for GPT-5 and all the training scripts.

If we agree that copyright does not apply to training LLMs, then it should also not apply to the LLMs themselves, and the companies should be required to release all their models and the way they were trained.

Does that mean you would support open LLM model training on copyrighted data?

  • I think that opens several other cans of worms, but in principle I would support a solution that allows using copyrighted materials if it is for the common good (i.e. the results are released fully open, meaning not just the weights but everything else).

    As a side note, I am definitely not a strong proponent of IP rights, but I can see the benefits of copyright much more clearly than those of patents.