Comment by kevin42

12 days ago

I’m genuinely curious how you feel about LLMs being trained on pirated material. Not being snarky here.

Your comment reflects the old “information wants to be free” ideals that used to dominate places like HN, Slashdot, and Reddit. But since LLMs arrived, a lot of the loudest voices here argue the opposite position when it comes to training data.

I’ve been trying to understand whether people have actually changed their views, or whether it’s mostly a shift in who is speaking up now.

Personally, my opinion doesn't matter. I'm a nobody who doesn't work in AI fields.

But as a pirate, I specialize in finding hidden, hard-to-find, or otherwise lost sources. They're not making anybody any money, and I absolutely do not sell anything that's not mine (freely given).

But having every commercial work available for ingestion into an LLM is an amazing way to train an AI. However, if you're going to use piracy at scale to train, you should also not be able to sell the LLM or access to it.

And yeah, that wrecks every corporate LLM strategy. Boo fucking hoo.

Do creators need to be paid for content they create? Ideally, yes! Do they deserve iron-fisted control of your hardware (DRM) to enact their demands? Fuck no!

Ideally, the LLMs would be FLOSS, with full weights published, lists of the content used so results can be reproduced, etc. We could prune bad content and add more good. But the problem again is that whoever does this must violate copyright, because copyright the way it's implemented is terrible.

In reality, I like the RIAA's congressional solution. You send a check for how many plays you did to BMI/ASCAP and you're good. That could be extended to books and shows. If that were done, you could have a New-Flix service that literally has every show and movie in existence. You just pay a reasonable cost per month to access the whole of video humanity.

Alas. Guess I'll have to build it myself.

why would that change anything? copyright is still a tax on the whole of society for the benefit of rich people and corporations. it opposes innovation, evolution and progress

maybe a short copyright would be fine (10 year fixed?) but copyright as-is seems indefensible to me

  • > copyright is still a tax on the whole of society for the benefit of rich people and corporations. it opposes innovation, evolution and progress

    The original reason for copyright, patents, and trademarks made sense.

    We want people to create and share. And unlike the old guild solutions from Europe, copyright and patents were a tradeoff to encourage the arts and science.

    But what's a good tradeoff? That's a big copyright question. 17 years? 34 years? Life of author? 75 years? How about individual non-commercial use? Or abandoned works?

    And patents aren't even in scope, but we see similar abuses against their raison d'être. Patents were supposed to entail a full reproduction of the invention. Now, it's a game of how incomplete we can make the filing while still getting protection. Or worse yet, really dumb shit has been patented, like 1-Click or the XOR patent, or that asshole Chakrabarty who patented living organisms.

    There were good reasons for a fair copyright and patent law for furtherance of the arts and sciences. That narrative was lost long ago. Now, only the violators can really push ahead. And they can't talk about it.

    (Trademark law has never really had many complaints, aside from trademarking a color. If you buy from XYZ company, you want to buy from them, not a counterfeit. And it relates back to coats of arms, again, representing a family or a charge.)

Personally, I'd like for copyright to be abolished, and then for LLM training to be made illegal for reasons entirely unrelated to copyright.