
Comment by jobs_throwaway

9 hours ago

Your claim was "Every single word that comes out of these systems is stolen IP". This code was never in the corpus of training data. How could it be stolen?

Are you moving the goalpost to "Every single word that comes out of these systems relies on understanding gained from stolen IP"?

Yes, I am saying exactly that. I guess I wasn't clear enough in my previous comment.

  • Then every single human being is also guilty of what you accuse LLMs of. We all rely on understanding gleaned from others' IP, much of it not paid for.

    • I mean, it's a very common argument and it's simply flawed.

      You as a human are allowed to read the contents of, say, IMDb and summarise it to your friends free of charge. You can even be a paid movie critic and base your opinions on IMDb just fine. But if you build a website that says "I'll give you my opinion about a film for £5" and it's just based on the input from IMDb, I'm sure we can both agree that you've crossed a line - you're using another person's service to build your own business without compensating them. That's what LLMs are doing.

      Honestly, I'm just so tired of the whole "yeah, but humans are the same because we also learn by reading stuff". These companies have effectively "read" everything ever made, free of charge, and are selling it back to us packaged in stupid bots that can only function because they were given that data. It doesn't compare at all to how a human learns and then uses information, unless you know someone who can do it at that kind of scale. LLMs don't "glean" - they consume wholesale.