Comment by CWuestefeld
6 hours ago
If we take the point of view that LLMs are tools (I agree), then people need to be absolutely certain that these tools don't contain (compressed) representations of copyrighted works.
I've pointed out elsewhere in this thread that this is the opposite of how the real world works.
In actual fact, people who need software built hire a tool (e.g., a software developer like me) to build it for them. That tool, me or you, carries inside it a tremendous library of represented copyrighted works. I've worked on enough different projects over the decades that the next CRUD function, or rule-driven data-entry tool, or whatever I build is going to draw very significantly from the last ones I built. And those last ones were copyrighted, with those rights held by my employer at the time, and maybe even protected by NDA or defense-style classifications.
Is your position that this is OK so long as it's stuff that I can keep in my squishy brain, but the moment that mechanism moves to silicon, it somehow becomes fundamentally different?
The other major argument I see in this thread is that LLMs are different because there's a third party aggregating the data and selling me (or my employer) use of that tool. But this doesn't change the overall picture at all; it just adds one more layer of dereferencing. The addition of that middleman hasn't altered the moral landscape: how is hiring me, along with what's in my memory, different from hiring the combination of me plus a helper to supplement my memory? There's an aspect of scale, I suppose. With that helper I can achieve greater quantities, but that doesn't change the story in a qualitative way.