Comment by sholain

3 days ago

RAG and LLMs are not the same thing, but 'Agents' incorporate both.

Maybe we could partly resolve the OP's conundrum by requiring 'agents' to give credit for things they retrieved via RAG or pulled off the web?

It still doesn't resolve the 'inherent learning' problem.

It's reasonable to suggest that 'if one person did it, we should give credit', at least in some cases. But if 1K people have done similar things and the AI learns from that, well, I don't think credit is something that should apply.

But a couple of considerations:

- It may not be that common for an LLM to 'see one thing one time' and then have such an accurate assessment of the solution. It helps, but LLMs tend not to 'learn' things that way.

- Some people might consider this the OSS dream: any code that's public is public, and it's in the public domain. We don't need to 'give credit' to someone because they solved something relatively arbitrary. If they are concerned with that, we can have a separate mechanism for it, e.g. they can put it on GitHub or even Wikipedia, and then we can worry about 'who thought of it first' as a separate consideration. But in terms of engineering application, that would be a bit of a distraction.

> if 1K people have done similar things and the AI learns from that, well, I don't think credit is something that should apply.

I think it should.

Sure, if you make a small amount of money and divide it among the 1000 people who deserve credit because their work was used to create ("train") the model, each share might be too small to bother with.

But if actual AGI is achieved, then it has nearly infinite value. If said AGI is built on top of the work of the 1000 people, then almost infinity divided by 1000 is still a lot of money.

Of course, the real numbers are way larger: LLMs were trained on the work of at least 100M, and perhaps over a billion, people. But the value they provide over a long enough timespan is also claimed to be astronomical (as evidenced by the valuations of those companies). It's not just their employees who deserve a cut but everyone whose work was used to train them.

> Some people might consider this the OSS dream

I see the opposite. Code that was public but protected by copyleft can now be reused in private/proprietary software. All you need to do is push it through enough matmuls and some nonlinearities.

  • - I don't think it's even reasonable to suggest that 1000 people all coming up with variations of some arbitrary bit of code either deserve credit or, certainly, 'financial remuneration' just because they wrote some arbitrary piece of code.

    That scenario is already today very well accepted legally and morally etc as public domain.

    - Copyleft is not OSS, it's a tiny variation of it, which is both highly ideological and impractical. Less than 2% of OSS projects are copyleft. It's a legit perspective obviously, but it hasn't been representative for 20 years.

    Whatever we do with AI, we already have a basic understanding of public domain; at least we can start from there.

    • > I don't think it's even reasonable to suggest that 1000 people all coming up with variations of some arbitrary bit of code either deserve credit

      There are 8B people on the planet, and probably ~100M can code to some degree[0]. Something only 1k people write is actually pretty rare.

      Where would you draw the line? How many out of how many?

      If I take a leaked bit of Google or MS or, god forbid, Oracle code and manage to find a variation of each small block in a few other projects, does it mean I can legally take the leaked code and use it for free?

      Do you even realize to what lengths the tech companies went just a few years ago to protect their IP? People who ever even glanced at leaked code were prohibited from working on open source reimplementations.

      > That scenario is already today very well accepted legally and morally etc as public domain.

      1) Public domain is a legal concept, it has 0 relevance to morality.

      2) Can you explain how you think this works? Can a person's work just automatically become public domain somehow by being too common?

      > Copyleft is not OSS, it's a tiny variation of it, which is both highly ideological and impractical.

      This sentence seems highly ideological. Linux is GPL; in fact, probably most of the software on my non-work computer is GPL. It is very practical and works much better than commercial alternatives for me.

      > Less than 2% of OSS projects are copyleft.

      Where did you get this number? Using search engines, I get 20-30%.

      [0]: That's the number of GitHub users. Though there are reportedly only ~25M professional software developers, many more people can code but don't do so professionally.

      1 reply →