
Comment by worldsayshi

19 days ago

As long as the resulting PR is less than 100 lines and the AI is a bit more self-sufficient (like actually making sure tests pass before "pushing"), it would be OK, I think. I think this process is intended for fixing papercuts rather than building anything involved. It just isn't good enough yet.

As a matter of principle, I don't use any network trained on non-consensual data stripped of its source and license information.

Other than that, I don't think this is bad tech; however, it brings another slippery slope. Today it's as you say:

> I think this process is intended for fixing papercuts rather than building anything involved. It just isn't good enough yet.

After sufficient time, somebody will rephrase it as:

> I think this process is intended for writing small, personal utilities rather than building enterprise software. It just isn't good enough yet.

...and we will iterate from there.

So, it looks like I won't touch it for the foreseeable future. Maybe if the ethical problems with the training material are solved (i.e., models trained on data obtained with consent and with correct licenses), I can use it alongside the other analysis and testing tools I use, for a final pass.

AI will never be a core and irreplaceable part of my development workflow.

  • > AI will never be a core and irreplaceable part of my development workflow.

    Unless AI use becomes a KPI in your annual review.

    Duolingo did that just recently, for example.

    I am developing serious regrets about conflating "computing as a medium for personal expression" with "computing for livelihood" early on.

  • I feel there's a fundamental flaw in this mindset, though I probably don't understand enough of its layers to explain it properly. Or maybe it's my thinking here that is fundamentally flawed? Off the top of my head:

    If we let intellectual property be a fundamental principle, the line between idea (which can't be owned) and IP (which can be owned) will eventually devolve into an infinitely complex fractal that nobody can keep track of. Only lawyer AIs will eventually be able to tell the difference between idea and IP as what we can encode becomes more complex. Why are weights not code when they clearly contain the ability to produce the code? Is a brain code? Are our experiences like code?

    What is the fundamental reason that a person is allowed to train on IP but a bot is not? I suspect this comes down to the same issue as the divide between IP and idea, but there might be some additional dimension to it. At some point we will need to see some AIs as conscious entities, and to me it makes little sense that there would be some magical discrete moment where an AI becomes conscious and gains rights to its "own ideas".

    Or maybe there's a simple explanation of the boundary between IP and idea that I have just missed? If not, I think intellectual property as a concept will not stand the test of time, and other principles will need to take its place if we want to maintain the fight for a good society. Until then, IP law still has its place and should be followed, but as an ethical principle it's certainly showing cracks.

    • I'll write a detailed answer to your comment, but I don't currently have time to do so; I'll probably post it as another reply.

      I just don't want to type something out haphazardly, because your questions deserve more than 30 seconds of elaboration.

    • It took longer than I planned, sorry. But here we go:

      When you look at proper research, whether from academia or from private corporations, you can always keep track of ideas and the intellectual property resulting from them. Ideas mature into documents, research reports, and proofs of concept. In some cases, you can follow the process in lab notebooks. These notebooks are kept according to a protocol, and they’re more than a collection of ideas; they’re a “brain trail”. Then you publish or patent these ideas, ideally both. These artifacts (publications and patents) contain references and citations. As a result, you can track who did what and which invention came after which. In a patent case, you may even need to defend your patent to show that it’s not the same as an invention patented before. In short, you have a trail. There are no blurry lines there.

      The thing is, copyright law and the idea of intellectual property were created by humans, for humans. First, I’ll ask this question: if an instructor or academic is not allowed to teach a course without providing references, whether to the book itself or to the scientist who invented something, why is a bot allowed to? Try citing a piece of a book or publication in a course, paper, or research project without giving a reference, and you’re officially a fraud, and your whole career is in shambles. Why is a bot allowed to do this, be it with a book or a piece of code? For the second perspective, I’ll ask three questions: 1) How many of the books you have read can you recall exactly, or as a distilled summary? 2) For how long can you retain this information without any corruption whatsoever? 3) How many books can you read, understand, summarize, and internalize in an hour? A bot can do thousands, without any corruption and without any time limit. As a result, an LLM doesn’t learn; it ingests, stores, and remixes.

      A human can’t do that to a book (or any artifact) if its license doesn’t allow it or its creator hasn’t given explicit consent. Why can a bot? An LLM is a large stochastic blender that tends to choose correct words thanks to its weighted graph. A human works much differently: they read, understand, and let an idea cook by mixing it with their own experience and other inputs (other people, emotions, experiences, and more), creating something unique outside the graph. Yet the machine’s creation has its limits. No machine can create something more complex than itself, and an LLM can never output something more complex than the knowledge encoded in its graph. It might light up dark corners, but it can’t expand the borders. Its asymptotic limit is collective human intelligence, even if you give it tools.

      So, yes, IP law is showing its cracks because it was designed for humans, not bots. However, I value ethics above everything else. My ethics are not defined by laws but by something much higher. As I replied to someone, “I don’t need to be threatened with being burned for all eternity to be good.” Similarly, I don’t need a law to deem something (un)ethical. If what’s done is against the spirit of humanity, then it’s off-limits for me.

      I’d never take something without permission and milk it for my own benefit, especially if the owner of that thing doesn’t consent. I bought all the software I had pirated once I started earning my own money, and I stopped using software that I couldn’t afford or didn’t want to buy. This is the standard I operate at, and I hold every entity I interact with to that exact standard. Anything lower is unacceptable, so I don’t use popular LLMs.

      On the other hand, not all AI is the same, and there are other good things that I support, but those are scientific tools, not products aimed directly at consumers.

      Hope this helps.