Comment by bayindirh

7 days ago

It took longer than I planned, sorry. But here we go:

When you look at proper research, whether from academia or from private corporations, you can always keep track of ideas and the intellectual property resulting from them. Ideas mature into documents, research reports, and proofs of concept. In some cases, you can find the process documented in lab notebooks. These notebooks are kept according to a protocol, and they’re more than a collection of ideas. They’re a “brain trail”. Then, you publish or patent these ideas. Ideally both. These artifacts (publications and patents) contain references and citations. As a result, you can track who did what and which invention came after which. In a patent case, you may even need to defend your patent to show that it’s not the same invention as one patented before. In short, you have a trail. There are no blurry lines there.

The thing is, copyright law and the idea of intellectual property were created by humans for humans. First, I’ll ask this question: If an instructor or academic is not allowed to teach a course without providing references, whether to the book itself or to the scientist who invented something, why is a bot allowed to? Try citing a piece of a book, publication, or research in a course or paper without giving a reference, and you’re officially a fraud, and your whole career is in shambles. Why is a bot allowed to do this, whether it’s a book or a piece of code? For the second perspective, I’ll ask three questions: 1) How many of the books you have read can you recall exactly, or as a distilled summary? 2) For how long can you retain this information without any corruption whatsoever? 3) How many books can you read, understand, summarize, and internalize in an hour? A bot can do thousands, without any corruption and without any time limit. As a result, an LLM doesn’t learn; it ingests, stores, and remixes.

A human can’t do that to a book (or any artifact) if its license doesn’t allow it or its creator doesn’t give explicit consent. Why can a bot? An LLM is a large stochastic blender that tends to choose correct words because of its weighted graph. A human works much differently. They read, understand, and let the idea cook, mixing it with their own experience and other inputs (other people, emotions, experiences, and more), and create something unique outside the graph. Yet this creation has its limits. No machine can create something more complex than itself. An LLM can never output something more complex than the knowledge encoded in its graph. It might light dark corners, but it can’t expand borders. The asymptotic limit is collective human intelligence, even if you give it tools.

So, yes, IP law is showing its cracks because it’s designed for humans, not bots. However, I value ethics above everything else. My ethics are not defined by laws but by something much higher. As I replied to someone, “I don’t need to be threatened with burning for all eternity to be good.” Similarly, I don’t need a law to deem something (un)ethical. If what’s being done is against the spirit of humanity, then it’s off-limits for me.

I’d never take something without permission and milk it for my own benefit, especially if the owner of that thing doesn’t consent. I bought all the software I had pirated once I started to earn my own money, and I stopped using software that I couldn’t afford or didn’t want to buy. This is the standard I operate at, and I hold all the entities I interact with to that exact standard. Anything lower is unacceptable, so I don’t use the popular LLMs.

On the other hand, not all AI is the same, and there are other applications that I support, but those are scientific tools, not products aimed directly at consumers.

Hope this helps.