Comment by pmarreck

8 hours ago

Related but not sure how much attention it's getting:

GPL is a dead man walking: you can have any LLM clean-room a new implementation in a new language from a public spec, with a verifiable "never looked at the original source" guarantee, and license it as permissively as you wish (MIT, BSD, etc.).

Case in point: check out my current deps on the project I'm currently working on with LLM assist: https://github.com/pmarreck/validate/tree/yolo/deps

"validate" is a project that currently validates over 100 file formats at the byte level; its goal is to validate as many formats as possible, for posterity/all time.

Why did I avoid the GPL (which I am normally a fan of) even though this is open source? I have an even-higher-level project I'm working on that implements automatic lightweight parity protection (it can proactively repair data without a RAID/ZFS setup), which I want to make for sale. Its code will (initially) be private, and it uses "validate" as a dependency (no sense in protecting data that is already corrupted).
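
(For the curious: the parity idea is just erasure coding. Here's a toy XOR-parity sketch in Python, where one parity block can rebuild any single lost block. A real product would use something stronger like Reed-Solomon; this is illustrative, not the product's code.)

    # Toy single-parity protection: N equal-length data blocks plus one
    # XOR parity block can recover any ONE missing/corrupt block.
    def xor_blocks(blocks):
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, byte in enumerate(blk):
                out[i] ^= byte
        return bytes(out)

    data = [b"hello world!", b"pad to same!", b"length block"]  # 12 bytes each
    parity = xor_blocks(data)

    # Later, block 1 goes bad; rebuild it from the survivors plus parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]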

Figured I'd give this to the world for free in the meantime. It's already found a bunch of actually-corrupt files in my collection, including some cherished photos from a Japan trip I took a few years ago that cannot be replaced. (Note that there's still some false-positive risk; I literally released this just yesterday and it's still actively being worked on.)

It has Mac, Windows, and Linux builds; check the GitHub Actions page.

As long as you hold the copyright to your "validate" project, you can dual-license it: release it under the (A)GPL and also use it in your closed-source commercial project. I think you would need contributors to sign a contributor license agreement (CLA), though, to keep using their contributions to "validate" in your closed-source offering.

Did something change? Is LLM-generated stuff now able to be protected by copyright?

I was under the impression that copyright was only available for works created by people.

  • Copyright has never required the absence of tools, only the presence of human creative agency.

    LLMs don’t change that. They’re just tools. The relevant question is whether a human exercised creative judgment in selecting, shaping, or editing the output, which I most certainly did. Otherwise my month of work here would have been a minute, a single prompt, and then just grabbing a coffee.

    We already handle this with cameras, compilers, spell-checkers, and procedural generation. If you push a button and accept whatever falls out, you may not have authorship. If you meaningfully direct and curate the result, you almost certainly do.

    Responsibility still lands on the human, which is a good sanity check: there is no such thing as “LLM-authored but human-unaccountable” work. Pretending otherwise is just outsourcing epistemic responsibility to a stochastic tool. Similarly, you can't assign blame to me when things go wrong but credit to the LLM when things go right, or vice versa.

> which means full reads and scrubs touch more bits and inevitably brush against those error rates

Does this make sense at all? A ZFS scrub only reads the data you actually have, not the whole drive, and repairs it where possible. The more data you have, the more you have to validate regardless of the tools used. The BER is also just a terrible metric and is not reflective of how drives actually behave.
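
Conceptually, a scrub is just this loop: walk the allocated blocks, verify each checksum, and heal from redundancy on mismatch. A toy sketch (illustrative only, not ZFS's actual code):

    # Toy "scrub": visit only allocated blocks (not the raw device),
    # verify each block's checksum, repair from a mirror copy on mismatch.
    import hashlib

    def scrub(blocks, checksums, mirror):
        repaired = []
        for addr, data in blocks.items():
            if hashlib.sha256(data).hexdigest() != checksums[addr]:
                blocks[addr] = mirror[addr]   # heal from the good copy
                repaired.append(addr)
        return repaired

The work scales with how much data you store, not with the size of the device, which is exactly why "scrubs touch more bits" is a property of the dataset rather than of ZFS.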

>verifiable "never looked at the original source"

...erm.

To address the elephant in the room: who exactly is supposed to verifiably never have looked at the original source? You, or the LLM?

  • Both. Good question.

    Although, of course, it would be much harder to prove that I never saw the original code/implementation.