Comment by hmry

21 days ago

If I, a human, read the source code of $THING and then later implement my own version, that's not a "clean-room" re-implementation. The whole point of "clean-room" is that no single person has access to both the original code and the new code. (That way, you can legally prove that no copyright infringement took place.)

But when an AI does it, now it counts? Opus is trained on the source code of Clang, GCC, TCC, etc. So this is not "clean-room".

Agree, but to an even further degree.

At one point there were issues with LLMs regurgitating licensed code verbatim. I have no doubt that Claude could parrot a large portion of GCC given correct prompting.

Being able to memorize the various C compiler implementations, alongside the sum of human knowledge, is an incredible feat. However, this is a distinctly different domain from what a human does when writing a clean-room compiler implementation without near-perfect recall of every C compiler implementation. The way Claude solved this is probably something a human can't do; the way a human would solve it is definitely something Claude can't do.

Copyright doesn't protect ideas; it protects writing. Avoiding reading LLVM or GCC protects you from other kinds of IP issues, but it's not a copyright issue. The same people contribute to both projects despite their different licenses.

  • They don't call Clang a "clean-room implementation", unlike Anthropic, who are calling their project exactly that.

    A clean-room implementation is when you implement a replacement by only looking at the behavior and documentation (possibly written by another person on your team who is not allowed to write code, only documentation).

That's not the only way to protect yourself from accusations of copyright infringement. I remember reading that the GNU utils were written to be as performant as possible so that their authors would be forced to structure the code differently from the Unix originals.

  • Yes, but Anthropic is specifically claiming their implementation is clean-room, while GNU never made that claim AFAIK.