Comment by raincole
2 hours ago
It's not a clean-room implementation, but not because it's trained on the internet.
It's not a clean-room implementation because of this:
> The fix was to use GCC as an online known-good compiler oracle to compare against
If you read the entire GCC source code and then create a compatible compiler, it's not clean room. Opus basically did that since, I'm assuming, its training set contained the entire source of GCC. So even if they weren't actively referencing GCC, I think that counts.
The classical definition of a clean room implementation is something that's made by looking at the output of a prior implementation but not at the source.
I agree that having a reference compiler available is a huge caveat though. Even if we completely put training data leakage aside, they're developing against a programmatic checker for a spec that's already had millions of man-hours put into it. This is an optimal scenario for agentic coding, but the vast majority of problems that people will want to tackle with agentic coding are not going to look like that.
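For anyone wondering what "developing against a programmatic checker" looks like in practice, here is a rough sketch of oracle-style differential testing. It is not taken from the project being discussed: the names `RunResult`, `make_runner`, and `find_mismatches` are made up for illustration, and it assumes the candidate compiler accepts a GCC-style `<source> -o <output>` command line.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, NamedTuple


class RunResult(NamedTuple):
    """Observable behavior of one compiled test program."""
    exit_code: int
    stdout: str


# A runner takes C source text and reports how the compiled program behaved.
Runner = Callable[[str], RunResult]


def make_runner(compiler: str) -> Runner:
    """Wrap a compiler executable (hypothetical examples: "gcc", "./mycc").

    Assumption: the compiler accepts `<source> -o <output>` like GCC does.
    """
    def run(source: str) -> RunResult:
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / "case.c"
            exe = Path(tmp) / "case"
            src.write_text(source)
            # Compile; a failed compile raises CalledProcessError.
            subprocess.run([compiler, str(src), "-o", str(exe)], check=True)
            # Run the produced binary and capture what it prints.
            proc = subprocess.run(
                [str(exe)], capture_output=True, text=True, timeout=10
            )
            return RunResult(proc.returncode, proc.stdout)
    return run


def find_mismatches(oracle: Runner, candidate: Runner, cases: list[str]) -> list[str]:
    """Return the test programs where the candidate disagrees with the oracle."""
    return [src for src in cases if oracle(src) != candidate(src)]
```

You would call it as something like `find_mismatches(make_runner("gcc"), make_runner("./mycc"), cases)`, where `./mycc` stands in for the compiler under test. Every mismatch is a concrete, automatically discovered bug report, which is exactly the kind of feedback loop most real-world problems don't come with.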