Comment by thesz
20 days ago
> This test sorta definitely proves that AI is legit.
This is an "in distribution" test. There are a lot of C compilers out there, including ones with git history, implemented from scratch. "In distribution" tests do not test generalization.
The "out of distribution" test would be like "implement (self-bootstrapping, Linux kernel compatible) C compiler in J." J is different enough from C and I know of no such compiler.
> This is an "in distribution" test. There are a lot of C compilers out there, including ones with git history, implemented from scratch. "In distribution" tests do not test generalization.
It's still really, really impressive though.
Like, economics aside this is amazing progress. I remember GPT3 not being able to hold context for more than a paragraph, we've come a long way since then.
Hell, I remember bag of words being state of the art when I started my career. We have come a really, really, really long way since then.
Do we know how many attempts were done to create such compiler before during previous tests? Would Anthropic report on the failed attempt? Can this "really, really impressive" thing be a result of a luck?
Much like quoting Quake code almost verbatim not so long ago.
> Do we know how many attempts were done to create such compiler before during previous tests? Would Anthropic report on the failed attempt? Can this "really, really impressive" thing be a result of a luck?
No we don't and yeah we would expect them to only report positive results (this is both marketing and investigation). That being said, they provide all the code et al for people to review.
I do agree that an out of distribution test would be super helpful, but given that it will almost certainly fail (given what we know about LLMs) I'm not too pushed about that given that it will definitely fail.
Look, I'm pretty sceptical about AI boosting, but this is a much better attempt than the windsurf browser thing from a few months back and it's interesting to know that one can get this work.
I do note that the article doesn't talk much about all the harnesses needed to make this work, which assuming that this approach is plausible, is the kind of thing that will be needed to make demonstrations like this more useful.
3 replies →
There are two compilers that can handle the Linux kernel. GCC and LLVM. Both are written in C, not Rust. It's "in distribution" only if you really stretch the meaning of the term. A generic C compiler isn't going to be anywhere near the level of rigour of this one.
There is tinycc, that makes it three compilers.
There is a C compiler implemented in Rust from scratch: https://github.com/PhilippRados/wrecc/commits/master/?after=... (the very beginning of commit history)
There are several C compilers written in Rust from scratch of comparable quality.
We do not know whether Anthropic has a closed source C compiler written in Rust in their training data. We also do not know whether Anthropic validated their models on their ability to implement C compiler from scratch before releasing this experiment.
That language J I proposed does not have any C compiler implemented in it at all. Idiomatic J expertise is scarce and expensive so that it would be a significant expense for Anthropic to have C compiler in J for their training data. Being Turing-complete, J can express all typical compiler tips and tricks from compiler books, albeit in an unusual way.
TinyCC can't compile a modern linux kernel. It doesn't support a ton of the extensions they use. That Rust compiler similarly can't do it.