Comment by ndesaulniers

21 days ago

I spent a good part of my career (nearly a decade) at Google working on getting Clang to build the linux kernel. https://clangbuiltlinux.github.io/

This LLM did it in (checks notes):

> Over nearly 2,000 Claude Code sessions and $20,000 in API costs

It may build, but does it boot (was also a significant and distinct next milestone)? (Also, will it blend?). Looks like yes!

> The 100,000-line compiler can build a bootable Linux 6.9 on x86, ARM, and RISC-V.

The next milestone is:

Is the generated code correct? The jury is still out on that one for production compilers. And then you have performance of generated code.

> The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

Still a really cool project!

> Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase

Does it really boot...?

  • > Does it really boot...?

    They don't need 16b x86 support for the RISC-V or ARM ports, so yes, but it depends on what 'it' we're talking about here.

    Also, FWIW, GCC doesn't directly assemble to machine code either; it shells out to GAS (GNU Assembler). This blog post calls it "GCC assembler and linker" but to be more precise the author should edit this to "GNU binutils assembler and linker." Even then GNU binutils contains two linkers (BFD and GOLD), or did they excise GOLD already (IIRC, there was some discussion a few years ago about it)?

    • Yeah, didn't mention gas or ld, for similar reasons. I agree that a compiler doesn't necessarily "need" those.

      I don't agree that all the claims are backed up by their own comments, which means that there's probably other places where it falls down.

      It's... misrepresentation.

      Like Chicken is a Scheme compiler. But they're very up front that it depends on a C compiler.

      Here, they wrote a C compiler that is at least sometimes reliant on having a different C compiler around. So is the project at 50%? 75%?

      Even if it's 99%, that's not the same story as the one they tried to tell. And if they had told that tale instead, it would be more impressive, rather than "There's some holes. How many?"

      2 replies →

  • The assembler seems like nearly the easiest part. Slurp arch manuals and knock it out, it’s fixed and complete.

    • I am surprised by the number of comments that say the assembler is trivial - it is admittedly perhaps simpler than some other parts of the compiler chain, but it’s not trivial.

      What you are doing is kinda serialising a self-referential graph structure of machine code entries that reference each other's addresses, but you don't know the addresses because the (x86) instructions are variable-length, so you can't know them until you generate the machine code: a chicken-and-egg problem.

      Personally I find writing parsers much much simpler than writing assemblers.
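      The standard way out of that chicken-and-egg problem is iterating to a fixed point, often called "branch relaxation": assume every jump takes its short form, then widen any jump whose displacement doesn't fit, and recompute until nothing changes. A toy sketch (the instruction shapes and byte sizes are invented for illustration, not real x86 encoding):

```python
# Toy branch relaxation: every 'jmp' starts as a 2-byte short form and
# is widened to a 5-byte long form if its displacement won't fit in a
# signed 8-bit field. Sizes and encodings are illustrative only.

def relax(instrs, labels):
    """instrs: list of ('bytes', n) filler items or ('jmp', label);
    labels: label name -> instruction index. Returns final sizes."""
    sizes = [2 if op == 'jmp' else arg for (op, arg) in instrs]
    changed = True
    while changed:
        changed = False
        offs = [0]                        # offset of each instruction
        for s in sizes:
            offs.append(offs[-1] + s)
        for i, (op, arg) in enumerate(instrs):
            if op == 'jmp' and sizes[i] == 2:
                disp = offs[labels[arg]] - (offs[i] + sizes[i])
                if not -128 <= disp <= 127:   # doesn't fit in rel8
                    sizes[i] = 5              # widen to a rel32 form
                    changed = True
    return sizes
```

      (Widening one jump can push other displacements out of range, which is why a single pass isn't enough; real assemblers do essentially this, plus relocations the linker finishes.)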

      9 replies →

    • Huh. A second person mentioning the assembler. Don't think I ever referred to one...?

One thing people have pointed out is that well-specified (even if huge and tedious) projects are an ideal fit for AI, because the loop can be fully closed and it can test and verify the artifact by itself with certainty. Someone was saying they had it generate a rudimentary JS engine because the available test suite is so comprehensive

Not to invalidate this! But it's toward the "well-suited for AI" end of the spectrum

  • Yes - the gcc "torture test suite" that is mentioned must have been one of the enablers for this.

    It's notable that the article says Claude was unable to build a working assembler (& linker), which is nominally a much simpler task than building a compiler. I wonder if this was at least in part due to not having a test suite, although it seems one could be auto-generated during bootstrapping with gas (GNU assembler) by creating gas-generated (asm, ELF) pairs as the necessary test suite.

    It raises the question of how they got the compiler to the point of correctly generating a valid C -> asm mapping before tackling the issue of gcc compatibility, since the generated code apparently has no relation to what gcc generates. I wonder which compilers' source code Claude has been trained on, and how closely this compiler's code generation and attempted optimizations compare to those?

> Still a really cool project!

Yeah. This test sorta definitely proves that AI is legit. Despite the millions of people still insisting it's a hoax.

The fact that the optimizations aren't as good as the 40 year gcc project? Eh - I think people who focus on that are probably still in some serious denial.

  • It's amazing that it "works", but viability is another issue.

    It cost $20,000 and it worked, but it's also totally possible to spend $20,000 and have Claude shit out a pile of nonsense. You won't know until you've finished spending the money whether it will fail or not. Anthropic doesn't sell a contract that says "We'll only bill you if it works" like you can get from a bunch of humans.

    Do catastrophic bugs exist in that code? Who knows, it's 100,000 lines, it'll take a while to review.

    On top of that, Anthropic is losing money on it.

    All of those things combined, viability remains a serious question.

    • > You won't know until you've finished spending the money whether it will fail or not.

      How do you conclude that? You start off with a bunch of tests and build these things incrementally, why would you spend 20k before realizing there’s a problem?

      6 replies →

    • > It cost $20,000

      I'm curious - do you have ANY idea what it costs to have humans write 100,000 lines of code???

      You should look it up. :)

      81 replies →

    • That's a good point! Here claude opus wrote a C compiler. Outrageously cool.

      Earlier today, I couldn't get opus to replace useEffect-triggered-redux-dispatch nonsense with react-query calls. I already had a very nice react-query wrapper with tons of examples. But it just couldn't make sense of the useEffect rube goldberg machine.

      To be fair, it was a pretty horrible mess of useEffects. But just another data point.

      Also I was hoping opus would finally be able to handle complex typescript generics, but alas...

    • It's $20,000 in 2026; with the price of tokens halving every year (at a given perf level), this will be around $1,250 in 2030

    • Also, heaven knows if the result is maintainable or easy to change.

    • > On top of that, Anthropic is losing money on it

      This has got to be my favorite one of them all that keeps coming up in too many comments... You know who else was losing money in the beginning? Every successful company that ever existed! Some, like Uber, were losing billions for a decade. And when was the last time you rode in a taxi? (I still do; my kid never will.) Not sure how old you are, or if you remember "facebook will never be able to monetize on mobile..." They all lose money, until they don't.

      25 replies →

  • > This test sorta definitely proves that AI is legit.

    This is an "in distribution" test. There are a lot of C compilers out there, including ones with git history, implemented from scratch. "In distribution" tests do not test generalization.

    The "out of distribution" test would be like "implement (self-bootstrapping, Linux kernel compatible) C compiler in J." J is different enough from C and I know of no such compiler.

    • > This is an "in distribution" test. There are a lot of C compilers out there, including ones with git history, implemented from scratch. "In distribution" tests do not test generalization.

      It's still really, really impressive though.

      Like, economics aside this is amazing progress. I remember GPT3 not being able to hold context for more than a paragraph, we've come a long way since then.

      Hell, I remember bag of words being state of the art when I started my career. We have come a really, really, really long way since then.

      5 replies →

    • There are two compilers that can handle the Linux kernel. GCC and LLVM. Both are written in C, not Rust. It's "in distribution" only if you really stretch the meaning of the term. A generic C compiler isn't going to be anywhere near the level of rigour of this one.

      2 replies →

  • How does spending $20K to replicate code available in the thousands online (toy C compilers) prove anything? It requires a bunch of caveats about things that don't work, it requires a bunch of other tools to do stuff, and an experienced developer had to guide it pretty heavily to even get that lackluster result.

  • Only if we take them at their word. I remember thinking things were in a completely different state when Amazon had their shop and go stores, but then finding out it was 1000s of people in Pakistan just watching you via camera.

  • I will write you a C compiler by hand for $19k, and it will be better than what Claude made.

    Writing a toy C compiler isn't that hard. Any decent programmer can write one in a few weeks or months. The optimizations are the actually interesting part and Claude fails hard at that.

  • > optimizations aren't as good as the 40 year gcc project

    with all optimizations disabled:

    > Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

    • That distinction doesn't change my point. I am not surprised that a 40 year old project generates better code than this brand new one.

      4 replies →

  • It is legit - with some pretty severe caveats. I am hard-pressed to come up with an example that has more formal specification, published source implementations, and public unit test coverage than a C compiler.

    It is not feasible that someone will use AI to tackle genuinely new software and provide a tenth of the level of guide-rails Anthropic had for this project. They were able to keep the million monkeys on their million typewriters on an extremely short leash, and able to have it do the vast majority of iteration without human intervention.

  • It cost $20,000 to reinvent the wheel, one it probably trained on. If that's your definition of legit, sure

    • Well, if right now it's a matter of cost, tomorrow it won't be anymore. 4 GB of RAM in the '80s would have cost tens of millions of dollars; now even your car runs 4 GB of memory just for the infotainment system, and dozens of GBs for the most complex assistants. So I would see this achievement more as a warning: the final result is not what's concerning, it's the premonition behind it.

  • The full source of several compilers being in its training set is somewhat helpful, though. It's not exactly a novel problem, and the optimizations and edge cases it's seemingly struggling with are the overwhelming majority of the work anyway.

    Do we know it just didn’t shuffle gcc’s source code around a bit?

I’m excited and waiting for the team that shows with $20k in credits they can substantially speed up the generated code by improving clang!

  • i'm sorry but that will take another $20 billion in AI capex to train our latest SOTA model so that it will cost $20k to improve the code.

> I spent a good part of my career (nearly a decade) at Google working on getting Clang to build the linux kernel.

How much of that time was spent writing the tests that they found to use in this experiment? You (or someone like you) were a major contributor to this. All Opus had to do here was keep brute forcing a solution until the tests passed.

It is amazing that it is possible at all, but it remains an impossibility without a heavy human hand. One could easily still spend a good part of their career reproducing this if they first had to rewrite all of the tests from scratch.

This is getting close to a Ken Thompson "Trusting Trust" era -- AI could soon embed itself into the compilers themselves.

  • A pay to use non-deterministic compiler. Sounds amazing, you should start.

    • Application-specific AI models can be much smaller and faster than the general purpose, do-everything LLM models. This allows them to run locally.

      They can also be made to be deterministic. Some extra care is required to avoid computation paths that lead to numerical differences on different machines, but this can be accomplished reliably with small models that use integer math and use kernels that follow a specific order of operations. You get a lot more freedom to do these things on the small, application-specific models than you do when you're trying to run a big LLM across different GPU implementations in floating point.
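      As a concrete illustration of that determinism point (all shapes, values, and the shift-based rescale below are invented, not from any real model): an integer-only inference kernel with a fixed accumulation order produces bit-identical output on every machine, whereas floating-point kernels can differ across GPUs or BLAS libraries because the reduction order varies.

```python
# Integer-only "quantized" matrix-vector product with a fixed
# left-to-right accumulation order: the result is bit-identical on
# any machine, unlike float kernels whose summation order may vary.
# Shapes, values, and the shift-based rescale are illustrative only.

def int8_matvec(weights, x, shift=7):
    out = []
    for row in weights:
        acc = 0                       # int32-style accumulator
        for w, v in zip(row, x):      # fixed order of operations
            acc += w * v
        out.append(acc >> shift)      # deterministic rescale
    return out
```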

      1 reply →

    • Some people care more about compile times than the performance of generated code. Perhaps even the correctness of generated code. Perhaps more so than determinism of the generated code. Different people in different contexts can have different priorities. Trying to make everyone happy can sometimes lead to making no one happy. Thus dichotomies like `-O2` vs `-Os`.

      EDIT (since HN is preventing me from responding):

      > Some people care more about compiler speed than the correctness?

      Yeah, I think plenty of people writing code in languages that have concepts like Undefined Behavior technically don't really care as much about correctness as they may claim otherwise, as it's pretty hard to write large volumes of code without indirectly relying on UB somewhere. What is correct in such case was left up to interpretation of the implementer by ISO WG14.

      10 replies →

  • What I want to know is when we get AI decompilers

    Intuitively it feels like it should be a straightforward training setup - there's lots of code out there, so compile it with various compilers, flags etc and then use those pairs of source+binary to train the model.

  • The asymmetry will be between the frontier AI's ability to create exploits vs find them.

Claude did not write it. You wrote it, with previous experience, through 20,000 long commands telling it exactly what to do.

Real usable AI would create it from a simple 'make a C99 compiler faster than GCC'.

AI usage should be banned in general. It takes jobs faster than creating new ones ..

  • That's actually pretty funny. They're patting it on the back for using, in all likelihood, some significant portions of code that they actually wrote, which was stolen from them without attribution so that it could be used as part of a very expensive parlour trick.

  • > AI usage should be banned in general. It takes jobs faster than creating new ones ..

    I don't have a strong opinion about that in either direction, but I'm curious: do you feel the same about everything, or is it just about this specific technology? For example, should the nail gun have been forbidden if it were invented today, as one person with a nail gun could probably replace 3-4 people with normal "manual" hammers?

    Do you feel the same about programmers who are automating others out of work without the use of AI too?

  • > It takes jobs faster than creating new ones ..

    You think compiler engineer from Google gives a single shit about this?

    They’ll automate millions out of career existence for their amusement while cashing out stock money and retiring early comfortably.

  • > It takes jobs faster than creating new ones ..

    I have no problem with tech making some jobs obsolete; that's normal. The problem is that the work being done by the current generation of LLMs is, at least for now, mostly of inferior quality.

    The tools themselves are quite useful as helpers in several domains if used wisely though.

  • Businesses do not exist to create jobs; jobs are a byproduct.

    • Even that is underselling it; jobs are a necessary evil that should be minimised. If we can have more stuff with fewer people needing to spend their lives providing it, why would we NOT want that?

      8 replies →

  • Jobs are a means, not a goal.

    • Jobs are the only way that you survive in this society (food, shelter). Look how we treat unhoused people without jobs. AI is taking jobs away and that is putting people's survival at risk.

I want to verify the claim that it builds the Linux kernel. It quickly runs into errors, but yeah, still pretty cool!

    make O=/tmp/linux/x86 ARCH=x86_64 CC=/tmp/p/claudes-c-compiler/target/release/ccc -j30 defconfig all

    /home/ray/Dev/linux/arch/x86/include/asm/preempt.h:44:184: error: expected ';' after expression before 'pto_tmp__'
      do { u32 pto_val__ = ((u32)(((unsigned long) ~0x80000000) & 0xffffffff)); if (0) { __typeof_unqual__((__preempt_count)) pto_tmp__; pto_tmp__ = (~0x80000000); (void)pto_tmp__; } asm ("and" "l " "%[val], " "%" "[var]" : [var] "+m" (((__preempt_count))) : [val] "ri" (pto_val__)); } while (0);
      ^~~~~~~~~ fix-it hint: insert ';'
    /home/ray/Dev/linux/arch/x86/include/asm/preempt.h:49:183: error: expected ';' after expression before 'pto_tmp__'
      do { u32 pto_val__ = ((u32)(((unsigned long) 0x80000000) & 0xffffffff)); if (0) { __typeof_unqual__((__preempt_count)) pto_tmp__; pto_tmp__ = (0x80000000); (void)pto_tmp__; } asm ("or" "l " "%[val], " "%" "[var]" : [var] "+m" (((__preempt_count))) : [val] "ri" (pto_val__)); } while (0);
      ^~~~~~~~~ fix-it hint: insert ';'
    /home/ray/Dev/linux/arch/x86/include/asm/preempt.h:61:212: error: expected ';' after expression before 'pao_tmp__'

  • They said it builds Linux 6.9, maybe you are trying to compile a newer version there?

    • git switch v6.9

      The riscv build succeeded. For the x86-64 build I ran into

          % make O=/tmp/linux/x86 ARCH=x86_64 CC=/tmp/p/claudes-c-compiler/target/release/ccc-x86 HOSTCC=/tmp/p/claudes-c-compiler/target/release/ccc-x86 LDFLAGS=-fuse-ld=bfd LD=ld.bfd -j30 vmlinux -k
          make[1]: Entering directory '/tmp/linux/x86'
          ...
            CC      arch/x86/platform/intel/iosf_mbi.o
          ccc: error: lgdtl requires memory operand
            AR      arch/x86/platform/intel-mid/built-in.a
          make[6]: *** [/home/ray/Dev/linux/scripts/Makefile.build:362: arch/x86/realmode/rm/wakeup_asm.o] Error 1
          ld.bfd: arch/x86/entry/vdso/vdso32/sigreturn.o: warning: relocation in read-only section `.eh_frame'
          ld.bfd: error in arch/x86/entry/vdso/vdso32/sigreturn.o(.eh_frame); no .eh_frame_hdr table will be created
          ld.bfd: warning: creating DT_TEXTREL in a shared object
          ccc: error: unsupported pushw operand
      

      There are many other errors.

      tinyconfig and allnoconfig have fewer errors.

          RELOCS  arch/x86/realmode/rm/realmode.relocs
          Invalid absolute R_386_32 relocation: real_mode_seg
      

      Still very impressive.

      2 replies →

Being just a grunt engineer in a product firm I can't imagine being able to spend multiple years on one project. If it's something you're passionate about, that sounds like a dream!

  • This work originally wasn't my 100% project, it was my 20% project (or as I prefer to call it, 120% project).

    I had to move teams twice before a third team was able to say: this work is valuable to us, please come work for us and focus just on that.

    I had to organize multiple internal teams, then build an external community of contributors to collaborate on this shared common goal.

    Having carte blanche to contribute to open source projects made this feasible at all; I can see that being a non-starter at many employers, sadly. Having low friction to change teams also helped a lot.

> I spent a good part of my career (nearly a decade) at Google working on getting Clang to build the linux kernel

Did this come down to making Clang 100% gcc compatible (extensions, UDB, bugs and all), or were there any issues that might be considered as specific to the linux kernel?

Did you end up building a gcc compatibility test suite as a part of this? Did the gcc project themselves have a regression/test suite that you were able to use as a starting point?

  • > extensions

    Some were necessary (asm goto), some were not (nested functions, flexible array members not at the end of structs).

    > UDB, bugs and all

    Luckily, the kernel didn't intentionally rely on GCC specifics this way. Where it did unintentionally, we fixed the kernel sources properly with detailed commit messages explaining why.

    > or were there any issues that might be considered as specific to the linux kernel?

    Yes, https://github.com/ClangBuiltLinux/linux/issues is our issue tracker. We use tags extensively to mark if we triage the issue to be kernel-side vs toolchain-side.

    > Did you end up building a gcc compatibility test suite as a part of this?

    No, but some tricky cases LLVM got wrong were distilled from kernel sources using either:

    - creduce
    - cvise (my favorite)
    - bugpoint
    - llvm-reduce

    and then added to LLVM's existing test suite. Many such tests were also simply manually written.

    > Did the gcc project themselves have a regression/test suite that you were able to use as a starting point?

    GCC and binutils have their own test suites. Folks in the LLVM community have worked on being able to test clang against GCC's test suite. I personally have never run GCC's test suite or looked at its sources.

> Is the generated code correct? The jury is still out on that one for production compilers. And then you have performance of generated code.

It's worth noting that this was developed by compiling Linux and running tests, so Linux itself is effectively part of the training set, not a held-out test set.

But at least for Linux, I'm guessing the tests are robust enough that it will work correctly. That said, if any bugs pop up, they will show weak points in the Linux tests.

> $20,000 of tokens.

> less efficient than existing compilers

What is the ecological cost of producing this piece of software that nobody will ever use?

  • If you evaluate the cost/benefit in isolation? It’s net negative.

    If you see this as part of a bigger picture to improve human industrial efficiency and bring us one step closer to the singularity? Most likely net positive.

Isn't the AI basing what it does heavily on the publicly available source code for compilers in C though? Without that work it would not be able to generate this would it? Or in your opinion is it sufficiently different from the work people like you did to be classed as unique creation?

I'm curious on your take on the references the GAI might have used to create such a project and whether this matters.

What were the challenges, out of interest? Was some of it the use of gcc extensions, which needed equivalents and porting over?

  • `asm goto` was the big one. The x86_64 maintainers broke the clang builds very intentionally just after we had gotten x86_64 building (with necessary patches upstreamed) by requiring compiler support for that GNU C extension. This was right around the time of meltdown+spectre, and the x86_64 maintainers didn't want to support fallbacks for older versions of GCC (and ToT Clang at the time) that lacked `asm goto` support for the initial fixes shipped under duress (embargo). `asm goto` requires plumbing throughout the compiler, and I've learned more about register allocation than I particularly care...

    Fixing some UB in the kernel sources, lots of plumbing to the build system (particularly making it more hermetic).

    Getting the rest of the LLVM binutils substitutes to work in place of GNU binutils was also challenging. Rewriting a fair amount of 32b ARM assembler to be "unified syntax" in the kernel. Linker bugs are hard to debug. Kernel boot failures are hard to debug (thank god for QEMU+gdb protocol). Lots of people worked on many different parts here, not just me.

    Evangelism and convincing upstream kernel developers why clang support was worth anyones while.

    https://github.com/ClangBuiltLinux/linux/issues for a good historical perspective. https://github.com/ClangBuiltLinux/linux/wiki/Talks,-Present... for talks on the subject. Keynoting LLVM conf was a personal highlight (https://www.youtube.com/watch?v=6l4DtR5exwo).

> getting Clang to build the linux kernel.

wonder if clang source is part of its model :)

> This LLM did it

You do realize the LLM had access (via his training set) and "reused" (not as is, of course) your own work, right?

i mean… your work also went into the training set, so it's not entirely surprising that it spat a version back out!

  • Anthropic's version is in Rust though, so at least a little different.

    • There's parts of LLVM architecture that are long in the tooth (IMO) (as is the language it's implemented in, IMO).

      I had hoped one day to re-implement parts of LLVM itself in Rust; in particular, I've been curious about approaches to compiling C concurrently (and parsing it in parallel, or lazily) that haven't been explored in LLVM, and that I think might be safer to do in Rust. I don't know enough about grammars to know if it's technically impossible, but a healthy dose of ignorance can sometimes lead to breakthroughs.

      LLVM is pretty well designed for test. I was able to implement a lexer for C in Rust that could lex the Linux kernel, and use clang to cross-check my implementation (I would compare my interpretation of the token stream against clang's). Just having a standard module system makes composing a toolchain from reusable pieces seem like perhaps a better approach, but maybe folks with more experience with rustc have scars to disagree?

      2 replies →

    • One thing LLMs are really good at is translation. I haven’t tried porting projects from one language to another, but it wouldn’t surprise me if they were particularly good at that too.

      1 reply →

It’s cool but there’s a good chance it’s just copying someone else’s homework albeit in an elaborate round about way.

  • I would claim that LLMs desperately need proprietary code in their training, before we see any big gains in quality.

    There's some incredible source available code out there. Statistically, I think there's a LOT more not so great source available code out there, because the majority of output of seasoned/high skill developers is proprietary.

    To me, a surprising portion of Claude 4.5 output definitely looks like student homework answers, because I think that's closer to the mean of the code population.

    • This is dead wrong: essentially the entirety of the huge gains in coding performance in the past year have come from RL, not from new sources of training data.

      I echo the other commenters that proprietary code isn’t any better, plus it doesn’t matter because when you use LLMs to work on proprietary code, it has the code right there.

      11 replies →

    • I will say many closed source repos are probably equally as poor as open source ones.

      Even worse in many cases because they are so over engineered nobody understands how they work.

      2 replies →

    • yeah, but isn't the whole point of claude code to get people to provide preference data/telemetry data to anthropic (unless you opt out?). same w/ other providers.

      i'm guessing most of the gains we've seen recently are post training rather than pretraining.

      1 reply →

    • I'd bet, on average, the quality of proprietary code is worse than open-source code. There have been decades of accumulated slop generated by human agents with wildly varied skill levels, all vibe-coded by ruthless, incompetent corporate bosses.

      6 replies →

  • This is cool and actually demonstrates real utility. Using AI to take something that already exists and create it for a different library / framework / platform is cool. I'm sure there's a lot of training data in there for just this case.

    But I wonder how it would fare given a language specification for a non-existent non-trivial language and build a compiler for that instead?

    • If you come up with a realistic language spec and wait maybe six months, by then it'll probably be approach being cheap enough that you could test the scenario yourself!

  • I see that as the point that all this is proving - most people, most of the time, are essentially reinventing the wheel at some scope and scale or another, so we’d all benefit from being able to find and copy each others’ homework more efficiently.

  • A small thing, but it won't compile the RISC-V version of hello.c if the source isn't installed on the machine it's running on.

    It is standing on the shoulders of giants (all of the compilers of the past, built into its training data, and the recent learnings about getting these agents to break up tasks) to get itself going. Still fairly impressive.

    On a side quest: I wonder where Anthropic is getting their power from. The whole energy debacle in the US at the moment probably means this made some CO2 in the process. Would be hard to avoid?

Also: a large number of folks seem to think Claude Code is losing a ton of money. I have no idea where the final numbers land; however, if the $20,000 figure is accurate, and based on some of the estimates I've seen, they could've hired 8 senior-level developers at a quarter million a year for the same amount of money spent internally.

Granted, marketing sucks up far too much money for any startup, and again, we don't know the actual numbers in play, however, this is something to keep in mind. (The very same marketing that likely also wrote the blog post, FWIW).

  • This doesn't add up. The $20k is in API costs. People talk about CC losing money because it's way more efficient than the API; i.e., the same work with efficient use of CC might have cost ~$5k.

    but regardless, hiring is difficult and high-end talent is limited. If the costs were anywhere close to equivalent, the agents are a no-brainer

    • CC hits their APIs, And internally I'm sure Anthropic tracks those calls, which is what they seem to be referencing here. What exactly did Anthropic do in this test to have "inefficient use of CC" vs your proposed "efficient use of CC"?

      Or do you mean that if an external user replicated this experience they might get billed less than $20k due to CC being sold at lower rates than per-API-call metered billing?

    • > hiring is difficult and high-end talent is limited.

      Not only that, but firing talent is also a pain. You can't "hire" 10 devs for 2 weeks, and fire them afterwards. At least you can't keep doing that, people talk and no one would apply.

  • Even if the dollar cost for product created was the same, the flexibility of being able to spin a team up and down with an API call is a major advantage. That AI can write working code at all is still amazing to me.

  • This thing was done in 2 weeks. In the orgs I've worked in, you'd be lucky to get HR approval to create a job posting within 2 weeks.