
Comment by lateforwork

6 days ago

These days there's an even easier way to learn to write a compiler. Just ask Claude to write a simple compiler. Here's a simple C compiler (under 1500 lines) written by Claude: https://github.com/Rajeev-K/c-compiler It can compile and run C programs for sorting and searching. The code is very readable and very easy to understand.

For those of us who learn better by taking something and tinkering with it, this is definitely the better approach.

I've never been a good book learner, but I love taking things apart and tinkering with them to learn. A small toy compiler is way better than any book, and it's not like the LLM didn't absorb the book anyway during training.

  • Exactly! Writing a compiler is not rocket science if you know assembly language. You can pick up the gist in an hour or two by looking at a simple toy compiler.

Why read that rather than an actually well-written compiler, though?

  • Because an actual compiler would be tens of thousands of lines, and most of it is going to be performance optimization. If you want to get the big picture first, read a simple working compiler that has all the key parts: a lexer, an abstract syntax tree, a parser, a code generator, and so on.
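To make "all the key parts" concrete, here is a minimal sketch of that pipeline for integer expressions only (`+`, `-`, `*`, parentheses). It is not code from the linked repository: the grammar, the `PUSH`/`ADD`/`SUB`/`MUL` stack-machine instructions, and every function name are hypothetical, and a real C compiler's code generator would emit assembly for a CPU rather than instructions for a tiny Python VM.

```python
import operator
import re
from dataclasses import dataclass
from typing import List, Union

# Lexer: source text -> tokens (integer literals and the symbols + - * ( )).
TOKEN = re.compile(r"\s*(?:([0-9]+)|([-+*()]))")

def tokenize(src: str) -> List[str]:
    src, tokens, pos = src.rstrip(), [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens

# AST node types.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

# Parser: recursive descent over the (hypothetical) grammar
#   expr   := term (('+' | '-') term)*
#   term   := factor ('*' factor)*
#   factor := NUMBER | '(' expr ')'
# Each function consumes tokens from the front of the list it is given.
def parse_expr(toks: List[str]) -> Expr:
    node = parse_term(toks)
    while toks and toks[0] in ("+", "-"):
        node = BinOp(toks.pop(0), node, parse_term(toks))
    return node

def parse_term(toks: List[str]) -> Expr:
    node = parse_factor(toks)
    while toks and toks[0] == "*":
        node = BinOp(toks.pop(0), node, parse_factor(toks))
    return node

def parse_factor(toks: List[str]) -> Expr:
    if not toks:
        raise SyntaxError("unexpected end of input")
    tok = toks.pop(0)
    if tok == "(":
        node = parse_expr(toks)
        if not toks or toks.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return node
    if tok.isdigit():
        return Num(int(tok))
    raise SyntaxError(f"unexpected token {tok!r}")

# Code generator: post-order AST walk emitting stack-machine instructions.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def codegen(node: Expr, out: List[tuple]) -> List[tuple]:
    if isinstance(node, Num):
        out.append(("PUSH", node.value))
    else:
        codegen(node.left, out)  # operands first...
        codegen(node.right, out)
        out.append((OPCODES[node.op],))  # ...then the operator
    return out

def compile_source(src: str) -> List[tuple]:
    toks = tokenize(src)
    tree = parse_expr(toks)
    if toks:
        raise SyntaxError(f"trailing tokens: {toks}")
    return codegen(tree, [])

# A tiny stack VM standing in for the CPU that would run real assembly.
VM_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def run(program: List[tuple]) -> int:
    stack: List[int] = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            stack.append(VM_OPS[instr[0]](a, b))
    return stack.pop()

print(run(compile_source("2 * (3 + 4) - 5")))  # prints 9
```

Each stage only talks to the next through a plain data structure (token list, AST, instruction list), which is what makes a toy like this easy to take apart and tinker with.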

    • Is it less work than finding a human-authored toy compiler of good quality? How long did it take to generate?

I did not and will not run this on my computer, but it looks like while loops are totally broken; note how poor the test coverage is. This is just from a quick skim of the code. Maybe it works perfectly and I am dumber than a computer.

Regardless, it is incredibly reckless to ask Claude to generate assembly if you don't understand assembly, and it's irresponsible to offer this as advice to newbies. They will not be able to scan the source code for red flags the way we pros can. Nor will they think, "this C compiler is totally untrustworthy, I should test it on a VM."

  • Are you concerned that the compiler might generate code that takes over your computer? If so the provided Dockerfile runs the generated code in a container.

    Regarding test coverage, this is a toy compiler. Don't use it to compile production code! Regarding while loops and such, again, this is a simple compiler intended only to compile sort and search functions written in C.

    • No, the problem is much more basic than "taking over your computer": it looks like the compiler generates incorrect assembly. On visual inspection I found a huge class of infinite loops, and I am sure there are subtle bugs that could corrupt running user/OS processes... including Docker, potentially. Containerization does not protect you from sloppy native code.

      > Don't use it to compile production code!

      This is an understatement. A more useful warning would be "don't use it to compile any code with a while loop." Seriously, this compiler looks terrible. Worse than useless.

      If you really want AI to make a toy compiler just to help you learn, use Python or JavaScript as a compilation target, so that the LLM's dumb bugs are mostly contained and much easier to understand. Learn assembly programming separately.
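A minimal sketch of that suggestion, assuming a made-up toy language with assignments and `while` loops. The AST classes and function names are hypothetical and not taken from the linked repository. The code generator emits Python source and runs it with `exec`. Because Python has its own `while`, the loop needs no hand-placed jump labels, and a codegen bug tends to surface as a Python exception or a visibly wrong value rather than stray native code. An infinite loop will still hang, though.

```python
import keyword
from dataclasses import dataclass
from typing import Dict, List, Union

# AST for a hypothetical toy language: literals, variables, binary ops,
# assignment, and while loops.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]

@dataclass
class Assign:
    name: str
    value: Expr

@dataclass
class While:
    cond: Expr
    body: List["Stmt"]

Stmt = Union[Assign, While]

# Only operators whose meaning is the same in the toy language and in Python.
ALLOWED_OPS = {"+", "-", "*", "<", "<=", ">", ">=", "==", "!="}

def check_name(name: str) -> str:
    # Reject anything that is not a plain Python identifier.
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"bad variable name {name!r}")
    return name

def gen_expr(e: Expr) -> str:
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return check_name(e.name)
    if e.op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator {e.op!r}")
    # Parenthesize every operation so Python's precedence rules can't interfere.
    return f"({gen_expr(e.left)} {e.op} {gen_expr(e.right)})"

def gen_stmts(stmts: List[Stmt], depth: int, lines: List[str]) -> None:
    pad = "    " * depth
    for s in stmts:
        if isinstance(s, Assign):
            lines.append(f"{pad}{check_name(s.name)} = {gen_expr(s.value)}")
        else:
            # The toy while loop maps one-to-one onto Python's while loop.
            lines.append(f"{pad}while {gen_expr(s.cond)}:")
            gen_stmts(s.body, depth + 1, lines)
            if not s.body:
                lines.append(f"{pad}    pass")

def compile_to_python(program: List[Stmt]) -> str:
    lines: List[str] = []
    gen_stmts(program, 0, lines)
    return "\n".join(lines) + "\n"

def run(program: List[Stmt]) -> Dict[str, int]:
    # Emptying __builtins__ stops generated code from reaching open() etc. by
    # accident; it is NOT a security sandbox.
    variables: Dict[str, int] = {}
    exec(compile_to_python(program), {"__builtins__": {}}, variables)
    return variables

# Example: total = 1 + 2 + ... + 10
example_program = [
    Assign("i", Num(1)),
    Assign("total", Num(0)),
    While(BinOp("<=", Var("i"), Num(10)), [
        Assign("total", BinOp("+", Var("total"), Var("i"))),
        Assign("i", BinOp("+", Var("i"), Num(1))),
    ]),
]

print(compile_to_python(example_program))
print(run(example_program))  # {'i': 11, 'total': 55}
```

The generated source is short enough to read directly, which is much of the point: you can check the output by eye before worrying about registers and jumps.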
