I appreciate how wonderfully simple and dependency free this is. More often than not people just want to write a compiler of sorts without bison or yacc, or LLVM, and just convert expressions into assembler or vm-like instructions that _just run_. This is a great starting point for that, and I wish I had something like it 10 years ago.
(Crenshaw's Let's Build a Compiler is an excellent source too if you want to go one step lower and go standard library free, focusing only on function call stacks: https://compilers.iecc.com/crenshaw/)
I’d argue that you should start with an interpreter and push it as far as you can.
But once you’re ready to generate code, use LLVM or something like it, because otherwise you are going to hit very problematic roadblocks early in your effort that will kill your interest in compilers.
Some examples: ensuring you can compare expressions for equality, ensuring that changed expressions don’t violate dominance, SSA conversion, avoiding infinite loops when analyzing CFGs. If you don’t start with knowledge of several things like this, you are looking at rewriting your compiler a dozen times. That is a great way to learn about compilers, but perhaps not the first compiler you make.
That's right. Crenshaw's LBAC can make you an interpreter too. LBAC's magic lies in single token read ahead, single character identifiers, and single character digits. It parses its BNF grammar only with recursive function calls (no stacks, no tables for operator precedence, etc). The final language you build is almost useless, but it is done without a standard library (eg. resizable strings, resizable arrays, stacks, queues, associative containers), and serves as an introduction to get one hooked on more advanced compiler theory like SSA. I love OP's work. I consider OP's implementation (and similarities) to be a modern day LBAC. It's just that LBAC is one of those things I'd take to the grave. I have never found such a concise introductory text that produces a working final example in such delightful piecewise prose.
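The LBAC style is easy to caricature in a few lines: one character of lookahead, single-digit numbers, and mutually recursive functions with no token stream and no precedence tables. A rough Python sketch of the idea (Crenshaw's original is in Pascal and emits 68000 assembly; this toy just evaluates):

```python
# Crenshaw-style parsing: one character of lookahead, no token stream,
# no precedence tables -- just mutually recursive functions.
class Tiny:
    def __init__(self, src):
        self.src, self.pos = src, 0
        self.look = src[0] if src else '\0'  # single-character lookahead

    def advance(self):
        self.pos += 1
        self.look = self.src[self.pos] if self.pos < len(self.src) else '\0'

    def number(self):                # single-digit numbers, as in LBAC
        d = int(self.look)
        self.advance()
        return d

    def term(self):                  # term ::= number {('*'|'/') number}
        v = self.number()
        while self.look in '*/':
            op = self.look
            self.advance()
            v = v * self.number() if op == '*' else v // self.number()
        return v

    def expression(self):            # expr ::= term {('+'|'-') term}
        v = self.term()
        while self.look in '+-':
            op = self.look
            self.advance()
            v = v + self.term() if op == '-' else v + self.term()
        return v

print(Tiny("1+2*3").expression())  # → 7
```

Each grammar rule becomes one function, and precedence falls out of which function calls which — exactly the property that lets LBAC skip stacks and tables.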
Shameless plug for another relatively small compiler written in Python with no dependencies: http://cwerg.org
But otherwise, if anyone wants to try Yacc/Lex, I have an online interpreter/editor with more than 250 examples to try/study at https://mingodad.github.io/parsertl-playground/playground/ and I've just included a grammar for tinycompiler there (select "Tinycompiler parser" from "Examples", then click "Parse" to see a parse tree for the content in "Input source").
lovely from-scratch implementation, without the magic and complexity of lex and yacc. brings back memories from 30+ years ago when I implemented a subset of C in C using lex and yacc in a compiler course.
in your blog, I would have liked to see the BNF grammar (without having to peek into the code).
now, here's your next assignment :) - do the same for functional programming (lambda calculus) or logic programming (Prolog). That would round out the education on languages and compilers for computation.
Not to take away from the author or you, but we had things like this 10 years ago https://github.com/ymyzk/tinyc/tree/master/tinyc
Unless I am mistaken, this uses LLVM?
Only a happy customer, but if you want an amazing experience building a compiler I highly recommend David Beazley's Compiler Course (or anything else you can sign up for) https://www.dabeaz.com/
While I loved this week-long course, I was a bit disappointed that we offloaded all of the code generation to LLVM. I think it would be fantastic to stretch this into a two-week course and actually emit assembly.
When I took the course, each person did their own thing when it came to code generation. I don't recall anyone using LLVM. I think emitting assembly is easier than using LLVM.
I would love a longer course.
The fire.wend code[1] made me smile! I love how one tries to push the boundaries of a language that can barely do anything, just to prove to yourself that it works.
Code is 10/10. Go for merge! :-)
- [1] https://github.com/ssloy/tinycompiler/blob/main/test-program...
Funnily enough, wend looks like what fun programming means to me. C like (prefixed types) syntax, strongly typed but without heartaches of pointers, and simple types.
Every feature on top of that brings either leaky abstractions or nightmare scenarios.
That said I'm not claiming the world should run on this type of code. Or should it.
I think OP is just trying to demystify the complexity of compilers for the sake of an educational resource. It's high quality work, compressed to a very small line count.
Well, Python and Pascal were designed as a teaching language, and ended up being remarkably good for actual programming.
I'm aware people here frown upon paid content but I am currently taking this course and it's excellent:
https://pikuma.com/courses/create-a-programming-language-com...
Hey there. Thanks for the mention and for the kind words.
Thanks! I appreciate your discussion of semantic analysis for wend. I've literally just embarked on creating a (yet another) "compiles to JSX" language[0][1]. I'm using ANTLR for my lexer/parser, but I've just hit a point where I can start doing semantic analysis :)
[0]: https://chicory-lang.github.io/
[1]: https://github.com/chicory-lang/chicory
Great project. This is how language exploration should start; carry it on from there.
I've been a big fan of the author's tiny renderer. Nice to see a tiny compiler too!
That sounds interesting! He seems to have four tiny renderers pinned on his GitHub page; is https://github.com/ssloy/tinyrenderer the one you're recommending? What do you like about it?
Pretty neat. I would have expected it to take a bit longer, but I suppose the optimizing step is the real magic in a compiler.
I don't think one can understand compilers in a "week-end".
The "week-end" part refers to the fact that I wrote it in one week-end with little to no experience in compiler design. You can check the commit history, 12 to 14 Jan, 2024 :)
I think the "week-end" reference up above was a little tongue in cheek, since the modern spelling is a compound word: https://linguix.com/english/common-mistake/week_end_hyphen
Not to take away from this awesome sharing.
Compilers are some of the simplest "complicated" programs out there if you start by throwing yacc/bison/antlr/parser generators in the garbage.
Production compilers are complicated because of the feature set of languages and performance requirements.
You can write a lexer + parser + treewalk interpreter for a simple language in a day if you know what you are doing.
The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”. Mostly for your typical algebraic style infix expressions with precedence.
Just getting the grammar straight on the naturally recursive structures can be a trick. They also touch a large portion of the code generation and run time.
Get the expression grammar working and you're 80% there.
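Precedence climbing is one standard way past that hurdle: a single loop plus one recursive call handles every binary-operator level, so adding an operator is a one-line change to a table. A minimal sketch (the operator set and token regex are illustrative, not from the article):

```python
# Precedence climbing: parse left-associative infix operators into a
# nested-tuple AST using a binding-power table instead of grammar levels.
import re

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}  # higher binds tighter

def tokenize(src):
    return re.findall(r'\d+|[-+*/()]', src)

def parse_atom(toks):
    tok = toks.pop(0)
    if tok == '(':
        node = parse_expr(toks)
        toks.pop(0)                            # consume ')'
        return node
    return int(tok)

def parse_expr(toks, min_prec=1):
    node = parse_atom(toks)
    while toks and toks[0] in PREC and PREC[toks[0]] >= min_prec:
        op = toks.pop(0)
        rhs = parse_expr(toks, PREC[op] + 1)   # +1 => left-associative
        node = (op, node, rhs)
    return node

print(parse_expr(tokenize("1+2*3-4")))  # → ('-', ('+', 1, ('*', 2, 3)), 4)
```

The `min_prec` threshold is what keeps `*` from being swallowed into the right-hand side of `+`, which is exactly the part beginners usually fight with when encoding precedence directly in the grammar.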
> lexer + parser + treewalk interpreter
If all you do is construct an AST and interpret it, it's not really a compiler, is it? It doesn't compile from source to target. At most you're describing a compiler front end (arguably), or an interpreter.
I would expect:
lexer + parser + AST definition and construction + semantic analysis/type checking + X + codegen to ASM/WASM/C
where X includes

- definition of an intermediate representation (IR)
- lowering AST to IR
- static analysis
- improvers/optimisation passes (at least some simple stuff)
- code gen including register allocation
This can still be a simple "complicated" program, but there's more to a compiler than an AST interpreter.
EDIT: I notice that the author of the original article has also started work on an optimizing compiler: https://ssloy.github.io/tinyoptimizer/
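To make the interpreter/compiler distinction concrete, here is a minimal sketch of lowering an expression AST to stack-machine instructions — the smallest possible "backend" that is more than a treewalk. The tuple AST encoding `(op, lhs, rhs)` is assumed for illustration:

```python
# Lower a tuple-shaped expression AST to stack-machine instructions,
# then execute the instruction list -- a toy compile/run split.
import operator

def compile_expr(node, out):
    if isinstance(node, int):
        out.append(('PUSH', node))
    else:
        op, lhs, rhs = node
        compile_expr(lhs, out)      # post-order traversal: operands first,
        compile_expr(rhs, out)      # then the operator
        out.append(('BINOP', op))
    return out

def run(code):
    ops = {'+': operator.add, '-': operator.sub,
           '*': operator.mul, '/': operator.floordiv}
    stack = []
    for instr, arg in code:
        if instr == 'PUSH':
            stack.append(arg)
        else:                       # BINOP: pop two operands, push result
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(ops[arg](lhs, rhs))
    return stack[0]

code = compile_expr(('+', 1, ('*', 2, 3)), [])
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('BINOP', '*'), ('BINOP', '+')]
print(run(code))  # → 7
```

Swap the instruction list for printed assembly (or WASM) and the same traversal becomes real code generation.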
Honestly, antlr made this pretty straightforward to me. I didn't want to work in java, but they have a ton of targets. You can definitely write it all yourself, and that's a great learning exercise. But I wanted to get a parser for a language idea I had in mind and it took a couple of days with antlr (https://github.com/chicory-lang/chicory)
5 days if you don't, but have the right tutor.
All a compiler does is take a well defined data structure and transform it into another, sometimes by encoding or sometimes by decoding. Really it's bread and butter stuff that every engineer does everyday but with the benefit that you can say something is undefined/out of scope way easier than talking to APIs you don't control.
The fun part is the magic algorithms you sometimes get to use.
> All a compiler does is take a well defined data structure and transform it into another
Sorry, having worked on language semantics topics, "well-defined" is not how I would describe either a programming language or the behavior of machine code.
One weekend is enough to survey the general concepts. That’s extremely useful on its own, as now you have a better idea of how to orient yourself.
You set your own bar. You could understand "how a compiler works" in a day, but how much do you want to know?
In the same way as cproc+QBE, I guess: 70% of gcc's speed in my benchmarks.
The more real-life alternatives to those abominations gcc and clang, the merrier.
I wish the Linux kernel devs cared to keep the door _reasonably_ open for such compilers (with assembler source files as an alternative to inline assembly, avoidance of some extensions and of niche, expensive compiler features).
This is another case of why computer languages with super complex syntax should be avoided like hell (C++, Rust, etc).
QBE is... problematic. The source has no comments and single-letter variable names. It is not written to be extended by anyone other than the author.
I would recommend libfirm over QBE by at least 10km.
https://libfirm.github.io/ https://github.com/libfirm/libfirm
It won't surprise you to learn that I'm always interested in hearing about alternatives to QBE, but from the documentation, libfirm sounds more like a heavyweight alternative to LLVM, not a lightweight alternative to QBE. How big is it?
On the topic of QBE, I've always felt that someone aiming to do the same scope of project ought to make their IR a syntactic subset of LLVM IR. If you do that, your test suite can involve invoking LLVM for a comparison.
As for QBE itself, many of the core transformations are fairly standard, which makes it somewhat more approachable for me (e.g. Cooper et al's dominance algorithm, Rideau et al's parallel moves algorithm, etc.). Of course, this doesn't negate the fact that it's not as "hackable" as they probably intended.
Missing RISC-V code generation.
But it is written in plain and simple C99, so it is at least much less toxic than LLVM.
I wonder how cparser (I did not check whether it is plain and simple C) + libfirm compares in performance to cproc+QBE on my benchmarks. I may have to take some time to check that.
Whatever the results, it is always good to have, again, a real-life alternative for optimizing C toolchains.
The main issues are the heavy usage of gcc extensions by Linux (and glibc, etc).
Circle frontend, that includes all C++17 (when it was current) and the Rust related extensions, was implemented by a single guy.
Relativity was discovered by a single guy.
You mean Circle can output correct intermediate code for the latest gcc code? Or Valve's Mesa ACO code?
Interesting off-topic—I noticed this paragraph:
> Personally, I get a twisted pleasure wearing a T-shirt with this print in the 2020s, even though I hate being a walking billboard. But the number of angry comments from confused people makes up for any moral discomfort of advertising on my belly!
It's possible the 'angry comments' are from people who may have mistaken it for an antisemitic, white supremacist symbol[1].
[1]: https://www.adl.org/resources/hate-symbol/not-equal
Hum. Did not anticipate that, thank you for pointing out.
Where is that paragraph?
I removed it; no need to offend people with a racial slur, especially when it is not intended.