The "week-end" part refers to the fact that I wrote it in one week-end with little to no experience in compiler design.
You can check the commit history, 12 to 14 Jan, 2024 :)
The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”. Mostly for your typical algebraic style infix expressions with precedence.
Just getting the grammar straight on the naturally recursive structures can be a trick. They also touch a large portion of the code generation and run time.
> The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”
This is why teaching material should be tailored towards teaching rather than implementing demented stuff from the real world just because it is "there." You can easily implement a parser that ignores precedence and tell the reader that we are using brackets to force precedence instead.
In a real-word parser, you might use something like this[1] or some other home-grown algorithm if you didn't know about it. Doesn't matter as long as the parser works.
Is that really the hard part? I figured that part out before I knew much programming: made a simple Excel type formula parser. (A horrible, horrible solution based around replacing strings and manually counting parentheses... but it worked!)
I never got much farther than that though, I've looked into making a compiler many times and got overwhelmed every time.
well, in my personal experience, parsing is the easiest part. it's what you do with the AST once you have one that is the hard part. but for some reason an overwhelming amount of writing is about frontends when you could teach the concepts to a layman with some boxes and arrows.
I think the hardest part is function calls. Implementing the standard calling convention is really fiddly and annoying when you have to save and restore registers around function calls if you want to do it efficiently (keeping track of which registers are dirty, doing parallel loads of registers into the right argument registers, spilling when appropriate, etc. It is fiddly.
The alternative is going to an IR and doing full register allocation but then you need to implement Lengauer-Tarjan to get into SSA, all the same parallel load stuff for phi functions/block arguments, out-of-SSA, reconstruct in optimisation passes all the information you discard by going to a linear IR, etc.
If all you do is construct an AST and interpret it it's not really a compiler is it. It doesn't compile from source to target. At most you're describing a compiler front end (arguably), or an interpreter.
I would expect:
lexer + parser + AST definition and construction + semantic analysis/type checking + X + codegen to ASM/WASM/C
where X includes
- definition of an intermediate representation (IR)
- lowering AST to IR
- static analysis
- improvers/optimisation passes (at least some simple stuff)
- code gen including register allocation
This can still be a simple "complicated" program, but there's more to a compiler than an AST interpreter.
> If all you do is construct an AST and interpret it it's not really a compiler is it.
What if I dumped the AST to disk, called it "bytecode" and updated the "interpreter" to run this "bytecode"? Would you consider that to be a compiler? :-)
> where X includes
I don't consider any of these things to be essential. There are entire languages out there which run on interpreters of some kind. There are others which compile down to C/JS. Many compilers use LLVM as their backend. By your definition, none of them use (or are) compilers.
Honestly, antlr made this pretty straightforward to me. I didn't want to work in java, but they have a ton of targets. You can definitely write it all yourself, and that's a great learning exercise. But I wanted to get a parser for a language idea I had in mind and it took a couple of days with antlr (https://github.com/chicory-lang/chicory)
If you know basic programming (js/python), a day is more than enough to get the concept across using a tiny language. Could probably be done in an hour.
The problem, always, is people are given terrible reading recommendations which makes the subject more complicated than it is.
All a compiler does is take a well defined data structure and transform it into another, sometimes by encoding or sometimes by decoding. Really it's bread and butter stuff that every engineer does everyday but with the benefit that you can say something is undefined/out of scope way easier than talking to APIs you don't control.
The fun part are the magic algorithms you sometimes get to use
> All a compiler does is take a well defined data structure and transform it into another
Sorry, having worked on language semantics topics, "well-defined" is not how I would describe either a programming language or the behavior of machine code.
The "week-end" part refers to the fact that I wrote it in one week-end with little to no experience in compiler design. You can check the commit history, 12 to 14 Jan, 2024 :)
I think the "week-end" reference up above was a little tongue in cheek, since the modern spelling is a compound word: https://linguix.com/english/common-mistake/week_end_hyphen
Not to take away from this awesome sharing.
Compilers are some of the simplest "complicated" programs out there if you start by throwing yacc/bison/antlr/parser generators in the garbage.
Production compilers are complicated because of the feature set of languages and performance requirements.
You can write a lexer + parser + treewalk interpreter for a simple language in a day if you know what you are doing.
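For the treewalk part, a minimal sketch in Python (my own illustration, assuming a tuple-based AST rather than anyone's actual code):

    def evaluate(node, env):
        """Tree-walk a tuple AST such as ("+", ("num", 1), ("num", 2))."""
        kind = node[0]
        if kind == "num":
            return node[1]                      # literal
        if kind == "var":
            return env[node[1]]                 # variable lookup
        left = evaluate(node[1], env)
        right = evaluate(node[2], env)
        if kind == "+": return left + right
        if kind == "-": return left - right
        if kind == "*": return left * right
        if kind == "/": return left / right
        raise ValueError(f"unknown node: {kind!r}")

    evaluate(("+", ("num", 1), ("*", ("num", 2), ("num", 3))), {})  # -> 7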
The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”. Mostly for your typical algebraic style infix expressions with precedence.
Just getting the grammar straight on the naturally recursive structures can be tricky, and expressions also touch a large portion of the code generation and runtime.
Get expressions working and you're 80% there.
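For instance, with the classic layered grammar that bakes precedence into the rules (a sketch, not anyone's exact snippet):

    expr   -> term   (("+" | "-") term)*
    term   -> factor (("*" | "/") factor)*
    factor -> NUMBER | IDENT | "(" expr ")"

Each layer binds tighter than the one above it, so 1 + 2 * 3 comes out as 1 + (2 * 3) for free.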
> The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”
This is why teaching material should be tailored towards teaching rather than implementing demented stuff from the real world just because it is "there." You can easily implement a parser that ignores precedence and tell the reader that we are using brackets to force precedence instead.
In a real-world parser, you might use something like this[1] or some other home-grown algorithm if you didn't know about it. It doesn't matter, as long as the parser works.
[1] Top-Down operator precedence (Pratt) parsing https://eli.thegreenplace.net/2010/01/02/top-down-operator-p...
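The brackets-force-precedence approach really is tiny. A sketch in Python (my illustration, assuming fully parenthesized input like ((1 + 2) * 3)):

    import re

    def tokenize(src):
        return re.findall(r"\d+|[()+\-*/]", src)

    def parse(tokens):
        """No precedence table: the parentheses carry all the grouping."""
        tok = tokens.pop(0)
        if tok == "(":                       # form: ( expr op expr )
            left = parse(tokens)
            op = tokens.pop(0)
            right = parse(tokens)
            assert tokens.pop(0) == ")", "expected closing bracket"
            return (op, left, right)
        return ("num", int(tok))

    parse(tokenize("((1+2)*3)"))
    # -> ("*", ("+", ("num", 1), ("num", 2)), ("num", 3))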
Is that really the hard part? I figured that part out before I knew much programming, when I made a simple Excel-type formula parser. (A horrible, horrible solution based around replacing strings and manually counting parentheses... but it worked!)
I never got much farther than that, though; I've looked into making a compiler many times and got overwhelmed every time.
An infix expression language with precedence is not a simple language.
S-expressions would be a simple and precise-to-parse language, and the typical compiler for one is written in a day. Search for SIOD.
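For comparison, a complete s-expression reader fits in a few lines of Python (my sketch):

    def tokenize(src):
        return src.replace("(", " ( ").replace(")", " ) ").split()

    def parse(tokens):
        """One token of lookahead, no precedence, no ambiguity."""
        tok = tokens.pop(0)
        if tok == "(":
            lst = []
            while tokens[0] != ")":
                lst.append(parse(tokens))
            tokens.pop(0)                    # consume ")"
            return lst
        return int(tok) if tok.lstrip("-").isdigit() else tok

    parse(tokenize("(+ 1 (* 2 3))"))  # -> ["+", 1, ["*", 2, 3]]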
Pratt parsers are a really nice way to handle mathematical expressions like this vs. the usual recursive descent.
https://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-...
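The whole trick fits on one screen. A precedence-climbing sketch in Python (my own illustration of the technique, not code from either article):

    PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

    def parse_expr(tokens, min_prec=0):
        left = parse_atom(tokens)
        # consume operators that bind at least as tightly as min_prec
        while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
            op = tokens.pop(0)
            right = parse_expr(tokens, PREC[op] + 1)   # +1 => left-associative
            left = (op, left, right)
        return left

    def parse_atom(tokens):
        tok = tokens.pop(0)
        if tok == "(":
            inner = parse_expr(tokens)
            assert tokens.pop(0) == ")"
            return inner
        return ("num", int(tok))

    parse_expr(["1", "+", "2", "*", "3"])
    # -> ("+", ("num", 1), ("*", ("num", 2), ("num", 3)))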
well, in my personal experience, parsing is the easiest part. it's what you do with the AST once you have one that is the hard part. but for some reason an overwhelming amount of writing is about frontends when you could teach the concepts to a layman with some boxes and arrows.
I think the hardest part is function calls. Implementing the standard calling convention is really fiddly and annoying when you have to save and restore registers around function calls if you want to do it efficiently (keeping track of which registers are dirty, doing parallel loads of registers into the right argument registers, spilling when appropriate, etc.). It is fiddly.
The alternative is going to an IR and doing full register allocation but then you need to implement Lengauer-Tarjan to get into SSA, all the same parallel load stuff for phi functions/block arguments, out-of-SSA, reconstruct in optimisation passes all the information you discard by going to a linear IR, etc.
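The naive fallback is to spill around every call site. A sketch in Python of a hypothetical emitter (function and names are mine; the register list is the x86-64 System V caller-saved set):

    CALLER_SAVED = ["rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"]

    def emit_call(asm, target, live):
        """Save every live caller-saved register, call, then restore."""
        to_save = [r for r in CALLER_SAVED if r in live]
        for r in to_save:
            asm.append(f"push {r}")
        asm.append(f"call {target}")
        for r in reversed(to_save):          # restore in reverse push order
            asm.append(f"pop {r}")

Correct but slow; avoiding the redundant saves is exactly the fiddly bookkeeping described above.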
> lexer + parser + treewalk interpreter
If all you do is construct an AST and interpret it, it's not really a compiler, is it? It doesn't compile from source to target. At most you're describing a compiler front end (arguably), or an interpreter.
I would expect:
lexer + parser + AST definition and construction + semantic analysis/type checking + X + codegen to ASM/WASM/C
where X includes
- definition of an intermediate representation (IR)
- lowering AST to IR (see the sketch below)
- static analysis
- improvers/optimisation passes (at least some simple stuff)
- code gen including register allocation
This can still be a simple "complicated" program, but there's more to a compiler than an AST interpreter.
EDIT: I notice that the author of the original article has also started work on an optimizing compiler: https://ssloy.github.io/tinyoptimizer/
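To make the "lowering AST to IR" step concrete, here is a sketch in Python that flattens a tuple AST into three-address code (my illustration, not the article's IR):

    from itertools import count

    fresh = count(1)

    def lower(node, code):
        """Emit three-address code; return the operand naming the result."""
        if node[0] == "num":
            return str(node[1])
        op, left, right = node
        a = lower(left, code)
        b = lower(right, code)
        t = f"t{next(fresh)}"
        code.append(f"{t} = {a} {op} {b}")
        return t

    code = []
    lower(("+", ("num", 1), ("*", ("num", 2), ("num", 3))), code)
    # code == ["t1 = 2 * 3", "t2 = 1 + t1"]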
> If all you do is construct an AST and interpret it it's not really a compiler is it.
What if I dumped the AST to disk, called it "bytecode" and updated the "interpreter" to run this "bytecode"? Would you consider that to be a compiler? :-)
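In Python terms, the whole "compiler" would be something like this deliberately silly sketch:

    import pickle

    def compile_to_bytecode(ast, path):      # "compilation": serialize the AST
        with open(path, "wb") as f:
            pickle.dump(ast, f)

    def run_bytecode(path, env, interpret):  # the "VM": load and tree-walk
        with open(path, "rb") as f:
            return interpret(pickle.load(f), env)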
> where X includes
I don't consider any of these things to be essential. There are entire languages out there which run on interpreters of some kind. There are others which compile down to C/JS. Many compilers use LLVM as their backend. By your definition, none of them use (or are) compilers.
Honestly, ANTLR made this pretty straightforward for me. I didn't want to work in Java, but they have a ton of targets. You can definitely write it all yourself, and that's a great learning exercise. But I wanted to get a parser for a language idea I had in mind, and it took a couple of days with ANTLR (https://github.com/chicory-lang/chicory)
The problem with these tools is that you have to spend time wrangling them instead of learning to write a lexer/parser yourself.
A recursive-descent parser is a beautiful thing and can be implemented very quickly.
5 days if you don't, but have the right tutor.
If you know basic programming (js/python), a day is more than enough to get the concept across using a tiny language. Could probably be done in an hour.
The problem, always, is that people are given terrible reading recommendations, which make the subject seem more complicated than it is.
All a compiler does is take a well-defined data structure and transform it into another, sometimes by encoding and sometimes by decoding. Really, it's the bread-and-butter stuff that every engineer does every day, but with the benefit that you can declare something undefined/out of scope far more easily than when you're talking to APIs you don't control.
The fun part is the magic algorithms you sometimes get to use.
> All a compiler does is take a well-defined data structure and transform it into another
Sorry, having worked on language semantics topics, "well-defined" is not how I would describe either a programming language or the behavior of machine code.
There should be a well-defined list of undefined behaviors.
One weekend is enough to survey the general concepts. That’s extremely useful on its own, as now you have a better idea of how to orient yourself.
You set your own bar. You could understand "how a compiler works" in a day, but how much do you want to know?