Comment by whartung
2 days ago
The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”. Mostly your typical algebraic-style infix expressions with precedence.
Just getting the grammar straight on the naturally recursive structures can be tricky. They also touch a large portion of the code generation and runtime.
Get:
(a + c/2) * sqrt(b)
working and you’re 80% there.
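For illustration, a minimal recursive-descent sketch in Python (the grammar split, names, and variable values below are illustrative, not from the comment) that evaluates exactly that expression with the usual precedence:

  import math
  import re

  # Toy grammar with the usual precedence levels:
  #   expr   -> term (('+' | '-') term)*
  #   term   -> factor (('*' | '/') factor)*
  #   factor -> NUMBER | NAME | NAME '(' expr ')' | '(' expr ')'
  TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|(.))")

  def tokenize(src):
      for num, name, op in TOKEN.findall(src):
          if num:
              yield ("num", float(num))
          elif name:
              yield ("name", name)
          else:
              yield ("op", op)

  class Parser:
      def __init__(self, src, env):
          self.toks = list(tokenize(src)) + [("eof", "")]
          self.pos = 0
          self.env = env                  # variable and function bindings

      def peek(self):
          return self.toks[self.pos]

      def next(self):
          self.pos += 1
          return self.toks[self.pos - 1]

      def expr(self):                     # lowest precedence: + and -
          value = self.term()
          while self.peek() in (("op", "+"), ("op", "-")):
              _, op = self.next()
              rhs = self.term()
              value = value + rhs if op == "+" else value - rhs
          return value

      def term(self):                     # next level up: * and /
          value = self.factor()
          while self.peek() in (("op", "*"), ("op", "/")):
              _, op = self.next()
              rhs = self.factor()
              value = value * rhs if op == "*" else value / rhs
          return value

      def factor(self):                   # numbers, variables, calls, parens
          kind, val = self.next()
          if kind == "num":
              return val
          if kind == "name":
              if self.peek() == ("op", "("):      # a call such as sqrt(b)
                  self.next()
                  arg = self.expr()
                  self.next()                     # consume ')'
                  return self.env[val](arg)
              return self.env[val]
          if (kind, val) == ("op", "("):
              value = self.expr()
              self.next()                         # consume ')'
              return value
          raise SyntaxError("unexpected token: %r" % val)

  env = {"a": 1.0, "b": 4.0, "c": 6.0, "sqrt": math.sqrt}
  print(Parser("(a + c/2) * sqrt(b)", env).expr())   # (1 + 3) * 2 = 8.0

The recursion mirrors the precedence: each grammar level only hands control to the next-tighter-binding level, which is exactly the part beginners tend to get tangled up in.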
> The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”
This is why teaching material should be tailored towards teaching rather than implementing demented stuff from the real world just because it is "there." You can easily implement a parser that ignores precedence and tell the reader that we are using brackets to force precedence instead.
In a real-world parser, you might use something like this[1] or some other home-grown algorithm if you didn't know about it. Doesn't matter as long as the parser works.
[1] Top-Down operator precedence (Pratt) parsing https://eli.thegreenplace.net/2010/01/02/top-down-operator-p...
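For what it's worth, that precedence-free teaching parser really is tiny. A sketch of the idea (not from the linked post): every binary operation has to be written fully bracketed, so there is no precedence anywhere in the grammar:

  import operator
  import re

  # Grammar with no precedence at all:
  #   expr -> NUMBER | '(' expr OP expr ')'
  OPS = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

  def tokenize(src):
      return re.findall(r"\d+\.?\d*|[()+\-*/]", src)

  def parse(tokens):
      tok = tokens.pop(0)
      if tok != "(":
          return float(tok)
      left, op, right = parse(tokens), tokens.pop(0), parse(tokens)
      assert tokens.pop(0) == ")", "expected ')'"
      return (op, left, right)

  def evaluate(node):
      if isinstance(node, float):
          return node
      op, left, right = node
      return OPS[op](evaluate(left), evaluate(right))

  # The reader has to spell out the brackets in full:
  print(evaluate(parse(tokenize("((1 + (6 / 2)) * 2)"))))   # 8.0

Once the student is comfortable with that, precedence can be introduced later as "the parser inserting the brackets for you".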
Is that really the hard part? I figured that part out before I knew much programming: made a simple Excel type formula parser. (A horrible, horrible solution based around replacing strings and manually counting parentheses... but it worked!)
I never got much farther than that though, I've looked into making a compiler many times and got overwhelmed every time.
Good to know I wasn't alone in writing terrible expression parsers. Since I knew absolutely nothing about parsing, my parser/interpreter consisted of splitting on whitespace, followed by linear scanning to find the highest-precedence operation, repeated until a single token remained (basically how a human would do it).
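Roughly that, as a sketch (a reconstruction of the approach being described, not the original code; whitespace-separated tokens, no parentheses):

  import operator

  OPS = {"*": operator.mul, "/": operator.truediv,
         "+": operator.add, "-": operator.sub}
  PRECEDENCE = [("*", "/"), ("+", "-")]        # highest level first

  def evaluate(source):
      tokens = source.split()                  # e.g. "1 + 6 / 2 * 2"
      while len(tokens) > 1:
          # leftmost operator of the highest precedence level still present
          i = next(i for level in PRECEDENCE
                   for i, t in enumerate(tokens) if t in level)
          result = OPS[tokens[i]](float(tokens[i - 1]), float(tokens[i + 1]))
          tokens[i - 1:i + 2] = [str(result)]  # collapse "a op b" into its value
      return float(tokens[0])

  print(evaluate("1 + 6 / 2 * 2"))   # 1 + ((6 / 2) * 2) = 7.0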
It's hard if:
1. You don't recognise the recursive nature of the problem,
Or
2. You don't recognise the stack-based nature of the problem (shunting-yard algo, etc).
IOW, you did it the hard way :-).
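For reference, the shunting-yard version from point 2 looks something like this (a minimal sketch; binary operators and parentheses only, tokens separated by spaces):

  import operator

  PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
  OPS = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

  def to_rpn(tokens):
      """Dijkstra's shunting-yard: infix tokens -> reverse Polish notation."""
      output, stack = [], []
      for tok in tokens:
          if tok in PREC:
              # pop operators of greater or equal precedence (left-associative)
              while stack and stack[-1] in PREC and PREC[stack[-1]] >= PREC[tok]:
                  output.append(stack.pop())
              stack.append(tok)
          elif tok == "(":
              stack.append(tok)
          elif tok == ")":
              while stack[-1] != "(":
                  output.append(stack.pop())
              stack.pop()                      # discard the "("
          else:
              output.append(float(tok))        # an operand
      return output + stack[::-1]

  def eval_rpn(rpn):
      stack = []
      for tok in rpn:
          if tok in OPS:
              right, left = stack.pop(), stack.pop()
              stack.append(OPS[tok](left, right))
          else:
              stack.append(tok)
      return stack[0]

  print(eval_rpn(to_rpn("( 1 + 6 / 2 ) * 2".split())))   # 8.0

The explicit operator stack is doing exactly the job the call stack does in the recursive version.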
(a + c/2) * sqrt(b) is not a simple language.
Something like (* (+ a (/ c 2)) (sqrt b)) would be a simple and precise-to-parse language, and the typical compiler for this is written in a day. Search for SIOD.
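And the reader for a language like that fits on a page; a toy sketch (nothing to do with SIOD's actual implementation):

  def tokenize(src):
      return src.replace("(", " ( ").replace(")", " ) ").split()

  def read(tokens):
      """Parse one s-expression, consuming the token list in place."""
      tok = tokens.pop(0)
      if tok == "(":
          expr = []
          while tokens[0] != ")":
              expr.append(read(tokens))
          tokens.pop(0)                    # discard ")"
          return expr
      try:
          return float(tok)
      except ValueError:
          return tok                       # a symbol

  print(read(tokenize("(* (+ a (/ c 2)) (sqrt b))")))
  # ['*', ['+', 'a', ['/', 'c', 2.0]], ['sqrt', 'b']]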
Pratt parsers are a really nice way to handle mathematical expressions like this vs. the usual recursive descent.
https://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-...
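The core of it is small. A Pratt-style sketch (the names and binding powers are illustrative; it builds a nested tuple AST instead of evaluating):

  import re

  BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}

  def tokenize(src):
      return re.findall(r"\d+\.?\d*|[()+\-*/]", src) + ["\0"]   # "\0" = end

  def parse(tokens, min_bp=0):
      """Pratt / precedence climbing: keep going while operators bind >= min_bp."""
      tok = tokens.pop(0)
      if tok == "(":
          left = parse(tokens, 0)
          tokens.pop(0)                            # consume ")"
      else:
          left = float(tok)
      while BINDING_POWER.get(tokens[0], -1) >= min_bp:
          op = tokens.pop(0)
          # the +1 makes the operator left-associative
          right = parse(tokens, BINDING_POWER[op] + 1)
          left = (op, left, right)
      return left

  print(parse(tokenize("(1 + 6 / 2) * 2")))
  # ('*', ('+', 1.0, ('/', 6.0, 2.0)), 2.0)

The whole precedence table collapses into that one numeric comparison, which is why it scales so nicely as you add operators.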
Well, in my personal experience, parsing is the easiest part. It's what you do with the AST once you have one that is the hard part. But for some reason, an overwhelming amount of writing is about frontends, when you could teach the concepts to a layman with some boxes and arrows.
I think the hardest part is function calls. Implementing the standard calling convention is really fiddly and annoying when you have to save and restore registers around function calls if you want to do it efficiently (keeping track of which registers are dirty, doing parallel loads of registers into the right argument registers, spilling when appropriate, etc.). It is fiddly.
The alternative is going to an IR and doing full register allocation, but then you need to implement Lengauer-Tarjan to get into SSA, all the same parallel-load stuff for phi functions/block arguments, out-of-SSA translation, reconstructing in optimisation passes all the information you discard by going to a linear IR, etc.
I'd argue that Lengauer-Tarjan is overrated. While it is neat, it's premature optimization IMHO. Fully maximal SSA (at the entry of each block insert a phi-node for each variable) + dead code elimination (we need it anyways) is much simpler.
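A toy sketch of that placement rule (the renaming step and the later DCE pass are left out): put a phi for every variable at the entry of every block that has predecessors:

  # CFG as block -> list of successors; "entry" is the entry block.
  cfg = {
      "entry": ["loop"],
      "loop":  ["body", "exit"],
      "body":  ["loop"],
      "exit":  [],
  }
  variables = ["i", "sum"]

  def predecessors(cfg):
      preds = {b: [] for b in cfg}
      for block, succs in cfg.items():
          for s in succs:
              preds[s].append(block)
      return preds

  def maximal_phi_placement(cfg, variables):
      """Maximal SSA placement: a phi for every variable at every block entry.
      Dead-code elimination afterwards removes the useless ones."""
      preds = predecessors(cfg)
      phis = {}
      for block in cfg:
          if not preds[block]:
              continue                      # the entry block gets no phis
          phis[block] = []
          for v in variables:
              args = ", ".join("%s from %s" % (v, p) for p in preds[block])
              phis[block].append("%s = phi(%s)" % (v, args))
      return phis

  for block, inserted in maximal_phi_placement(cfg, variables).items():
      print(block, inserted)
  # loop ['i = phi(i from entry, i from body)', 'sum = phi(sum from entry, sum from body)']
  # body ['i = phi(i from loop)', 'sum = phi(sum from loop)']
  # exit ['i = phi(i from loop)', 'sum = phi(sum from loop)']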
Typical implementations of Lengauer-Tarjan are often taken verbatim from Andrew Appel's book and involve higher constant factors than alternative algorithms - such as Cooper et al's "engineered" version of the usual fixpoint algorithm, which is very neat, simple, and performs well.
If you are saying that constructing SSA following the classical approach is a premature optimisation, perhaps you would prefer Braun et al's retelling of Cliff Click's SSA construction algorithm - which works backwards from uses to place phis and requires no dominance information to be computed.
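For reference, the Cooper/Harvey/Kennedy fixpoint algorithm mentioned above is short enough to sketch here; this is a transcription (using reverse-postorder indices rather than the paper's postorder numbers), with a made-up CFG:

  def immediate_dominators(cfg, entry):
      """Cooper, Harvey & Kennedy's iterative dominance algorithm.
      cfg maps each block to its list of successors."""
      preds = {b: [] for b in cfg}
      for b, succs in cfg.items():
          for s in succs:
              preds[s].append(b)

      # Reverse postorder numbering of the reachable blocks.
      order, seen = [], set()
      def dfs(b):
          seen.add(b)
          for s in cfg[b]:
              if s not in seen:
                  dfs(s)
          order.append(b)
      dfs(entry)
      rpo = order[::-1]
      index = {b: i for i, b in enumerate(rpo)}      # entry gets index 0

      idom = {entry: entry}

      def intersect(a, b):
          # walk both candidates up the dominator tree until they meet
          while a != b:
              while index[a] > index[b]:
                  a = idom[a]
              while index[b] > index[a]:
                  b = idom[b]
          return a

      changed = True
      while changed:
          changed = False
          for b in rpo[1:]:                          # skip the entry block
              processed = [p for p in preds[b] if p in idom]
              new_idom = processed[0]
              for p in processed[1:]:
                  new_idom = intersect(p, new_idom)
              if idom.get(b) != new_idom:
                  idom[b] = new_idom
                  changed = True
      return idom

  cfg = {"entry": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": []}
  print(immediate_dominators(cfg, "entry"))
  # {'entry': 'entry', 'loop': 'entry', 'exit': 'loop', 'body': 'loop'}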