Comment by fuzztester
3 days ago
>On purely recreational grounds, one can get something small off the ground in an afternoon with LLVM. It's very enjoyable and has a low barrier to entry, really.
Is there something analogous for those wanting to create language interpreters, not compilers? And preferably for interpreters one wants to develop in Python?
It doesn't have to be literally just an afternoon; it could even be a few weeks. But is there something that will ease the task for PL newbies? The tasks of lexing and parsing, I mean.
https://craftinginterpreters.com/introduction.html
AST interpreter in Java from scratch, followed by the same language in a tight bytecode VM in C.
Great book; very good introduction to the subject.
There are some quite neat lexer and parser generators for Python that can lower the barrier to entry. For example, I've used PLY now and then for very small things.
On the non-generated side, lexer creation is largely mechanical, even if you write it by hand. For example, if you vaguely understand the idea of expressing a disjunctive regular expression (one alternative per token kind) as a state machine (its DFA), you can plug that into skeleton algorithms and get a lexer out (for example, the algorithm shown in Reps' paper "'Maximal-Munch' Tokenization in Linear Time"). For parsing, taking a day or two to really understand Pratt parsing is incredibly valuable. Recursive descent is fairly intuitive to learn and implement, and Pratt parsing is a nice way to structure your parser for the more expressive parts of your language's grammar, such as operator expressions with precedence.
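A minimal sketch of both ideas in Python, for anyone who wants something to poke at. Assumptions: the names (`tokenize`, `Parser`, `INFIX`, `PREFIX_BP`, `parse`) and the tiny arithmetic grammar are made up for illustration, and the lexer leans on Python's `re` module rather than a hand-built DFA, so it is not the algorithm from the Reps paper. Note that `re` alternation picks the first alternative that matches, not the longest, which is why `**` is listed before the single-character operators.

```python
import re

# One named group per token kind. Order matters: `re` tries alternatives
# left to right, so "**" must come before the single-character operators.
TOKEN_RE = re.compile(r"(?P<ws>\s+)|(?P<num>\d+(?:\.\d+)?)|(?P<op>\*\*|[-+*/()])")

def tokenize(src):
    """Return a list of (kind, text) pairs, ending with an ("end", "") marker."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "ws":  # whitespace is skipped, not emitted
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("end", ""))
    return tokens

# Infix binding powers as (left, right). A right power lower than the left
# one makes the operator right-associative (here: "**").
INFIX = {"+": (10, 11), "-": (10, 11), "*": (20, 21), "/": (20, 21), "**": (31, 30)}
PREFIX_BP = 25  # unary minus: tighter than "*", looser than "**"

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i]

    def advance(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def expr(self, min_bp=0):
        # Prefix position: a number, a parenthesized expression, or unary minus.
        kind, text = self.advance()
        if kind == "num":
            left = float(text)
        elif text == "(":
            left = self.expr(0)
            if self.advance()[1] != ")":
                raise SyntaxError("expected ')'")
        elif text == "-":
            left = ("neg", self.expr(PREFIX_BP))
        else:
            raise SyntaxError(f"unexpected token {text!r}")
        # Infix loop: keep extending `left` while the next operator binds
        # at least as tightly as the caller requires.
        while True:
            kind, text = self.peek()
            if text not in INFIX:
                break
            left_bp, right_bp = INFIX[text]
            if left_bp < min_bp:
                break
            self.advance()
            left = (text, left, self.expr(right_bp))
        return left

def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek()[0] != "end":
        raise SyntaxError(f"trailing input: {parser.peek()[1]!r}")
    return tree
```

The tree is just nested tuples, so `parse("1 + 2 * 3")` gives `("+", 1.0, ("*", 2.0, 3.0))`. The whole precedence and associativity story lives in the `INFIX` table, which is most of the appeal of Pratt parsing: adding an operator is one table entry rather than a new grammar rule.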
Nowadays, Python has a match (pattern matching) construct - even if its semantics are somewhat questionable (and potentially error-prone). Overall, though, I don't find Python too unenjoyable for compiler-related programming: dataclasses (and match) have really improved the situation.
I am a big fan of Ragel[1]. It is a high-performance parser generator. In fact, it can generate different types of parsers, which makes it very powerful. Unfortunately, it takes a lot of skill to operate. I wrote a parser generator generator to make it all smooth[2], but after 8 years I still can't call it effortless. A colleague of mine once "broke the internet" with a Ragel bug. So, think twice. Still, for weekend activities I highly recommend it, just for the way of thinking it embodies.
[1]: https://www.colm.net/open-source/ragel/
[2]: https://github.com/gritzko/librdx/blob/master/rdx/JDR.lex
Is this the same Ragel that Zed Shaw wrote about in one of his posts back in the day, during the Ruby and Rails heyday? I vaguely remember that article. I think he used it for Mongrel, his web server.
https://github.com/mongrel/mongrel
The worst part of designing a language is the parsing stage.
It's simple enough to do by hand, but there’s a lot of boilerplate and bureaucracy involved that is painfully time-wasting unless you know exactly what syntax you are going for.
But if you adopt a parser generator such as Flex/Bison, you’ll find yourself learning and debugging an obtuse language that has to be forcefully bent to your needs, and I hope your knowledge of parsing theory is up to scratch when you’re faced with shift-reduce conflicts or have to decide whether LR, LALR(1), or whatever is most appropriate to your syntax.
Not even PEG is gonna come to your rescue.
Thanks to those who replied.