Comment by nethunters
5 years ago
Any similar resources on writing interpreters?
Also is there are a technical difference between an interpreter and a VM?
5 years ago
Any similar resources on writing interpreters?
Also is there are a technical difference between an interpreter and a VM?
I haven't read it, but Bob Nystrom's Crafting Interpreters has been on my list for a while:
https://craftinginterpreters.com/
It's a great book. Bob's not a CS teacher, and it's great: the book is grounded in practice and has been structured in a way that lets you get a tangible result at the end of every chapter.
Don't wait, read it. It's fun and informative, worth every minute! I haven't had as much fun reading a technical book in years!
I second this. It's a great resource, very well written. Opinionated enough to have character and humour, but not so dry as to make your eyes glaze over. I guess I'm saying that for a technical tome it's very, very readable (as was his Game Design Patterns book incidentally)
A virtual machine executes instructions while an interpreter takes source code as input. An interpreter could be built on top of a virtual machine, obviously, but not necessary. For example, SICP/PLAI/OOPLAI[1] explain how to implement an interpreter on top of Scheme where you directly interpret the s-expressions. These may be a worth read if you want to learn about the usual concepts and techniques used in programming language implementations from a more general point of view. Like, for example, how to implement a static type system or an object model based on classes. Interpreters based on a virtual machine are actually compilers on the fly; the source code is not interpreted but translated into instructions beforehand.
[1] http://sarabander.github.io/sicp/
http://cs.brown.edu/~sk/Publications/Books/ProgLangs/2007-04...
https://users.dcc.uchile.cl/~etanter/ooplai/
Check out Peter Norvig's neat BASIC Interpreter https://github.com/norvig/pytudes/blob/master/ipynb/BASIC.ip...
Interpreters usually execute by walking the abstract syntax tree at runtime, bytecode VMs usually execute actual bytecode similar to machine code. The bytecode for such VMs is often way simpler than the instruction set for an actual computer, so it's far more portable between platforms.
Keep in mind that bytecode interpreted languages (Ruby, Python) are typically called interpreted languages. Java is usually called "compiled" because of the explicit step vs. Ruby and Python, but it's essentially the same. And typically you'll find discussions wrt to JVM about interpretation in Java referring to bytecode interpretation vs. compiling being JIT compiling.
Ultimately limiting interpreters to AST interpreters is not quite correct. The AST is just another IR that source code needs to be converted to just like bytecode. And the AST is essentially also executed by a virtual machine. Interpretation of the IR (the AST, or bytecode, etc.) is one part of a VM. Of course in some instances the VMness of a runtime is more pronounced (Java) than in others (Ruby).
The difference between interpretation and compilation is that the latter is meant to run on real hardware vs. the former implies something that executes the IR of a program by dynamically choosing which machine instructions to run.
Of course a compiler is also something that takes in some code and outputs some other, typically more low level representation.
My point being I don't think there is a strict or consistent definition these days for what an interpreter is.
Case in point: I've also heard people say interpreters read the code "line by line" (or rather, not that but more accurately, as far as they need to read before they know what to execute next) and execute it piece by piece. Which might be how some interpreters worked in some languages in the past. An AST interpreter OTOH already implies some whole source file pre-processing. Is an AST interpreter then less of an interpreter than one that streams the source code line by line. Is a program that does that more an interpreter than another which goes a step further and, e.g. executes a simplified AST, or one that executes a straightforward trivial bytecode representation of the simplified AST?
> Interpreters usually execute by walking the abstract syntax tree at runtime, bytecode VMs usually execute actual bytecode similar to machine code.
This isn't the right way to separate these concepts. A VM that executes bytecodes by interpretation is also an interpreter (Python, Ruby, PHP are well-known examples). Many VMs don't interpret, or they interpret mostly during startup, but actually try to compile hot parts of the code and execute that machine code rather than interpreting (Java and JavaScript VMs are well-known examples).
The VM part more properly refers to things like data representation and memory management. Interpreters of an abstract syntax tree, interpreters of bytecodes, and just in time compilers all need to run inside a system that takes care of loading code, allocating memory when requested, and often doing garbage collection. These services are what make a VM. The exact execution strategy is not what determines VM-ness.
Thanks for clearing that up.
Which languages or implementations of languages directly interpret the AST without the intermediary bytecode compilation?
I know Python, Java and JavaScript (V8 and SpiderMonkey) all compile to bytecode first probably to speed up subsequent runs and run some optimisations on the bytecode.
What other benefits are there to compiling to bytecode first vs directly interpreting the AST?
One major benefit of compiling to bytecode first is that bytecode is a more convenient shared source of truth.
For example, SpiderMonkey has two interpreter and two compiler tiers. The output of the parser is bytecode. We can interpret it immediately, and it's a convenient input to the compilers. It also simplifies transitions between tiers: for example, when we bail out of Warp, we resume at a specific bytecode offset in the baseline interpreter.
I'm not sure how you would resume at a particular point in the execution of an AST-based interpreter without agreeing on a linearized traversal of the AST, which is 90% of the way to bytecode.
Perl is a tree based interpreter. It's not exactly an AST anymore but the time it's being executed, but close enough.
If you compile to AST and walk that then your portability is at the source level; you have to send the source over the wire; each target then needs a parser+walker and each target needs to parse+walk. If you compile to bytecode you can send bytecode over the wire and then simply interpret that bytecode.
Portability. Say I wanted to make language X run on all platforms, but I didn't actually care about compiling it on all platforms. I can just write a relatively simple VM for each platform. This is one of the reasons Java was and still kinda is so ubiquitous
3 replies →
Gambit Scheme, CLISP, CMUCL are capable of interpreting ASTs, and I believe (although I'm not 100% sure) that this is the case for Lispworks and Allegro CL as well. Also, Ruby until version 1.8.