Deegen: A JIT-Capable VM Generator for Dynamic Languages

1 year ago (arxiv.org)

> We implement LuaJIT Remake (LJR), a standard-compliant Lua 5.1 VM, using Deegen. Across 44 benchmarks, LJR's interpreter is on average 179% faster than the official PUC Lua interpreter, and 31% faster than LuaJIT's interpreter. LJR's baseline JIT has negligible startup delay, and its execution performance is on average 360% faster than PUC Lua and only 33% slower (but faster on 13/44 benchmarks) than LuaJIT's optimizing JIT.

presentation by the author

Deegen: A LLVM-based Compiler-Compiler for Dynamic Languages https://www.youtube.com/watch?v=5cAUX9QPj4Y

Slides https://aha.stanford.edu/sites/g/files/sbiybj20066/files/med...

Ongoing work is documented at https://sillycross.github.io/, and there is some discussion at https://lobste.rs/s/ftsowh/building_baseline_jit_for_lua

https://github.com/luajit-remake/luajit-remake

  • My heart sank at the description of being LLVM based. (I couldn't think of a worse choice for creating a JIT compiler.) Thankfully, they don't use LLVM at runtime! LLVM is only for static compilation of the JIT.

If this can generate a V8/SpiderMonkey-class engine for new scripting languages, that would be incredible.

It is very exciting to get a multi-tier VM from nothing more than a bytecode-level specification of the VM.
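For readers unfamiliar with the idea, a toy sketch of what a per-bytecode VM specification looks like in plain C++ may help. This is not Deegen's API, and every name below (`Op`, `Insn`, `Frame`, `run`, `evalExample`) is hypothetical. Each opcode's semantics is written once as an ordinary function, and a generic interpreter loop is assembled from a table of those functions. As described in this thread, Deegen goes further, using LLVM at build time to turn such descriptions into an interpreter and a baseline JIT; the sketch only hand-builds the interpreter half.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy sketch, NOT Deegen's API: every name here is hypothetical.
// Each bytecode's semantics is written once as a plain C++ function;
// the "interpreter" is just a dispatch table built from those functions.

enum Op : std::uint8_t { LOADK, ADD, MUL, RET, NUM_OPS };

// Register-based instruction with three operands, loosely Lua-style.
struct Insn {
    Op op;
    std::uint8_t a, b, c;
};

struct Frame {
    std::vector<double> regs;    // virtual registers
    std::vector<double> consts;  // constant table
    double result = 0.0;
};

// Per-bytecode semantic descriptions. Returning false halts execution.
static bool opLoadK(Frame& f, const Insn& i) { f.regs[i.a] = f.consts[i.b]; return true; }
static bool opAdd(Frame& f, const Insn& i) { f.regs[i.a] = f.regs[i.b] + f.regs[i.c]; return true; }
static bool opMul(Frame& f, const Insn& i) { f.regs[i.a] = f.regs[i.b] * f.regs[i.c]; return true; }
static bool opRet(Frame& f, const Insn& i) { f.result = f.regs[i.a]; return false; }

using Handler = bool (*)(Frame&, const Insn&);

// Indexed by Op, so the order must match the enum above.
static const Handler kHandlers[NUM_OPS] = {opLoadK, opAdd, opMul, opRet};

// Generic dispatch loop: it knows nothing about individual opcodes.
double run(const std::vector<Insn>& code, Frame& f) {
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        assert(code[pc].op < NUM_OPS);
        if (!kHandlers[code[pc].op](f, code[pc])) break;
    }
    return f.result;
}

// Usage: computes (2 + 3) * 4.
double evalExample() {
    Frame f;
    f.regs.assign(4, 0.0);
    f.consts = {2.0, 3.0, 4.0};
    const std::vector<Insn> code = {
        {LOADK, 0, 0, 0}, {LOADK, 1, 1, 0}, {LOADK, 2, 2, 0},  // r0=2, r1=3, r2=4
        {ADD, 3, 0, 1},                                        // r3 = r0 + r1
        {MUL, 3, 3, 2},                                        // r3 = r3 * r2
        {RET, 3, 0, 0},                                        // return r3
    };
    return run(code, f);  // returns 20.0
}
```

The point being illustrated is the separation: `run` never mentions a specific opcode, so a tool that understands the handler functions could, in principle, derive other execution tiers from the same definitions.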

  • Yes! I've been waiting for a practical tool like this, and would love to write a JIT for Squirrel/Quirrel using it.

    But I'm looking through the luajit-remake codebase, and there is still a lot of code. Assuming that the drt and deegen directories are Deegen (though at least drt/tvalue.h is clearly part of the VM, not of Deegen):

      > fd . -e h -e cpp | egrep -v "test|thirdparty|deegen|drt" | xargs wc --total=only --lines
      34734
      > fd . -e h -e cpp | egrep -v "test|thirdparty" | xargs wc --total=only --lines
      97629
    

    In comparison, Lua 5.2.4 is 20.3k lines of C and LuaJIT 1.1.5, which is a (comparable?) method JIT compiler, is 22.8k lines of C and 4.8k lines of Lua (for dynasm and JIT support). LuaJIT 2.1 is 74.9k lines of C, 13.7k Lua.

    • I think a large part of that might be the language they chose. Every C++ code example in the paper feels extremely verbose to me, and I wonder to what degree that verbosity is inherently required for encoding language semantics, and to what degree it's just C++ syntax being noisy.

      This is not a critique of the authors, btw. Considering the breadth and depth of the various kinds of domain-specific knowledge that have to be "synthesized" in a project like this, a mastery of C++ is almost a given. So implementing things in C++ was likely the most natural approach for them. Technically, it might also be the most portable choice, since anyone who has LLVM installed will also have a C++ compiler.

      I do wonder what it would be like if this were built on a language with more appropriate "ergonomics", though. Maybe they can invent a DSL for Deegen and implement it in Deegen, haha.


I wonder if this would work for Python.