Comment by titzer
1 day ago
The section on dynamic compilers is more or less all about trace compilation. Generally, trace compilation is a dead end and has been abandoned repeatedly. The more important concepts here are type feedback and speculation and deoptimization, as well as making fast compilers and tiering.
The course overall looks good, and it's great that so much is available online, so well done, Adrian.
Thanks, Ben. I admit I mostly think tracing is just a mind-expanding concept to learn about, even if history has proven it’s not very practical as an organizing principle. But as you say, I’d love to offer more context on “what actually seems to work” industrially.
Yeah, it is conceptually interesting. I like giving students perspective and in 770 (https://www.cs.cmu.edu/~wasm/cs17-770/fall2025/) I might spend half or less of a lecture on tracing and I don't pull punches on how I think it ends up not really working well in real systems. It's a good opportunity to talk about program behavior and the cost/benefit of speculation.
We spend a lot more time on type feedback, ICs, and deoptimization which are the more universal concepts that can be applied to multiple different compiler designs.
To be fair, the listed Self paper did age quite well!
I work on PyTorch torch.compile and it’s a tracing compiler as well. Perhaps this domain is very narrow though; it is also a very not conventional compiler, you’d probably be deeply offended by some of the stuff we do! ;)
> Generally, trace compilation is a dead end and has been abandoned repeatedly.
JAX is a tracing compiler!
(I know, I know, it sits in an extremely different part of the problem space than TraceMonkey or LuaJIT. Still.)
Interesting. I think numerical computing is a narrow enough domain where programs have very well-behaved control flow, which avoids most of the problems of trace compilation. Loops over branchy code, which are really common in general programs, are very difficult to make work well with tracing.
Numerical programs being very stable in terms of control is what enables GPU parallelization and loop optimizations in the long tradition of Fortran compilers. Optimizations like loop tiling, interchange, strip mining, etc aren't going to be easy to do with trace compilation.
Anyway my comment was more directed toward trace compilation in the context of dynamic languages, and there I think it's pretty well established it only works well for small programs.
ML compilers in particular go beyond even the level of stability you would expect from numerical programs. Due to how the SIMT model of thread/warp divergence works, the hardware heavily punishes unstable branches. E.g. if you have 32 threads taking a branch then recoalescing on a barrier -- if they all go the same direction then they can go down the execution pipe as a single bundle, but if 1 takes it while 31 don't, then that's 2x the ex-pipe usage by default (and if you have e.g. a computed-branch, performance goes out the window). Consequently, the whole stack is built around the expectation of stable control flow, even to the detriment of performance (from a local perspective).
ML frameworks even take advantage of this to compute, ahead-of-time, how much memory will be used at different points in the program graph, and thereafter schedule memcpy's to make space as necessary. Of course this only works for well-behaved program classes, but e.g. most LLM architectures fit into that category. Interestingly MoE models don't, since they require data-dependent control flow, thus the recent push towards accommodating dynamism in frameworks (like JAX, which until ~recently couldn't handle it at all).
My feeling about JavaScript is statistically speaking, random JavaScript code is more likely to be spaghetti nonsense than, say, the equivalent Python code. This feeling isn't based on empirical data, tbh it's probably as much anti-JS bias as it is experience with poorly written JavaScript.
Is your criticism of tracing specific to messy, confusing code, with lots of edge cases in the main loop, or does it also hold true for well written code?
I have no experience with compiler design, didn't even take a compilers course in college.
1 reply →
and PyPy right?
The TraceMonkey paper was on my qual reading list, and my quals happened to be around the time TraceMonkey was ripped out of SpiderMonkey. I was talking with one of the developers at the time (I think it was Jason Orendorff?), who said that tracing just doesn't work out, but there was limited circumstances where he thought it might work out... but I've completely forgotten what those circumstances were.
Trace compilation is NOT a dead end.
LuaJIT works just fine. On big programs. On hundreds of millions of servers and devices.
I find it deeply saddening that even scholars keep repeating this trope. Ignorance is bliss.
Hi Mike, I think your work on luajit is really cool. I worked on JavaScript for some time and among all of the JS engines out there, a lot of ideas were tried. TraceMonkey in particular did explore trace compilation for JavaScript and benefited from a lot of fundamental research into trace compilation from Michael Franz's group and others. I've talked to many of those people. It just didn't work out for JS as there are too many diverse programs out there that aren't well-behaved. It's not enough to get great performance on some programs (and hopefully good performance on average programs) if there are pathological cases that ruin user experience. The internet is the greatest VM fuzzer yet invented and produces some stunningly awful things that present constant challenges to speculative dynamic optimization. It's well accepted in the JS space that method JITs have more stable, understandable performance, though they too can get into trouble.
Best of luck.
I'm not sure one can infer anything from a single failed Mozilla project.
'Real-world' JavaScript is a challenge for any compiler, no matter the underlying technology.
The technical debt in SpiderMonkey (at that time, anyway) was eye-watering. Grafting anything on top of that was hard. They haven't even gotten to the point of implementing the crucial pieces of a trace compiler before the project folded.
Trace compilers make nice textbook exercises. Getting them into production is a different matter: region selection, side-exit handling, trace graph evolution, deep VM integration, code generation adapted to all of this … far from trivial, but doable.
Neither is it trivial to create a production-quality method-at-a-time JIT compiler and VM for a dynamic language.
Ceterum censeo: The fundamental papers on trace compilation are from Fisher (1981) and the teams around Multiflow (1990) and Dynamo (1999). Franz, Gal, et al can be credited for later re-popularizing the idea, but they haven't added anything of note.
Didn't you guys have this discussion like 10 years ago in Lambda the Ultimate comments?
Has LuaJIT been superseded?
Got any other recommended resources on building compilers?
PyTorch?