Comment by keyle

2 years ago

Every time I hear about LLVM, it turns into a rant. Clearly there is a problem there that needs to be fixed. Maybe the LLVM project team should address those issues.

The "problem" with the LLVM project is its massive success, coupled with its incredibly difficult problem domain. Turns out it's actually really hard to write modular compiler infrastructure that serves as the optimizer and code generator for N different arbitrary programming languages. The fact that it works in this capacity at all, and still manages to be competitive with GCC in its original use case (being a C/C++ backend), is a monumental and unmatched achievement.

As someone who works on an LLVM-based compiler at $DayJob and also has written a compiler front-end that uses LLVM in my free time, I do have a ton of gripes, but any time I feel particularly frustrated by them, I spend a little bit of time working on my non-LLVM backend. After a few days of angry swearing with little to show for it, I go back to working with LLVM with a much greater appreciation for what it's giving me.

  • You got pro tips for working with LLVM types after parsing LLVM IR? I can't for the life of me figure out where in the class hierarchy I am, and the doxygen is... interesting. Language wrangling is much nicer in a language with proper ADTs like Haskell, but I also feel like there's probably that bit of LLVM documentation that I haven't read.

  • Btw, is there a good learning pathway for LLVM/MLIR for folks w/o a compiler background? I come mostly from HW but am super interested in the topic due to work-related stuff. Since your day job is LLVM-related, maybe you can give some great advice.

  • When clang came about, GCC had already been renamed to the GNU Compiler Collection.

Part of the issue is that LLVM is stuck between two very hard places.

1. Needing to support tons of different platforms with various different documented and undocumented behaviours

2. Needing to support many languages with specifically undefined behaviour

Trying to bridge the undefined nature of the two sides means that things can be fragile. Assumptions that worked for one set of undefined problems may not work for another.

But so much depends on LLVM these days that I can’t think of it as anything but a success. A flawed success but one that is doing its best to bring order to a naturally chaotic problem space.

GCC and MSVC have similar problems because it’s inherent to the problem space. So while it’s frustrating hitting those bugs, everyone in the space knows they aren’t fundamental issues with the project itself.

It's not obvious that "LLVM, but not incredibly frustrating" is a thing which can exist. I don't think it's likely, but it's possible that in a few decades the widespread view will be that the very concept of LLVM was a mistake, and a universal compiler backend is just a trap which makes it easy to bootstrap a language but inevitably causes massive problems down the road.

LLVM does have plenty of incidental problems which are clearly fixable and just need a lot of work, but even if you fixed all of them you'd still have people who use LLVM ranting about it.

  • MLIR is "LLVM done better", in fact by the same person. It fixes many of LLVM's unforced problems, for example its inability to parallelize code generation.

    • MLIR is part of LLVM, no? And going by the website, it sounds like it uses an LLVM (or bring-your-own) backend for platform-specific code generation.