
Comment by norir

3 days ago

Yes, this is fine for basic exploration but, in the long run, I think LLVM taketh at least as much as it giveth. The proliferation of LLVM has created the perception that writing machine code is an extremely difficult endeavor that should not be pursued by mere mortals. In truth, you can get going writing x86_64 assembly in a day. With a few weeks of effort, it is possible to emit all of the basic x86_64 instructions. I have heard aarch64 is even easier but I only have experience with x86_64.
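To give a feel for how approachable raw instruction encoding is, here is a minimal sketch of hand-encoding a single x86_64 instruction as bytes. The helper name is invented for illustration; the encoding itself (REX.W prefix, opcode 0xC7, ModRM byte, little-endian 32-bit immediate) follows the standard x86_64 format for `mov r64, imm32`:

```python
# Hand-encode `mov rax, imm32` (immediate sign-extended to 64 bits):
# REX.W prefix (0x48) + opcode 0xC7 + ModRM 0xC0 (/0, register rax)
# + 32-bit little-endian immediate.
import struct

def mov_rax_imm32(imm: int) -> bytes:
    return bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", imm)

code = mov_rax_imm32(42)
assert code == bytes([0x48, 0xC7, 0xC0, 0x2A, 0x00, 0x00, 0x00])
```

A real emitter is mostly a pile of small functions like this, one per instruction form, writing into a growing buffer.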

What you then realize is that it is possible to generate quality machine code much faster than LLVM and using far fewer resources. I believe both that LLVM has been holding back compiler evolution and that it is close to, if not already at, peak popularity. As LLMs improve, the need for tighter feedback loops will necessitate moving off the bloat of LLVM. Moreover, for all of the magic of LLVM's optimization passes, it does very little to prevent the user from writing incorrect code. I believe we will demand more from a compiler backend than LLVM can ever deliver.

The main selling point of LLVM is that you gain access to all of the targets, but this is for me a weak point in its favor. Firstly, one can write a quality self hosting compiler with O(20) instructions. Adding new backends should be trivial. Moreover, the more you are thinking about cross platform portability, the more you are worrying about hypothetical problems as well as the problems of people other than yourself. Get your compiler working well first on your machine and then worry about other machines.

I agree. I've found that, for the languages I'm interested in compiling (strict functional languages), a custom backend is desirable simply because LLVM isn't well suited for various things you might like to do when compiling functional programming languages (particularly related to custom register conventions, split stacks, etc.).

I'm particularly fond of the organisation of the OCaml compiler: it doesn't really follow a classical separation of concerns, but it emits good quality code. E.g. its instruction selection is just pattern matching expressed in the language; various liveness properties of the target instructions are expressed for the virtual IR (since they know which one-to-one instruction mapping they'll use later, as opposed to doing register allocation strictly after instruction selection); garbage collection checks are threaded in after the fact (calls to caml_call_gc); and its register allocator is a simple variant of Chow et al.'s priority graph colouring (expressed rather tersely: ~223 lines, ignoring the related infrastructure for spilling, restoring, etc.).

--

As a huge aside, I believe the hobby compiler space could benefit from someone implementing a syntactic subset of LLVM IR, capable of compiling real programs. You'd get test suites for free and the option to switch to stock LLVM if desired. Projects like Hare are probably a good fit for such an idea.

>Adding new backends should be trivial.

Sounds like famous last words :-P

And I don't really know about "faster" once you start handling all the edge cases that invariably crop up.

Case in point: gcc

That is why the Hare language uses QBE instead: https://c9x.me/compile/

Sure, it can't do all the optimizations LLVM can, but it is radically simpler and easier to use.

  • Hare is a very pleasant language to use, and I like the way the code looks vs something like Zig. I also like that it uses QBE for the reasons they explained.

    That said, I suspect it’ll never be more than a small niche if it doesn’t target Mac and Windows.

If only it were just a matter of emitting machine code to a file and then calling the linker... you also have the problem of debug information, optimizer passes, the amount of testing required to prove the emitted code is valid, etc.