Writing a C Compiler, in Zig (2025)

20 hours ago (ar-ms.me)

Looking at the repo, the author seemed a little fed up [1] with the nature of lower-level languages and quit.

[1] https://github.com/asibahi/paella/blob/main/writeup/c19.md#u...

  • I’ve just read the two functions there by that footnote, `reaching_copies_meet`. I have so much code review feedback just on code style, before we even get into functionality. And it’s like 20 lines. (The function shouldn’t return an error set, it should take an allocator, the input parameter slices should be const, the function shouldn’t return either the input slice or a newly allocated slice.)

    It’s interesting how Zig clicked for me pretty quickly (although I have been writing it for a couple of years now). But some of the strategies of ownership and data oriented design I picked up writing JavaScript. Sometimes returning a new slice and sometimes returning the same slice is a problem for memory cleanup, but I wouldn’t do it even in JavaScript because it makes it difficult for the caller to know whether they can mutate the slice safely.

    I suspect that there’s a way to write this algorithm without allocating a temporary buffer for each iteration. If I’m right that it’s just intersecting N sets, then I would start by making a copy of the first set, and on each iteration, removing items that don’t appear in the new set. I suspect the author is frustrated that Zig doesn’t have an intersect primitive for arrays, but usually when the Zig standard library doesn’t have something, it’s intentionally pushing you to a different algorithm.
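
A minimal sketch of the strategy described above, written in Rust rather than Zig and assuming the meet really is a plain intersection of N sets. The name `intersect_all` is hypothetical and is not taken from the repo. It copies the first set once, then filters that copy in place for each remaining set, so no temporary buffer is allocated per iteration. It always returns a new vector, never one of its inputs, which keeps ownership unambiguous for the caller.

```rust
/// Intersects all `sets`, treating each slice as a set without duplicates.
/// Always returns a freshly allocated Vec owned by the caller, never one of
/// the input slices.
pub fn intersect_all<T: PartialEq + Clone>(sets: &[&[T]]) -> Vec<T> {
    // Start from a copy of the first set; no input sets means an empty result.
    let mut result: Vec<T> = match sets.first() {
        Some(first) => first.to_vec(),
        None => return Vec::new(),
    };
    // Each later set can only shrink the result. `retain` filters in place,
    // so the loop performs no allocation.
    for set in &sets[1..] {
        result.retain(|item| set.contains(item));
    }
    result
}
```

The linear `contains` scan makes this quadratic in the set size, which is usually fine for small sets; a hash set or sorted slices would be the next step if it shows up in a profile.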

  • Feels like maybe something got lost in translation with their explanation - they say they were fed up with data structures etc. but they returned to Rust? I’m assuming there’s something a bit more nuanced about what they got tired of with Zig.

    • Rust is a world away from Zig as far as being low-level. Rust does not have manual memory management and revolves around RAII which hides a great deal of complexity from you. Moreover it is not unusual for a Rust project to have 300+ dependencies that deal with data structures, synchronization, threading etc. Zig has a rich std lib, but is otherwise very bare and expects you to implement the things you actually want.

    • I think Rust is "higher level" than C or Zig in the sense that there are more abstractions than in C or Zig. It's not JavaScript, but it is possible to program Rust without worrying too much about low level concerns.

    • While you can obviously write low level code in Rust and manage allocations, memory, use pointers etc, you can also write much higher level code leveraging abstractions both in Rust itself and its rich ecosystem. If you're coming from higher level languages it's much friendlier than C/C++ or Zig. I think I would struggle to write C or Zig effectively but I have no issues with Rust and I really enjoy the language.

  • Quite a footnote [0]:

    > I do not know if it is me being bored with the project, or annoyed with having to build and design a data structure, that has soured me on this project. But I have really at this point lost most motivation to continue this chapter. The way Zig is designed, it makes me deal with the data structure and memory management complexity head on, and it is tiresome. It is not "simpler" than, say, Rust: it just leaves the programmer to deal with the complexity, ~~gaslighting the user~~ claiming it is absolutely necessary.

    [0] https://github.com/asibahi/paella/blob/main/writeup/c19.md#u...

I'm not sure why people seem to be under the impression that writing a compiler means that the language the compiler is implemented in should have "low level" features. If you can leverage other tools such as an assembler, a compiler is just a text -> text translation tool and never needs to access machine-level instructions. E.g., Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image. Even when an assembler isn't available, all your implementation language needs to support, in terms of "low level" features, is writing bytes to a file.

Manipulating instruction and file formats and such can be tedious if your language doesn't have the right capabilities, but it's not impossible.
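
To make the "text -> text" point concrete, here is a toy sketch in Rust. The function name `compile_sum` and the tiny source language (integers joined by `+`) are made up for illustration. The output assumes x86-64 Linux and AT&T syntax as accepted by the GNU assembler; other platforms would need different symbol names or directives. Nothing here touches machine code directly: it only builds a string that an external assembler could consume.

```rust
/// Translates source text such as "1 + 2 + 3" into x86-64 assembly text
/// (AT&T syntax, Linux symbol naming assumed) that leaves the sum in %eax.
pub fn compile_sum(src: &str) -> Result<String, String> {
    let mut asm = String::from("    .globl main\nmain:\n");
    for (i, term) in src.split('+').enumerate() {
        // "Parsing" here is just trimming whitespace and reading an integer.
        let value: i32 = term
            .trim()
            .parse()
            .map_err(|e| format!("bad term {:?}: {}", term.trim(), e))?;
        // The first term initialises the accumulator; later terms add to it.
        let op = if i == 0 { "movl" } else { "addl" };
        asm.push_str(&format!("    {} ${}, %eax\n", op, value));
    }
    asm.push_str("    ret\n");
    Ok(asm)
}
```

Calling `compile_sum("1 + 2 + 3")` yields a `main` that moves 1 into `%eax`, adds 2 and 3, and returns. Writing that string to a file is the only "low level" operation involved.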

  • > I'm not sure why people seem to be under the impression that writing a compiler means that the language the compiler is implemented in should have "low level" features.

    Performance.

    You definitely can write a compiler in a high-level language and given the choice I certainly prefer to on my hobby projects. Having a garbage collector makes so many compiler algorithms and data structures easier.

    But I also accept that that choice means there's an upper limit to how fast my compiler will be. If you're writing a compiler that will be used to (at least aspirationally) compile huge programs, then performance really matters. Users hate waiting on the compiler.

    When you want to squeeze every ounce of speed you can get out of the hardware, a low-level language that gives you explicit control over things like memory layout matters a lot.

    • Does “low level” translate to performance? Is Rust a “low level” language?

      Take C#. You can write a compiler in it that is very fast. It gives you explicit control over memory layout of data structures and of course total control over what you write to disk. It is certainly not “low level”.

    • I think once you get the design of the IR right and implement it relatively efficiently, an optimizing compiler is going to be complicated enough that tweaking the heck out of low-level data structures won't help much. (For a baseline compiler, maybe...but).

      E.g. when I ported C1 from C++ to Java for Maxine, straightforward choices of modeling the IR the same and basic optimizations allowed me to make it even faster than C1. C1X was a basic SSA+CFG design with a linear scan allocator. Nothing fancy.

      The Virgil compiler is written in Virgil. It's a very similar SSA+CFG design. It compiles plenty fast without a lot of low-level tricks. Though, truth be told, I went overboard optimizing[1] the x86 backend and it's significantly faster (maybe 2x) than the nicer, prettier x86-64 backend. I introduced a bunch of fancy representation optimizations for Virgil since then, but they don't really close the gap.

      [1] It's sad that even in the 2020s the best way to make something fast is to give up on abstractions and use integers and custom encodings into integers for everything. Trying to fix that though!
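
For readers unfamiliar with the "custom encodings into integers" style mentioned in the footnote, here is a small Rust sketch. The layout (8 bits each for opcode, destination, and two operands) and the name `PackedInstr` are hypothetical and chosen only for illustration; they are not how Virgil or C1 represent instructions.

```rust
/// A hypothetical IR instruction packed into one 32-bit integer:
/// bits 24..32 = opcode, 16..24 = dst, 8..16 = lhs, 0..8 = rhs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedInstr(pub u32);

impl PackedInstr {
    /// Example opcode value; the numbering is arbitrary.
    pub const OP_ADD: u8 = 1;

    /// Packs the four 8-bit fields into a single u32.
    pub fn new(opcode: u8, dst: u8, lhs: u8, rhs: u8) -> Self {
        PackedInstr(
            ((opcode as u32) << 24)
                | ((dst as u32) << 16)
                | ((lhs as u32) << 8)
                | (rhs as u32),
        )
    }

    // Accessors shift the field down; the cast to u8 drops the higher bits.
    pub fn opcode(self) -> u8 { (self.0 >> 24) as u8 }
    pub fn dst(self) -> u8 { (self.0 >> 16) as u8 }
    pub fn lhs(self) -> u8 { (self.0 >> 8) as u8 }
    pub fn rhs(self) -> u8 { self.0 as u8 }
}
```

The trade-off is the one the footnote laments: a vector of these is compact and cache-friendly, but the field meanings live in shifts and masks instead of in the type system.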

    • > But I also accept that that choice means there's an upper limit to how fast my compiler will be.

      Don't buy it.

      A decent OCaml version of a C or Zig compiler would almost certainly not be 10x slower. And it would be significantly easier to parallelize without introducing bugs so it might even be quite a bit faster on big codebases.

      Actually designing your programming language to be processed quickly (can definitively figure things out with local parsing, minimizing the number of files that need to be touched, etc.) is WAY more important than the low-level implementation for overall compilation speed.

      And I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues and debugging.

      I like Zig, and I use it a lot. But it is NOT my general purpose language. I'm definitely going to reach for Python first unless I absolutely know that I'm going to be doing systems programming. Python (or anything garbage collected with solid libraries) simply is way more productive on short time scales for small codebases.

  • This comment started out strong, but then:

    > Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.

    It may be the case that it doesn't conjure up such an image, but Pascal is approximately on the same rung as Zig or D—lower level than Go, higher level than assembly. If folks have a different impression, the problem is just that: their impression.

I thought Zig has a C compiler built in? Or is it just the Zig build system that's able to compile C, but uses an external compiler for that?

Still a proper programmer-flex to build another one.

  • Zig actually bundles LLVM's Clang, which it uses to compile C with the `zig cc` command. But the long term goal seems to not be so tightly coupled to LLVM, so I'm expecting that to move elsewhere. They still do some clever stuff around compiler-rt, allowing it to be better at cross-compilation than raw Clang, but the bulk of it is mostly just Clang.

    There is also another C compiler written in Zig, Aro[1], which seems to be much more complete than TFA. Zig started using that as a library for its TranslateC functionality (for translating C headers into Zig, not whole programs) in 0.16.

    [1]: https://github.com/Vexu/arocc

Cool project. Feels like writing a C compiler in Zig aligns nicely with the old "maintain it with Zig" idea that was part of Zig's early value proposition. Is that still considered a relevant goal today?

Longer term, it also makes me wonder whether something like this could eventually reduce reliance on Clang/LLVM for the C frontend in Zig's toolchain.