
Comment by gritzko

3 days ago

I wonder, at which point is it worth it to make a language? I personally implemented generics, slices and error propagation in C… that takes some work, but it is doable. Obviously, the C stdlib goes to the trash bin, but there is not much value in it anyway: not much code, and very obsolete.

Meanwhile, a compiler is an enormously complicated story. I personally never ever want to write a compiler, because I already had more fun than I ever wanted working with distributed systems. While idiomatic C was not the way forward, my choice was a C dialect and Go for higher-level things.

How can we estimate these things? Or let's have fun, yolo?

> Meanwhile, a compiler is an enormously complicated story.

I don't intend to downplay the effort involved in creating a large project, but it's evident to me that there's a class of "better C" languages for which LLVM is very well suited.

On purely recreational grounds, one can get something small off the ground in an afternoon with LLVM. It's very enjoyable and has a low barrier to entry, really.
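To give a feel for how little is needed to get started, here is a minimal sketch that emits textual LLVM IR for a constant arithmetic expression. It is only an illustration under stated assumptions: the expression format (nested tuples), the `emit_function` helper and the `%tN` temporary names are made up for this example, and it uses plain string building rather than any LLVM binding.

```python
import itertools

# Map source-level operators to LLVM integer instructions.
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def emit_function(name, expr):
    """Return LLVM IR text for an i32 function that evaluates `expr`.

    `expr` is either an int or a tuple (op, lhs, rhs); this shape is an
    assumption of the sketch, not something LLVM prescribes.
    """
    lines = []
    counter = itertools.count()

    def emit(node):
        # Integer literals are used directly as IR constants.
        if isinstance(node, int):
            return str(node)
        op, lhs, rhs = node
        left = emit(lhs)
        right = emit(rhs)
        # Each intermediate result gets a fresh named temporary.
        result = f"%t{next(counter)}"
        lines.append(f"  {result} = {OPCODES[op]} i32 {left}, {right}")
        return result

    value = emit(expr)
    body = "\n".join(lines + [f"  ret i32 {value}"])
    return f"define i32 @{name}() {{\nentry:\n{body}\n}}\n"


ir_text = emit_function("answer", ("*", ("+", 1, 2), 3))
print(ir_text)
```

The printed text is a small function with two instructions and a `ret`. Assuming an LLVM toolchain is installed, it could be saved to a `.ll` file and handed to the stock tools, which is where the optimizer and the backends come in.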

  • Yes, this is fine for basic exploration but, in the long run, I think LLVM taketh at least as much as it giveth. The proliferation of LLVM has created the perception that writing machine code is an extremely difficult endeavor that should not be pursued by mere mortals. In truth, you can get going writing x86_64 assembly in a day. With a few weeks of effort, it is possible to emit all of the basic x86_64 instructions. I have heard aarch64 is even easier but I only have experience with x86_64.

    What you then realize is that it is possible to generate quality machine code much faster than LLVM and using far fewer resources. I believe both that LLVM has been holding back compiler evolution and that it is close to, if not already at, peak popularity. As LLMs improve, the need for tighter feedback loops will necessitate moving off the bloat of LLVM. Moreover, for all of the magic of LLVM's optimization passes, it does very little to prevent the user from writing incorrect code. I believe we will demand more from a compiler backend than LLVM can ever deliver.

    The main selling point of LLVM is that you gain access to all of the targets, but this is for me a weak point in its favor. Firstly, one can write a quality self-hosting compiler with O(20) instructions. Adding new backends should be trivial. Moreover, the more you are thinking about cross-platform portability, the more you are worrying about hypothetical problems as well as the problems of people other than yourself. Get your compiler working well first on your machine and then worry about other machines.
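As a rough illustration of how small a hand-written emitter can start out, here is a sketch that encodes four x86_64 instructions (`mov eax, imm32`, `add eax, imm32`, `imul eax, eax, imm32` and `ret`) into raw bytes. The helper names and the tiny "program" format are invented for this example. It only produces the bytes; loading and executing them (and everything a real backend needs beyond that) is left out.

```python
import struct


def imm32(value: int) -> bytes:
    """Encode a signed 32-bit immediate in little-endian order."""
    return struct.pack("<i", value)


def mov_eax_imm32(value: int) -> bytes:
    # B8 id: mov eax, imm32
    return b"\xb8" + imm32(value)


def add_eax_imm32(value: int) -> bytes:
    # 05 id: add eax, imm32
    return b"\x05" + imm32(value)


def imul_eax_imm32(value: int) -> bytes:
    # 69 /r id with ModRM 0xC0: imul eax, eax, imm32
    return b"\x69\xc0" + imm32(value)


def ret() -> bytes:
    # C3: near return
    return b"\xc3"


def compile_const_function(start: int, steps) -> bytes:
    """Emit code that loads `start` into eax, applies `steps`, and returns.

    `steps` is a list of (op, operand) pairs; this mini format is an
    assumption of the sketch.
    """
    code = bytearray(mov_eax_imm32(start))
    for op, operand in steps:
        if op == "add":
            code += add_eax_imm32(operand)
        elif op == "mul":
            code += imul_eax_imm32(operand)
        else:
            raise ValueError(f"unsupported operation: {op!r}")
    code += ret()
    return bytes(code)


machine_code = compile_const_function(2, [("add", 3), ("mul", 4)])
print(machine_code.hex(" "))
# prints: b8 02 00 00 00 05 03 00 00 00 69 c0 04 00 00 00 c3
```

If those 17 bytes were placed in executable memory and called as a function, eax would hold (2 + 3) * 4 on return. Register allocation, addressing modes, relocations and object file output are where the remaining weeks go.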

    • I agree. I've found that, for the languages I'm interested in compiling (strict functional languages), a custom backend is desirable simply because LLVM isn't well suited for various things you might like to do when compiling functional programming languages (particularly related to custom register conventions, split stacks, etc.).

      I'm particularly fond of the organisation of the OCaml compiler: it doesn't really follow a classical separation of concerns, but emits good quality code. E.g. its instruction selection is just pattern matching expressed in the language, various liveness properties of the target instructions are expressed for the virtual IR (as they know which one-to-one instruction mapping they'll use later, as opposed to doing register allocation strictly after instruction selection), garbage collection checks are threaded in after the fact (calls to caml_call_gc), and its register allocator is a simple variant of Chow et al.'s priority graph colouring (expressed rather tersely; ~223 lines, ignoring the related infrastructure for spilling, restoring, etc.).

      --

      As a huge aside, I believe the hobby compiler space could benefit from someone implementing a syntactic subset of LLVM, capable of compiling real programs. You'd get test suites for free and the option to switch to stock LLVM if desired. Projects like Hare are probably a good fit for such an idea: you could switch out the backend for stock LLVM if you want.

    • >Adding new backends should be trivial.

      Sounds like famous last words :-P

      And I don't really know about faster once you start to handle all the edge cases that invariably crop up.

      Case in point: gcc

    • If only it were just about emitting byte code to a file and then calling the linker... you also have the problem of debug information, optimizer passes, the amount of tests required to prove the output byte code is valid, etc.

  • >On purely recreational grounds, one can get something small off the ground in an afternoon with LLVM. It's very enjoyable and has a low barrier to entry, really.

    Is there something analogous for those wanting to create language interpreters, not compilers? And preferably for interpreters one wants to develop in Python?

    It doesn't have to be literally just an afternoon, it could even be a few weeks, but is there something that will ease the task for PL newbies? The tasks of lexing and parsing, I mean.

    • There are some quite neat lexer and parser generators for Python that can lower the barrier to entry. For example, I've used PLY now and then for very small things.

      On the non-generated side, lexer creation is largely mechanical, even if you write it by hand. For example, if you vaguely understand the idea of expressing a disjunctive regular expression as a state machine (its DFA), you can plug that into skeleton algorithms and get a lexer out (for example, the algorithm shown in Reps' paper "'Maximal-Munch' Tokenization in Linear Time"). For parsing, taking a day or two to really understand Pratt parsing is incredibly valuable. Then, recursive descent is fairly intuitive to learn and implement, and Pratt parsing is a nice way to structure your parser for the more expressive parts of your language's grammar.

      Nowadays, Python has a match (pattern matching) construct - even if its semantics are somewhat questionable (and potentially error-prone). Overall, though, I don't find Python too unenjoyable for compiler-related programming: dataclasses (and match) have really improved the situation.

    • I am a big fan of Ragel[1]. That is a high performance parser generator. In fact, it can generate different types of parsers, very powerful. Unfortunately, it takes a lot of skill to operate. I wrote a parser generator generator to make it all smooth[2], but after 8 years I still can't call it effortless. A colleague of mine once "broke the internet" with a Ragel bug. So, think twice. Still, for weekend activities I highly recommend it, just for the way of thinking it embodies.

      [1]: https://www.colm.net/open-source/ragel/

      [2]: https://github.com/gritzko/librdx/blob/master/rdx/JDR.lex


> I wonder, at which point it is worth it to make a language?

AT ANY POINT.

Nothing exists that could yield more improvement than a new language. It is the ONLY way to make a paradigm (shift) stick. It is the ONLY way to turn "discipline" into "normal work".

Example:

"Everyone knows that it is hard to mutate things":

* Option 1: DISCIPLINE

* Option 2: you have "let" and you have "var" (or equivalent), and that removes the MILLIONS of times when somebody somewhere must think "does this var mutate or not?".

"Manually managing memory is hard":

* Option 1: DISCIPLINE

* Option 2: No need. With any form of automatic memory management, for TRILLIONS of objects across ALL the codebases, ALL the developers and ALL their apps, very close to 100% of the time nobody ever has to worry about it

* Option 3: And now I can be sure about doing this with more safety, across threads and such

---

Making actual progress with a language is hard, because there is a fractal of competing things that are in sore need of improvement, and a big subset of users are anti-progress and would rather suffer decades of C (for example) than accept some gradual progress with something like Pascal (where a "string" exists).

Plus, a language needs to coordinate syntax (important) with the std library (important) with how frameworks will end up (important) with compile-time AND runtime outcomes (important) with tooling (important).

Miss badly on any of these and you blew it.

But there is no other kind of project (apart from an OS, a file system, or a DB) where the potential positive impact extends as far into the future.

At the point you want to interface with people outside of your direct influence. That's the value of a language — a shared understanding.

So long as only you use your custom C dialect, all is fine. Trouble starts when you'd like others to use it too or when you'd like to use libraries written by people who used a different language, e.g. C.

This actually started off with Christoffer (C3 author) contributing to C2 but not being satisfied with the development speed there, wanting to try his own things and to move forward more quickly. Apparently, together with LLVM, it was doable to write a new compiler for what is a successor to C2.