
Comment by picomancer

11 years ago

TL;DR: You can't write the first compiler for a language in that language. However, you can write the second compiler in the language and use the first compiler to compile it.

If you already have a compiler written in the language it compiles, you can update it with new features using the following process, called "bootstrapping". GCC, for example, uses this process:

1. Use the old compiler binary to compile the new source version.

2. Use the binary produced in step 1 to compile the new source version.

3. Use the binary produced in step 2 to compile the new source version.

The binaries produced in steps 2 and 3 should be identical. This assumes the old and new compilers assign the same semantics to the compiler source code, and that neither compiler is non-deterministic (e.g. by using wall-clock time to decide when to stop searching for optimizations).
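The three steps, and why only steps 2 and 3 are expected to match, can be sketched with a toy model. This is a hypothetical illustration, not how GCC represents anything: a "binary" records which compiler source it was built from plus a fingerprint standing in for its machine code, and the model assumes a correct, deterministic compiler whose output depends only on the source it implements. The non-determinism mentioned above is exactly what this model leaves out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Toy model of a compiled compiler (hypothetical).

    implements: the compiler source text this binary was built from.
    codegen:    fingerprint of the code generator that emitted this binary,
                standing in for "the exact machine code it contains".
    """
    implements: str
    codegen: str

    def compile(self, source: str) -> "Binary":
        # Assumption: a correct, deterministic compiler emits code determined
        # only by the source it implements, not by how it was itself built.
        fingerprint = hashlib.sha256(self.implements.encode()).hexdigest()[:12]
        return Binary(implements=source, codegen=fingerprint)


def bootstrap(old: Binary, new_source: str) -> Binary:
    stage1 = old.compile(new_source)     # 1. old binary compiles new source
    stage2 = stage1.compile(new_source)  # 2. stage-1 binary compiles new source
    stage3 = stage2.compile(new_source)  # 3. stage-2 binary compiles new source
    # Stage 1 was emitted by the old code generator, so it may differ.
    # Stages 2 and 3 were both emitted by compilers built from new_source.
    if stage2 != stage3:
        raise RuntimeError("stage 2 and stage 3 differ; bootstrap comparison failed")
    return stage3


# Usage with hypothetical source strings.
installed = Binary(implements="compiler v1 source", codegen="shipped")
rebuilt = bootstrap(installed, "compiler v2 source")
```

In this model the stage-1 binary carries the old code generator's fingerprint, while the stage-2 and stage-3 binaries both carry the new one, which is why the comparison is made between the last two stages.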

The bootstrap process can't be used on the first compiler for a brand-new language, because there is no "old compiler" that can be used in stage 1. The only way around this is to write the first compiler in a different language that already exists.

Of course, if a self-hosted compiler is your goal, you can afford to let the very first compiler (the one written in another language) be limited, "dirty," or "hacky." You don't have to implement the entire language in the first compiler; it just has to support enough of the new language to let you write the second compiler.

Anyway, once you have the first compiler, you write the first version of the second compiler in whatever subset of the language the first compiler supports. Once the first compiler builds the first version of the second compiler, the first compiler is no longer needed; you can add all new features to the second compiler only.

New versions of the second compiler (perhaps supporting more language features) can now be produced by bootstrapping. Of course, you can also use the second compiler to compile a totally different compiler implementation written in the new language (this would be the third compiler for the new language).
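The path from a limited first compiler to a self-hosted second compiler can be sketched by treating each compiler's source as two hypothetical feature sets: the features the source is written with, and the features the resulting compiler accepts. The feature names below are made up for illustration, and each build is shown as a single compile rather than the full three-step bootstrap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompilerSource:
    """Hypothetical description of a compiler's source code."""
    uses: frozenset        # features the source itself is written with
    implements: frozenset  # features the resulting compiler will accept


@dataclass(frozen=True)
class CompilerBinary:
    accepts: frozenset  # features this binary can compile

    def build(self, src: CompilerSource) -> "CompilerBinary":
        # A compiler can only build source written in features it accepts.
        missing = src.uses - self.accepts
        if missing:
            raise ValueError(f"unsupported features: {sorted(missing)}")
        return CompilerBinary(accepts=src.implements)


# First compiler: written in an existing language, so it exists without any
# bootstrapping, and it deliberately accepts only a subset of the new language.
SUBSET = frozenset({"functions", "ints", "strings"})
first = CompilerBinary(accepts=SUBSET)

# Version 1 of the second compiler is written using only that subset,
# but implements more of the language.
second_v1_src = CompilerSource(uses=SUBSET, implements=SUBSET | {"closures", "generics"})
second_v1 = first.build(second_v1_src)

# Version 2 uses closures in its own source, so only the second compiler
# (not the first) can build it; from here on the first compiler is unneeded.
second_v2_src = CompilerSource(
    uses=SUBSET | {"closures"},
    implements=SUBSET | {"closures", "generics", "pattern_matching"},
)
second_v2 = second_v1.build(second_v2_src)
```

Calling `first.build(second_v2_src)` raises `ValueError`, while `second_v2.build(second_v2_src)` returns an equal binary, i.e. the compiler can rebuild its own source. A third compiler would just be another `CompilerSource` built by any existing binary that accepts the features it uses.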