Comment by chriswarbo

6 years ago

"Traditional" compilers (e.g. GCC, GHC, javac, etc.) are essentially single-purpose black boxes: source code goes in, executable comes out.

Usually that source code must be on disk; often it must be arranged in a certain directory structure (e.g. src/module/...); and sometimes it must be in files with particular names (e.g. javac expects a public class `Foo` to live in `Foo.java`). This makes programmatic use of the compiler more complicated, e.g. setting up temporary directories just to appease these rules.

That single purpose is a common use case, but certainly not the only one. Traditional compilers typically perform pre-processing, lexing, parsing, precedence resolution, name resolution, macro expansion, type inference, type checking, optimisation, code generation, linking, stripping, etc., all within the same binary (there are some exceptions, e.g. the C pre-processor can also be invoked separately).

In my experience, this is the opposite of composable and extendable! Each of these steps is very useful in its own right, yet we typically have no way to invoke them independently, e.g. to parse code into a structured form; or to infer the type of an expression; or to resolve a name; or to optimise an AST; etc.
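To make that concrete, here's a toy sketch in Python. The expression language, the tuple-based AST and the names `tokenize`, `parse` and `infer_type` are all made up for illustration and aren't taken from any real compiler. Each stage is just a function, so a tool that only wants tokens, a syntax tree or a type can call that stage and stop there:

```python
import re

# AST nodes are plain tuples: ("int", n), ("bool", b), ("add", l, r), ("eq", l, r)
Ast = tuple

# Optional leading whitespace, then one token.
TOKEN_RE = re.compile(r"\s*(\d+|true|false|==|\+|\(|\))")


def tokenize(src: str) -> list[str]:
    """Lexing stage: split source text into tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {src[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens: list[str]) -> Ast:
    """Parsing stage: recursive descent from tokens to a tuple AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok.isdigit():
            return ("int", int(tok))
        if tok in ("true", "false"):
            return ("bool", tok == "true")
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def add():
        node = atom()
        while peek() == "+":
            take()
            node = ("add", node, atom())
        return node

    def expr():
        node = add()
        if peek() == "==":
            take()
            node = ("eq", node, add())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree


def infer_type(ast: Ast) -> str:
    """Type inference stage: works on an AST, knows nothing about text."""
    kind = ast[0]
    if kind == "int":
        return "Int"
    if kind == "bool":
        return "Bool"
    if kind == "add":
        if infer_type(ast[1]) == infer_type(ast[2]) == "Int":
            return "Int"
        raise TypeError("'+' expects Int operands")
    if kind == "eq":
        if infer_type(ast[1]) == infer_type(ast[2]):
            return "Bool"
        raise TypeError("'==' expects operands of the same type")
    raise ValueError(f"unknown node {kind!r}")
```

E.g. `infer_type(parse(tokenize("(1 + 2) == 3")))` gives `"Bool"` without ever generating any code.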

To make this composable and extendable in the way you suggest, we would need to split these steps into separate processes, piped together with a build tool (e.g. make, or a helper script). In practice this doesn't happen, although some projects have hooks into their code for extensibility: GCC can be run with different front- and back-ends, and its "middle-end" can be extended with new passes and plugins (finally!); GHC has some limited plugin functionality, and a (very flaky!) Haskell API for invoking its different stages; etc.

My point is that the "traditional" world was pretty awful for composability and extendability. From the outside, we had big opaque compiler processes invoked by Make. If we were willing to drop down to the compiler's implementation language, there were some limited facilities for making them do something other than their usual source files -> binary task.

If we look at the post, we see that it's about "dropping down to the compiler's implementation language", rather than standalone processes with stdio and Makefiles. However, the approach it describes is precisely one of pure functions (e.g. `fetchType`) and a flexible build system (Rock), which provides composability and extendability. It even says so explicitly:

> The rules of our compiler, i.e. its "Makefile", then becomes the following function, reusing the functions from above:
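For anyone who hasn't read the post, the rough idea is that the compiler's "rules" say how to compute each kind of query in terms of other queries, and a build system takes care of memoisation. Here's a very rough Python sketch of that shape. It is not Rock's API (Rock is a Haskell library). The query names, the `SOURCES` dict, `rules` and the `QueryEngine` class are my own hypothetical stand-ins, only loosely modelled on the post's description:

```python
from typing import Any, Callable

Query = tuple  # e.g. ("type", "Main.answer")
Fetch = Callable[[Query], Any]

# Hypothetical in-memory "source files"; a real compiler would read from disk.
SOURCES = {"Main": "answer = 42\nflag = true\n"}


def rules(fetch: Fetch, query: Query) -> Any:
    """The compiler's 'Makefile': how to compute each kind of query,
    in terms of other queries obtained via `fetch`."""
    kind, arg = query
    if kind == "module_source":
        return SOURCES[arg]
    if kind == "parsed_module":
        defs = {}
        for line in fetch(("module_source", arg)).splitlines():
            name, _, body = line.partition("=")
            defs[name.strip()] = body.strip()
        return defs
    if kind == "definition":
        module, _, name = arg.rpartition(".")
        return fetch(("parsed_module", module))[name]
    if kind == "type":
        body = fetch(("definition", arg))
        if body.isdigit():
            return "Int"
        return "Bool" if body in ("true", "false") else "Unknown"
    raise KeyError(f"unknown query kind {kind!r}")


class QueryEngine:
    """Memoising driver: each distinct query is computed at most once."""

    def __init__(self, rules: Callable[[Fetch, Query], Any]):
        self.rules = rules
        self.cache: dict[Query, Any] = {}
        self.log: list[Query] = []  # queries that were actually computed

    def fetch(self, query: Query) -> Any:
        if query not in self.cache:
            self.log.append(query)
            self.cache[query] = self.rules(self.fetch, query)
        return self.cache[query]
```

Asking for `("type", "Main.flag")` after `("type", "Main.answer")` reuses the cached parse of `Main` rather than re-parsing it. A real build system would also track dependencies so that cached results can be invalidated when a source file changes; this sketch doesn't.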

Note that the post isn't specifically about LSP; it only mentions "providing editor tooling, e.g. through a language server". It doesn't even talk about long-running processes. As a counter-example, it would be pretty trivial to expose these 'tasks' as standalone commands, piping through stdio, if we really wanted to. So we're not "giving up too much"; we would be gaining composability and extendability!
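To back up the "pretty trivial" claim, here's a sketch of what that could look like, in Python again. `parse_module`, `type_of`, the command names and the `name = value` source format are all hypothetical. Each command is a thin wrapper around a pure function: it reads source on stdin and writes the answer to stdout:

```python
import json
import sys
from typing import TextIO


def parse_module(src: str) -> dict[str, str]:
    """Pure 'task': map each top-level name to its (unparsed) body."""
    defs = {}
    for line in src.splitlines():
        if line.strip():
            name, _, body = line.partition("=")
            defs[name.strip()] = body.strip()
    return defs


def type_of(src: str, name: str) -> str:
    """Pure 'task': a deliberately crude type lookup for one definition."""
    body = parse_module(src)[name]
    if body.isdigit():
        return "Int"
    if body in ("true", "false"):
        return "Bool"
    return "Unknown"


def main(argv: list[str], stdin: TextIO, stdout: TextIO) -> int:
    """Dispatch the task named on the command line; source arrives on stdin."""
    src = stdin.read()
    if argv[:1] == ["parse"]:
        json.dump(parse_module(src), stdout)
    elif argv[:1] == ["type"] and len(argv) == 2:
        stdout.write(type_of(src, argv[1]))
    else:
        print("usage: parse | type NAME", file=sys.stderr)
        return 2
    stdout.write("\n")
    return 0
```

Putting `if __name__ == "__main__": sys.exit(main(sys.argv[1:], sys.stdin, sys.stdout))` at the bottom of a file (say, a hypothetical `tasks.py`) would let a Makefile or shell pipeline run something like `python tasks.py type answer < Main.src`.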

As for "faster recommendations in IDEs", that's a complete straw man. The post gives the example of querying for the type of a qualified function name, and a few others (e.g. resolving names). Sure, those would be useful for IDEs, but they would also be useful for many more systems. Some examples, off the top of my head:

- Code search, e.g. searching by type (like Hayoo, but more general and less flaky); finding usages across a package repo (this relies on name resolution)

- Chasing (resolved) name occurrences, e.g. to find downstream projects impacted by a breaking change, or to speed up delta-debugging by only checking commits which change code used by the test.

- Documentation generators can benefit from looking up types.

- Static analysers benefit from name resolution, type inference/lookup, etc.
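Most of these only need the answers to a couple of queries: a definition's type, and the names it refers to after resolution. Here's a Python sketch with those answers hard-coded as hypothetical data. The definitions, the `TYPES`/`REFERENCES` tables and the helper names are all made up. A real type search would also need something smarter than exact string equality, e.g. matching up to renaming of type variables:

```python
from collections import defaultdict

# Hypothetical answers to two compiler queries over a package repository:
# the type of each top-level definition, and the (already resolved)
# qualified names each definition refers to.
TYPES = {
    "Data.List.sort": "Ord a => [a] -> [a]",
    "Data.List.reverse": "[a] -> [a]",
    "App.Main.main": "IO ()",
    "App.Util.sortedNames": "[String] -> [String]",
}
REFERENCES = {
    "App.Main.main": {"App.Util.sortedNames"},
    "App.Util.sortedNames": {"Data.List.sort"},
    "Data.List.sort": set(),
    "Data.List.reverse": set(),
}


def search_by_type(type_sig: str) -> list[str]:
    """Code search: every definition whose type matches exactly."""
    return sorted(name for name, ty in TYPES.items() if ty == type_sig)


def usages(target: str) -> list[str]:
    """Direct usages of a qualified name, found via name resolution."""
    return sorted(name for name, refs in REFERENCES.items() if target in refs)


def impacted_by(changed: str) -> set[str]:
    """Everything that transitively depends on `changed`."""
    users = defaultdict(set)  # reverse edges: name -> names that use it
    for name, refs in REFERENCES.items():
        for ref in refs:
            users[ref].add(name)
    seen, stack = set(), [changed]
    while stack:
        for user in users[stack.pop()]:
            if user not in seen:
                seen.add(user)
                stack.append(user)
    return seen
```

For the delta-debugging case, a commit that only touches `Data.List.reverse` can be skipped for a test rooted at `App.Main.main`, because `App.Main.main` isn't in `impacted_by("Data.List.reverse")`.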

Personally I've spent years on projects which use compilers for things other than their usual source files -> executable task, and their lack of composability and extendability is painful (my comment history is full of rants about this, mostly regarding GHC!). The approach described in this post would be amazing to see in "real" languages (i.e. those with lots of users and code, where more tooling and automation would provide a lot of benefit). I've often thought about a similar approach to this 'query-based' design, and would love to see things go even further in this direction (e.g. towards a Prolog-style database of code).
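As a taste of the "database of code" idea, here's a tiny Python sketch that stores facts about code as tuples and answers conjunctive queries with `?`-prefixed variables. It's nowhere near Prolog: there are no rules, no recursion, and matching is a naive nested loop. The fact names (`defines`, `has_type`, `refers_to`) and the example definitions are hypothetical:

```python
from typing import Optional

Fact = tuple[str, ...]

# Hypothetical facts a compiler's queries could export.
FACTS: set[Fact] = {
    ("defines", "Data.List", "Data.List.sort"),
    ("defines", "App.Util", "App.Util.sortedNames"),
    ("has_type", "Data.List.sort", "Ord a => [a] -> [a]"),
    ("has_type", "App.Util.sortedNames", "[String] -> [String]"),
    ("refers_to", "App.Util.sortedNames", "Data.List.sort"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Fact, fact: Fact, env: dict[str, str]) -> Optional[dict[str, str]]:
    """Match one pattern against one fact, extending a copy of the bindings."""
    if len(pattern) != len(fact):
        return None
    env = dict(env)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # Bind the variable, or check it agrees with an earlier binding.
            if env.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return env


def query(*patterns: Fact) -> list[dict[str, str]]:
    """Conjunctive query: all variable bindings satisfying every pattern."""
    envs: list[dict[str, str]] = [{}]
    for pattern in patterns:
        envs = [
            e2
            for e in envs
            for fact in FACTS
            if (e2 := match(pattern, fact, e)) is not None
        ]
    return envs
```

E.g. `query(("refers_to", "?f", "Data.List.sort"), ("defines", "?m", "?f"))` asks which modules define something that uses `Data.List.sort`, and returns `[{"?f": "App.Util.sortedNames", "?m": "App.Util"}]`.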