Comment by norir

2 days ago

Compiler performance must be considered up front in language design. It is nearly impossible to fix once the language reaches a certain size unless it has been a priority all along. I recently saw here the observation that one can often get a 2x performance improvement through optimization, but 10x requires redesigning the architecture.

Rust can likely never be rearchitected without causing a disastrous schism in the community, so it seems probable that compilation will always be slow.

Not only the language.

Many complaints about Rust, or C++, are in reality tooling complaints.

As shown in other ecosystems, the availability of interpreters or image-based tooling is a great way to work around slow optimizing compilers.

C++ already had a go at this back in the early '90s with Energize C++ and Visual Age for C++ v4, both built on their respective owners' Common Lisp and Smalltalk technology.

They failed in the market because their hardware requirements were beyond '90s budgets.

Now this is slowly coming back with tooling like the Visual C++ hot reload improvements, debugging of optimised builds, Live++, and Jupyter notebooks.

Rational Software started their business selling Ada Machines: the same development experience as Lisp Machines, but with Ada, inspired by the Xerox PARC experience with Mesa and Mesa/Cedar.

Haskell and OCaml, besides their slow optimizing compilers, also have bytecode interpreters and REPLs.

D has the super fast dmd, with ldc and gdc for optimised builds, which suffer from longer compile times.

So while Rust cannot be architected in a different way, there is certainly plenty of room for interpreters, REPLs, not always compiling from source, and many other tooling improvements within the same language.

  • I had a coworker who was using Rational back then, and it turned out one of its killer features was caching of precompiled headers. Whoever changed them had to pay the piper of compilation, but everyone else got a copy shipped to them over the local network.

    • Yes, you are most likely talking about ClearMake, the build tool used by ClearCase.

      It may have required a dedicated infra team, but it had features that many folks only got to discover with git.

      Better keep those view description configurations safely backed up, though.

It's certainly possible to think of language features that would preclude trivially-achievable high-performance compilation. But none of the features of that kind that Rust actually has (specifically, monomorphized generics) would ever have been considered for omission, regardless of their compile-time cost, because omitting them would have compromised Rust's other goals.

  • There are many more mundane examples of language design choices in Rust that are problematic for compile times. Polymorphization (which has big potential to speed up compilation) has been blocked on pretty obscure problems with TypeId. Procedural macros require double parsing. The ability to define items in function bodies prevents skipping the parsing of bodies (a small example below). These things are not essential; they could pretty easily be tweaked to be less problematic for compile times without compromising anything.
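
    To illustrate the point about items in function bodies, a hypothetical snippet (not taken from any real crate): because items can appear inside bodies, the compiler cannot collect a crate's items without parsing every body.

    ```rust
    // Types, impls, and functions may be declared inside a function body,
    // so discovering all of a crate's items requires parsing every body
    // rather than skipping over them.
    fn outer() -> u32 {
        struct Inner(u32);

        impl Inner {
            fn value(&self) -> u32 {
                self.0
            }
        }

        Inner(21).value() * 2
    }

    fn main() {
        println!("{}", outer());
    }
    ```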

    • This is an oversimplification. Automatic polymorphization is blocked on several concerns, e.g. dyn safety (and redesigning the language to make it possible to paper over the difference between dyn and non-dyn safe traits imposes costs on the static use case), and/or obscure LLVM implementation deficiencies (which was the blocker the last time I proposed a Swift-style ABI to address this).

      Procedural macros don't require double-parsing; many people do use syn to parse the token stream, but:

      1) parsing isn't a performance bottleneck;

      2) providing a parsed AST rather than a token stream freezes the AST, which is something the Rust authors deliberately wanted to avoid, rather than being some kind of accident of design;

      3) at any point in the future the Rust devs could decide to stabilize the AST and provide a parsed representation, so this isn't anything unfixable that would cause any sort of trauma in the community;

      4) proc macro expansions are trivially cacheable if you know you're not doing arbitrary I/O, which is easy to achieve manually today and should absolutely be built into the compiler (if for no other reason than having a sandboxed dev environment), but once again this is easy to tack on in future versions.

      As for allowing item definitions in function bodies, I want to reiterate that parsing is not a bottleneck.
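
      For concreteness, a minimal sketch of the pattern described above: a derive macro that re-parses its input token stream with syn (this assumes a crate with proc-macro = true plus the syn and quote dependencies; the macro name is illustrative).

      ```rust
      use proc_macro::TokenStream;
      use quote::quote;
      use syn::{parse_macro_input, DeriveInput};

      // rustc has already parsed this code once to produce the token stream;
      // syn parses it again into its own AST, precisely because rustc's
      // internal AST is not exposed as a stable interface.
      #[proc_macro_derive(TypeName)]
      pub fn type_name_derive(input: TokenStream) -> TokenStream {
          let ast = parse_macro_input!(input as DeriveInput);
          let name = &ast.ident;

          // The generated tokens are handed back to rustc, which parses
          // them once more as ordinary Rust code.
          let expanded = quote! {
              impl #name {
                  pub fn type_name() -> &'static str {
                      stringify!(#name)
                  }
              }
          };
          expanded.into()
      }
      ```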

      8 replies →

    • Macros themselves are a terrible hack to work around the lack of proper reflection.

      The entire Rust ecosystem would be reshaped in such fascinating ways if we had support for reflection. I'd love to see this happen one day.

      8 replies →

  • > would have ever been considered for omission, regardless of their compile-time cost, because that would have compromised Rust's other goals.

    That basically says compiler speed isn’t a goal at all for Rust. I think that’s not completely true, but yes, speed of generated code definitely ranks very high for Rust.

    In contrast, Wirth definitely had the speed at which the Oberon compiler compiled code as a goal (it is often said that he only added an optimization to the compiler if it sped up compiling the compiler itself enough to offset the slowdown from the added complexity, though I’m not sure he was quite that strict).

    http://www.projectoberon.net/wirth/CompilerConstruction/Comp..., section 16.1:

    “It is hardly surprising that certain measures for code improvement may yield considerable gains with modest effort, whereas others may require large increases in compiler complexity and size while yielding only moderate code improvements, simply because they apply in rare cases only.

    Indeed, there are tremendous differences in the ratio of effort to gain. Before the compiler designer decides to incorporate sophisticated optimization facilities, or before deciding to purchase a highly optimizing, slow and expensive compiler, it is worth while clarifying this ratio, and whether the promised improvements are truly needed.

    Furthermore, we must distinguish between optimizations whose effects could also be obtained by a more appropriate formulation of the source program, and those where this is impossible.

    The first kind of optimization mainly serves the untalented or sloppy programmer, but merely burdens all the other users through the increased size and decreased speed of the compiler.

    As an extreme example, consider the case of a compiler which eliminates a multiplication if one factor has the value 1. The situation is completely different for the computation of the address of an array element, where the index must be multiplied by the size of the elements. Here, the case of a size equal to 1 is frequent, and the multiplication cannot be eliminated by a clever trick in the source program.”

    • > That basically says compiler speed isn’t a goal at all for Rust

      No, it says that language design inherently involves difficult trade-offs, and the Rust developers consciously decided that some trade-offs were worth the cost. And their judgement appears to have been correct, because Rust today is more successful than even the most optimistic proponent would have dared to believe in 2014; that users are asking for something implies that you have succeeded to the point of having users at all, which is a good problem to have and one that nearly no language ever enjoys.

      In the context of Oberon, let's also keep in mind that Rust is a bootstrapped compiler, and in the early days the Rust developers were by far the most extensive users of the language; nobody on Earth was more acutely affected by compiler performance than they were. They still chose to prefer runtime performance (to be competitive with C++) over compiler performance (to be competitive with Go), and IMO they chose correctly.

      And as for the case of Oberon, its obscurity further confirms that prioritizing compiler performance at all costs is not a royal road to popularity.

  • What about crates as the unit of compilation? I am genuinely curious because it's not clear to me what trade-offs there are around that decision.

    • It's a "unit" in the sense of calling `rustc` once, but it's not a minimal unit of work. It's not directly comparable to what C does.

      Rust has incremental compilation within a crate. It also splits optimization work into many parallel codegen units. The compiler front-end is also becoming parallel within crates.

      The advantage is that there can be common shared state (equivalent of parsing C headers) in RAM, used for the entire crate. Otherwise it would need to be collected, written out to disk, and reloaded/reparsed by different compiler invocations much more often.

      3 replies →

    • All compilers have compilation units; there's not actually much that's interesting about Rust here other than using the word "crate" as a friendlier term for "compilation unit".

      What you may be referring to instead is Cargo's decision to reuse the notion of a crate as the unit of package distribution. I don't think this was necessarily a bad idea (it certainly made things simpler, which matters when you're bootstrapping an ecosystem), but it's true that prevailing best practices since then have led to Rust's ecosystem having comparatively larger compilation units. That itself isn't necessarily a bad thing either; larger compilation units do tend to produce faster code.

      I would personally like to see Cargo provide a way to decouple the unit of distribution from the unit of compilation. That would give us free parallelism (which today rustc needs to tease out via parallel codegen units and the forthcoming parallel frontend) and would also assuage some of the perpetual hand-wringing about how many crates are in a dependency tree (which is exactly as misguided a measure as getting upset about how many source files are in your C program). This would be a fully backwards-compatible change.

This was a big reason for Dart canceling its previous macros attempt (as I understand it). Fast compilation is integral to Flutter development - which accounts for a large percentage of Dart usage - so after IIRC more than two years of developing it, they still ended up not going through with that iteration of macros because it would have made hot reload too slow. That degree of level-headedness and consideration is worthy of respect IMO.

  • Dart is a meh language, but its focus on hot reload single-handedly made it worth its existence.

One of the reasons compile times are so awful is that all dependencies must be compiled for each project.

20 different projects use the same dependency? They each need to recompile it.

This is an effect of the language not having a proper ABI for compiling libraries as dynamically loadable modules, which in itself presents many other issues, including making distribution of software a complete nightmare.
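
For context, the one ABI Rust does commit to across compiler versions is the C ABI, so a dynamically loadable Rust module today is typically reduced to C-shaped entry points. A rough sketch (built as a cdylib; the function name is illustrative):

```rust
// Exported with an unmangled name and the C calling convention, because
// Rust's own ABI (layout of generics, trait objects, etc.) is not
// guaranteed to be stable between compiler versions.
#[no_mangle]
pub extern "C" fn plugin_process(value: i32) -> i32 {
    value * 2
}
```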

  • > This is an effect of the language not having a proper ABI for compiling libraries as dynamically loadable modules

    No, this is a design decision of Cargo to default to using project-local cached artifacts rather than caching them at the user or system level. You can configure Cargo to do so if you'd like. The reason it doesn't do this by default is that Cargo gives crates great latitude to configure themselves via compile-time flags, and any difference in flags means you get a different compiled artifact anyway. On top of that, there's the question of what `cargo clean` should do when you have a global cache rather than a local one.
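
    For example, one way to share artifacts (a sketch; the path is illustrative, and the flag/feature caveats above still apply) is to point every project at a common target directory in ~/.cargo/config.toml:

    ```toml
    # ~/.cargo/config.toml
    [build]
    # Reuse one artifact directory across all of this user's projects.
    # Artifacts are still keyed by compiler version, flags, and features,
    # so differently-configured builds don't collide, while identical
    # dependency builds can be reused instead of recompiled per project.
    target-dir = "/home/user/.cache/cargo-target"
    ```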

    • Why can't Cargo have a system like PyPI where the library author uploads a compiled binary (even with their specific flags) for each Rust version/platform combination, and if said binary is missing for a certain combination, fall back to a local compile? Imagine `cargo publish` handling the compile+upload task, and crates.io being changed to also host binaries.

      23 replies →

  • Dependencies must be compiled with the right features enabled. You can't possibly share all 2^n versions of every binary. ABI stability doesn't fix this.
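
    A sketch of why (the feature name is illustrative): feature flags conditionally include code at compile time, so each feature combination is a genuinely different artifact.

    ```rust
    // The same crate compiled with and without the "parallel" feature
    // produces different code, so a prebuilt artifact for one combination
    // cannot be reused by a project that enables another.
    #[cfg(feature = "parallel")]
    pub fn sum(values: &[f64]) -> f64 {
        // hypothetical multi-threaded path (elided; the point is the cfg)
        values.iter().sum()
    }

    #[cfg(not(feature = "parallel"))]
    pub fn sum(values: &[f64]) -> f64 {
        let mut total = 0.0;
        for v in values {
            total += v;
        }
        total
    }
    ```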

  • If you use Bazel to compile Rust, it doesn't suffer from this problem. In fact, you can get distributed caching as well.

At some point, the community is also responsible for the demanding expectation of a "not slow" compiler.

What's "slow"? What's "fast"? It depends. It depends on the program, the programmer, his or her hardware, the day of the week, the hour of the day, the season, what he or she had for lunch, ...

It's a never ending quest.

I, for example, am perfectly happy with the current benchmarks of the Rust compiler. I find a 2x improvement absolutely excellent.

The key to unlocking a 10x improvement in compilation speed will likely be multithreading. I vaguely remember that LLVM struggled with this, and I am not sure where it stands today. On the frontend side, language (not compiler) design will affect how well things can be parallelized; e.g. forward declarations probably help, while mandatory interprocedural analyses probably hurt.

Having said that, we are in bad shape when golang compiling 40kLOC in 2s is a celebrated achievement. Assuming this is single-threaded on a 2GHz machine, we get 2s * 2GHz / 40kLOC = 100k cycles / LOC.

That seems like a lot of compute, and I do not see why this could not be improved substantially.

Shameless plug: the Cwerg language (http://cwerg.org) is very focussed on compilation speeds.

It is ironic how “rewrite it in Rust” is the solution to make any program fast, except the Rust compiler.

  • It's not ironic at all. Rust programs being fast is in large part due to shifting work from runtime to compile time.

Maybe rustc will never be re-architected (although it has already been rewritten once), but as a Rust standard develops, new Rust implementations will come along. And there is a chance that they will prioritize compiler performance when architecting them.

  • The root cause of the problem is not the compiler. It's the language. If you compile a C program that links to a lot of libraries, the compiler compiles just the source code you wrote, and then the linker combines that with the pre-compiled libraries. Linking is a comparatively fast operation, so the total time is roughly what it took to compile just your source. Rust has chosen a design that requires it to compile not just your source, but also a fair chunk of the libraries' source code as well. Those libraries are usually many times the size of your source code.

    Unsaid here is that modern C++ programs suffer from the same problem, because modern C++ libraries are mostly .h files stuffed full of templates. The outcome is the same as with Rust: slow compilation, even with the GNU C++ compiler.

    The outcome is the same because Rust's generics have a lot in common with C++ templates. Both a generic Rust function and a C++ template are a kind of type-safe macro. Like all macros they generate source code, customised by the parameters the caller supplied, and because it's customised, that source code can't be pre-compiled and put into a library (see the sketch below).

    The downside of this is not just slow compile times. It also means very fat binaries. And it means security issues in a library can't be fixed by shipping a new version of a DLL; you have to recompile the original program.
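
    A minimal sketch of that "type-safe macro" behaviour (the function and types are illustrative, and both halves are shown in one file for brevity): the library ships the generic essentially as source-level metadata, and each downstream use stamps out and optimises its own copy.

    ```rust
    // Conceptually the library crate: this cannot be fully compiled ahead
    // of time, because T is unknown until a caller chooses it.
    pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
        let mut best = *items.first()?;
        for &item in &items[1..] {
            if item > best {
                best = item;
            }
        }
        Some(best)
    }

    // Conceptually the downstream crate: every distinct T gets its own
    // monomorphized copy, compiled and optimized as part of this build,
    // which is where the compile time and the binary size go.
    fn main() {
        println!("{:?}", largest(&[3, 1, 4]));       // largest::<i32>
        println!("{:?}", largest(&[2.7, 1.6, 3.1])); // largest::<f64>
    }
    ```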

If the application works poorly for the developers it will eventually work poorly for everyone.

Being surrounded by suck slowly creeps into the quality of your work.

Computer programming is the only skilled labor I know of where people eschew quality tools and think they won’t output slop by doing so.

You're conflating language design and compiler architecture. It's hard to iterate on a compiler to get massive performance improvements, and rearchitecting can help, but you don't necessarily need to change anything in the language itself to do that.

Roslyn (C#) is the best example of that.

It's a massive endeavor and would need significant funding to happen, though.

  • Language design can have a massive impact on compiler architecture. A language with strict define-before-use and DAG modules has the potential to blow every major compiler out of the water in terms of compile times. ASTs, type checking, code generation, optimization passes, IR design, and linking can all be significantly impacted by this language design choice.

  • No, language design decisions absolutely have a massive impact on the performance envelope of compilers. Think about things like tokenization rules (Zig is designed such that every line can be tokenized independently, for example), ambiguous grammars (most vexing parse, the lexer hack, etc.), symbol resolution (e.g. explicit imports as in Python, Java or Rust versus "just dump eeet" imports as in C#, and also things like whether symbols can be defined after being referenced; see the small example below), and that's before we get to the really big one: type solving.
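
    To make the symbol-resolution point concrete, a trivial sketch: Rust lets a symbol be referenced before it is defined, so resolution needs a whole-module view rather than a single strict top-to-bottom pass.

    ```rust
    // `double` is referenced before it is defined; the compiler must
    // collect all item names before resolving bodies, whereas a strict
    // define-before-use language could do both in one pass.
    fn main() {
        println!("{}", double(21));
    }

    fn double(x: u64) -> u64 {
        x * 2
    }
    ```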

    • The lexer hack is a C thing, and I've rarely heard anyone complain about C compiler performance. That seems more like an argument that the grammar doesn't have as much of an impact on compiler performance as other things do.

      2 replies →

    • This kind of comment is funny because it reveals how uninformed people can be while having a strong opinion on a topic.

      Yes, grammar can impact how theoretically fast a compiler can be, and yes, the type system adds more or less work depending on how it's designed, but none of these are what makes the Rust compiler slow. Parsing and lexing are a negligible fraction of compile time, and typing isn't particularly heavy in most cases (with the exception of niche crates that abuse the Turing completeness of the trait system). You're not going to make big gains by changing these.

      The massive gains are to be made later in the pipeline (or earlier, by having a way to avoid re-compiling proc macros and their dependencies before the actual compilation can even start).

      6 replies →

[flagged]

  • The original comment is mostly in line with the article.

    All the easy local optimizations have been done. Even mostly straightforward compiler-wide changes take a team of people multiple years to land.

    Re-architecting the rust compiler to be faster is probably not going to happen.

    • > Re-architecting the rust compiler to be faster is probably not going to happen.

      This is a statement unsupported by the evidence. The Rust compiler has been continually rearchitected since 1.0, and has doubled its effective performance multiple times since then.

      2 replies →