Comment by ameliaquining
6 months ago
"No more [...] slow compile times with complex ownership tracking."
Presumably this is referring to Rust, which has a borrow checker and slow compile times. The author is, I assume, under the common misconception that these facts are closely related. They're not; I think the borrow checker runs in linear time though I can't find confirmation of this, and in any event profiling reveals that it only accounts for a small fraction of compile times. Rust compile times are slow because the language has a bunch of other non-borrow-checking-related features that trade off compilation speed for other desiderata (monomorphization, LLVM optimization, procedural macros, crates as a translation unit). Also because the rustc codebase is huge and fairly arcane and not that many people understand it well, and while there's a lot of room for improvement in principle it's mostly not low-hanging fruit, requiring major architectural changes, so it'd require a large investment of resources which no one has put up.
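To make the monomorphization point concrete, here's a minimal hypothetical sketch (function name `describe` is made up): each concrete type a generic function is used with produces another copy of it for rustc and LLVM to compile and optimize, independent of any borrow checking.

    // Minimal hypothetical example: each call site with a new concrete
    // type instantiates a fresh copy of `describe`.
    fn describe<T: std::fmt::Debug>(value: T) {
        println!("{value:?}");
    }

    fn main() {
        describe(42u32);       // instantiates describe::<u32>
        describe("hello");     // instantiates describe::<&str>
        describe(vec![1, 2]);  // instantiates describe::<Vec<i32>>
        // Three types -> three separately compiled and optimized bodies,
        // none of which has anything to do with the borrow checker.
    }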
I know very little about how rustc is implemented, but watching what kinds of things make Rust compile times slower, I tend to agree with you. The borrow checker rarely seems to be the culprit here. Compile times tend to spike on exactly the things you've mentioned: procedural macro use, generics use (monomorphization) and release builds (optimization).
There are other legitimate criticisms you can raise at the Rust borrow checker such as cognitive load and higher cost of refactoring, but the compilation speed argument is just baseless.
Procedural macros are not really _that_ slow themselves; the issue is more that they tend to generate an enormous amount of code that will then have to be compiled, and _that_'s slow.
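As a rough picture (a hand-simplified sketch, not the actual expansion; serde's real derive output is considerably larger, and you can inspect real expansions with `cargo expand`), a single derive on a small struct already expands into an impl the compiler then has to type-check and codegen:

    // Sketch of the *kind* of code #[derive(serde::Serialize)] emits
    // for a small struct (assumes serde as a dependency).
    struct Point {
        x: i32,
        y: i32,
    }

    impl serde::Serialize for Point {
        fn serialize<S: serde::Serializer>(&self, s: S) -> Result<S::Ok, S::Error> {
            use serde::ser::SerializeStruct;
            let mut st = s.serialize_struct("Point", 2)?;
            st.serialize_field("x", &self.x)?;
            st.serialize_field("y", &self.y)?;
            st.end()
        }
    }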
The issue with proc macros most commonly noted by devs is that they slow down incremental compilation, because proc macros are re-run on every recompile.
1 reply →
Also the procedural macro library itself and all of its dependencies have to be compiled. Though this only really affects initial builds, as the library can be cached on subsequent ones.
Proc macros themselves are slow too. If you compile them in debug mode, they run slowly. If you compile them in release mode, they run faster but take longer to compile. This is especially noticeable with behemoth macros like serde that use the complicated syn parser.
Compiling them in release mode does have an advantage if the proc macro is used a lot in your dep tree, since the faster invocations compensate for the increased compile time. Another option is shipping pre-compiled macros like the serde maintainer tried to do at one point, but there was sufficient (justified) backlash to shipping blobs in that case that it will probably never take off.
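For reference, the middle-ground knob here is Cargo's build-override profile settings, which let dev builds compile proc macros (and build scripts) with optimizations while leaving your own code unoptimized; something like this in Cargo.toml:

    # Compile proc macros, build scripts, and their dependencies with
    # optimizations even in dev builds, so macro invocations run faster.
    [profile.dev.build-override]
    opt-level = 3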
Here's a comparison of using serde proc macros for (De)Serialize impls vs pre-generated impls: https://github.com/Arnavion/k8s-openapi/issues/4#issuecommen... In other words the amount of code that is compiled in the end is the same; the time difference is entirely because of proc macro invocation. 5m -> 3m for debug builds, 3.5m -> 3m for release builds. It's from 2018, but the situation is largely unchanged as of today.
Yesn't. They require you to compile a binary (or multiple ones, when nested) before your own binary can be compiled, and depending on a lot of factors that can add quite a bit of overhead, especially for non-incremental, non-release builds. (This could probably be fixed by adding sandboxing for reproducibility, making most of them pure, cacheable functions and allowing distributed caching of both their binaries and their output. Theoretically, that is; not sure if Rust will ever end up there.)
And the majority of procedural macros don't produce that much code, and like you said, their execution isn't the biggest problem.
E.g. the recent article about a db system ending up with 30min compile times and then cutting them down to 4min was a case of auto-generating a single enormous crate (no idea if proc macros were involved; it didn't really matter there anyway).
So yeah, kinda what you said: proc macros can and should be improved, but rarely are they the root cause.
https://learning-rust.github.io/docs/lifetimes/
> Lifetime annotations are checked at compile-time. ... This is the major reason for slower compilation times in Rust.
This misconception is being perpetuated by Rust tutorials.
I'm on my phone so I can't right now, but someone should file a ticket with that project about that error: https://github.com/learning-rust/learning-rust.github.io/iss...
Be aware that it is not part of the rust-lang organization, it's a third party.
https://github.com/learning-rust/learning-rust.github.io/pul...
Generally, why I think Rust compilation is unfixably slow: the decision to rely on compile-time static dispatch and heavy generic specialization means there's a LOT of code to compile, and the resulting binary size is large.
Many people have remarked that this is the wrong approach in today's world, where CPUs are good at dynamic dispatch prediction but cache sizes (especially L1 and the instruction cache) are very limited; for most code (with the exception of very hot, tight loops), fetching code into cache is going to be the bottleneck.
Not to mention that for a systems programming language I'd expect a degree of neatness in the generated machine code (e.g. no crazy name mangling, not having the same generic method appear in 30 places in the assembly, etc.).
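For what it's worth, a minimal sketch of the trade-off being described (hypothetical `Shape` example):

    use std::f64::consts::PI;

    trait Shape {
        fn area(&self) -> f64;
    }

    struct Circle { r: f64 }

    impl Shape for Circle {
        fn area(&self) -> f64 { PI * self.r * self.r }
    }

    // Static dispatch: a separate specialized copy of this function is
    // compiled for every concrete T it is used with, so the same logic
    // can show up many times in the final binary.
    fn total_area<T: Shape>(shapes: &[T]) -> f64 {
        shapes.iter().map(|s| s.area()).sum()
    }

    // Dynamic dispatch: one compiled body for all shape types; each
    // call goes through a vtable instead.
    fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
        shapes.iter().map(|s| s.area()).sum()
    }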
People say this a lot, but it doesn't seem to be borne out in practice very often. https://matklad.github.io/2021/07/10/its-not-always-icache.h...
There are techniques that can mitigate the compile-time cost of monomorphization to a degree: optimizing on a generic IR (MIR) and polymorphization (merging instantiations that produce equivalent bodies) come to mind as immediate examples that have been discussed or partially implemented in rustc.
> is unfixably slow
it's not at all unfixable. I mean, sure, there is a limit to speed improvements, but many of the things you mention aren't really as fundamental as they seem.
On one hand, you don't have to go crazy with generics: `dyn` is a thing, and not being generic is often just fine. Actually, it's not rare to find project code guidelines that avoid unnecessary monomorphization, e.g. use `&mut dyn FnMut()` over `impl FnMut()` and similar (see the sketch below). And sure, there is some issue with people spreading "always use generics, it's faster, dynamic dispatch is evil" FUD, but that's more a people problem than a language problem.
On the other hand, Rust gives very limited guarantees about how exactly a lot of stuff happens under the hood, including the Rust calling convention, struct layout, etc. As long as Rust doesn't change "observed" side effects, it can do whatever it wants. Dynamic/static dispatch is in general not counted as an observed side effect, so the compiler is free to not monomorphize things if it can make that work. While it already somewhat avoids monomorphizing in some cases (e.g. T=usize vs. T=u64 on 64-bit systems), there is a lot of untapped potential. Sure, there are big limits on how far this can go, but combined with not obsessing over generics and other improvements, I think Rust can have very reasonable compile times, especially in the dev -> unit test loop. And many people are already fine with them now, so it's nothing I'm overly worried about, tbh.
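A minimal sketch of that guideline (function names are made up):

    // Generic: every distinct closure type instantiates its own copy of
    // this function, since each closure has a unique anonymous type.
    fn for_each_word(text: &str, mut f: impl FnMut(&str)) {
        for w in text.split_whitespace() { f(w); }
    }

    // Trait object: a single compiled body shared by all callers, at
    // the cost of an indirect call per invocation.
    fn for_each_word_dyn(text: &str, f: &mut dyn FnMut(&str)) {
        for w in text.split_whitespace() { f(w); }
    }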
> neatness of the generated machine code
Why would you care about that in a language where you almost never have to look at its assembly or anything similar? It's also not really something other languages pursue; even in modern C it is more a side effect than an intent.
Though without question, kilobytes-long type signatures are an issue (the mangling isn't, though; IMHO if you don't use a tool to demangle symbols on the fly, that is a you problem).
It's unfixable in the sense that the problem isn't how fast the compiler is; it's that you give it a ton of extra work. You could try to convince library devs to use more `dyn`, but it'd require a culture shift. I don't think the compiler going behind the user's back and second-guessing whether to use static dispatch or inlining is something a low-level language should do. Java, sure.
In fact I define a systems language as something that allows the dev to describe intended machine behavior more conveniently, as opposed to a higher-level language, where the user describes desired behavior and the compiler figures out the rest.
1 reply →
Rust is really hurt by not having at least some kind of interpreter like OCaml and Haskell have, which could dispel the perpetual urban myths from devs without a background in compilers.
Fun fact: there is work in progress on a Cranelift-based backend, which isn't exactly an interpreter but more like an AOT compiler built for WASM.
It does compile things much faster, at the cost of fewer optimizations (which doesn't mean no optimizations, or that it's slow per se; it's designed to run WASM performantly in situations where fast, low-latency AOT compilation is needed, and while WASM programs are normally already pre-optimized, certain low-level instruction optimizations still have to be done, which it does do).
AFAIK the goal is to run it by default for the dev -> unit test loop, as there you very often don't care about highly optimized code execution but about low-latency feedback.
Though I don't know the current state of it.
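Last I looked, it shipped as a nightly rustup component (`rustc-codegen-cranelift-preview`), and if I remember the project's docs right you opt in per profile via `.cargo/config.toml`, roughly like this (unstable; details may have changed):

    # Sketch based on the rustc_codegen_cranelift docs; nightly-only.
    [unstable]
    codegen-backend = true

    [profile.dev]
    codegen-backend = "cranelift"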
All the best for those efforts.
well there's MIRI
However you cannot use it as a general purpose implementation.
miri is godawful slow though
Also, Rust compile times aren't that bad the last time I checked. Maybe they got better and people just don't realize it?
> Also, Rust compile times aren't that bad the last time I checked.
I dunno - I've got a trivial webui that queries an SQLite3 database and outputs a nice table and from `cargo clean`, `cargo build --release` takes 317s on my 8G Celeron and 70s on my 20G Ryzen 7. Will port it to Go to test but I'd expect it to take <15s from clean even on the Celeron.
I don't think build time from `clean` is the right metric. A developer is usually using incremental compilation, so that's where I want whatever speed I can get.
Nobody likes a 5m build time, but that’s a very old slow chip!
3 replies →
Also bloat. Why is ripgrep 2MB gzip-compressed?
If you're talking about the release binary, that has an entire libc (musl) statically linked into it. And all of PCRE2. And all of its Rust dependencies including the entire standard library.
Because generic monomorphization generates a massive amount of machine code.
that can be the reason, but it's a very bad example
it's quite unlikely that it would be _that_ much smaller if it had been written in C or C++ with the _exact_ same goals, features etc. in mind.
like grep and ripgrep seem quite similar on the surface (grep for something, have multiple regex engines, etc.), but if you go into the details they are quite different (not just because rg has file walking and gitignore-resolution logic built in, but also wrt. the goals and features of their regex engines, performance goals, terminal syntax highlighting, etc.)
12 replies →
Is that a lot for an application that does what it does?
maybe the borrow checker takes most of the compile time if you take an average of how often it runs vs. how often the later compile phases are triggered over a codebase's lifespan :') (yes, ok, so I don't do well with lifetimes, hah)
> They're not;
it's complicated, and simple
the simple answer is that Rust compile times are not dominated by the borrow checker at all, so "it's fast" and you can say they're not overly related to there being borrow checking
the other simple answer is that a simple, reasonably well-implemented borrow checker is pretty much always fast
the complicated answer is that Rust's borrow checker isn't simple: there is a _huge lot_ of code that a simple borrow checker wouldn't allow but that is safe and that people want to write, and to support all those edge cases the borrow checker Rust needs basically has to run a constraint solver. (Which a) is a thing that in O notation is quite slow, and b) is a thing CS has researched optimizations and heuristics for for decades, so it is often quite fast in practice.) And as far as I remember, Rust currently does (did? wanted to?) run this in two layers: the simple checker checks most code, and the more powerful one only engages for the cases where the simple checker fails. But, like mentioned, as compilation still isn't dominated by the borrow checker, this doesn't exactly mean it's slow.
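A classic example of the kind of safe code a naive, purely lexical borrow checker would reject but Rust's NLL checker accepts:

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0];            // shared borrow of `v` starts here
        println!("first = {first}");  // ...and its last use is here
        // A simple lexical checker would keep the borrow alive until
        // the end of scope and reject this mutation; NLL sees that
        // `first` is no longer used and allows it.
        v.push(4);
    }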
So the borrow checker isn't the issue, and if you create a C-like language with a Rust-like borrow checker it will compile speedily, at least theoretically; if you then also have a ton of code gen and large compilation units, you might run into similar issues as Rust does ;)
Also, most of the "especially bad cases" (wrt. compile times, AFAIK) that Rust projects have run into recently had the same kind of pattern: an enormous amount of code (often auto-generated, often huge even before monomorphization) being squeezed into very few (often one single) LLVM compilation units, leading to LLVM struggling hard with optimizations and then the linker drowning, too. And here's the thing: that can happen to you in C too, and then your compile times will be terrible, too. Though people tend to very rarely run into it in C.
> not low-hanging fruit, requiring major architectural changes, so it'd require a large investment of resources which no one has put up.
it still happens from time to time (e.g. Polonius), and there are still many "hard" but very useful improvements that don't require any large-scale architectural changes, as well as some bigger issues that wouldn't be fixed by large-scale architectural improvements. So I'm not sure we are anywhere close to needing a large-scale architectural overhaul of rustc; probably not.
E.g. in a somewhat recent article about way-too-long Rust compile times, many HN comments assumed rustc had some major architectural issues wrt. parallelization, but the actual issue was that rustc failed to properly subdivide a massive auto-generated crate when handing code units to LLVM, and that isn't an architectural issue. Or e.g. replacing LLVM with Cranelift (where viable) for the change -> unit test loop is a good example of a change that can largely improve dev experience and decrease compile times where it matters most (technically it does change the architecture of the stack, and it needed many, many small changes to allow a non-LLVM backend, but it's not "a major architectural rewrite" of the rustc compiler code).