
Comment by dboreham

2 days ago

I'd vote for filesystem space utilization to be worked on before performance.

The problems are largely related. Cut down the amount of intermediate compilation artifacts by half and you'll have sped up the compiler substantially. Monomorphization and iterator expansion and such are significant contributors to both issues.
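To illustrate the monomorphization point, here's a minimal sketch (the function and types are made up for illustration): the compiler emits a separate machine-code instance of a generic function for every concrete type it's used with, so each additional type grows both compile time and the artifacts on disk.

```rust
// Minimal illustration of monomorphization: rustc generates a
// separate compiled copy of `largest` for each concrete type it is
// called with, so extra instantiations mean extra artifacts.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &x in &items[1..] {
        if x > max {
            max = x;
        }
    }
    max
}

fn main() {
    // Two call sites with different types => two monomorphized
    // instances (roughly `largest::<i32>` and `largest::<f64>`).
    println!("{}", largest(&[1, 5, 3]));
    println!("{}", largest(&[1.5, 0.2, 9.9]));
}
```

Crates like serde lean heavily on generics, which is part of why their intermediate artifacts pile up the way described above.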

One of the reasons I quit rust is literally because having 4-5 projects checked out that use serde would fill my laptop drive with junk in a few weeks.

Why not both?

If I had to choose though, I would choose compilation speed. Buying an SSD to double my storage is much more cost effective than buying a bulkier processor to halve my compilation times.

  • > Why not both?

    For better and worse, the Rust project is a "show-up-ocracy": the work that gets done is whatever volunteers (or paid employees from a company donating their time to the project) choose to spend time on. In open source projects it's hard to tell people "this is important and you must work on it"; rather, you have to convince people that it is important and hope they have the time, inclination, and skill to work on it.

    FWIW, people were working on the cargo cache GC feature years ago[1], but I am not aware of what the current state of that is. I wouldn't be surprised if it wasn't turned on because there are unresolved questions.

    1: https://blog.rust-lang.org/2023/12/11/cargo-cache-cleaning/

Is this not more a Cargo thing? Cargo is obsessed with correct builds and eventually the file system fills up with old artifacts.

(I know, I have to declare Cargo bankruptcy every few weeks and do a full clean & rebuild)

  • A little of both. The incremental compilation cache is likely the single largest item in the target directory, but it gets cleaned up on each invocation, so it doesn't scale in size with time.

    I believe the next release will have a cache GC, but only for global caches (e.g. `.crate` files). I'd like us to at least clean up the layout of the target directory so it's easier to track stuff before GCing it. Work is underway for this. A cheap GC we could add earlier would be for artifacts specific to older cargo versions.

  • Correct builds != never running garbage cleanup. I would settle for it evicting older variants of a build. (I also dislike the random hashes: it's impossible to tell what specifically is different between two of them, or which one is newer.)

Not that this isn't a problem (it is; target folders currently take up ~100 GB on my machine), but...

I'd still, by far, prefer a tiny incremental compile speed increase over a substantial storage reduction. I can get a bigger SSD, I can't get my time back :'(

This is one of the only reasons I disliked Haskell. GHC and lib files can easily take over 2GB of storage.

What’s stopping cargo from storing libraries in one global directory (via hash or whatever), to be re-used whenever needed?

  • That work is being tracked in https://github.com/rust-lang/cargo/issues/5931

    Someone has taken up the work on this, though there are some foundational steps first.

    1. We need to delineate intermediate and final build artifacts so people have a clearer understanding of what in `target/` has stability guarantees (implemented, awaiting stabilization).

    2. We then need to re-organize the target directory from being organized by file type to being organized by crate instance.

    3. We need to re-do the file locking for `target/` so that when we share things, one cargo process won't lock out your entire system.

    4. We can then start exploring moving intermediate artifacts into a central location.

    There are some caveats to this initial implementation:

    - To avoid cache poisoning, this will only cover items with immutable source and an idempotent build, leaving out your local source and anything that depends on build scripts and proc-macros. There is work underway to reduce the reliance on build scripts and proc-macros. We may also need a "trust me, this is idempotent" flag for some remaining cases.

    - A new instance of a crate will be created in the cache if any dependency changes versions, reducing reuse. This gets worse when foundational crates release frequently, and because Cargo, when adding or updating a specific dependency, prefers to keep all existing versions, creating a very unpredictable dependency tree. Support for remote caches, especially if you can use your project's CI as a cache source, would help a lot with this.
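    To make that second caveat concrete, here's a hedged sketch of how a per-crate cache key might be derived. The function and fields are hypothetical, not Cargo's actual scheme; the point is just that when the key covers the exact versions of all dependencies, bumping any one of them produces a new key and therefore a whole new cached instance of the crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical cache key: hashes the crate's own name and version
// plus the exact versions of everything it depends on. Not Cargo's
// real scheme, just an illustration of why any dependency bump
// forces a brand-new cache entry.
fn cache_key(name: &str, version: &str, deps: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    name.hash(&mut h);
    version.hash(&mut h);
    for (dep, dep_version) in deps {
        dep.hash(&mut h);
        dep_version.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let before = cache_key("my-crate", "0.1.0", &[("serde", "1.0.200")]);
    // Only a dependency's patch version changed, but the key differs,
    // so the cache gains a second full instance of `my-crate`.
    let after = cache_key("my-crate", "0.1.0", &[("serde", "1.0.201")]);
    assert_ne!(before, after);
    println!("keys differ: {}", before != after);
}
```

    This is also why a frequently-releasing foundation crate fans out into many near-identical instances of every crate above it in the dependency tree.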

Performance has been worked on since Rust 1.0 or so; for all this time, there has been lots of work on compiler performance. There's no "before performance" :)

  • There are multiple avenues to improve both first and incremental compiles. They "just" require large architectural changes that may yield marginal improvements, so it's hard to get those projects off the ground. But after the last project all-hands I do expect at least one of these to be pursued, if not more.

    There are cases where cargo recompiles crates "unnecessarily". cargo+rustc could invert the compilation pyramid to start at the root and then only compile reachable items (a similar effect to LTO, but it can still produce a binary if an item with a compile error is never evaluated, improving clean compiles). Better communication with linkers and access to incremental linking would be a boon for incremental compiles, etc.
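    The root-first idea can be sketched as a reachability walk over a toy item graph (all names here are made up; real compilers track far more than this). Only items reachable from the root would ever be compiled, so a broken-but-unreachable item needn't block producing a binary.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Toy dependency graph: item -> items it references. A root-first
// compile would walk this outward from the entry point and only
// process what it can actually reach.
fn reachable(root: &str, graph: &HashMap<&str, Vec<&str>>) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([root.to_string()]);
    while let Some(item) = queue.pop_front() {
        if seen.insert(item.clone()) {
            for &next in graph.get(item.as_str()).into_iter().flatten() {
                queue.push_back(next.to_string());
            }
        }
    }
    seen
}

fn main() {
    let graph: HashMap<&str, Vec<&str>> = HashMap::from([
        ("main", vec!["parse", "render"]),
        ("parse", vec![]),
        ("render", vec![]),
        // Imagine `broken_helper` has a compile error; nothing reaches
        // it, so a root-first compile could still emit a binary.
        ("broken_helper", vec![]),
    ]);
    let items = reachable("main", &graph);
    assert!(items.contains("main") && items.contains("parse"));
    assert!(!items.contains("broken_helper"));
    println!("compiled {} of {} items", items.len(), graph.len());
}
```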