Unfortunately gcc doesn't accept arbitrary linkers via the `-fuse-ld=` flag. The only linkers it accepts are bfd, gold, lld and mold. It is possible to use gcc to invoke wild as the linker, but currently to do that, you need to create a directory containing the wild linker and rename the binary (or a symlink) to "ld", then pass `-B/path/to/directory/containing/wild` to gcc.
As for why Rust uses gcc or clang to invoke the linker rather than invoking the linker directly - it's because the C compiler knows what linker flags are needed on the current platform in order to link against libc and the C runtime. Things like `Scrt1.o`, `crti.o`, `crtbeginS.o`, `crtendS.o` and `crtn.o`.
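The `-B` workaround described above can be scripted. What follows is only a rough sketch in Python: the helper names `make_ld_shim` and `gcc_argv` are made up for illustration, and the location of the wild binary is whatever you pass in. It creates a directory holding a symlink named "ld" and builds the matching gcc command line without running it.

```python
import os


def make_ld_shim(linker_path, shim_dir):
    """Create shim_dir/ld as a symlink to linker_path and return shim_dir.

    gcc looks for a program literally named "ld" in any -B directory,
    so the symlink name matters, not the name of the real binary.
    """
    os.makedirs(shim_dir, exist_ok=True)
    shim = os.path.join(shim_dir, "ld")
    if os.path.lexists(shim):
        os.remove(shim)  # replace a stale shim left by an earlier run
    os.symlink(os.path.abspath(linker_path), shim)
    return shim_dir


def gcc_argv(shim_dir, inputs, output):
    """Build (but do not execute) a gcc command that searches shim_dir first."""
    return ["gcc", "-B" + shim_dir, *inputs, "-o", output]
```

The returned list can be handed to `subprocess.run` once the shim exists; the spec-file route mentioned in the reply below avoids the extra directory altogether.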
> It is possible to use gcc to invoke wild as the linker, but currently to do that, you need to create a directory containing the wild linker and rename the binary (or a symlink) to "ld", then pass `-B/path/to/directory/containing/wild` to gcc.
Instead of renaming and passing -B in, you can also modify the GCC «spec» file's «*linker» section to make it point to a linker of your choice, i.e.
*linker:
/scratch/bin/wild %{wild_options}
Linking options can be amended in the «*link_command» section.
It is possible to either modify the default «spec» file («gcc -dumpspecs») or pass your own along via «-specs=my-specs-file». I have found custom «spec» files to be very useful in the past.
I'm curious: what's the theory behind why this would be faster than mold in the non-incremental case? "Because Rust" is a fine explanation for a bunch of things, but doesn't explain expected performance benefits.
"Because there's low hanging concurrent fruit that Rust can help us get?" would be interesting but that's not explicitly stated or even implied.
I'm not actually sure, mostly because I'm not really familiar with the Mold codebase. One clue is that I've heard that Mold gets about a 10% speedup by using a faster allocator (mimalloc). I've tried using mimalloc with Wild and didn't get any measurable speedup. This suggests to me that Mold is probably making heavier use of the allocator than Wild is. With Wild, I've certainly tried to optimise the number of heap allocations.
But in general, I'd guess just different design decisions. As for how this might be related to Rust - I'm certain that were Wild ported from Rust to C or C++, that it would perform very similarly. However, code patterns that are fine in Rust due to the borrow checker, would be footguns in languages like C or C++, so maintaining that code could be tricky. Certainly when I've coded in C++ in the past, I've found myself coding more defensively, even at a small performance cost, whereas with Rust, I'm able to be a lot bolder because I know the compiler has got my back.
Rust is a perfectly fine language, and there's no reason you should not be able to implement fast incremental linking using Rust, so - I wish you success in doing that.
... however...
> code patterns that are fine in Rust due to the borrow checker, would be footguns in languages like C or C++,
That "dig" is probably not true. Or rather, your very conflation of C and C++ suggests that you are talking about the kind of code which would not be used in modern C++ of the past decade-or-more. While one _can_ write footguns in C++ easily, one can also very easily choose not to do so - especially when writing a new project.
What a coincidence. :) Just an hour ago I compared the performance of wild, mold, and (plain-old) ld on a C project I'm working on. 23 kloc and 172 files. Takes about 23.4 s of user time to compile with gcc+ld, 22.5 s with gcc+mold, and 21.8 s with gcc+wild. Which leads me to believe that link time shouldn't be that much of a problem for well-structured projects.
It sounds like you're building from scratch. In that case, the majority of the time will be spent compiling code, not linking. The case for fast linkers is strongest when doing iterative development. i.e. when making small changes to your code then rebuilding and running the result. With a small change, there's generally very little work for the compiler to do, but linking is still done from scratch, so tends to dominate.
Yep, in my case I have 11 * 450MB executables that take about 8 minutes to compile and link. But for small iterative programming cycles using the standard linker with g++, it takes about 30 seconds to link (if I remember correctly). I tried mold and shaved 25% off that time, which didn't seem worth the change overall; I attempted wild a year ago but ran into issues, and will revisit at some point.
Exactly. But also even in build-from-scratch use-case when there's a multitude of binaries to be built - think 10s or 100s of (unit, integration, performance) test binaries or utilities that come along with the main release binary etc. Faster linkers giving even a modest 10% speedup per binary will quickly accumulate and will obviously scale much better.
True, I didn't think of that. However, the root cause here perhaps is fat binaries? My preferred development flow consists of many small self-contained dynamically linked libraries that executables link to. Then you only have to relink changed libraries and not executables that depend on them.
NB. This is not to suggest wild is bloated. The issue if any is the software being developed with it and the computers of those who might use such software.
Half in jest, but I'd think anybody coding in Rust already has 32GB of RAM...
(Personally, upgrading my laptop to 64GB at the expense of literally everything else was almost a great decision. Almost, because I really should have splurged on RAM and display instead of going all-in on RAM. The only downside is that cleaning up open tabs once a week became a chore, taking up the whole evening.)
ELF(COFF) should now be only an assembler output format on modern large hardware architecture.
On modern large hardware architecture, for executable files/dynamic libraries, ELF(PE[+]) has overkill complexity.
I am personally using an executable file format of my own, which I wrap in an "ELF capsule" on the Linux kernel. With position-independent code, you kind of only need memory-mapped segments (dynamic libraries are in this very format too). I have two very simple partial linkers I wrote in plain and simple C, one for RISC-V assembly, one for x86_64 assembly, which allow me to link some simple ELF object files (from binutils GAS) into such an executable file.
There is no more centralized "ELF loader".
Of course, there are tradeoffs, but they are worth it a billion times over given the acute simplicity of the format.
(I even have a little vm which allows me to interpret simple risc-v binaries on x86_64).
Not yet. The Linux kernel uses linker scripts, which Wild doesn't yet support. I'd like to add support for linker scripts at some point, but it's some way down the priority list.
Compilers take the code the programmer writes, and turns it into things called object files. Object files are close to executable by the target processor, but not completely. There are little places where the code needs to be rewritten to handle access to subroutines, access operating system functionality, and other things.
A linker combines all these object files, does the necessary rewriting, and generates something that the operating system can use.
It's the final step in building an executable.
--
More complicatedly: a linker is a little Turing machine that runs over the object files. Some can do complicated things like rewriting code, or optimizing across function calls. But, fundamentally, they plop all the object files together and follow little scripts (or rewrites) that clean up the places the compiler couldn't properly insert instructions because the compiler doesn't know the final layout of the program.
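To make the description above concrete, here is a minimal sketch in Python of "plop the object files together and patch the holes". Everything in it is invented for illustration: the `ObjectFile` class, the list-of-words stand-in for machine code, and the base address 0x1000. Real object formats such as ELF carry far more (sections, relocation types, symbol visibility).

```python
from dataclasses import dataclass


@dataclass
class ObjectFile:
    name: str
    code: list      # toy "machine words"; None marks a hole the compiler left
    defines: dict   # symbol name -> offset inside this object's code
    relocs: list    # (offset inside code, symbol whose address goes there)


def link(objects, base=0x1000):
    """Lay objects out back to back, then fill every hole with a final address."""
    # Pass 1: decide where each object lives and record symbol addresses.
    symtab, starts, addr = {}, {}, base
    for obj in objects:
        starts[obj.name] = addr
        for sym, off in obj.defines.items():
            if sym in symtab:
                raise ValueError("duplicate symbol: " + sym)
            symtab[sym] = addr + off
        addr += len(obj.code)
    # Pass 2: copy the code and apply relocations now that the layout is known.
    image = []
    for obj in objects:
        words = list(obj.code)
        for off, sym in obj.relocs:
            if sym not in symtab:
                raise ValueError("undefined symbol: " + sym)
            words[off] = symtab[sym]
        image.extend(words)
    return image, symtab


main_o = ObjectFile("main.o", ["call", None, "ret"], {"main": 0}, [(1, "helper")])
util_o = ObjectFile("util.o", ["nop", "ret"], {"helper": 0}, [])
image, symtab = link([main_o, util_o])
# image[1] now holds the address of helper (0x1003), which main.o could not know.
```

The two passes are the point: the compiler emits `main.o` without knowing where `helper` will end up, so the address can only be written once every object has been placed.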
I think the optimal approach for development would be to not produce a traditional linked executable at all, but instead just place the object files in memory, and then produce a loader executable that hooks page faults in those memory areas and on-demand mmaps the relevant object elsewhere, applies relocations to it, and then moves it in place with mremap.
Symbols would be resolved based on an index where only updated object files are reindexed. It could also eagerly relocate in the background, in order depending on previous usage data.
This would basically make a copyless lazy incremental linker.
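The "only updated object files are reindexed" part of this idea is easy to sketch on its own. The Python below is a toy under stated assumptions: `SymbolIndex` and `parse_toy_symbols` are hypothetical names, and the "object file" is just whitespace-separated symbol names, so the sketch shows the caching, not real symbol-table parsing.

```python
import hashlib


def parse_toy_symbols(content):
    """Stand-in parser: treats the object as whitespace-separated symbol names."""
    return content.decode().split()


class SymbolIndex:
    """Maps symbol -> defining object, re-parsing only objects whose bytes changed."""

    def __init__(self, parse=parse_toy_symbols):
        self.parse = parse
        self.fingerprints = {}  # object name -> content hash at last index
        self.by_object = {}     # object name -> set of symbols it defined
        self.symbols = {}       # symbol name -> object name

    def update(self, name, content):
        """Reindex one object if needed; return True if it was re-parsed."""
        digest = hashlib.sha256(content).hexdigest()
        if self.fingerprints.get(name) == digest:
            return False  # unchanged: existing entries stay valid
        # Remove entries this object used to own before adding the new ones.
        for sym in self.by_object.get(name, set()):
            if self.symbols.get(sym) == name:
                del self.symbols[sym]
        defined = set(self.parse(content))
        for sym in defined:
            self.symbols[sym] = name
        self.by_object[name] = defined
        self.fingerprints[name] = digest
        return True
```

Hashing the whole file is the simplest change check; a real tool might compare timestamps first and hash only on mismatch.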
This makes some very naïve assumptions about the relationships between entities in a program; in particular that you can make arbitrary assertions about the representation of already-allocated datastructures across multiple versions of a component, that the program's compositional structure morphs in understandable ways, and that you can pause a program in a state where a component can actually be replaced.
By the time you have addressed these, you'll find yourself building a microkernel system with a collection of independent servers and well-defined interaction protocols. Which isn't necessarily a terrible way to assemble something, but it's not quite where you're trying to go...
You can sort of do that with some of LLVM's JIT systems https://llvm.org/docs/JITLink.html, I'm surprised that no one has yet made an edit-and-continue system using it.
> Symbols would be resolved based on an index where only updated object files are reindexed. It could also eagerly relocate in the background, in order depending on previous usage data.
Isn't this how dynamic linking works? If you really want to reduce build times, you should be making your hot path in the build a shared library, so you don't have to relink so long as you're not changing the interface.
> Mold is already very fast, however it doesn't do incremental linking and the author has stated that they don't intend to. Wild doesn't do incremental linking yet, but that is the end-goal. By writing Wild in Rust, it's hoped that the complexity of incremental linking will be achievable.
Can someone explain what is so special about Rust for this?
I assume that he is referring to "fearless concurrency", the idea that Rust makes it possible to write more complex concurrent programs than other languages because of the safety guarantees:
1. mold doesn't do incremental linking because it is too complex to do it while still being fast (concurrent).
2. Rust makes it possible to write very complex fast (concurrent) programs.
3. A new linker written in Rust can do incremental linking while still being fast (concurrent).
EDIT: I meant this originally, but comments were posted before I added it so I want to be clear that this part is new: (Any of those three could be false; I take no strong position on that. But I believe that this is the motivating logic.)
Actually a lot of the hacks that mold uses to be the fastest linker would be, ironically, harder to reproduce with rust because they’re antithetical to its approach. Eg Mold intentionally eschews used resource collection to speed up execution (it’ll be cleaned up by the os when the process exits) while rust has a strong RAII approach here that would introduce slowdowns.
That’s puzzling to me too. Rust is a great language, and probably makes developing Wild faster. But the complexity of incremental linking doesn’t stem from the linker’s implementation language. It stems from all the tracking, reserved spacing, and other issues required to link a previously linked binary (or at least parts of it) a second time.
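The "reserved spacing" point can be shown with a toy. This is a sketch in Python, not how Wild or any real linker lays out its output: `PaddedImage` and its 50% slack factor are invented for illustration. Each input gets a slot with spare room, so a changed input can be overwritten in place as long as it still fits.

```python
class PaddedImage:
    """Output image in which every input owns a slot with slack after it."""

    def __init__(self, slack=0.5):
        self.slack = slack
        self.slots = {}          # input name -> (offset, capacity, used bytes)
        self.data = bytearray()

    def full_link(self, inputs):
        """Lay out all inputs from scratch, reserving extra room per input."""
        self.slots.clear()
        self.data = bytearray()
        for name, blob in inputs.items():
            capacity = len(blob) + int(len(blob) * self.slack)
            self.slots[name] = (len(self.data), capacity, len(blob))
            self.data += blob + bytes(capacity - len(blob))

    def patch(self, name, blob):
        """Rewrite one input in place; return False if it outgrew its slot."""
        offset, capacity, _ = self.slots[name]
        if len(blob) > capacity:
            return False  # caller has to fall back to full_link
        self.data[offset:offset + capacity] = blob + bytes(capacity - len(blob))
        self.slots[name] = (offset, capacity, len(blob))
        return True
```

Even this toy shows where the bookkeeping comes from: the slot table has to survive between runs, and anything that points into a slot (the relocations a real linker applies) would need re-patching whenever contents move.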
Rust allows you to enforce more invariants at compile time, so implementing a complex system where you are likely to make a mistake and violate those invariants is easier.
I would guess the idea is that in Rust the complexity is cheaper on a "per unit" basis so you can afford more complexity. So yes, it is a more complicated problem than the previous linkers, but, in Rust maybe you can get that done anyway.
1. Rust's well-designed type system and borrow checker simply make it easier to write code that works. It has the "if it compiles it works" property (not unique to Rust; people say this about e.g. Haskell too).
2. Rust's type system - especially its trait system can be used to enforce safety constraints statically. The obvious one is the Send and Sync traits for thread safety, but there are others, e.g. the Fuchsia network code statically guarantees deadlocks are impossible.
Mold is written in C++ which is extremely error prone in comparison.
It's feasible to write complex correct programs with optimal performance in Rust, unlike any other programming language (complex+correct is not feasible in C/C++/assembly/Zig/etc., optimal performance not possible in any other language).
That is baffling. Maybe the author assumes that a language with many safeguards will lead to keeping complexity under control for a difficult task.
By the way, I had to look up what incremental linking is. In practice I think it means that code from libraries and modules that have not changed won't need to be re-packed each time, which will save time for frequent development builds. It's actually ingenious.
Ever since mold relicensed from AGPL to MIT (as part of mold 2.0 release), the worldwide need for making another fast linker has been greatly reduced, so I wasn't expecting a project like this to appear. And definitely wasn't expecting it to already be 2x faster than mold in some cases. Will keep an eye on this project to see how it evolves, best of luck to the author.
Note that Mold has no interest in becoming incremental, so there is a big reason there for another linker to exist. I find it kind of embarrassing that MS' linker has been incremental by default for decades, yet there's no production ready incremental linker on Linux yet.
OTOH even lld, fast but noticeably slower than mold, is already far faster than MS's linker even without the incrementality. Like, I'm routinely linking hundreds of megabytes in less than a second anyway, so I'm not sure incrementality is worth that much.
Additionally, the way precompiled headers are handled in Visual C++ and C++ Builder has always been much better than in traditional UNIX compilers, and now we have modules as well.
It has to be a candidate for the longest biggest gap in build tooling ever.
Why does AGPL Vs MIT matter for a linker?
Hmm, my naive summary of AGPL is "If you run AGPL code in your web backend you are obliged to offer the backend source to everyone using a web client". No wonder it's explicitly forbidden at Google.
What does that mean for a linker? If you ship a binary linked with an AGPL linker you need to offer the source of the linker? Or of the program being linked?
iirc the mold author wanted to make money off of it (and I don't blame him).
AGPL is avoided like the plague by big corps: same big corps are known for having money to pay for licenses and sometimes (yes, I look at you Amazon) being good at deriving value from FLOSS without giving back.
iirc AGPL was used so everyone can just use it, big biz is still compelled to buy a license. this has been done before and can be seen as one of the strategies to make money off FLOSS.
Corps want to be able to release and use tools that take away the freedoms that GPL-family licenses provide. Often this results in duplication of effort.
This is not theoretical; it happens quite frequently. For toolchains, in particular I'm aware of how Apple (not that they're unique in this) has "blah blah open source" downloads, but often they do not actually correspond with the binaries. And not just "not fully reproducible but close" but "entirely new and incompatible features".
The ARM64 saga is a notable example, which went on for at least six months (at least Sept 2013 to March 2014). Xcode 5 shipped with a closed-source compiler only for all that time.
Corps don't want to have to release the source code for their internal forks. They could also potentially be sued for everything they link using it because the linked binaries could be "derivative works" according to a judge who doesn't know anything.
what is the status of Windows support in mold? reading the github issues leads to a circular confusion, the author first planned to support it, then moved Windows support to the sold linker, but then sold got archived recently so in the end there is no Windows support or did I just misunderstand the events?
Maybe I'm holding it wrong, but mold isn't faster at all if you're using LTO, which you probably should be.
Mold will be faster than LLD even using LTO, but all of its benefits will be absolutely swamped by the LTO process, which is, more or less, recompiling the entire program from high-level LLVM-IR. That's extremely expensive and dwarfs any linking advantages.
So the benefit will be barely noticeable. As another comment points out, LTO should only be used when you need a binary optimized to within an inch of its life, such as a release copy, or a copy for performance testing.
I think we're talking about non-release builds here. In those, you don't want to use LTO, you just want to get that binary as fast as possible.
Yeah, if your development process requires LTO you may be holding it wrong....
Specifically, if LTO is so important that you need to be using it during development, you likely have a very exceptional case, or you have some big architectural issues that are causing much larger performance regressions than they should be.
Agreed. Both fast and small are desirable for sandboxed (least authority) isomorphic (client and server) microservices with WebAssembly & related tech.
You shouldn't be using LTO where incremental build times are a concern, i.e. for development builds.
And for release builds link time is hardly a concern.
Wait a minute, it’s possible to relicense something from GPL to MIT?
Yes, if you are the only developer and never received or accepted external contributions, or if you managed to get permission from every single person who contributed, or replaced their code with your own.
Yes. Generally you need permissions from contributors (either asking them directly or requiring a contribution agreement that assigns copyright for contributions to either the author or the org hosting the project), but you can relicense from any license to any other license.
That doesn't extinguish the prior versions under the prior license, but it does allow a project to change its license.
I looked at this before, is it ready for production? I thought not based on the readme, so I'm still using mold.
For those on macOS, Apple released a new linker about a year or two ago (which is why the mold author stopped working on their macOS version), and if you're using it with Rust, put this in your config.toml:
No, the author is pretty clear that it shouldn't be used for production yet
Great, I'll keep a look out but will hold off on using it for now.
I don't even use mold for production. It's for development.
Isn't the new linker just the default these days? I'm not sure adding that has any effect.
Can you confirm that's still the right location for Sequoia?
I have the command line tools installed and I only have /usr/bin/ld and /usr/bin/ld-classic
Then it'd be /usr/bin/ld; I believe my solution dates from before they moved the linker.
/usr/bin/ld will correctly invoke the right linker, it's a stub to look at your developer dir and reexec.
What would be refreshing would be a C/C++ compiler that did away with the intermediate step of linking and built the whole program as a unit. LTO doesn't even have to be a thing if the compiler can see the entire program in the first place. It would still have to save some build products so that incremental builds are possible, but not as object files, the compiler would need metadata to know of the origin and dependencies of all the generated code so it would be able to replace the right things.
External libs are most often linked dynamically these days, so they don't need to be built from source, so eliminating the linker doesn't pose a problem for non-open source dependencies. And if that's not enough letting the compiler also consume object files could provide for legacy use cases or edge cases where you must statically link to a binary.
SQLite3 just concatenation everything together into one compilation unit. So, more people have been using this than probably know about it.
https://sqlite.org/amalgamation.html
I totally see the point of this, but still, you have to admit this is pretty funny:
> Developers sometimes experience trouble debugging the quarter-million line amalgamation source file because some debuggers are only able to handle source code line numbers less than 32,768 [...] To circumvent this limitation, the amalgamation is also available in a split form, consisting of files "sqlite3-1.c", "sqlite3-2.c", and so forth, where each file is less than 32,768 lines in length
*concatenates
Apologies for the typo. And now it is too late to edit the post.
> Secondly, if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.
That's wrong. gcc generates summaries of function properties and propagates those up and down the call tree, which for LTO is then built in a distributed way. It does much more than mere inlining, including advanced analyses like points-to analysis.
https://gcc.gnu.org/onlinedocs/gccint/IPA.html https://gcc.gnu.org/onlinedocs/gccint/IPA-passes.html
It scales to millions of lines of code because it's partitioned.
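As a rough illustration of what "summaries propagated up the call tree" means, here is a toy in Python. It is not GCC's implementation, and the function names and the single "pure" property are assumptions picked for the example. Each function gets a tiny local summary (does its own body have side effects?), and only those summaries plus the call graph are needed to answer the whole-program question.

```python
def propagate_purity(local_pure, calls):
    """Combine per-function summaries into a whole-program answer.

    local_pure: function name -> True if its own body has no side effects.
    calls:      function name -> list of functions it calls.
    A function ends up pure only if it is locally pure and every callee is pure.
    """
    pure = dict(local_pure)  # optimistic start: trust the local summaries
    changed = True
    while changed:           # repeat until nothing flips, so call cycles are fine
        changed = False
        for fn, callees in calls.items():
            if pure.get(fn) and any(not pure.get(c, False) for c in callees):
                pure[fn] = False  # an impure or unknown callee taints the caller
                changed = True
    return pure


summaries = {"main": True, "wrap": True, "log": False, "total": True}
call_graph = {"main": ["wrap", "total"], "wrap": ["log"], "total": [], "log": []}
result = propagate_purity(summaries, call_graph)
# result["total"] stays True; "wrap" and "main" become False because of "log".
```

The function bodies are never revisited here, only their summaries, which is why this style of analysis can be split across partitions.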
> if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.
You can build the Linux kernel with LTO: simply diff the LTO vs non-LTO outputs and it will be obvious you're wrong.
SQLite3 may be a counter-example:
https://sqlite.org/amalgamation.html
There’s been a lot of interest in faster linkers spurred by the adoption and popularity of rust.
Even modest statically linked Rust binaries can take a couple of minutes in the link stage of compilation in release mode (using mold). It’s not a Rust-specific issue but an amalgam of (usually) strictly static linking, advanced link-time optimizations enabled by LLVM like LTO and BOLT, and a general dissatisfaction with compile times in the Rust community. Rust’s (clinically) strong relationship with (read: dependency on) LLVM makes it the most popular language in which LLVM link-time magic has been almost universally adopted; you could face these issues with C++, but there it would be chalked up to your toolchain rather than the language.
I’ve been eyeing wild for some time as I’m excited by the promise of an optimizing incremental linker, but to be frank, see zero incentive to even fiddle with it until it can actually, you know, link incrementally.
C++ can be rather faster to compile than Rust, because some compilers do have incremental compilation, and incremental linking.
Additionally, the acceptance of binary libraries across the C and C++ ecosystem means that more often than not, you only need to care about compiling your own application, and not the world, every time you clone a repo or switch development branch.
compiling crates in parallel is fast on a good machine. OTOH managing C++ dependencies without a standard build & packaging system is a nightmare
I solved this by using Wasm. Your outer application shell calls into Wasm business logic, only the inner logic needs to get recompiled, the outer app shell doesn't even need to restart.
I don’t think I can use wasm with simd or syscalls, which is the bulk of my work.
How is this different than dynamically linking the business logic library?
2008: Gold, a new linker, intended to be faster than GNU ld
2015(?): lld, a drop-in replacement linker, at least 2x as fast as Gold
2021: mold, a new linker, several times faster than lld
2025: wild, a new linker...
Rarely mentioned: all of these occur at the cost of not implementing a very large number of useful features used by real-world programs.
Like ICF? Wait no, everyone supports that except GNU ld.
Can you name a few of these features, for those of us who don't know much about linking beyond the fact that it takes compiled object files and makes an executable (and maybe does LTO)?
I’m not sure if you’re intending to leave a negative or positive remark, or just a brief history, but the fact that people are still managing to squeeze better performance into linkers is very encouraging to me.
Certainly no intention to be negative. Not having run the numbers, I don't know if the older ones got slower over time due to more features, or the new ones are squeezing out new performance gains. I guess it's also partly that the bigger codebases scaled up so much over this period, so that there are gains to be had that weren't interesting before.
Gold is slated for removal from binutils for version 2.44.0, so it's officially dead.
Where is the effort going now? lld?
For Windows, there is also [The RAD Linker](https://github.com/EpicGamesExt/raddebugger?tab=readme-ov-fi...), though it's quite early days.
Related, and a good one, though old:
The book Linkers and Loaders by John Levine.
Last book in the list here:
https://www.johnlevine.com/books.phtml
I had read it some years ago, and found it quite interesting.
It's a standard one in the field.
He has also written some other popular computer books (see link above - pun not intended, but noticed).
That looks promising. In Rust to begin with and with the goal of being fast and support incremental linking.
To use it with Rust, this can probably also work using gcc as the linker driver.
In project's .cargo/config.toml:
Side note, but why does Rust need to plug into gcc or clang for that? Some missing functionality?
Unfortunately gcc doesn't accept arbitrary linkers via the `-fuse-ld=` flag. The only linkers it accepts are bfd, gold, lld and mold. It is possible to use gcc to invoke wild as the linker, but currently to do that, you need to create a directory containing the wild linker and rename the binary (or a symlink) to "ld", then pass `-B/path/to/directory/containing/wild` to gcc.
As for why Rust uses gcc or clang to invoke the linker rather than invoking the linker directly - it's because the C compiler knows what linker flags are needed on the current platform in order to link against libc and the C runtime. Things like `Scrt1.o`, `crti.o`, `crtbeginS.o`, `crtendS.o` and `crtn.o`.
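For anyone who wants to script the `-B` workaround described above, here is a minimal sketch. The wild install path, the directory name and the object/output names are all made up for illustration; the only thing taken from the comment is the idea of a directory holding an `ld` symlink plus gcc's `-B` flag. It only builds the command line and doesn't run gcc.

```python
import os
import tempfile


def make_ld_shim(wild_path: str, shim_dir: str) -> str:
    """Create shim_dir/ld as a symlink to the wild binary and return shim_dir."""
    os.makedirs(shim_dir, exist_ok=True)
    link = os.path.join(shim_dir, "ld")
    if os.path.lexists(link):
        os.remove(link)  # replace a stale symlink from an earlier run
    os.symlink(wild_path, link)
    return shim_dir


def gcc_link_command(shim_dir: str, objects: list[str], output: str) -> list[str]:
    """Build (but do not run) a gcc command line that looks in shim_dir for ld."""
    return ["gcc", f"-B{shim_dir}", *objects, "-o", output]


# Hypothetical paths, for illustration only.
shim = make_ld_shim("/usr/local/bin/wild", os.path.join(tempfile.mkdtemp(), "wild-as-ld"))
cmd = gcc_link_command(shim, ["main.o"], "app")
print(" ".join(cmd))
```

You would pass `cmd` to `subprocess.run` once the symlink points at a real wild binary.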
> It is possible to use gcc to invoke wild as the linker, but currently to do that, you need to create a directory containing the wild linker and rename the binary (or a symlink) to "ld", then pass `-B/path/to/directory/containing/wild` to gcc.
Instead of renaming and passing -B in, you can also modify the GCC «spec» file's «%linker» section to make it point to a linker of your choice, i.e.
Linking options can be amended in the «%link_command» section.
It is possible to either modify the default «spec» file («gcc -dumpspecs») or pass your own along via «-specs=my-specs-file». I have found custom «spec» files to be very useful in the past.
The «spec» file format is documented at https://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html
Ah, good to know, thanks!
Maybe it's worth filing a feature request for gcc to have parity with clang for arbitrary linkers?
Because the Rust compiler generates IR bytecode, not machine code.
That's the reason to use llvm as part of Rust compiler toolchain, not to use gcc or clang as linker manager?
I'm curious: what's the theory behind why this would be faster than mold in the non-incremental case? "Because Rust" is a fine explanation for a bunch of things, but doesn't explain expected performance benefits.
"Because there's low hanging concurrent fruit that Rust can help us get?" would be interesting but that's not explicitly stated or even implied.
I'm not actually sure, mostly because I'm not really familiar with the Mold codebase. One clue is that I've heard that Mold gets about a 10% speedup by using a faster allocator (mimalloc). I've tried using mimalloc with Wild and didn't get any measurable speedup. This suggests to me that Mold is probably making heavier use of the allocator than Wild is. With Wild, I've certainly tried to optimise the number of heap allocations.
But in general, I'd guess just different design decisions. As for how this might be related to Rust - I'm certain that were Wild ported from Rust to C or C++, that it would perform very similarly. However, code patterns that are fine in Rust due to the borrow checker, would be footguns in languages like C or C++, so maintaining that code could be tricky. Certainly when I've coded in C++ in the past, I've found myself coding more defensively, even at a small performance cost, whereas with Rust, I'm able to be a lot bolder because I know the compiler has got my back.
> Mold gets about a 10% speedup by using a faster allocator (mimalloc). I've tried using mimalloc with Wild and didn't get any measurable speedup
Perhaps it is worth repeating the experiment with heavy MLoC codebases, using jemalloc or mimalloc.
Rust is a perfectly fine language, and there's no reason you should not be able to implement fast incremental linking using Rust, so - I wish you success in doing that.
... however...
> code patterns that are fine in Rust due to the borrow checker, would be footguns in languages like C or C++,
That "dig" is probably not true. Or rather, your very conflation of C and C++ suggests that you are talking about the kind of code which would not be used in modern C++ of the past decade-or-more. While one _can_ write footguns in C++ easily, one can also very easily choose not to do so - especially when writing a new project.
What a coincidence. :) Just an hour ago I compared the performance of wild, mold, and (plain-old) ld on a C project I'm working on. 23 kloc and 172 files. Takes about 23.4 s of user time to compile with gcc+ld, 22.5 s with gcc+mold, and 21.8 s with gcc+wild. Which leads me to believe that link time shouldn't be that much of a problem for well-structured projects.
It sounds like you're building from scratch. In that case, the majority of the time will be spent compiling code, not linking. The case for fast linkers is strongest when doing iterative development. i.e. when making small changes to your code then rebuilding and running the result. With a small change, there's generally very little work for the compiler to do, but linking is still done from scratch, so tends to dominate.
Yep, in my case I have 11 * 450MB executables that take about 8 minutes to compile and link. But for small iterative programming cycles using the standard linker with g++, it takes about 30 seconds to link (if I remember correctly). I tried mold and shaved 25% off that time, which didn't seem worth the change overall; I attempted wild a year ago but ran into issues, and will revisit at some point.
Exactly. But also even in build-from-scratch use-case when there's a multitude of binaries to be built - think 10s or 100s of (unit, integration, performance) test binaries or utilities that come along with the main release binary etc. Faster linkers giving even a modest 10% speedup per binary will quickly accumulate and will obviously scale much better.
True, I didn't think of that. However, the root cause here perhaps is fat binaries? My preferred development flow consists of many small self-contained dynamically linked libraries that executables link to. Then you only have to relink changed libraries and not executables that depend on them.
The linker time is important when building something like Chrome, not small projects.
Fast linkers are mostly useful in incremental compilation scenarios to cut down on the edit cycle.
How about ld.lld?
"These benchmarks were run on David Lattimore's laptop (2020 model System76 Lemur pro), which has 4 cores (8 threads) and 42 GB of RAM."
https://news.ycombinator.com/item?id=33330499
NB. This is not to suggest wild is bloated. The issue if any is the software being developed with it and the computers of those who might use such software.
https://news.ycombinator.com/item?id=42896619
"... I have 16 GB of ram, I can't upgrade it..."
Half in jest, but I'd think anybody coding in Rust already has 32GB of RAM...
(Personally, upgrading my laptop to 64GB at the expense of literally everything else was almost a great decision. Almost, because I really should have splurged on RAM and display instead of going all-in on RAM. The only downside is that cleaning up open tabs once a week became a chore, taking up the whole evening.)
The real issue is actually runtime ELF (and PE) which are obsolete on modern hardware architecture.
What do you mean by this?
ELF(COFF) should now be only an assembler output format on modern large hardware architecture.
On modern large hardware architecture, for executable files/dynamic libraries, ELF(PE[+]) has overkill complexity.
I am personally using an executable file format of my own, which I wrap into an "ELF capsule" on the Linux kernel. With position independent code, you kind of only need memory mapped segments (which dynamic libraries are in this very format). I have two very simple partial linkers I wrote in plain and simple C, one for risc-v assembly, one for x86_64 assembly, which allow me to link some simple ELF object files (from binutils GAS) into such an executable file.
There is no more centralized "ELF loader".
Of course, there are tradeoffs, 1 billion times worth it given the acute simplicity of the format.
(I even have a little vm which allows me to interpret simple risc-v binaries on x86_64).
Can it link the Linux kernel yet? Was a useful milestone for LLD.
Not yet. The Linux kernel uses linker scripts, which Wild doesn't yet support. I'd like to add support for linker scripts at some point, but it's some way down the priority list.
Does it at least support -Ttext, -Tdata, etc.?
Is it too late to ask what a linker is?
I'll ELI5:
Compilers take the code the programmer writes and turn it into things called object files. Object files are close to executable by the target processor, but not completely. There are little places where the code needs to be rewritten to handle access to subroutines, access to operating system functionality, and other things.
A linker combines all these object files, does the necessary rewriting, and generates something that the operating system can use.
It's the final step in building an executable.
--
More complicatedly: a linker is a little Turing machine that runs over the object files. Some can do complicated things like rewriting code, or optimizing across function calls. But, fundamentally, they plop all the object files together and follow little scripts (or rewrites) that clean up the places the compiler couldn't properly insert instructions because the compiler doesn't know the final layout of the program.
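To make the "plop the object files together and patch the holes" idea concrete, here is a toy sketch. Nothing in it corresponds to a real object format: `ObjectFile`, the list-of-ints "code" and the `(offset, symbol)` relocation tuples are all invented for illustration. It just shows the two passes: lay everything out and learn where symbols ended up, then rewrite the places the compiler left blank.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    name: str
    code: list[int]                 # one "word" per slot; 0 marks a hole to patch
    defines: dict[str, int]         # symbol -> offset inside this object's code
    relocs: list[tuple[int, str]] = field(default_factory=list)  # (offset, symbol)


def link(objects: list[ObjectFile]) -> tuple[list[int], dict[str, int]]:
    image: list[int] = []
    symbols: dict[str, int] = {}
    bases: dict[str, int] = {}

    # Pass 1: lay objects out back to back and record final symbol addresses.
    for obj in objects:
        bases[obj.name] = len(image)
        for sym, off in obj.defines.items():
            if sym in symbols:
                raise ValueError(f"duplicate symbol: {sym}")
            symbols[sym] = bases[obj.name] + off
        image.extend(obj.code)

    # Pass 2: patch every hole with the address of the symbol it refers to.
    for obj in objects:
        for off, sym in obj.relocs:
            if sym not in symbols:
                raise ValueError(f"undefined symbol: {sym}")
            image[bases[obj.name] + off] = symbols[sym]

    return image, symbols


# main.o calls "helper" but doesn't know where it will live, so it leaves a hole.
main_o = ObjectFile("main.o", code=[10, 0, 11], defines={"main": 0}, relocs=[(1, "helper")])
util_o = ObjectFile("util.o", code=[20, 21], defines={"helper": 0})
image, symbols = link([main_o, util_o])
print(image, symbols)
```

Real linkers deal with sections, many relocation types, archives, dynamic symbols and so on, but the undefined/duplicate symbol errors you see in practice come from roughly these two passes.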
I just knew it's going to be Rust as soon as I've read the title.
I think the optimal approach for development would be to not produce a traditional linked executable at all, but instead just place the object files in memory, and then produce a loader executable that hooks page faults in those memory areas and on-demand mmaps the relevant object elsewhere, applies relocations to it, and then moves it in place with mremap.
Symbols would be resolved based on an index where only updated object files are reindexed. It could also eagerly relocate in the background, in order depending on previous usage data.
This would basically make a copyless lazy incremental linker.
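A rough sketch of the "only updated object files are reindexed" part, in case it helps. This is a toy under stated assumptions, not how any existing linker does it: `SymbolIndex` is a made-up name, change detection is a plain content hash, and the set of defined symbols is passed in directly rather than parsed out of the object file.

```python
import hashlib


class SymbolIndex:
    """Toy index: symbol name -> defining object, rebuilt only for changed inputs."""

    def __init__(self) -> None:
        self.digests: dict[str, str] = {}          # object name -> content hash
        self.by_object: dict[str, set[str]] = {}   # object name -> symbols it defines
        self.owner: dict[str, str] = {}            # symbol -> defining object

    def update(self, objects: dict[str, tuple[bytes, set[str]]]) -> list[str]:
        """Reindex changed objects; return the names that were reindexed."""
        reindexed = []
        for name, (content, defined) in objects.items():
            digest = hashlib.sha256(content).hexdigest()
            if self.digests.get(name) == digest:
                continue  # unchanged: keep the existing entries
            # Drop stale entries, but only the ones this object still owns.
            for sym in self.by_object.get(name, set()):
                if self.owner.get(sym) == name:
                    del self.owner[sym]
            for sym in defined:
                self.owner[sym] = name
            self.by_object[name] = set(defined)
            self.digests[name] = digest
            reindexed.append(name)
        return reindexed


index = SymbolIndex()
first = index.update({"a.o": (b"v1", {"foo"}), "b.o": (b"v1", {"bar"})})
second = index.update({"a.o": (b"v2", {"foo", "baz"}), "b.o": (b"v1", {"bar"})})
print(first, second)
```

The sketch ignores deleted object files and duplicate definitions, and it says nothing about the hard part raised in the replies (what to do with code and data that is already mapped).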
This makes some very naïve assumptions about the relationships between entities in a program; in particular that you can make arbitrary assertions about the representation of already-allocated datastructures across multiple versions of a component, that the program's compositional structure morphs in understandable ways, and that you can pause a program in a state where a component can actually be replaced.
By the time you have addressed these, you'll find yourself building a microkernel system with a collection of independent servers and well-defined interaction protocols. Which isn't necessarily a terrible way to assemble something, but it's not quite where you're trying to go...
You can sort of do that with some of LLVM's JIT systems https://llvm.org/docs/JITLink.html, I'm surprised that no one has yet made an edit-and-continue system using it.
My parens sense is tingling. This sounds like a lisp-machine, or just standard lisp development environment.
They have! It's called Julia and it's great.
Sounds like dynamic linking, sort of.
> Symbols would be resolved based on an index where only updated object files are reindexed. It could also eagerly relocate in the background, in order depending on previous usage data.
Not exactly this, but Google's Propeller fixes up ("relinks") Basic Blocks (hot code as traced from PGO) in native code at runtime (like an optimizing JIT compiler would): https://research.google/pubs/propeller-a-profile-guided-reli...
Sounds like Apple's old ZeroLink from the aughts?
Isn't this how dynamic linking works? If you really want to reduce build times, you should be making your hot path in the build a shared library, so you don't have to relink so long as you're not changing the interface.
But do rust’s invariants work across dynamic links?
I thought a lot of its proofs were done at compile time not link time.
That sounds a lot like traditional dynamic language runtimes. You kind of get that for free with Smalltalk/LISP/etc.
Linker overlays?
> Mold is already very fast, however it doesn't do incremental linking and the author has stated that they don't intend to. Wild doesn't do incremental linking yet, but that is the end-goal. By writing Wild in Rust, it's hoped that the complexity of incremental linking will be achievable.
Can someone explain what is so special about Rust for this?
I assume that he is referring to "fearless concurrency", the idea that Rust makes it possible to write more complex concurrent programs than other languages because of the safety guarantees:
https://doc.rust-lang.org/book/ch16-00-concurrency.html
So the logic would go:
1. mold doesn't do incremental linking because it is too complex to do it while still being fast (concurrent).
2. Rust makes it possible to write very complex fast (concurrent) programs.
3. A new linker written in Rust can do incremental linking while still being fast (concurrent).
EDIT: I meant this originally, but comments were posted before I added it so I want to be clear that this part is new: (Any of those three could be false; I take no strong position on that. But I believe that this is the motivating logic.)
Actually a lot of the hacks that mold uses to be the fastest linker would be, ironically, harder to reproduce with Rust because they’re antithetical to its approach. E.g. mold intentionally skips freeing the resources it has used to speed up execution (the OS cleans them up when the process exits), while Rust has a strong RAII approach here that would introduce slowdowns.
Both mold and lld are already very heavily concurrent. There is no fear at all there.
That’s puzzling to me too. Rust is a great language, and probably makes developing Wild faster. But the complexity of incremental linking doesn’t stem from the linker’s implementation language. It stems from all the tracking, reserved spacing, and other issues required to link a previously linked binary (or at least parts of it) a second time.
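To illustrate what "reserved spacing" means here, a small sketch. It is only a guess at the general shape of the bookkeeping, not a description of how Wild or any other incremental linker does it; `Slot`, `layout`, `try_patch` and the 25% slack figure are all made up. Each function gets some padding after it, so a slightly larger rebuild can be written in place without moving anything else.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    offset: int      # where the function starts in the output image
    size: int        # bytes currently used
    capacity: int    # bytes reserved, including padding


def layout(sizes: dict[str, int], slack: float = 0.25) -> dict[str, Slot]:
    """Place each function with some spare room after it."""
    slots: dict[str, Slot] = {}
    cursor = 0
    for name, size in sizes.items():
        capacity = size + max(1, int(size * slack))
        slots[name] = Slot(cursor, size, capacity)
        cursor += capacity
    return slots


def try_patch(slots: dict[str, Slot], name: str, new_size: int) -> bool:
    """Update one function in place if it still fits; otherwise report failure."""
    slot = slots[name]
    if new_size > slot.capacity:
        return False  # caller would have to move it or fall back to a full relink
    slot.size = new_size
    return True


slots = layout({"parse": 100, "emit": 40})
grew_a_bit = try_patch(slots, "parse", 120)   # fits in the reserved padding
grew_a_lot = try_patch(slots, "emit", 60)     # exceeds the reserved padding
print(slots, grew_a_bit, grew_a_lot)
```

The tracking side (which relocations point into a slot that moved, which symbols changed address) is where the real work is, and the sketch doesn't attempt it.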
Rust allows you to enforce more invariants at compile time, so implementing a complex system where you are likely to make a mistake and violate those invariants is easier.
I would guess the idea is that in Rust the complexity is cheaper on a "per unit" basis so you can afford more complexity. So yes, it is a more complicated problem than the previous linkers, but, in Rust maybe you can get that done anyway.
There are two main factors:
1. Rust's well designed type system and borrow checker make it easier to write code that works. It has the "if it compiles it works" property (not unique to Rust; people say this about e.g. Haskell too).
2. Rust's type system, especially its trait system, can be used to enforce safety constraints statically. The obvious one is the Send and Sync traits for thread safety, but there are others, e.g. the Fuchsia network code statically guarantees deadlocks are impossible.
Mold is written in C++ which is extremely error prone in comparison.
I assume they're referring to thread-safety and the ability to more aggressively parallelize.
Mold and lld are already very heavily parallelized. It’s one of the things that makes them very fast already.
I went looking for some writing by the author about how he made wild fast, but couldn't find much: https://davidlattimore.github.io/
Rust has a pretty good incremental caching compiler that makes debug builds relatively fast.
Linking is often a very notable bottleneck for debug binaries and mold can make a big difference.
So interest in speeding up linking for Rust is expected.
Apart from what others said, maybe he plans to use Salsa or something like that. Rust has a few popular libraries for doing this.
It's feasible to write complex correct programs with optimal performance in Rust, unlike any other programming language (complex+correct is not feasible in C/C++/assembly/Zig/etc., optimal performance not possible in any other language).
I guess they meant: why write it in Rust?
That is baffling. Maybe the author assumes that a language with many safeguards will lead to keeping complexity under control for a difficult task.
By the way, I had to look up what incremental linking is. In practice I think it means that code from libraries and modules that have not changed won’t need to be re-packed each time, which will save time for frequent development builds. It’s actually ingenious.