Comment by GoblinSlayer
6 months ago
On my system grep is 136 KB, dynamically linked to libc. Frankly, I don't use all its features or all its regex engines; I can't remember when I last searched for something more complex than fixed text. It supports coloring and reads the TERM variable. fgrep is 80 KB and apparently supports all features except coloring and PCRE.
This is a bit of an aside, but I wonder if it'd be possible to combine the advantages of dynamic and static linking: no dlsym calls and no virtual dispatch when calling into libraries, because functions would have fixed addresses, while dynamic libraries would still be shared between processes.
The way it would work is that the process would tell the OS which libraries it expects to be loaded and at what addresses, and the OS would just mmap those pieces of shared read-only memory. This mapping step happens for shared libraries anyway, but there the symbol resolution is dynamic.
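To make the mapping step concrete, here is a minimal sketch of just the shared read-only part. It assumes a temporary file can stand in for a library image, and that two mappings in one process can stand in for two processes. Python's mmap module can't request a fixed address, so that half of the idea is not shown, and `LIBRARY_IMAGE`, `map_readonly`, and `demo` are made-up names.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a library image; a real library would be an ELF file.
LIBRARY_IMAGE = b"\x90" * mmap.PAGESIZE


def map_readonly(path):
    """Map the whole file read-only, roughly as a loader maps library code."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(LIBRARY_IMAGE)
        # Two independent mappings of the same file, standing in for two processes.
        first = map_readonly(path)
        second = map_readonly(path)
        same_bytes = first[:] == second[:]
        try:
            first[0] = 0  # a read-only mapping refuses writes
            write_refused = False
        except TypeError:
            write_refused = True
        first.close()
        second.close()
        return same_bytes, write_refused
    finally:
        os.remove(path)


print(demo())
```

Both mappings see the same bytes and neither can modify them, which is what lets the OS back them with the same physical pages.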
I imagine this would also harden the application somewhat, as there are fewer dynamic dispatches for hackers to exploit.
If you knew the address of each function in advance, it wouldn't be very dynamic. It would be akin to statically linking the entire system, which makes sense in embedded systems, but probably not on a system where you would use ripgrep.
It seems that what you are suggesting is position-dependent code, which has performance benefits, especially on 32-bit x86. But we seem to be moving away from it, partly because modern hardware supports position-independent code better, and partly for security.
Having fixed addresses means that hackers know exactly where functions are, which makes writing shellcode easier. As a result, an increasingly common security feature is ASLR, where code is loaded at random addresses, even for statically linked code; that is the exact opposite of what you are suggesting.
What you are suggesting could protect against things like DLL injection, which is something I consider more of a feature than a bug. For me, the small improvement in security is not worth it, especially since it would be incompatible with ASLR. Some of these benefits could be achieved by hardening the dynamic linker without changing the executables and libraries themselves.
It doesn't have to be dynamic. 99% of the time, most or all of your dependencies are static in the sense that you know them in advance. Yet you'd still have a smaller memory footprint than static linking, and better startup time than both dynamic linking (no symbol resolution) and static linking (less to load).
Not sure what you mean by position-dependent code. All libraries would still be position independent, but libfoo would be loaded at 0x5000000 in executable A and 0x6000000 in executable B.
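As a toy model of that (plain arithmetic, not a real loader), each executable carries its own fixed base for libfoo, while the offsets inside the library are the same everywhere. The function names and offsets below are invented for illustration; only the two base addresses come from the example above.

```python
# Hypothetical offsets of functions inside libfoo. They are identical in every
# process because the library itself is position independent.
LIBFOO_OFFSETS = {"foo_open": 0x1000, "foo_read": 0x1480}

# Per-executable load addresses, fixed when each executable is built.
LOAD_ADDRESS = {
    "A": {"libfoo": 0x5000000},
    "B": {"libfoo": 0x6000000},
}


def call_target(executable, library, offset):
    """Address a call site could use directly: library base plus offset."""
    return LOAD_ADDRESS[executable][library] + offset


for exe in ("A", "B"):
    addr = call_target(exe, "libfoo", LIBFOO_OFFSETS["foo_read"])
    print(exe, hex(addr))
# prints:
# A 0x5001480
# B 0x6001480
```

No lookup by symbol name happens at run time; the only per-executable fact is the base address.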
For ASLR, you'd have the exact same performance and security characteristics as statically linked code. Either you could go position dependent and forgo ASLR, or go position independent and randomly shift your base address, in which case the loaded libs would need to account for that (for example if ASLR decides to load your process at 0x10 then libfoo would be loaded at 0x5000010).
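The shifted-base arithmetic can be sketched like this, assuming a single per-process slide is added to every preferred base. `slid_addresses` is a made-up helper, not an existing loader API.

```python
def slid_addresses(preferred_bases, slide):
    """Apply one per-process ASLR slide to every preferred library base."""
    return {name: base + slide for name, base in preferred_bases.items()}


preferred = {"libfoo": 0x5000000}
shifted = slid_addresses(preferred, 0x10)
print({name: hex(addr) for name, addr in shifted.items()})
# prints: {'libfoo': '0x5000010'}
```

The 0x10 just follows the example above; a real slide would have to be a multiple of the page size, since mappings are page-aligned.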
Also, I don't see any reason why you couldn't combine this with both dynamic and static linking in the same process.