Comment by gmueckl
7 hours ago
This assumes that executable code pages can be shared between processes. I'm skeptical that this is still a notable optimization on modern systems because dynamic linking writes to executable memory to perform relocations in the loaded code. So this would counteract copy on write. And at least with ASLR, the result should be different for each process anyway.
The dynamic linker (ld.so) writes to the GOT. The executable segment where .text lives is not written to (it's position independent code in dynamic libraries).
ASLR is not an obstacle -- the exact same code can be mapped at different base addresses in different processes, so the mappings can be backed by the same physical memory.
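To make the "same memory, different addresses" point concrete, here is a minimal sketch in Python. It is only an analogy: it maps one temporary file twice inside a single process instead of mapping a library into two processes, and the helper names `address_of` and `map_same_file_twice` are made up for illustration. Reading the address through ctypes assumes CPython.

```python
import ctypes
import mmap
import tempfile

def address_of(mapping):
    """Virtual address of a writable mmap object (assumes CPython + ctypes)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(mapping))

def map_same_file_twice(size=mmap.PAGESIZE):
    """Create two shared mappings of one temporary file (hypothetical helper)."""
    backing = tempfile.TemporaryFile()
    backing.truncate(size)  # extend the file to one page of zeros
    first = mmap.mmap(backing.fileno(), size)   # shared mapping by default
    second = mmap.mmap(backing.fileno(), size)  # second view of the same file
    return backing, first, second

if __name__ == "__main__":
    backing, first, second = map_same_file_twice()
    first[:4] = b"code"  # write through the first view
    # Two different virtual addresses...
    print(hex(address_of(first)), hex(address_of(second)))
    # ...but the write is visible through the second view as well.
    print(second[:4])
```

Library code is normally mapped read-only, so nobody writes through it like this; the write is only there to show that both views hit the same backing pages.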
That’s true on most systems (modern or not), but it has actually never been true on Windows due to PE/COFF format limitations. Also, that system doesn’t/can’t do effective ASLR because the binary slide is part of the object file spec.
I can't reconcile this with the code that GCC generates for accessing global variables. There is no additional indirection there, just a constant 0 address that needs to be replaced later.
OK, I spent a few additional minutes digging into this. It's been too long since I looked at those mechanisms. Turns out my brain was stuck in pre-PIE world.
Global variables in PIC shared libraries are really weird: the shared library's variable is placed into the main program image data segment and the relocation is happening in the shared library, which means that there is an indirection generated in the library's machine code.
Assuming the symbol is defined in the library, when the static linker runs (ld -- we're not talking ld.so), it will decide whether the global variable is preemptible or not, that is, whether it can be resolved to a symbol outside the DSO. Generally, by default it is, though this depends on many things -- visibility attributes, linker scripts, -Bsymbolic, etc. If it is, ld will have the final code reach into the GOT. If not, it can just use instruction (PC) relative offsets.
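A toy model of the two access styles, in Python, might help here. This is not how a real ELF loader is written; `TEXT`, `Process`, `load` and `resolve` are all invented names, and "one address unit per instruction" is a simplification. The point it tries to show is that the code stays identical and untouched across processes, while only the per-process GOT gets written.

```python
from dataclasses import dataclass, field

# Shared, immutable "text segment": one object reused by every process.
# ("pcrel", delta): non-preemptible symbol, reached relative to the instruction.
# ("got", slot):    preemptible symbol, reached through a per-process table slot.
TEXT = (("pcrel", 0x2000), ("got", 0))

@dataclass
class Process:
    base: int                                 # load address picked for this process
    got: list = field(default_factory=list)   # private, writable data

def load(base, symbol_addresses):
    """Toy dynamic loader: fills in the GOT and never touches TEXT."""
    return Process(base=base, got=list(symbol_addresses))

def resolve(proc, index):
    """Absolute address that instruction `index` of TEXT refers to."""
    kind, operand = TEXT[index]
    pc = proc.base + index  # toy assumption: one address unit per instruction
    if kind == "pcrel":
        return pc + operand      # no relocation needed at load time
    return proc.got[operand]     # one extra indirection through private data

if __name__ == "__main__":
    a = load(0x10000, [0x7000])
    b = load(0x50000, [0x9000])
    print(hex(resolve(a, 0)), hex(resolve(b, 0)))
    print(hex(resolve(a, 1)), hex(resolve(b, 1)))
```

Both processes run the same `TEXT` tuple. The PC-relative access lands at a different absolute address only because the base differs, and the GOT access differs only because each process has its own table.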
Are you looking at the code before or after the static linker runs?
Dynamic linking doesn't have to write to code. I'm not familiar with other platforms, but on macOS, relocations are all in data, and any code that needs a relocation will indirect through non-code pages. I assume it's similar on other OSes.
This optimization is essential. A typical process maps in hundreds of megabytes of code from the OS. There are hundreds of processes running at any given time. Eyeballing the numbers on an older Mac I have here (a newer one would surely be worse), I'd need maybe 50 GB of RAM just to hold the code of all the running processes if the pages couldn't be shared.
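A back-of-the-envelope version of that estimate, as a short Python sketch. The inputs (250 processes, 200 MiB of mapped code each) are assumed round numbers chosen to land in the same ballpark, not measurements from the machine mentioned above, and the shared case assumes every process maps exactly the same code, which is the best case for sharing.

```python
def code_ram_mib(processes, code_mib_per_process, shared):
    """RAM needed for code pages, in MiB, under a deliberately crude model."""
    if shared:
        # Best case: all processes map the same code, so one copy is enough.
        return code_mib_per_process
    # No sharing: every process pays for its own private copy.
    return processes * code_mib_per_process

if __name__ == "__main__":
    unshared = code_ram_mib(250, 200, shared=False)
    shared = code_ram_mib(250, 200, shared=True)
    print(f"unshared: {unshared / 1024:.1f} GiB, shared: {shared / 1024:.2f} GiB")
```

Real systems sit somewhere between the two cases, since processes share the common OS libraries but also map code of their own.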