← Back to context

Comment by coldpie

2 years ago

Would some kind of LD_PRELOAD interception for socket(2) work? Call the real function, then do setsockopt or whatever, and return the modified socket.

> Would some kind of LD_PRELOAD interception for socket(2) work?

That would only work if the call goes through libc, and it's not statically linked. However, it's becoming more and more common to do system calls directly, bypassing libc; the Go language is infamous for doing that, but there's also things like the rustix crate for Rust (https://crates.io/crates/rustix), which does direct system calls by default.

  • And go is wrong for doing that, at least on Linux. It bypasses optimizations in the vDSO in some cases. On Fuchsia, we made direct syscalls not through the vDSO illegal and it was funny the hacks to go that required. The system ABI of Linux really isn't the syscall interface, its the system libc. That's because the C ABI (and the behaviors of the triple it was compiled for) and its isms for that platform are the linga franca of that system. Going around that to call syscalls directly, at least for the 90% of useful syscalls on the system that are wrapped by libc, is asinine and creates odd bugs, makes crash reporters heuristical unwinders, debuggers, etc all more painful to write. It also prevents the system vendor from implementing user mode optimizations that avoid mode and context switches when necessary. We tried to solve these issues in Fuchsia, but for Linux, Darwin, and hell, even Windows, if you are making direct syscalls and it's not for something really special and bespoke, you are just flat-out wrong.

    • > The system ABI of Linux really isn't the syscall interface, its the system libc.

      You might have reasons to prefer to use libc; some software has reason to not use libc. Those preferences are in conflict, but one of them is not automatically right and the other wrong in all circumstances.

      Many UNIX systems did follow the premise that you must use libc and the syscall interface is unstable. Linux pointedly did not, and decided to have a stable syscall ABI instead. This means it's possible to have multiple C libraries, as well as other libraries, which have different needs or goals and interface with the system differently. That's a useful property of Linux.

      There are a couple of established mechanism on Linux for intercepting syscalls: ptrace, and BPF. If you want to intercept all uses of a syscall, intercept the syscall. If you want to intercept a particular glibc function in programs using glibc, or for that matter a musl function in a program using musl, go ahead and use LD_PRELOAD. But the Linux syscall interface is a valid and stable interface to the system, and that's why LD_PRELOAD is not a complete solution.

      10 replies →

    • You seem to be saying 'it was incorrect on Fuchsia, so it's incorrect on Linux'. No, it's correct on Linux, and incorrect on every other platform, as each platform's documentation is very clear on. Go did it incorrectly on FreeBSD, but that's Go being Go; they did it in the first place because it's a Linux-first system and it's correct on Linux. And glibc does not have any special privilege, the vdso optimizations it takes advantage of are just as easily taken advantage of by the Go compiler. There's no reason to bucket Linux with Windows on the subject of syscalls when the Linux manpages are very clear that syscalls are there to be used and exhaustively documents them, while MSDN is very clear that the system interface is kernel32.dll and ntdll.dll, and shuffles the syscall numbers every so often so you don't get any funny ideas.

    • > The system ABI of Linux really isn't the syscall interface, its the system libc.

      Which one? The Linux Kernel doesn't provide a libc. What if you're a static executable?

      Even on Operating Systems with a libc provided by the kernel, it's almost always allowed to upgrade the kernel without upgrading the userland (including libc); that works because the interface between userland and kernel is syscalls.

      That certainly ties something that makes syscalls to a narrow range of kernel versions, but it's not as if dynamically linking libc means your program will be compatible forever either.

      2 replies →

    • > And go is wrong for doing that, at least on Linux. It bypasses optimizations in the vDSO in some cases.

      Go's runtime does go through the vDSO for syscalls that support it, though (e.g., [0]). Of course, it won't magically adapt to new functions added in later kernel versions, but neither will a statically-linked libc. And it's not like it's a regular occurrence for Linux to new functions to the vDSO, in any case.

      [0] https://github.com/golang/go/blob/master/src/runtime/time_li...

    • Linux doesn't even have consensus on what libc to use, and ABI breakage between glibc and musl is not unheard of. (Probably not for syscalls but for other things.)

    • The proliferation of Docker containers seems to go against that. Those really only work well since the kernel has a stable syscall ABI. So much so that you see Microsoft switching to a stable syscall ABI with Windows 11.

      2 replies →