OpenBSD: Removing syscall(2) from libc and kernel

2 years ago (marc.info)

OpenBSD developers are making a serious effort to kill off indirect syscalls. The base system is completely clean; take a look at the work Andrew Fresh did to adapt Perl. He wrote a complete syscall "dispatcher" or emulator for the Perl syscall function so that it calls the libc stubs.

https://github.com/openbsd/src/commit/312e26c80be876012ae979...
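
For anyone wondering what such a dispatcher looks like in outline, here is a minimal sketch in Python. It is not the code from the commit above: the name emulated_syscall, the table layout and the three numbers are assumptions made up for illustration. The idea is only that a caller-supplied syscall number is looked up in a table and forwarded to an ordinary library wrapper, so nothing outside libc ever issues the trap itself.

```python
import errno
import os

# Illustrative numbers only (an assumption for this sketch); real values are
# per-OS and are exactly the detail a shim like this hides from callers.
SYS_WRITE = 4
SYS_GETPID = 20
SYS_GETPPID = 39

# Each entry forwards to a normal library wrapper instead of trapping into
# the kernel with a caller-supplied number.
_DISPATCH = {
    SYS_WRITE: os.write,
    SYS_GETPID: os.getpid,
    SYS_GETPPID: os.getppid,
}


def emulated_syscall(number, *args):
    """Route an indirect syscall(number, args...) request to a wrapper."""
    wrapper = _DISPATCH.get(number)
    if wrapper is None:
        # Unknown numbers fail the way an unimplemented syscall would.
        raise OSError(errno.ENOSYS, os.strerror(errno.ENOSYS))
    return wrapper(*args)
```

Calling emulated_syscall(SYS_GETPID) returns the same value as os.getpid(), and any number missing from the table raises OSError with ENOSYS.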

The ports tree is being cleansed of syscall(2) usage until it is all gone.

msyscall, pinsyscall, recent mandatory IBT/BTI, xonly. OpenBSD is making some waves, but people aren't really seeing them yet.

It looks like golang is going to have to deal with — again — OpenBSD treating libc as the interface with the kernel instead of syscalls being the interface with the kernel.
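
To make "libc as the interface" concrete from a non-C language, here is a small sketch using Python's ctypes. It assumes a Unix-like system where ctypes.CDLL(None) gives a handle to the C library already loaded into the process; the helper name getpid_via_libc is made up for this example. The runtime calls the exported libc stub and lets libc perform the actual kernel entry, rather than issuing a numbered syscall on its own.

```python
import ctypes
import os

# Assumption: on a Unix-like system, CDLL(None) exposes the symbols of the
# C library that is already mapped into this process.
libc = ctypes.CDLL(None, use_errno=True)

# Declare the C prototype: pid_t getpid(void); pid_t is treated as int here.
libc.getpid.argtypes = []
libc.getpid.restype = ctypes.c_int


def getpid_via_libc():
    """Ask for the process ID by calling the libc stub, not a raw syscall."""
    return libc.getpid()
```

The price is the one mentioned elsewhere in this thread: the caller now depends on libc's symbol names and C calling convention instead of on syscall numbers.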

I wonder if there could be some way to sign a dynamic library to allow it to create direct system calls and then pass that as a kernel command line argument at boot?

  • OpenBSD is certainly taking it quite far. They want to disallow system calls from all non-libc code segments so that only whitelisted code can interface with it.

    It is not the only operating system in the "unstable kernel interface" group though. Linux is actually the only one with a stable system call interface.

    I've written somewhat at length about this:

    https://www.matheusmoreira.com/articles/linux-system-calls

  • > It looks like golang is going to have to deal with — again — OpenBSD treating libc as the interface with the kernel instead of syscalls being the interface with the kernel.

    Something they wouldn't have to do if they'd heeded the warnings they got since they first started going raw syscalls on non linux systems.

    But as usual, go is uniquely american, only doing the right thing after it has tried everything else.

    • You could also describe it as doing the sane thing (avoiding hardcoding against the libc ABI through FFI) until the only sane option is removed. Did you know many libc APIs are preprocessor macros?

    • > go is uniquely american, doing the right thing after it has tried everything else.

      That doesn't seem like anything common to an American way of doing things. What an odd statement.

    • I don't use Go, but I had the exact same reaction; C is the source of most of the problems and this just codifies the use of C. The whole thing has a security by obscurity smell.

    • > go is uniquely american, only doing the right thing after it has tried everything else.

      Perhaps this is why most innovations are American. We don't automatically fall for the bully pulpit of the gnostic class.

      > Something they wouldn't have to do

      Yea, but they'll get it done anyways, and the language will continue to be excellent. I'm sure Google can absorb the engineering challenge without subtracting anything from us or other languages.

  • > OpenBSD treating libc as the interface with the kernel instead of syscalls being the interface with the kernel

    Which is a reasonable thing, given that the libc interface is defined by a widely used IEEE standard while the kernel interface is not.

    • libC is an extremely leaky abstraction. Programming to it assumes you have a runtime that supports constructor functions, POSIX errno and locale, a global heap allocator singleton, and more. The design space of _Hello World_ is massively constrained by libC.

  • OpenBSD’s position is far from unique. It’s shared with MacOS (and iOS).

    • And windows.

      And that's just the platforms which technically enforce it. Linux is essentially the only platform which actually supports raw syscalls, in the sense that it's considered a normal system API.

  • > I wonder if there could be some way to sign a dynamic library to allow it to create direct system calls and then pass that as a kernel command line argument at boot?

    That sounds like an additional knob to tweak. The OpenBSD project is famously opposed to such knobs.

  • What about OSes that are not built around monolithic kernels and the syscall() paradigm?

I don't know a lot about this corner of the OS. Why are syscalls in libc, instead of something like a libsyscall? I could see why a language might want not to depend on what's at least notionally the C runtime. Is the fact that the kernel interface is in libc an accident of Unix being written in C, or is there something more fundamental there?

  • Libc is really a conflation of at least three different notional libraries. The first library is what its name suggests it is, a standard library for C. Another part of the library provides the userspace portion of system services--things like handling static initializers, dynamic loading, or the userspace side of things like creating new threads (not to mention, the actual raw functions you call to get the kernel to do something). The final part of the library is a collection of userspace services which are language agnostic, for which you might choose different implementations, but which you'll always assume are somehow present--libm and malloc are the go-to examples here.

    As for why the userspace system service library is part of libc instead of being a separate libsyscall or libkernel, that probably is due to Unix being a C operating system--written at a time when most operating systems also came with their own system language. It's definitely not the case for all OS's that the C runtime library is the same as the libsyscall/libkernel--most notably, on Windows, the former is MSVCRT*.dll and the latter is kernel32.dll (or ntdll.dll if you're looking very specifically at syscalls themselves, but ntdll.dll is largely an unstable interface).

    • That's the best explanation I have ever read! I even knew these things beforehand, nodding along, but now I can explain it also to someone else in a coherent way instead of rambling up implementation details like a madman.

      Thanks!

  • I'm pretty sure if you just repackage the syscalls into a basic module of your language, you will get more stability than by linking to the libc.

    I guess people don't do it because the difference is minimal and they use a lot of other features from the C runtime too.

    EDIT: Oops. Not on BSD! The entire thread is about BSD and here I am mindlessly talking about Linux.

    • Only on systems where the syscall ABI is stable, like on Linux. Others, like macOS and Windows[0], can and will change theirs between releases. OpenBSD even goes one step further and actively prevents code other than libc from performing syscalls[1]

      [0] I seem to remember that this changed in a recent Windows version, but I couldn't immediately find a source.

      [1] msyscall(2) or, if you don't have an OpenBSD system at hand, https://man.openbsd.org/msyscall

Why not remove syscall instructions altogether? When libc wants to do something, it traps on an undefined instruction and then the kernel looks at the program counter to see what it should do. Seems like this would be the ultimate application of this line of thought…

  • That is basically what we used to do, int 0x80 and such. I mean, I guess you do not read the PC since that is much more brittle than just having the caller say what they want to do, but it is structurally the same.

    Turns out, having a dedicated syscall instruction and trap pathway is just better design. Unlike a regular exception, this is a deliberate change of control to the kernel, so you can enforce a much stronger ABI requirement. In particular, you can define it to use a standard function call ABI with respect to preserved and non-preserved registers making it literally look like a standard function call.

    For similar reasons, having a dedicated hardware pathway like on x86-64 is also just better design. System calls are a synchronous, voluntary transfer of control that is expected to return, in contrast to (1) interrupts, which are an asynchronous involuntary transfer of control, and (2) instruction stream exceptions, which are a synchronous involuntary transfer of control with no guarantee of return. This fundamental distinction can be leveraged for more efficient and simpler implementations.

  • I don't think that helps much. OpenBSD already only allows syscalls originating out of the libc .text section, so whether the trap itself comes from a syscall instruction or some other trap mechanism doesn't really improve security AFAICT.

  • My understanding is this was done by one system, which then made the architecture's life hard because now they lost half their opcode encoding space due to it being used as pseudo-syscalls by their largest customer.

For language runtimes that don't normally have to deal with C baggage, having to drag libc into your address space and go through libc code for syscalls makes this a less secure platform.

  • +1. Why not create a standard/stable syscall interface instead of pushing an anachronistic libc interface?

How I would design it would be to have a system call which registers the addresses, in user space, where system calls are allowed to come from. After that, any system call not coming from an address in that list will fail.

The list would be editable, but there would be an operation which seals it from further editing.

With this feature, we could load a new plugin into a C program with dlopen, and it would not be able to make syscalls of its own, other than through the existing C library, no matter how prim and proper its call sequences look.

Possibly, ranges could be used instead: syscalls can originate from several registered address ranges, and that's it. Programs that create dynamic code on the fly, which can make system calls, could allocate trampolines in a registered area, for that purpose.
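
A toy userspace model of the scheme described above, as a sketch only: the class SyscallOriginTable and its method names are made up for illustration and do not correspond to any real kernel interface. It keeps a list of address ranges, allows edits until a one-way seal, and answers whether a given program counter falls inside a registered range.

```python
class SyscallOriginTable:
    """Toy model: which code addresses may issue system calls."""

    def __init__(self):
        self._ranges = []     # list of (start, end) pairs, end exclusive
        self._sealed = False

    def register(self, start, length):
        """Add an address range; refused once the table is sealed."""
        if self._sealed:
            raise PermissionError("origin table is sealed")
        if length <= 0:
            raise ValueError("length must be positive")
        self._ranges.append((start, start + length))

    def seal(self):
        """One-way operation: no further edits are accepted afterwards."""
        self._sealed = True

    def allows(self, pc):
        """Would a syscall issued from program counter pc be accepted?"""
        return any(lo <= pc < hi for lo, hi in self._ranges)
```

In this model, a process would register the libc text range and perhaps a trampoline area for generated code, then seal the table; a plugin loaded later with dlopen lands outside every range, so allows() returns False for its addresses and register() raises PermissionError.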

Can someone explain the significance?

  • OpenBSD has been putting in a lot of work lately to harden the syscall ABI; a large component of that work has been constricting how a syscall is invoked from user space as a defense in depth technique to make shell code style exploits more difficult. That's previously taken the form of techniques like only allowing syscalls to be invoked from the libc .text section.

    This work is removing a very indirect morph of syscall where the arguments/sysnum are in a struct in memory, making it harder for exploits to invoke weird versions of syscalls on their own terms.

    • But aren't shellcode style exploits already fairly rare with W^X, so most end up using return-to-libc style attacks? Wouldn't CFI be a much better solution?

    • Why aren't these changes made in kernel to keep the syscall ABI standardized and safe instead of requiring the use of an unsafe language wrapper? We should be discouraging more use of unsafe languages, not forcing it.

It is security theatre. The instruction pattern by which a system call is invoked is not a security attribute. You can't trust something just because it looks like a libc call.

  • It's an indicator of whether the instruction sequence was cobbled together by an exploit or whether it's normal code execution. Attackers will have to use more complicated and brittle mechanisms to still achieve their goal. It's an arms race, but so is security in general.

> Piece by piece, I've been trying to remove the easiest of the terminal-actions that exploit code uses (ie. getting to execve, or performing other system calls, etc).

> I recognize we can never completely remove all mechanisms they use. However, I hope I am forcing attack coders into using increasingly more complicated methods

It's honestly so ridiculous that an OS that claims to have security as a focus refuses to add even some sort of basic MAC/RBAC implementation. Even both OSX and Windows have had something for ages now.

  • These things are kind of orthogonal. OpenBSD maybe gets there eventually.

    OpenBSD is like a very hardened safe, made of steel and huge bolts and locks. Very polished, very smooth and hard surface.

    MAC/RBAC is like having security officers, interviews, checking of IDs, filling in forms and getting an OK from one's boss before performing work someplace in the building and so on.

    Both these things can be good. But OpenBSD was always about making a small system as hardened as possible. Evidently, they aren't completely done yet with making the core as hardened as possible.

    Windows has great architecture, but it severely lacks the hardness that OpenBSD possesses. What use is MAC/RBAC if someone can gain kernel access with a 0 day exploit?

    • > OpenBSD maybe gets there eventually.

      Nah they won't. The devs have an irrational resistance to the very idea.

      I disagree with your analogies. OpenBSD has a focus on auditing to remove all bugs, which is great, but they provide very little to help prevent what can be done if a bug is exploited, and they've certainly had no shortage of serious bugs.

      > What use is MAC/RBAC if someone can gain kernel access with a 0 day exploit?

      Kernel exploits are pretty rare. Most exploits are in userland.
