OpenBSD: Removing syscall(2) from libc and kernel

2 years ago (marc.info)

125 comments

ecliptik

OpenBSD developers are making a serious effort to kill off indirect syscalls. The base system is completely clean; take a look at the work Andrew Fresh did to adapt Perl. He wrote a complete syscall "dispatcher" or emulator for the Perl syscall function so that it calls the libc stubs.

https://github.com/openbsd/src/commit/312e26c80be876012ae979...

The ports tree is being cleansed of syscall(2) usage until it's all gone.

msyscall, pinsyscall, recent mandatory IBT/BTI, xonly. OpenBSD is making some waves, but people aren't really seeing them yet.
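
To picture the "dispatcher" idea mentioned above, here is a minimal sketch in Python rather than Perl or C. It is not the code from the linked commit. The syscall numbers and the `emulated_syscall` name are invented for illustration, and Python's `os` functions stand in for the libc stubs. The point is only that a number-based call can be routed to ordinary wrapper functions instead of trapping into the kernel directly.

```python
import errno
import os

# Hypothetical syscall numbers, invented for this sketch. Real numbers are
# platform-specific and are not taken from any OpenBSD header.
SYS_GETPID = 1
SYS_GETPPID = 2
SYS_GETUID = 3

# Each number maps to an ordinary wrapper function. Python's os module
# stands in here for the libc stubs.
_DISPATCH = {
    SYS_GETPID: os.getpid,
    SYS_GETPPID: os.getppid,
    SYS_GETUID: os.getuid,
}


def emulated_syscall(number, *args):
    """Route a syscall-style request to the matching wrapper function."""
    try:
        handler = _DISPATCH[number]
    except KeyError:
        # Unknown numbers fail the way an unimplemented syscall would.
        raise OSError(errno.ENOSYS, os.strerror(errno.ENOSYS)) from None
    return handler(*args)
```

Calling `emulated_syscall(SYS_GETPID)` returns the same value as `os.getpid()`, while an unmapped number raises `OSError` with `ENOSYS`. A table like this only covers the calls someone has listed, which is presumably part of the appeal from a hardening point of view.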

__turbobrew__ 2 years ago

It looks like golang is going to have to deal with — again — OpenBSD treating libc as the interface with the kernel instead of syscalls being the interface with the kernel.

I wonder if there could be some way to sign a dynamic library to allow it to create direct system calls and then pass that as a kernel command line argument at boot?

matheusmoreira 2 years ago
OpenBSD is certainly taking it quite far. They want to disallow system calls from all non-libc code segments so that only whitelisted code can interface with it.
It is not the only operating system in the "unstable kernel interface" group though. Linux is actually the only one with a stable system call interface.
I've written somewhat at length about this:
https://www.matheusmoreira.com/articles/linux-system-calls
- LegionMammal978 2 years ago
  
  > It is not the only operating system in the "unstable kernel interface" group though. Linux is actually the only one with a stable system call interface.
  It's wrong to say that Linux stands alone in having a stable syscall interface. FreeBSD [0] and NetBSD [1] both retain syscall compatibility for old binaries (the former apparently with some exceptions permitted); DragonFly BSD also appears to keep old syscalls in place. In fact, I only know of Windows, OpenBSD, and presumably macOS as mainstream desktop OSes without mostly-stable syscalls.
  [0] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
  [1] https://www.netbsd.org/docs/internals/en/chap-processes.html...
  
- masklinn 2 years ago
  
  FWIW your website is unusable on desktop: the font is unreadably huge. And while Firefox's reader mode does seem to mitigate the issue correctly, Safari's cuts off at the second paragraph.
  
- KRAKRISMOTT 2 years ago
  
  We need more fat kernels and fewer libcs; libc ties us forcibly to the C legacy. Time for more type safe and richer interfaces with the kernel.
  
masklinn 2 years ago
> It looks like golang is going to have to deal with — again — OpenBSD treating libc as the interface with the kernel instead of syscalls being the interface with the kernel.
Something they wouldn't have to do if they'd heeded the warnings they got since they first started going raw syscalls on non linux systems.
But as usual, go is uniquely american, only doing the right thing after it has tried everything else.
- tadfisher 2 years ago
  
  You could also describe it as doing the sane thing (avoiding hardcoding against the libc ABI through FFI) until the only sane option is removed. Did you know many libc APIs are preprocessor macros?
  
- PrimeMcFly 2 years ago
  
  > go is uniquely american, doing the right thing after it has tried everything else.
  That doesn't seem like anything common to an American way of doing things. What an odd statement.
  
- FullyFunctional 2 years ago
  
  I don't use Go, but I had the exact same reaction; C is the source of most of the problems and this just codifies the use of C. The whole thing has a security by obscurity smell.
  
- akira2501 2 years ago
  
  > go is uniquely american, only doing the right thing after it has tried everything else.
  Perhaps this is why most innovations are American. We don't automatically fall for the bully pulpit of the gnostic class.
  > Something they wouldn't have to do
  Yea, but they'll get it done anyways, and the language will continue to be excellent. I'm sure Google can absorb the engineering challenge without subtracting anything from us or other languages.
blueflow 2 years ago
> OpenBSD treating libc as the interface with the kernel instead of syscalls being the interface with the kernel
Which is a reasonable thing, given that the libc interface is defined by a widely used IEEE standard while the kernel interface is not.
- Conscat 2 years ago
  
  libC is an extremely leaky abstraction. Programming to it assumes you have a runtime that supports constructor functions, POSIX errno and locale, a global heap allocator singleton, and more. The design space of _Hello World_ is massively constrained by libC.
CHY872 2 years ago
OpenBSD’s position is far from unique. It’s shared with MacOS (and iOS).
- masklinn 2 years ago
  
  And windows.
  And that's just the platforms which technically enforce it. Linux is essentially the only platform which actually supports raw syscalls, in the sense that it's considered a normal system API.
- steveklabnik 2 years ago
  
  And Windows, in a sense: not that libc is the interface, but that the assembly-level API is not the interface.
  
irdc 2 years ago

> I wonder if there could be some way to sign a dynamic library to allow it to create direct system calls and then pass that as a kernel command line argument at boot?
That sounds like an additional knob to tweak. The OpenBSD project is famously opposed to such knobs.
saagarjha 2 years ago
I am sure Cosmopolitan is going to love this change.
- masklinn 2 years ago
  
  I would assume they already have ways to bounce through a dynamically linked system library since cosmopolitan works on macos, and even more so windows.
  
bregma 2 years ago

What about OSes that are not built around monolithic kernels and the syscall() paradigm?

brynet 2 years ago

Follow-up mail from Theo showing what the full removal looks like.

https://marc.info/?l=openbsd-tech&m=169842095809570&w=2

kstrauser 2 years ago

I don't know a lot about this corner of the OS. Why are syscalls in libc, instead of something like a libsyscall? I could see why a language might want not to depend on what's at least notionally the C runtime. Is the fact that the kernel interface is in libc an accident of Unix being written in C, or is there something more fundamental there?

jcranmer 2 years ago
Libc is really a conflation of at least three different notional libraries. The first library is what its name suggests it is, a standard library for C. Another part of the library provides the userspace portion of system services: things like handling static initializers, dynamic loading, or the userspace side of things like creating new threads (not to mention the actual raw functions you call to get the kernel to do something). The final part of the library is a collection of userspace services which are language agnostic: you might choose different implementations, but you'll always assume they are somehow present. libm and malloc are the go-to examples here.
As for why the userspace system service library is part of libc instead of being a separate libsyscall or libkernel, that probably is due to Unix being a C operating system--written at a time when most operating systems also came with their own system language. It's definitely not the case for all OS's that the C runtime library is the same as the libsyscall/libkernel--most notably, on Windows, the former is MSVCRT*.dll and the latter is kernel32.dll (or ntdll.dll if you're looking very specifically at syscalls themselves, but ntdll.dll is largely an unstable interface).
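
To make those three roles concrete from a non-C language, here is a small Python sketch using `ctypes`. It assumes a POSIX system where `ctypes.CDLL(None)` exposes the symbols of the C library the interpreter is linked against; that assumption does not hold on Windows. The `demo` function name is made up for this example. It calls one function from each notional part: `strlen` (C standard library), `getpid` (system service wrapper), and `malloc`/`free` (language-agnostic service).

```python
import ctypes
import os

# Assumption: on a POSIX system, loading with None (dlopen(NULL)) exposes the
# symbols of the libc that the Python interpreter itself is linked against.
libc = ctypes.CDLL(None)

# 1. Plain C standard library: string length.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# 2. Userspace side of a system service: the wrapper around the getpid call.
libc.getpid.argtypes = []
libc.getpid.restype = ctypes.c_int

# 3. Language-agnostic service that everyone assumes is present: the allocator.
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def demo():
    """Call one function from each of the three notional parts of libc."""
    length = libc.strlen(b"hello")
    pid = libc.getpid()
    ptr = libc.malloc(16)
    allocated = ptr is not None  # ctypes maps a NULL pointer to None
    libc.free(ptr)
    return length, pid, allocated
```

All three calls go through the same shared object, which is roughly what a runtime ends up doing on a platform where libc, rather than the raw syscall instruction, is the supported interface.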
- actionfromafar 2 years ago
  
  That's the best explanation I have ever read! I even knew these things beforehand, nodding along, but now I can explain it also to someone else in a coherent way instead of rambling up implementation details like a madman.
  Thanks!
monocasa 2 years ago
It's that libc is considered the stable interface for userland in general on openbsd.
- masklinn 2 years ago
  
  In pretty much every OS of the unix tradition, libc is the API, with a stable ABI. Although Windows has something pretty much identical in, I believe, ntdll. The name and API differ, but the intent is the same.
  
- kstrauser 2 years ago
  
  Is that different from Linux/Mac/etc?
  
marcosdumay 2 years ago
I'm pretty sure if you just repackage the syscalls into a basic module of your language, you will get more stability than by linking to the libc.
I guess people don't do it because the difference is minimal and they use a lot of other features from the C runtime too.
EDIT: Oops. Not on BSD! The entire thread is about BSD and here I am mindlessly talking about Linux.
- dpassens 2 years ago
  
  Only on systems where the syscall ABI is stable, like on Linux. Others, like macOS and Windows[0], can and will change theirs between releases. OpenBSD even goes one step further and actively prevents code other than libc from performing syscalls[1]
  [0] I seem to remember that this changed in a recent Windows version, but I couldn't immediately find a source.
  [1] msyscall(2) or, if you don't have an OpenBSD system at hand, https://man.openbsd.org/msyscall

saagarjha 2 years ago

Why not remove syscall instructions altogether? When libc wants to do something, it traps on an undefined instruction and then the kernel looks at the program counter to see what it should do. Seems like this would be the ultimate application of this line of thought…

Veserv 2 years ago

That is basically what we used to do, int 0x80 and such. I mean, I guess you do not read the PC since that is much more brittle than just having the caller say what they want to do, but it is structurally the same.
Turns out, having a dedicated syscall instruction and trap pathway is just better design. Unlike a regular exception, this is a deliberate change of control to the kernel, so you can enforce a much stronger ABI requirement. In particular, you can define it to use a standard function call ABI with respect to preserved and non-preserved registers making it literally look like a standard function call.
For similar reasons, having a dedicated hardware pathway like on x86-64 is also just better design. System calls are a synchronous, voluntary transfer of control that is expected to return, in contrast to (1) interrupts, which are an asynchronous involuntary transfer of control, and (2) instruction stream exceptions, which are a synchronous involuntary transfer of control with no guarantee of return. This fundamental distinction can be leveraged for more efficient and simpler implementations.
monocasa 2 years ago
I don't think that helps much. OpenBSD already only allows syscalls originating out of the libc .text section, so whether the trap itself comes from a syscall instruction or some other trap mechanism doesn't really improve security AFAICT.
- saagarjha 2 years ago
  
  Yeah but it sounds super cool doesn’t it!
  
jcranmer 2 years ago
My understanding is this was done by one system, which then made the architecture's life hard because now they lost half their opcode encoding space due to it being used as pseudo-syscalls by their largest customer.
- saagarjha 2 years ago
  
  Just one encoding, or maybe we’re thinking of different systems?
nine_k 2 years ago
Are there any performance implications on such traps on x64 or ARM?
IIRC, PDP11 and VAX used traps as the way to call the supervisor, and it was pretty cheap.
- saagarjha 2 years ago
  
  You're asking the wrong questions of OpenBSD :)
  
- MBCook 2 years ago
  
  I think ARM (or at least ARM64) does the trap thing too.

throw2022110401 2 years ago

For language runtimes that don't normally have to deal with C baggage, having to drag libc into your address space and go through libc code for syscalls makes this a less secure platform.

eikenberry 2 years ago

+1. Why not create a standard/stable syscall interface instead of pushing an anachronistic libc interface?

kazinator 2 years ago

How I would design it would be to have a system call which registers the addresses, in user space, where system calls are allowed to come from. After that, any system call not coming from an address in that list will fail.

The list would be editable, but there would be an operation which seals it from further editing.

With this feature, we could load a new plugin into a C program with dlopen, and it would not be able to make syscalls of its own, other than through the existing C library, no matter how prim and proper its call sequences look.

Possibly, ranges could be used instead: syscalls can originate from several registered address ranges, and that's it. Programs that create dynamic code on the fly, which can make system calls, could allocate trampolines in a registered area, for that purpose.
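
The design described above can be modelled in a few lines. This is a toy user-space simulation in Python, not kernel code and not OpenBSD's actual mechanism. The class name `SyscallOriginTable`, the function `handle_syscall`, and the addresses used are all hypothetical. It shows the three pieces: registering ranges, sealing the table, and rejecting calls whose originating address falls outside every registered range.

```python
class SyscallOriginTable:
    """Toy model of a per-process list of address ranges allowed to issue syscalls."""

    def __init__(self):
        self._ranges = []     # half-open (start, end) address ranges
        self._sealed = False

    def register(self, start, end):
        """Add an allowed range; refused once the table has been sealed."""
        if self._sealed:
            raise PermissionError("table is sealed")
        if start >= end:
            raise ValueError("empty or inverted range")
        self._ranges.append((start, end))

    def seal(self):
        """Make the table read-only for the rest of the process lifetime."""
        self._sealed = True

    def is_allowed(self, pc):
        """Return True if the calling address lies in a registered range."""
        return any(start <= pc < end for start, end in self._ranges)


def handle_syscall(table, pc, action):
    """Model of the kernel entry check: run action only for allowed origins."""
    if not table.is_allowed(pc):
        raise PermissionError(f"syscall from unregistered address {pc:#x}")
    return action()
```

With a range such as `0x1000` to `0x2000` registered and the table sealed, `handle_syscall(table, 0x1800, ...)` runs the action, while a call from `0x5000` or a later `register` attempt raises `PermissionError`. A trampoline area for dynamically generated code would simply be one more range registered before sealing.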

tiffanyh 2 years ago

Can someone explain the significance?

monocasa 2 years ago
OpenBSD has been putting in a lot of work lately to harden the syscall ABI; a large component of that work has been constricting how a syscall is invoked from user space as a defense in depth technique to make shell code style exploits more difficult. That's previously taken the form of techniques like only allowing syscalls to be invoked from the libc .text section.
This work is removing a very indirect morph of syscall where the arguments/sysnum are in a struct in memory, making it harder for exploits to invoke weird versions of syscalls on their own terms.
- krackers 2 years ago
  
  But aren't shellcode style exploits already fairly rare with W^X, so most end up using return-to-libc style attacks? Wouldn't CFI be a much better solution?
  
- eikenberry 2 years ago
  
  Why aren't these changes made in kernel to keep the syscall ABI standardized and safe instead of requiring the use of an unsafe language wrapper? We should be discouraging more use of unsafe languages, not forcing it.
  

pizlonator 2 years ago

A+

This is awesome.

Also, it’s rarely used in my experience. I wonder what kind of code would even notice?

actionfromafar 2 years ago

Exploits, I guess is the concern.
saagarjha 2 years ago
Code that uses syscalls that aren’t in libc but doesn’t want to have to drop to inline assembly?
- pizlonator 2 years ago
  
  Right, so exploits.
  

kazinator 2 years ago

It is security theatre. The instruction pattern by which a system call is invoked is not a security attribute. You can't trust something just because it looks like a libc call.

samus 2 years ago
It's an indicator of whether the instruction sequence was cobbled together by an exploit or whether it's normal code execution. Attackers will have to use more complicated and brittle mechanisms to still achieve their goal. It's an arms race, but so is security in general.

ahoka 2 years ago

Just remove the kernel from the base install already?

PrimeMcFly 2 years ago

> Piece by piece, I've been trying to remove the easiest of the terminal-actions that exploit code uses (ie. getting to execve, or performing other system calls, etc).

> I recognize we can never completely remove all mechanisms they use. However, I hope I am forcing attack coders into using increasingly more complicated methods

It's honestly so ridiculous that an OS that claims to have security as a focus refuses to add even some sort of basic MAC/RBAC implementation. Even both OSX and Windows have had something for ages now.

actionfromafar 2 years ago
These things are kind of orthogonal. OpenBSD maybe gets there eventually.
OpenBSD is like a very hardened safe, made of steel and huge bolts and locks. Very polished, very smooth and hard surface.
MAC/RBAC is like having security officers, interviews, checking of IDs, filling in forms and getting an OK from one's boss before performing work someplace in the building and so on.
Both these things can be good. But OpenBSD was always about making a small system as hardened as possible. Evidently, they aren't completely done yet with making the core as hardened as possible.
Windows has great architecture, but it severely lacks the hardness which OpenBSD possesses. What use is MAC/RBAC if someone can gain kernel access with a 0 day exploit?
- PrimeMcFly 2 years ago
  
  > OpenBSD maybe gets there eventually.
  Nah they won't. The devs have an irrational resistance to the very idea.
  I disagree with your analogies. OpenBSD has a focus on auditing to remove all bugs, which is great, but they provide very little to help prevent what can be done if a bug is exploited, and they've certainly had no shortage of serious bugs.
  > What use is MAC/RBAC if someone can gain kernel access with a 0 day exploit?
  Kernel exploits are pretty rare. Most exploits are in userland.
  