Comment by josephg

9 hours ago

I've been wanting a capability based security model for years. Argued about it here in fact. Capabilities are kind of an object pointer with associated permissions - like a unix file descriptor.

We should have:

- OS level capabilities. Launched programs get passed a capability token from the shell (or wherever you launched the program from). All syscalls take a capability as the first argument. So, "open path /foo" becomes open(cap, "/foo"). The capability could correspond to a fake filesystem, real branch of your filesystem, network filesystem or really anything. The program doesn't get to know what kind of sandbox it lives inside.

- Library / language capabilities. When I pull in some 3rd party library - like an npm module - that library should also be passed a capability too, either at import time or per callsite. It shouldn't have read/write access to all other bytes in my program's address space. It shouldn't have access to do anything on my computer as if it were me! The question is: "What is the blast radius of this code?" If the library you're using is malicious or vulnerable, we need to have sane defaults for how much damage can be caused. Calling lib::add(1, 2) shouldn't be able to result in a persistent compromise of my entire computer.

SeL4 has fast, efficient OS level capabilities. Its had them for years. They work great. They're fast - faster than linux in many cases. And tremendously useful. They allow for transparent sandboxing, userland drivers, IPC, security improvements, and more. You can even run linux as a process in sel4. I want an OS that has all the features of my linux desktop, but works like SeL4.

Unfortunately, I don't think any programming language has the kind of language level capabilities I want. Rust is really close. We need a way to restrict a 3rd party crate from calling any unsafe code (including from untrusted dependencies). We need to fix the long standing soundness bugs in rust. And we need a capability based standard library. No more global open() / listen() / etc. Only openat(), and equivalents for all other parts of the OS.

If LLMs keep getting better, I'm going to get an LLM to build all this stuff in a few years if nobody else does it first. Security on modern desktop operating systems is a joke.

6 comments

josephg

mike_hearn 3 hours ago

Capabilities have a lot of serious design problems which is why no mainstream language has them. Because this comes up so often on HN I wrote an essay explaining the issues here:

https://blog.plan99.net/why-not-capability-languages-a8e6cbd...

But as pointed out by others, this particular exploit wouldn't be stopped by capabilities. Nor would it be stopped by micro-kernels. The filesystem is a trusted entity on any OS design I'm familiar with as it's what holds the core metadata about what components have what permissions. If you can exploit the filesystem code, you can trivially obtain any permission. That the code runs outside of the CPU's supervisor mode means nothing.

The only techniques we have to stop bugs like this are garbage collection or use of something like Rust's affine type system. You could in principle write a kernel in a language like C#, Java or Kotlin and it would be immune to these sorts of bugs.

theamk 9 hours ago

Note that capabilities would not help for those bugs we are discussing today.

Those exploits are in kernel, and the userspace is only calling the normal, allowed calls. Removing global open()/listen()/etc.. with capability-based versions would still allow one to invoke the same kernel bugs.

(Now, using microkernel like seL4 where the kernel drivers are isolated _would_ help, but (1) that's independent from what userspace does, you can have POSIX layer with seL4 and (2) that would be may more context switches, so a performance drop)

josephg 7 hours ago

> Note that capabilities would not help for those bugs we are discussing today.
Yes they would. Copyfail uses a bug in the linux kernel to write to arbitrary page table entries. A kernel like SeL4 puts the filesystem in a separate process. The kernel doesn't have a filesystem page table entry that it can corrupt.
Even if the bug somehow got in, the exploit chain uses the page table bug to overwrite the code in su. This can be used to get root because su has suid set. In a capability based OS, there is no "su" process to exploit like this.
A lot of these bugs seem to come from linux's monolithic nature meaning (complex code A) + (complex code B) leads to a bug. Microkernels make these sort of problems much harder to exploit because each component is small and easier to audit. And there's much bigger walls up between sections. Kernel ALG support wouldn't have raw access to overwrite page table entries in the first place.
> (2) that would be may more context switches, so a performance drop
I've heard this before. Is it actually true though? The SeL4 devs claim the context switching performance in sel4 is way better than it is in linux. There are only 11 syscalls - so optimising them is easier. Invoking a capability (like a file handle) in sel4 doesn't involve any complex scheduler lookups. Your process just hands your scheduler timeslice to the process on the other end of the invoked capability (like the filesystem driver).
But SeL4 will probably have more TLB flushes. I'm not really sure how expensive they are on modern silicon.
I'd love to see some real benchmarks doing heavy IO or something in linux and sel4. I'm not really sure how it would shake out.

grebc 7 hours ago

Have you heard of pledge in OpenBSD?

I prefer it’s model of declaring this is what I want to use, any calls to code outside that error out.

josephg 7 hours ago
Yes. But its nowhere near as powerful as capabilities.
- Pledge requires the program drop privileges. Process level caps move the "allowed actions" outside of an application. And they can do that without the application even knowing. This would - for example - let you sandbox an untrusted binary.
- Pledge still leaves an entire application in the same security zone. If your process needs network and disk access, every part of the process - including 3rd party libraries - gets access to the network and disk.
- You can reproduce pledge with caps very easily. Capability libraries generally let you make a child capability. So, cap A has access to resources x, y, z. Make cap B with access to only resource x. You could use this (combined with a global "root cap" in your process) to implement pledge. You can't use pledge to make caps.
- grebc 5 hours ago
  
  I’m not trying to say use pledge/unveil to make capabilities, I’m saying use pledge/unveil to limit exposure.
  To me it’s easier to get a program to let the system know what it needs vs. try to contain it from the outside.
  Anyway, have a good one.