Comment by josephg
2 days ago
> If the capabilities are very fine-grained, to make certain that IPC really cannot happen, that might be cumbersome to use, while coarse-grained capabilities could be circumvented.
In SeL4 it’s kinda like this: A capability is an opaque handle you can invoke to RPC into some other process or into the kernel. There’s no worry about how fine grained capabilities are because there’s no global table of permission bits or anything like that. Processes can invent capabilities whenever they want. Because caps just let other processes call your code, you can programmatically make them do anything.
Suppose I want to give a process read only access to a file I have RW access to. The OS doesn’t need a special “read only capability” type. It doesn’t need to. Instead, my process just makes capabilites for whatever I actually want on the fly. In this case, I just make a new capability. When it’s invoked I see the associated request, if the caller is making a read request, I proxy that request to the file handle I have. (Also another cap). And if it’s a write request, I can reject it. Voila!
This is how you can write the filesystem and drivers in userland. One process can be in charge of the block devices. That process creates some caps for reading and writing raw bytes to disk. It passes the “client side” of that cap to a filesystem process, which can produce its own file handle caps for interacting with directories and files, which can be passed to userland processes in turn. Its capabilities all the way down.
This kind of proxy capabilities has other benefits as well, e.g. you can implement a disk quota, or transparent compression, or logging, or ask the user (if you have a capability which can do that), or provide access to a part of the file as though it is the entire file, etc.
Or, if a program requests access to a camera, you can provide a capability with a still picture, a video file, a filter (e.g. that resizes the picture or modifies the colour) from some source (including, but not limited to, a camera), etc; this can be helpful in case e.g. you do not have a camera on your computer, or for testing.
(Other people have similar ideas, sometimes independently than I do.)
There is also a way to transmit capabilities across a network; I had thought of how a protocol would be made to do such a thing.
That works perfectly fine for an embedded computer, which is where systems like SeL4 are used.
On the other hand, I cannot see how this approach can be scaled to something like a personal computer.
For some programs that I run, e.g. for an Internet browser, I may want to not authorize them to access anything on SSDs/HDDs, except for a handful of configuration files and for any files they may create in some cache directories.
For other programs that I run, I may want to let them access most or all files in certain file systems. Any file system that I use contains typically many millions of files.
Therefore it is obvious that using one capability per file is not acceptable. Moreover, such programs may need to access immediately many thousands of files that have been just created by some other program that has run before them, for instance a linker running after a compiler.
Assuming that a pure capability-based system is used, not some hybrid between ACLs and capabilities, there must be some more general kinds of capabilities, that would grant access simultaneously to a great number of resources, based on some rules, e.g. all files in some directory can be read, all files with a certain extension or some other kind of name pattern from some directory may be written or deleted or renamed, and so on.
> On the other hand, I cannot see how this approach can be scaled to something like a personal computer.
Personally I think the biggest challenge is UX. The systems engineering is good, and it works just fine.
> For other programs that I run, I may want to let them access most or all files in certain file systems. Any file system that I use contains typically many millions of files. Therefore it is obvious that using one capability per file is not acceptable.
Yeah, of course! Just make a capability representing the containing directory or filesystem. Then the program is free to open and browse files within that directory, but nothing outside of it.
I agree with others in this thread. Think of the capability like a bearer token. You wouldn't make a token per file. Just make one for the directory.
Then make a userspace server to do that. If you want to see how this works in practice, macOS and iOS are great “pragmatic” implementations of this pattern. They use a Mach/BSD hybrid
you're absolutely right. this is just a terminology confusion I think. we can talk about capabilities as 'a replacement for ACLs', in which case, yes we need to think about policy rules and not a gigantic list of possible atoms.
from a mechanism point of view a 'capability' is really more a bearer token, the result of a policy decision, a credential that we can give to the OS to show that we have been given access without going through the rules-based machine for every operation.
>Its capabilities all the way down.
IIUC one problem with such layering of capability processing is that each passed layer results in a context switch (i.e. switch of memory mappings, thrashing of caches, etc.) and its on top of the cost of passing through the kernel. In other words, you may need to pay cost of N syscalls for one multi-layered capability-based operation.
True, but capability calls in SeL4 are supposedly faster than linux syscalls. Because caps are such an important primitive, they're extremely heavily optimised.
As an example, when you invoke a capability, your process hands the callee your scheduler time-slice. So its not like linux where your process yields to the scheduler. The same CPU core will handle the entire call -> process -> return computation pipeline between multiple processes.
I'm not sure how fast it ends up in practice compared to a similar system built on top of linux. I suspect a lot of the difference would come down to implementation choices. And if its still not fast enough, you can always just set up a ring buffer or something between processes to share data directly.
Doesn't that mean that your process is then responsible for ensuring that an app with a read-only capability cannot do a write ?
You're moving the burden of enforcement from the kernel to the user level ?