Comment by Rochus

7 days ago

What's the core concept compared to other kernels?

At its core, it's not too distinct from, say, seL4, but some of the distinctions are useful. I think Hongmeng's work on isolation classes (particularly on transferring components across classes), on a performance-motivated partial alternative to capabilities, on OS service coalescing and partitioning, and on porting Linux drivers is valuable (see sections 4.2-4.4 and 5 [0]). It's not that these changes should be adopted wholesale, but they are a useful data point for alternate designs. The emphasis on access-control (capability) performance and on driver coverage is relevant for any production-grade microkernel.

I don't like the paging optimization described in section 4.5 [0]; it seems like a lot of added complexity for disproportionately little gain.

In general, the authors make many good observations about current microkernel designs, particularly how the proliferation of small processes harms performance. Based on my reading of this paper and many others, I see a few pragmatic considerations for building microkernel-based systems. The granularity of processes should be curtailed when performance is critical; security is a spectrum, and such a system can still be more secure than the status quo. Limited kernels should be colocated with processes again rather than always sitting across address spaces (as they have since Meltdown), deferring to a cross-address-space kernel on the harder-to-secure paths. For example, if a process holds a timer capability and will likely keep it for the rest of its lifespan, a colocated stub kernel could service timer syscalls directly and forward everything else (sketched below). Lastly, and this is a broader problem in most software, both code and state must be located in their proper places [1]. Use Parnas' criteria [2] for modular programming. If you believe in the power of microkernels, I'd sell you this concept first; it's even more basic and necessary, and probably one of the most fundamental ideas we have about how to write good code.
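
To make the stub-kernel idea concrete, here is a rough sketch in C. Everything in it is hypothetical (the syscall numbers, the function names, the split itself); it only illustrates the fast-path/slow-path dispatch I have in mind:

  /* Hypothetical stub kernel colocated in the process's address space.
     It services the syscalls the process holds capabilities for (here,
     timers) and forwards everything else to the full kernel living in
     another address space. All names are invented. */

  #include <stdint.h>

  #define SYS_TIMER_READ 1
  #define SYS_TIMER_ARM  2

  /* Fast path: provided by the colocated stub, no address-space switch. */
  uint64_t local_timer_read(void);
  void     local_timer_arm(uint64_t deadline);

  /* Slow path: trap into the cross-address-space kernel. */
  uint64_t remote_syscall(int nr, uint64_t a0, uint64_t a1);

  uint64_t stub_syscall(int nr, uint64_t a0, uint64_t a1) {
      switch (nr) {
      case SYS_TIMER_READ:      /* capability held locally */
          return local_timer_read();
      case SYS_TIMER_ARM:
          local_timer_arm(a0);
          return 0;
      default:                  /* everything else crosses address spaces */
          return remote_syscall(nr, a0, a1);
      }
  }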

[0] https://www.usenix.org/system/files/osdi24-chen-haibo.pdf
[1] https://dl.acm.org/doi/10.1145/3064176.3064205
[2] https://wstomv.win.tue.nl/edu/2ip30/references/criteria_for_...

  • That's interesting, thanks for the explanation and the references. Of course I agree with Parnas' modularization principles, and having spent a lot of time with different versions of the Oberon and Active Object systems, I think a microkernel is a natural fit. You seem to be a scholar of microkernels; are you also developing microkernels?

    • An HNer after my own heart! If only Parnas' work had gotten more mindshare. Everyone knows about the benefits of modularity and layering, but most examples are unconvincing. Even simpler than "modularity", and in a manner unifying it with the lessons of Hongmeng and Theseus (whose team wrote the state-spill paper I linked), I now think of it as "putting code and data where they belong", such that two separated units interact rarely. As with parallelism in computing, lines that never touch won't interfere. Total parallelism is impossible for any useful program, because coordination is necessary, but the right arrangement of knots and crossings will make things go as smoothly as they can. A modular program should also be a fast program; the only real obstacle is developer headache.

      > You seem to be a scholar of microkernels; are you also developing microkernels?

      Nothing professional, and I haven't even gotten to actual development yet. But I have a general design and many half-baked specifics. I like to push the limits of what's been done: developer practicality is secondary to bare minimalism, especially since convenience can be built back up (if painstakingly). I'm mainly inspired by seL4 and Barrelfish.

      My most radical idea is making the kernel completely bare, without even a capability system or message passing. Similar to Barrelfish, I'd have a trusted userspace process (the monitor). If I place it in the same address space as the kernel, every privileged interaction costs two mode switches, which I think (though I have not demonstrated it yet!) is well worth the greater programmability of kernel functionality. seL4's use of CNodes is elegant in one sense, but in another it hamstrings both the user processes (fine, good, even) and the kernel itself (bad). seL4's approach is undeniably a better target for formal verification, but it restricts how efficient capabilities can be. Barrelfish, which targets multicore machines in a distributed manner, contorts the capability system (the load-bearing core of these kinds of microkernels) even further.

      The kernel is the multiplexer of last resort, standing in for the hardware; the sooner the kernel is out of the picture, the easier everyone breathes. Instead of trying to build the foundation and the building all at once, I'd rather recognize that the foundation alone is plenty valuable. The monitor gets control of the kernel without the dependence on hardware or the rigid interface to userspace. This partition presents a meaningfully different level of multiplexing, where the kernel and the monitor each play their own part, and the monitor's view of the virtual hardware offered by the kernel is much improved.
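
      Concretely, I picture the whole trap path of the "true kernel" looking something like this; a minimal sketch under names I made up, not code from any existing system:

        /* The entire "true kernel": save user state, upcall into the
           monitor (which shares the kernel's address space and holds
           all policy), then return to user mode. */

        #include <stdint.h>

        typedef struct {
            uint64_t regs[31];  /* saved general-purpose registers */
            uint64_t pc, sp;    /* faulting PC and stack pointer   */
            uint64_t cause;     /* hardware trap/exception cause   */
        } trap_frame_t;

        /* Implemented by the monitor, reached by a plain call rather
           than an address-space switch. */
        void monitor_handle(trap_frame_t *tf);

        /* Arch-specific: restore a frame and drop back to user mode. */
        void user_return(trap_frame_t *tf);

        void kernel_trap_entry(trap_frame_t *tf) {  /* entered via the first mode switch */
            monitor_handle(tf);  /* all policy lives in the monitor   */
            user_return(tf);     /* second mode switch, back to user  */
        }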

      Security and trust are not black and white, and the kernel itself should be flexible to adaptation. I could simply implement seL4 or Barrelfish in the monitor, or diverge further and investigate new tradeoffs. Capabilities are load-bearing here too, so there is every reason to play around with them: how the capability system works will determine how the entire operating system works. (As an aside: I was pleased to notice that object capabilities have a close relation to Parnas-style modules, being their interfaces. But what object capabilities are can be played with too.) How might capabilities be stored, or accessed, more efficiently? I think there's a lot of room for improvement; Composite offers some ideas, though I still lean towards Barrelfish's (one baseline is sketched below).

      I also imagine specialized kernels, paired with userspace processes in their address spaces (like the "true kernel" and monitor), reifying the capabilities granted to those processes. Traditional microkernel wisdom could be interpreted as requiring as little code running in kernel space as is feasible. But I have many other parameters I wish to let people optimize for, not just performance, so I offer this: the core kernel will be minimal to the point it hurts, and the monitor picks up the slack. Then, if security is paramount, only the obviously safe, minimally augmented kernels are exported to other processes: programmatic generation of specialized kernels, coordinated on capabilities, perhaps restricted to only some processes. But for the willing, much more daring ventures can be tried. I even suspect one could install what amounts to Linux as a specialized kernel, as the ultimate mode of bona fide virtualization: no VirtualBox, no personality servers, not even syscall emulation. I wonder how hard that would be. Though I should probably first learn more about User-Mode Linux and similar work in other operating systems (DragonFly BSD, and seemingly future Redox?) that runs them in user space. That's still a pipe dream for now.
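
      For the storage question, one obvious baseline to play with is a flat per-process capability table with O(1) indexed lookup, in contrast to resolving a capability pointer through a CNode tree as seL4 does. A sketch in C, with invented names and fields:

        /* One hypothetical layout: a flat per-process capability table,
           indexed by small integers and checked in constant time.
           Nothing here is from an existing system. */

        #include <stddef.h>
        #include <stdint.h>

        typedef enum { CAP_NONE, CAP_TIMER, CAP_FRAME, CAP_ENDPOINT } cap_type_t;

        typedef struct {
            cap_type_t type;
            uint32_t   rights;  /* read/write/grant bits, etc.  */
            uint64_t   object;  /* kernel object this cap names */
        } cap_t;

        typedef struct {
            cap_t  *slots;
            size_t  nslots;
        } cap_space_t;

        /* Constant-time lookup: one bounds check, one type check,
           one rights check. */
        const cap_t *cap_lookup(const cap_space_t *cs, size_t idx,
                                cap_type_t want, uint32_t need) {
            if (idx >= cs->nslots) return NULL;
            const cap_t *c = &cs->slots[idx];
            if (c->type != want) return NULL;
            if ((c->rights & need) != need) return NULL;
            return c;
        }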

      Having mentioned so much about seL4, and given this thread is originally about QNX, I should say that I don't think my dream microkernel should put so much emphasis on kernel-facilitated message passing. I really am just offering a context switch this time; there isn't even a scheduler in the "true kernel". For all the argumentation I've seen from the seL4 team for why any IPC less minimal than theirs is likely suspect, I don't see a good reason not to shoot seL4's IPC in the face too. With some care, seL4-style IPC could still be built on top exactly as-is, in particular keeping its maximal use of registers.

      The other main concern of seL4's IPC, threading (particularly thread priorities), I find even more suspect. No threads in my kernel either! I will take scheduler activations instead, please and thank you. I think people have been misguided into believing that "threads of execution" should be supported specifically by the kernel, when in reality they are a much higher-level abstraction. An ongoing sequence of execution is another of those concepts that must be carefully captured in our design of software, a logical concept that informs how we should write code. Kernel threading is like supposing that a person on a smartphone doesn't view the many app boundary crossings and UI actions as one unified whole; the entire course must be mapped out, studied, and integrated. Kernel threading gives the illusion that we can manifest threads independently of programs, but the program determines the threading. Work instead from the hardware resources, the physical cores present, offering an interface above them, and meet the program as its developers distill its abstract formulation. The kernel's task is to bring the hardware from the bottom up to the developers, because that is necessarily how developers must interact with hardware; otherwise, we really could invent more cores and memory to accommodate all those threads. Certainly, by removing threads from the kernel, I don't claim to have solved concurrency, or priority inversion, or anything like that. I merely want the hardware exposed as-is, but a bit friendlier, and people can build ever friendlier abstractions as they can and will, depending on the tradeoffs.
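
      To pin down what I mean by taking scheduler activations: the kernel (or monitor) only reports core events upward, and user space multiplexes its own logical threads across whatever physical cores it currently holds. Again a sketch with assumed names, not a spec:

        /* Hypothetical scheduler-activation interface: no kernel threads,
           only upcalls about physical cores. User-level code builds its
           own threading on top. */

        #include <stdint.h>

        typedef enum {
            UPCALL_CORE_GRANTED,  /* a physical core is now yours     */
            UPCALL_CORE_REVOKED,  /* core taken away; context saved   */
            UPCALL_BLOCK_DONE     /* an earlier blocking op completed */
        } upcall_kind_t;

        typedef struct {
            upcall_kind_t kind;
            uint32_t      core_id;    /* which physical core this concerns */
            void         *saved_ctx;  /* user context, on revoke/unblock   */
        } upcall_t;

        /* The process registers a single entry point; the kernel never
           schedules threads, it only delivers core events here. */
        void activation_entry(const upcall_t *up);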

      All things should reside in their proper places. Push down the accidental complexity and bring up the essential complexity, letting everything that bears the burden of supporting things above itself (chief among them the primary multiplexers: the kernel and system services) do so only to the extent it must. In the kernel's case, being simply the trampoline between the hardware and the program, Liedtke's minimality principle is perfect: putting anything else in the kernel can only help performance, if even that, so the tradeoff is quite plain. Even trust is not gained. It may seem horrific to have a trusted userspace process such as the monitor, but really, does the first process of any operating system not have such privilege? My monitor simply has a more defined responsibility, and given that the kernel proper is naked, the overall trust is preserved, I think. And so on the investigation can go. In the end, I may make some edges sharper, but they were sharp to begin with, and I offer tools to dull them. Please do say if you disagree with my conclusions! This is still just my own thinking, developed without dialogue.

      </rant>
