Comment by vacuity
7 days ago
An HNer after my own heart! If only Parnas' work had gotten more mindshare. Everyone knows about the benefits of modularity and layering, but most examples are unconvincing. Even simpler than "modularity", in a manner unifying it with the lessons of Hongmeng and Theseus (whose team wrote the state spill paper I linked), I now think of "putting code/data where it belongs", so that two separated units interact rarely. As with parallelism in computing, lines that never touch won't interfere. Total parallelism is not possible for any useful program, because coordination is necessary, but the right arrangement of knots and crossings will make things go as smoothly as they can. A modular program should also be a fast program. The only real obstacle is developer headache.
> You seem to be a scholar of microkernels; are you also developing microkernels?
Nothing professional, and I haven't even gotten to actually developing. But I have a general design and many half-baked specifics. I like to push the limits of what's been done. Developer practicality is secondary to bare minimalism, especially because convenience can be built back up (if painstakingly). I'm mainly inspired by seL4 and Barrelfish.
My most radical idea is making the kernel completely bare, without even a capability system or message passing. Similar to Barrelfish, I'd have a trusted userspace process (monitor). If I place it in the same address space as the kernel, every privileged interaction adds two mode switches, which I think (for I have not demonstrated it yet!) is well worth the greater programmability of kernel functionality. seL4's use of CNodes is elegant in one sense, but in another, it hamstrings both the user processes (fine, good, even) and the kernel itself (bad). seL4's approach is undeniably a better target for formal verification, but it restricts how efficient capabilities can be. Barrelfish, which targets multicore machines in a distributed manner, makes the capability system (as the load bearing core of these kinds of microkernels) even more contorted. The kernel is the multiplexer of last resort, standing in for the hardware. The sooner the kernel is not involved, the easier everyone breathes. Instead of trying to build a framework/foundation and the building itself all at once, the framework itself is plenty valuable. The monitor gets the control of the kernel but without the dependence on hardware or the rigid interface to userspace. This partition presents a meaningfully different level of multiplexing, where the kernel and the monitor each play their own part. The monitor's view of the virtual hardware offered by the kernel is much improved.
Security and trust are not black and white, and the kernel itself should be open to adaptation. I could just implement seL4 or Barrelfish in the monitor instead, or diverge more and investigate the new tradeoffs. Capabilities are load-bearing here, too, so there is every reason to play around with them. How the capability system works will determine how the entire operating system works. (As an aside: I was pleased to notice that object capabilities have a close relation to Parnas-style modules, being their interfaces. But I think what object capabilities are can be played with too.) How might capabilities be stored, or accessed, more efficiently? I think there's definitely a lot of room for improvement there. Composite offers some ideas there, though I still lean towards Barrelfish's ideas. And I imagine specialized kernels, paired with userspace processes in their address spaces (like the "true kernel" and monitor), reifying the capabilities granted to those processes. Traditional microkernel wisdom could be interpreted as requiring as little code running in kernel space as is feasible. However, I have many other parameters I wish to allow people to optimize for, not even just performance, so I offer this: the core kernel will be minimal to the point that it hurts, and the monitor picks up the slack. Then, if security is paramount, only the obviously safe, minimally augmented kernels will be exported to other processes. Specialized kernels could be generated programmatically, coordinated on capabilities, and even restricted to only some processes. But for those who are willing, much more daring ventures can be tried. I even have the suspicion that one could place what amounts to Linux as a specialized kernel, as the ultimate mode of bona fide virtualization. No VirtualBox, no personality servers, or even syscall emulation. I wonder how hard the task would be.
Although I should probably learn more about user-mode Linux, and similar works in other operating systems (DragonFly BSD, and seemingly future-Redox?) to just run them in user space. That's still a pipe dream for now.
Having mentioned so much about seL4, and given this thread is originally about QNX, I should mention that I don't think my dream microkernel should put so much emphasis on kernel-facilitated message passing. I really am just offering a context switch this time. There isn't even a scheduler in the "true kernel". For all of the argumentation I've seen from the seL4 team for why any form of IPC less minimal than theirs is likely suspect, I don't see a good reason to not shoot seL4's IPC in the face too. Although some care is necessary, I could make it possible for seL4 IPC to be built exactly as-is, in the aspect of maximizing register use. The other main concern of seL4's IPC, that of threading (particularly thread priorities), I find even more suspect. No threads in my kernel either! I will take scheduler activations instead, please and thank you. I think people have been misguided into believing that "threads of execution" should be supported specifically by the kernel, when in reality, they are a much higher-level abstraction. The presence of an ongoing sequence of execution is another of those concepts that must be carefully captured in our design of software, a logical concept that informs how we should write code. Kernel threading is like supposing that a person on a smartphone doesn't view the multiple app boundary crossings and plethora of UI actions as one unified whole. The entire course must be mapped out, studied, and integrated. Kernel threading gives the illusion that we can manifest threads independently of programs, but the program determines the threading. Work instead from the hardware resources, the physical cores present, offering an interface above them, and meet the program as its developers distill its abstract formulation. The kernel's task is to bring the hardware from the bottom up to the developers, because that is necessarily how developers must interact with hardware. 
Otherwise, we really could invent more cores and memory to accommodate all those threads. Certainly, by removing threads from the kernel, I don't claim to have solved concurrency, or priority inversion, or anything like that. I merely want the hardware to be exposed as-is, but a bit friendlier, and people can build ever more friendly abstractions as they can and will, depending on the tradeoffs.
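To make the "no threads in the kernel" point concrete, here is a toy simulation in the spirit of scheduler activations. It is only a sketch under loud assumptions: the names `ToyKernel`, `UserScheduler`, `on_core_granted`, and `on_core_preempted` are invented for illustration, Python generators stand in for saved register contexts, and only two of the upcall events are modelled. The kernel side knows about cores and nothing else; every thread-shaped decision happens in the user-level scheduler.

```python
from collections import deque


class UserScheduler:
    """User-level scheduler: owns all thread state (hypothetical names throughout)."""

    def __init__(self):
        self.ready = deque()   # runnable user-level threads, as (name, generator) pairs
        self.running = {}      # core id -> thread currently placed on that core
        self.log = []          # record of which thread ran which step

    def spawn(self, name, steps):
        def body():
            for i in range(steps):
                self.log.append((name, i))
                yield          # yield point: control returns to the scheduler
        self.ready.append((name, body()))

    def on_core_granted(self, core):
        """Upcall: the kernel handed us a physical core; we choose what runs on it."""
        if self.ready:
            self.running[core] = self.ready.popleft()

    def on_core_preempted(self, core):
        """Upcall: the kernel took a core away; we decide the fate of its thread."""
        thread = self.running.pop(core, None)
        if thread is not None:
            self.ready.append(thread)

    def run_slice(self, core):
        """Run whatever is on `core` until its next yield point."""
        thread = self.running.get(core)
        if thread is None:
            return
        _name, gen = thread
        try:
            next(gen)
        except StopIteration:
            del self.running[core]        # thread finished
            self.on_core_granted(core)    # reuse the core for the next ready thread


class ToyKernel:
    """Multiplexes cores only: grant and revoke, with one upcall per event."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.granted = set()

    def grant(self, core):
        self.granted.add(core)
        self.scheduler.on_core_granted(core)

    def revoke(self, core):
        self.granted.discard(core)
        self.scheduler.on_core_preempted(core)


sched = UserScheduler()
kernel = ToyKernel(sched)
sched.spawn("a", 2)
sched.spawn("b", 2)

kernel.grant(0)      # scheduler places "a" on core 0
sched.run_slice(0)   # "a" runs its first step
kernel.revoke(0)     # core taken away; "a" is requeued behind "b"
kernel.grant(0)      # core returned; scheduler now places "b"
sched.run_slice(0)   # "b" runs its first step
```

Note that `ToyKernel` never sees a thread identifier or a priority. Whether the preempted thread goes to the back of the queue, the front, or somewhere else is entirely the program's policy, which is the tradeoff I am arguing for.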
All things should reside in their proper places. Push down accidental complexity, bring up the essential complexity, letting everything that bears the burden of supporting things above itself (chief among them are the primary multiplexers of the kernel and system services) only do so to the extent it needs to. In the kernel's case, being simply the trampoline between the hardware and the program, Liedtke's minimality principle is perfect. Putting anything else in the kernel can only be beneficial for performance, if even that, so the tradeoff is quite plain. Even trust is not gained; it may seem horrific to have a trusted userspace process such as the monitor, but really, does the first process of any operating system not have such privilege? My monitor simply has a more defined responsibility, but given that the kernel proper is naked, the overall trust has been preserved, I think. And so on, the investigation can go. In the end, I may make the edges somewhat sharper, but they were sharp to begin with, and I offer tools to dull them. But please do note if you disagree with my conclusions! This is still just my own thinking, developed without dialogue.
</rant>
I appreciate the effort that went into this rant a lot (I need to reread it after coffee has fully kicked in), and it touches some ideas I’ve accumulated over the years.
If you are still in a sharing mood, feel free to post links to interesting papers or proofs of concept in the space for further education.
EDIT: I quite like your idea of making the kernel unaware of threading, though I'm not sure how to go about implementing that. This is more radical than the other great idea of moving the scheduler and the concept of time(sharing) itself to userspace (I've seen a few talks about it on YT, I forget the name of the project that explored this avenue). So effectively ring 0 should only have to deal with enforcing capability security, while everything else lives in userspace.
I certainly can't claim to have discovered these ideas, though perhaps I am one of the earliest to propose gutting the kernel as heavily as I am (the riskiest manoeuvre out of my proposals).
> This is more radical than the other great idea of moving the scheduler and the concept of time(sharing) itself to userspace
The idea of userspace scheduling has been explored widely. Hydra took the plunge, but the L4 community is still reluctant. For good reason, since this typically increases latency on a latency-critical path. This is one of the strongest motivations for optimizing context switches by increasing kernel-user colocality in the same address space.
> I quite like your idea of making the kernel unaware of threading
See scheduler activations[0]. Even seL4 has kernel threads, which I think came about mainly out of habit, when the alternative would be better for formal verification.
> So effectively ring 0 should only have to deal with enforcing capability security, while everything else lives in userspace.
That's the idea, but in contrast to seL4 and Barrelfish, I think that implementing the capability system wholly in the kernel is very inflexible. The capability representations are rigid, which fixes (i.e., makes static) performance and fixes policy (all mechanisms restrict policy somewhat). It defies programmability. That's why I want to move most of the work to the trusted userspace process, though for the specific architecture I'm thinking of, it could be another module in kernelspace instead.
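As a rough illustration of what "most of the work in the trusted userspace process" could mean, here is a minimal sketch of a monitor that owns the capability tables. Everything in it is an assumption made for the example: the `Monitor` class, the per-process slot dictionary, the three rights bits, and the `"frame:0x1000"` object name are hypothetical, and this is neither seL4's CNode layout nor Barrelfish's capability representation. The point is only that the representation is ordinary userspace data, so it can be swapped without touching the kernel.

```python
from dataclasses import dataclass

# Hypothetical rights bits for the sketch.
READ, WRITE, GRANT = 1, 2, 4


@dataclass(frozen=True)
class Capability:
    obj: str      # name of the object the capability refers to
    rights: int   # bitmask of READ / WRITE / GRANT


class Monitor:
    """Trusted userspace process holding every capability table (illustrative only)."""

    def __init__(self):
        self.tables = {}   # pid -> {slot: Capability}

    def mint(self, pid, slot, obj, rights):
        """Install a fresh capability; only the monitor itself would call this."""
        self.tables.setdefault(pid, {})[slot] = Capability(obj, rights)

    def derive(self, src_pid, src_slot, dst_pid, dst_slot, rights):
        """Copy a capability to another process, never with more rights than the source."""
        cap = self.tables[src_pid][src_slot]
        if not cap.rights & GRANT:
            raise PermissionError("source capability lacks GRANT")
        if rights & ~cap.rights:
            raise PermissionError("cannot amplify rights")
        self.mint(dst_pid, dst_slot, cap.obj, rights)

    def check(self, pid, slot, needed):
        """Return True if the process holds all `needed` rights in that slot."""
        cap = self.tables.get(pid, {}).get(slot)
        return cap is not None and (cap.rights & needed) == needed


monitor = Monitor()
monitor.mint(1, 0, "frame:0x1000", READ | WRITE | GRANT)
monitor.derive(1, 0, 2, 0, READ)       # process 1 hands process 2 a read-only copy

can_read = monitor.check(2, 0, READ)    # True: the derived copy carries READ
can_write = monitor.check(2, 0, WRITE)  # False: WRITE was not passed on
```

Replacing the dictionary with a radix tree, a flat array, or per-core replicas would change performance and policy without changing what the kernel proper has to do, which is the flexibility I'm after.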
Further reading:
[0] https://dl.acm.org/doi/10.5555/2685048.2685051 | A Barrelfish paper that modularizes the kernel further, allowing superb flexibility such as easily swapping out the kernel running on a core
That all sounds very interesting, but it goes far beyond the scope of my current concern with microkernels. At the moment, I am satisfied with extending a well-documented approach such as Minix with seL4 (or other candidates) in such a way that the performance meets contemporary expectations. With regard to Parnas' approach, there is still considerable scope for possible solutions, as he has formulated conceptual ideas rather than prescriptions and policies. I can understand your fascination with new, previously untried approaches, but considering that Tanenbaum's book is already twenty years old, there is obviously a gap in the literature on the proven state of the art that should be closed.
This is mostly untrodden territory, true, but I think going to extremes is instructive for moderation. In this kernel or others, abstractions can be created to build back up to what we are used to, but going far afield is necessary to remake the core. I was also thinking about the process of proving out solutions, and my arguments largely appeal to the existing groundwork of seL4 and Barrelfish, though of course they are still fairly unproven. I use many other works as inspiration for smaller parts of the design, such as scheduler activations or Nemesis' self paging, and they're easier to apply to other OSes. It is great to see Linux gradually proving technologies that have long been imagined, such as increasing userspace agency in handling segfaults, and the longstanding question of modular schedulers. The nice thing about principles such as what Parnas described is that they are always worth considering. My ideas must be concretely implemented, but the true benefit will lie in how they inform our understanding of the abstract solution space. I'll be happy if my work leads to more microkernel-like work in Linux!