Comment by m132

1 day ago

I see this "hardware isolation" benefit of virtual machines brought up a lot, but if you look a little deeper into it, putting that label exclusively on VMs is very much unfair.

Just like containers, VMs are very loosely defined and, under the hood, composed of mechanisms that can be used in isolation (paging, trapping, IOMMU vs individual cgroups and namespaces). It's those mechanisms that give you the actual security benefits.

And most of them are used outside of VMs, to isolate processes on a bare kernel. The system call/software interrupt trapping and "regular" virtual memory of gVisor (or even a bare Linux kernel) are just as much of a "hardware boundary" as the hyper calls and SLAT virtual memory are in the case of VMs, just without the hacks needed to make the isolated side believe it's in control of real hardware. One traps into Sentry, the other traps into QEMU, but ultimately, both are user-space processes running on the host kernel. And they themselves are isolated, using the same very primitives, by the host kernel.

As you clarified here, the real difference lies in what's on the other side of these boundaries. gVisor will probably have some more overhead, at least in the systrap mode, as every trapped call has to go through the host kernel's dispatcher before landing in Sentry. QEMU/KVM has this benefit of letting the guest's user-space call the guest kernel directly, and only the kernel typically can then call QEMU. The attack surface, too, differs a lot in both cases. gVisor is a niche Google project, KVM is a business-critical component of many public cloud providers.

It may sound like I'm nitpicking, but I believe that it's important to understand this to make an informed decision and avoid the mistake of stacking up useless layers, as it is plaguing today's software engineering.

Thanks for your reply and post by the way! I was looking for something like gVisor.