
Comment by gigatexal

5 hours ago

This is really interesting. Could it be used to carve up a host GPU for use in a guest VM?

Depends on the use-case. For the standard hardware-accelerated guest GPU in virtualized environments, there's already QEMU's virtio-gpu device.[1]
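For illustration, here's a minimal sketch of a QEMU command line that attaches a virtio-gpu device to a guest. The helper name `virtio_gpu_args`, the disk image name, and the memory size are all made up for the example; `virtio-vga-gl` and `-display gtk,gl=on` are the options described in the QEMU virtio-gpu docs, but whether the GL variant is available depends on how your QEMU was built.

```python
import shlex


def virtio_gpu_args(disk_image, gl=True, memory_mb=4096):
    """Build a QEMU argv list for a guest with a virtio-gpu device.

    Hypothetical helper: only the -device/-display values come from the
    QEMU docs; the remaining flags are ordinary placeholders.
    """
    # The *-gl device variant enables the OpenGL (virgl) path; the plain
    # variant gives a 2D-only paravirtualized display.
    device = "virtio-vga-gl" if gl else "virtio-vga"
    display = "gtk,gl=on" if gl else "gtk"
    return [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", str(memory_mb),
        "-drive", f"file={disk_image},format=qcow2,if=virtio",
        "-device", device,
        "-display", display,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(virtio_gpu_args("guest.qcow2")))
```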

For "carving up" there are technologies like SR-IOV (Single Root I/O Virtualization).[2]
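On Linux, an SR-IOV-capable device is split into virtual functions through the `sriov_totalvfs` and `sriov_numvfs` sysfs attributes of the physical function. A rough sketch, assuming the device and its host driver actually support SR-IOV (the function name `enable_vfs` and the example PCI address are hypothetical, and writing to sysfs needs root):

```python
from pathlib import Path

SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")


def enable_vfs(bdf, count, sysfs_root=SYSFS_PCI_DEVICES):
    """Ask the physical function at `bdf` to expose `count` virtual functions.

    `sysfs_root` is a parameter only so the logic can be exercised against
    a fake directory tree; on a real host the default is what you want.
    """
    dev = Path(sysfs_root) / bdf
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= count <= total:
        raise ValueError(f"{bdf} supports at most {total} VFs, asked for {count}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current == count:
        return count
    if current != 0:
        # The kernel rejects going from one non-zero VF count to another,
        # so tear the existing VFs down first.
        numvfs.write_text("0")
    if count:
        numvfs.write_text(str(count))
    return count


if __name__ == "__main__":
    # Placeholder address; substitute the PF's real domain:bus:dev.fn.
    enable_vfs("0000:03:00.0", 2)
```

Each VF then shows up as its own PCI function that can be handed to a guest.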

For advanced usage, like prototyping new hardware (host driver), you could use PCIem to emulate a not-yet-existing SR-IOV-capable GPU. This would allow you to develop and test the host-side driver (the one that manages the VFs) in QEMU without needing the actual hardware.

Another advanced use-case could be a custom vGPU solution: instead of SR-IOV, you could try to build a custom paravirtualized GPU from scratch. PCIem would let you design the low-level PCIe interface for this new device, while you write a corresponding driver in the guest. This would require significant effort, but it'd give you complete control.

[1] https://qemu.readthedocs.io/en/v8.2.10/system/devices/virtio...

[2] https://en.wikipedia.org/wiki/Single-root_input/output_virtu...

As in, getting the PCIem shim to show up in a VM (i.e. passthrough)? If that's what you're asking, it's something that's currently being explored. The main challenges come from the subsystem that has to "unbind" the device from the host and do the reconfiguration (IOMMU, interrupt routing, and so on). But from my initial investigation, it doesn't look like an impossible task.
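For reference, this is roughly what the "unbind" step looks like for an ordinary PCI device being handed to `vfio-pci` through sysfs. It's a sketch of the generic Linux flow only: whether a PCIem-backed device can go through it is exactly the open question above, and the function name `rebind_to_vfio` and the example address are hypothetical. It assumes the `vfio-pci` module is loaded, the IOMMU is enabled, and you're running as root.

```python
from pathlib import Path


def rebind_to_vfio(bdf, sysfs_root="/sys/bus/pci"):
    """Detach `bdf` from its host driver and hand it to vfio-pci.

    Returns the list of steps performed, for logging. `sysfs_root` is a
    parameter only so the sequence can be tried against a fake tree.
    """
    root = Path(sysfs_root)
    dev = root / "devices" / bdf
    steps = []

    # 1. Unbind from whatever host driver currently owns the device.
    unbind = dev / "driver" / "unbind"
    if unbind.exists():
        unbind.write_text(bdf)
        steps.append("unbind")

    # 2. Pin the device to vfio-pci so no other driver claims it on reprobe.
    (dev / "driver_override").write_text("vfio-pci")
    steps.append("override")

    # 3. Ask the PCI core to probe the device again; vfio-pci picks it up.
    (root / "drivers_probe").write_text(bdf)
    steps.append("probe")
    return steps


if __name__ == "__main__":
    # Placeholder address; substitute the real domain:bus:dev.fn.
    print(rebind_to_vfio("0000:03:00.0"))
```

The IOMMU group and interrupt remapping setup happen behind this in the kernel, which is where most of the difficulty lives.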

> carve up

Passthru or time sharing? The latter is difficult because you need something to manage the timeslices and enforce process isolation. I'm no expert but I understand it to be somewhere between nontrivial and not realistic without GPU vendor cooperation.

Note that the GPU vendors all deliberately include this feature as part of their market segmentation.

  • It would need to implement a few dozen ioctls, correctly stub the kernel module in guests, do a probably memory-safe assignment of GPU memory to guest, and then ultimately map that info to BAR/MSI-X semantics of a real kernel module. You could get VFIO pretty fast for a full start by correctly masking LTR bits, but to truly make it free you'd need a user space io_uring broker that had survived hundreds of millions of adversarial fuzz runs because there's only so fast the firmware blob can run even if it's preloaded into initramfs.

    Serious work, detail intense, but not so different in design to e.g. Carmack's Trinity engine. Doable.