As in, getting the PCIem shim to show up on a VM (Like, passthrough)? If that's what you're asking for, then; it's something being explored currently. Main challenges come from the subsystem that has to "unbind" the device from the host and do the reconfiguration (IOMMU, interrupt routing... and whatnot). But from my initial gatherings, it doesn't look like an impossible task.
Passthru or time sharing? The latter is difficult because you need something to manage the timeslices and enforce process isolation. I'm no expert but I understand it to be somewhere between nontrivial and not realistic without GPU vendor cooperation.
Note that the GPU vendors all deliberately include this feature as part of their market segmentation.
It would need to implement a few dozen ioctls, correctly stub the kernel module in guests, do a probably memory-safe assignment of GPU memory to guest, and then ultimately map that info to BAR/MSI-X semantics of a real kernel module. You could get VFIO pretty fast for a full start by correctly masking LTR bits, but to truly make it free you'd need a user space io_uring broker that had survived hundreds of millions of adversarial fuzz runs because there's only so fast the firmware blob can run even if it's preloaded into initramfs.
Serious work, detail intense, but not so different in design to e.g. Carmack's Trinity engine. Doable.
As in, getting the PCIem shim to show up on a VM (Like, passthrough)? If that's what you're asking for, then; it's something being explored currently. Main challenges come from the subsystem that has to "unbind" the device from the host and do the reconfiguration (IOMMU, interrupt routing... and whatnot). But from my initial gatherings, it doesn't look like an impossible task.
> carve up
Passthru or time sharing? The latter is difficult because you need something to manage the timeslices and enforce process isolation. I'm no expert but I understand it to be somewhere between nontrivial and not realistic without GPU vendor cooperation.
Note that the GPU vendors all deliberately include this feature as part of their market segmentation.
It would need to implement a few dozen ioctls, correctly stub the kernel module in guests, do a probably memory-safe assignment of GPU memory to guest, and then ultimately map that info to BAR/MSI-X semantics of a real kernel module. You could get VFIO pretty fast for a full start by correctly masking LTR bits, but to truly make it free you'd need a user space io_uring broker that had survived hundreds of millions of adversarial fuzz runs because there's only so fast the firmware blob can run even if it's preloaded into initramfs.
Serious work, detail intense, but not so different in design to e.g. Carmack's Trinity engine. Doable.