Comment by cakehonolulu
Hi! Author here! You can technically offload the transactions the real driver on your host does to wherever you want, really. PCI is very delay-tolerant and it usually negotiates with the device, so I don't see much of an issue doing that, provided you can manage the throughput efficiently across the architecture. The thing that kinda makes PCIem special is that you are pretty much free to do whatever you want with the accesses the driver does; you have total freedom.

I have made a simple NVMe controller (with a 1GB drive I basically malloc'd) which pops up on the local PCI bus, and the regular Linux nvme block driver attaches to it just fine. You can format it, mount it, create files, folders... it's kinda neat.

I also have a simple, dumb rasteriser that I made inside QEMU that I wanted to write a driver for, but since one doesn't exist, I used PCIem to help me redirect the driver writes to the QEMU instance hosting the card (and was thus able to run software-rendered DOOM and the OpenGL 1.x-based Quake and Half-Life ports).
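For the curious, the malloc'd-drive trick is conceptually tiny. Here's a minimal sketch of the idea in C; the hook names (`disk_init`, `handle_io`) are invented for illustration and are not PCIem's actual API:

```c
/* Sketch of a malloc-backed "drive": the media is plain heap memory,
 * so servicing a block read/write is just a bounds check and a memcpy.
 * The function names here are hypothetical, NOT PCIem's real API. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DISK_SIZE   (1ULL << 30)   /* 1 GiB of "media" */
#define BLOCK_SIZE  512ULL

static uint8_t *disk;              /* the entire drive, malloc'd */

int disk_init(void)
{
    disk = calloc(1, DISK_SIZE);   /* zero-filled, like a fresh disk */
    return disk ? 0 : -1;
}

/* Called (hypothetically) for each block I/O command the guest-visible
 * NVMe controller decodes from its submission queue. */
int handle_io(int is_write, uint64_t lba, uint32_t nblocks, void *buf)
{
    uint64_t off = lba * BLOCK_SIZE;
    uint64_t len = (uint64_t)nblocks * BLOCK_SIZE;

    if (off + len > DISK_SIZE)
        return -1;                         /* LBA out of range */
    if (is_write)
        memcpy(disk + off, buf, len);      /* driver -> "media" */
    else
        memcpy(buf, disk + off, len);      /* "media" -> driver */
    return 0;
}
```

Everything NVMe-specific (queues, doorbells, completions) sits above this layer, but once a command is decoded, the "storage" really is just that one allocation.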
Just to hijack this thread with a data point on how resilient PCIe is: PS4 Linux hackers ran PCIe over a UART serial connection to reverse-engineer the GPU. [0] [1]
[0] https://www.psdevwiki.com/ps4/PCIe
[1] https://fail0verflow.com/blog/2016/console-hacking-2016-post...
> PCI is very delay-tolerant
That fascinates me. Intel deserves a lot of credit for PCI. They built in future-proofing for use cases that wouldn't emerge for years, at a time when their bread and butter was PC processors and peripheral chips, and they could have done far less. The platform independence and general openness (PCI-SIG) are also notable for something that came out of 1990-era Intel.
Can one make a PCIe analyzer out of your code base by proxying all transactions through a virtual PCIem driver to a real driver?
You can definitely proxy the transactions wherever you see fit, but I'm not completely sure how that would work.
As in, PCIem is going to populate the bus with virtually the same card (at least in terms of capabilities, vendor/product ID... and whatnot), so I don't see how you'd then add another layer of indirection that can transparently forward the unfiltered transaction stream PCIem provides on to an actual PCIe card on the bus. I feel like there are many colliding responsibilities in this.
I would instead suggest having some sort of behavioural model (as in, a predefined set of data to feed from/to) and having PCIem log all the accesses your real driver does. That way the driver would have enough infrastructure not to crash, and at the same time you'd get the transport-layer information.
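To make that log-and-feed idea concrete, here's a rough sketch; the `mmio_access` callback shape is an assumption for illustration, not PCIem's actual interface:

```c
/* Behavioural-model sketch: answer the real driver's reads from a
 * canned register file and trace every access it performs.
 * The callback signature is hypothetical, not PCIem's real API. */
#include <stdint.h>
#include <stdio.h>

static uint32_t regs[4096 / 4];    /* predefined register state (BAR0) */

uint64_t mmio_access(int is_write, uint64_t offset, uint64_t val, unsigned size)
{
    offset &= 0xfff;               /* stay inside the 4 KiB model */

    /* Transport-layer trace: one line per driver access. */
    fprintf(stderr, "%s bar0+0x%03llx size=%u val=0x%llx\n",
            is_write ? "WR" : "RD",
            (unsigned long long)offset, size,
            (unsigned long long)val);

    if (is_write) {
        regs[offset / 4] = (uint32_t)val;  /* keep the model coherent */
        return 0;
    }
    return regs[offset / 4];               /* feed canned data back */
}
```

The canned register file keeps the driver from crashing on reads, while the trace gives you the access stream you'd want from an analyzer.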
Fantastic tool, thank you for making this. It is one of those things that you never knew you needed until someone took the time to put it together.
This is really interesting. Could it be used to carve up a host GPU for use in a guest VM?
Depends on the use case. For the standard hardware-accelerated guest GPU in virtualized environments, there's already QEMU's virtio-gpu device.[1]
For "carving up" there are technologies like SR-IOV (Single Root I/O Virtualization).[2]
For advanced usage, like prototyping new hardware (host driver), you could use PCIem to emulate a not-yet-existing SR-IOV-capable GPU. This would allow you to develop and test the host-side driver (the one that manages the VFs) in QEMU without needing the actual hardware (a sketch of the SR-IOV capability structure involved follows the references below).
Another advanced use case could be a custom vGPU solution: instead of SR-IOV, you could try to build a custom paravirtualized GPU from scratch. PCIem would let you design the low-level PCIe interface for this new device, while you write a corresponding driver on the guest. This would require significant effort, but it'd give you complete control.
[1] https://qemu.readthedocs.io/en/v8.2.10/system/devices/virtio...
[2] https://en.wikipedia.org/wiki/Single-root_input/output_virtu...
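For a feel of what the SR-IOV prototyping route involves: the emulated GPU would have to expose the SR-IOV extended capability in its config space so the host driver can enumerate and enable VFs. The field layout below follows the PCIe spec; how you'd actually register it with PCIem is my assumption:

```c
/* SR-IOV extended capability layout (per the PCIe spec) that an
 * emulated SR-IOV-capable GPU would need to present in config space.
 * Wiring it into PCIem is left as an assumption. */
#include <stdint.h>

#define PCI_EXT_CAP_ID_SRIOV  0x0010

struct sriov_ext_cap {
    uint32_t cap_header;            /* 0x00: ID 0x0010, version, next ptr */
    uint32_t sriov_caps;            /* 0x04: e.g. VF migration capable */
    uint16_t sriov_control;         /* 0x08: bit 0 = VF Enable */
    uint16_t sriov_status;          /* 0x0a */
    uint16_t initial_vfs;           /* 0x0c */
    uint16_t total_vfs;             /* 0x0e: max VFs the device offers */
    uint16_t num_vfs;               /* 0x10: VFs currently enabled */
    uint8_t  fn_dep_link;           /* 0x12 */
    uint8_t  rsvd0;
    uint16_t first_vf_offset;       /* 0x14: routing-ID offset of VF 1 */
    uint16_t vf_stride;             /* 0x16: routing-ID spacing of VFs */
    uint16_t rsvd1;
    uint16_t vf_device_id;          /* 0x1a: device ID the VFs report */
    uint32_t supported_page_sizes;  /* 0x1c */
    uint32_t system_page_size;      /* 0x20 */
    uint32_t vf_bar[6];             /* 0x24: BARs shared by all VFs */
    uint32_t vf_migration_state;    /* 0x3c: migration state array offset */
} __attribute__((packed));
```

The host driver (via the kernel's SR-IOV core) reads TotalVFs, writes NumVFs, and flips VF Enable; getting those transitions right is exactly the kind of thing you'd want to exercise in QEMU before real silicon exists.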
As in, getting the PCIem shim to show up in a VM (like passthrough)? If that's what you're asking, then yes; it's something being explored currently. The main challenges come from the subsystem that has to "unbind" the device from the host and do the reconfiguration (IOMMU, interrupt routing... and whatnot). But from my initial digging, it doesn't look like an impossible task (the stock host-side unbind sequence is sketched below).
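For reference, the host-side "unbind" mentioned above is usually the standard sysfs dance for handing a PCI device to vfio-pci; nothing in this sketch is PCIem-specific, and the device address is just an example:

```c
/* Stock Linux sequence for detaching a PCI device from its host driver
 * and handing it to vfio-pci via sysfs. The BDF below is an example. */
#include <stdio.h>

static int sysfs_write(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(val, f) >= 0;
    fclose(f);
    return ok ? 0 : -1;
}

int main(void)
{
    const char *bdf = "0000:01:00.0";   /* example device address */
    char path[256];

    /* 1. Detach the device from whatever host driver currently owns it. */
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver/unbind", bdf);
    sysfs_write(path, bdf);

    /* 2. Force the next probe to pick vfio-pci for this device. */
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver_override", bdf);
    sysfs_write(path, "vfio-pci");

    /* 3. Reprobe so vfio-pci actually binds. */
    sysfs_write("/sys/bus/pci/drivers_probe", bdf);
    return 0;
}
```

The IOMMU grouping and interrupt-routing reconfiguration happen inside the kernel once vfio-pci takes ownership, which is where the hard part of doing this for a PCIem-backed device would live.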
> carve up
Passthru or time sharing? The latter is difficult because you need something to manage the timeslices and enforce process isolation. I'm no expert, but I understand it to be somewhere between nontrivial and unrealistic without GPU vendor cooperation.
Note that the GPU vendors all deliberately gate this feature behind higher product tiers as part of their market segmentation.
It would need to implement a few dozen ioctls, correctly stub the kernel module in guests, do a (hopefully memory-safe) assignment of GPU memory to guests, and then ultimately map all of that onto the BAR/MSI-X semantics of a real kernel module. You could get VFIO up pretty fast as a first pass by correctly masking the LTR bits, but to truly make it free you'd need a userspace io_uring broker that had survived hundreds of millions of adversarial fuzz runs, because there's only so fast the firmware blob can run even if it's preloaded into the initramfs.
Serious work, detail-intensive, but not so different in design from, e.g., Carmack's Trinity engine. Doable.