Comment by smj-edison

9 days ago

How performant is that in practice? I thought changing page mappings was a fairly expensive operation. A statically mapped circular buffer makes more sense, to me at least.

Disclaimer: I don't actually know what I'm talking about, lol

To be clear, since the other replies to you don't seem to mention it: the major cost of MMU page-based virtual memory is never in writing the page table entries. Whenever you remap, it is the TLB shootdowns and the TLB misses that follow which hurt. Page remapping is still very useful for large buffers, and those other costs can be controlled depending on intended usage, but smaller buffers should use other methods.

(Of course I'm being vague about the cutoff for "large" and "smaller" buffers. Always benchmark!)
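One rough way to look for that cutoff on a given machine is to time mapping a region against copying the same number of bytes. The sketch below is only a stand-in, under several assumptions: it uses `mmap` on a temporary file rather than a true in-place remap, it runs in a single thread (so it sees page faults but no cross-core TLB shootdowns), and Python call overhead will dominate at small sizes. The helper names `seconds_per_call` and `map_vs_copy` are made up for illustration.

```python
import mmap
import tempfile
import time

def seconds_per_call(fn, repeats=50):
    """Average wall-clock seconds for one call of fn."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

def map_vs_copy(size):
    """Return (seconds to map + touch + unmap, seconds to copy) for size bytes."""
    src = bytearray(size)
    dst = bytearray(size)
    with tempfile.TemporaryFile() as backing:
        backing.truncate(size)  # sparse file used as the backing store

        def remap():
            with mmap.mmap(backing.fileno(), size) as view:
                view[::mmap.PAGESIZE]  # read one byte per page to force page faults

        def copy():
            dst[:] = src  # plain memory copy of the same number of bytes

        return seconds_per_call(remap), seconds_per_call(copy)

if __name__ == "__main__":
    for size in (64 * 1024, 2 * 1024 * 1024, 16 * 1024 * 1024):
        t_map, t_copy = map_vs_copy(size)
        print(f"{size >> 10:>6} KiB  map: {t_map * 1e6:9.1f} us  copy: {t_copy * 1e6:9.1f} us")
```

A realistic test would be written in C against the actual remapping call and would include a second thread touching the buffer, since that is where the shootdown cost shows up.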

You can pretty reliably do it on the order of 1 µs on a modern desktop processor. If you use a level 2 page table entry, which maps say 2 MB, that is an effective transfer speed on the order of 2 TB/s, or ~32x faster than RAM for a single core, even if you only move a single such entry. If you move several in one go, or use a level 3 entry mapping say 1 GB, that would be 1 PB/s: ~16,000x faster than RAM, or ~200x the full memory bandwidth of an entire H200 GPU.
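The arithmetic behind those figures can be checked in a few lines. This is a back-of-the-envelope sketch, not a measurement: the 1 µs remap time is the claim above, the 64 GB/s single-core RAM figure is an assumption inferred from the "~32x" ratio, and 4.8 TB/s is NVIDIA's published H200 memory bandwidth. Sizes are decimal, as in the comment; the real page sizes (2 MiB and 1 GiB) are slightly larger.

```python
REMAP_TIME_S = 1e-6          # ~1 us per remap, as claimed above
RAM_SINGLE_CORE_BPS = 64e9   # assumption: implied by "2 TB/s is ~32x RAM"
H200_BPS = 4.8e12            # assumption: NVIDIA's published H200 bandwidth

def effective_bandwidth(bytes_remapped, seconds=REMAP_TIME_S):
    """Bytes that change hands per second when one remap replaces a copy."""
    return bytes_remapped / seconds

for label, size in (("2 MB entry", 2e6), ("1 GB entry", 1e9)):
    bw = effective_bandwidth(size)
    print(f"{label}: {bw / 1e12:,.0f} TB/s, "
          f"{bw / RAM_SINGLE_CORE_BPS:,.0f}x single-core RAM, "
          f"{bw / H200_BPS:,.1f}x H200")
```

Under these assumptions the 2 MB case comes out at about 31x single-core RAM, and the 1 GB case at about 15,600x RAM and a bit over 200x an H200.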

Pretty quick, far faster than an inter-process memory copy. The only way to be sure would be to set it up and measure it, but on a 486/33 I could do this ~200K times per second. On modern systems it should be a lot faster than that, more so if the process(es) do not use FP. I never actually tried setting up, say, a /dev/null implementation that used this, though; it would be an interesting experiment.