Comment by Veserv

9 days ago

You can pretty reliably do it on the order of 1 µs on a modern desktop processor. If you use a level 2 mapping table entry of, say, 2 MB, that is an effective transfer speed on the order of 2 TB/s, or ~32x faster than RAM for a single core, even if you only move a single level 2 entry. If you transfer multiple entries in one go, or use a level 3 mapping table entry of 1 GB, that would be about 1 PB/s: ~16,000x faster than RAM, or ~200x the full memory bandwidth of an entire H200 GPU.
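
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 1 µs remap time and the page sizes come from the comment; the 64 GB/s single-core RAM figure is only what the "~32x" ratio implies, and the 4.8 TB/s H200 figure is NVIDIA's published peak HBM bandwidth. Both baselines are assumptions for illustration, not measurements.

```python
# Back-of-envelope: "bandwidth" of moving data by remapping a page table entry.
# Assumptions (not measured): 1 us per remap, 64 GB/s single-core RAM copy
# bandwidth (implied by the ~32x ratio), 4.8 TB/s H200 peak HBM bandwidth.

REMAP_SECONDS = 1e-6          # time to swap one mapping table entry
RAM_BYTES_PER_SEC = 64e9      # assumed single-core RAM bandwidth
H200_BYTES_PER_SEC = 4.8e12   # assumed H200 peak memory bandwidth

LEVEL2_BYTES = 2 * 2**20      # 2 MB region covered by one level 2 entry
LEVEL3_BYTES = 1 * 2**30      # 1 GB region covered by one level 3 entry


def effective_bandwidth(bytes_remapped, seconds=REMAP_SECONDS):
    """Bytes 'moved' per second when the move is a single remap."""
    return bytes_remapped / seconds


if __name__ == "__main__":
    for name, size in (("level 2 (2 MB)", LEVEL2_BYTES),
                       ("level 3 (1 GB)", LEVEL3_BYTES)):
        bw = effective_bandwidth(size)
        print(f"{name}: {bw / 1e12:.1f} TB/s, "
              f"{bw / RAM_BYTES_PER_SEC:.0f}x RAM, "
              f"{bw / H200_BYTES_PER_SEC:.1f}x H200")
```

The remap cost stays roughly constant while the region it covers grows, so the ratio scales directly with the entry size; a slower remap or a faster RAM baseline shifts the numbers proportionally.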