Comment by jeffbee
4 hours ago
Of course, it only helps workloads that exhibit high rates of page table walking per instruction. But those are really common.
Yes, I understand that; a high TLB miss rate is implied. However, I'm wondering whether the penalty, which we can quantify as up to 4 extra memory accesses for a 4-level page table (roughly 20 cycles if the page-table entries are already in L1 cache, or 60-200 cycles if they come from L2/L3), would be noticeable in workloads that are IO bound. In other words, would such workloads benefit from switching to huge pages when the CPU mostly sits waiting for data to arrive from storage anyway?
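A rough back-of-envelope model of the question, as a sketch. All numbers here (misses per kilo-instruction, walk latency, the 5% on-CPU fraction) are illustrative assumptions, not measurements, and `walk_overhead` is a hypothetical helper:

```python
# Back-of-envelope model of page-table-walk overhead.
# All inputs are assumed, illustrative numbers.

def walk_overhead(tlb_misses_per_kilo_insn, walk_cycles, cycles_per_insn=1.0):
    """Fraction of total CPU cycles spent in page-table walks."""
    walk_cycles_per_insn = (tlb_misses_per_kilo_insn / 1000.0) * walk_cycles
    return walk_cycles_per_insn / (cycles_per_insn + walk_cycles_per_insn)

# Assume 20 TLB misses per 1000 instructions and a ~60-cycle walk
# (page-table entries coming from L2/L3):
on_cpu_overhead = walk_overhead(20, 60)
print(f"share of CPU cycles in walks: {on_cpu_overhead:.1%}")

# For an IO-bound process, scale by its on-CPU fraction of wall time
# (assumed 5% here): the wall-clock impact shrinks accordingly.
cpu_fraction = 0.05
print(f"share of wall time in walks: {cpu_fraction * on_cpu_overhead:.1%}")
```

Under these assumed numbers, walks can dominate the on-CPU cycles yet still be a small slice of wall time for an IO-bound process, which is exactly the tension the question raises (and which the multi-tenant argument below answers from the scheduler's point of view).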
In a multi-tenant environment, yes. The faster they can get off the CPU and yield to some other tenant, the better it is.