Comment by jeffbee
4 hours ago
Of course, it only helps workloads that exhibit high rates of page table walking per instruction. But those are really common.
Yes, I understand that; a high TLB miss rate is implied. However, I'm wondering whether the penalty, which we can quantify as up to 4 extra memory accesses for a 4-level page table (roughly 20 cycles if the page-table entries are already in L1 cache, or 60-200 cycles if they come from L2/L3), would be noticeable in workloads that are IO bound. In other words, would such workloads benefit from switching to huge pages when the CPU mostly sits waiting for data to arrive from storage anyway?
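A rough back-of-envelope model of the question, as a sketch. All numbers here (misses per kilo-instruction, walk latency, the 5% on-CPU fraction) are illustrative assumptions, not measurements, and `walk_overhead` is a hypothetical helper:

```python
# Back-of-envelope model of page-table-walk overhead.
# All inputs are assumed, illustrative numbers.

def walk_overhead(tlb_misses_per_kilo_insn, walk_cycles, cycles_per_insn=1.0):
    """Fraction of total CPU cycles spent in page-table walks."""
    walk_cycles_per_insn = (tlb_misses_per_kilo_insn / 1000.0) * walk_cycles
    return walk_cycles_per_insn / (cycles_per_insn + walk_cycles_per_insn)

# Assume 20 TLB misses per 1000 instructions and a ~60-cycle walk
# (page-table entries coming from L2/L3):
on_cpu_overhead = walk_overhead(20, 60)
print(f"share of CPU cycles in walks: {on_cpu_overhead:.1%}")

# For an IO-bound process, scale by its on-CPU fraction of wall time
# (assumed 5% here): the wall-clock impact shrinks accordingly.
cpu_fraction = 0.05
print(f"share of wall time in walks: {cpu_fraction * on_cpu_overhead:.1%}")
```

Under these assumed numbers, walks can dominate the on-CPU cycles yet still be a small slice of wall time for an IO-bound process, which is exactly the tension the question raises (and which the multi-tenant argument below answers from the scheduler's point of view).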
In a multi-tenant environment, yes. The faster they can get off the CPU and yield to some other tenant, the better it is.