Comment by Vegemeister
1 year ago
>For example, your average CPU might look fine at 50%, but in truth you're using 200% for 500ms followed by 0% for 500ms, and when CPU is scarce your latency unexpectedly doubles.
That is exactly the behavior that cgroups' cpu.max has, except that with the default 100ms period it'd be 50ms instead of 500.
The problem with cpu.max is that people want a "50%" CPU limit to make the kernel force-idle your threads in the same timeslice size you'd get with something else competing for the other 50% of the CPU, but that is not actually what cpu.max does. Perhaps that is what it should do, but unfortunately the `echo $max_runtime_us $period_us >cpu.max` interface is UAPI (both values are absolute, in microseconds). Although, I don't know if anyone would complain if one day the kernel started interpreting that as a rational fraction and ignoring the absolute values of the numbers.
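To make the granularity point concrete, here's a rough sketch (my own illustration, not anything from the kernel docs) of two settings that are both "50%" but throttle on very different timescales; the cgroup name is made up, and writing cpu.max needs an existing v2 cgroup and sufficient privileges:

```c
/*
 * Illustration only: two ways to express a "50%" limit via cgroup v2's
 * cpu.max.  Same ratio, but the period controls how long the group can
 * run before being force-idled for the rest of the period.  Both values
 * are in microseconds.  The "demo" cgroup is hypothetical.
 */
#include <stdio.h>

static void set_cpu_max(const char *cgroup, long quota_us, long period_us)
{
    char path[256];
    snprintf(path, sizeof path, "/sys/fs/cgroup/%s/cpu.max", cgroup);
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return; }
    fprintf(f, "%ld %ld\n", quota_us, period_us);
    fclose(f);
}

int main(void)
{
    /* 50% with the default 100ms period: up to 50ms of running, then
     * throttled for the remainder of each 100ms window. */
    set_cpu_max("demo", 50000, 100000);

    /* Same 50% ratio with a 10ms period: at most ~5ms of forced idling
     * at a time, at the cost of more frequent throttling overhead. */
    set_cpu_max("demo", 5000, 10000);
    return 0;
}
```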
This makes me really want to write a program that RDTSCs in a loop into an array, and then autocorr(diff()) the result. That'd probably expose all kinds of interesting things about scheduler timeslices, frequency scaling, and TSC granularity.
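A minimal sketch of what that measurement loop could look like, assuming an x86 machine with an invariant TSC and GCC/Clang's `__rdtsc()`; the lag axis is in samples rather than time, so you'd scale N (and the lag range) way up to see anything on scheduler timescales:

```c
/*
 * Sketch of the idea above: sample the TSC in a tight loop, diff() the
 * samples, then autocorrelate the diffs.  Periodic interruptions
 * (preemption, timer ticks, frequency transitions) show up as recurring
 * large deltas, i.e. peaks in the autocorrelation.
 */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>

#define N       (1 << 20)   /* TSC samples; scale up to cover longer timescales */
#define MAX_LAG 1024        /* autocorrelation lags to report (in samples) */

int main(void)
{
    static uint64_t ts[N];
    static double d[N - 1];

    for (size_t i = 0; i < N; i++)
        ts[i] = __rdtsc();

    /* diff(): per-iteration cost in TSC ticks */
    double mean = 0.0;
    for (size_t i = 0; i < N - 1; i++) {
        d[i] = (double)(ts[i + 1] - ts[i]);
        mean += d[i];
    }
    mean /= N - 1;

    /* autocorr(): normalized autocorrelation of the mean-removed diffs */
    double var = 0.0;
    for (size_t i = 0; i < N - 1; i++)
        var += (d[i] - mean) * (d[i] - mean);

    for (size_t lag = 1; lag <= MAX_LAG; lag++) {
        double acc = 0.0;
        for (size_t i = 0; i + lag < N - 1; i++)
            acc += (d[i] - mean) * (d[i + lag] - mean);
        printf("%zu %f\n", lag, acc / var);
    }
    return 0;
}
```

Pinning it to one core (e.g. `taskset -c 0`) and running a competing load alongside it would make the scheduler's force-idle pattern easier to pick out.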
Yes, in that scenario of 500ms of 200% CPU for a request/response type workload, roughly 50% of responses will have an extra ~25ms of response time tacked on, because the group exhausts its quota partway through each 100ms scheduling period and anything arriving while it's force-idled has to wait for the next quota refresh.
This goes into detail: https://docs.kernel.org/scheduler/sched-bwc.html
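For what it's worth, the 25ms/50% figures fall out of simple arithmetic if you assume the limit is set to one full CPU's worth of runtime (the long-run average of that 200%/0% pattern) over the default 100ms period; those parameters are my assumption, not stated above:

```c
/* Back-of-the-envelope version of the numbers above, under assumed
 * parameters: quota = 1 CPU's worth per default 100ms period, and the
 * burst running on 2 CPUs. */
#include <stdio.h>

int main(void)
{
    double period_ms  = 100.0;   /* default cpu.max period */
    double quota_ms   = 100.0;   /* 1 CPU's worth of runtime per period */
    double burst_cpus = 2.0;     /* workload bursts at 200% */

    double run_ms      = quota_ms / burst_cpus;   /* 50ms running per period */
    double throttle_ms = period_ms - run_ms;      /* 50ms force-idled */

    /* Requests arriving during the throttled window wait, on average,
     * half of it for the next quota refresh. */
    printf("affected requests:  %.0f%%\n", 100.0 * throttle_ms / period_ms);
    printf("avg extra latency:  %.0f ms\n", throttle_ms / 2.0);
    return 0;
}
```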