Comment by ec109685

1 year ago

In an ideal world, it's far better not to use Limits and instead have applications set only their CPU requests. That way, if the system has CPU available, applications can use more than they requested (and won't get throttled), but if CPU becomes saturated, the kernel ensures no process gets more than its fair share.
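The "fair share" part can be sketched as simple arithmetic: under contention, CFS divides the CPU among cgroups in proportion to their weights (which Kubernetes derives from requests). A toy model, not the kernel's actual algorithm:

```python
def cpu_shares(weights: dict[str, float], capacity: float) -> dict[str, float]:
    """Split `capacity` cores among groups in proportion to their weights,
    mimicking how CFS divides a saturated node among cgroup weights."""
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}

# A 2-CPU requester and a 62-CPU requester on a saturated 64-core box:
shares = cpu_shares({"redis": 2.0, "batch": 62.0}, capacity=64.0)
print(shares)  # redis is guaranteed ~2 cores' worth even under full contention
```

When the node isn't saturated, either group can use the idle capacity; the weights only matter once demand exceeds supply.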

Unfortunately, in practice, without Limits, noisy neighbors can interfere with well-behaved apps. For example, on a 64-core machine, if one process requests 2 CPUs and another process is using all of the remaining cores, the 2-CPU process's share will not be perfectly consistent, and for latency-sensitive apps (like redis) you'll see response times fluctuate.

For extremely latency-sensitive applications, it's probably better to use newer Kubernetes features to pin them to particular CPUs. That way, their latency shouldn't be affected by noisy neighbors, while the remaining apps fight over the rest of the host's CPUs.

With Limits, unless you can guarantee your app will never use more than its assigned maximum CPU, any temporary burst of CPU utilization will hit throttling (your app sleeps until the next scheduling period), which can destroy p95 response times. Having an app essentially melt down while the box has gobs of CPU available is never fun.
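The damage from "sleep until the next scheduling period" is easy to put numbers on. A back-of-the-envelope sketch, assuming the default 100 ms CFS period and a single-threaded burst:

```python
import math

def wall_time_ms(cpu_needed_ms: float, quota_ms: float, period_ms: float = 100.0) -> float:
    """Wall-clock time for a single-threaded burst under a CFS quota:
    the task runs for quota_ms, then is force-idled until the next period."""
    full_periods = math.ceil(cpu_needed_ms / quota_ms) - 1
    remainder = cpu_needed_ms - full_periods * quota_ms
    return full_periods * period_ms + remainder

# A 200 ms burst under a 0.5-CPU limit (50 ms quota per 100 ms period):
print(wall_time_ms(200, 50))  # 350.0 -- nearly double the unthrottled 200 ms
```

The burst's latency inflates even though the node may be otherwise idle, which is exactly the p95 destruction described above.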

The other problem with not setting limits is that it's very easy to routinely use more than your requests, and you won't know you're misconfigured until the day you get a noisy neighbor and receive only what you asked for.

Monitoring helps, but requires some nuance. For example, your average CPU might look fine at 50%, but in truth you're using 200% for 500 ms followed by 0% for 500 ms, and when CPU becomes scarce your latency unexpectedly doubles.
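The arithmetic behind that masking effect, assuming (illustratively) the 50% figure is utilization relative to a 2-CPU request:

```python
# A pod that requests 2 CPUs, sampled every 100 ms over one second:
# it burns both cores for 500 ms, then sits idle for 500 ms.
usage_cores = [2.0] * 5 + [0.0] * 5
request_cores = 2.0

avg_util = sum(usage_cores) / len(usage_cores) / request_cores
peak_util = max(usage_cores) / request_cores
print(f"average: {avg_util:.0%} of request")  # 50% -- looks healthy on a dashboard
print(f"peak:    {peak_util:.0%} of request")  # 100% -- fully saturated during bursts
```

Any dashboard averaging over seconds or minutes will show the comfortable 50% and hide the saturation entirely; only sub-second sampling or throttling counters reveal it.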

While it doesn't eliminate the problem entirely (as you rightly point out), enforcing limits even when there's excess CPU available mostly ensures that your performance doesn't suddenly change due to outside factors, which IMO is more valuable than having higher performance most-but-not-all of the time.

  • >For example, your average CPU might look fine at 50%, but in truth you're using 200% for 500ms followed by 0% for 500ms, and when CPU is scarce your latency unexpectedly doubles.

    That is exactly the behavior that cgroups' cpu.max produces, except the burst would have to be 50 ms instead of 500 with the default 100 ms period.

    The problem with cpu.max is that people want a "50%" CPU limit to make the kernel force-idle your threads in the same timeslice sizes you'd get with something else competing for the other 50% of the CPU, but that is not actually what cpu.max does. Perhaps that is what it should do, but unfortunately, the `echo $max_us $period_us >cpu.max` interface (both values in microseconds) is UAPI. Although, I don't know if anyone would complain if one day the kernel started interpreting that as a rational fraction and ignoring the absolute values of the numbers.

    This makes me really want to write a program that RDTSCs in a loop into an array and then runs autocorr(diff()) on the result. That would probably expose all kinds of interesting things about scheduler timeslices, frequency scaling, and TSC granularity.
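The timestamp-loop idea above can be sketched portably, using `time.perf_counter_ns()` as a stand-in for raw RDTSC (the autocorrelation here is a plain textbook estimator, not a tuned one):

```python
import time

# Sample a high-resolution clock in a tight loop, then look at the gaps.
# Spikes in the diffs are moments the thread was off-CPU: scheduler
# timeslices, throttling, or clock/frequency transitions.
N = 50_000
ts = [0] * N
for i in range(N):
    ts[i] = time.perf_counter_ns()

diffs = [b - a for a, b in zip(ts, ts[1:])]

def autocorr(xs: list[int], lag: int) -> float:
    """Naive lag-k autocorrelation of a sequence."""
    mean = sum(xs) / len(xs)
    num = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(len(xs) - lag))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

print("median gap ns:", sorted(diffs)[len(diffs) // 2])
print("max gap ns:   ", max(diffs))
print("lag-1 autocorr:", round(autocorr(diffs, 1), 3))
```

A periodic pattern in the autocorrelation of the gaps would hint at the scheduler period; the max gap shows the longest forced idle.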

  • If you don’t let people burst, you lose a key benefit of multi-tenancy: each workload provisions conservatively to make sure it never throttles, and your nodes end up badly underutilized because that buffer can’t be shared among workloads.

    With autoscaling, if a workload is using more than its allocated CPU, more replicas will be brought online to bring per-container CPU utilization back down, returning the system to balance.
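The horizontal pod autoscaler's core rule is a single ratio (the real controller adds a tolerance band and stabilization windows on top of this):

```python
import math

# Kubernetes HPA scales on observed-vs-target utilization:
# desired = ceil(current * observed / target)
def desired_replicas(current: int, observed_util: float, target_util: float) -> int:
    return math.ceil(current * observed_util / target_util)

# 4 replicas averaging 90% CPU against a 60% target:
print(desired_replicas(4, 0.90, 0.60))  # 6 -- load spreads back under target
```

Because the formula chases average utilization, it inherits the monitoring caveat above: sub-second bursts that average out won't trigger a scale-up.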