Comment by turbobrew

9 hours ago

A good reason to health check the kubelet process and restart it when the checks fail.

2 comments

__turbobrew__

What kind of health checks? In my case, the kubelet process was staying alive and responsive to queries, I believe due to:

  # cat /proc/$(pgrep kubelet)/oom_score_adj
  -999
  
  (from OOMScoreAdjust=-999 in /etc/systemd/system/kubelet.service)

With this score, the Linux OOM killer wouldn't touch it, but any of my Pods were fair game.
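To see which processes on a node are shielded this way, a minimal sketch along these lines could walk `/proc` and list everything with a very low `oom_score_adj`. The function names, the `proc_root` parameter, and the `-900` threshold are assumptions chosen for illustration, not anything kubelet or systemd provides. The kernel accepts values from -1000 to 1000, and only -1000 exempts a process outright, so -999 makes the kubelet an extremely unlikely victim rather than a strictly impossible one.

```python
from pathlib import Path


def read_oom_score_adj(pid, proc_root="/proc"):
    """Return oom_score_adj for a PID, or None if it cannot be read."""
    try:
        text = Path(proc_root, str(pid), "oom_score_adj").read_text()
        return int(text.strip())
    except (OSError, ValueError):
        # The process may have exited, or the file may be unreadable.
        return None


def protected_processes(threshold=-900, proc_root="/proc"):
    """List (pid, name, oom_score_adj) for processes at or below threshold.

    The threshold is a hypothetical cut-off for "effectively protected".
    """
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        adj = read_oom_score_adj(entry.name, proc_root)
        if adj is None or adj > threshold:
            continue
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            name = "?"
        found.append((int(entry.name), name, adj))
    return sorted(found)
```

Calling `protected_processes()` on a node like the one described above would be expected to include the kubelet PID with a value of -999, alongside anything else configured with a similarly low adjustment.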

nijave 7 hours ago

At the metrics level, you can compare the old release against the new one. I have been bitten before by resource requirements changing dramatically between releases (regardless of whether it's a bug or a functionality change).
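As a rough sketch of that comparison, something like the following could flag metrics that grew by more than a chosen tolerance between two releases. The function names, the metric names, and the 25% tolerance are all hypothetical; the inputs would come from whatever monitoring system is already in place.

```python
def relative_change(old, new):
    """Fractional change from old to new, e.g. 1.0 means the value doubled."""
    if old <= 0:
        raise ValueError("old value must be positive")
    return (new - old) / old


def regressions(old_metrics, new_metrics, tolerance=0.25):
    """Return {metric: relative_change} for metrics that grew past tolerance.

    Both arguments map a metric name to a single summary value (for example
    a peak or a percentile) sampled under comparable load.
    """
    flagged = {}
    for name, old in old_metrics.items():
        if name not in new_metrics:
            continue  # metric was dropped or renamed in the new release
        change = relative_change(old, new_metrics[name])
        if change > tolerance:
            flagged[name] = change
    return flagged
```

For instance, with made-up values of 200 MiB resident memory on the old release and 400 MiB on the new one, the memory metric would be flagged with a change of 1.0 (a doubling), while a CPU metric that moved from 0.10 to 0.11 cores would stay under the tolerance.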

Comment by __turbobrew__