Memory Limit of Pod and OOM Killer

(Cover photo: https://unsplash.com/photos/BFRdqVAMAhU)

In the past few days, some of my Pods kept crashing, and the OS syslog showed that the OOM killer was killing the container processes. I did some research to find out how this works.

Pod memory limit and cgroup memory settings

Let's test this on K3s. Create a pod with the memory limit set to 123Mi, a number that is easy to recognize.
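Here is a minimal sketch of such a pod (the pod name memory-demo and the ubuntu image are just placeholders for illustration):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
  - name: memory-demo
    image: ubuntu
    command: ["sleep", "infinity"]
    resources:
      limits:
        memory: "123Mi"
EOF

Open a shell into it with kubectl exec -it memory-demo -- bash, so we can run commands inside the container later.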

In another shell, find out the UID of the pod:
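For example, assuming the pod is named memory-demo as in the sketch above:

kubectl get pod memory-demo -o jsonpath='{.metadata.uid}'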

On the node where the pod is running, check the cgroup settings based on the UID of the pod:
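With cgroup v1 and the cgroupfs driver, the memory cgroup of a Burstable pod typically lives under a path like the one below; the exact layout depends on the cgroup version and driver, so treat this path as an assumption:

cd /sys/fs/cgroup/memory/kubepods/burstable/pod<pod-uid>/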

cat memory.limit_in_bytes
128974848

The number 128974848 is exactly 123Mi (123*1024*1024). So it is clearer now: Kubernetes sets the memory limit through the cgroup. Once the pod consumes more memory than the limit, the cgroup triggers the OOM killer to kill the container process.

Stress test

Let's install the stress tool in the Pod through the open shell session.
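Assuming a Debian/Ubuntu based image like the one in the sketch above:

apt-get update && apt-get install -y stress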

In the meantime, monitor the syslog on the node by running dmesg -Tw.

Run the stress tool first with a memory size within the limit, 100M. It launches successfully.
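For example (the exact flags may vary between stress versions; --vm starts one memory worker and --vm-bytes sets how much it allocates):

stress --vm 1 --vm-bytes 100M &

Running it in the background keeps the shell free for the next test.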

Now trigger the second stress test:
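For instance, a second allocation that pushes the pod's total memory above the 123Mi limit:

stress --vm 1 --vm-bytes 50M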

The first stress process (process id 271) was killed immediately with signal 9.

In the meantime, the syslog shows that the process with id 32308 on the host was OOM-killed. The more interesting part is at the end of the log:

[Screenshot: the OOM killer's task dump from the syslog, listing each process in the pod with its pid, memory usage, and oom_score_adj]

For this pod, these are the candidate processes that the OOM killer would select from. The pause process, which holds the network namespaces, has an oom_score_adj value of -998 and is practically guaranteed not to be killed. The rest of the processes in the container all have an oom_score_adj value of 939. We can validate this value with the formula from the Kubernetes documentation.
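For a Burstable pod, the documented formula is roughly the following (the variable names here follow the kubelet's terminology):

oom_score_adj = min(max(2, 1000 - (1000 * memoryRequestBytes) / memoryCapacityBytes), 999)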

Find out the node's allocatable memory with:
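For example (replace <node-name> with the node where the pod runs):

kubectl get node <node-name> -o jsonpath='{.status.allocatable.memory}'

This prints a value like 2041888Ki.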

The memory request defaults to the same value as the limit if it is not set explicitly. So we get oom_score_adj = 1000 - 1000*123*1024/2041888 = 938.32, which is close to the value 939 in the syslog. (I am not sure exactly how the 939 is obtained; one possible source of the small difference is that the kubelet uses the node's total memory capacity rather than the allocatable value in this formula.)
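A quick way to redo the arithmetic on the shell, using the allocatable value from my node (both the request and the allocatable value are in KiB here):

awk 'BEGIN { print 1000 - 1000 * 123 * 1024 / 2041888 }'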

Notice that all the processes in the container have the same oom_score_adj value. The OOM killer calculates the OOM score based on each process's memory usage and then adjusts it with the oom_score_adj value. Finally, it kills the first stress process, which uses the most memory (100M) and whose oom_score value is 1718.

Conclusion

Kubernetes manages the Pod memory limit with cgroups and the OOM killer. We need to be careful to distinguish between an OS-level OOM kill and a pod-level OOM kill.
