Default Toleration at Namespace Level

I have an OpenShift cluster where I dedicate some of the nodes to run my workload by tainting these nodes. To run my normal pods on these nodes, I just need to define the tolerations based on the taint keys.

However, the workload is operator-based, and unfortunately not every CRD exposes tolerations. A “brute-force change” to the Deployment or the StatefulSet will not stick, because “the big brother” will revert it to the definition it has in mind ;)

I need the missing toleration to be inserted after the operator creates the K8s resource, but before the resource is persisted and handed over for scheduling. This is where admission controllers, particularly PodTolerationRestriction, can help.

The PodTolerationRestriction admission controller checks whether a pod’s tolerations conflict with a predefined whitelist for its namespace, and it can also apply default tolerations defined at the namespace level through an annotation. When a pod is created without that toleration, the default toleration is added to it.
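To make the defaulting step concrete, here is a minimal Python sketch of the idea. It is a simplified model and not the plugin's actual code: the function name `merge_default_tolerations` is made up for illustration, and the whitelist check and the de-duplication of overlapping tolerations are left out.

```python
import json

# Annotation key that PodTolerationRestriction reads for namespace defaults.
DEFAULT_TOLERATIONS_KEY = "scheduler.alpha.kubernetes.io/defaultTolerations"


def merge_default_tolerations(pod_tolerations, namespace_annotations):
    """Return the pod's tolerations plus any namespace defaults it lacks.

    Simplified model (assumption): only exact duplicates are skipped.
    """
    raw = namespace_annotations.get(DEFAULT_TOLERATIONS_KEY)
    defaults = json.loads(raw) if raw else []
    merged = list(pod_tolerations)
    for toleration in defaults:
        if toleration not in merged:
            merged.append(toleration)
    return merged


# Namespace annotations as they would look with a default toleration set.
annotations = {
    DEFAULT_TOLERATIONS_KEY:
        '[{"operator": "Exists", "effect": "NoSchedule", "key": "reservedFor"}]'
}

# A pod created by an operator that sets no tolerations of its own.
print(merge_default_tolerations([], annotations))
```

The point of the sketch is only that the pod spec coming from the operator stays untouched; the toleration is added on the way in, at admission time.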

Let’s check it out.

Taint all the worker nodes so that pods without a matching toleration cannot be scheduled on them.

kubectl taint nodes {{ .node }} reservedFor=myApp:NoSchedule

Create a namespace “tolerant” and deploy a test app in it, as shown below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: tolerant
  labels:
    app: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: alpine
        command:
        - sh
        - -c
        - while true; do sleep 10; done

After applying it, the pod stays in the Pending state. Describing it shows the following:

Warning  FailedScheduling  10s (x3 over 27s)  default-scheduler  0/4 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had taint {reservedFor: myApp}, that the pod didn't tolerate.

In my four-node cluster, the master node is not schedulable, and the 3 workers are not schedulable either because the deployment lacks a toleration for the reservedFor taint key.
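The scheduler message follows from how a toleration is matched against a taint. The Python sketch below models that rule for this cluster. The node names are hypothetical, the master taint is assumed to have the NoSchedule effect (the event does not show it), and the functions `tolerates` and `schedulable` are illustrative helpers, not a Kubernetes API.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key on the toleration matches every key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    # The operator defaults to Equal, which also compares the values.
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def schedulable(pod_tolerations, node_taints):
    """A node is usable only if every blocking taint is tolerated."""
    blocking = [t for t in node_taints
                if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations)
               for taint in blocking)


reserved = {"key": "reservedFor", "value": "myApp", "effect": "NoSchedule"}
# Hypothetical node names; master taint effect is an assumption.
nodes = {
    "master-0": [{"key": "node-role.kubernetes.io/master", "value": "",
                  "effect": "NoSchedule"}],
    "worker-0": [reserved],
    "worker-1": [reserved],
    "worker-2": [reserved],
}

no_tolerations = []
with_reserved = [{"key": "reservedFor", "operator": "Exists",
                  "effect": "NoSchedule"}]

# Without a toleration no node qualifies: prints []
print([n for n, t in nodes.items() if schedulable(no_tolerations, t)])
# With it, the three workers qualify: prints ['worker-0', 'worker-1', 'worker-2']
print([n for n, t in nodes.items() if schedulable(with_reserved, t)])
```

Because the toleration uses the Exists operator, it matches the reservedFor taint regardless of its value, so the same toleration would keep working if the taint value changed from myApp to something else.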

Now, let’s annotate the namespace with the default tolerations.

oc annotate namespace tolerant 'scheduler.alpha.kubernetes.io/defaultTolerations'='[{"operator": "Exists", "effect": "NoSchedule", "key": "reservedFor"}]'
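The annotation value has to be valid JSON, and it is easy to break the quoting when typing it by hand. As a small sketch, the command can be generated instead. The helper name `annotate_command` is made up for illustration; the script only prints the command and does not talk to the cluster.

```python
import json
import shlex

ANNOTATION_KEY = "scheduler.alpha.kubernetes.io/defaultTolerations"


def annotate_command(namespace, tolerations):
    """Build an `oc annotate` command line with a shell-safe JSON value."""
    value = json.dumps(tolerations)
    return "oc annotate namespace {} {}".format(
        shlex.quote(namespace),
        shlex.quote("{}={}".format(ANNOTATION_KEY, value)),
    )


tolerations = [
    {"operator": "Exists", "effect": "NoSchedule", "key": "reservedFor"}
]
print(annotate_command("tolerant", tolerations))
```

More entries can be appended to the `tolerations` list if several taint keys need a namespace-wide default.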

Delete the pending pod and watch the replacement pod get scheduled and start running. Describe it again and check that the default toleration was added automatically by the admission controller:

...
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 reservedFor:NoSchedule
...

The problem of the missing toleration in the operator-based CRD is resolved.

Note that not all admission controllers are turned on by default in standard Kubernetes. For my 4.x OpenShift cluster, the enabled admission controllers are listed in the default YAML file.
