Create alerts for OpenShift user workload

Zhimin Wen
Apr 17, 2021

Starting with OpenShift 4.6, user workload monitoring is formally supported: a second Prometheus operator instance runs in a new namespace called openshift-user-workload-monitoring. This post demonstrates how to monitor a user workload and how to create alerts for it.

Turn on user workload monitoring

Create or update the following ConfigMap in the openshift-monitoring namespace:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true

With the enableUserWorkload key set to true, the second Prometheus operator is installed in the openshift-user-workload-monitoring namespace. It creates a Prometheus instance and a Thanos Ruler instance, as shown below:

$ oc -n openshift-user-workload-monitoring get pods
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-7bd67b9d5d-znr8r   2/2     Running   0          19h
prometheus-user-workload-0             5/5     Running   1          19h
prometheus-user-workload-1             5/5     Running   1          19h
thanos-ruler-user-workload-0           3/3     Running   0          19h
thanos-ruler-user-workload-1           3/3     Running   0          19h

The monitoring architecture is depicted in the following diagram.

OpenShift Monitoring

If a user workload exposes Prometheus-compatible metrics, the user Prometheus can scrape them. Through the Thanos Querier, user workload metrics can be queried together with the metrics collected by the main (platform) Prometheus.
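With the Prometheus operator, scraping is usually declared through a ServiceMonitor created in the application's own project. The following is a minimal sketch, assuming a hypothetical project named my-app whose Service carries the label app: my-app and exposes metrics on a port named metrics. These names are placeholders, not taken from a real deployment; adjust them to your workload.

```yaml
# Hypothetical example: the project "my-app", the Service label "app: my-app"
# and the port name "metrics" are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: my-app            # the user's project
spec:
  selector:
    matchLabels:
      app: my-app              # selects the Service in front of the application
  endpoints:
  - port: metrics              # named port on that Service
    path: /metrics             # HTTP path serving Prometheus-format metrics
    interval: 30s              # scrape every 30 seconds
```

Once this object exists, the user Prometheus discovers the Service's endpoints and starts scraping them, and the resulting series become queryable through the Thanos Querier.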

Note that the user Prometheus shares the main Alertmanager. If an alert expression can be evaluated with the metrics held in the user Prometheus alone, the alert is generated there and forwarded to the main Alertmanager. In addition, a Thanos Ruler is deployed in the openshift-user-workload-monitoring namespace. It can evaluate expressions that use both cluster-level metrics and user Prometheus metrics. Because it is multitenancy-aware, the metrics it evaluates are restricted to the specific user's project.
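Alerts themselves are declared as a PrometheusRule object in the user's project. The sketch below illustrates the two cases described above, assuming the same hypothetical my-app project. The application metric http_requests_total, its job label, the thresholds and the alert names are all assumptions for illustration. The second rule uses kube_pod_container_status_restarts_total, a kube-state-metrics series collected by the platform Prometheus, so per the description above it needs the Thanos Ruler rather than the user Prometheus. Exactly which component evaluates a given rule may depend on the OpenShift version.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts          # hypothetical name
  namespace: my-app            # hypothetical user project
spec:
  groups:
  - name: my-app.rules
    rules:
    # Uses only the application's own (assumed) metric, so it can be
    # evaluated with the data in the user Prometheus alone.
    - alert: MyAppHighErrorRate
      expr: |
        sum(rate(http_requests_total{job="my-app",code=~"5.."}[5m]))
          / sum(rate(http_requests_total{job="my-app"}[5m])) > 0.05
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: More than 5 percent of my-app requests are failing
    # Uses a cluster-level kube-state-metrics series collected by the
    # platform Prometheus, the case that calls for the Thanos Ruler.
    - alert: MyAppPodRestarting
      expr: increase(kube_pod_container_status_restarts_total{namespace="my-app"}[15m]) > 3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: A my-app container restarted more than 3 times in 15 minutes
```

In both cases, firing alerts end up in the main Alertmanager, where routing and notification are configured for the whole cluster.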
