Create alerts for OpenShift user workload

Starting with OpenShift 4.6, monitoring of user workloads is formally supported. It is implemented by a second Prometheus Operator instance running in a dedicated namespace, openshift-user-workload-monitoring. This paper demonstrates how to monitor a user workload and how to create alerts for it.

Turn on user workload monitoring

Create or update the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace:
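A minimal version of this ConfigMap is sketched below. If the ConfigMap already exists with other settings under config.yaml, merge the enableUserWorkload line into it rather than replacing the whole file.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Deploys the second Prometheus Operator in openshift-user-workload-monitoring
    enableUserWorkload: true
```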

With the enableUserWorkload key set to true, the second Prometheus Operator is installed in openshift-user-workload-monitoring, and it creates a Prometheus instance and a Thanos Ruler instance, as shown below.

The monitoring architecture is depicted in the following diagram.

If a user workload exposes Prometheus-compatible metrics, the user Prometheus can scrape them. Through Thanos Querier, user workload metrics can be queried together with the metrics of the main monitoring Prometheus.

Notice that the user Prometheus shares the main Alertmanager. If an alert expression can be evaluated with the metrics in the user Prometheus alone, the alert is generated there and forwarded to the main Alertmanager. In addition, a Thanos Ruler is deployed in the same namespace. It can evaluate expressions over both cluster-level and user Prometheus metrics. It is multitenancy-aware: a rule only sees the metrics of the user's own project.

Let’s check it out.

A sample app that exposes Prometheus metrics

Create a simple HTTP server with the Prometheus Go client library, using the following code from my past paper.

Compile the code, build the image, and push it to the OpenShift internal registry in the app-mon namespace.

Create a PVC that requests 1M of storage. Later, we will use the volume free-space metrics from the cluster-level Prometheus to build an alert for our user workload.
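A PVC of this kind might look like the sketch below. The name app-mon-data is a hypothetical placeholder, and the default storage class is assumed; many storage backends round a 1M request up to their minimum volume size.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-mon-data        # hypothetical name
  namespace: app-mon
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1M           # the 1M request described above
```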

Deploy the app with the following Deployment resource:
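The Deployment could be written roughly as follows. The app name prometheus-example-app, the image name, the container port 8080 and the mount path /data are all assumptions for illustration; only the internal registry host and the app-mon namespace come from the setup above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-example-app          # hypothetical name
  namespace: app-mon
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-example-app
  template:
    metadata:
      labels:
        app: prometheus-example-app
    spec:
      containers:
        - name: app
          # Image pushed to the internal registry; image name and tag are hypothetical
          image: image-registry.openshift-image-registry.svc:5000/app-mon/prometheus-example-app:latest
          ports:
            - name: web
              containerPort: 8080       # assumed port serving /metrics
          volumeMounts:
            - name: data
              mountPath: /data          # hypothetical mount path
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-mon-data     # hypothetical PVC name
```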

Expose the app as a Service:
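A Service matching the Deployment could be sketched like this, assuming the same hypothetical app label and port. The label under metadata.labels is the one a ServiceMonitor can later select on.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-example-app          # hypothetical name
  namespace: app-mon
  labels:
    app: prometheus-example-app         # label used by the ServiceMonitor selector
spec:
  selector:
    app: prometheus-example-app         # routes traffic to the Deployment's pods
  ports:
    - name: web                         # named port referenced by the ServiceMonitor
      port: 8080
      targetPort: 8080
```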

Notice that we label the Service so that the Prometheus Operator can select it for scraping. Then create the following ServiceMonitor custom resource:
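A ServiceMonitor for this setup might look like the following sketch. It selects the Service by the hypothetical app label and scrapes the named port web at /metrics; the 30s interval is an arbitrary choice.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-monitor      # hypothetical name
  namespace: app-mon
spec:
  endpoints:
    - port: web                         # must match the Service port name
      path: /metrics
      interval: 30s
  selector:
    matchLabels:
      app: prometheus-example-app       # must match the Service labels
```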

Monitor the metrics

No OpenShift route is created for the user Prometheus, so use port forwarding instead: oc port-forward -n openshift-user-workload-monitoring svc/prometheus-operated 9090:9090

Verify that the service is discovered, that its target is active, and that the metrics are collected.

Meanwhile, the same metrics can be queried at the cluster level through Thanos Querier.

Create an alert with user metrics only

Now, using the PrometheusRule custom resource of the Prometheus Operator, create a rule to monitor the app.

As the metrics are all in the user Prometheus, we can push the rule down to the user Prometheus by adding the label openshift.io/prometheus-rule-evaluation-scope: leaf-prometheus.
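A rule of this kind could be sketched as below. The metric name http_requests_total, the alert name and the threshold of 1 request per second are hypothetical, since the sample app's actual metrics are not listed here. The leaf-prometheus label is what keeps evaluation in the user Prometheus.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-app-alert               # hypothetical name
  namespace: app-mon
  labels:
    # Evaluate in the user Prometheus instead of Thanos Ruler
    openshift.io/prometheus-rule-evaluation-scope: leaf-prometheus
spec:
  groups:
    - name: example-app
      rules:
        - alert: ExampleAppHighRequestRate        # hypothetical alert
          # Hypothetical metric; replace with one the app actually exposes
          expr: sum(rate(http_requests_total[1m])) > 1
          for: 1m
          labels:
            severity: warning
          annotations:
            message: Request rate of the example app is above 1 request per second.
```

Removing the label from metadata.labels, and leaving the rest unchanged, moves evaluation to Thanos Ruler.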

In the user Prometheus, you can see that the rule has been created.

After pumping in some load, you can watch the alert fire:

Meanwhile, in the OpenShift web console, select the Source = User filter, and the alert is shown there as well.

Because the alert is sent to the main Alertmanager, notifications can then be handled by the cluster-level Alertmanager configuration.
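As one possible sketch, a route in the main Alertmanager's configuration could send alerts from the app-mon namespace to a dedicated receiver. In OpenShift this file is stored under the alertmanager.yaml key of the alertmanager-main secret in openshift-monitoring. The receiver name and webhook URL below are hypothetical, and the match syntax assumes the Alertmanager version shipped with OpenShift 4.6.

```yaml
global:
  resolve_timeout: 5m
route:
  group_by: ['namespace']
  receiver: default                     # fallback for all other alerts
  routes:
    - match:
        namespace: app-mon              # alerts from the user project
      receiver: app-mon-team            # hypothetical receiver
receivers:
  - name: default
  - name: app-mon-team
    webhook_configs:
      - url: http://alert-webhook.app-mon.svc:8080/alerts   # hypothetical endpoint
```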

Create the same alert evaluated by Thanos Ruler

Take the same PrometheusRule resource, remove the leaf-prometheus label, and apply it again.

You will notice that the rule disappears from the user Prometheus. Open the Thanos Ruler, and you will see that the rule is evaluated there instead.

Notice that the namespace label is added automatically, so the rule can only see data from its own namespace.

Create a user alert with cluster metrics

Now that we understand where a rule is evaluated, it follows that an alert based on cluster-level metrics must be evaluated by Thanos Ruler, because the user Prometheus does not hold those metrics. Therefore, we must not use the leaf-prometheus label.

Create the following rule to monitor the PVC volume usage.
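A rule along these lines is sketched below, using the kubelet volume metrics kubelet_volume_stats_available_bytes and kubelet_volume_stats_capacity_bytes from the cluster-level Prometheus. The PVC name app-mon-data and the alert name are hypothetical, and the 0.99 free-space threshold is a deliberately aggressive value chosen only so that the alert fires in a demo.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-mon-pvc-alert               # hypothetical name
  namespace: app-mon
  # No leaf-prometheus label: Thanos Ruler evaluates this rule,
  # so it can use cluster-level kubelet metrics.
spec:
  groups:
    - name: app-mon-pvc
      rules:
        - alert: AppMonPVCLowFreeSpace  # hypothetical alert
          # Fraction of free bytes on the PVC; 0.99 is a demo-only threshold
          expr: |
            kubelet_volume_stats_available_bytes{persistentvolumeclaim="app-mon-data"}
              / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="app-mon-data"}
              < 0.99
          for: 1m
          labels:
            severity: warning
          annotations:
            message: PVC app-mon-data has less than 99% free space.
```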

Because we set the threshold deliberately low, the alert fires:

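Tip: You can confirm that a PrometheusRule resource has been picked up by checking where the rule is written: the ConfigMap prometheus-user-workload-rulefiles-0 (user Prometheus) or thanos-ruler-user-workload-rulefiles-0 (Thanos Ruler), both in the openshift-user-workload-monitoring project. For example: oc get configmap thanos-ruler-user-workload-rulefiles-0 -n openshift-user-workload-monitoring -o yaml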

Conclusion

OpenShift now offers a more complete monitoring solution for user workloads, with alerts that can be based on application metrics, cluster-level metrics, or both.
