
Fault Injection on Ceph RBD Block — Simulating a Pod Failure

Zhimin Wen
Nov 5, 2023


Image by Stan Madoré from Pixabay

With the popularity of Kubernetes Operators, it has become difficult to inject faults into an application by editing its YAML definitions, because the operator reconciles any manual change back to the state it manages. To simulate an error, understand the problem, and therefore respond better as part of the chaos engineering process, we have to start from a layer that the operator does not control.

This article explores some approaches for injecting faults into applications that use block-based persistent volume claims (PVCs) backed by Ceph RBD (RADOS Block Device).

We assume OpenShift Data Foundation (ODF) is the storage solution for Kubernetes. The same approach applies to the Rook operator.
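Before injecting anything, we need to know which RBD image backs the application's PVC. The sketch below follows the PVC to its bound PV and reads the pool and image name from the PV's CSI volume attributes. It assumes the PV was provisioned by ceph-csi, which records the attributes pool and imageName there; the function name rbd_image_for_pvc is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<pool>/<image>" for the RBD image behind a PVC.
# Assumes the PV was provisioned by ceph-csi and carries the volume
# attributes "pool" and "imageName".
rbd_image_for_pvc() {
  local ns="$1" pvc="$2" pv pool image
  # PVC -> name of the bound PV
  pv=$(oc -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.volumeName}') || return 1
  # PV -> ceph-csi attributes identifying the RBD image
  pool=$(oc get pv "$pv" -o jsonpath='{.spec.csi.volumeAttributes.pool}') || return 1
  image=$(oc get pv "$pv" -o jsonpath='{.spec.csi.volumeAttributes.imageName}') || return 1
  printf '%s/%s\n' "$pool" "$image"
}
```

For example, rbd_image_for_pvc my-app data-pvc prints something like ocs-storagecluster-cephblockpool/csi-vol-..., which is the image name the rbd tool expects.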

Launching the RBD Tool

For ODF, we can bring up the Ceph toolbox by patching the OCSInitialization custom resource (CR) as below:

oc patch OCSInitialization ocsinit -n openshift-storage --type json \
  --patch '[{ "op": "replace",
              "path": "/spec/enableCephTools",
              "value": true }]'
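The patch only asks the ODF operator to create the toolbox, so the pod can take a moment to appear. A minimal sketch for scripting this step, assuming the toolbox pod carries the Rook default label app=rook-ceph-tools, polls until the pod exists and then waits for it to become Ready. The function name wait_for_ceph_tools is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the Ceph toolbox pod after enabling it.
# Assumes the pod is labelled app=rook-ceph-tools.
wait_for_ceph_tools() {
  local ns="${1:-openshift-storage}" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    # oc wait can fail right away when no pod matches the selector yet,
    # so first check that the toolbox pod has been created.
    if [ -n "$(oc -n "$ns" get pod -l app=rook-ceph-tools -o name 2>/dev/null)" ]; then
      oc -n "$ns" wait pod -l app=rook-ceph-tools --for=condition=Ready --timeout=120s
      return
    fi
    sleep 5
    i=$((i + 1))
  done
  echo "rook-ceph-tools pod not found in namespace $ns" >&2
  return 1
}
```

Polling first and only then calling oc wait keeps the script from failing in the short window between the patch and the pod's creation.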

Once the rook-ceph-tools pod is running, you can exec into it and run the rbd tool.
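Instead of opening an interactive shell, rbd commands can also be run through oc exec one at a time. This is a minimal sketch, assuming the toolbox deployment is named rook-ceph-tools and the block pool is the ODF default ocs-storagecluster-cephblockpool; the wrapper name toolbox_rbd and the variables TOOLS_NS and POOL are hypothetical.

```shell
#!/usr/bin/env bash
# Assumptions: toolbox deployment "rook-ceph-tools" and the default ODF
# block pool "ocs-storagecluster-cephblockpool"; override via env vars.
TOOLS_NS="${TOOLS_NS:-openshift-storage}"
POOL="${POOL:-ocs-storagecluster-cephblockpool}"

# Hypothetical wrapper: run one rbd command inside the toolbox.
# "exec deploy/<name>" lets oc pick a pod of the deployment for us.
toolbox_rbd() {
  oc -n "$TOOLS_NS" exec deploy/rook-ceph-tools -- rbd "$@"
}

# Usage against a live cluster:
#   toolbox_rbd ls -p "$POOL"                 # list RBD images in the pool
#   toolbox_rbd status "$POOL/<image-name>"   # show the image's watchers
```

Addressing the deployment rather than a specific pod name keeps the commands valid when the toolbox pod is rescheduled.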

The other way is to run the tool inside the rook-ceph-operator pod, using its predefined configuration. For example:

oc -n openshift-storage exec -it rook-ceph-operator-67649c8794-bfk6b…
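The operator pod name in the command above ends in a ReplicaSet hash that changes whenever the pod is recreated. A small sketch that looks the name up instead, assuming the operator pod carries the label app=rook-ceph-operator; the function name operator_pod is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the current rook-ceph-operator pod name
# instead of hard-coding its hash suffix.
# Assumes the operator pod is labelled app=rook-ceph-operator.
operator_pod() {
  local ns="${1:-openshift-storage}"
  oc -n "$ns" get pod -l app=rook-ceph-operator \
    -o jsonpath='{.items[0].metadata.name}'
}

# Usage: open a shell in the operator pod, then run the ceph/rbd tools
# there with the operator's predefined configuration.
#   oc -n openshift-storage exec -it "$(operator_pod)" -- bash
```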
