Troubleshoot Ceph Storage with Ceph Tool for OpenShift Data Foundation
You may need to troubleshoot Ceph storage issues that surface in OpenShift Data Foundation (ODF). In a Ceph storage solution based on the Rook operator, you can deploy the “rook-ceph-tools” deployment to get the ceph command line tool for troubleshooting. With the ODF operator, the ceph command line tool is already available in the operator pod.
Launching the Ceph Command Line Tool
Exec into the rook-ceph-operator pod in the ODF namespace:
oc -n openshift-storage exec -it rook-ceph-operator-5ff898fc8-pppgd -- bash
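The pod name suffix changes whenever the operator pod is recreated, so it can help to address the pod through its Deployment instead. The following is a minimal sketch, assuming the operator Deployment is named rook-ceph-operator (inferred from the pod name above). The function name and the ODF_NAMESPACE variable are hypothetical helpers, not part of ODF.

```shell
# Sketch: open a shell in the operator pod without hardcoding the pod name.
# Assumption: the Deployment is named rook-ceph-operator.
# ODF_NAMESPACE is a hypothetical override; it defaults to openshift-storage.
odf_operator_shell() {
  oc -n "${ODF_NAMESPACE:-openshift-storage}" exec -it deploy/rook-ceph-operator -- bash
}
```

Calling odf_operator_shell on a workstation that is logged in to the cluster should land in the same shell as the command above.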
The openshift-storage.config file, available in the /var/lib/rook/openshift-storage directory, can be used as the Ceph config file. It defines the MON hosts and the admin client’s keyring:
fsid = 520035a-0552-461a-a194-46dca45ba8fc
mon initial members = j k h
mon host = [v2:172.30.26.44:3300, v1:172.30.26.44:6789], [v2:172.30.16.22:3300, v1:172.30.16.22:6789], [v2:172.30.140.78:3300, v1:172.30.140.78:6789]
osd_memory_target_cgroup_limit_ratio = 0.8
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
With this config file, we can run the ceph command line tool for troubleshooting. Continuing in the exec-ed shell, check the cluster status with the command below:
ceph -c openshift-storage.config -s
health: HEALTH_OK
mon: 3 daemons, quorum h,j,k (age 3d)
mgr: a(active, since 8d)
mds: 1/1 daemons up, 1 hot standby
osd: 12 osds: 12 up (since 8d), 12 in (since 11d)
volumes: 1/1 healthy
pools: 4 pools, 417 pgs
objects: 905.53k objects, 1.3 TiB
usage: 4.0 TiB used, 8.0 TiB / 12 TiB avail
pgs: 417 active+clean
client: 12 MiB/s rd, 53 MiB/s wr, 43 op/s rd, 1.31k op/s wr
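Passing -c on every call gets repetitive, and a relative path only works from the config directory. A small wrapper can take care of both. This is a sketch to run inside the exec-ed shell; ceph_odf and ODF_CEPH_CONF are hypothetical names chosen for illustration, and the path is the one described above.

```shell
# Sketch: wrap the ceph CLI so the ODF config file is always passed.
# ODF_CEPH_CONF and ceph_odf are illustrative names, not part of ODF or Ceph.
ODF_CEPH_CONF=/var/lib/rook/openshift-storage/openshift-storage.config

ceph_odf() {
  # Forward all arguments unchanged after the -c option.
  ceph -c "$ODF_CEPH_CONF" "$@"
}
```

With the function defined, ceph_odf -s is equivalent to the status command shown above, regardless of the current directory.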
Fixing a Warning Issue
In one ODF setup, the OpenShift Console showed a warning for the storage.
Running ceph -s in the operator pod showed that the health status was HEALTH_WARN, because the mds.ocs-storagecluster-cephfilesystem-b daemon had crashed earlier.
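For a quick or scripted check, the one-line ceph health command and ceph health detail give the same information in a more compact form than ceph -s. The sketch below assumes it runs inside the operator pod; the function and variable names are hypothetical.

```shell
# Sketch: return 0 when the cluster reports HEALTH_OK, otherwise print details.
# ODF_CEPH_CONF and ceph_health_ok are illustrative names.
ODF_CEPH_CONF=/var/lib/rook/openshift-storage/openshift-storage.config

ceph_health_ok() {
  # "ceph health" prints a single line starting with the health status.
  odf_health=$(ceph -c "$ODF_CEPH_CONF" health) || return 2
  case "$odf_health" in
    HEALTH_OK*) return 0 ;;
  esac
  # Anything else: show the individual health checks on stderr.
  ceph -c "$ODF_CEPH_CONF" health detail >&2
  return 1
}
```

A return code of 1 means the cluster answered but is not healthy, while 2 means the ceph command itself failed.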
List the crash IDs, then check the details of a crash by its ID:
ceph -c openshift-storage.config crash ls
ceph -c openshift-storage.config crash info <CRASH_ID>
Silence the warning by archiving the crash ID:
ceph -c openshift-storage.config crash archive <CRASH_ID>
The warning message is then cleared in the OpenShift Console.
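When several crashes have accumulated, archiving them one by one is tedious. The ceph crash ls-new and ceph crash archive-all subcommands can handle them together. The following is a sketch under the same assumptions as above (run inside the operator pod); the function and variable names are hypothetical.

```shell
# Sketch: list crashes that are not yet archived, then archive all of them.
# ODF_CEPH_CONF and archive_new_crashes are illustrative names.
ODF_CEPH_CONF=/var/lib/rook/openshift-storage/openshift-storage.config

archive_new_crashes() {
  # Print the new crashes first so there is a record of what gets acknowledged.
  ceph -c "$ODF_CEPH_CONF" crash ls-new || return 1
  # Archive every new crash in one call.
  ceph -c "$ODF_CEPH_CONF" crash archive-all
}
```

Archived crashes are not deleted. They still appear in the crash ls output, so the details remain available for later analysis.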