Troubleshoot Ceph Storage with Ceph Tool for OpenShift Data Foundation
--
You may want to troubleshoot Ceph storage issues coming from OpenShift Data Foundation (ODF). In a Rook-operator-based Ceph storage solution, you can deploy the “rook-ceph-tools” deployment to get the ceph command line tool for troubleshooting purposes. In the ODF operator pod, however, the ceph command line tool is already available.
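For comparison, in an upstream Rook cluster the toolbox is typically created from the example manifest shipped in the Rook repository (the path below follows the layout of recent Rook releases and may differ in older ones):
kubectl create -f deploy/examples/toolbox.yaml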
Launch of the Ceph Command Line Tool
Exec into the rook-ceph-operator pod in the ODF namespace:
oc -n openshift-storage exec -it rook-ceph-operator-5ff898fc8-pppgd -- bash
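The pod name suffix changes between deployments, so you may want to look it up first with a label selector (assuming the standard app=rook-ceph-operator label that Rook puts on the operator pod):
oc -n openshift-storage get pods -l app=rook-ceph-operator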
cd /var/lib/rook/openshift-storage
ls
client.admin.keyring openshift-storage.config
The openshift-storage.config file available in the /var/lib/rook/openshift-storage directory can be used as the ceph config file; it defines the MON hosts and the admin client’s keyring.
[global]
fsid = 5e20035a-0552-461a-a194-46dca45ba8fc
mon initial members = j k h
mon host = [v2:172.30.26.44:3300, v1:172.30.26.44:6789], [v2:172.30.16.22:3300, v1:172.30.16.22:6789], [v2:172.30.140.78:3300, v1:172.30.140.78:6789]
...
[osd]
osd_memory_target_cgroup_limit_ratio = 0.8
[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
With this, we can run the ceph command line tool for troubleshooting. Continuing in the exec-ed shell, check the cluster status with the commands below:
cd /var/lib/rook/openshift-storage
ceph -c openshift-storage.config -s
cluster:
id: 5e20035a-0552-461a-a194-46dca45ba8fc
health: HEALTH_OK
services:
mon: 3 daemons, quorum h,j,k (age 3d)
mgr: a(active, since 8d)
mds: 1/1 daemons up, 1 hot standby
osd: 12 osds: 12 up (since 8d), 12 in (since 11d)
data:
volumes: 1/1 healthy
pools: 4 pools, 417 pgs
objects: 905.53k objects, 1.3 TiB
usage: 4.0 TiB used, 8.0 TiB / 12 TiB avail
pgs: 417 active+clean
io:
client: 12 MiB/s rd, 53 MiB/s wr, 43 op/s rd, 1.31k op/s wr
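To avoid passing -c to every command, the ceph CLI also honors the CEPH_ARGS environment variable, so the config file can be exported once for the shell session:
export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'
ceph -s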
Fixing a Warning Issue
In one ODF setup, the OpenShift Console showed a warning for the storage.
Running ceph -s in the operator pod indicated that the health status was HEALTH_WARN because the mds.ocs-storagecluster-cephfilesystem-b daemon had crashed earlier.
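To see the full warning text behind the HEALTH_WARN status, the health detail can be queried as well:
ceph -c openshift-storage.config health detail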
Check the crash by listing the crash IDs, then check the details for a specific ID:
ceph -c openshift-storage.config crash ls
ceph -c openshift-storage.config crash info <CRASH_ID>
Silence the warning by archiving the crash ID:
ceph -c openshift-storage.config crash archive <CRASH_ID>
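If several old crash entries are listed, they can all be archived in one go (review them first, since this clears every pending crash warning):
ceph -c openshift-storage.config crash archive-all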
The warning message is then cleared in the OpenShift Console.