Cleanup of pg_wal directory
I have a Postgres cluster deployed on OpenShift using the CNPG operator. It ran well for the past 90 days, but recently one of the secondary nodes ran out of disk space because the pg_wal directory is filling the filesystem.
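To confirm that pg_wal really is what fills the volume, a small helper like the one below can be pasted into a shell inside the affected pod. It is only a sketch: wal_report is a hypothetical name, and it assumes WAL segment files use the standard 24-character hexadecimal names.

```shell
# Hypothetical helper: report the size of a WAL directory and how many segment files it holds.
wal_report() {
  dir="$1"
  # Total size of the directory in kilobytes.
  size_kb=$(du -sk "$dir" | cut -f1)
  # Count entries named like WAL segments (24 uppercase hex characters).
  segments=$(ls "$dir" | grep -Ec '^[0-9A-F]{24}$')
  echo "pg_wal: ${size_kb} KB in ${segments} segment files"
}
```

Inside the pod it could be called as wal_report "$PGDATA/pg_wal", assuming PGDATA is set in that shell.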
First attempt
Following the advice never to remove any pg_wal files, let’s expand the PVC dynamically, since allowVolumeExpansion is set to true in the storage class.
My database is small, with only a 1GB PVC allocated initially. I increased the PVC to 2GB and updated the storage size in the operator CR. Good: disk resized, problem resolved.
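The resize can be sketched with oc roughly as follows. The function names are hypothetical, sizes are written in Kubernetes units (2Gi rather than 2GB), and the sketch assumes the operator propagates the new spec.storage.size to the PVCs; if it does not in your setup, the PVC has to be patched as well.

```shell
# Build the JSON merge patch that sets spec.storage.size on the Cluster resource.
storage_patch() {
  printf '{"spec":{"storage":{"size":"%s"}}}\n' "$1"
}

# Print the allowVolumeExpansion field of a storage class ($1 = storage class name).
can_expand() {
  oc get storageclass "$1" -o jsonpath='{.allowVolumeExpansion}'
}

# Apply the new size to the cluster ($1 = cluster name, $2 = size such as 2Gi).
resize_cluster_storage() {
  oc patch cluster "$1" --type merge -p "$(storage_patch "$2")"
}
```

For example, can_expand my-storage-class followed by resize_cluster_storage my-cluster 2Gi, where both names are placeholders for your own.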
But within a day the 2GB disk was full again. I increased it to 5GB, then 10GB, and it kept filling up. The problem has to be fixed at its root.
The real problem
Identify the primary node with the command oc get cluster.
Exec into the primary pod, which doesn’t have the full-filesystem issue, and run pg_controldata.
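These two steps can also be scripted. The sketch below assumes the Cluster resource reports the current primary in status.currentPrimary and that the pod carries the same name as that instance; primary_pod and controldata_on_primary are hypothetical helper names.

```shell
# Print the name of the current primary instance ($1 = cluster name).
primary_pod() {
  oc get cluster "$1" -o jsonpath='{.status.currentPrimary}'
}

# Run pg_controldata in the primary pod; relies on PGDATA being set in the container.
controldata_on_primary() {
  oc exec "$(primary_pod "$1")" -- pg_controldata
}
```

Calling controldata_on_primary my-cluster (placeholder name) should print output of the kind shown next.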
bash-5.1$ pg_controldata
pg_control version number: 1300
Catalog version number: 202209061
Database system identifier: 7362028205886652442
Database cluster state: in production
pg_control last modified: Thu 01 Aug 2024 01:06:09 PM UTC
Latest checkpoint location: 6D/3A089050
Latest checkpoint's REDO…