This is the next post in the series about CloudNativePG (the previous ones are here, here, here, here, here and here). In this post we’ll look at storage, and if you ask me, this is the most important topic when it comes to deploying PostgreSQL on Kubernetes. In the past we’ve seen a lot of deployments which used NFS as shared storage. While this works, it is usually not a good choice for PostgreSQL workloads, mainly because of performance. But a lot has changed in recent years, and today there is plenty of choice when it comes to storage on Kubernetes.

What you usually have with PostgreSQL is a streaming replication setup, as we’ve seen in the previous posts. In such a setup there is no need for shared storage, as the data is replicated by PostgreSQL itself. What you need instead is persistent local storage on the Kubernetes worker nodes. This storage is then mapped into the containers and can be used by PostgreSQL to store data persistently. Using the CSI (Container Storage Interface), anybody can plug some kind of storage into the Kubernetes system, and containers can then use it for storing their data. You can find a list of storage drivers here.
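
To get an overview of what is already plugged into a cluster, you can list the registered CSI drivers and the storage classes they expose. On a freshly installed vanilla cluster both lists will typically be empty or very short:

# List the CSI drivers that are registered in the cluster
kubectl get csidrivers
# List the storage classes that can be referenced by PersistentVolumeClaims
kubectl get storageclasses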

As mentioned in the CloudNativePG documentation, you should choose a driver which supports snapshots, because these are used for backing up your PostgreSQL instance. What we’ve recently tested at dbi services is OpenEBS, and this gave very good results. This solution comes with two types of storage services, local and replicated. As we don’t need replicated storage for the PostgreSQL deployment, we’ll obviously go for the local one. For local storage there are additional choices: LVM, ZFS, or a raw device. For our use case LVM fits best, so let’s start by setting this up.
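
If you want to verify up front that snapshots can actually be used later on, you can check whether the VolumeSnapshot API and a VolumeSnapshotClass are available. This is just a quick sanity check and assumes the external-snapshotter CRDs and controller are installed, which is not the case on a bare vanilla cluster:

# The VolumeSnapshot API is provided by the external-snapshotter CRDs
kubectl api-resources | grep -i volumesnapshot
# If the CRDs are installed, list the available snapshot classes
kubectl get volumesnapshotclasses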

In all the previous posts we’ve used minikube, but as that is a single-node deployment, I’ve changed my environment to a more production-grade setup by deploying a vanilla Kubernetes cluster with one Control Plane and three Worker Nodes (in a real production setup you should have three or more Control Planes for high availability):

k8s@k8s1:~$ kubectl get nodes
NAME                       STATUS   ROLES           AGE     VERSION
k8s1                       Ready    control-plane   4d23h   v1.30.2
k8s2.it.dbi-services.com   Ready    worker          4d23h   v1.30.2
k8s3.it.dbi-services.com   Ready    worker          4d23h   v1.30.2
k8s4.it.dbi-services.com   Ready    worker          4d23h   v1.30.2
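
A small side note: the “worker” entry in the ROLES column is nothing more than a node label. If your nodes show “<none>” there, you can set the label yourself (the role name “worker” is just a convention, not something Kubernetes enforces):

# The ROLES column is derived from labels with the node-role.kubernetes.io/ prefix
kubectl label node k8s2.it.dbi-services.com node-role.kubernetes.io/worker=worker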

The CloudNativePG operator and the Kubernetes Dashboard are already deployed (Calico is used as the network plugin):

k8s@k8s1:~$ kubectl get pods -A
NAMESPACE              NAME                                                    READY   STATUS    RESTARTS       AGE
cnpg-system            cnpg-controller-manager-6ddc45757d-fql27                1/1     Running   1 (24m ago)    4d19h
kube-system            calico-kube-controllers-564985c589-jtm5j                1/1     Running   1 (24m ago)    4d22h
kube-system            calico-node-52qxf                                       1/1     Running   1 (24m ago)    4d22h
kube-system            calico-node-6f4v4                                       1/1     Running   1 (24m ago)    4d22h
kube-system            calico-node-jfj7s                                       1/1     Running   1 (24m ago)    4d22h
kube-system            calico-node-l92mf                                       1/1     Running   1 (24m ago)    4d22h
kube-system            coredns-7db6d8ff4d-98x5z                                1/1     Running   1 (24m ago)    4d23h
kube-system            coredns-7db6d8ff4d-mf7xq                                1/1     Running   1 (24m ago)    4d23h
kube-system            etcd-k8s1                                               1/1     Running   20 (24m ago)   4d23h
kube-system            kube-apiserver-k8s1                                     1/1     Running   19 (24m ago)   4d23h
kube-system            kube-controller-manager-k8s1                            1/1     Running   26 (24m ago)   4d23h
kube-system            kube-proxy-h6fsv                                        1/1     Running   1 (24m ago)    4d23h
kube-system            kube-proxy-jqmkl                                        1/1     Running   1 (24m ago)    4d23h
kube-system            kube-proxy-sz9lx                                        1/1     Running   1 (24m ago)    4d23h
kube-system            kube-proxy-wg7nx                                        1/1     Running   1 (24m ago)    4d23h
kube-system            kube-scheduler-k8s1                                     1/1     Running   29 (24m ago)   4d23h
kubernetes-dashboard   kubernetes-dashboard-api-bf787c6f4-2c4bw                1/1     Running   1 (24m ago)    4d22h
kubernetes-dashboard   kubernetes-dashboard-auth-6765c66c7c-7xzbx              1/1     Running   1 (24m ago)    4d22h
kubernetes-dashboard   kubernetes-dashboard-kong-7696bb8c88-cc462              1/1     Running   1 (24m ago)    4d22h
kubernetes-dashboard   kubernetes-dashboard-metrics-scraper-5485b64c47-cz75d   1/1     Running   1 (24m ago)    4d22h
kubernetes-dashboard   kubernetes-dashboard-web-84f8d6fff4-vzdcw               1/1     Running   1 (24m ago)    4d22h

To be able to create an LVM physical volume, there is an additional small disk (vdb) on all the worker nodes:

k8s@k8s1:~$ ssh k8s2 'lsblk'
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1 1024M  0 rom  
vda    254:0    0   20G  0 disk 
├─vda1 254:1    0   19G  0 part /
├─vda2 254:2    0    1K  0 part 
└─vda5 254:5    0  975M  0 part 
vdb    254:16   0    5G  0 disk 
k8s@k8s1:~$ ssh k8s3 'lsblk'
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1 1024M  0 rom  
vda    254:0    0   20G  0 disk 
├─vda1 254:1    0   19G  0 part /
├─vda2 254:2    0    1K  0 part 
└─vda5 254:5    0  975M  0 part 
vdb    254:16   0    5G  0 disk 
k8s@k8s1:~$ ssh k8s4 'lsblk'
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1 1024M  0 rom  
vda    254:0    0   20G  0 disk 
├─vda1 254:1    0   19G  0 part /
├─vda2 254:2    0    1K  0 part 
└─vda5 254:5    0  975M  0 part 
vdb    254:16   0    5G  0 disk 

As LVM is not installed by default on a Debian 12 minimal installation (which is what all the nodes run), it needs to be installed first:

k8s@k8s1:~$ ssh k8s2 'sudo apt install -y lvm2'
k8s@k8s1:~$ ssh k8s3 'sudo apt install -y lvm2'
k8s@k8s1:~$ ssh k8s4 'sudo apt install -y lvm2'

Now the physical volume and the volume group can be created on all the worker nodes:

k8s@k8s1:~$ ssh k8s2 'sudo pvcreate /dev/vdb'
  Physical volume "/dev/vdb" successfully created.
k8s@k8s1:~$ ssh k8s3 'sudo pvcreate /dev/vdb'
  Physical volume "/dev/vdb" successfully created.
k8s@k8s1:~$ ssh k8s4 'sudo pvcreate /dev/vdb'
  Physical volume "/dev/vdb" successfully created.
k8s@k8s1:~$ ssh k8s2 'sudo vgcreate vgopenebs /dev/vdb'
  Volume group "vgopenebs" successfully created
k8s@k8s1:~$ ssh k8s3 'sudo vgcreate vgopenebs /dev/vdb'
  Volume group "vgopenebs" successfully created
k8s@k8s1:~$ ssh k8s4 'sudo vgcreate vgopenebs /dev/vdb'
  Volume group "vgopenebs" successfully created
k8s@k8s1:~$ ssh k8s2 'sudo vgs'
  VG        #PV #LV #SN Attr   VSize  VFree 
  vgopenebs   1   0   0 wz--n- <5.00g <5.00g
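
If you want to double-check that the LVM setup looks the same on all three worker nodes, a small loop over the hosts does the job (a convenience sketch, assuming the same passwordless SSH and sudo access as used above):

# Verify the physical volumes and volume groups on every worker node
for node in k8s2 k8s3 k8s4; do
  echo "--- ${node} ---"
  ssh "${node}" 'sudo pvs; sudo vgs'
done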

That’s it from the LVM side. The next step is to install the OpenEBS LocalPV-LVM driver, which is done with Helm:

k8s@k8s1:~$ helm repo add openebs https://openebs.github.io/openebs
"openebs" has been added to your repositories
k8s@k8s1:~$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "openebs" chart repository
...Successfully got an update from the "kubernetes-dashboard" chart repository
Update Complete. ⎈Happy Helming!⎈
k8s@k8s1:~$ helm install openebs --namespace openebs openebs/openebs --create-namespace
NAME: openebs
LAST DEPLOYED: Wed Jun 19 08:43:14 2024
NAMESPACE: openebs
STATUS: deployed
REVISION: 1
NOTES:
Successfully installed OpenEBS.

Check the status by running: kubectl get pods -n openebs

The default values will install both Local PV and Replicated PV. However,
the Replicated PV will require additional configuration to be fuctional.
The Local PV offers non-replicated local storage using 3 different storage
backends i.e HostPath, LVM and ZFS, while the Replicated PV provides one replicated highly-available
storage backend i.e Mayastor.

For more information, 
- view the online documentation at https://openebs.io/docs
- connect with an active community on our Kubernetes slack channel.
        - Sign up to Kubernetes slack: https://slack.k8s.io
        - #openebs channel: https://kubernetes.slack.com/messages/openebs
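
As we only need the local storage engines, the replicated (Mayastor) engine could also be switched off at install time. The OpenEBS umbrella chart exposes a value for this; the key below is taken from the OpenEBS documentation and may differ between chart versions, so treat it as a sketch rather than a guaranteed flag:

# Install OpenEBS without the replicated Mayastor engine
helm install openebs --namespace openebs openebs/openebs --create-namespace \
  --set engines.replicated.mayastor.enabled=false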

By looking at what’s happening in the “openebs” namespace, we can see that OpenEBS is being deployed:

k8s@k8s1:~$ kubectl get pods -n openebs
NAME                                              READY   STATUS              RESTARTS   AGE
init-pvc-1f4f8c25-5523-44f4-94ad-8aa896bcd382     0/1     ContainerCreating   0          53s
init-pvc-39544e8c-3c0c-4b0b-ba07-dd23502acaa1     0/1     ContainerCreating   0          53s
init-pvc-5ebe4c41-9f77-408c-afde-0a2213c10f0f     0/1     ContainerCreating   0          53s
init-pvc-a33f5f49-7366-4f0d-986b-c8a282bde36e     0/1     ContainerCreating   0          53s
openebs-agent-core-b48f4fbc4-r94wc                0/2     Init:0/1            0          71s
openebs-agent-ha-node-89fcp                       0/1     Init:0/1            0          71s
openebs-agent-ha-node-bn7wt                       0/1     Init:0/1            0          71s
openebs-agent-ha-node-w574q                       0/1     Init:0/1            0          71s
openebs-api-rest-74954d444-cdfwt                  0/1     Init:0/2            0          71s
openebs-csi-controller-5d4fc97648-znvph           0/6     Init:0/1            0          71s
openebs-csi-node-2kwlx                            0/2     Init:0/1            0          71s
openebs-csi-node-8sct6                            0/2     Init:0/1            0          71s
openebs-csi-node-bjknj                            0/2     Init:0/1            0          71s
openebs-etcd-0                                    0/1     Pending             0          71s
openebs-etcd-1                                    0/1     Pending             0          71s
openebs-etcd-2                                    0/1     Pending             0          71s
openebs-localpv-provisioner-7cd9f85f8f-5vnvp      1/1     Running             0          71s
openebs-loki-0                                    0/1     Pending             0          71s
openebs-lvm-localpv-controller-64946b785c-dnvh4   0/5     ContainerCreating   0          71s
openebs-lvm-localpv-node-42n8f                    0/2     ContainerCreating   0          71s
openebs-lvm-localpv-node-h47r8                    0/2     ContainerCreating   0          71s
openebs-lvm-localpv-node-ndgwk                    2/2     Running             0          71s
openebs-nats-0                                    0/3     ContainerCreating   0          71s
openebs-nats-1                                    0/3     ContainerCreating   0          71s
openebs-nats-2                                    0/3     ContainerCreating   0          71s
openebs-obs-callhome-5b7fdb675-8f85b              0/2     ContainerCreating   0          71s
openebs-operator-diskpool-794596c9b7-jtg5t        0/1     Init:0/2            0          71s
openebs-promtail-2mfgt                            1/1     Running             0          71s
openebs-promtail-8np7q                            0/1     ContainerCreating   0          71s
openebs-promtail-lv4ht                            1/1     Running             0          71s
openebs-zfs-localpv-controller-7fdcd7f65-mnnhf    0/5     ContainerCreating   0          71s
openebs-zfs-localpv-node-6pd4v                    2/2     Running             0          71s
openebs-zfs-localpv-node-c5vld                    0/2     ContainerCreating   0          71s
openebs-zfs-localpv-node-kqxrg                    0/2     ContainerCreating   0          71s

After a while, the LocalPV-LVM controller and node pods should be up and running:

k8s@k8s1:~$ kubectl get pods -n openebs -l role=openebs-lvm
NAME                                              READY   STATUS    RESTARTS   AGE
openebs-lvm-localpv-controller-64946b785c-dnvh4   5/5     Running   0          12m
openebs-lvm-localpv-node-42n8f                    2/2     Running   0          12m
openebs-lvm-localpv-node-h47r8                    2/2     Running   0          12m
openebs-lvm-localpv-node-ndgwk                    2/2     Running   0          12m

Now we are ready to create the storage class:

  • We want the storage class to be named “openebs-lvmpv”
  • We want to allow volume expansion
  • We reference the volume group we’ve created above
  • We want ext4 as the file system
  • We restrict this to our worker nodes

k8s@k8s1:~$ cat sc.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv
allowVolumeExpansion: true
parameters:
  storage: "lvm"
  volgroup: "vgopenebs"
  fsType: "ext4"
provisioner: local.csi.openebs.io
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
      - k8s2.it.dbi-services.com
      - k8s3.it.dbi-services.com
      - k8s4.it.dbi-services.com

k8s@k8s1:~$ kubectl apply -f sc.yaml 
storageclass.storage.k8s.io/openebs-lvmpv created
k8s@k8s1:~$ kubectl get sc
NAME                     PROVISIONER               RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
mayastor-etcd-localpv    openebs.io/local          Delete          WaitForFirstConsumer   false                  21m
mayastor-loki-localpv    openebs.io/local          Delete          WaitForFirstConsumer   false                  21m
openebs-hostpath         openebs.io/local          Delete          WaitForFirstConsumer   false                  21m
openebs-lvmpv            local.csi.openebs.io      Delete          Immediate              true                   22s
openebs-single-replica   io.openebs.csi-mayastor   Delete          Immediate              true                   21m
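
Before wiring the new storage class into CloudNativePG, it is worth verifying that dynamic provisioning actually works with a small standalone PVC. This is a throwaway test (the claim name “lvmpv-test” is made up) and can be deleted again right away:

# Create a small test PVC against the new storage class
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvmpv-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: openebs-lvmpv
  resources:
    requests:
      storage: 1Gi
EOF
# As the storage class binds immediately, the PVC should become "Bound" right away
kubectl get pvc lvmpv-test
# Clean up the test claim
kubectl delete pvc lvmpv-test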

Once this is ready, we need to modify our cluster definition to use the new storage class by adding a PVC template:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-pg-cluster
spec:
  instances: 3
 
  bootstrap:
    initdb:
      database: db1
      owner: db1
      dataChecksums: true
      walSegmentSize: 32
      localeCollate: 'en_US.utf8'
      localeCType: 'en_US.utf8'
      postInitSQL:
      - create user db2
      - create database db2 with owner = db2
  storage:
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: openebs-lvmpv
      volumeMode: Filesystem
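
If you don’t need the full flexibility of a PVC template, CloudNativePG also accepts a shorter form where you only specify the storage class and the size directly in the storage section. As a minimal sketch, the equivalent of the template above would look like this:

  storage:
    size: 1Gi
    storageClass: openebs-lvmpv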

As usual, deploy the cluster and wait until the pods are up and running:

k8s@k8s1:~$ kubectl apply -f pg.yaml 
cluster.postgresql.cnpg.io/my-pg-cluster created
k8s@k8s1:~$ kubectl get pods 
NAME              READY   STATUS    RESTARTS   AGE
my-pg-cluster-1   1/1     Running   0          2m55s
my-pg-cluster-2   1/1     Running   0          111s
my-pg-cluster-3   1/1     Running   0          52s
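
You can also check that each instance got its own PVC from the new storage class, and that the corresponding PersistentVolumes were provisioned by the OpenEBS LVM driver (the label selector assumes the standard cnpg.io/cluster label the operator puts on its resources):

# Each CloudNativePG instance gets its own PersistentVolumeClaim
kubectl get pvc -l cnpg.io/cluster=my-pg-cluster
# The backing PersistentVolumes should reference the openebs-lvmpv storage class
kubectl get pv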

If you go to one of the worker nodes, you can see the mount and its content:

root@k8s2:/home/k8s$ df -h | grep ebs
/dev/mapper/vgopenebs-pvc--2bcc48bc--4600--4c6a--a13f--6dfc1e9ea081  974M  230M  728M  24% /var/lib/kubelet/pods/5b9441b5-039e-4a75-8865-0ccd053f08fc/volumes/kubernetes.io~csi/pvc-2bcc48bc-4600-4c6a-a13f-6dfc1e9ea081/mount
root@k8s2:/home/k8s$ ls /var/lib/kubelet/pods/5b9441b5-039e-4a75-8865-0ccd053f08fc/volumes/kubernetes.io~csi/pvc-2bcc48bc-4600-4c6a-a13f-6dfc1e9ea081/mount
lost+found  pgdata
root@k8s2:/home/k8s$ ls /var/lib/kubelet/pods/5b9441b5-039e-4a75-8865-0ccd053f08fc/volumes/kubernetes.io~csi/pvc-2bcc48bc-4600-4c6a-a13f-6dfc1e9ea081/mount/pgdata/
base              global         pg_dynshmem    pg_logical    pg_replslot   pg_stat      pg_tblspc    pg_wal                postgresql.conf
current_logfiles  override.conf  pg_hba.conf    pg_multixact  pg_serial     pg_stat_tmp  pg_twophase  pg_xact               postmaster.opts
custom.conf       pg_commit_ts   pg_ident.conf  pg_notify     pg_snapshots  pg_subtrans  PG_VERSION   postgresql.auto.conf  postmaster.pid
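
Behind the scenes, the OpenEBS provisioner created a logical volume in the “vgopenebs” volume group for this PVC, which you can confirm with the standard LVM tooling on the worker node:

# List the logical volumes the CSI driver created in the volume group
sudo lvs vgopenebs
# Show how much space is left in the volume group
sudo vgs vgopenebs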

As mentioned initially: the storage part is critical, and you need to carefully select what you want to use and thoroughly test it. This will be the topic of the next post.