By now, if you followed the previous posts (here, here, here, here and here), we know quite a bit about how to use CloudNativePG to deploy a PostgreSQL cluster and how to get detailed information about the deployment. In this post we’ll look at how you can leverage this deployment to scale the cluster up and down. This is important if your workload changes throughout the day or the week and your application can distribute read-only workloads across the PostgreSQL replicas.
When we look at what we have now, we see this:
minicube@micro-minicube:~> kubectl-cnpg status my-pg-cluster
Cluster Summary
Name:                my-pg-cluster
Namespace:           default
System ID:           7378131726640287762
PostgreSQL Image:    ghcr.io/cloudnative-pg/postgresql:16.2
Primary instance:    my-pg-cluster-1
Primary start time:  2024-06-08 13:59:26 +0000 UTC (uptime 88h35m7s)
Status:              Cluster in healthy state
Instances:           3
Ready instances:     3
Current Write LSN:   0/26000000 (Timeline: 1 - WAL File: 000000010000000000000012)

Certificates Status
Certificate Name           Expiration Date                Days Left Until Expiration
----------------           ---------------                --------------------------
my-pg-cluster-ca           2024-09-06 13:54:17 +0000 UTC  86.31
my-pg-cluster-replication  2024-09-06 13:54:17 +0000 UTC  86.31
my-pg-cluster-server       2024-09-06 13:54:17 +0000 UTC  86.31

Continuous Backup status
Not configured

Physical backups
No running physical backups found

Streaming Replication status
Replication Slots Enabled
Name             Sent LSN    Write LSN   Flush LSN   Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot
----             --------    ---------   ---------   ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------
my-pg-cluster-2  0/26000000  0/26000000  0/26000000  0/26000000  00:00:00   00:00:00   00:00:00    streaming  async      0              active
my-pg-cluster-3  0/26000000  0/26000000  0/26000000  0/26000000  00:00:00   00:00:00   00:00:00    streaming  async      0              active

Unmanaged Replication Slot Status
No unmanaged replication slots found

Managed roles status
No roles managed

Tablespaces status
No managed tablespaces

Pod Disruption Budgets status
Name                   Role     Expected Pods  Current Healthy  Minimum Desired Healthy  Disruptions Allowed
----                   ----     -------------  ---------------  -----------------------  -------------------
my-pg-cluster          replica  2              2                1                        1
my-pg-cluster-primary  primary  1              1                1                        0

Instances status
Name             Database Size  Current LSN  Replication role  Status  QoS         Manager Version  Node
----             -------------  -----------  ----------------  ------  ---         ---------------  ----
my-pg-cluster-1  37 MB          0/26000000   Primary           OK      BestEffort  1.23.1           minikube
my-pg-cluster-2  37 MB          0/26000000   Standby (async)   OK      BestEffort  1.23.1           minikube
my-pg-cluster-3  37 MB          0/26000000   Standby (async)   OK      BestEffort  1.23.1           minikube
We have a primary instance running in pod my-pg-cluster-1, and two replicas in asynchronous mode running in pods my-pg-cluster-2 and my-pg-cluster-3. Let’s assume the workload is increasing and we want two more replicas. There are two ways to do this. The first is to change the cluster configuration in the YAML and re-apply it. This is the configuration as it is now:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-pg-cluster
spec:
  instances: 3

  bootstrap:
    initdb:
      database: db1
      owner: db1
      dataChecksums: true
      walSegmentSize: 32
      localeCollate: 'en_US.utf8'
      localeCType: 'en_US.utf8'
      postInitSQL:
        - create user db2
        - create database db2 with owner = db2

  postgresql:
    parameters:
      work_mem: "12MB"
      pg_stat_statements.max: "2500"
    pg_hba:
      - host all all 192.168.122.0/24 scram-sha-256

  storage:
    size: 1Gi
All we need to do is change the number of instances we want to have. With the current value of three, we get one primary and two replicas. If we want two more replicas, change this to five and re-apply:
minicube@micro-minicube:~> grep instances pg.yaml
  instances: 5
minicube@micro-minicube:~> kubectl apply -f pg.yaml
cluster.postgresql.cnpg.io/my-pg-cluster configured
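As a side note, the same declarative change can also be made without editing the file, by patching the Cluster resource directly. This is a sketch, not from the original workflow; it assumes the CloudNativePG CRDs are installed and the cluster from above is running:

```
# Sketch: set spec.instances directly on the Cluster custom resource
# with a merge patch (requires a live cluster, so not run here).
kubectl patch cluster my-pg-cluster --type merge -p '{"spec":{"instances":5}}'
```

Note that this has the same drawback as "kubectl scale" discussed further below: it changes the live object, not your pg.yaml.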
By monitoring the pods, you can follow the two new pods coming up and the replicas being attached to the cluster:
minicube@micro-minicube:~> kubectl get pods
NAME                         READY   STATUS            RESTARTS      AGE
my-pg-cluster-1              1/1     Running           1 (32m ago)   2d1h
my-pg-cluster-2              1/1     Running           1 (32m ago)   2d
my-pg-cluster-3              1/1     Running           1 (32m ago)   2d
my-pg-cluster-4              0/1     PodInitializing   0             3s
my-pg-cluster-4-join-kqgwp   0/1     Completed         0             11s
minicube@micro-minicube:~> kubectl get pods
NAME                         READY   STATUS    RESTARTS      AGE
my-pg-cluster-1              1/1     Running   1 (33m ago)   2d1h
my-pg-cluster-2              1/1     Running   1 (33m ago)   2d
my-pg-cluster-3              1/1     Running   1 (33m ago)   2d
my-pg-cluster-4              1/1     Running   0             42s
my-pg-cluster-5              1/1     Running   0             19s
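Instead of re-running "kubectl get pods" manually, you can also watch the pods continuously. This is standard kubectl, not specific to the post; the label selector assumes the cnpg.io/cluster label that CloudNativePG puts on its pods:

```
# Follow pod state changes continuously instead of polling manually
kubectl get pods --watch
# or narrow it down to this cluster's pods via the operator's label
kubectl get pods -l cnpg.io/cluster=my-pg-cluster --watch
```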
Now we see five pods, as requested, and looking at the PostgreSQL streaming replication configuration confirms that we now have four replicas:
minicube@micro-minicube:~> kubectl-cnpg status my-pg-cluster
Cluster Summary
Name:                my-pg-cluster
Namespace:           default
System ID:           7378131726640287762
PostgreSQL Image:    ghcr.io/cloudnative-pg/postgresql:16.2
Primary instance:    my-pg-cluster-1
Primary start time:  2024-06-08 13:59:26 +0000 UTC (uptime 88h43m54s)
Status:              Cluster in healthy state
Instances:           5
Ready instances:     5
Current Write LSN:   0/2C000060 (Timeline: 1 - WAL File: 000000010000000000000016)

Certificates Status
Certificate Name           Expiration Date                Days Left Until Expiration
----------------           ---------------                --------------------------
my-pg-cluster-ca           2024-09-06 13:54:17 +0000 UTC  86.30
my-pg-cluster-replication  2024-09-06 13:54:17 +0000 UTC  86.30
my-pg-cluster-server       2024-09-06 13:54:17 +0000 UTC  86.30

Continuous Backup status
Not configured

Physical backups
No running physical backups found

Streaming Replication status
Replication Slots Enabled
Name             Sent LSN    Write LSN   Flush LSN   Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot
----             --------    ---------   ---------   ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------
my-pg-cluster-2  0/2C000060  0/2C000060  0/2C000060  0/2C000060  00:00:00   00:00:00   00:00:00    streaming  async      0              active
my-pg-cluster-3  0/2C000060  0/2C000060  0/2C000060  0/2C000060  00:00:00   00:00:00   00:00:00    streaming  async      0              active
my-pg-cluster-4  0/2C000060  0/2C000060  0/2C000060  0/2C000060  00:00:00   00:00:00   00:00:00    streaming  async      0              active
my-pg-cluster-5  0/2C000060  0/2C000060  0/2C000060  0/2C000060  00:00:00   00:00:00   00:00:00    streaming  async      0              active

Unmanaged Replication Slot Status
No unmanaged replication slots found

Managed roles status
No roles managed

Tablespaces status
No managed tablespaces

Pod Disruption Budgets status
Name                   Role     Expected Pods  Current Healthy  Minimum Desired Healthy  Disruptions Allowed
----                   ----     -------------  ---------------  -----------------------  -------------------
my-pg-cluster          replica  4              4                3                        1
my-pg-cluster-primary  primary  1              1                1                        0

Instances status
Name             Database Size  Current LSN  Replication role  Status  QoS         Manager Version  Node
----             -------------  -----------  ----------------  ------  ---         ---------------  ----
my-pg-cluster-1  37 MB          0/2C000060   Primary           OK      BestEffort  1.23.1           minikube
my-pg-cluster-2  37 MB          0/2C000060   Standby (async)   OK      BestEffort  1.23.1           minikube
my-pg-cluster-3  37 MB          0/2C000060   Standby (async)   OK      BestEffort  1.23.1           minikube
my-pg-cluster-4  37 MB          0/2C000060   Standby (async)   OK      BestEffort  1.23.1           minikube
my-pg-cluster-5  37 MB          0/2C000060   Standby (async)   OK      BestEffort  1.23.1           minikube
If you want to scale this down again (maybe because the workload decreased), you can do that in the same way by reducing the number of instances from five to three in the cluster definition, or by directly scaling the cluster down with kubectl:
minicube@micro-minicube:~> kubectl scale --replicas=2 -f pg.yaml
cluster.postgresql.cnpg.io/my-pg-cluster scaled
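For completeness: "kubectl scale" can also address the resource by type and name instead of going through the manifest. This is a sketch assuming the same cluster (it needs a live cluster to run, and it works because the Cluster custom resource exposes the scale subresource):

```
# Same operation, addressing the Cluster resource by name instead of -f:
kubectl scale cluster my-pg-cluster --replicas=2
```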
Attention: “replicas” in this context does not mean streaming replication replicas. It means replicas in the Kubernetes sense, i.e. the total number of instances. So if you do it like above, the result is one primary and one replica:
minicube@micro-minicube:~> kubectl get pods
NAME              READY   STATUS    RESTARTS      AGE
my-pg-cluster-1   1/1     Running   1 (39m ago)   2d1h
my-pg-cluster-2   1/1     Running   1 (39m ago)   2d1h
What you probably really want is this (to get back to the initial state of the cluster):
minicube@micro-minicube:~> kubectl scale --replicas=3 -f pg.yaml
cluster.postgresql.cnpg.io/my-pg-cluster scaled
minicube@micro-minicube:~> kubectl get pods
NAME                         READY   STATUS    RESTARTS      AGE
my-pg-cluster-1              1/1     Running   1 (41m ago)   2d1h
my-pg-cluster-2              1/1     Running   1 (41m ago)   2d1h
my-pg-cluster-6-join-747nx   0/1     Pending   0             1s
minicube@micro-minicube:~> kubectl get pods
NAME                         READY   STATUS    RESTARTS      AGE
my-pg-cluster-1              1/1     Running   1 (41m ago)   2d1h
my-pg-cluster-2              1/1     Running   1 (41m ago)   2d1h
my-pg-cluster-6-join-747nx   1/1     Running   0             5s
minicube@micro-minicube:~> kubectl get pods
NAME                         READY   STATUS      RESTARTS      AGE
my-pg-cluster-1              1/1     Running     1 (42m ago)   2d1h
my-pg-cluster-2              1/1     Running     1 (42m ago)   2d1h
my-pg-cluster-6              0/1     Running     0             5s
my-pg-cluster-6-join-747nx   0/1     Completed   0             14s
...
minicube@micro-minicube:~> kubectl get pods
NAME              READY   STATUS    RESTARTS      AGE
my-pg-cluster-1   1/1     Running   1 (42m ago)   2d1h
my-pg-cluster-2   1/1     Running   1 (42m ago)   2d1h
my-pg-cluster-6   1/1     Running   0             16s
What you shouldn’t do is mix both ways of scaling, for one simple reason: scaling up or down with “kubectl scale” does not modify your cluster configuration file. There we still have five instances:
minicube@micro-minicube:~> grep instances pg.yaml
  instances: 5
Our recommendation is to scale only by modifying the configuration and re-applying it afterwards. This ensures that the configuration file always reflects reality, rather than a mix of live state and desired state.
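This declarative workflow can be sketched end to end. The file name pg.yaml and the instance counts mirror the post; the manifest below is a minimal stand-in, and the kubectl step is commented out because it needs a live cluster:

```shell
# Sketch of the recommended workflow: pg.yaml is the single source of
# truth, so every scaling change is an edit to the file plus a re-apply.
cat > pg.yaml <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-pg-cluster
spec:
  instances: 3
EOF

# Scale up: change the desired instance count in the file...
sed -i 's/^  instances: .*/  instances: 5/' pg.yaml

# ...then make the live state follow the file (needs a running cluster):
# kubectl apply -f pg.yaml

grep instances pg.yaml
```

Because the change goes through the file first, the manifest and the live cluster can never drift apart the way they do after a bare “kubectl scale”.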
In the next post we’ll look into storage, because you want your databases to be persistent and fast.