Now that we’re getting more familiar with CloudNativePG, it’s time to look at how to get information about our cluster, either for monitoring or for troubleshooting purposes. Information about the general state of the cluster can easily be retrieved with kubectl.
To list the global state of the cluster:
minicube@micro-minicube:~> kubectl get cluster -A
NAMESPACE NAME AGE INSTANCES READY STATUS PRIMARY
default my-pg-cluster 41h 3 3 Cluster in healthy state my-pg-cluster-1
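If you need more detail than this one-line summary, the Cluster resource can be inspected like any other Kubernetes object with standard kubectl, no plugin required (a sketch, output not shown here):

```shell
# Full details and recent events of the Cluster custom resource
kubectl describe cluster my-pg-cluster

# Or dump the complete spec and status as YAML
kubectl get cluster my-pg-cluster -o yaml
```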
As we’ve seen in the previous posts (here, here, here and here), kubectl can also be used to get information about the pods and services of the deployment:
minicube@micro-minicube:~> kubectl get pods
NAME READY STATUS RESTARTS AGE
my-pg-cluster-1 1/1 Running 0 108m
my-pg-cluster-2 1/1 Running 0 103m
my-pg-cluster-3 1/1 Running 0 103m
minicube@micro-minicube:~> kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4d
my-pg-cluster-r ClusterIP 10.111.113.4 <none> 5432/TCP 41h
my-pg-cluster-ro ClusterIP 10.110.137.246 <none> 5432/TCP 41h
my-pg-cluster-rw ClusterIP 10.100.77.15 <none> 5432/TCP 41h
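The three services map to different roles: “-rw” always points to the primary, “-ro” only to the replicas, and “-r” to any instance. A minimal sketch of how an application inside the cluster would use them (the “app” database and user are assumptions, they depend on your bootstrap configuration):

```shell
# Read-write traffic goes to the primary via the -rw service
psql "host=my-pg-cluster-rw.default.svc port=5432 dbname=app user=app"

# Read-only traffic can be spread over the replicas via the -ro service
psql "host=my-pg-cluster-ro.default.svc port=5432 dbname=app user=app"
```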
What we cannot easily see with kubectl is information related to PostgreSQL itself. But as kubectl can be extended with plugins, CloudNativePG comes with a kubectl plugin called “cnpg“. There are several installation methods available; we’ll go for the scripted version:
minicube@micro-minicube:~> curl -sSfL https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | sudo sh -s -- -b /usr/local/bin
cloudnative-pg/cloudnative-pg info checking GitHub for latest tag
cloudnative-pg/cloudnative-pg info found version: 1.23.1 for v1.23.1/linux/x86_64
cloudnative-pg/cloudnative-pg info installed /usr/local/bin/kubectl-cnpg
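Any executable named kubectl-&lt;name&gt; on the PATH automatically becomes a kubectl subcommand, so the plugin can be called in two ways. A quick way to verify the installation is the “version” subcommand:

```shell
# Both invocations are equivalent: directly, or as a kubectl subcommand
kubectl-cnpg version
kubectl cnpg version
```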
A very nice feature of this plugin is that it supports auto-completion of the available commands, but this needs to be configured before you can use it. The plugin itself can generate the completion script for one of the supported shells (bash in my case):
minicube@micro-minicube:~> kubectl cnpg completion
Generate the autocompletion script for kubectl-cnpg for the specified shell.
See each sub-command's help for details on how to use the generated script.
Usage:
kubectl cnpg completion [command]
Available Commands:
bash Generate the autocompletion script for bash
fish Generate the autocompletion script for fish
powershell Generate the autocompletion script for powershell
zsh Generate the autocompletion script for zsh
...
minicube@micro-minicube:~> kubectl cnpg completion bash > kubectl_complete-cnpg
minicube@micro-minicube:~> chmod +x kubectl_complete-cnpg
minicube@micro-minicube:~> sudo mv kubectl_complete-cnpg /usr/local/bin/
From now on, tab completion works:
minicube@micro-minicube:~> kubectl-cnpg [TAB][TAB]
backup (Request an on-demand backup for a PostgreSQL Cluster)
certificate (Create a client certificate to connect to PostgreSQL using TLS and Certificate authentication)
completion (Generate the autocompletion script for the specified shell)
destroy (Destroy the instance named [cluster]-[node] or [node] with the associated PVC)
fencing (Fencing related commands)
fio (Creates a fio deployment, pvc and configmap)
help (Help about any command)
hibernate (Hibernation related commands)
install (CNPG installation commands)
logs (Collect cluster logs)
maintenance (Sets or removes maintenance mode from clusters)
pgadmin4 (Creates a pgadmin deployment)
pgbench (Creates a pgbench job)
promote (Promote the pod named [cluster]-[node] or [node] to primary)
psql (Start a psql session targeting a CloudNativePG cluster)
publication (Logical publication management commands)
reload (Reload the cluster)
report (Report on the operator)
restart (Restart a cluster or a single instance in a cluster)
snapshot (command removed)
status (Get the status of a PostgreSQL cluster)
subscription (Logical subscription management commands)
version (Prints version, commit sha and date of the build)
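Some of these commands are real time-savers. The “psql” subcommand, for example, opens an interactive session directly on the primary without you having to look up pod names or credentials (a sketch; flags beyond the cluster name may vary with the plugin version):

```shell
# Start an interactive psql session targeting the current primary
kubectl cnpg psql my-pg-cluster
```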
As you can see, quite a few commands are available, but for the scope of this post we’ll only use the commands for getting logs and detailed information about our cluster. Obviously the “status” command should give us some global information about the cluster, and it actually gives us much more:
minicube@micro-minicube:~> kubectl-cnpg status my-pg-cluster
Cluster Summary
Name: my-pg-cluster
Namespace: default
System ID: 7378131726640287762
PostgreSQL Image: ghcr.io/cloudnative-pg/postgresql:16.2
Primary instance: my-pg-cluster-1
Primary start time: 2024-06-08 13:59:26 +0000 UTC (uptime 42h49m23s)
Status: Cluster in healthy state
Instances: 3
Ready instances: 3
Current Write LSN: 0/1E000000 (Timeline: 1 - WAL File: 00000001000000000000000E)
Certificates Status
Certificate Name Expiration Date Days Left Until Expiration
---------------- --------------- --------------------------
my-pg-cluster-ca 2024-09-06 13:54:17 +0000 UTC 88.21
my-pg-cluster-replication 2024-09-06 13:54:17 +0000 UTC 88.21
my-pg-cluster-server 2024-09-06 13:54:17 +0000 UTC 88.21
Continuous Backup status
Not configured
Physical backups
No running physical backups found
Streaming Replication status
Replication Slots Enabled
Name Sent LSN Write LSN Flush LSN Replay LSN Write Lag Flush Lag Replay Lag State Sync State Sync Priority Replication Slot
---- -------- --------- --------- ---------- --------- --------- ---------- ----- ---------- ------------- ----------------
my-pg-cluster-2 0/1E000000 0/1E000000 0/1E000000 0/1E000000 00:00:00 00:00:00 00:00:00 streaming async 0 active
my-pg-cluster-3 0/1E000000 0/1E000000 0/1E000000 0/1E000000 00:00:00 00:00:00 00:00:00 streaming async 0 active
Unmanaged Replication Slot Status
No unmanaged replication slots found
Managed roles status
No roles managed
Tablespaces status
No managed tablespaces
Pod Disruption Budgets status
Name Role Expected Pods Current Healthy Minimum Desired Healthy Disruptions Allowed
---- ---- ------------- --------------- ----------------------- -------------------
my-pg-cluster replica 2 2 1 1
my-pg-cluster-primary primary 1 1 1 0
Instances status
Name Database Size Current LSN Replication role Status QoS Manager Version Node
---- ------------- ----------- ---------------- ------ --- --------------- ----
my-pg-cluster-1 37 MB 0/1E000000 Primary OK BestEffort 1.23.1 minikube
my-pg-cluster-2 37 MB 0/1E000000 Standby (async) OK BestEffort 1.23.1 minikube
my-pg-cluster-3 37 MB 0/1E000000 Standby (async) OK BestEffort 1.23.1 minikube
This is quite a lot of information and tells us a great deal about our cluster, including:
- We have one primary node and two replicas in asynchronous replication (this comes from the three instances we specified in the cluster configuration)
- All instances are healthy and there is no replication lag
- The version of PostgreSQL is 16.2
- The configuration is using replication slots
- Information about the certificates used for encrypted traffic
- We have not configured any backups yet (this will be the topic of one of the next posts)
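As the status output is plain text, it can also be used in scripts, e.g. for a quick health check (a sketch; the status line is taken verbatim from the output above):

```shell
# Fail if the cluster does not report a healthy state
kubectl cnpg status my-pg-cluster | grep -q 'Cluster in healthy state' \
  && echo "healthy" \
  || echo "NOT healthy"
```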
If you want to see even more information, e.g. the configuration of PostgreSQL, pass the “--verbose” flag to the status command:
minicube@micro-minicube:~> kubectl-cnpg status my-pg-cluster --verbose
Cluster Summary
Name: my-pg-cluster
Namespace: default
System ID: 7378131726640287762
PostgreSQL Image: ghcr.io/cloudnative-pg/postgresql:16.2
Primary instance: my-pg-cluster-1
Primary start time: 2024-06-08 13:59:26 +0000 UTC (uptime 42h57m30s)
Status: Cluster in healthy state
Instances: 3
Ready instances: 3
Current Write LSN: 0/20000110 (Timeline: 1 - WAL File: 000000010000000000000010)
PostgreSQL Configuration
archive_command = '/controller/manager wal-archive --log-destination /controller/log/postgres.json %p'
archive_mode = 'on'
archive_timeout = '5min'
cluster_name = 'my-pg-cluster'
dynamic_shared_memory_type = 'posix'
full_page_writes = 'on'
hot_standby = 'true'
listen_addresses = '*'
log_destination = 'csvlog'
log_directory = '/controller/log'
log_filename = 'postgres'
log_rotation_age = '0'
log_rotation_size = '0'
log_truncate_on_rotation = 'false'
logging_collector = 'on'
max_parallel_workers = '32'
max_replication_slots = '32'
max_worker_processes = '32'
pg_stat_statements.max = '2500'
port = '5432'
restart_after_crash = 'false'
shared_memory_type = 'mmap'
shared_preload_libraries = 'pg_stat_statements'
ssl = 'on'
ssl_ca_file = '/controller/certificates/client-ca.crt'
ssl_cert_file = '/controller/certificates/server.crt'
ssl_key_file = '/controller/certificates/server.key'
ssl_max_protocol_version = 'TLSv1.3'
ssl_min_protocol_version = 'TLSv1.3'
unix_socket_directories = '/controller/run'
wal_keep_size = '512MB'
wal_level = 'logical'
wal_log_hints = 'on'
wal_receiver_timeout = '5s'
wal_sender_timeout = '5s'
work_mem = '12MB'
cnpg.config_sha256 = 'db8a255b574978eb43a479ec688a1e8e72281ec3fa03b59bcb3cf3bf9b997e67'
PostgreSQL HBA Rules
#
# FIXED RULES
#
# Grant local access ('local' user map)
local all all peer map=local
# Require client certificate authentication for the streaming_replica user
hostssl postgres streaming_replica all cert
hostssl replication streaming_replica all cert
hostssl all cnpg_pooler_pgbouncer all cert
#
# USER-DEFINED RULES
#
host all all 192.168.122.0/24 scram-sha-256
#
# DEFAULT RULES
#
host all all all scram-sha-256
Certificates Status
Certificate Name Expiration Date Days Left Until Expiration
---------------- --------------- --------------------------
my-pg-cluster-ca 2024-09-06 13:54:17 +0000 UTC 88.21
my-pg-cluster-replication 2024-09-06 13:54:17 +0000 UTC 88.21
my-pg-cluster-server 2024-09-06 13:54:17 +0000 UTC 88.21
Continuous Backup status
Not configured
Physical backups
No running physical backups found
Streaming Replication status
Replication Slots Enabled
Name Sent LSN Write LSN Flush LSN Replay LSN Write Lag Flush Lag Replay Lag State Sync State Sync Priority Replication Slot Slot Restart LSN Slot WAL Status Slot Safe WAL Size
---- -------- --------- --------- ---------- --------- --------- ---------- ----- ---------- ------------- ---------------- ---------------- --------------- ------------------
my-pg-cluster-2 0/20000110 0/20000110 0/20000110 0/20000110 00:00:00 00:00:00 00:00:00 streaming async 0 active 0/20000110 reserved NULL
my-pg-cluster-3 0/20000110 0/20000110 0/20000110 0/20000110 00:00:00 00:00:00 00:00:00 streaming async 0 active 0/20000110 reserved NULL
Unmanaged Replication Slot Status
No unmanaged replication slots found
Managed roles status
No roles managed
Tablespaces status
No managed tablespaces
Pod Disruption Budgets status
Name Role Expected Pods Current Healthy Minimum Desired Healthy Disruptions Allowed
---- ---- ------------- --------------- ----------------------- -------------------
my-pg-cluster replica 2 2 1 1
my-pg-cluster-primary primary 1 1 1 0
Instances status
Name Database Size Current LSN Replication role Status QoS Manager Version Node
---- ------------- ----------- ---------------- ------ --- --------------- ----
my-pg-cluster-1 37 MB 0/20000110 Primary OK BestEffort 1.23.1 minikube
my-pg-cluster-2 37 MB 0/20000110 Standby (async) OK BestEffort 1.23.1 minikube
my-pg-cluster-3 37 MB 0/20000110 Standby (async) OK BestEffort 1.23.1 minikube
The other important command when it comes to troubleshooting is the “logs” command (the “-f” flag follows the log, just like tail -f):
minicube@micro-minicube:~> kubectl-cnpg logs cluster my-pg-cluster -f
...
{"level":"info","ts":"2024-06-10T08:51:59Z","logger":"wal-archive","msg":"Backup not configured, skip WAL archiving via Barman Cloud","logging_pod":"my-pg-cluster-1","walName":"pg_wal/00000001000000000000000F","currentPrimary":"my-pg-cluster-1","targetPrimary":"my-pg-cluster-1"}
{"level":"info","ts":"2024-06-10T08:52:00Z","logger":"postgres","msg":"record","logging_pod":"my-pg-cluster-1","record":{"log_time":"2024-06-10 08:52:00.121 UTC","process_id":"1289","session_id":"66669223.509","session_line_num":"4","session_start_time":"2024-06-10 05:41:55 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"checkpoint complete: wrote 10 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.005 s, sync=0.006 s, total=1.111 s; sync files=5, longest=0.002 s, average=0.002 s; distance=64233 kB, estimate=64233 kB; lsn=0/20000060, redo lsn=0/1E006030","backend_type":"checkpointer","query_id":"0"}}
{"level":"info","ts":"2024-06-10T08:56:59Z","logger":"wal-archive","msg":"Backup not configured, skip WAL archiving via Barman Cloud","logging_pod":"my-pg-cluster-1","walName":"pg_wal/000000010000000000000010","currentPrimary":"my-pg-cluster-1","targetPrimary":"my-pg-cluster-1"}
This gives you the PostgreSQL logs as well as the operator logs. Both the “status” and the “logs” commands are essential for troubleshooting.
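Because each log line is a JSON document, the output is easy to filter with standard tools. A sketch using grep (the field names are taken from the sample output above):

```shell
# Only the PostgreSQL server records, skipping the operator messages
kubectl cnpg logs cluster my-pg-cluster | grep '"logger":"postgres"'

# Only the records of a specific instance
kubectl cnpg logs cluster my-pg-cluster | grep '"logging_pod":"my-pg-cluster-1"'
```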
In the next post we’ll look at scaling the cluster up and down.
Gabriele
13.06.2024: Thanks Daniel for this article and the series you're writing on CNPG. I thought you (or the readers at least) might be happy to know that now CNPG is also available on Homebrew.
Daniel Westermann
14.06.2024: Thank you, Gabriele, see you next week
Cheers,
Daniel