As we’re getting more and more familiar with CloudNativePG, it’s time to look at how to get information about our cluster, be it for monitoring or for troubleshooting purposes. Information about the general state of the cluster can easily be obtained with kubectl.

To list the global state of the cluster, you can do:

minicube@micro-minicube:~> kubectl get cluster -A
NAMESPACE   NAME            AGE   INSTANCES   READY   STATUS                     PRIMARY
default     my-pg-cluster   41h   3           3       Cluster in healthy state   my-pg-cluster-1

As we’ve seen in the previous posts (here, here, here and here), kubectl can also be used to get information about the pods and services of the deployment:

minicube@micro-minicube:~> kubectl get pods
NAME              READY   STATUS    RESTARTS   AGE
my-pg-cluster-1   1/1     Running   0          108m
my-pg-cluster-2   1/1     Running   0          103m
my-pg-cluster-3   1/1     Running   0          103m
minicube@micro-minicube:~> kubectl get services
NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes         ClusterIP   10.96.0.1        <none>        443/TCP    4d
my-pg-cluster-r    ClusterIP   10.111.113.4     <none>        5432/TCP   41h
my-pg-cluster-ro   ClusterIP   10.110.137.246   <none>        5432/TCP   41h
my-pg-cluster-rw   ClusterIP   10.100.77.15     <none>        5432/TCP   41h

The three services serve different purposes: “my-pg-cluster-rw” always points to the primary, “my-pg-cluster-ro” points to the replicas only, and “my-pg-cluster-r” points to any instance. What we cannot easily see with kubectl is information related to PostgreSQL itself. But as kubectl can be extended with plugins, CloudNativePG comes with a kubectl plugin called “cnpg”. There are several installation methods available; we’ll go for the scripted version:

minicube@micro-minicube:~> curl -sSfL https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | sudo sh -s -- -b /usr/local/bin
cloudnative-pg/cloudnative-pg info checking GitHub for latest tag
cloudnative-pg/cloudnative-pg info found version: 1.23.1 for v1.23.1/linux/x86_64
cloudnative-pg/cloudnative-pg info installed /usr/local/bin/kubectl-cnpg

A very nice feature of this plugin is that it comes with auto-completion of the available commands, but this needs to be configured before you can use it. The plugin itself can generate the completion script for one of the supported shells (bash in my case):

minicube@micro-minicube:~> kubectl cnpg completion
Generate the autocompletion script for kubectl-cnpg for the specified shell.
See each sub-command's help for details on how to use the generated script.

Usage:
  kubectl cnpg completion [command]

Available Commands:
  bash        Generate the autocompletion script for bash
  fish        Generate the autocompletion script for fish
  powershell  Generate the autocompletion script for powershell
  zsh         Generate the autocompletion script for zsh

...
minicube@micro-minicube:~> kubectl cnpg completion bash > kubectl_complete-cnpg
minicube@micro-minicube:~> chmod +x kubectl_complete-cnpg
minicube@micro-minicube:~> sudo mv kubectl_complete-cnpg /usr/local/bin/

From now on, tab completion works (kubectl discovers completion executables named “kubectl_complete-<plugin>” on the PATH; this requires kubectl 1.26 or later):

minicube@micro-minicube:~> kubectl-cnpg [TAB][TAB]
backup        (Request an on-demand backup for a PostgreSQL Cluster)
certificate   (Create a client certificate to connect to PostgreSQL using TLS and Certificate authentication)
completion    (Generate the autocompletion script for the specified shell)
destroy       (Destroy the instance named [cluster]-[node] or [node] with the associated PVC)
fencing       (Fencing related commands)
fio           (Creates a fio deployment, pvc and configmap)
help          (Help about any command)
hibernate     (Hibernation related commands)
install       (CNPG installation commands)
logs          (Collect cluster logs)
maintenance   (Sets or removes maintenance mode from clusters)
pgadmin4      (Creates a pgadmin deployment)
pgbench       (Creates a pgbench job)
promote       (Promote the pod named [cluster]-[node] or [node] to primary)
psql          (Start a psql session targeting a CloudNativePG cluster)
publication   (Logical publication management commands)
reload        (Reload the cluster)
report        (Report on the operator)
restart       (Restart a cluster or a single instance in a cluster)
snapshot      (command removed)
status        (Get the status of a PostgreSQL cluster)
subscription  (Logical subscription management commands)
version       (Prints version, commit sha and date of the build)

As you can see, quite a few commands are available, but within the scope of this post we’ll only use the commands for getting logs and detailed information about our cluster. Obviously, the “status” command should give us some global information about the cluster, and in fact it gives us much more:

minicube@micro-minicube:~> kubectl-cnpg status my-pg-cluster
Cluster Summary
Name:                my-pg-cluster
Namespace:           default
System ID:           7378131726640287762
PostgreSQL Image:    ghcr.io/cloudnative-pg/postgresql:16.2
Primary instance:    my-pg-cluster-1
Primary start time:  2024-06-08 13:59:26 +0000 UTC (uptime 42h49m23s)
Status:              Cluster in healthy state 
Instances:           3
Ready instances:     3
Current Write LSN:   0/1E000000 (Timeline: 1 - WAL File: 00000001000000000000000E)

Certificates Status
Certificate Name           Expiration Date                Days Left Until Expiration
----------------           ---------------                --------------------------
my-pg-cluster-ca           2024-09-06 13:54:17 +0000 UTC  88.21
my-pg-cluster-replication  2024-09-06 13:54:17 +0000 UTC  88.21
my-pg-cluster-server       2024-09-06 13:54:17 +0000 UTC  88.21

Continuous Backup status
Not configured

Physical backups
No running physical backups found

Streaming Replication status
Replication Slots Enabled
Name             Sent LSN    Write LSN   Flush LSN   Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot
----             --------    ---------   ---------   ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------
my-pg-cluster-2  0/1E000000  0/1E000000  0/1E000000  0/1E000000  00:00:00   00:00:00   00:00:00    streaming  async       0              active
my-pg-cluster-3  0/1E000000  0/1E000000  0/1E000000  0/1E000000  00:00:00   00:00:00   00:00:00    streaming  async       0              active

Unmanaged Replication Slot Status
No unmanaged replication slots found

Managed roles status
No roles managed

Tablespaces status
No managed tablespaces

Pod Disruption Budgets status
Name                   Role     Expected Pods  Current Healthy  Minimum Desired Healthy  Disruptions Allowed
----                   ----     -------------  ---------------  -----------------------  -------------------
my-pg-cluster          replica  2              2                1                        1
my-pg-cluster-primary  primary  1              1                1                        0

Instances status
Name             Database Size  Current LSN  Replication role  Status  QoS         Manager Version  Node
----             -------------  -----------  ----------------  ------  ---         ---------------  ----
my-pg-cluster-1  37 MB          0/1E000000   Primary           OK      BestEffort  1.23.1           minikube
my-pg-cluster-2  37 MB          0/1E000000   Standby (async)   OK      BestEffort  1.23.1           minikube
my-pg-cluster-3  37 MB          0/1E000000   Standby (async)   OK      BestEffort  1.23.1           minikube

This is quite a lot of information and tells us a great deal about our cluster, including:

  • We have one primary node and two replicas in asynchronous replication (this comes from the three instances we specified in the cluster configuration)
  • All instances are healthy and there is no replication lag
  • The version of PostgreSQL is 16.2
  • The configuration is using replication slots
  • Information about the certificates used for encrypted traffic
  • We have not configured any backups yet (this will be the topic of one of the next posts)
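
Most of this maps directly back to the cluster manifest we created in the earlier posts. As a reminder, a minimal spec producing this topology might look like the following sketch (the storage size is illustrative, not taken from this output):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-pg-cluster
spec:
  instances: 3   # one primary plus two replicas
  imageName: ghcr.io/cloudnative-pg/postgresql:16.2
  storage:
    size: 1Gi    # illustrative value
```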

If you want to see even more information, including e.g. the configuration of PostgreSQL, pass the “--verbose” flag to the status command:

minicube@micro-minicube:~> kubectl-cnpg status my-pg-cluster --verbose
Cluster Summary
Name:                my-pg-cluster
Namespace:           default
System ID:           7378131726640287762
PostgreSQL Image:    ghcr.io/cloudnative-pg/postgresql:16.2
Primary instance:    my-pg-cluster-1
Primary start time:  2024-06-08 13:59:26 +0000 UTC (uptime 42h57m30s)
Status:              Cluster in healthy state 
Instances:           3
Ready instances:     3
Current Write LSN:   0/20000110 (Timeline: 1 - WAL File: 000000010000000000000010)

PostgreSQL Configuration
archive_command = '/controller/manager wal-archive --log-destination /controller/log/postgres.json %p'
archive_mode = 'on'
archive_timeout = '5min'
cluster_name = 'my-pg-cluster'
dynamic_shared_memory_type = 'posix'
full_page_writes = 'on'
hot_standby = 'true'
listen_addresses = '*'
log_destination = 'csvlog'
log_directory = '/controller/log'
log_filename = 'postgres'
log_rotation_age = '0'
log_rotation_size = '0'
log_truncate_on_rotation = 'false'
logging_collector = 'on'
max_parallel_workers = '32'
max_replication_slots = '32'
max_worker_processes = '32'
pg_stat_statements.max = '2500'
port = '5432'
restart_after_crash = 'false'
shared_memory_type = 'mmap'
shared_preload_libraries = 'pg_stat_statements'
ssl = 'on'
ssl_ca_file = '/controller/certificates/client-ca.crt'
ssl_cert_file = '/controller/certificates/server.crt'
ssl_key_file = '/controller/certificates/server.key'
ssl_max_protocol_version = 'TLSv1.3'
ssl_min_protocol_version = 'TLSv1.3'
unix_socket_directories = '/controller/run'
wal_keep_size = '512MB'
wal_level = 'logical'
wal_log_hints = 'on'
wal_receiver_timeout = '5s'
wal_sender_timeout = '5s'
work_mem = '12MB'
cnpg.config_sha256 = 'db8a255b574978eb43a479ec688a1e8e72281ec3fa03b59bcb3cf3bf9b997e67'

PostgreSQL HBA Rules

#
# FIXED RULES
#

# Grant local access ('local' user map)
local all all peer map=local

# Require client certificate authentication for the streaming_replica user
hostssl postgres streaming_replica all cert
hostssl replication streaming_replica all cert
hostssl all cnpg_pooler_pgbouncer all cert

#
# USER-DEFINED RULES
#


host all all 192.168.122.0/24 scram-sha-256



#
# DEFAULT RULES
#
host all all all scram-sha-256


Certificates Status
Certificate Name           Expiration Date                Days Left Until Expiration
----------------           ---------------                --------------------------
my-pg-cluster-ca           2024-09-06 13:54:17 +0000 UTC  88.21
my-pg-cluster-replication  2024-09-06 13:54:17 +0000 UTC  88.21
my-pg-cluster-server       2024-09-06 13:54:17 +0000 UTC  88.21

Continuous Backup status
Not configured

Physical backups
No running physical backups found

Streaming Replication status
Replication Slots Enabled
Name             Sent LSN    Write LSN   Flush LSN   Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot  Slot Restart LSN  Slot WAL Status  Slot Safe WAL Size
----             --------    ---------   ---------   ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------  ----------------  ---------------  ------------------
my-pg-cluster-2  0/20000110  0/20000110  0/20000110  0/20000110  00:00:00   00:00:00   00:00:00    streaming  async       0              active            0/20000110        reserved         NULL
my-pg-cluster-3  0/20000110  0/20000110  0/20000110  0/20000110  00:00:00   00:00:00   00:00:00    streaming  async       0              active            0/20000110        reserved         NULL

Unmanaged Replication Slot Status
No unmanaged replication slots found

Managed roles status
No roles managed

Tablespaces status
No managed tablespaces

Pod Disruption Budgets status
Name                   Role     Expected Pods  Current Healthy  Minimum Desired Healthy  Disruptions Allowed
----                   ----     -------------  ---------------  -----------------------  -------------------
my-pg-cluster          replica  2              2                1                        1
my-pg-cluster-primary  primary  1              1                1                        0

Instances status
Name             Database Size  Current LSN  Replication role  Status  QoS         Manager Version  Node
----             -------------  -----------  ----------------  ------  ---         ---------------  ----
my-pg-cluster-1  37 MB          0/20000110   Primary           OK      BestEffort  1.23.1           minikube
my-pg-cluster-2  37 MB          0/20000110   Standby (async)   OK      BestEffort  1.23.1           minikube
my-pg-cluster-3  37 MB          0/20000110   Standby (async)   OK      BestEffort  1.23.1           minikube
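
The “PostgreSQL Configuration” and the “USER-DEFINED RULES” sections above are driven by the cluster manifest. As a sketch, such settings could be declared like this (the parameter values and the HBA rule shown are assumptions matching the output above):

```yaml
spec:
  postgresql:
    parameters:
      work_mem: "12MB"
      wal_keep_size: "512MB"
    pg_hba:
      - host all all 192.168.122.0/24 scram-sha-256
```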

The other important command when it comes to troubleshooting is the “logs” command (“-f” follows the log stream, just like tail -f):

minicube@micro-minicube:~> kubectl-cnpg logs cluster my-pg-cluster -f
...
{"level":"info","ts":"2024-06-10T08:51:59Z","logger":"wal-archive","msg":"Backup not configured, skip WAL archiving via Barman Cloud","logging_pod":"my-pg-cluster-1","walName":"pg_wal/00000001000000000000000F","currentPrimary":"my-pg-cluster-1","targetPrimary":"my-pg-cluster-1"}
{"level":"info","ts":"2024-06-10T08:52:00Z","logger":"postgres","msg":"record","logging_pod":"my-pg-cluster-1","record":{"log_time":"2024-06-10 08:52:00.121 UTC","process_id":"1289","session_id":"66669223.509","session_line_num":"4","session_start_time":"2024-06-10 05:41:55 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"checkpoint complete: wrote 10 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.005 s, sync=0.006 s, total=1.111 s; sync files=5, longest=0.002 s, average=0.002 s; distance=64233 kB, estimate=64233 kB; lsn=0/20000060, redo lsn=0/1E006030","backend_type":"checkpointer","query_id":"0"}}
{"level":"info","ts":"2024-06-10T08:56:59Z","logger":"wal-archive","msg":"Backup not configured, skip WAL archiving via Barman Cloud","logging_pod":"my-pg-cluster-1","walName":"pg_wal/000000010000000000000010","currentPrimary":"my-pg-cluster-1","targetPrimary":"my-pg-cluster-1"}

This gives you the PostgreSQL logs as well as the operator logs. Both the “status” and the “logs” commands are essential for troubleshooting.
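
Since the log stream is newline-delimited JSON (one record per line), it is easy to filter. A small sketch, assuming you only want error-level records (the “level” field is taken from the output above):

```shell
# Follow the cluster logs and keep only error-level entries;
# each log record is a single JSON object per line, so a simple
# grep on the "level" field is enough
kubectl cnpg logs cluster my-pg-cluster -f | grep '"level":"error"'
```

For more elaborate filtering (e.g. by logger or pod name) a JSON-aware tool like jq would be the better fit.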

In the next post we’ll look at scaling the cluster up and down.