When it comes to backing up and restoring a PostgreSQL cluster, most of our customers rely on either pgBackRest or Barman. The reason is pretty clear: both tools provide a backup repository, full and incremental backups, and plenty of other features, and both automate most of the tasks you expect from a backup solution. With community PostgreSQL, all you can do up to PostgreSQL 16 is take full backups and combine them with archiving of the WAL segments to get point-in-time recovery. While this might be sufficient for smaller installations, it becomes a real pain when your data set gets larger or you have plenty of clusters to manage. Not to mention all the scripting you need around that.

At least for incremental backups this will most probably change with PostgreSQL 17, as Robert Haas committed this yesterday:

Add support for incremental backup.

To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup.  You then use BASE_BACKUP with the INCREMENTAL option to take
the backup.  pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.

An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.

Patch by me.  Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.

Wow, I’ve waited for this for a very long time and I really appreciate all the work which went into this. So, before looking at how this works: Thank you Robert, thank you Matthias, Dilip, Jakub, Peter, Álvaro, thanks to all the others involved in this. Great job.

Don’t expect any internals in this post, as it will be long enough anyway. The scope here is just to give you a starting point to play around with this. More details on how this works internally will follow in another post.

Before we have something to back up we obviously need some data, and if you want to play with this you need to be on the current development branch of PostgreSQL (we’ll use pgbench to generate the sample schema):

postgres=# select version();
                              version                               
--------------------------------------------------------------------
 PostgreSQL 17devel on x86_64-linux, compiled by gcc-12.2.0, 64-bit
(1 row)
postgres=# \! pgbench -i -s 10 d
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...                                                                                
creating primary keys...
done in 1.14 s (drop tables 0.00 s, create tables 0.02 s, client-side generate 0.85 s, vacuum 0.06 s, primary keys 0.22 s).
postgres=# \q

Creating the base backup from this cluster is in no way different than before PostgreSQL 17. We need an empty directory and can store the backup there:

postgres@debian12-pg:/home/postgres/ [pgdev] mkdir backups
postgres@debian12-pg:/home/postgres/ [pgdev] pg_basebackup -D backups/

Now we have our first full/base backup. Let’s create some additional data in our sample:

postgres@debian12-pg:/home/postgres/ [pgdev] psql -c "create table t as select * from pgbench_accounts" d
SELECT 1000000

An incremental backup should now only contain the additional table and data we’ve just created (non-relation files are always included in an incremental backup). Again, we need an empty directory, and once we have that, an incremental backup can be taken by pointing pg_basebackup at the backup manifest of the full backup:

postgres@debian12-pg:/home/postgres/ [pgdev] mkdir backups_incr1
postgres@debian12-pg:/home/postgres/ [pgdev] pg_basebackup --incremental=backups/backup_manifest -D backups_incr1/
pg_basebackup: error: could not initiate base backup: ERROR:  incremental backups cannot be taken unless WAL summarization is enabled
pg_basebackup: removing contents of data directory "backups_incr1/"

According to the error message there is something new called “WAL summarization”. Two new parameters came with the incremental backup feature:

postgres@debian12-pg:/home/postgres/ [pgdev] psql -c "\dconfig *summa*"
List of configuration parameters
       Parameter       | Value 
-----------------------+-------
 summarize_wal         | off
 wal_summary_keep_time | 10d
(2 rows)

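WAL summarization allows the server to detect which blocks have changed since a previous backup, which can be a full/base backup or another incremental backup. The details are described in the PostgreSQL documentation for these parameters.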
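
Since an incremental backup fails outright when summarize_wal is off, a small pre-flight check in your backup script can save a failed run. This is only a minimal sketch: the function name require_wal_summarization is made up for this post, and it assumes psql can connect with your default connection settings.

```shell
#!/bin/sh
# Sketch: refuse to go on when WAL summarization is not enabled.
# require_wal_summarization is a hypothetical helper, not part of PostgreSQL.

require_wal_summarization() {
    # -A (unaligned) and -t (tuples only) make psql print just the value.
    setting=$(psql -At -c "show summarize_wal") || return 2
    if [ "$setting" != "on" ]; then
        echo "summarize_wal is '$setting', incremental backups will fail" >&2
        return 1
    fi
    return 0
}

# Possible usage in a backup script (not executed here):
#   require_wal_summarization && pg_basebackup --incremental=... -D ...
```

The function returns 0 only when the server reports "on", so it can be chained in front of the pg_basebackup call.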

Let’s enable that and try again:

postgres@debian12-pg:/home/postgres/ [pgdev] psql -c "alter system set summarize_wal='on'"
ALTER SYSTEM
postgres@debian12-pg:/home/postgres/ [pgdev] pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2023-12-21 09:49:53.092 CET - 1 - 2100 -  - @ - 0LOG:  redirecting log output to logging collector process
2023-12-21 09:49:53.092 CET - 2 - 2100 -  - @ - 0HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres@debian12-pg:/home/postgres/ [pgdev] pg_basebackup --incremental=backups/backup_manifest -D backups_incr1/
WARNING:  aborting backup due to backend exiting before pg_backup_stop was called
pg_basebackup: error: could not initiate base backup: ERROR:  WAL summaries are required on timeline 1 from 0/A000028 to 0/16000028, but the summaries for that timeline and LSN range are incomplete
DETAIL:  The first unsummarized LSN is this range is 0/A000028.
pg_basebackup: removing contents of data directory "backups_incr1/"

This fails again, and the error message translates to: because we enabled WAL summarization only after taking the first full/base backup, the WAL generated since that backup has not been summarized, so this cannot work. Let’s start from scratch by taking a fresh full backup, generating another set of data, and then taking an incremental backup:

postgres@debian12-pg:/home/postgres/ [pgdev] rm -rf backups/*
postgres@debian12-pg:/home/postgres/ [pgdev] pg_basebackup -D backups/
postgres@debian12-pg:/home/postgres/ [pgdev] psql -c "create table t2 as select * from pgbench_accounts" d
SELECT 1000000
postgres@debian12-pg:/home/postgres/ [pgdev] pg_basebackup --incremental=backups/backup_manifest -D backups_incr1/

We’ve successfully created our first incremental backup. Let’s compare the size of the full against the incremental backup:

postgres@debian12-pg:/home/postgres/ [pgdev] du -sh backups
325M backups
postgres@debian12-pg:/home/postgres/ [pgdev] du -sh backups_incr1/
281M backups_incr1/

The incremental backup is smaller than the full backup (281M versus 325M). The next incremental backup can be based on either a full/base backup or another incremental backup. We’ll use the incremental backup we just took as the starting point for the next one:

postgres@debian12-pg:/home/postgres/ [pgdev] mkdir backups_incr2
postgres@debian12-pg:/home/postgres/ [pgdev] pg_basebackup --incremental=backups_incr1/backup_manifest -D backups_incr2/
postgres@debian12-pg:/home/postgres/ [pgdev] du -sh backups_incr2/
153M    backups_incr2/
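
The commit message above states that in an incremental backup some relation files are replaced by files named INCREMENTAL.${ORIGINAL_NAME}. If you want to see how many files were stored that way, something like the following could be used. It is just a sketch, and count_incremental_files is a name invented for this post.

```shell
#!/bin/sh
# Sketch: count the files in a backup directory that were stored incrementally.
# count_incremental_files is a hypothetical helper, not part of PostgreSQL.

count_incremental_files() {
    # Incrementally stored relation files are named
    # INCREMENTAL.<original file name> inside the backup directory.
    # tr strips the padding some wc implementations add.
    find "$1" -type f -name 'INCREMENTAL.*' | wc -l | tr -d ' '
}

# Possible usage (not executed here):
#   count_incremental_files backups_incr1/
#   count_incremental_files backups_incr2/
```

For a full/base backup the function should print 0, as such a backup contains only complete relation files.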

So much for the backups. How can we restore them? In case of a full/base backup the procedure is simple: copy back the full/base backup, restore the archived WAL segments (if you want to roll forward), ask PostgreSQL to go into recovery mode when it starts up (if you want to roll forward) and you’re done. In case of incremental backups there is another procedure to follow, and this is where pg_combinebackup comes into the game. As the name implies, pg_combinebackup will go through the chain of the full/base backup and all the incrementals you give it and construct a synthetic full backup.

If we want to restore everything we have now, starting from the first full backup and the two incremental backups we’ve created afterwards, how can we achieve this? The procedure is pretty simple: pg_combinebackup needs all the backups to combine. Start with your full backup, then provide all the incrementals you want to have included in the final synthetic full backup, and finally specify the output directory you want the result written to:

postgres@debian12-pg:/home/postgres/ [pgdev] pg_combinebackup backups/ backups_incr1/ backups_incr2/ -o /var/tmp/restore
postgres@debian12-pg:/home/postgres/ [pgdev] ls /var/tmp/restore
backup_label      global        pg_ident.conf  pg_notify     pg_stat      pg_twophase  postgresql.auto.conf
backup_manifest   pg_commit_ts  pg_log         pg_replslot   pg_stat_tmp  PG_VERSION   postgresql.conf
base              pg_dynshmem   pg_logical     pg_serial     pg_subtrans  pg_wal
current_logfiles  pg_hba.conf   pg_multixact   pg_snapshots  pg_tblspc    pg_xact
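
When the chain gets longer, it is easy to forget a directory or to pass something that is not a backup at all. The following sketch only builds and prints the pg_combinebackup call after checking that every directory contains a backup_manifest. The helper name combine_command is an assumption for this post, and the manifest check is a simple sanity test, not something pg_combinebackup requires you to do yourself.

```shell
#!/bin/sh
# Sketch: print the pg_combinebackup call for a chain of backups.
# combine_command is a hypothetical helper, not part of PostgreSQL.
# Arguments: OUTPUT_DIR FULL_BACKUP [INCREMENTAL...], oldest backup first.

combine_command() {
    output=$1
    shift
    if [ "$#" -lt 1 ]; then
        echo "usage: combine_command OUTPUT_DIR FULL_BACKUP [INCREMENTAL...]" >&2
        return 2
    fi
    for dir in "$@"; do
        # pg_basebackup writes a backup_manifest by default.
        if [ ! -f "$dir/backup_manifest" ]; then
            echo "no backup_manifest in $dir" >&2
            return 1
        fi
    done
    # Only printed, not executed; paths containing spaces would need quoting.
    echo "pg_combinebackup $* -o $output"
}

# Possible usage (not executed here):
#   combine_command /var/tmp/restore backups backups_incr1 backups_incr2
```

Printing the command instead of running it gives you the chance to review the order of the backups before anything is written.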

The directory written by pg_combinebackup is a consistent full/base backup and can be started:

postgres@debian12-pg:/home/postgres/ [pgdev] chmod 700 /var/tmp/restore/
postgres@debian12-pg:/home/postgres/ [pgdev] echo "port=9999" >> /var/tmp/restore/postgresql.auto.conf
postgres@debian12-pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/restore/ start
waiting for server to start....2023-12-21 10:23:31.599 CET - 1 - 2449 -  - @ - 0LOG:  redirecting log output to logging collector process
2023-12-21 10:23:31.599 CET - 2 - 2449 -  - @ - 0HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres@debian12-pg:/home/postgres/ [pgdev] psql -p 9999 -c "select count(*) from t2" d
  count  
---------
 1000000
(1 row)

Really cool. That’s it for the first post about this new feature of PostgreSQL 17.