In the first post about incremental backups we’ve looked at how you can do incremental backups in general starting with PostgreSQL 17, probably, if it does not get reverted for some reason until PostgreSQL 17 is released. In the second post about incremental backups we’ve looked at backup chains and the choices you have for combining backups to create a starting point for new incremental backups. In this post we’ll look at the new system information functions which have been added with this feature.

We’ll start with a complete new cluster having “summarize_wal” enabled:

postgres=# \dconfig *summa*
List of configuration parameters
       Parameter       | Value 
-----------------------+-------
 summarize_wal         | on
 wal_summary_keep_time | 10d
(2 rows)

WAL summaries are written to pg_wal/summaries and of course we do not see anything there right now as we did not do anything with this cluster:

postgres=# \! ls -la $PGDATA/pg_wal/
total 16400
drwx------  4 postgres postgres     4096 Jan 18 10:51 .
drwx------ 20 postgres postgres     4096 Jan 18 10:51 ..
-rw-------  1 postgres postgres 16777216 Jan 18 10:53 000000010000000000000001
drwx------  2 postgres postgres     4096 Jan 18 10:51 archive_status
drwx------  2 postgres postgres     4096 Jan 18 10:51 summaries
postgres=# \! ls -la $PGDATA/pg_wal/summaries
total 44
drwx------ 2 postgres postgres 4096 Jan 18 10:51 .
drwx------ 4 postgres postgres 4096 Jan 18 10:51 ..
-rw------- 1 postgres postgres 4808 Jan 18 10:51 00000001000000000100002800000000010B3098.summary
-rw------- 1 postgres postgres 4944 Jan 18 10:51 0000000100000000010B309800000000014E8838.summary
-rw------- 1 postgres postgres   32 Jan 18 10:51 0000000100000000014E883800000000014E8938.summary
-rw------- 1 postgres postgres  230 Jan 18 10:51 0000000100000000014E893800000000014EF360.summary
-rw------- 1 postgres postgres   32 Jan 18 10:51 0000000100000000014EF36000000000014EF460.summary
-rw------- 1 postgres postgres   88 Jan 18 10:51 0000000100000000014EF46000000000014EF798.summary
-rw------- 1 postgres postgres   32 Jan 18 10:51 0000000100000000014EF79800000000014EF810.summary

Oh, this is not true at all. We do see quite some files there even though nothing happened from the user perspective. Well, this is the perfect use case to start with the three system information functions which have been added with this feature. Let’s start with “pg_available_wal_summaries ()”, which gives us the same picture than the file system but comes with additional information:

postgres=# select * from pg_available_wal_summaries ();
 tli | start_lsn |  end_lsn  
-----+-----------+-----------
   1 | 0/14EF360 | 0/14EF460
   1 | 0/14E8938 | 0/14EF360
   1 | 0/1000028 | 0/10B3098
   1 | 0/14EF460 | 0/14EF798
   1 | 0/14E8838 | 0/14E8938
   1 | 0/10B3098 | 0/14E8838
   1 | 0/14EF798 | 0/14EF810
(7 rows)

This looks a bit strange as we do not see any connection from one row to the next row. In other words: Isn’t it expected that this describes a chain? The “end_lsn” of one row should be the “start_lsn” of the next row? Let’s sort by “start_lsn”, as we are on the same timeline (tli) this gives us the result which is expected:

postgres=# select * from pg_available_wal_summaries () order by start_lsn;
 tli | start_lsn |  end_lsn  
-----+-----------+-----------
   1 | 0/1000028 | 0/10B3098
   1 | 0/10B3098 | 0/14E8838
   1 | 0/14E8838 | 0/14E8938
   1 | 0/14E8938 | 0/14EF360
   1 | 0/14EF360 | 0/14EF460
   1 | 0/14EF460 | 0/14EF798
   1 | 0/14EF798 | 0/14EF810
(7 rows)

Now we have the same picture as on the file system, e.g. the “end_lsn” of the first row is the “start_lsn” of the next row and this continues to the end of the summaries:

postgres=# \! ls -la $PGDATA/pg_wal/summaries
total 44
drwx------ 2 postgres postgres 4096 Jan 18 10:51 .
drwx------ 4 postgres postgres 4096 Jan 18 10:51 ..
-rw------- 1 postgres postgres 4808 Jan 18 10:51 00000001000000000100002800000000010B3098.summary
-rw------- 1 postgres postgres 4944 Jan 18 10:51 0000000100000000010B309800000000014E8838.summary
-rw------- 1 postgres postgres   32 Jan 18 10:51 0000000100000000014E883800000000014E8938.summary
-rw------- 1 postgres postgres  230 Jan 18 10:51 0000000100000000014E893800000000014EF360.summary
-rw------- 1 postgres postgres   32 Jan 18 10:51 0000000100000000014EF36000000000014EF460.summary
-rw------- 1 postgres postgres   88 Jan 18 10:51 0000000100000000014EF46000000000014EF798.summary
-rw------- 1 postgres postgres   32 Jan 18 10:51 0000000100000000014EF79800000000014EF810.summary

This is also how the file names on disk are made up: First you have the beginning and ending timelime, after the beginning and ending lsn (Log Sequence Number).

The next function we can have a look at it pg_wal_summary_contents ( tli bigint, start_lsn pg_lsn, end_lsn pg_lsn ). This function will give us some insights into what is in a summary, e.g. using the first entry from above:

postgres=# select * from pg_wal_summary_contents ( 1, '0/1000028', '0/10B3098' ) ;
 relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | is_limit_block 
-------------+---------------+-------------+---------------+----------------+----------------
         112 |          1663 |           1 |             0 |              0 | f
         113 |          1663 |           1 |             0 |              0 | f
         174 |          1663 |           1 |             0 |              0 | f
         175 |          1663 |           1 |             0 |              0 | f
         548 |          1663 |           1 |             0 |              0 | f
         549 |          1663 |           1 |             0 |              0 | f
...

What does this mean? First of all this is about the template1 database:

postgres=# select oid,datname from pg_database;
 oid |  datname  
-----+-----------
   5 | postgres
   1 | template1
   4 | template0
(3 rows)

The tablespace is pg_default:

postgres=# select oid,spcname from pg_tablespace ;
 oid  |  spcname   
------+------------
 1663 | pg_default
 1664 | pg_global
(2 rows)

The “relfinode” column describes the file on disk, e.g.:

postgres=# \! ls -la $PGDATA/base/1/112*
-rw------- 1 postgres postgres 8192 Jan 18 10:51 /u02/pgdata/PGDEV/base/1/112

“relblocknumber” mentions the block which has changed. “is_limit_block” means “Each row with is_limit_block false indicates that the block identified by the remaining output columns was modified by at least one WAL record within the range of records summarized by this file. Each row with is_limit_block true indicates either that (a) the relation fork was truncated to the length given by relblocknumber within the relevant range of WAL records or (b) that the relation fork was created or dropped within the relevant range of WAL records; in such cases, relblocknumber will be zero.”

Lets do a simple test to see what this really means. We create a simple table with one row:

postgres=# create table t ( a int );
CREATE TABLE
postgres=# insert into t values (1);
INSERT 0 1
postgres=# select pg_relation_filepath('t');
 pg_relation_filepath 
----------------------
 base/5/16388
(1 row)

For this we should see a new entry in the output of pg_wal_summary_contents if we ask for the last summary file:

postgres=# select * from pg_wal_summary_contents ( 1, '0/14EF810', '0/1557C00' ) where relfilenode = 16388;
 relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | is_limit_block 
-------------+---------------+-------------+---------------+----------------+----------------
(0 rows)

We don’t see anything and the reason is that a new summary file has not yet been written, lets force this and check again:

postgres=# checkpoint;
CHECKPOINT
postgres=# select * from pg_available_wal_summaries () order by start_lsn;
 tli | start_lsn |  end_lsn  
-----+-----------+-----------
   1 | 0/1000028 | 0/10B3098
   1 | 0/10B3098 | 0/14E8838
   1 | 0/14E8838 | 0/14E8938
   1 | 0/14E8938 | 0/14EF360
   1 | 0/14EF360 | 0/14EF460
   1 | 0/14EF460 | 0/14EF798
   1 | 0/14EF798 | 0/14EF810
   1 | 0/14EF810 | 0/1557C00
   1 | 0/1557C00 | 0/15598C8
(9 rows)

postgres=# select * from pg_wal_summary_contents ( 1, '0/1557C00', '0/15598C8' ) where relfilenode = 16388;
 relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | is_limit_block 
-------------+---------------+-------------+---------------+----------------+----------------
       16388 |          1663 |           5 |             0 |              0 | f
(1 row)

Lets add some more data and see how this is recorded in the wal summaries:

postgres=# insert into t select * from generate_series(1,1000000);
INSERT 0 1000000
postgres=# checkpoint;
CHECKPOINT
postgres=# select * from pg_available_wal_summaries () order by start_lsn;
 tli | start_lsn |  end_lsn  
-----+-----------+-----------
   1 | 0/1000028 | 0/10B3098
   1 | 0/10B3098 | 0/14E8838
   1 | 0/14E8838 | 0/14E8938
   1 | 0/14E8938 | 0/14EF360
   1 | 0/14EF360 | 0/14EF460
   1 | 0/14EF460 | 0/14EF798
   1 | 0/14EF798 | 0/14EF810
   1 | 0/14EF810 | 0/1557C00
   1 | 0/1557C00 | 0/15598C8
   1 | 0/15598C8 | 0/5290968
(10 rows)

postgres=# select * from pg_wal_summary_contents ( 1, '0/15598C8', '0/5290968' ) where relfilenode = 16388;
 relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | is_limit_block 
-------------+---------------+-------------+---------------+----------------+----------------
       16388 |          1663 |           5 |             0 |              0 | f
       16388 |          1663 |           5 |             0 |              1 | f
       16388 |          1663 |           5 |             0 |              2 | f
       16388 |          1663 |           5 |             0 |              3 | f
       16388 |          1663 |           5 |             0 |              4 | f
       16388 |          1663 |           5 |             0 |              5 | f
       16388 |          1663 |           5 |             0 |              6 | f
...
       16388 |          1663 |           5 |             0 |           4420 | f
       16388 |          1663 |           5 |             0 |           4421 | f
       16388 |          1663 |           5 |             0 |           4422 | f
       16388 |          1663 |           5 |             0 |           4423 | f
       16388 |          1663 |           5 |             0 |           4424 | f
(4425 rows)

Does this mean we’ve modified 4424 blocks in that table? Yes, it does:

postgres=# select pg_relation_filepath('t');
 pg_relation_filepath 
----------------------
 base/5/16388
(1 row)
postgres=# \! ls -la $PGDATA/base/5/16388*
-rw------- 1 postgres postgres 36249600 Jan 18 11:23 /u02/pgdata/PGDEV/base/5/16388
-rw------- 1 postgres postgres    32768 Jan 18 11:23 /u02/pgdata/PGDEV/base/5/16388_fsm
-rw------- 1 postgres postgres     8192 Jan 18 11:23 /u02/pgdata/PGDEV/base/5/16388_vm
postgres=# select 36249600/8192;
 ?column? 
----------
     4425
(1 row)

An incremental backup would need to contain all those blocks as they’ve changed since the last full backup.

The last function gives the state of the summarizer process:

postgres=# select * from pg_get_wal_summarizer_state () ;
 summarized_tli | summarized_lsn | pending_lsn | summarizer_pid 
----------------+----------------+-------------+----------------
              1 | 0/77DC3C0      | 0/77DE570   |           1638
(1 row)

Pretty straight forward: The timeline, the last LSN up to which summaries have been written and pending_lsn means the last lsn the summarizer has consumed.