A lot is currently being committed for PostgreSQL, and this post looks at a feature that, I am sure, many PostgreSQL users have been waiting for for a long time: finally there is a native way to validate your base backups, pg_validatebackup. This is a new binary that validates base backups against a backup manifest, which is written automatically when you take a backup with pg_basebackup. Let's see how that works.
When you take a base backup without any specific flags in PostgreSQL 13, there will be a new file in the directory that holds the backup:

    postgres@centos8pg:/home/postgres/ [pgdev] mkdir /var/tmp/backup
    postgres@centos8pg:/home/postgres/ [pgdev] pg_basebackup -D /var/tmp/backup/
    postgres@centos8pg:/home/postgres/ [pgdev] ls /var/tmp/backup/backup_manifest
    /var/tmp/backup/backup_manifest

This is the so-called backup manifest, and when you have a look at it you'll notice that it is a simple JSON file:

    postgres@centos8pg:/home/postgres/ [pgdev] head -n 10 /var/tmp/backup/backup_manifest
    { "PostgreSQL-Backup-Manifest-Version": 1,
    "Files": [
    { "Path": "backup_label", "Size": 225, "Last-Modified": "2020-04-03 19:51:48 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "3cbc1336" },
    { "Path": "global/1262", "Size": 8192, "Last-Modified": "2020-04-03 18:52:13 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "f98856b1" },
    { "Path": "global/2964", "Size": 0, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
    { "Path": "global/1213", "Size": 8192, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "860d02d5" },
    { "Path": "global/1260", "Size": 8192, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "3b8ad06a" },
    { "Path": "global/1261", "Size": 8192, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "968c06d9" },
    { "Path": "global/1214", "Size": 8192, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "2f187a01" },
    { "Path": "global/2396", "Size": 8192, "Last-Modified": "2020-04-03 18:52:13 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "d3229ead" },

It contains a list of all the files in the backup, each with its size, its last modified timestamp and a checksum that was generated with CRC32C. CRC32C is the default, but pg_basebackup comes with new options you can use to change the algorithm that is used to create the checksums:

    postgres@centos8pg:/home/postgres/ [pgdev] pg_basebackup --help
    pg_basebackup takes a base backup of a running PostgreSQL server.

    Usage:
      pg_basebackup [OPTION]...

    Options controlling the output:
      -D, --pgdata=DIRECTORY receive base backup into directory
      -F, --format=p|t       output format (plain (default), tar)
      -r, --max-rate=RATE    maximum transfer rate to transfer data directory
                             (in kB/s, or use suffix "k" or "M")
      -R, --write-recovery-conf
                             write configuration for replication
      -T, --tablespace-mapping=OLDDIR=NEWDIR
                             relocate tablespace in OLDDIR to NEWDIR
          --waldir=WALDIR    location for the write-ahead log directory
      -X, --wal-method=none|fetch|stream
                             include required WAL files with specified method
      -z, --gzip             compress tar output
      -Z, --compress=0-9     compress tar output with given compression level

    General options:
      -c, --checkpoint=fast|spread
                             set fast or spread checkpointing
      -C, --create-slot      create replication slot
      -l, --label=LABEL      set backup label
      -n, --no-clean         do not clean up after errors
      -N, --no-sync          do not wait for changes to be written safely to disk
      -P, --progress         show progress information
      -S, --slot=SLOTNAME    replication slot to use
      -v, --verbose          output verbose messages
      -V, --version          output version information, then exit
          --no-slot          prevent creation of temporary replication slot
          --no-verify-checksums
                             do not verify checksums
          --no-estimate-size do not estimate backup size in server side
          --no-manifest      suppress generation of backup manifest
          --manifest-force-encode
                             hex encode all filenames in manifest
          --manifest-checksums=SHA{224,256,384,512}|CRC32C|NONE
                             use algorithm for manifest checksums
      -?, --help             show this help, then exit

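Because the manifest is plain JSON, it is easy to inspect with a few lines of code, for example to confirm which checksum algorithm was recorded after a backup taken with --manifest-checksums. The following Python sketch is only an illustration: the function name `summarize_manifest` is made up for this post, and it relies solely on the `Files`, `Size` and `Checksum-Algorithm` keys visible in the manifest excerpt shown earlier.

```python
import json
from collections import Counter


def summarize_manifest(manifest_path):
    """Return (file count, total size in bytes, algorithm usage) for a manifest."""
    with open(manifest_path, encoding="utf-8") as fh:
        manifest = json.load(fh)

    files = manifest["Files"]
    total_size = sum(entry["Size"] for entry in files)
    # Assumption: entries written with --manifest-checksums=NONE carry no
    # "Checksum-Algorithm" key, so they are counted under "NONE" here.
    algorithms = Counter(entry.get("Checksum-Algorithm", "NONE") for entry in files)
    return len(files), total_size, dict(algorithms)
```

For the backup taken above you would call `summarize_manifest("/var/tmp/backup/backup_manifest")`; with the default settings every entry should be counted under CRC32C.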
You can also go back to the previous behavior and disable the generation of the backup manifest altogether with --no-manifest. Once the backup is there and the manifest has been generated, you can use pg_validatebackup to check the integrity of what was written by pg_basebackup:

    postgres@centos8pg:/home/postgres/ [pgdev] pg_validatebackup --help
    pg_validatebackup validates a backup against the backup manifest.

    Usage:
      pg_validatebackup [OPTION]... BACKUPDIR

    Options:
      -e, --exit-on-error         exit immediately on error
      -i, --ignore=RELATIVE_PATH  ignore indicated path
      -m, --manifest=PATH         use specified path for manifest
      -n, --no-parse-wal          do not try to parse WAL files
      -s, --skip-checksums        skip checksum verification
      -w, --wal-directory=PATH    use specified path for WAL files
      -V, --version               output version information, then exit
      -?, --help                  show this help, then exit

In its simplest form, this is just:

    postgres@centos8pg:/home/postgres/ [pgdev] pg_validatebackup /var/tmp/backup/
    backup successfully verified

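If you want to run this check from a backup script, a small wrapper can assemble the command line from the options listed in the help output. This is a minimal sketch, not an official interface: the function names are invented for this post, only flags shown in the help text are used, and it assumes that pg_validatebackup exits with status 0 only when the backup validated successfully.

```python
import subprocess


def build_validate_command(backup_dir, exit_on_error=False,
                           skip_checksums=False, ignore=()):
    """Assemble the pg_validatebackup argument list for a backup directory."""
    cmd = ["pg_validatebackup"]
    if exit_on_error:
        cmd.append("--exit-on-error")
    if skip_checksums:
        cmd.append("--skip-checksums")
    # One --ignore option per relative path that should be skipped.
    for relative_path in ignore:
        cmd.append(f"--ignore={relative_path}")
    cmd.append(backup_dir)
    return cmd


def backup_is_valid(backup_dir, **options):
    """Run pg_validatebackup; assumes exit status 0 means success."""
    result = subprocess.run(build_validate_command(backup_dir, **options),
                            capture_output=True, text=True)
    return result.returncode == 0
```

Keeping the argument assembly separate from the call to `subprocess.run` makes it possible to log or test the exact command without a PostgreSQL installation at hand.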
That is a really cool feature.
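To get a feeling for what validating against a manifest means, here is a rough Python sketch of the same kind of comparison: is each listed file present, does its size match, and does its checksum match. It is not how pg_validatebackup is implemented and it is no replacement for it. It assumes the backup was taken with --manifest-checksums=SHA256 (entries with other algorithms are only checked for presence and size), that the checksum is stored as a hex string as in the excerpt above, and that file names appear under a plain `Path` key, so it will not work with --manifest-force-encode. The function name `check_against_manifest` is hypothetical.

```python
import hashlib
import json
import os


def check_against_manifest(backup_dir):
    """Return a list of problems found when comparing files to the manifest."""
    manifest_path = os.path.join(backup_dir, "backup_manifest")
    with open(manifest_path, encoding="utf-8") as fh:
        manifest = json.load(fh)

    problems = []
    for entry in manifest["Files"]:
        path = os.path.join(backup_dir, entry["Path"])
        if not os.path.isfile(path):
            problems.append(f"{entry['Path']}: missing")
            continue
        if os.path.getsize(path) != entry["Size"]:
            problems.append(f"{entry['Path']}: size mismatch")
            continue
        # Only SHA256 is handled in this sketch; other algorithms are skipped.
        if entry.get("Checksum-Algorithm") != "SHA256":
            continue
        digest = hashlib.sha256()
        with open(path, "rb") as data:
            # Read in 1 MiB chunks so large relation files need little memory.
            for chunk in iter(lambda: data.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != entry["Checksum"].lower():
            problems.append(f"{entry['Path']}: checksum mismatch")
    return problems
```

An empty list means that nothing in the manifest disagreed with the files on disk. Unlike the real tool, this sketch does not look for files that are missing from the manifest and does not parse any WAL.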