This is the fourth part of the Greenplum blog series; the previous ones are here: Getting started with Greenplum – 1 – Installation, Getting started with Greenplum – 2 – Initializing and bringing up the cluster, Getting started with Greenplum – 3 – Behind the scenes. In this post we’ll look at how you are supposed to back up and restore a Greenplum cluster.

If you restart the cluster nodes and log on to the systems again, you’ll notice that the instances are not running:

[gpadmin@cdw ~]$ ps -ef | grep postgres
gpadmin     1285    1233  0 09:43 pts/0    00:00:00 grep --color=auto postgres
[gpadmin@cdw ~]$ psql postgres
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Starting and stopping the cluster is done with “gpstart” and “gpstop”, so bringing the cluster up again is just a matter of this:

[gpadmin@cdw ~]$ which gpstart
/usr/local/greenplum-db-7.1.0/bin/gpstart
[gpadmin@cdw ~]$ gpstart
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Starting gpstart with args: 
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 7.1.0 build commit:e7c2b1f14bb42a1018ac57d14f4436880e0a0515 Open Source'
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Greenplum Catalog Version: '302307241'
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Starting Coordinator instance in admin mode
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data/coordinator/gpseg-1/ -l /data/coordinator/gpseg-1//log/startup.log -w -t 600 -o " -c gp_role=utility " start
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Obtaining Greenplum Coordinator catalog information
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Setting new coordinator era
20240301:09:45:37:001290 gpstart:cdw:gpadmin-[INFO]:-Coordinator Started...
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-Shutting down coordinator
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:---------------------------
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-Coordinator instance parameters
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:---------------------------
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-Database                 = template1
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-Coordinator Port              = 5432
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-Coordinator directory         = /data/coordinator/gpseg-1/
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-Timeout                  = 600 seconds
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-Coordinator standby           = Off 
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:---------------------------------------
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-Segment instances that will be started
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:---------------------------------------
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-   Host   Datadir                Port   Role
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-   sdw1   /data/primary/gpseg0   6000   Primary
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-   sdw2   /data/mirror/gpseg0    7000   Mirror
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-   sdw2   /data/primary/gpseg1   6000   Primary
20240301:09:45:38:001290 gpstart:cdw:gpadmin-[INFO]:-   sdw1   /data/mirror/gpseg1    7000   Mirror

Continue with Greenplum instance startup Yy|Nn (default=N):
> Y
20240301:09:45:49:001290 gpstart:cdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
.
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-Process results...
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-----------------------------------------------------
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-   Successful segment starts                                            = 4
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-----------------------------------------------------
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-Successfully started 4 of 4 segment instances 
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-----------------------------------------------------
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-Starting Coordinator instance cdw directory /data/coordinator/gpseg-1/ 
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=5519b53b4b2c1dab_240301094537 $GPHOME/bin/pg_ctl -D /data/coordinator/gpseg-1/ -l /data/coordinator/gpseg-1//log/startup.log -w -t 600 -o " -c gp_role=dispatch " start
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-Command pg_ctl reports Coordinator cdw instance active
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-Connecting to db template1 on host localhost
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-No standby coordinator configured.  skipping...
20240301:09:45:50:001290 gpstart:cdw:gpadmin-[INFO]:-Database successfully started
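
Stopping works the same way with “gpstop”. Both utilities accept “-a” to skip the confirmation prompt, and “gpstate” reports the current state of the cluster. A minimal sketch, not actually run here:

gpstop -a    # stop the whole cluster without prompting
gpstart -a   # start it again without prompting
gpstate      # show the status of the coordinator and the segment instances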

Before we look at how you can recover from failed coordinator or segment nodes, we’ll look at how you are supposed to back up the database(s). The tool for backing up databases in Greenplum is called “gpbackup“.

We’ll use the database we created in the previous post and create a table containing some sample data. In a Greenplum system, tables are distributed across the segments, and when you create a table you have three options for how this should happen:

  • DISTRIBUTED BY: You choose the column which will be used to distribute the data
  • DISTRIBUTED RANDOMLY: Use this if there is no unique column
  • DISTRIBUTED REPLICATED: Every row is distributed to all segments
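
For illustration, this is roughly what the other two options look like (t_random and t_repl are hypothetical tables we do not actually create here):

create table t_random ( id int, dummy text ) distributed randomly;
create table t_repl ( id int, dummy text ) distributed replicated;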

As the system works best (performance-wise) when the data is evenly spread across all the segment nodes, we’ll go with the first method and create the table like this:

[gpadmin@cdw ~]$ psql d
psql (12.12)
Type "help" for help.

d=# create table t1 ( id int primary key
                    , dummy text 
                    ) distributed by (id);
CREATE TABLE

To populate the table we’ll use the standard PostgreSQL generate_series function:

d=# insert into t1 
    select i, md5(i::text) 
      from generate_series(1,1000000) i;
INSERT 0 1000000
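
If you want to verify how the rows were spread across the segments, you can group on the hidden Greenplum column gp_segment_id. A quick check, assuming the table created above:

select gp_segment_id, count(*) from t1 group by gp_segment_id order by 1;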

When we try to back up this database with “gpbackup”, we get our first surprise: this utility is not available by default in the open source version of Greenplum:

[gpadmin@cdw ~]$ which gpbackup
/usr/bin/which: no gpbackup in (/usr/local/greenplum-db-7.1.0/bin:/home/gpadmin/.local/bin:/home/gpadmin/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
[gpadmin@cdw ~]$ find /usr/local/ -name gpbackup 

We could still use the standard pg_dump utility from PostgreSQL:

[gpadmin@cdw ~]$ pg_dump d > d.sql
[gpadmin@cdw ~]$ tail -10 d.sql 
--

ALTER TABLE ONLY public.t1
    ADD CONSTRAINT t1_pkey PRIMARY KEY (id);


--
-- Greenplum Database database dump complete
--

In the same way you can use pg_dumpall to get the global objects:

[gpadmin@cdw ~]$ pg_dumpall --globals-only > globals.sql
[gpadmin@cdw ~]$ cat globals.sql 
--
-- Greenplum Database cluster dump
--

SET default_transaction_read_only = off;

SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;

--
-- Roles
--

CREATE ROLE gpadmin;
ALTER ROLE gpadmin WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION BYPASSRLS PASSWORD 'md5b44a9b06d576a0b083cd60e5f875cf48';

--
-- PostgreSQL database cluster dump complete
--

As this is standard PostgreSQL stuff we’ll not look into it any further. Reloading is done with “psql” for plain SQL dumps like the ones above, or with “pg_restore” if you dumped in the custom or directory format.
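
A sketch of how reloading these plain SQL dumps could look, assuming you want to restore into a freshly created database d:

psql -f globals.sql postgres   # recreate roles and other global objects
createdb d                     # recreate the (dropped) database
psql -d d -f d.sql             # reload schema and data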

gpbackup is written in Go, so building it from source would require a Go toolchain; the default version of Go that comes with Rocky Linux 9 is fine for that and can be installed with dnf. For this post we take the shortcut and fetch the pre-built binaries of the latest gpbackup release instead:

[gpadmin@cdw ~]$ sudo dnf install golang -y
[gpadmin@cdw ~]$ go version
go version go1.20.10 linux/amd64
[gpadmin@cdw ~]$ wget https://github.com/greenplum-db/gpbackup/releases/download/1.30.3/gpbackup_binaries_rhel9.tar.gz
[gpadmin@cdw ~]$ tar axf gpbackup_binaries_rhel9.tar.gz
[gpadmin@cdw ~]$ ./gpbackup --version
gpbackup version 1.30.3
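
The tarball extracts the binaries straight into the current directory, which is why they are invoked with a “./” prefix throughout this post. If you prefer having them on the PATH, you could copy them into the Greenplum installation, a sketch assuming gpadmin owns that directory (the tarball may ship additional helper binaries, copy whatever got extracted):

cp gpbackup gprestore /usr/local/greenplum-db-7.1.0/bin/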

In its simplest form, a backup can be taken like this:

[gpadmin@cdw ~]$ mkdir backup
[gpadmin@cdw ~]$ ./gpbackup --dbname d --backup-dir backup/
20240301:11:01:33 gpbackup:gpadmin:cdw:004933-[CRITICAL]:-backup/ is not an absolute path.
[gpadmin@cdw ~]$ ./gpbackup --dbname d --backup-dir ~/backup/
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-gpbackup version = 1.30.3
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Greenplum Database Version = 7.1.0 build commit:e7c2b1f14bb42a1018ac57d14f4436880e0a0515 Open Source
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Starting backup of database d
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Backup Timestamp = 20240301110143
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Backup Database = d
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Gathering table state information
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Acquiring ACCESS SHARE locks on tables
Locks acquired:  1 / 1 [================================================================] 100.00% 0s
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Gathering additional table metadata
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Getting storage information
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Metadata will be written to /home/gpadmin/backup/gpseg-1/backups/20240301/20240301110143/gpbackup_20240301110143_metadata.sql
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Writing global database metadata
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Global database metadata backup complete
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Writing pre-data metadata
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Pre-data metadata metadata backup complete
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Writing post-data metadata
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Post-data metadata backup complete
20240301:11:01:43 gpbackup:gpadmin:cdw:004942-[INFO]:-Writing data to file
Tables backed up:  1 / 1 [=================================================================] 100.00%
[-----------------------------------------------------------------------------------------=]   0.00%
20240301:11:01:44 gpbackup:gpadmin:cdw:004942-[INFO]:-Data backup complete
20240301:11:01:45 gpbackup:gpadmin:cdw:004942-[INFO]:-Found neither /usr/local/greenplum-db-7.1.0/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20240301:11:01:45 gpbackup:gpadmin:cdw:004942-[INFO]:-Email containing gpbackup report /home/gpadmin/backup/gpseg-1/backups/20240301/20240301110143/gpbackup_20240301110143_report will not be sent
20240301:11:01:45 gpbackup:gpadmin:cdw:004942-[INFO]:-Backup completed successfully

The global objects and the metadata went into “/home/gpadmin/backup/gpseg-1/backups/20240301/20240301110143/gpbackup_20240301110143_metadata.sql”:

[gpadmin@cdw ~]$ cat /home/gpadmin/backup/gpseg-1/backups/20240301/20240301110143/gpbackup_20240301110143_metadata.sql | egrep -v "^$|^#"
SET client_encoding = 'UTF8';
ALTER RESOURCE QUEUE pg_default WITH (ACTIVE_STATEMENTS=20);
ALTER RESOURCE GROUP admin_group SET CPU_MAX_PERCENT 1;
ALTER RESOURCE GROUP admin_group SET CPU_WEIGHT 100;
ALTER RESOURCE GROUP default_group SET CPU_MAX_PERCENT 1;
ALTER RESOURCE GROUP default_group SET CPU_WEIGHT 100;
ALTER RESOURCE GROUP system_group SET CPU_MAX_PERCENT 1;
ALTER RESOURCE GROUP system_group SET CPU_WEIGHT 100;
ALTER RESOURCE GROUP default_group SET CPU_WEIGHT 100;
ALTER RESOURCE GROUP default_group SET CONCURRENCY 20;
ALTER RESOURCE GROUP default_group SET CPU_MAX_PERCENT 20;
ALTER RESOURCE GROUP admin_group SET CPU_WEIGHT 100;
ALTER RESOURCE GROUP admin_group SET CONCURRENCY 10;
ALTER RESOURCE GROUP admin_group SET CPU_MAX_PERCENT 10;
ALTER RESOURCE GROUP system_group SET CPU_WEIGHT 100;
ALTER RESOURCE GROUP system_group SET CONCURRENCY 0;
ALTER RESOURCE GROUP system_group SET CPU_MAX_PERCENT 10;
CREATE ROLE gpadmin;
ALTER ROLE gpadmin WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION PASSWORD 'md5b44a9b06d576a0b083cd60e5f875cf48' RESOURCE QUEUE pg_default RESOURCE GROUP admin_group;
CREATE DATABASE d TEMPLATE template0;
ALTER DATABASE d OWNER TO gpadmin;
COMMENT ON SCHEMA public IS 'standard public schema';
ALTER SCHEMA public OWNER TO gpadmin;
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM gpadmin;
GRANT ALL ON SCHEMA public TO PUBLIC;
GRANT ALL ON SCHEMA public TO gpadmin;
CREATE SCHEMA IF NOT EXISTS gp_toolkit;
SET search_path=gp_toolkit,pg_catalog;
CREATE EXTENSION IF NOT EXISTS gp_toolkit WITH SCHEMA gp_toolkit;
SET search_path=pg_catalog;
COMMENT ON EXTENSION gp_toolkit IS 'various GPDB administrative views/functions';
CREATE TABLE public.t1 (
        id integer NOT NULL,
        dummy text
) DISTRIBUTED BY (id);
ALTER TABLE public.t1 OWNER TO gpadmin;
ALTER TABLE ONLY public.t1 ADD CONSTRAINT t1_pkey PRIMARY KEY (id);

The local backup directory on the coordinator does not contain any user data, only metadata:

[gpadmin@cdw ~]$ ls -la backup/gpseg-1/backups/20240301/20240301110143/
total 16
drwxr-xr-x 2 gpadmin gpadmin  171 Mar  1 11:01 .
drwxr-xr-x 3 gpadmin gpadmin   28 Mar  1 11:01 ..
-r--r--r-- 1 gpadmin gpadmin  742 Mar  1 11:01 gpbackup_20240301110143_config.yaml
-r--r--r-- 1 gpadmin gpadmin 1935 Mar  1 11:01 gpbackup_20240301110143_metadata.sql
-r--r--r-- 1 gpadmin gpadmin 1965 Mar  1 11:01 gpbackup_20240301110143_report
-r--r--r-- 1 gpadmin gpadmin 4048 Mar  1 11:01 gpbackup_20240301110143_toc.yaml

The actual data is on the segment nodes:

[gpadmin@sdw1 ~]$ ls backup/gpseg0/backups/20240301/20240301110143/
gpbackup_0_20240301110143_17122.gz
[gpadmin@sdw2 ~]$ ls  backup/gpseg1/backups/20240301/20240301110143/
gpbackup_1_20240301110143_17122.gz
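
Before moving on to the restore: gpbackup has quite a few options beyond the defaults used above. The flags below do exist in this release, but treat the combination as a sketch rather than a tested procedure:

./gpbackup --dbname d --backup-dir ~/backup/ --include-table public.t1   # back up only this table
./gpbackup --dbname d --backup-dir ~/backup/ --metadata-only             # DDL only, no data
./gpbackup --dbname d --backup-dir ~/backup/ --jobs 4                    # dump tables over 4 parallel connections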

Restoring is done with “gprestore”, passing in the timestamp of the backup (which is also the name of the backup directory):

[gpadmin@cdw ~]$ psql -c "drop database d" postgres
DROP DATABASE
[gpadmin@cdw ~]$ ./gprestore --backup-dir ~/backup/ --timestamp 20240301110143
20240301:11:18:41 gprestore:gpadmin:cdw:005223-[INFO]:-Restore Key = 20240301110143
20240301:11:18:41 gprestore:gpadmin:cdw:005223-[INFO]:-gpbackup version = 1.30.3
20240301:11:18:41 gprestore:gpadmin:cdw:005223-[INFO]:-gprestore version = 1.30.3
20240301:11:18:41 gprestore:gpadmin:cdw:005223-[INFO]:-Greenplum Database Version = 7.1.0 build commit:e7c2b1f14bb42a1018ac57d14f4436880e0a0515 Open Source
20240301:11:18:41 gprestore:gpadmin:cdw:005223-[CRITICAL]:-Database "d" does not exist. Use the --create-db flag to create "d" as part of the restore process.
20240301:11:18:41 gprestore:gpadmin:cdw:005223-[INFO]:-Found neither /usr/local/greenplum-db-7.1.0/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20240301:11:18:41 gprestore:gpadmin:cdw:005223-[INFO]:-Email containing gprestore report /home/gpadmin/backup/gpseg-1/backups/20240301/20240301110143/gprestore_20240301110143_20240301111841_report will not be sent
[gpadmin@cdw ~]$ ./gprestore --backup-dir ~/backup/ --timestamp 20240301110143 --create-db
20240301:11:18:47 gprestore:gpadmin:cdw:005245-[INFO]:-Restore Key = 20240301110143
20240301:11:18:47 gprestore:gpadmin:cdw:005245-[INFO]:-gpbackup version = 1.30.3
20240301:11:18:47 gprestore:gpadmin:cdw:005245-[INFO]:-gprestore version = 1.30.3
20240301:11:18:47 gprestore:gpadmin:cdw:005245-[INFO]:-Greenplum Database Version = 7.1.0 build commit:e7c2b1f14bb42a1018ac57d14f4436880e0a0515 Open Source
20240301:11:18:48 gprestore:gpadmin:cdw:005245-[INFO]:-Creating database
20240301:11:18:52 gprestore:gpadmin:cdw:005245-[INFO]:-Database creation complete for: d
20240301:11:18:52 gprestore:gpadmin:cdw:005245-[INFO]:-Restoring pre-data metadata
Pre-data objects restored:  8 / 8 [=====================================================] 100.00% 0s
20240301:11:18:52 gprestore:gpadmin:cdw:005245-[INFO]:-Pre-data metadata restore complete
Tables restored:  1 / 1 [==================================================================] 100.00%
[-----------------------------------------------------------------------------------------=]   0.00%
20240301:11:18:53 gprestore:gpadmin:cdw:005245-[INFO]:-Data restore complete
20240301:11:18:53 gprestore:gpadmin:cdw:005245-[INFO]:-Restoring post-data metadata
Post-data objects restored:  1 / 1 [====================================================] 100.00% 0s
20240301:11:18:54 gprestore:gpadmin:cdw:005245-[INFO]:-Post-data metadata restore complete
20240301:11:18:54 gprestore:gpadmin:cdw:005245-[INFO]:-Found neither /usr/local/greenplum-db-7.1.0/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20240301:11:18:54 gprestore:gpadmin:cdw:005245-[INFO]:-Email containing gprestore report /home/gpadmin/backup/gpseg-1/backups/20240301/20240301110143/gprestore_20240301110143_20240301111847_report will not be sent
20240301:11:18:54 gprestore:gpadmin:cdw:005245-[INFO]:-Restore completed successfully
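
If you want to restore into a database with a different name, e.g. to get a copy next to the original, gprestore supports the --redirect-db flag; a sketch, assuming a hypothetical target database d_copy:

./gprestore --backup-dir ~/backup/ --timestamp 20240301110143 --create-db --redirect-db d_copy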

Of course you should make sure that the backup directories are separate mount points on shared or remote storage, and not local disks on the nodes. There are also some storage plugins you might want to consider.
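
One of these is the S3 storage plugin (gpbackup_s3_plugin), which ships as a separate binary and is wired in via the --plugin-config flag of gpbackup. The YAML below is only an assumption about the configuration format, not something taken from this setup, so verify the keys against the plugin documentation before relying on it:

# s3_config.yaml -- hypothetical example, keys are an assumption
executablepath: /usr/local/greenplum-db-7.1.0/bin/gpbackup_s3_plugin
options:
  region: eu-central-1                 # region of the (hypothetical) bucket
  aws_access_key_id: <access key>
  aws_secret_access_key: <secret key>
  bucket: my-gpbackup-bucket           # hypothetical bucket name
  folder: greenplum/backups

A backup would then be started with something like “./gpbackup --dbname d --plugin-config /home/gpadmin/s3_config.yaml”.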

According to the documentation you should not use pg_basebackup to back up segment instances, so physical backups and point-in-time recovery are not an option.

In the next post we’ll look at how we can recover from a failed segment node.