Recently, at a customer, we faced a somewhat strange issue: we had set up logical replication from a public cloud managed PostgreSQL service to a PostgreSQL instance on a VM hosted by the same cloud provider. For most of the tables that went fine, but for a few of them, throughput for synchronizing the initial snapshot (or initial load) dropped to a degree that the process would have taken weeks to complete. PostgreSQL as a managed service in a public cloud is fine as long as everything goes smoothly. Once you need to troubleshoot an issue, you are limited to what the cloud provider gives you. Usually the limitations start inside PostgreSQL, because you don’t get a real superuser, and when you want to troubleshoot something from the operating system, you are lost anyway. So if you really don’t know what the actual issue is, the only options left are the dashboards the provider gives you, support tickets, or the PostgreSQL log file. Those may, or may not, give you the details you need.

In our case, we wanted to move fast and explored the options we had left:

  • Creating a physical replica on a self-managed PostgreSQL VM, just to be able to troubleshoot the real issue from there: at first we thought this might be an option, but it quickly turned out it is not, because of the missing superuser access
  • Separating the problematic tables into their own publications and subscriptions did not help either; we saw the same slowdown in transfer speed
  • As only four out of a dozen tables had issues, we did not want to get rid of logical replication entirely
  • Could we dump/reload those four tables and start logical replication from the point in time of the dump? This way we would not lose the benefits of logical replication at all. It turned out this is possible, and that is what this post is about.

The source in the customer’s setup was PostgreSQL 11, so we’ll use the same version here. For this simple demo we’ll use pgbench to initialize a simple schema and use these standard tables for the replication:

postgres=# select version();
                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 11.12 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
postgres=# \! pgbench -i -s 10 
dropping old tables...
creating tables...
generating data...
100000 of 1000000 tuples (10%) done (elapsed 0.10 s, remaining 0.93 s)
200000 of 1000000 tuples (20%) done (elapsed 0.33 s, remaining 1.31 s)
300000 of 1000000 tuples (30%) done (elapsed 0.52 s, remaining 1.22 s)
400000 of 1000000 tuples (40%) done (elapsed 0.74 s, remaining 1.10 s)
500000 of 1000000 tuples (50%) done (elapsed 0.96 s, remaining 0.96 s)
600000 of 1000000 tuples (60%) done (elapsed 1.20 s, remaining 0.80 s)
700000 of 1000000 tuples (70%) done (elapsed 1.44 s, remaining 0.62 s)
800000 of 1000000 tuples (80%) done (elapsed 1.61 s, remaining 0.40 s)
900000 of 1000000 tuples (90%) done (elapsed 1.74 s, remaining 0.19 s)
1000000 of 1000000 tuples (100%) done (elapsed 1.98 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
postgres=# \d
              List of relations
 Schema |       Name       | Type  |  Owner   
--------+------------------+-------+----------
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(4 rows)

Let’s assume the problematic table is the large one, which is pgbench_accounts. The other three tables replicate fine and we can put them into the same publication:

postgres=# create publication pub_test for table pgbench_branches,pgbench_history,pgbench_tellers;
CREATE PUBLICATION
postgres=# select * from pg_publication;
 pubname  | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate 
----------+----------+--------------+-----------+-----------+-----------+-------------
 pub_test |       10 | f            | t         | t         | t         | t
(1 row)

On the target instance, which is PostgreSQL 15 in this case, we need to prepare the same table structures as on the source. You can either do that manually, use pg_dump to create only the schema, or just use pgbench and truncate the tables:

postgres=# select version();
                                                       version                                                        
----------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
postgres=# \! pgbench -i -s 1
dropping old tables...
creating tables...
generating data (client-side)...
100000 of 100000 tuples (100%) done (elapsed 0.19 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 0.44 s (drop tables 0.00 s, create tables 0.02 s, client-side generate 0.21 s, vacuum 0.04 s, primary keys 0.16 s).
postgres=# truncate table pgbench_accounts;
TRUNCATE TABLE
postgres=# truncate table pgbench_branches;
TRUNCATE TABLE
postgres=# truncate table pgbench_history;
TRUNCATE TABLE
postgres=# truncate table pgbench_tellers;
TRUNCATE TABLE
postgres=# 
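If you go with pg_dump instead, dumping only the table definitions from the source and piping them into the target could look like the following (host and port match this demo’s subscription connection string; adjust them for your environment):

pg_dump -h localhost -p 5435 --schema-only -t 'public.pgbench_*' postgres | psql postgres

Both routes end up with the same result: empty tables on the target with the same structure as on the source.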

Once we have the structures, we can create the subscription that attaches to the publication we just created on the source, and wait for the initial snapshot / load to complete:

postgres=# create subscription sub_test connection 'host=localhost port=5435 user=postgres dbname=postgres' publication pub_test;
CREATE SUBSCRIPTION
postgres=# select * from pg_subscription;
  oid  | subdbid | subskiplsn | subname  | subowner | subenabled | subbinary | substream | subtwophasestate | subdisableonerr |     >
-------+---------+------------+----------+----------+------------+-----------+-----------+------------------+-----------------+----->
 16589 |   16559 | 0/0        | sub_test |       10 | t          | f         | f         | d                | f               | host>
(1 row)

postgres=# select count(*) from pgbench_branches;
 count 
-------
    10
(1 row)
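To verify that the initial copy has really finished for all three tables, the per-table synchronization state can be checked on the target (‘d’ means the initial data copy is still running, ‘r’ means ready, i.e. the table is in normal streaming replication):

postgres=# select srrelid::regclass, srsubstate from pg_subscription_rel;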

The synchronization for the three tables is up and running and changes are replicated on the fly. This can easily be verified by inserting a row on the source:

postgres=# select version();
                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 11.12 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

postgres=# insert into pgbench_branches values (-1,-1,'aa');
INSERT 0 1

… and check for the same row in the target:

postgres=# select version();
                                                       version                                                        
----------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

postgres=# select * from pgbench_branches where bid = -1;
 bid | bbalance |                                          filler                                          
-----+----------+------------------------------------------------------------------------------------------
  -1 |       -1 | aa                                                                                      
(1 row)

Now we need to handle the remaining table. The first step is to create a publication for that table on the source:

postgres=# select version();
                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 11.12 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
postgres=# create publication pub_test_2 for table pgbench_accounts;
CREATE PUBLICATION
postgres=# select * from pg_publication;
  pubname   | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate 
------------+----------+--------------+-----------+-----------+-----------+-------------
 pub_test   |       10 | f            | t         | t         | t         | t
 pub_test_2 |       10 | f            | t         | t         | t         | t
(2 rows)

To be able to start the replication from a specific point in time, we need a snapshot on the source to start from. This can be done by creating a replication connection to the source database and then defining a logical replication slot. The important point here is that you need to keep this connection open until the replication is fully set up. Otherwise you’ll lose the snapshot:

postgres=# select version();
                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 11.12 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

postgres=# \q
postgres@debian11pg:/home/postgres/ [1112] psql "dbname=postgres replication=database"
psql (11.12)
Type "help" for help.

postgres=# CREATE_REPLICATION_SLOT my_logical_repl_slot LOGICAL pgoutput;
      slot_name       | consistent_point |    snapshot_name    | output_plugin 
----------------------+------------------+---------------------+---------------
 my_logical_repl_slot | 0/37404478       | 00000003-00000097-1 | pgoutput
(1 row)

postgres=# 

This gives us the snapshot name (00000003-00000097-1) we can use to export a consistent state of the table with pg_dump:

postgres@debian11pg:/home/postgres/ [1112] pg_dump --snapshot=00000003-00000097-1 -a -t public.pgbench_accounts > pgbench_accounts.sql

On the target we’ll load the pgbench_accounts table:

postgres=# select version();
                                                       version                                                        
----------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
postgres=# \i pgbench_accounts.sql
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)

SET
SET
SET
SET
COPY 1000000
postgres=# select count(*) from public.pgbench_accounts;
  count  
---------
 1000000
(1 row)

At this point, at least in real life, changes are still going on in the pgbench_accounts table on the source, and we do not have them on the target yet. Now we need a new subscription which attaches to the logical replication slot we created in the replication connection. The important point here is that this subscription must not create a new slot or load the initial data automatically, which can be specified with the following options:

postgres=# select version();
                                                       version                                                        
----------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
postgres=# create subscription sub_test_2 connection 'host=localhost port=5435 user=postgres dbname=postgres' publication pub_test_2 with ( slot_name = 'my_logical_repl_slot', create_slot='false' , enabled='false', copy_data='false');
CREATE SUBSCRIPTION
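Before enabling the subscription, we can double check on the source that the manually created slot is still there; this is the slot the new subscription will stream from:

postgres=# select slot_name, plugin, slot_type from pg_replication_slots;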

Having that in place we can start the replication:

postgres=# select version();
                                                       version                                                        
----------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
postgres=# alter subscription sub_test_2 enable;
ALTER SUBSCRIPTION
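From now on, progress can be monitored on the target with pg_stat_subscription; received_lsn should advance as soon as changes come in from the source:

postgres=# select subname, received_lsn, latest_end_lsn from pg_stat_subscription;

As the replication is now fully set up, the replication connection that was holding the exported snapshot can also be closed.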

Adding data on the source should now trigger the replication of the new data to the target:

postgres=# select version();
                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 11.12 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

postgres=# insert into pgbench_accounts select i,i,i,i::text from generate_series(1000001,1000100) i;
INSERT 0 100
postgres=# 

On the target we should see 100 additional rows in the pgbench_accounts table:

postgres=# select version();
                                                       version                                                        
----------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
postgres=# select count(*) from public.pgbench_accounts ;
  count  
---------
 1000100
(1 row)

Works as expected. This can be a nice workaround if you have tables which, for whatever reason, cannot be initially replicated within a time frame that is acceptable for the project.