Finally in PostgreSQL 10 (expected to be released this September) a long awaited feature will probably appear: In-core logical replication. PostgreSQL supports physical replication since version 9.0 and now the next step happened with the implementation of logical replication. This will be a major help in upgrading PostgreSQL instances from one version to another with no (or almost no) downtime. In addition this can be used to consolidate data from various instances into one instance for reporting purposes or you can use it to distribute only a subset of your data to selected users on other instances. In contrast to physical replication logical replication works on the table level so you can replicate changes in one or more tables, one database are all databases in a PostgreSQL instance which is quite flexible.
In PostgreSQL logical replication is implemented using a publisher and subscriber model. This mean the publisher is the one who will send the data and the subscriber is the one who will receive and apply the changes. A subscriber can be a publisher as well so you can build cascading logical replication. Here is an overview of a possible setup:
For setting up logical replication when you do not start with an empty database you’ll need to initially load the database where you want to replicate to. How can you do that? I have two PostgreSQL 10 instances (build from the git sources) running on the same host:
Role | Port |
---|---|
Publisher | 6666 |
Subsriber | 6667 |
Lets assume we have this sample setup on the publisher instance:
drop table if exists t1; create table t1 ( a int primary key , b varchar(100) ); with generator as ( select a.* from generate_series ( 1, 5000000 ) a order by random() ) insert into t1 ( a,b ) select a , md5(a::varchar) from generator; select * from pg_size_pretty ( pg_relation_size ('t1' ));
On the subscriber instance there is the same table, but empty:
create table t1 ( a int primary key , b varchar(100) );
Before we start with the initial load lets take a look at the process list:
postgres@pgbox:/home/postgres/ [PUBLISHER] ps -ef | egrep "PUBLISHER|SUBSCRIBER" postgres 17311 1 0 11:33 pts/0 00:00:00 /u01/app/postgres/product/dev/db_01/bin/postgres -D /u02/pgdata/PUBLISHER postgres 17313 17311 0 11:33 ? 00:00:00 postgres: PUBLISHER: checkpointer process postgres 17314 17311 0 11:33 ? 00:00:00 postgres: PUBLISHER: writer process postgres 17315 17311 0 11:33 ? 00:00:00 postgres: PUBLISHER: wal writer process postgres 17316 17311 0 11:33 ? 00:00:00 postgres: PUBLISHER: autovacuum launcher process postgres 17317 17311 0 11:33 ? 00:00:00 postgres: PUBLISHER: stats collector process postgres 17318 17311 0 11:33 ? 00:00:00 postgres: PUBLISHER: bgworker: logical replication launcher postgres 17321 1 0 11:33 pts/1 00:00:00 /u01/app/postgres/product/dev/db_01/bin/postgres -D /u02/pgdata/SUBSCRIBER postgres 17323 17321 0 11:33 ? 00:00:00 postgres: SUBSCRIBER: checkpointer process postgres 17324 17321 0 11:33 ? 00:00:00 postgres: SUBSCRIBER: writer process postgres 17325 17321 0 11:33 ? 00:00:00 postgres: SUBSCRIBER: wal writer process postgres 17326 17321 0 11:33 ? 00:00:00 postgres: SUBSCRIBER: autovacuum launcher process postgres 17327 17321 0 11:33 ? 00:00:00 postgres: SUBSCRIBER: stats collector process postgres 17328 17321 0 11:33 ? 00:00:00 postgres: SUBSCRIBER: bgworker: logical replication launcher
You’ll notice that there is a new background process called “bgworker: logical replication launcher”. We’ll come back to that later.
Time to create our first publication on the publisher with the create publication command:
postgres@pgbox:/u02/pgdata/PUBLISHER/ [PUBLISHER] psql -X postgres psql (10devel) Type "help" for help. postgres=# create publication my_first_publication for table t1; CREATE PUBLICATION
On the subscriber we need to create a subscription by using the create subscription command:
postgres@pgbox:/u02/pgdata/SUBSCRIBER/ [SUBSCRIBER] psql -X postgres psql (10devel) Type "help" for help. postgres=# create subscription my_first_subscription connection 'host=localhost port=6666 dbname=postgres user=postgres' publication my_first_publication; ERROR: could not create replication slot "my_first_subscription": ERROR: logical decoding requires wal_level >= logical
Ok, good hint. After changing that on both instances:
postgres@pgbox:/home/postgres/ [SUBSCRIBER] psql -X postgres psql (10devel) Type "help" for help. postgres=# create subscription my_first_subscription connection 'host=localhost port=6666 dbname=postgres user=postgres' publication my_first_publication; CREATE SUBSCRIPTION
If you are not on super fast hardware and check the process list again you’ll see something like this:
postgres 19465 19079 19 11:58 ? 00:00:04 postgres: SUBSCRIBER: bgworker: logical replication worker for subscription 16390 sync 16384
On the subscriber the “logical replication launcher” background process launched a worker process and syncs the table automatically (this can be avoided by using the “NOCOPY DATA”):
postgres=# show port; port ------ 6667 (1 row) postgres=# select count(*) from t1; count --------- 5000000 (1 row)
Wow, that was really easy. You can find more details in the logfile of the subscriber instance:
2017-04-13 11:58:15.099 CEST - 1 - 19087 - - @ LOG: starting logical replication worker for subscription "my_first_subscription" 2017-04-13 11:58:15.101 CEST - 1 - 19463 - - @ LOG: logical replication apply for subscription my_first_subscription started 2017-04-13 11:58:15.104 CEST - 2 - 19463 - - @ LOG: starting logical replication worker for subscription "my_first_subscription" 2017-04-13 11:58:15.105 CEST - 1 - 19465 - - @ LOG: logical replication sync for subscription my_first_subscription, table t1 started 2017-04-13 11:59:03.373 CEST - 1 - 19082 - - @ LOG: checkpoint starting: xlog 2017-04-13 11:59:37.985 CEST - 2 - 19082 - - @ LOG: checkpoint complete: wrote 14062 buffers (85.8%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=26.959 s, sync=2.291 s, total=34.740 s; sync files=13, longest=1.437 s, average=0.171 s; distance=405829 kB, estimate=405829 kB 2017-04-13 12:02:23.728 CEST - 2 - 19465 - - @ LOG: logical replication synchronization worker finished processing
On the publisher instance you get another process for sending the changes to the subscriber:
postgres 19464 18318 0 11:58 ? 00:00:00 postgres: PUBLISHER: wal sender process postgres ::1(41768) idle
Changes to the table on the publisher should now get replicated to the subscriber node:
postgres=# show port; port ------ 6666 (1 row) postgres=# insert into t1 (a,b) values (-1,'aaaaa'); INSERT 0 1 postgres=# update t1 set b='bbbbb' where a=-1; UPDATE 1
On the subscriber node:
postgres=# show port; port ------ 6667 (1 row) postgres=# select * from t1 where a = -1; a | b ----+------- -1 | aaaaa (1 row) postgres=# select * from t1 where a = -1; a | b ----+------- -1 | bbbbb (1 row)
As mentioned initially you can make the subscriber a publisher and the publisher a subscriber at the same time. So when we create this table on both instances:
create table t2 ( a int primary key );
Then create a publication on the subscriber node:
postgres=# create table t2 ( a int primary key ); CREATE TABLE postgres=# show port; port ------ 6667 (1 row) postgres=# create publication my_second_publication for table t2; CREATE PUBLICATION postgres=#
Then create the subscription to that on the publisher node:
postgres=# show port; port ------ 6666 (1 row) postgres=# create subscription my_second_subscription connection 'host=localhost port=6667 dbname=postgres user=postgres' publication my_second_publication; CREATE SUBSCRIPTION
… we have a second logical replication the other way around:
postgres=# show port; port ------ 6667 (1 row) postgres=# insert into t2 values ( 1 ); INSERT 0 1 postgres=# insert into t2 values ( 2 ); INSERT 0 1 postgres=#
On the other instance:
postgres=# show port; port ------ 6666 (1 row) postgres=# select * from t2; a --- 1 2 (2 rows)
There are two new catalog views which give you information about subscriptions and publications:
postgres=# select * from pg_subscription; subdbid | subname | subowner | subenabled | subconninfo | subslotname | subpublications ---------+------------------------+----------+------------+--------------------------------------------------------+------------------------+------------------------- 13216 | my_second_subscription | 10 | t | host=localhost port=6667 dbname=postgres user=postgres | my_second_subscription | {my_second_publication} (1 row) postgres=# select * from pg_publication; pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete ----------------------+----------+--------------+-----------+-----------+----------- my_first_publication | 10 | f | t | t | t (1 row)
What a cool feature and so easy to use. Thanks to all who brought that into PostgreSQL 10, great work.