My last post about logical replication in PostgreSQL was in 2017 before PostgreSQL 10 was released later that year. There have been many enhancements to logical replication since then and today we’ll look at a new features that will probably come with PostgreSQL 15 later this year: Allow specifying row filters for logical replication of tables. What is it about? Up until today you can only replicate whole tables to a subscriber node. It is not possible to filter for specific rows or, in other words: You cannot replicate a subset of the rows in the table. With this commit this becomes possible, which is a great feature.

Lets start by creating two brand new instances, the first one will be used as the publisher, and the second one as the subscriber:

postgres@debian11pg:/home/postgres/ [pgdev] mkdir /var/tmp/pub && initdb -D /var/tmp/pub && echo "port=8888" >> /var/tmp/pub/postgresql.auto.conf
postgres@debian11pg:/home/postgres/ [pgdev] mkdir /var/tmp/sub && initdb -D /var/tmp/sub && echo "port=9999" >> /var/tmp/sub/postgresql.auto.conf

For being able to use logical replication, wal_level needs to be increased to “logical”, so lets do that and start up both instances:

postgres@debian11pg:/home/postgres/ [pgdev] echo "wal_level='logical'" >> /var/tmp/pub/postgresql.auto.conf
postgres@debian11pg:/home/postgres/ [pgdev] echo "wal_level='logical'" >> /var/tmp/sub/postgresql.auto.conf
postgres@debian11pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pub/ start -l /dev/null
postgres@debian11pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/sub/ start -l /dev/null
postgres@debian11pg:/home/postgres/ [pgdev] psql -p 8888 -l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

postgres@debian11pg:/home/postgres/ [pgdev] psql -p 9999 -l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

Although it is not really required (as long as table and column names match), the exactly same schema and table are now getting created in both clusters:

postgres@debian11pg:/home/postgres/ [pgdev] cat c.sql 
create schema a;
create table a.t ( a int primary key, b text, c date );
postgres@debian11pg:/home/postgres/ [pgdev] psql -f /home/postgres/c.sql -p 8888
CREATE SCHEMA
CREATE TABLE
postgres@debian11pg:/home/postgres/ [pgdev] psql -f /home/postgres/c.sql -p 9999
CREATE SCHEMA
CREATE TABLE

Lets assume we have a kind of tenancy in that table on the publisher, which is defined by the second (b) column:

postgres@debian11pg:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'aaa',now() from generate_series(1,1000) i" -p 8888
INSERT 0 1000
postgres@debian11pg:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'bbb',now() from generate_series(1001,2000) i" -p 8888
INSERT 0 1000
postgres@debian11pg:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'ccc',now() from generate_series(2001,3000) i" -p 8888
INSERT 0 1000

Up to PostgreSQL 14, the only option you have for logically replicating that table, is to replicate the whole table. Since the commit mentioned above we can filter on a subset of the rows:

postgres@debian11pg:/home/postgres/ [pgdev] psql -c "h create publication" -p 8888
Command:     CREATE PUBLICATION
Description: define a new publication
Syntax:
CREATE PUBLICATION name
    [ FOR ALL TABLES
      | FOR publication_object [, ... ] ]
    [ WITH ( publication_parameter [= value] [, ... ] ) ]

where publication_object is one of:

    TABLE [ ONLY ] table_name [ * ] [ WHERE ( expression ) ] [, ... ]
    ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]

URL: https://www.postgresql.org/docs/devel/sql-createpublication.html

Lets assume we only want to replicate tenant ‘aaa’, so we can do it like this:

postgres@debian11pg:/home/postgres/ [pgdev] psql -c "create publication pub1 for table a.t where (b = 'aaa')" -p 8888
CREATE PUBLICATION

Once we consume that on the subscriber side we’ll only have the data for that tenant:

postgres@debian11pg:/home/postgres/ [pgdev] psql -c "create subscription sub1 connection 'port=8888' publication pub1" -p 9999
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
postgres@debian11pg:/home/postgres/ [pgdev] psql -c "select distinct(b), count(*) from a.t group by b" -p 9999
  b  | count 
-----+-------
 aaa |  1000
(1 row)

Really cool. Thanks to all involved for making this happen.