Usually we do not see many foreign data wrappers being used by our customers. Most of them use the foreign data wrapper for Oracle to fetch data from Oracle systems. Some of them use the foreign data wrapper for files but that’s mostly it. Only one (I am aware of) actually uses the foreign data wrapper for PostgreSQL which obviously connects PostgreSQL to PostgreSQL. Some foreign data wrappers allow for collecting optimizer statistics on foreign tables and the foreign data wrappers for Oracle and PostgreSQL are examples for this. These local statistics are better than nothing but you need to take care that they are up to date and for that you need a fresh copy of the statistics over the remote data. PostgreSQL 19 will come with a solution for that when it comes to the foreign data wrapper for PostgreSQL. Actually, the solution is not in the foreign data wrapper for PostgreSQL but in the underlying framework and postgres_fdw uses can use that from version 19 on.
For looking at this we need a simple setup, so we initialize two new PostgreSQL 19 clusters and connect them with postgres_fdw:
postgres@:/home/postgres/ [pgdev] initdb --version
initdb (PostgreSQL) 19devel
postgres@:/home/postgres/ [pgdev] initdb --pgdata=/var/tmp/pg1
postgres@:/home/postgres/ [pgdev] initdb --pgdata=/var/tmp/pg2
postgres@:/home/postgres/ [pgdev] echo "port=8888" >> /var/tmp/pg1/postgresql.auto.conf
postgres@:/home/postgres/ [pgdev] echo "port=8889" >> /var/tmp/pg2/postgresql.auto.conf
postgres@:/home/postgres/ [pgdev] pg_ctl --pgdata=/var/tmp/pg1/ start
postgres@:/home/postgres/ [pgdev] pg_ctl --pgdata=/var/tmp/pg2/ start
postgres@:/home/postgres/ [pgdev] psql -p 8888 -c "create extension postgres_fdw"
CREATE EXTENSION
postgres@:/home/postgres/ [pgdev] psql -p 8889 -c "create table t ( a int, b text, c timestamptz )"
CREATE TABLE
postgres@:/home/postgres/ [pgdev] psql -p 8889 -c "insert into t select i, md5(i::text), now() from generate_series(1,1000000) i"
INSERT 0 1000000
postgres@:/home/postgres/ [pgdev] psql -p 8888 -c "create server srv_pg2 foreign data wrapper postgres_fdw options(port '8889', dbname 'postgres')"
CREATE SERVER
postgres@:/home/postgres/ [pgdev] psql -p 8888 -c "create user mapping for postgres server srv_pg2 options (user 'postgres', password 'postgres')"
CREATE USER MAPPING
postgres@:/home/postgres/ [pgdev] psql -p 8888 -c "create foreign table ft (a int, b text, c timestamptz) server srv_pg2 options (schema_name 'public', table_name 't')"
CREATE FOREIGN TABLE
postgres@:/home/postgres/ [pgdev] psql -p 8888 -c "select count(*) from ft"
count
---------
1000000
(1 row)
What we have now is one table in the cluster on port 8889 and this table is attached as a foreign table in the cluster on port 8888.
We already have statistics on the source table in the cluster on port 8889:
postgres@:/home/postgres/ [pgdev] psql -p 8889 -c "select reltuples::bigint from pg_class where relname = 't'"
reltuples
-----------
1000000
(1 row)
… but we do not have any statistics on the foreign table in the cluster on port 8888:
postgres@:/home/postgres/ [pgdev] psql -p 8888 -c "select reltuples::bigint from pg_class where relname = 'ft'"
reltuples
-----------
-1
(1 row)
Only after manually analyzing the foreign table the statistics show up:
postgres@:/home/postgres/ [DEV] psql -p 8888 -c "analyze ft"
ANALYZE
postgres@:/home/postgres/ [DEV] psql -p 8888 -c "select reltuples::bigint from pg_class where relname = 'ft'"
reltuples
-----------
1000000
(1 row)
The issue that can arise with these local statistics is, that they probably become outdated when the source table is modified:
postgres@:/home/postgres/ [pgdev] psql -p 8889 -c "insert into t select i, md5(i::text), now() from generate_series(1000001,2000000) i"
INSERT 0 1000000
postgres@:/home/postgres/ [DEV] psql -p 8889 -c "select reltuples::bigint from pg_class where relname = 't'"
reltuples
-----------
2000000
(1 row)
postgres@:/home/postgres/ [DEV] psql -p 8888 -c "select reltuples::bigint from pg_class where relname = 'ft'"
reltuples
-----------
1000000
(1 row)
As you can see, the row counts do not match anymore. Once the local statistics are gathered we again have the same picture on both sides:
postgres@:/home/postgres/ [DEV] psql -p 8888 -c "analyze ft"
ANALYZE
postgres@:/home/postgres/ [DEV] psql -p 8888 -c "select reltuples::bigint from pg_class where relname = 'ft'"
reltuples
-----------
2000000
(1 row)
One way to avoid this issue even before PostgreSQL 19 is to tell postgres_fdw to run analyze on the remote table and to use those statistics:
postgres@:/home/postgres/ [pgdev] psql -p 8888 -c "alter foreign table ft options ( use_remote_estimate 'true' )"
In this case the local statistics will not be used but of course this comes with the overhead of the additional analyze on the remote side.
From PostgreSQL 19 there is another option:
postgres@:/home/postgres/ [pgdev] psql -p 8888 -c "alter foreign table ft options ( restore_stats 'true' )"
ALTER FOREIGN TABLE
This option tells postgres_fdw to import the statistics from the remote side and store them locally. If that fails it will run analyze as above, the commit message nicely explains this:
Add support for importing statistics from remote servers.
Add a new FDW callback routine that allows importing remote statistics
for a foreign table directly to the local server, instead of collecting
statistics locally. The new callback routine is called at the beginning
of the ANALYZE operation on the table, and if the FDW failed to import
the statistics, the existing callback routine is called on the table to
collect statistics locally.
Also implement this for postgres_fdw. It is enabled by "restore_stats"
option both at the server and table level. Currently, it is the user's
responsibility to ensure remote statistics to import are up-to-date, so
the default is false.
As usual, thanks to all involved.