Up to now, there is no official way in PostgreSQL to tweak optimizer statistics manually. While this is quite common in other relational databases, PostgreSQL did not provide anything to do this. The first steps to allow this have now been committed for PostgreSQL 18.
Before look at the two new functions which have been added, let’s create a small test table and an index:
postgres=# create table t ( a int, b text );
CREATE TABLE
postgres=# insert into t values (1,'aa');
INSERT 0 1
postgres=# insert into t select i, 'bb' from generate_series(2,100) i;
INSERT 0 99
postgres=# analyze t;
ANALYZE
postgres=# create index i on t(b);
CREATE INDEX
postgres=# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | text | | |
Indexes:
"i" btree (b)
The number of pages and the number of rows are currently reported as this:
postgres=# select relpages,reltuples from pg_class where relname = 't';
relpages | reltuples
----------+-----------
1 | 100
(1 row)
Asking for the explain plan for a simple select over the table restricting on ”aa’ for column b will give a sequential scan:
postgres=# explain select * from t where b = 'aa';
QUERY PLAN
-------------------------------------------------
Seq Scan on t (cost=0.00..2.25 rows=1 width=7)
Filter: (b = 'aa'::text)
(2 rows)
As the table is consisting of only one block, it is still faster to read the whole table than to use the index, even if there is only one row matching our criteria. One of the two new function introduced by the commit linked above is pg_set_relation_stats:
postgres=# \x
Expanded display is on.
postgres=# \df pg_set_relation_stats
List of functions
-[ RECORD 1 ]-------+------------------------------------------------------------------------------------------------------------------------------------------
Schema | pg_catalog
Name | pg_set_relation_stats
Result data type | boolean
Argument data types | relation regclass, relpages integer DEFAULT NULL::integer, reltuples real DEFAULT NULL::real, relallvisible integer DEFAULT NULL::integer
Type | func
As you can easily see, this function takes a relation name, the number of pages, the number of rows and the all visible flag as input parameters. This allows us to tweak the statistics so that the optimizer will go for an index scan instead of the sequential scan:
postgres=# select * from pg_set_relation_stats('t'::regclass, 1, 1000000 );
pg_set_relation_stats
-----------------------
t
(1 row)
postgres=# \x
Expanded display is off.
postgres=# select relpages,reltuples from pg_class where relname = 't';
relpages | reltuples
----------+-----------
1 | 1e+06
(1 row)
postgres=# explain select * from t where b = 'aa';
QUERY PLAN
-----------------------------------------------------------------
Index Scan using i on t (cost=0.17..183.18 rows=10000 width=7)
Index Cond: (b = 'aa'::text)
(2 rows)
postgres=#
Nice, now we can debug optimizer decisions by modifying these statistics. Just be aware that statistics we set using this function are kind of temporary. Any manual or automatic analyze will overwrite them:
postgres=# analyze t;
ANALYZE
postgres=# select relpages,reltuples from pg_class where relname = 't';
relpages | reltuples
----------+-----------
1 | 100
(1 row)
postgres=# explain select * from t where b = 'aa';
QUERY PLAN
-------------------------------------------------
Seq Scan on t (cost=0.00..2.25 rows=1 width=7)
Filter: (b = 'aa'::text)
(2 rows)
The second function provided can be used to reset the statistics as they would be when the table gets created:
postgres=# select * from pg_clear_relation_stats('t'::regclass);
pg_clear_relation_stats
-------------------------
t
(1 row)
postgres=# select relpages,reltuples from pg_class where relname = 't';
relpages | reltuples
----------+-----------
0 | -1
(1 row)