While giving the last PostgreSQL DBA Essentials workshop this question came up: When we create a table or an index: are the statistics available automatically? To be more precise: When we create and load a table in one step, create an index on that table afterwards: Do we have the statistics available by default or do we need to wait for autovacuum to kick in or analyze manually? Lets see …
First of all lets disable autovacuum so it does not kick off analyze in the background:
postgres=# ! ps -ef | grep autov | grep -v grep postgres 1641 1635 0 07:08 ? 00:00:00 postgres: MY_CLUSTER: autovacuum launcher process postgres=# alter system set autovacuum=off; ALTER SYSTEM postgres=# select * from pg_reload_conf(); pg_reload_conf ---------------- t (1 row) postgres=# ! ps -ef | grep autov | grep -v grep
Create and populate the table:
postgres=# ! cat a.sql drop table if exists t; create table t as select a.*, md5(a::varchar) from generate_series(1,5000000) a; postgres=# i a.sql psql:a.sql:1: NOTICE: table "t" does not exist, skipping DROP TABLE SELECT 5000000
Create an index:
postgres=# create index i1 on t(a); CREATE INDEX postgres=# d+ t Table "public.t" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+---------+----------+--------------+------------- a | integer | | | | plain | | md5 | text | | | | extended | | Indexes: "i1" btree (a)
Do we have statistics already? Lets check:
postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 't'::regclass; stanullfrac | stawidth | stadistinct -------------+----------+------------- (0 rows)
No, at least not for the table. What about the index?
postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 'i1'::regclass; stanullfrac | stawidth | stadistinct -------------+----------+------------- (0 rows)
No. Lets analyze:
postgres=# analyze t; ANALYZE postgres=# analyze i1; WARNING: skipping "i1" --- cannot analyze non-tables or special system tables ANALYZE
Apparently we can not analyze an index. What do we see now?
postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 't'::regclass; stanullfrac | stawidth | stadistinct -------------+----------+------------- 0 | 4 | -1 0 | 33 | -1 (2 rows) postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 'i1'::regclass; stanullfrac | stawidth | stadistinct -------------+----------+------------- (0 rows)
We do see statistics for the table but not for the index. The reason is that “analyze” works on the tables, but not on the indexes. For regular indexes there will be nothing in pg_statistic because that information would be redundant with the underlying table columns. But there will be statistics for function based indexes:
postgres=# create index i2 on t(lower(a::text)); CREATE INDEX postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 'i2'::regclass; stanullfrac | stawidth | stadistinct -------------+----------+------------- (0 rows) postgres=# analyze t; ANALYZE postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 'i2'::regclass; stanullfrac | stawidth | stadistinct -------------+----------+------------- 0 | 10 | -1 (1 row)
So, when autovacuum is off we do not get statistics when we do not kick off a manual analyze (which is not a surprise). What happens when autovacuum is on?
postgres=# alter system set autovacuum=on; ALTER SYSTEM postgres=# select * from pg_reload_conf(); pg_reload_conf ---------------- t (1 row) postgres=# i a.sql DROP TABLE SELECT 5000000 postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 't'::regclass; stanullfrac | stawidth | stadistinct -------------+----------+------------- (0 rows)
Nope, same picture here. But some seconds later:
postgres=# select stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 't'::regclass; stanullfrac | stawidth | stadistinct -------------+----------+------------- 0 | 4 | -1 0 | 33 | -1 (2 rows)
… statistics are there. Conclusion: When you require current statistics directly after loading a table you’d better kick of a manual analyze right after. Otherwise autovacuum will take care about that, but not immediately.