Every time you send a SQL statement to PostgreSQL it must be parsed. Parsing is fast, yes, but parsing the same statement a thousand times can quickly sum up to quite some time the database could spend for something else. To avoid that PostgreSQL comes with the prepare statement. Using that you can avoid parsing of statements and only planning and executing will happen. Lets see how that works.
To generate some sample data I used pgbench with a scale factor of 100 which gives me 10’000’000 rows in the pgbench_accounts table:
postgres@pgbox:/home/postgres/ [PGDEV] pgbench -i -s 100 bench postgres@pgbox:/home/postgres/ [PGDEV] psql bench psql (11devel) Type "help" for help. bench=# select count(*) from pgbench_accounts; count ---------- 10000000 (1 row) bench=# d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Collation | Nullable | Default ----------+---------------+-----------+----------+--------- aid | integer | | not null | bid | integer | | | abalance | integer | | | filler | character(84) | | | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Lets assume we have the following simple query:
bench=# select count(*) from pgbench_accounts where aid = 11111; count ------- 1 (1 row)
As said at the beginning of this post PostgreSQL will need to parse that statement. Using explain with the right options you are able to see how much time was spend on planning:
bench=# explain (analyze,buffers) select count(*) from pgbench_accounts where aid = 11111; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4.46..4.46 rows=1 width=8) (actual time=0.060..0.063 rows=1 loops=1) Buffers: shared hit=4 -> Index Only Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..4.45 rows=1 width=0) (actual time=0.034..0.039 rows=1 loops=1) Index Cond: (aid = 11111) Heap Fetches: 0 Buffers: shared hit=4 Planning Time: 0.150 ms Execution Time: 0.133 ms (8 rows)
Planning this statement takes more time then executing it. Now assume you want to execute that very same statement one thousand times:
bench=# t bench=# select 'select count(*) from pgbench_accounts where aid = 11111;' from generate_series(1,1000) i; g test.sql bench=# ! cat test.sql | head select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; select count(*) from pgbench_accounts where aid = 11111; ...
When we execute that we force PostgreSQL to parse all of those 1000 statements:
bench=# timing Timing is on. bench=# ! time psql -f test.sql bench real 0m0.148s user 0m0.031s sys 0m0.015s
What you can do to avoid that is to prepare the statement:
bench=# prepare tt as select count(*) from pgbench_accounts where aid = 11111; PREPARE
Once it is prepared you can execute it:
bench=# execute tt; count ------- 1 (1 row)
… and you can also explain it:
bench=# explain(analyze,buffers) execute tt; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4.46..4.46 rows=1 width=8) (actual time=0.066..0.069 rows=1 loops=1) Buffers: shared hit=4 -> Index Only Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..4.45 rows=1 width=0) (actual time=0.037..0.043 rows=1 loops=1) Index Cond: (aid = 11111) Heap Fetches: 0 Buffers: shared hit=4 Planning Time: 0.021 ms Execution Time: 0.269 ms (8 rows)
You’ll notice that the planning time went down quite a lot compared to the not prepared statement:
bench=# explain (analyze,buffers) select count(*) from pgbench_accounts where aid = 11111; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4.46..4.46 rows=1 width=8) (actual time=0.167..0.172 rows=1 loops=1) Buffers: shared hit=4 -> Index Only Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..4.45 rows=1 width=0) (actual time=0.030..0.037 rows=1 loops=1) Index Cond: (aid = 11111) Heap Fetches: 0 Buffers: shared hit=4 Planning Time: 0.167 ms Execution Time: 0.248 ms (8 rows)
When you do that one thousand times now:
bench=# t bench=# select 'execute tt;' from generate_series(1,1000) i; g test.sql bench=# ! sed -i '1s/^/prepare tt as select count(*) from pgbench_accounts where aid = 11111;/' test.sql bench=# ! time psql -f test.sql bench real 0m0.095s user 0m0.031s sys 0m0.010s
… execution time will come down. It is not much in that simple case but this is due to the simple statement. Btw: As prepared statements only life for the time of the session the sed command adds the prepare statement to the top of file, and preparing takes time as well. Without that execution time would be even less.
When your values change in the where clause you can do it like this:
bench=# prepare tt as select count(*) from pgbench_accounts where aid = $1; PREPARE Time: 0.571 ms bench=# execute tt (123); 1
For completeness: When you want to get rid of a prepared statement without losing your session there is deallocate:
bench=# deallocate tt; DEALLOCATE Time: 0.623 ms