This is a question that comes up quite often: How can I use optimizer hints in PostgreSQL as I can do it in Oracle? Well, you cant, and the reasons are this:
- Poor application code maintainability: hints in queries require massive refactoring.
- Interference with upgrades: today’s helpful hints become anti-performance after an upgrade.
- Encouraging bad DBA habits slap a hint on instead of figuring out the real issue.
- Does not scale with data size: the hint that’s right when a table is small is likely to be wrong when it gets larger.
- Failure to actually improve query performance: most of the time, the optimizer is actually right.
- Interfering with improving the query planner: people who use hints seldom report the query problem to the project.
But this does not mean that you cant influence the optimizer (or “planner” in PostgreSQL wording), it is just not working in the same way. Lets have a look.
On of the reasons that the planner does not choose an index over a sequential scan is that the parameter effective_cache_size is not set properly. To understand what it does you have to know that PostgreSQL works together with the operating system file cache/disk cache very well. It is not required, as you do it in Oracle, to give most of the available memory of the server to the database. Usually you start with 25% of the total available memory and give that to PostgreSQL by setting the parameter shared_buffers to that value. When pages fall out of that region it is still likely that they are available in the disk cache and can be retrieved from there without going down to disk. And this is what effective_cache_size is about: Setting this parameter does not consume more memory but is telling PostgreSQL how big the total cache of the system really is, so shared_buffers plus disk cache. This gets taken into consideration by the planner. A good starting point is 50 to 75% of the available memory. Lets do a quick test to show how this behaves. Lets generate some data:
postgres=# ! cat a.sql drop table if exists t1; create table t1 ( a int ); with generator as ( select a.* from generate_series ( 1, 5000000 ) a order by random() ) insert into t1 ( a ) select a from generator; create index i1 on t1(a); analyze verbose t1; select * from pg_size_pretty ( pg_relation_size ('t1' )); select * from pg_size_pretty ( pg_total_relation_size('t1')); postgres=# i a.sql DROP TABLE CREATE TABLE INSERT 0 5000000 CREATE INDEX psql:a.sql:12: INFO: analyzing "public.t1" psql:a.sql:12: INFO: "t1": scanned 22124 of 22124 pages, containing 5000000 live rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total rows ANALYZE pg_size_pretty ---------------- 173 MB (1 row) pg_size_pretty ---------------- 280 MB (1 row) postgres=# show shared_buffers ; shared_buffers ---------------- 128MB (1 row)
The table without the index is big enough to not fit into shared_buffers (173MB) and even bigger of course including the index (280MB). When we set effective_cache_size to a very low value we get costs of 40.55 for the statement below (almost no disk cache):
postgres=# SET effective_cache_size TO '1 MB'; SET postgres=# explain SELECT * FROM t1 ORDER BY a limit 10; QUERY PLAN ------------------------------------------------------------------------------------- Limit (cost=0.43..40.55 rows=10 width=4) -> Index Only Scan using i1 on t1 (cost=0.43..20057243.41 rows=5000000 width=4) (2 rows)
Setting this to a more realistic value decreases the costs because it is expected to find the index in the disk cache:
postgres=# SET effective_cache_size TO '5 GB'; SET postgres=# explain SELECT * FROM t1 ORDER BY a limit 10; QUERY PLAN ----------------------------------------------------------------------------------- Limit (cost=0.43..0.87 rows=10 width=4) -> Index Only Scan using i1 on t1 (cost=0.43..218347.46 rows=5000000 width=4) (2 rows)
This is the first “hint” you can set to influence the optimizer/planner. But there are many others. What PostgreSQL allows you to do is to enable or disable features of the planner:
postgres=# select name from pg_settings where name like 'enable%'; name ---------------------- enable_bitmapscan enable_hashagg enable_hashjoin enable_indexonlyscan enable_indexscan enable_material enable_mergejoin enable_nestloop enable_seqscan enable_sort enable_tidscan
Using the same data from above we could disable the index only scan:
postgres=# set enable_indexonlyscan=false; SET postgres=# explain (analyze,buffers) SELECT * FROM t1 ORDER BY a limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.43..0.87 rows=10 width=4) (actual time=0.019..0.058 rows=10 loops=1) Buffers: shared hit=13 -> Index Scan using i1 on t1 (cost=0.43..218347.46 rows=5000000 width=4) (actual time=0.017..0.036 rows=10 loops=1) Buffers: shared hit=13 Planning time: 0.057 ms Execution time: 0.084 ms (6 rows) postgres=# set enable_indexonlyscan=true; SET postgres=# explain (analyze,buffers) SELECT * FROM t1 ORDER BY a limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.43..0.87 rows=10 width=4) (actual time=0.025..0.072 rows=10 loops=1) Buffers: shared hit=13 -> Index Only Scan using i1 on t1 (cost=0.43..218347.46 rows=5000000 width=4) (actual time=0.023..0.048 rows=10 loops=1) Heap Fetches: 10 Buffers: shared hit=13 Planning time: 0.068 ms Execution time: 0.105 ms (7 rows)
But the documentation clearly states: “If the default plan chosen by the optimizer for a particular query is not optimal, a temporary solution is to use one of these configuration parameters to force the optimizer to choose a different plan”. For testing and troubleshooting this can be handy.
Another way to influence the optimizer/planner is to set the planner cost constants:
postgres=# select name from pg_settings where name like '%cost%' and name not like '%vacuum%'; name ---------------------- cpu_index_tuple_cost cpu_operator_cost cpu_tuple_cost parallel_setup_cost parallel_tuple_cost random_page_cost seq_page_cost" (7 rows)
What they mean is pretty well documented and how you need to set them (if you need to change them at all) depends on your hardware and application. There are others as well, such as the *collapse_limit* parameters and the parameters for the Genetic Query Optimizer.
Conclusion: There are several ways you can influence the optimizer/planner in PostgreSQL it is just not by using hints.