In database management, query performance is critical for efficient data retrieval. PostgreSQL, a powerful open-source database, offers various extensions to optimize and tune query performance. Two such extensions are Index Advisor and HypoPG. These tools help database administrators (DBAs) identify and test potential indexes to improve query performance without the overhead of creating and dropping indexes repeatedly. This blog will guide you through the installation and usage of these extensions to enhance your PostgreSQL performance.
1. Installing Index Advisor and HypoPG Extensions
Before diving into the usage of these extensions, let’s start with the installation process.
1.1. PostgreSQL Index Advisor
Index Advisor is a PostgreSQL extension that recommends indexes based on query analysis. One nice point is that the extension code is still maintained. index_advisor requires Postgres with hypopg installed. Here’s how to install it:
- Installation:
--- Install hypopg
postgres@server> sudo apt install postgresql-XY-hypopg
--- Install index_adviser
postgres@server> git clone https://github.com/supabase/index_advisor.git
postgres@server> cd index_advisor
postgres@server> sudo make install
- Enable the Extension in PostgreSQL:
postgres# CREATE EXTENSION hypopg;
postgres# create extension if not exists index_advisor cascade;
Now that the extensions are installed, let’s explore how to use them to analyze and optimize query performance.
2. Using Index Advisor to Recommend Indexes
Index Advisor analyzes your queries and suggests indexes that can improve performance. Let’s see it in action with a sample query.
2.1. Example Query
Consider the following query that retrieves data from dbi_technologies
, dbi_techres
, and dbi_techlevels
tables:
SELECT tech.id techid, tech.label techlabel, reslevel.techlevel techlevel, reslevel.techlvlid
FROM dbi_technologies tech
LEFT JOIN
(SELECT techlvl.label techlevel, techlvl.id techlvlid, techres.tech_id tech_id
FROM dbi_techres techres,
dbi_techlevels techlvl
WHERE techres.res_id = $2
AND techlvl.id = techres.techlevel_id) AS reslevel
ON tech.id = reslevel.tech_id
WHERE tech.cat_id = $1
ORDER BY techlabel;
2.2. Running Index Advisor
To get index recommendations for the above query, use the following command:
SELECT * FROM index_advisor('YOUR QUERY HERE');
2.3. Sample Output
-[ RECORD 1 ]-------------------------
startup_cost_before | 70.97
startup_cost_after | 66.76
total_cost_before | 71.00
total_cost_after | 66.79
index_statements | ['CREATE INDEX ON dbicc.dbi_technologies USING btree (cat_id)', 'CREATE INDEX ON dbicc.dbi_techres USING btree (res_id)']
errors | []
2.4. Output Explanation
- startup_cost_before and startup_cost_after: These show the estimated cost until the query can return the first row of the result before and after applying the recommended indexes. For example, with a hash join, this includes the cost to build the internal table. For a sequential scan, the startup cost is always zero, as rows can be returned immediately.
- total_cost_before and total_cost_after: Estimated total cost of executing the query before and after applying the recommended indexes.
- index_statements: Suggested SQL statements to create the indexes. In this case, we got two suggestions.
- errors: Any errors encountered during the analysis (empty in this case).
3. Using HypoPG to Test Hypothetical Indexes
HypoPG allows you to test the impact of hypothetical indexes without actually creating them. This is useful for evaluating potential indexes before committing to their creation. The hypothetical indexes created by HypoPG are stored in your connection’s private memory, not in any catalog. This means they won’t increase the size of any table or affect other connections.
3.1. Analyzing Query Performance without Hypothetical Indexes
Let’s start by testing the query performance without any hypothetical indexes.
postgres# EXPLAIN SELECT tech.id techid, tech.label techlabel, reslevel.techlevel techlevel, reslevel.techlvlid
FROM dbi_technologies tech
LEFT JOIN (
SELECT techlvl.label techlevel, techlvl.id techlvlid, techres.tech_id tech_id
FROM dbi_techres techres, dbi_techlevels techlvl
WHERE techres.res_id = 60
AND techlvl.id = techres.techlevel_id
) AS reslevel
ON tech.id = reslevel.tech_id
WHERE tech.cat_id = 10
ORDER BY techlabel;
+-------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|-------------------------------------------------------------------------------------------------------------------------|
| Sort (cost=72.10..72.13 rows=14 width=241) |
| Sort Key: tech.label |
| -> Nested Loop Left Join (cost=0.41..71.83 rows=14 width=241) |
| -> Seq Scan on dbi_technologies tech (cost=0.00..10.01 rows=14 width=19) |
| Filter: (cat_id = 10) |
| -> Nested Loop (cost=0.41..4.41 rows=1 width=226) |
| -> Index Only Scan using idx_tech_res_level on dbi_techres techres (cost=0.28..4.02 rows=1 width=8) |
| Index Cond: ((tech_id = tech.id) AND (res_id = 60)) |
| -> Index Scan using dbi_techlevels_id_pkey on dbi_techlevels techlvl (cost=0.13..0.33 rows=1 width=222) |
| Index Cond: (id = techres.techlevel_id) |
+-------------------------------------------------------------------------------------------------------------------------+
3.2. Creating the First Hypothetical Index
We are going to create our first hypothetical index based on the output from the index_advisor statement. Use hypopg_create_index
to create it:
SELECT * FROM hypopg_create_index('CREATE INDEX ON dbicc.dbi_technologies USING btree (cat_id)');
3.3. Analyzing Query Performance with Hypothetical Index
Run EXPLAIN
on your query to see the impact of the hypothetical index:
postgres# EXPLAIN SELECT tech.id techid, tech.label techlabel, reslevel.techlevel techlevel, reslevel.techlvlid
FROM dbi_technologies tech
LEFT JOIN
( SELECT techlvl.label techlevel, techlvl.id techlvlid, techres.tech_id tech_id
FROM dbi_techres techres,
dbi_techlevels techlvl
WHERE techres.res_id = 60
AND techlvl.id = techres.techlevel_id) AS reslevel
ON tech.id = reslevel.tech_id
WHERE tech.cat_id = 10
ORDER BY techlabel;
+-------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|-------------------------------------------------------------------------------------------------------------------------|
| Sort (cost=71.39..71.43 rows=14 width=241) |
| Sort Key: tech.label |
| -> Nested Loop Left Join (cost=4.55..71.12 rows=14 width=241) |
| -> Bitmap Heap Scan on dbi_technologies tech (cost=4.13..9.31 rows=14 width=19) |
| Recheck Cond: (cat_id = 10) |
| -> Bitmap Index Scan on "<14454>btree_dbicc_dbi_technologies_cat_id" (cost=0.00..4.13 rows=14 width=0) |
| Index Cond: (cat_id = 10) |
| -> Nested Loop (cost=0.41..4.41 rows=1 width=226) |
| -> Index Only Scan using idx_tech_res_level on dbi_techres techres (cost=0.28..4.02 rows=1 width=8) |
| Index Cond: ((tech_id = tech.id) AND (res_id = 60)) |
| -> Index Scan using dbi_techlevels_id_pkey on dbi_techlevels techlvl (cost=0.13..0.33 rows=1 width=222) |
| Index Cond: (id = techres.techlevel_id) |
+-------------------------------------------------------------------------------------------------------------------------+
3.4. Output Explanation
- Bitmap Heap Scan and Bitmap Index Scan: Show how the hypothetical index is used in scanning the table.
- Cost Reduction: Reduced costs in the query plan indicate improved performance due to the hypothetical index.
3.5. Creating our second Hypothetical Index and check the output
postgres# SELECT * FROM hypopg_create_index('CREATE INDEX ON dbicc.dbi_techres USING btree (res_id)');
+------------+---------------------------------------+
| indexrelid | indexname |
|------------+---------------------------------------|
| 14455 | <14455>btree_dbicc_dbi_techres_res_id |
+------------+---------------------------------------+
SELECT 1
Time: 0.013s
postgres# EXPLAIN SELECT tech.id techid, tech.label techlabel, reslevel.techlevel techlevel, reslevel.techlvlid
FROM dbi_technologies tech
LEFT JOIN
( SELECT techlvl.label techlevel, techlvl.id techlvlid, techres.tech_id tech_id
FROM dbi_techres techres,
dbi_techlevels techlvl
WHERE techres.res_id = 60
AND techlvl.id = techres.techlevel_id) AS reslevel
ON tech.id = reslevel.tech_id
WHERE tech.cat_id = 10
ORDER BY techlabel;
+---------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|---------------------------------------------------------------------------------------------------------------------------------------|
| Sort (cost=62.47..62.51 rows=14 width=241) |
| Sort Key: tech.label |
| -> Hash Right Join (cost=10.65..62.21 rows=14 width=241) |
| Hash Cond: (techres.tech_id = tech.id) |
| -> Hash Join (cost=1.17..52.55 rows=66 width=226) |
| Hash Cond: (techres.techlevel_id = techlvl.id) |
| -> Index Scan using "<14455>btree_dbicc_dbi_techres_res_id" on dbi_techres techres (cost=0.03..51.12 rows=66 width=8) |
| Index Cond: (res_id = 60) |
| -> Hash (cost=1.06..1.06 rows=6 width=222) |
| -> Seq Scan on dbi_techlevels techlvl (cost=0.00..1.06 rows=6 width=222) |
| -> Hash (cost=9.31..9.31 rows=14 width=19) |
| -> Bitmap Heap Scan on dbi_technologies tech (cost=4.13..9.31 rows=14 width=19) |
| Recheck Cond: (cat_id = 10) |
| -> Bitmap Index Scan on "<14454>btree_dbicc_dbi_technologies_cat_id" (cost=0.00..4.13 rows=14 width=0) |
| Index Cond: (cat_id = 10) |
+---------------------------------------------------------------------------------------------------------------------------------------+
3.6. Output Explanation
- Bitmap Heap Scan and Bitmap Index Scan: Show how the hypothetical index is used in scanning the table.
- Cost Reduction: Reduced costs in the query plan indicate improved performance due to the hypothetical index.
3.7. Key Differences and Benefits
After the creation of our HypoPG indexes, we can see some differences in our explain statements and notice some benefits:
Reduced Total Cost:
- Without Indexes: The total cost was estimated at around 71.83.
- With HypoPG Indexes: The total cost decreased to about 62.21.
Benefit: The overall cost of executing the query is lower, indicating improved efficiency.
Improved Join Strategies:
- Without Indexes: The query used nested loop joins, which can be less efficient for large datasets.
- With HypoPG Indexes: The query plan switched to hash joins, which are generally faster for joining large sets of data.
Benefit: Hash joins are more efficient and can handle larger datasets better than nested loops.
Better Index Usage:
- Without Indexes: The plan didn’t use any specific indexes for
cat_id
orres_id
in thedbi_technologies
table. - With HypoPG Indexes: The plan utilized the hypothetical indexes for both
cat_id
andres_id
.
Benefit: The use of these indexes speeds up data retrieval by directly accessing the relevant rows rather than scanning the entire table.
Change in Access Methods:
- Without Indexes: The plan involved a sequence scan and index-only scans.
- With HypoPG Indexes: The plan used bitmap heap scans and bitmap index scans.
Benefit: Bitmap scans are efficient for handling multiple rows and are typically faster when dealing with indexed columns.
4. Conclusion
Index Advisor and HypoPG are powerful tools in the PostgreSQL ecosystem, allowing DBAs to fine-tune their database performance with minimal disruption (and without overheating your brain). By recommending and simulating indexes, these extensions help you make decisions about index creation, ensuring your queries run efficiently.
Use these tools to analyze your queries, create hypothetical indexes, and measure their impact before making any permanent changes. I hope that this blog will help you make your PostgreSQL performances better thanks to Index Advisor and HypoPG.
Happy optimizing!
Useful links
https://github.com/supabase/index_advisor
https://hypopg.readthedocs.io/en/rel1_stable/index.html
https://github.com/HypoPG/hypopg
Don’t mind checking my previous blog about pgcli to make your life easier while using PostgreSQL: https://www.dbi-services.com/blog/simplifying-postgresql-management-a-guide-to-install-and-use-pgcli/