{"id":43256,"date":"2026-03-01T20:09:15","date_gmt":"2026-03-01T19:09:15","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/?p=43256"},"modified":"2026-03-01T22:35:59","modified_gmt":"2026-03-01T21:35:59","slug":"pgvector-a-guide-for-dba-part-2-indexes-update-march-2026","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/","title":{"rendered":"pgvector, a guide for DBA &#8211; Part 2: Indexes (update march 2026)"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-introduction\">Introduction<\/h2>\n\n\n\n<p>In <a href=\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/\">Part 1<\/a> of this series, we covered what pgvector is, how embeddings work, and how to store them in PostgreSQL. We ended with a working similarity search &#8212; but on a sequential scan. That works fine for a demo table with 1,000 rows. It does not work for production.<\/p>\n\n\n\n<p>This post is about what comes next: <strong>indexes<\/strong>. Specifically, the <strong>three index families<\/strong> in the pgvector ecosystem as of February 2026 (HNSW, IVFFlat, and DiskANN), including <strong>two DiskANN implementations targeting different deployment models<\/strong>, what they&#8217;re good at, where they break, and the patterns you need, whether you&#8217;re the DBA tuning them or the developer looking to understand the the strenghts of PostgreSQL as a vector store. <\/p>\n\n\n\n<p>Everything in this post was tested on public dataset: <strong>25,000 Wikipedia articles<\/strong> embedded with OpenAI&#8217;s <code>text-embedding-3-large<\/code> at <strong>3,072 dimensions<\/strong>, the maximum the model supports. The high number of dimension is a choice, to highlight some limitations for pedagogical reasons. 
You would be fine running and testing with lower dimensions or other embedding models. If you want to see how to evaluate embedding models against your own data sets, keep an eye on the RAG series: I will probably write a dedicated blog post on that. <br>The environment is PostgreSQL 18 with pgvector 0.8.1 and pgvectorscale 0.9.0. <\/p>\n\n\n\n<p>All the SQL scripts, Python code, and Docker configuration are in the companion lab: <a href=\"https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\/tree\/main\/lab\/06_pgvector_indexes\"><code>lab\/06_pgvector_indexes<\/code><\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-index-types\">The Index Types<\/h2>\n\n\n\n<p>Before we dive in, here&#8217;s the landscape. pgvector ships with two built-in index types (HNSW and IVFFlat), and two DiskANN implementations are available from different vendors:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><\/th><th>HNSW<\/th><th>IVFFlat<\/th><th>DiskANN (pgvectorscale)<\/th><th>DiskANN (pg_diskann)<\/th><\/tr><\/thead><tbody><tr><td><strong>Provider<\/strong><\/td><td>pgvector<\/td><td>pgvector<\/td><td>Timescale<\/td><td>Microsoft<\/td><\/tr><tr><td><strong>Availability<\/strong><\/td><td>Built-in<\/td><td>Built-in<\/td><td>Open source, self-hosted<\/td><td>Azure DB for PostgreSQL<\/td><\/tr><tr><td><strong>Algorithm<\/strong><\/td><td>Multi-layer graph<\/td><td>Voronoi cell partitioning<\/td><td>Vamana graph + SBQ<\/td><td>Vamana graph + PQ<\/td><\/tr><tr><td><strong>Best for<\/strong><\/td><td>General purpose<\/td><td>Fast build<\/td><td>Storage-constrained<\/td><td>Azure + high recall<\/td><\/tr><tr><td><strong>Build time (25K, 3072d)<\/strong><\/td><td>29s<\/td><td>5s<\/td><td>49s<\/td><td>N\/A (Azure)<\/td><\/tr><tr><td><strong>Index size<\/strong><\/td><td>193 MB<\/td><td>193 MB<\/td><td><strong>21 MB<\/strong><\/td><td>Similar<\/td><\/tr><tr><td><strong>Query 
time<\/strong><\/td><td>2-6 ms<\/td><td>2-10 ms<\/td><td>3 ms<\/td><td>~3 ms<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>That pgvectorscale number is not a typo. 21 MB vs 193 MB for the same data.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Note:<\/strong> This post uses pgvectorscale for all DiskANN benchmarks since it&#8217;s the open-source, self-hosted option. We&#8217;ll compare both DiskANN implementations in detail later in this post. pg_diskann is available only on Azure Database for PostgreSQL Flexible Server, the managed instance service from Microsoft. <\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-hnsw-the-default-choice\">HNSW: The Default Choice<\/h2>\n\n\n\n<p>HNSW (Hierarchical Navigable Small World) is the most commonly recommended index type for vector search. It builds a multi-layered graph where each node connects to its nearest neighbors, and search navigates this graph from top to bottom.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-2-000-dimension-wall\">The 2,000-Dimension Wall<\/h3>\n\n\n\n<p>Here&#8217;s the first thing you&#8217;ll hit with modern embedding models:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nCREATE INDEX idx_content_hnsw\nON articles USING hnsw (content_vector vector_cosine_ops)\nWITH (m = 16, ef_construction = 64);\n\nERROR:  column cannot have more than 2000 dimensions for hnsw index\n<\/pre><\/div>\n\n\n<p>The <code>vector<\/code> type in pgvector has a <strong>2,000-dimension limit<\/strong> for HNSW indexes. 
If you&#8217;re using <code>text-embedding-3-large<\/code> (3,072 dimensions), or <code>voyage-3-large<\/code> at its 2,048-dimension setting, this is a blocker.<\/p>\n\n\n\n<p>The workaround: <strong><code>halfvec<\/code><\/strong>.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Step 1: Store a half-precision copy\nALTER TABLE articles ADD COLUMN content_halfvec halfvec(3072);\nUPDATE articles SET content_halfvec = content_vector::halfvec;\n\n-- Step 2: Index the halfvec column (limit: 4,000 dimensions)\nCREATE INDEX idx_content_hnsw_halfvec\nON articles USING hnsw (content_halfvec halfvec_cosine_ops)\nWITH (m = 16, ef_construction = 64);\n\nTime: 28974.392 ms (00:28.974)\n\nSELECT pg_size_pretty(pg_relation_size(&#039;idx_content_hnsw_halfvec&#039;)) AS hnsw_size;\n\n hnsw_size\n-----------\n 193 MB\n<\/pre><\/div>\n\n\n<p>29 seconds to build, 193 MB for 25,000 articles at 3,072 dimensions. That&#8217;s roughly <strong>8 KB per row<\/strong> in the index alone.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Important:<\/strong> <code>halfvec<\/code> stores each dimension in 2 bytes instead of 4. You lose some floating-point precision, but in practice the recall difference is negligible for similarity search. The storage savings are real: your halfvec column is 6 KB per row vs 12 KB for the full vector.<\/p>\n\n\n\n<p><strong>Alternative:<\/strong> Instead of a separate column, you can create an expression index that casts on the fly: <code>CREATE INDEX ... ON articles USING hnsw ((content_vector::halfvec(3072)) halfvec_cosine_ops);<\/code> The trade-off is that your queries must use the matching expression (<code>content_vector::halfvec(3072) &lt;=&gt; ...<\/code>) for the planner to pick it up, which is harder to read in application code. 
The separate column approach gives cleaner queries.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-tuning-ef-search-the-recall-vs-speed-dial\">Tuning ef_search: The Recall vs Speed Dial<\/h3>\n\n\n\n<p><code>ef_search<\/code> controls how many candidates the HNSW algorithm considers during search. Higher values mean more candidates examined, better recall, but more work. The default is 40.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- ef_search = 40 (default)\nSET hnsw.ef_search = 40;\nEXPLAIN ANALYZE\nSELECT id, title, content_halfvec &lt;=&gt; (\n    SELECT content_halfvec FROM articles WHERE id = 1\n) AS distance\nFROM articles\nORDER BY content_halfvec &lt;=&gt; (\n    SELECT content_halfvec FROM articles WHERE id = 1\n)\nLIMIT 10;\n\nIndex Scan using idx_content_hnsw_halfvec on articles\n  Order By: (content_halfvec &lt;=&gt; (InitPlan 1).col1)\n  Index Searches: 1\n  Buffers: shared hit=551\nExecution Time: 6.004 ms\n\n-- ef_search = 100\nSET hnsw.ef_search = 100;\n\nIndex Scan using idx_content_hnsw_halfvec on articles\n  Buffers: shared hit=716\nExecution Time: 2.365 ms\n\n-- ef_search = 200\nSET hnsw.ef_search = 200;\n\nIndex Scan using idx_content_hnsw_halfvec on articles\n  Buffers: shared hit=883\nExecution Time: 2.542 ms\n<\/pre><\/div>\n\n\n<p>Wait &#8212; ef_search=100 was <em>faster<\/em> than ef_search=40? Not really. Those numbers came from a warm cache (<code>shared hit=551<\/code>, zero disk reads). The apparent speedup is a <strong>cache warming effect<\/strong>, not a property of the algorithm. 
To prove this, I restarted PostgreSQL and ran the full sweep from cold:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>ef_search<\/th><th>Execution Time<\/th><th>Buffers<\/th><th>read (disk)<\/th><\/tr><\/thead><tbody><tr><td>40 (cold)<\/td><td>91 ms<\/td><td>hit=189, <strong>read=362<\/strong><\/td><td>362 pages from disk<\/td><\/tr><tr><td>100<\/td><td>33 ms<\/td><td>hit=616, <strong>read=132<\/strong><\/td><td>fewer cold pages<\/td><\/tr><tr><td>200<\/td><td>22 ms<\/td><td>hit=850, <strong>read=65<\/strong><\/td><td>even fewer<\/td><\/tr><tr><td>40 (warm)<\/td><td><strong>0.8 ms<\/strong><\/td><td><strong>hit=551, read=0<\/strong><\/td><td>all cached<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The second run at ef_search=40 clocked 0.8 ms &#8212; faster than any ef_search=100 or 200 run. On a warm cache, all three (40\/100\/200) land in the 0.8-5 ms range. The variance is cache state, not algorithmic shortcuts. <strong>The real cost cliff is at ef_search=400<\/strong> where the optimizer switches plans entirely:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- ef_search = 400\nSET hnsw.ef_search = 400;\n\nSort  (actual time=364.693..364.695 rows=10.00 loops=1)\n  Sort Key: ((articles.content_halfvec &lt;=&gt; (InitPlan 1).col1))\n  Sort Method: top-N heapsort  Memory: 25kB\n  -&gt;  Seq Scan on articles  (actual time=0.143..356.482 rows=24700.00 loops=1)\n        Filter: (content_halfvec IS NOT NULL)\n        Rows Removed by Filter: 300\nExecution Time: 364.724 ms\n<\/pre><\/div>\n\n\n<p><strong>The planner chose a Seq Scan + Sort path.<\/strong> At ef_search=400, PostgreSQL&#8217;s cost model estimated the index scan would be more expensive than reading every row sequentially, and the query went from 2.5 ms to 365 ms.<\/p>\n\n\n\n<p>This is your <strong>optimizer flip-flop<\/strong>. 
It&#8217;s not a bug &#8212; it&#8217;s the planner doing its job. But it means you need to be aware of the threshold.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>ef_search<\/th><th>Execution Time<\/th><th>Plan<\/th><th>Buffers<\/th><\/tr><\/thead><tbody><tr><td>40<\/td><td>6.0 ms<\/td><td>Index Scan<\/td><td>551<\/td><\/tr><tr><td>100<\/td><td>2.4 ms<\/td><td>Index Scan<\/td><td>716<\/td><\/tr><tr><td>200<\/td><td>2.5 ms<\/td><td>Index Scan<\/td><td>883<\/td><\/tr><tr><td>400<\/td><td><strong>365 ms<\/strong><\/td><td><strong>Seq Scan<\/strong><\/td><td>209,192<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>DBA Takeaway:<\/strong> For HNSW on halfvec(3072), stay in the ef_search 40-200 range. Past that, you&#8217;re fighting the optimizer. If you need ef_search &gt; 200 for recall, you probably need a bigger <code>m<\/code> parameter at build time.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-build-parameters-m-and-ef-construction\">Build Parameters: m and ef_construction<\/h3>\n\n\n\n<p><code>m<\/code> is the number of connections per node in the graph. <code>ef_construction<\/code> is the candidate list size during build. Higher values = better graph quality but slower builds and (potentially) larger indexes.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- m=16, ef_construction=64 (default-ish)\nTime: 28,974 ms    Size: 193 MB\n\n-- m=32, ef_construction=128\nTime: 54,077 ms    Size: 193 MB\n<\/pre><\/div>\n\n\n<p>At 25K rows, doubling <code>m<\/code> doubled the build time but didn&#8217;t change the index size. The size effect becomes more visible at larger scales. 
In general: <strong>start with m=16, ef_construction=64, and only increase if recall is insufficient after tuning ef_search<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-ivfflat-fast-build\">IVFFlat: Fast Build<\/h2>\n\n\n\n<p>IVFFlat (Inverted File with Flat quantization) partitions the vector space into Voronoi cells using k-means clustering, then searches only the cells closest to the query vector.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-same-dimension-limit\">Same Dimension Limit<\/h3>\n\n\n\n<p>IVFFlat has the <strong>same 2,000-dimension limit<\/strong> as HNSW for the <code>vector<\/code> type. Same workaround:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE INDEX idx_content_ivfflat\nON articles USING ivfflat (content_halfvec halfvec_cosine_ops)\nWITH (lists = 25);\n\nTime: 5008.765 ms (00:05.009)\n\nSELECT pg_size_pretty(pg_relation_size(&#039;idx_content_ivfflat&#039;));\n-- 193 MB\n<\/pre><\/div>\n\n\n<p>5 seconds to build vs 29 for HNSW. The index size is essentially the same (193 MB for both), but <strong>IVFFlat builds 5.8x faster<\/strong>.<\/p>\n\n\n\n<p>The <code>lists<\/code> parameter controls the number of Voronoi cells. The pgvector documentation recommends: <strong><code>rows \/ 1000<\/code> for tables up to 1M rows<\/strong>, <code>sqrt(rows)<\/code> for larger tables. For 25,000 articles: 25000 \/ 1000 = 25 lists, giving roughly 1,000 rows per cell. 
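<\/p>\n\n\n\n<p>The heuristic is easy to encode. A minimal Python sketch (the helper name <code>suggest_lists<\/code> is mine, not part of pgvector):<\/p>\n\n\n

```python
import math

def suggest_lists(rows: int) -> int:
    """Suggest an IVFFlat 'lists' value following the pgvector docs:
    rows / 1000 for tables up to 1M rows, sqrt(rows) for larger tables."""
    if rows <= 1_000_000:
        return max(1, rows // 1000)
    return round(math.sqrt(rows))

print(suggest_lists(25_000))     # our table: 25 lists, ~1,000 rows per cell
print(suggest_lists(4_000_000))  # large table: sqrt(4,000,000) = 2,000 lists
```

\n\n\n<p>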
A common mistake is applying <code>sqrt(rows)<\/code> to small tables &#8212; that gives 158 lists here, creating cells with only ~167 rows each, which fragments the index and causes the optimizer to flip to sequential scan at surprisingly low probe counts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-tuning-probes-how-many-cells-to-search\">Tuning Probes: How Many Cells to Search<\/h3>\n\n\n\n<p><code>probes<\/code> controls how many Voronoi cells are searched at query time. Default is 1 &#8212; fast but low recall. A good starting point is <code>sqrt(lists)<\/code>. Here&#8217;s the sweep with lists=25:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET ivfflat.probes = 1;\n-- Index Scan, Execution Time: 1.0 ms, Buffers: 571\n\nSET ivfflat.probes = 2;\n-- Index Scan, Execution Time: 3.7 ms, Buffers: 1,944\n\nSET ivfflat.probes = 3;\n-- Index Scan, Execution Time: 4.4 ms, Buffers: 2,793\n\nSET ivfflat.probes = 4;\n-- Index Scan, Execution Time: 5.9 ms, Buffers: 3,937\n<\/pre><\/div>\n\n\n<p>And then:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET ivfflat.probes = 5;\n\nSort  (actual time=152.548..152.549 rows=10.00 loops=1)\n  Sort Key: ((articles.content_halfvec &lt;=&gt; (InitPlan 1).col1))\n  -&gt;  Seq Scan on articles  (actual time=0.144..148.844 rows=24700.00 loops=1)\n        Filter: (content_halfvec IS NOT NULL)\nExecution Time: 152.584 ms\n<\/pre><\/div>\n\n\n<p>Same story as HNSW. At probes=5 (searching 5 of 25 cells = 20%), the optimizer decided a sequential scan was cheaper. 
The query went from 6 ms to 153 ms.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>probes<\/th><th>Execution Time<\/th><th>Plan<\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>1.0 ms<\/td><td>Index Scan<\/td><\/tr><tr><td>2<\/td><td>3.7 ms<\/td><td>Index Scan<\/td><\/tr><tr><td>3<\/td><td>4.4 ms<\/td><td>Index Scan<\/td><\/tr><tr><td>4<\/td><td>5.9 ms<\/td><td>Index Scan<\/td><\/tr><tr><td>5<\/td><td><strong>153 ms<\/strong><\/td><td><strong>Seq Scan<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Takeaway:<\/strong> For IVFFlat with 25 lists, the optimizer flips between probes=4 and probes=5. With <code>sqrt(25) = 5<\/code>, you&#8217;re right at the tipping point. Use <code>SET LOCAL ivfflat.probes = 3<\/code> or <code>4<\/code> for a good recall\/speed balance. On larger tables (100K+ rows), the flip happens at much higher probe counts because sequential scans become proportionally more expensive.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-set-local-the-production-pattern\">SET LOCAL: The Production Pattern<\/h3>\n\n\n\n<p>Never set <code>ivfflat.probes<\/code> at the session level in production. Use <code>SET LOCAL<\/code> inside a transaction:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nBEGIN;\n    SET LOCAL ivfflat.probes = 3;\n    SELECT id, title, content_halfvec &lt;=&gt; (\n        SELECT content_halfvec FROM articles WHERE id = 1\n    ) AS distance\n    FROM articles\n    ORDER BY content_halfvec &lt;=&gt; (\n        SELECT content_halfvec FROM articles WHERE id = 1\n    )\n    LIMIT 10;\nCOMMIT;\n\n-- Verify: probes reverted to default\nSHOW ivfflat.probes;  -- 1\n<\/pre><\/div>\n\n\n<p>The setting reverts automatically after COMMIT\/ROLLBACK. No global state leakage. 
<strong>Do the same for <code>hnsw.ef_search<\/code>.<\/strong><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>DO NOT PLAY AROUND WITH SESSION PARAMETERS ON PRODUCTION.<\/strong> Use <code>SET LOCAL<\/code> inside a transaction, or set them at the function\/procedure level. Session-level changes persist until disconnect and can affect every query on that connection, including connection pooler reuse. Hinting is not a strategy. <\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-diskann-leveraging-b-tree-index-principle\">DiskANN: Leveraging the B-tree Index Principle<\/h2>\n\n\n\n<p>DiskANN is provided by the <a href=\"https:\/\/github.com\/timescale\/pgvectorscale\">pgvectorscale<\/a> project from Timescale (SQL extension name: <code>vectorscale<\/code>). It implements the DiskANN algorithm with <strong>Statistical Binary Quantization (SBQ)<\/strong> compression built in. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-no-2-000-dimension-wall\">No 2,000-Dimension Wall<\/h3>\n\n\n\n<p>Unlike HNSW and IVFFlat, DiskANN supports the <code>vector<\/code> type natively up to <strong>16,000 dimensions<\/strong> &#8212; no halfvec workaround needed:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE INDEX idx_content_diskann ON articles USING diskann (content_vector)\nWITH (storage_layout = memory_optimized);\n\nNOTICE:  Starting index build with num_neighbors=50, search_list_size=100,\n         max_alpha=1.2, storage_layout=SbqCompression.  -- memory_optimized maps to SBQ\nNOTICE:  Indexed 24700 tuples\nTime: 49140.736 ms (00:49.141)\n\nSELECT pg_size_pretty(pg_relation_size(&#039;idx_content_diskann&#039;));\n-- 21 MB\n<\/pre><\/div>\n\n\n<p>49 seconds to build, but <strong>21 MB<\/strong>. 
Compare:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n        indexname         |  size  | size_bytes\n--------------------------+--------+------------\n idx_content_diskann      | 21 MB  |   22,511,616\n idx_content_hnsw_halfvec | 193 MB |  202,350,592\n idx_content_ivfflat      | 193 MB |  202,522,624\n<\/pre><\/div>\n\n\n<p>DiskANN is <strong>9x smaller<\/strong> than HNSW and IVFFlat on the same data. SBQ compression is the default &#8212; <code>storage_layout = memory_optimized<\/code> is what you get if you don&#8217;t specify a layout. Specifying it explicitly (as in the <code>CREATE INDEX<\/code> above) is good practice for readability. The alternative <code>plain<\/code> layout stores full vectors in the index and does not compress.<\/p>\n\n\n\n<p>Query performance is competitive:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nIndex Scan using idx_content_diskann on articles\n  Order By: (content_vector &lt;=&gt; (InitPlan 1).col1)\n  Buffers: shared hit=1437 read=132\nExecution Time: 2.915 ms\n<\/pre><\/div>\n\n\n<p>3 ms for a 3,072-dimension nearest neighbor search on 25K articles. On the same data, HNSW does it in 2-6 ms and IVFFlat in 2-10 ms. All three are in the same ballpark for query speed.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Takeaway:<\/strong> DiskANN is the right choice when your HNSW index outgrows <code>shared_buffers<\/code>. At 193 MB for 25K rows, HNSW on halfvec(3072) would reach ~77 GB at 10 million rows, which is well beyond what most buffer pools can keep hot. DiskANN&#8217;s 9x compression keeps the navigational structure cached while full vectors stay in the heap, fetched only for the final rescore of a few dozen candidates. Same access pattern as a B-tree: compact index in memory, selective heap lookups on demand. 
The trade-off is longer build times and fewer operator class options (vector type only, no halfvec\/bit\/sparsevec).<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-create-index-concurrently\">CREATE INDEX CONCURRENTLY<\/h3>\n\n\n\n<p>As of pgvectorscale 0.9.0, DiskANN supports <code>CONCURRENTLY<\/code>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE INDEX CONCURRENTLY idx_content_diskann\nON articles USING diskann (content_vector)\nWITH (storage_layout = memory_optimized);\n<\/pre><\/div>\n\n\n<p>This is critical for production &#8212; you can build the index without locking the table. HNSW and IVFFlat also support <code>CREATE INDEX CONCURRENTLY<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-why-these-dimension-limits-exist-buffer-pages-and-bits\">Why These Dimension Limits Exist: Buffer Pages and Bits<\/h3>\n\n\n\n<p>If the 2,000 \/ 4,000 \/ 16,000 dimension limits seem arbitrary, they&#8217;re not. The <strong>intuition<\/strong> comes from how PostgreSQL stores data: in <strong>8 KB buffer pages<\/strong> (8,192 bytes). Every index tuple &#8212; including the vector representation in a vector index &#8212; has to fit within a page. 
The fewer bytes per dimension, the more dimensions you can pack.<\/p>\n\n\n\n<p>Here&#8217;s the back-of-the-envelope arithmetic:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Encoding<\/th><th>Bytes per dimension<\/th><th>Theoretical max in 8 KB<\/th><\/tr><\/thead><tbody><tr><td><code>vector<\/code> (float32)<\/td><td>4 bytes<\/td><td>8,192 \/ 4 = <strong>2,048<\/strong><\/td><\/tr><tr><td><code>halfvec<\/code> (float16)<\/td><td>2 bytes<\/td><td>8,192 \/ 2 = <strong>4,096<\/strong><\/td><\/tr><tr><td>4-bit quantized (PQ)<\/td><td>0.5 bytes<\/td><td>8,192 * 2 = <strong>16,384<\/strong><\/td><\/tr><tr><td>1-bit binary (SBQ)<\/td><td>0.125 bytes<\/td><td>8,192 * 8 = <strong>65,536<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The theoretical numbers explain the <strong>intuition<\/strong> &#8212; why halfvec doubles the limit and quantized encodings push it further. The <strong>actual limits<\/strong> are slightly lower because of page headers, tuple overhead, and the index metadata stored alongside each vector. For HNSW, each page also stores <strong>neighbor connection lists<\/strong> (up to <code>m * 2<\/code> neighbor IDs per node); for IVFFlat, each page carries <strong>centroid references and list pointers<\/strong>. These eat into the available space, which is why pgvector&#8217;s HNSW sets 2,000 (not 2,048) and pgvectorscale&#8217;s DiskANN sets 16,000 (not 16,384). But the pattern is unmistakable: <strong>the fewer bits per dimension, the more dimensions you can fit in a page<\/strong>.<\/p>\n\n\n\n<p>This is why DiskANN can handle 16,000 dimensions where HNSW on <code>vector<\/code> tops out at 2,000 &#8212; DiskANN stores a compressed representation in the index page, not the full vector. And why halfvec doubled the HNSW\/IVFFlat limit from 2,000 to 4,000: half the bytes per dimension, twice the capacity.<br>This is more than enough for the vast majority of use cases. 
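<\/p>\n\n\n\n<p>The table&#8217;s arithmetic is easy to verify with a few lines of Python, computing the theoretical ceiling for each encoding (page and tuple overhead ignored):<\/p>\n\n\n

```python
PAGE_SIZE_BYTES = 8192  # one PostgreSQL buffer page

# bits consumed per dimension by each encoding
bits_per_dim = {
    "vector (float32)": 32,
    "halfvec (float16)": 16,
    "4-bit quantized (PQ)": 4,
    "1-bit binary (SBQ)": 1,
}

for encoding, bits in bits_per_dim.items():
    # theoretical maximum dimensions per 8 KB page, ignoring headers and metadata
    max_dims = PAGE_SIZE_BYTES * 8 // bits
    print(f"{encoding:22s} -> {max_dims:6d} dimensions")
```

\n\n\n<p>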
Most embedding models in production today default to 768\u20131536 dimensions, well within the 2,000-dimension limit. This also shows how future-proof the current vector store implementation in PostgreSQL is. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-pgvectorscale-compresses-statistical-binary-quantization-sbq\">How pgvectorscale Compresses: Statistical Binary Quantization (SBQ)<\/h3>\n\n\n\n<p>pgvectorscale&#8217;s DiskANN from Tiger Data uses a method called <strong>Statistical Binary Quantization<\/strong> (SBQ). The idea is deceptively simple: for each dimension, replace the float value with a 1- or 2-bit code.<\/p>\n\n\n\n<p><strong>1-bit mode<\/strong> (default for dimensions &gt;= 900): Each dimension is compressed to a single bit. But not naively &#8212; standard binary quantization uses 0.0 as the threshold (positive = 1, negative = 0), which works poorly because real embedding distributions are rarely centered on zero. SBQ instead computes the <strong>per-dimension mean<\/strong> across all vectors during index build and uses that as the threshold:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nif value &gt; mean_of_this_dimension \u2192 1\nelse \u2192 0\n<\/pre><\/div>\n\n\n<p>A 3,072-dimension float32 vector (12,288 bytes) becomes a 3,072-bit string (384 bytes). That&#8217;s <strong>32x compression<\/strong>. During search, the query vector is also SBQ-encoded, and distances are computed using <strong>XOR + popcount<\/strong> on the bit strings &#8212; which modern CPUs execute in a single instruction.<\/p>\n\n\n\n<p><strong>2-bit mode<\/strong> (default for dimensions &lt; 900): Each dimension gets two bits, encoding four &#8220;zones&#8221; based on the z-score (how many standard deviations from the mean). This gives finer granularity at 16x compression instead of 32x.<\/p>\n\n\n\n<p>The accuracy loss is real but small. 
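<\/p>\n\n\n\n<p>To make the 1-bit mode concrete, here is a toy encoder in Python. This is an illustration of the idea only, not pgvectorscale&#8217;s actual code, and the 4-dimension vectors are made up:<\/p>\n\n\n

```python
# Toy SBQ, 1-bit mode: threshold each dimension at its per-dimension mean,
# then compare encoded vectors with XOR + popcount.
vectors = [
    [0.9, -0.2, 0.1, 0.7],
    [0.8, -0.1, 0.3, -0.5],
    [-0.3, 0.4, 0.2, 0.6],
]
dims = len(vectors[0])

# per-dimension means, computed once at index build time
means = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]

def sbq_encode(vec):
    """One bit per dimension: 1 if the value is above that dimension's mean."""
    bits = 0
    for d, value in enumerate(vec):
        if value > means[d]:
            bits |= 1 << d
    return bits

def quantized_distance(a, b):
    """XOR the two bit strings, then popcount the differing bits."""
    return bin(a ^ b).count("1")

codes = [sbq_encode(v) for v in vectors]
print([quantized_distance(codes[0], c) for c in codes])  # [0, 2, 2]
```

\n\n\n<p>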
On common benchmarks, SBQ achieves 96-99% recall compared to exact search. The rescore step (controlled by <code>diskann.query_rescore<\/code>, default 50) compensates: after the graph traversal finds the top-50 candidates using quantized distances, pgvectorscale fetches the <strong>full-precision vectors from the heap<\/strong> and re-computes exact distances to produce the final top-10.<\/p>\n\n\n\n<p><strong>What&#8217;s stored where:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Location<\/th><th>What&#8217;s stored<\/th><th>Accessed when<\/th><\/tr><\/thead><tbody><tr><td><strong>Index pages<\/strong><\/td><td>SBQ-compressed vectors + graph edges<\/td><td>Every query (graph traversal)<\/td><\/tr><tr><td><strong>Heap (table)<\/strong><\/td><td>Full float32 vectors<\/td><td>Only during rescore (top-N candidates)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This two-tier architecture is why DiskANN achieves 21 MB index size: the index only stores 384-byte compressed vectors, not 12 KB originals. The full vectors stay in the table, fetched only for the final re-ranking of a few dozen candidates.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-microsoft-s-pg-diskann-a-different-approach\">Microsoft&#8217;s pg_diskann: A Different Approach<\/h3>\n\n\n\n<p>There&#8217;s a second DiskANN implementation for PostgreSQL: Microsoft&#8217;s <strong>pg_diskann<\/strong>, currently documented and distributed for <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/postgresql\/extensions\/how-to-use-pgdiskann\">Azure Database for PostgreSQL Flexible Server<\/a>. 
It uses the same Vamana graph algorithm but a fundamentally different compression strategy: <strong>Product Quantization (PQ)<\/strong>.<\/p>\n\n\n\n<p>Where SBQ asks &#8220;is this dimension above or below the mean?&#8221;, Product Quantization asks &#8220;which of 16 codewords best represents this group of dimensions?&#8221;<\/p>\n\n\n\n<p>Here&#8217;s how PQ works, step by step:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Divide the vector into chunks.<\/strong> A 3,072-dimension vector is split into, say, 1,024 chunks of 3 dimensions each.<\/li>\n\n\n\n<li><strong>Train a codebook per chunk.<\/strong> For each chunk, k-means clustering finds 16 representative codewords (centroids). Why 16? Because 16 values fit in <strong>4 bits<\/strong> &#8212; a single hex digit.<\/li>\n\n\n\n<li><strong>Encode each chunk as a 4-bit symbol.<\/strong> During index build, each chunk of 3 dimensions is replaced by the index (0-15) of its closest codeword. The entire 3,072-dimension vector becomes 1,024 symbols of 4 bits each = <strong>512 bytes<\/strong>.<\/li>\n\n\n\n<li><strong>Decode via lookup table at query time.<\/strong> To compute the distance to a query vector, you pre-compute the distance from the query to all 16 codewords for each chunk, creating a <strong>16-row x 1,024-column lookup table<\/strong>. Then for each stored vector, you sum up the table entries corresponding to its symbols. No floating-point multiplication needed &#8212; just table lookups and additions.<\/li>\n<\/ol>\n\n\n\n<p>The compression is dramatic: 3,072 dimensions * 4 bytes = 12,288 bytes \u2192 512 bytes with PQ. 
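<\/p>\n\n\n\n<p>The four steps can be sketched in miniature with Python. Toy sizes here: 6 dimensions, 2 chunks, and 4 codewords per chunk instead of 16, with hand-picked codebooks standing in for the k-means training:<\/p>\n\n\n

```python
# Toy Product Quantization: 6-dim vectors split into 2 chunks of 3 dims,
# each chunk encoded as the index of its nearest codeword.
codebooks = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)],  # chunk 0
    [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 0.0)],  # chunk 1
]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vec):
    """Replace each 3-dim chunk with the index (0-3) of its closest codeword."""
    symbols = []
    for c, book in enumerate(codebooks):
        chunk = tuple(vec[3 * c:3 * c + 3])
        symbols.append(min(range(len(book)), key=lambda i: sqdist(book[i], chunk)))
    return symbols

def pq_distance(query, symbols):
    """Precompute query-to-codeword distances per chunk, then sum table lookups."""
    table = [[sqdist(tuple(query[3 * c:3 * c + 3]), word) for word in book]
             for c, book in enumerate(codebooks)]
    return sum(table[c][s] for c, s in enumerate(symbols))

vec = [0.9, 0.1, 0.0, 0.1, 0.9, 0.8]
code = pq_encode(vec)
print(code)  # [1, 2]: chunk 0 is closest to (1,0,0), chunk 1 to (0,1,1)
print(pq_distance(vec, code))
```

\n\n\n<p>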
That&#8217;s <strong>24x compression<\/strong>, in the same ballpark as SBQ&#8217;s 32x.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-comparing-the-two-diskann-implementations\">Comparing the Two DiskANN Implementations<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><\/th><th>pgvectorscale (Timescale)<\/th><th>pg_diskann (Microsoft)<\/th><\/tr><\/thead><tbody><tr><td><strong>Compression<\/strong><\/td><td>SBQ (1-2 bits\/dim)<\/td><td>Product Quantization (4 bits\/chunk)<\/td><\/tr><tr><td><strong>Compression ratio<\/strong><\/td><td>32x (1-bit) or 16x (2-bit)<\/td><td>~24x (depends on chunks)<\/td><\/tr><tr><td><strong>How it works<\/strong><\/td><td>Per-dimension thresholding<\/td><td>Codebook lookup per chunk<\/td><\/tr><tr><td><strong>Distance computation<\/strong><\/td><td>XOR + popcount (very fast)<\/td><td>Table lookup + sum<\/td><\/tr><tr><td><strong>Trainable<\/strong><\/td><td>Minimal (just means + stddev)<\/td><td>Heavy (k-means per chunk)<\/td><\/tr><tr><td><strong>Max dimensions<\/strong><\/td><td>16,000<\/td><td>16,000<\/td><\/tr><tr><td><strong>Availability<\/strong><\/td><td>Open source, self-hosted<\/td><td>Azure Database for PostgreSQL<\/td><\/tr><tr><td><strong>License<\/strong><\/td><td>PostgreSQL License<\/td><td>Distributed via Azure<\/td><\/tr><tr><td><strong>Iterative scan<\/strong><\/td><td>No<\/td><td>Yes (relaxed\/strict, ON by default)<\/td><\/tr><tr><td><strong>PG version<\/strong><\/td><td>PG 14-18 (self-hosted)<\/td><td>Azure DB for PostgreSQL<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Both implementations share the core DiskANN algorithm (Vamana graph) and the two-phase search pattern (compressed scan + full-precision rescore). The difference is <em>how<\/em> they compress:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>SBQ<\/strong> is simpler and faster to build (just compute means). 
It&#8217;s a blunt instrument &#8212; 1 bit per dimension loses a lot of information, but XOR + popcount is blazingly fast, and the rescore step recovers accuracy.<\/li>\n\n\n\n<li><strong>PQ<\/strong> is more sophisticated and retains more information per bit (a 4-bit symbol captures relationships between groups of dimensions). It&#8217;s slower to build (k-means training) but can achieve better recall at the same compression ratio, especially for vectors with correlated dimensions.<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Takeaway:<\/strong> If you&#8217;re self-hosting PostgreSQL, pgvectorscale is your DiskANN option &#8212; open source, well-maintained, and the SBQ compression is effective. If you&#8217;re on Azure Database for PostgreSQL, you also have pg_diskann, whose PQ compression may give better recall on very high-dimensional data. The underlying algorithm is the same; the compression strategy is the difference.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-iterative-scans\">Iterative Scans<\/h2>\n\n\n\n<p>This is the most important query-time feature added to pgvector since HNSW support. If you take one thing from this post, let it be this section.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-problem\">The Problem<\/h3>\n\n\n\n<p>Vector search indexes return the K nearest neighbors, then PostgreSQL applies your WHERE clause. 
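<\/p>\n\n\n\n<p>As a back-of-the-envelope model (a rough expectation only, since it ignores graph locality), the number of survivors is about <code>ef_search<\/code> multiplied by the filter selectivity. A quick Python sketch with our category counts:<\/p>

```python
# Expected survivors when the WHERE clause filters ef_search ANN candidates.
EF_SEARCH = 40
categories = {                     # selectivities from the 25K-article dataset
    "History":     8719 / 25000,   # 34.9%
    "Science":     2232 / 25000,   #  8.9%
    "Mathematics":  116 / 25000,   #  0.5%
}
for name, sel in categories.items():
    print(f"{name:<12} ~{EF_SEARCH * sel:.1f} expected matches out of {EF_SEARCH}")
```

<p>Mathematics comes out at roughly 0.2 expected matches, which is why a LIMIT 10 query can come back empty.<\/p>\n\n\n\n<p>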
If your filter is selective, you get fewer results than requested.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Our dataset has 9 categories with varying selectivity\n  category   | cnt   | % of 25K\n-------------+-------+---------\n History     | 8719  | 34.9%\n General     | 6221  | 24.9%\n Science     | 2232  |  8.9%\n Mathematics |  116  |  0.5%\n<\/pre><\/div>\n\n\n<p>When you search for the 10 nearest &#8220;Science&#8221; articles, the HNSW index returns its ef_search nearest neighbors (say, 40), PostgreSQL filters to keep only Science articles, and you get however many matched. For Science (8.9%), you&#8217;ll usually get your 10 results. For Mathematics (0.5%), you won&#8217;t.<\/p>\n\n\n\n<p>Let&#8217;s prove it. Here&#8217;s a search for Mathematics articles with ef_search=40, forcing the HNSW index path. (We disable sequential scan here to force the index path on our small 25K dataset. On production tables with 100K+ rows, the optimizer would naturally choose the index without this hint.)<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET enable_seqscan = off;\nSET hnsw.iterative_scan = &#039;off&#039;;\nSET hnsw.ef_search = 40;\n\nSELECT id, title, category,\n       content_halfvec &lt;=&gt; (SELECT content_halfvec FROM articles WHERE id = 1) AS distance\nFROM articles\nWHERE category = &#039;Mathematics&#039;\nORDER BY content_halfvec &lt;=&gt; (SELECT content_halfvec FROM articles WHERE id = 1)\nLIMIT 10;\n\nIndex Scan using idx_content_hnsw_halfvec on articles\n  Order By: (content_halfvec &lt;=&gt; (InitPlan 1).col1)\n  Filter: (category = &#039;Mathematics&#039;::text)\n  Rows Removed by Filter: 40\n  Index Searches: 1\nExecution Time: 0.766 ms\n(0 rows)\n<\/pre><\/div>\n\n\n<p><strong>Zero rows returned.<\/strong> The index fetched 40 candidates, all 40 were filtered out (none were Mathematics), and 
the query silently returned an empty result set. This is the core problem: your application asked for 10 results and got 0.<\/p>\n\n\n\n<p>Before iterative scans, you had two bad options:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Over-fetch<\/strong> (LIMIT 1000) and hope enough rows match &#8212; wasteful and unreliable<\/li>\n\n\n\n<li><strong>Sequential scan<\/strong> &#8212; correct but slow on large tables<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-solution-iterative-index-scans\">The Solution: Iterative Index Scans<\/h3>\n\n\n\n<p>pgvector 0.8.0 introduced iterative scans. Instead of fetching a fixed batch and filtering, the index keeps fetching more candidates until the filter is satisfied.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET enable_seqscan = off;\nSET hnsw.iterative_scan = &#039;relaxed_order&#039;;\nSET hnsw.ef_search = 40;\n\nSELECT id, title, category,\n       content_halfvec &lt;=&gt; (SELECT content_halfvec FROM articles WHERE id = 1) AS distance\nFROM articles\nWHERE category = &#039;Mathematics&#039;\nORDER BY content_halfvec &lt;=&gt; (SELECT content_halfvec FROM articles WHERE id = 1)\nLIMIT 10;\n\nIndex Scan using idx_content_hnsw_halfvec on articles\n  Order By: (content_halfvec &lt;=&gt; (InitPlan 1).col1)\n  Filter: (category = &#039;Mathematics&#039;::text)\n  Rows Removed by Filter: 6809\n  Index Searches: 171\nExecution Time: 4235.805 ms\n(10 rows)\n<\/pre><\/div>\n\n\n<p><strong>10 rows returned.<\/strong> The index scanned 171 times, examined 6,819 candidates, filtered out 6,809 non-Mathematics rows, and delivered exactly 10 results. It took 4.2 seconds &#8212; much slower than the 0.8 ms empty result &#8212; but you got a <strong>correct<\/strong> answer instead of silence.<\/p>\n\n\n\n<p>At 6,819 tuples scanned, this was well within the default <code>max_scan_tuples<\/code> of 20,000. 
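<\/p>\n\n\n\n<p>Conceptually, the iterative scan is a loop with a work budget. Here is a simplified Python model of that loop; the candidate stream and filter function are stand-ins, not pgvector internals:<\/p>

```python
# Conceptual model of an iterative index scan with the max_scan_tuples valve.
def iterative_scan(candidates, matches_filter, limit, max_scan_tuples=20_000):
    results, scanned = [], 0
    for row in candidates:               # keep pulling next-nearest tuples
        scanned += 1
        if matches_filter(row):
            results.append(row)
            if len(results) == limit:    # enough matches: stop early
                break
        if scanned >= max_scan_tuples:   # safety valve: bounded work,
            break                        # possibly a partial result set
    return results, scanned

# 0.5% selectivity, like the Mathematics category: every 200th tuple matches.
def stream():
    return ({"id": i, "math": i % 200 == 0} for i in range(1_000_000))

rows, scanned = iterative_scan(stream(), lambda r: r["math"], limit=10)
partial, capped = iterative_scan(stream(), lambda r: r["math"], limit=10,
                                 max_scan_tuples=500)
print(len(rows), scanned)     # full result after ~1,800 tuples
print(len(partial), capped)   # capped budget: only a partial result set
```

<p>With the default budget the loop finds all 10 rows after visiting 1,801 tuples; with <code>max_scan_tuples = 500<\/code> it stops early and returns only 3 of the 10 requested rows.<\/p>\n\n\n\n<p>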
On a table with millions of rows and the same 0.5% selectivity, the scan might hit the 20,000 limit before finding 10 matches &#8212; you&#8217;d get a partial result set. That&#8217;s the trade-off the safety valve makes: bounded latency vs guaranteed result count.<\/p>\n\n\n\n<p>That 4.2-second cost reflects the extreme selectivity: Mathematics is only 0.5% of the data, so the index had to traverse deep into the graph. For moderate selectivity like Science (8.9%), the overhead is negligible:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET hnsw.iterative_scan = &#039;relaxed_order&#039;;\nSET hnsw.ef_search = 40;\n\nSELECT ... WHERE category = &#039;Science&#039; ORDER BY ... LIMIT 10;\n\nIndex Scan using idx_content_hnsw_halfvec on articles\n  Order By: (content_halfvec &lt;=&gt; (InitPlan 1).col1)\n  Filter: (category = &#039;Science&#039;::text)\n  Rows Removed by Filter: 26\n  Index Searches: 1\nExecution Time: 1.058 ms\n<\/pre><\/div>\n\n\n<p>Same feature, but for Science the index found 10 matching rows in a single search pass &#8212; no extra work needed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-two-modes\">Two Modes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><code>relaxed_order<\/code><\/strong>: Results are approximately ordered by distance. Slightly faster. Good enough for most use cases.<\/li>\n\n\n\n<li><strong><code>strict_order<\/code><\/strong>: Results are exactly ordered by distance. Slightly slower. Use when ranking precision matters. 
<code>SET hnsw.iterative_scan = &#039;strict_order&#039;;<\/code><br>&#8212; Execution Time: 0.885 ms<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-safety-valve-max-scan-tuples\">The Safety Valve: max_scan_tuples<\/h3>\n\n\n\n<p>To prevent runaway scans on extremely selective filters (imagine filtering for a category that has 1 row in 10 million), there&#8217;s a safety limit:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET hnsw.max_scan_tuples = 500;   -- Restrictive: stop after 500 index tuples\nSET hnsw.max_scan_tuples = 20000; -- Default: generous enough for most workloads\nSET hnsw.max_scan_tuples = 0;     -- Unlimited (use with caution)\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-ivfflat-too\">IVFFlat Too<\/h3>\n\n\n\n<p>Same concept, different GUC prefix:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET ivfflat.iterative_scan = &#039;relaxed_order&#039;;\nSET ivfflat.probes = 3;\n\nSELECT ... WHERE category = &#039;Science&#039; ORDER BY ... 
LIMIT 10;\n-- Execution Time: 2.001 ms\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-filtered-results-in-action\">Filtered Results in Action<\/h3>\n\n\n\n<p>With iterative scan, every result matches the filter:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET hnsw.iterative_scan = &#039;relaxed_order&#039;;\nSET hnsw.ef_search = 100;\n\nSELECT id, title, category,\n       round((content_halfvec &lt;=&gt; (SELECT content_halfvec FROM articles WHERE id = 1))::numeric, 4) AS distance\nFROM articles\nWHERE category = &#039;Science&#039;\nORDER BY content_halfvec &lt;=&gt; (SELECT content_halfvec FROM articles WHERE id = 1)\nLIMIT 10;\n\n  id   |  title   | category | distance\n-------+----------+----------+----------\n     1 | April    | Science  |   0.0000\n  7862 | April 23 | Science  |   0.2953\n  9878 | April 25 | Science  |   0.3076\n   469 | May      | Science  |   0.3082\n  9880 | April 24 | Science  |   0.3451\n   402 | July     | Science  |   0.3453\n  5156 | April 4  | Science  |   0.3531\n  9530 | April 7  | Science  |   0.3588\n 34906 | 2013     | Science  |   0.3674\n  9149 | April 8  | Science  |   0.3690\n<\/pre><\/div>\n\n\n<p>All 10 results are Science. All sorted by cosine distance. No over-fetching, no sequential scan. <em>(The titles are Wikipedia date articles &#8212; &#8220;April&#8221;, &#8220;May&#8221;, etc. 
&#8212; that happen to be classified under Science in this dataset.)<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-multi-filter-combinations\">Multi-Filter Combinations<\/h3>\n\n\n\n<p>Iterative scans work with compound WHERE clauses too:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSET hnsw.iterative_scan = &#039;relaxed_order&#039;;\nSET hnsw.ef_search = 100;\n\nSELECT id, title, category, word_count\nFROM articles\nWHERE category = &#039;Science&#039; AND word_count &gt; 1000\nORDER BY content_halfvec &lt;=&gt; (SELECT content_halfvec FROM articles WHERE id = 1)\nLIMIT 10;\n\nLimit  (actual time=7.098..7.105 rows=10.00 loops=1)\n  -&gt;  Sort  (actual time=7.097..7.101 rows=10.00 loops=1)\n        Sort Key: ((content_halfvec &lt;=&gt; (InitPlan 1).col1))\n        Sort Method: top-N heapsort  Memory: 25kB\n        -&gt;  Bitmap Heap Scan on articles  (actual time=0.690..7.027 rows=444.00 loops=1)\n              Recheck Cond: ((word_count &gt; 1000) AND (category = &#039;Science&#039;::text))\n              -&gt;  BitmapAnd  (actual time=0.641..0.643 rows=0.00 loops=1)\n                    -&gt;  Bitmap Index Scan on idx_articles_word_count  (rows=1806.00)\n                    -&gt;  Bitmap Index Scan on idx_articles_category    (rows=2232.00)\nExecution Time: 7.542 ms\n<\/pre><\/div>\n\n\n<p>Here the PostgreSQL optimizer did something clever: it combined two B-tree indexes (BitmapAnd) instead of using the HNSW iterative scan. On a 25K-row table that fits in memory, this is often cheaper. 
At scale (millions of rows), the iterative scan path wins because the bitmap approach requires reading too many heap pages.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Takeaway:<\/strong> Iterative scans are the answer to &#8220;vector search + WHERE doesn&#8217;t work.&#8221; Enable them with <code>SET LOCAL hnsw.iterative_scan = 'relaxed_order'<\/code> inside transactions. The safety valve <code>max_scan_tuples = 20000<\/code> is a sensible default.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-quantization-and-storage\">Quantization and Storage<\/h2>\n\n\n\n<p>With 3,072-dimension embeddings, storage is the elephant in the room. Here&#8217;s what each representation costs per row:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT\n  pg_size_pretty(avg(pg_column_size(content_vector)))  AS vector_size,\n  pg_size_pretty(avg(pg_column_size(content_halfvec))) AS halfvec_size,\n  pg_size_pretty(avg(pg_column_size(content_bq)))      AS binary_size\nFROM articles WHERE content_vector IS NOT NULL;\n\n vector_size | halfvec_size | binary_size\n-------------+--------------+-------------\n 12 kB       | 6148 bytes   | 392 bytes\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Type<\/th><th>Bytes per dimension<\/th><th>Per-row (3072d)<\/th><th>Savings<\/th><\/tr><\/thead><tbody><tr><td><code>vector(3072)<\/code><\/td><td>4 bytes (float32)<\/td><td>12,288 bytes<\/td><td>baseline<\/td><\/tr><tr><td><code>halfvec(3072)<\/code><\/td><td>2 bytes (float16)<\/td><td>6,144 bytes<\/td><td><strong>50%<\/strong><\/td><\/tr><tr><td><code>bit(3072)<\/code><\/td><td>1\/8 byte (1 bit)<\/td><td>384 bytes<\/td><td><strong>97%<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>At 1 million 
rows:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Type<\/th><th>Column storage<\/th><th>HNSW index<\/th><\/tr><\/thead><tbody><tr><td><code>vector(3072)<\/code><\/td><td>~12 GB<\/td><td>N\/A (2000-dim limit)<\/td><\/tr><tr><td><code>halfvec(3072)<\/code><\/td><td>~6 GB<\/td><td>~7.7 GB<\/td><\/tr><tr><td><code>bit(3072)<\/code><\/td><td>~0.4 GB<\/td><td>~0.8 GB (bit_hamming_ops)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The storage math is brutal for high-dimensional embeddings. This is why DiskANN&#8217;s built-in SBQ compression matters: it gets you to 21 MB where HNSW on halfvec costs 193 MB.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-binary-quantize-re-ranking\">Binary Quantize + Re-Ranking<\/h3>\n\n\n\n<p>Binary quantization crushes each dimension to a single bit (positive = 1, negative = 0). It&#8217;s lossy, but very fast for coarse filtering. The pattern is a two-phase search:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Phase 1: Hamming distance on binary (fast, coarse) \u2192 100 candidates\n-- Phase 2: Cosine distance on full vector (precise) \u2192 10 results\n\nWITH coarse AS (\n    SELECT id, title, content_vector\n    FROM articles\n    WHERE content_bq IS NOT NULL\n    ORDER BY content_bq &lt;~&gt; (\n        SELECT binary_quantize(content_vector)::bit(3072)\n        FROM articles WHERE id = 1\n    )\n    LIMIT 100\n)\nSELECT id, title,\n       content_vector &lt;=&gt; (SELECT content_vector FROM articles WHERE id = 1) AS distance\nFROM coarse\nORDER BY content_vector &lt;=&gt; (SELECT content_vector FROM articles WHERE id = 1)\nLIMIT 10;\n\nLimit  (actual time=29.021..29.026 rows=10.00 loops=1)\n  -&gt;  Sort on coarse  (actual time=29.020..29.023 rows=10.00 loops=1)\n        -&gt;  Subquery Scan  (actual time=27.805..28.999 rows=100.00 loops=1)\n              -&gt;  Sort by content_bq &lt;~&gt;  (actual 
time=27.744..27.754 rows=100.00 loops=1)\n                    -&gt;  Seq Scan on articles  (actual time=0.435..24.288 rows=24700.00 loops=1)\nExecution Time: 29.127 ms\n<\/pre><\/div>\n\n\n<p>29 ms without an index on <code>content_bq<\/code>. With an HNSW index using <code>bit_hamming_ops<\/code>, Phase 1 would be sub-millisecond. The re-ranking in Phase 2 only touches 100 full vectors instead of 25,000.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>DBA Takeaway:<\/strong> Use halfvec as your default for 3072-dimension embeddings. Use binary quantize + re-ranking when you need to search billions of rows and can tolerate a two-phase approach. DiskANN&#8217;s SBQ gives you similar compression automatically.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-operators-and-sargability\">Operators and Sargability<\/h2>\n\n\n\n<p>pgvector provides four distance operators:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Operator<\/th><th>Distance<\/th><th>Operator Class<\/th><th>Use When<\/th><\/tr><\/thead><tbody><tr><td><code>&lt;=&gt;<\/code><\/td><td>Cosine<\/td><td><code>vector_cosine_ops<\/code><\/td><td>Normalized embeddings (most common)<\/td><\/tr><tr><td><code>&lt;-&gt;<\/code><\/td><td>L2 (Euclidean)<\/td><td><code>vector_l2_ops<\/code><\/td><td>Absolute distance matters<\/td><\/tr><tr><td><code>&lt;#&gt;<\/code><\/td><td>Inner Product (negative)<\/td><td><code>vector_ip_ops<\/code><\/td><td>Pre-normalized, slight speed edge<\/td><\/tr><tr><td><code>&lt;+&gt;<\/code><\/td><td>L1 (Manhattan)<\/td><td><code>vector_l1_ops<\/code><\/td><td>Sparse-like behavior on dense vectors<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-wrong-operator-no-index\">Wrong Operator = No Index<\/h3>\n\n\n\n<p>This is the single most common mistake. 
If your index uses <code>halfvec_cosine_ops<\/code> but your query uses <code>&lt;-&gt;<\/code> (L2), the index <strong>cannot<\/strong> be used:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- CORRECT: cosine operator on cosine index \u2192 Index Scan\nEXPLAIN (COSTS OFF)\nSELECT ... ORDER BY content_halfvec &lt;=&gt; (...) LIMIT 10;\n\n Limit\n   -&gt;  Index Scan using idx_content_hnsw_halfvec on articles\n         Order By: (content_halfvec &lt;=&gt; (InitPlan 1).col1)\n\n-- WRONG: L2 operator on cosine index \u2192 Seq Scan!\nEXPLAIN (COSTS OFF)\nSELECT ... ORDER BY content_halfvec &lt;-&gt; (...) LIMIT 10;\n\n Limit\n   -&gt;  Sort\n         Sort Key: ((articles.content_halfvec &lt;-&gt; (InitPlan 1).col1))\n         -&gt;  Seq Scan on articles\n<\/pre><\/div>\n\n\n<p>The planner can&#8217;t use a cosine-distance index for an L2-distance query. They&#8217;re different metrics with different orderings. If you see an unexpected Seq Scan on a vector query, <strong>check your operator first<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-sargable-queries-the-cross-join-trap\">Sargable Queries: The Cross-Join Trap<\/h3>\n\n\n\n<p>This is the pattern I see most often in the wild, and it&#8217;s wrong:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- BAD: cross-join prevents index use\nSELECT a.id, a.title,\n       a.content_halfvec &lt;=&gt; b.content_halfvec AS distance\nFROM articles a, articles b\nWHERE b.id = 1\nORDER BY a.content_halfvec &lt;=&gt; b.content_halfvec\nLIMIT 10;\n\nLimit\n  -&gt;  Sort\n        Sort Key: ((a.content_halfvec &lt;=&gt; b.content_halfvec))\n        -&gt;  Nested Loop\n              -&gt;  Index Scan using articles_pkey on articles b\n                    Index Cond: (id = 1)\n              -&gt;  Seq Scan on articles a     \u2190 NO INDEX!\n<\/pre><\/div>\n\n\n<p>The planner 
sees <code>a.content_halfvec &lt;=&gt; b.content_halfvec<\/code> as a <strong>join condition<\/strong>, not an index-scan ordering. It can&#8217;t push the ORDER BY into the vector index because the right-hand side comes from a different table reference.<\/p>\n\n\n\n<p>The fix: use a <strong>scalar subquery<\/strong>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- GOOD: scalar subquery \u2192 index used\nSELECT id, title,\n       content_halfvec &lt;=&gt; (\n           SELECT content_halfvec FROM articles WHERE id = 1\n       ) AS distance\nFROM articles\nORDER BY content_halfvec &lt;=&gt; (\n    SELECT content_halfvec FROM articles WHERE id = 1\n)\nLIMIT 10;\n\nLimit\n  InitPlan 1\n    -&gt;  Index Scan using articles_pkey on articles articles_1\n          Index Cond: (id = 1)\n  -&gt;  Index Scan using idx_content_hnsw_halfvec on articles\n        Order By: (content_halfvec &lt;=&gt; (InitPlan 1).col1)\n<\/pre><\/div>\n\n\n<p>The scalar subquery is evaluated once (InitPlan), then the result is treated as a constant. Now the planner can push the ORDER BY into the HNSW index scan. Same results, but with the index instead of a sequential scan.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Takeaway:<\/strong> For this nearest-neighbor ORDER BY pattern, always use scalar subqueries for vector distance calculations, never cross-joins. 
This is the vector-search equivalent of writing sargable WHERE clauses for B-tree indexes.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-partial-indexes\">Partial Indexes<\/h3>\n\n\n\n<p>If you frequently filter by a specific category, a partial index is dramatically more efficient:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE INDEX idx_content_hnsw_science\nON articles USING hnsw (content_halfvec halfvec_cosine_ops)\nWITH (m = 16, ef_construction = 64)\nWHERE category = &#039;Science&#039;;\n\nTime: 1420.207 ms (00:01.420)\n\n        indexname         |  size\n--------------------------+--------\n idx_content_hnsw_halfvec | 193 MB   \u2190 full table (25,000 rows)\n idx_content_hnsw_science |  17 MB   \u2190 Science only (2,232 rows)\n<\/pre><\/div>\n\n\n<p><strong>11x smaller, 20x faster to build.<\/strong> If your application queries consistently filter by a known set of categories, partial indexes are the single biggest optimization available. Build one per high-value category.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-monitoring\">Monitoring<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-index-inventory\">Index Inventory<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT indexname,\n       pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,\n       indexdef\nFROM pg_indexes\nWHERE tablename = &#039;articles&#039;\nORDER BY pg_relation_size(indexname::regclass) DESC;\n\n        indexname        |  size   | indexdef\n-------------------------+---------+-------------------------------------------\n idx_content_hnsw_halfvec| 193 MB  | USING hnsw ... WITH (m=&#039;16&#039;, ef_construction=&#039;64&#039;)\n idx_content_ivfflat     | 193 MB  | USING ivfflat ... 
WITH (lists=&#039;25&#039;)\n idx_content_diskann     | 21 MB   | USING diskann ... WITH (storage_layout=memory_optimized)\n articles_pkey           | 1384 kB | USING btree (id)\n idx_articles_category   | 1192 kB | USING btree (category)\n idx_articles_word_count | 904 kB  | USING btree (word_count)\n<\/pre><\/div>\n\n\n<p>Adding up the three vector indexes (193 + 193 + 21 = 407 MB) plus B-tree indexes (3 MB), the total index footprint is over <strong>410 MB<\/strong> for a <strong>90 MB<\/strong> table. The indexes are ~4.5x the data. This is typical for high-dimensional vector data &#8212; plan your storage accordingly.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Note:<\/strong> In practice you&#8217;d pick one vector index, not all three. With just HNSW + B-tree indexes, the ratio drops to ~2x.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-settings\">Settings<\/h3>\n\n\n\n<p>Know what settings exist and what they default to:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT name, setting, short_desc\nFROM pg_settings\nWHERE name ~ &#039;^(hnsw|ivfflat|diskann)\\.&#039;\nORDER BY name;\n\n                    name                    | setting | short_desc\n--------------------------------------------+---------+-------------------------------------------\n hnsw.ef_search                             | 40      | Dynamic candidate list size for search\n hnsw.iterative_scan                        | off     | Mode for iterative scans\n hnsw.max_scan_tuples                       | 20000   | Max tuples to visit for iterative scans\n hnsw.scan_mem_multiplier                   | 1       | Multiple of work_mem for iterative scans\n ivfflat.iterative_scan                     | off     | Mode for iterative scans\n ivfflat.max_probes                         | 32768   | Max probes for iterative scans\n ivfflat.probes      
                       | 1       | Number of probes\n diskann.query_search_list_size             | 100     | Search list size for queries\n diskann.query_rescore                      | 50      | Rescore candidates\n<\/pre><\/div>\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Takeaway:<\/strong> <code>hnsw.iterative_scan<\/code> and <code>ivfflat.iterative_scan<\/code> default to <code>off<\/code>. If your application relies on vector search with WHERE clauses, you need to explicitly enable iterative scans.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-access-method-capabilities\">Access Method Capabilities<\/h3>\n\n\n\n<p>Not sure which operator class works with which index type? Query the catalog:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT am.amname AS access_method,\n       opc.opcname AS operator_class,\n       t.typname AS data_type\nFROM pg_opclass opc\nJOIN pg_am am ON am.oid = opc.opcmethod\nJOIN pg_type t ON t.oid = opc.opcintype\nWHERE am.amname IN (&#039;hnsw&#039;, &#039;ivfflat&#039;, &#039;diskann&#039;)\nORDER BY am.amname, opc.opcname;\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Access Method<\/th><th>Data Types<\/th><th>Operator Classes<\/th><\/tr><\/thead><tbody><tr><td><strong>HNSW<\/strong><\/td><td>vector, halfvec, bit, sparsevec<\/td><td>18 classes (broadest support)<\/td><\/tr><tr><td><strong>IVFFlat<\/strong><\/td><td>vector, halfvec, bit<\/td><td>7 classes<\/td><\/tr><tr><td><strong>DiskANN<\/strong><\/td><td>vector only (+ label filtering)<\/td><td>4 classes<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>HNSW is the most versatile. DiskANN is the most constrained. 
IVFFlat falls in between.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-build-progress-monitoring\">Build Progress Monitoring<\/h3>\n\n\n\n<p>While building a large index:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT phase,\n       round(100.0 * blocks_done \/ nullif(blocks_total, 0), 1) AS pct_done,\n       tuples_done, tuples_total\nFROM pg_stat_progress_create_index;\n<\/pre><\/div>\n\n\n<p>This works for HNSW and IVFFlat builds (pgvector reports progress). DiskANN builds from pgvectorscale don&#8217;t currently report to this view.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-decision-guidelines\">Decision Guidelines<\/h2>\n\n\n\n<p>Here&#8217;s how to choose:<\/p>\n\n\n\n<p><strong>Start with HNSW<\/strong> unless you have a reason not to. It has the broadest operator support, good recall, and predictable performance. Use halfvec if your dimensions exceed 2,000.<\/p>\n\n\n\n<p><strong>Choose IVFFlat<\/strong> if build time matters more than query time. IVFFlat builds 5-6x faster than HNSW. If your data distribution shifts materially, plan a <code>REINDEX<\/code> to refresh clustering quality (centroid drift). A practical signal: if recall drops without any change in query patterns, or if you&#8217;ve inserted more than ~30% new rows since the last build, the centroids are likely stale. For a deeper look at how embedding model upgrades trigger reindexing at scale, see <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/\">Embedding Versioning with pgvector<\/a>.<\/p>\n\n\n\n<p><strong>Choose DiskANN<\/strong> if storage is the constraint. The 9x compression is decisive at scale. 
It handles high dimensions natively (no halfvec needed) and supports <code>CONCURRENTLY<\/code> for production deployments.<\/p>\n\n\n\n<p><strong>Enable iterative scans<\/strong> for vector search + filtering in production. The trade-off is potentially higher latency on very selective filters (the index does more work to find matching rows), but that&#8217;s usually the right trade-off for correctness. Tune <code>max_scan_tuples<\/code> \/ <code>max_probes<\/code> to bound worst-case work. Use <code>relaxed_order<\/code> by default, <code>strict_order<\/code> when ranking precision matters.<\/p>\n\n\n\n<p><strong>Use partial indexes<\/strong> for category-specific searches. An 11x size reduction and 20x faster build is hard to argue with.<\/p>\n\n\n\n<p><strong>Use SET LOCAL<\/strong> for all vector parameter changes in production AFTER having tested them.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-summary\">Summary<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Feature<\/th><th>HNSW<\/th><th>IVFFlat<\/th><th>DiskANN (pgvectorscale)<\/th><th>DiskANN (pg_diskann)<\/th><\/tr><\/thead><tbody><tr><td><strong>Provider<\/strong><\/td><td>pgvector<\/td><td>pgvector<\/td><td>Timescale (open source)<\/td><td>Microsoft (Azure DB for PG)<\/td><\/tr><tr><td><strong>Max dims (vector)<\/strong><\/td><td>2,000<\/td><td>2,000<\/td><td>16,000<\/td><td>16,000<\/td><\/tr><tr><td><strong>Max dims (halfvec)<\/strong><\/td><td>4,000<\/td><td>4,000<\/td><td>N\/A<\/td><td>N\/A<\/td><\/tr><tr><td><strong>Compression<\/strong><\/td><td>Via halfvec<\/td><td>Via halfvec<\/td><td>SBQ (1-2 bits\/dim)<\/td><td>PQ (4 bits\/chunk)<\/td><\/tr><tr><td><strong>Build time (25K, 3072d)<\/strong><\/td><td>29s<\/td><td>5s<\/td><td>49s<\/td><td>N\/A<\/td><\/tr><tr><td><strong>Index size<\/strong><\/td><td>193 MB<\/td><td>193 MB<\/td><td><strong>21 
MB<\/strong><\/td><td>Similar<\/td><\/tr><tr><td><strong>Query time<\/strong><\/td><td>2-6 ms<\/td><td>2-10 ms<\/td><td>3 ms<\/td><td>~3 ms<\/td><\/tr><tr><td><strong>Key tuning param<\/strong><\/td><td>ef_search<\/td><td>probes<\/td><td>query_search_list_size<\/td><td>search list \/ PQ params<\/td><\/tr><tr><td><strong>Iterative scan<\/strong><\/td><td>Yes<\/td><td>Yes<\/td><td>No<\/td><td>Yes (ON by default)<\/td><\/tr><tr><td><strong>CONCURRENTLY<\/strong><\/td><td>Yes<\/td><td>Yes<\/td><td>Yes (0.9.0+)<\/td><td>Yes<\/td><\/tr><tr><td><strong>Data types<\/strong><\/td><td>vector, halfvec, bit, sparsevec<\/td><td>vector, halfvec, bit<\/td><td>vector only<\/td><td>vector only<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p id=\"h-dimension-limits-are-largely-explained-by-postgresql-s-8-kb-page-size-and-encoding-density-exact-cutoffs-are-implementation-defined\">Dimension limits are largely explained by PostgreSQL&#8217;s 8 KB page size and encoding density (exact cutoffs are implementation-defined):<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Encoding<\/th><th>Bits\/dim<\/th><th>Theoretical max<\/th><th>Actual limit<\/th><th>Who uses it<\/th><\/tr><\/thead><tbody><tr><td>float32 (<code>vector<\/code>)<\/td><td>32<\/td><td>2,048<\/td><td>2,000<\/td><td>HNSW, IVFFlat<\/td><\/tr><tr><td>float16 (<code>halfvec<\/code>)<\/td><td>16<\/td><td>4,096<\/td><td>4,000<\/td><td>HNSW, IVFFlat<\/td><\/tr><tr><td>PQ symbol<\/td><td>4<\/td><td>16,384<\/td><td>16,000<\/td><td>pg_diskann<\/td><\/tr><tr><td>SBQ binary<\/td><td>1<\/td><td>65,536<\/td><td>16,000<\/td><td>pgvectorscale<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The compression story is ultimately a story about <strong>how many bits of information you need per dimension to navigate the index<\/strong>. pgvector stores full-precision values; DiskANN stores just enough to find the right neighborhood, then goes back to the heap for exact distances. 
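<\/p>\n\n\n\n<p>That two-phase pattern (compressed scan, then full-precision rescore) can be sketched in a few lines of Python. This is a toy model with SBQ-style 1-bit codes and random data, not the pgvectorscale implementation:<\/p>

```python
import math
import random

random.seed(1)
DIMS, N = 64, 2000
data = [[random.gauss(0, 1) for _ in range(DIMS)] for _ in range(N)]

# Build phase: SBQ-style 1-bit code per dimension ("above or below the mean?").
means = [sum(v[d] for v in data) / N for d in range(DIMS)]

def sbq(v):
    bits = 0
    for d in range(DIMS):
        bits = (bits << 1) | (v[d] > means[d])
    return bits

codes = [sbq(v) for v in data]

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(query, k=10, rescore=50):
    qbits = sbq(query)
    # Phase 1: coarse ranking on compressed codes via XOR + popcount (Hamming).
    coarse = sorted(range(N), key=lambda i: bin(codes[i] ^ qbits).count("1"))
    # Phase 2: fetch the top `rescore` full-precision vectors, re-rank exactly.
    return sorted(coarse[:rescore], key=lambda i: l2(data[i], query))[:k]

top = search(data[123])
print(top[0])   # the stored vector identical to the query ranks first
```

<p>The compressed pass only narrows the field; the exact distances computed in phase 2 are what the client sees, which is why heavy compression costs so little recall.<\/p>\n\n\n\n<p>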
<br>But here&#8217;s the thing: exact results are not necessarily what you want. In a RAG pipeline, the retrieved documents are context for a language model that will synthesize and rephrase an answer \u2014 not return rows verbatim. Whether your top-10 results are ranked by exact cosine distance or by a 96%-accurate approximation rarely changes the generated answer. The same is true for recommendations, semantic deduplication, and most classification workflows: the downstream consumer is tolerant of small ranking variations.<\/p>\n\n\n\n<p>The real production trade-off is not precision vs approximation \u2014 it&#8217;s <strong>the balance between retrieval speed, resource efficiency, and result quality at your scale<\/strong>. An HNSW index that doesn&#8217;t fit in <code>shared_buffers<\/code> and hits disk on every query will give you worse <em>effective<\/em> results than a DiskANN index that stays cached and returns slightly less precise distances in a fraction of the time. The best retrieval is the one that actually runs within your latency budget.<\/p>\n\n\n\n<p>The lab, with all the SQL scripts and the Python embedding pipeline, is available here: <a href=\"https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\/tree\/main\/lab\/06_pgvector_indexes\"><code>lab\/06_pgvector_indexes<\/code><\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p><em>All benchmarks: PostgreSQL 18, pgvector 0.8.1, pgvectorscale 0.9.0, 25K Wikipedia articles, 3072d text-embedding-3-large embeddings (4 vCPUs, 8 GB RAM).<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In Part 1 of this series, we covered what pgvector is, how embeddings work, and how to store them in PostgreSQL. We ended with a working similarity search &#8212; but on a sequential scan. That works fine for a demo table with 1,000 rows. It does not work for production. 
This post is about [&hellip;]<\/p>\n","protected":false},"author":153,"featured_media":37679,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[83],"tags":[3685,3523,2602,3678],"type_dbi":[],"class_list":["post-43256","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postgresql","tag-ai-llm","tag-pgvector","tag-postgresql-2","tag-rag"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>pgvector, a guide for DBA - Part 2: Indexes (update march 2026) - dbi Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"pgvector, a guide for DBA - Part 2: Indexes (update march 2026)\" \/>\n<meta property=\"og:description\" content=\"Introduction In Part 1 of this series, we covered what pgvector is, how embeddings work, and how to store them in PostgreSQL. We ended with a working similarity search &#8212; but on a sequential scan. That works fine for a demo table with 1,000 rows. It does not work for production. 
This post is about [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-01T19:09:15+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-01T21:35:59+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Adrien Obernesser\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Adrien Obernesser\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"18 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/\"},\"author\":{\"name\":\"Adrien Obernesser\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"headline\":\"pgvector, a guide for DBA &#8211; Part 2: Indexes (update march 2026)\",\"datePublished\":\"2026-03-01T19:09:15+00:00\",\"dateModified\":\"2026-03-01T21:35:59+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/\"},\"wordCount\":4401,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\",\"keywords\":[\"AI\/LLM\",\"pgvector\",\"postgresql\",\"RAG\"],\"articleSection\":[\"PostgreSQL\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/\",\"name\":\"pgvector, a guide for DBA - Part 2: Indexes (update march 2026) - dbi 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\",\"datePublished\":\"2026-03-01T19:09:15+00:00\",\"dateModified\":\"2026-03-01T21:35:59+00:00\",\"author\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#primaryimage\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\",\"contentUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\",\"width\":1024,\"height\":1024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.dbi-services.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"pgvector, a guide for DBA &#8211; Part 2: Indexes (update march 
2026)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/\",\"name\":\"dbi Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\",\"name\":\"Adrien Obernesser\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"caption\":\"Adrien Obernesser\"},\"url\":\"https:\/\/www.dbi-services.com\/blog\/author\/adrienobernesser\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"pgvector, a guide for DBA - Part 2: Indexes (update march 2026) - dbi Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/","og_locale":"en_US","og_type":"article","og_title":"pgvector, a guide for DBA - Part 2: Indexes (update march 2026)","og_description":"Introduction In Part 1 of this series, we covered what pgvector is, how embeddings work, and how to store them in PostgreSQL. We ended with a working similarity search &#8212; but on a sequential scan. 
That works fine for a demo table with 1,000 rows. It does not work for production. This post is about [&hellip;]","og_url":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/","og_site_name":"dbi Blog","article_published_time":"2026-03-01T19:09:15+00:00","article_modified_time":"2026-03-01T21:35:59+00:00","og_image":[{"width":1024,"height":1024,"url":"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","type":"image\/png"}],"author":"Adrien Obernesser","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Adrien Obernesser","Est. reading time":"18 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/"},"author":{"name":"Adrien Obernesser","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"headline":"pgvector, a guide for DBA &#8211; Part 2: Indexes (update march 
2026)","datePublished":"2026-03-01T19:09:15+00:00","dateModified":"2026-03-01T21:35:59+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/"},"wordCount":4401,"commentCount":0,"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","keywords":["AI\/LLM","pgvector","postgresql","RAG"],"articleSection":["PostgreSQL"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/","url":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/","name":"pgvector, a guide for DBA - Part 2: Indexes (update march 2026) - dbi 
Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#primaryimage"},"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","datePublished":"2026-03-01T19:09:15+00:00","dateModified":"2026-03-01T21:35:59+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#primaryimage","url":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","contentUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","width":1024,"height":1024},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"pgvector, a guide for DBA &#8211; Part 2: Indexes (update march 
2026)"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd","name":"Adrien Obernesser","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","caption":"Adrien 
Obernesser"},"url":"https:\/\/www.dbi-services.com\/blog\/author\/adrienobernesser\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/43256","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/153"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=43256"}],"version-history":[{"count":23,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/43256\/revisions"}],"predecessor-version":[{"id":43279,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/43256\/revisions\/43279"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media\/37679"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=43256"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?post=43256"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=43256"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=43256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}