Introduction
In Part 1 of this series, we covered what pgvector is, how embeddings work, and how to store them in PostgreSQL. We ended with a working similarity search — but on a sequential scan. That works fine for a demo table with 1,000 rows. It does not work for production.
This post is about what comes next: indexes. Specifically, the three index families in the pgvector ecosystem as of February 2026 (HNSW, IVFFlat, and DiskANN, the last with two implementations targeting different deployment models): what they’re good at, where they break, and the patterns you need, whether you’re the DBA tuning them or the developer trying to understand the strengths of PostgreSQL as a vector store.
Everything in this post was tested on a public dataset: 25,000 Wikipedia articles embedded with OpenAI’s text-embedding-3-large at 3,072 dimensions, the maximum the model supports. The high dimension count is a deliberate choice, made to highlight some limitations for pedagogical reasons. You would be fine running and testing with lower dimensions or other embedding models; how to evaluate embedding models against your own data sets belongs to the RAG series, and I will probably cover it in a future post.
The environment is PostgreSQL 18 with pgvector 0.8.1 and pgvectorscale 0.9.0.
All the SQL scripts, Python code, and Docker configuration are in the companion lab: lab/06_pgvector_indexes.
The Index Types
Before we dive in, here’s the landscape. pgvector ships with two built-in index types (HNSW and IVFFlat), and two DiskANN implementations are available from different vendors:
| | HNSW | IVFFlat | DiskANN (pgvectorscale) | DiskANN (pg_diskann) |
|---|---|---|---|---|
| Provider | pgvector | pgvector | Timescale | Microsoft |
| Availability | Built-in | Built-in | Open source, self-hosted | Azure DB for PostgreSQL |
| Algorithm | Multi-layer graph | Voronoi cell partitioning | Vamana graph + SBQ | Vamana graph + PQ |
| Best for | General purpose | Fast build | Storage-constrained | Azure + high recall |
| Build time (25K, 3072d) | 29s | 5s | 49s | N/A (Azure) |
| Index size | 193 MB | 193 MB | 21 MB | Similar |
| Query time | 2-6 ms | 2-10 ms | 3 ms | ~3 ms |
That pgvectorscale number is not a typo: 21 MB vs 193 MB for the same data.
Note: This post uses pgvectorscale for all DiskANN benchmarks since it’s the open-source, self-hosted option. We’ll compare both DiskANN implementations in detail later in the post. pg_diskann is available only on Azure Database for PostgreSQL Flexible Server, the managed instance service from Microsoft.
HNSW: The Default Choice
HNSW (Hierarchical Navigable Small World) is the most commonly recommended index type for vector search. It builds a multi-layered graph where each node connects to its nearest neighbors, and search navigates this graph from top to bottom.
The 2,000-Dimension Wall
Here’s the first thing you’ll hit with modern embedding models:
CREATE INDEX idx_content_hnsw
ON articles USING hnsw (content_vector vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
ERROR: column cannot have more than 2000 dimensions for hnsw index
The vector type in pgvector has a 2,000-dimension limit for HNSW indexes. If you’re using text-embedding-3-large (3,072 dimensions), or voyage-3-large at its 2,048-dimension setting, this is a blocker.
The workaround: halfvec.
-- Step 1: Store a half-precision copy
ALTER TABLE articles ADD COLUMN content_halfvec halfvec(3072);
UPDATE articles SET content_halfvec = content_vector::halfvec;
-- Step 2: Index the halfvec column (limit: 4,000 dimensions)
CREATE INDEX idx_content_hnsw_halfvec
ON articles USING hnsw (content_halfvec halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64);
Time: 28974.392 ms (00:28.974)
SELECT pg_size_pretty(pg_relation_size('idx_content_hnsw_halfvec')) AS hnsw_size;
hnsw_size
-----------
193 MB
29 seconds to build, 193 MB for 25,000 articles at 3,072 dimensions. That’s roughly 8 KB per row in the index alone.
Important: halfvec stores each dimension in 2 bytes instead of 4. You lose some floating-point precision, but in practice the recall difference is negligible for similarity search. The storage savings are real: your halfvec column is about 6 KB per row vs 12 KB for the full vector.
Alternative: Instead of a separate column, you can create an expression index that casts on the fly:
CREATE INDEX ... ON articles USING hnsw ((content_vector::halfvec(3072)) halfvec_cosine_ops);
The trade-off is that your queries must use the matching expression (content_vector::halfvec(3072) <=> ...) for the planner to pick it up, which is harder to read in application code. The separate column approach gives cleaner queries.
Tuning ef_search: The Recall vs Speed Dial
ef_search controls how many candidates the HNSW algorithm considers during search. Higher values mean more candidates examined, better recall, but more work. The default is 40.
-- ef_search = 40 (default)
SET hnsw.ef_search = 40;
EXPLAIN ANALYZE
SELECT id, title, content_halfvec <=> (
SELECT content_halfvec FROM articles WHERE id = 1
) AS distance
FROM articles
ORDER BY content_halfvec <=> (
SELECT content_halfvec FROM articles WHERE id = 1
)
LIMIT 10;
Index Scan using idx_content_hnsw_halfvec on articles
Order By: (content_halfvec <=> (InitPlan 1).col1)
Index Searches: 1
Buffers: shared hit=551
Execution Time: 6.004 ms
-- ef_search = 100
SET hnsw.ef_search = 100;
Index Scan using idx_content_hnsw_halfvec on articles
Buffers: shared hit=716
Execution Time: 2.365 ms
-- ef_search = 200
SET hnsw.ef_search = 200;
Index Scan using idx_content_hnsw_halfvec on articles
Buffers: shared hit=883
Execution Time: 2.542 ms
Wait — ef_search=100 was faster than ef_search=40? Not really. Those numbers came from a warm cache (shared hit=551, zero disk reads). The apparent speedup is a cache warming effect, not a property of the algorithm. To prove this, I restarted PostgreSQL and ran the full sweep from cold:
| ef_search | Execution Time | Buffers | read (disk) |
|---|---|---|---|
| 40 (cold) | 91 ms | hit=189, read=362 | 362 pages from disk |
| 100 | 33 ms | hit=616, read=132 | fewer cold pages |
| 200 | 22 ms | hit=850, read=65 | even fewer |
| 40 (warm) | 0.8 ms | hit=551, read=0 | all cached |
The second run at ef_search=40 clocked 0.8 ms — faster than any ef_search=100 or 200 run. On a warm cache, all three (40/100/200) land in the 0.8-5 ms range. The variance is cache state, not algorithmic shortcuts. The real cost cliff is at ef_search=400 where the optimizer switches plans entirely:
-- ef_search = 400
SET hnsw.ef_search = 400;
Sort (actual time=364.693..364.695 rows=10.00 loops=1)
Sort Key: ((articles.content_halfvec <=> (InitPlan 1).col1))
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on articles (actual time=0.143..356.482 rows=24700.00 loops=1)
Filter: (content_halfvec IS NOT NULL)
Rows Removed by Filter: 300
Execution Time: 364.724 ms
The planner chose a Seq Scan + Sort path. At ef_search=400, PostgreSQL’s cost model estimated the index scan would be more expensive than reading every row sequentially, and the query went from 2.5 ms to 365 ms.
This is your optimizer flip-flop. It’s not a bug — it’s the planner doing its job. But it means you need to be aware of the threshold.
| ef_search | Execution Time | Plan | Buffers |
|---|---|---|---|
| 40 | 6.0 ms | Index Scan | 551 |
| 100 | 2.4 ms | Index Scan | 716 |
| 200 | 2.5 ms | Index Scan | 883 |
| 400 | 365 ms | Seq Scan | 209,192 |
DBA Takeaway: For HNSW on halfvec(3072), stay in the ef_search 40-200 range. Past that, you’re fighting the optimizer. If you need ef_search > 200 for recall, you probably need a bigger m parameter at build time.
Build Parameters: m and ef_construction
m is the number of connections per node in the graph. ef_construction is the candidate list size during build. Higher values = better graph quality but slower builds and (potentially) larger indexes.
-- m=16, ef_construction=64 (default-ish)
Time: 28,974 ms Size: 193 MB
-- m=32, ef_construction=128
Time: 54,077 ms Size: 193 MB
At 25K rows, doubling m doubled the build time but didn’t change the index size. The size effect becomes more visible at larger scales. In general: start with m=16, ef_construction=64 and only increase if recall is insufficient after tuning ef_search.
IVFFlat: Fast Build
IVFFlat (Inverted File with Flat quantization) partitions the vector space into Voronoi cells using k-means clustering, then searches only the cells closest to the query vector.
Same Dimension Limit
IVFFlat has the same 2,000-dimension limit as HNSW for the vector type. Same workaround:
CREATE INDEX idx_content_ivfflat
ON articles USING ivfflat (content_halfvec halfvec_cosine_ops)
WITH (lists = 25);
Time: 5008.765 ms (00:05.009)
SELECT pg_size_pretty(pg_relation_size('idx_content_ivfflat'));
-- 193 MB
5 seconds to build vs 29 for HNSW: a 5.8x faster build for an index of essentially the same size (193 MB in both cases).
The lists parameter controls the number of Voronoi cells. The pgvector documentation recommends: rows / 1000 for tables up to 1M rows, sqrt(rows) for larger tables. For 25,000 articles: 25000 / 1000 = 25 lists, giving roughly 1,000 rows per cell. A common mistake is applying sqrt(rows) to small tables — that gives 158 lists here, creating cells with only ~167 rows each, which fragments the index and causes the optimizer to flip to sequential scan at surprisingly low probe counts.
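The rule of thumb is easy to encode. Here is a small helper (a sketch of the pgvector docs’ guidance; the function name is my own):

```python
import math

def recommended_lists(rows: int) -> int:
    """pgvector docs' rule of thumb for the IVFFlat lists parameter:
    rows / 1000 for tables up to 1M rows, sqrt(rows) beyond that."""
    if rows <= 1_000_000:
        return max(1, rows // 1000)
    return round(math.sqrt(rows))

print(recommended_lists(25_000))   # 25  (the value used in this post)
print(round(math.sqrt(25_000)))    # 158 (the common mistake: sqrt on a small table)
```

With 25 lists you get ~1,000 rows per cell; the sqrt shortcut applied too early gives 158 lists and ~167-row cells, the fragmentation described above.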
Tuning Probes: How Many Cells to Search
probes controls how many Voronoi cells are searched at query time. Default is 1 — fast but low recall. A good starting point is sqrt(lists). Here’s the sweep with lists=25:
SET ivfflat.probes = 1;
-- Index Scan, Execution Time: 1.0 ms, Buffers: 571
SET ivfflat.probes = 2;
-- Index Scan, Execution Time: 3.7 ms, Buffers: 1,944
SET ivfflat.probes = 3;
-- Index Scan, Execution Time: 4.4 ms, Buffers: 2,793
SET ivfflat.probes = 4;
-- Index Scan, Execution Time: 5.9 ms, Buffers: 3,937
And then:
SET ivfflat.probes = 5;
Sort (actual time=152.548..152.549 rows=10.00 loops=1)
Sort Key: ((articles.content_halfvec <=> (InitPlan 1).col1))
-> Seq Scan on articles (actual time=0.144..148.844 rows=24700.00 loops=1)
Filter: (content_halfvec IS NOT NULL)
Execution Time: 152.584 ms
Same story as HNSW. At probes=5 (searching 5 of 25 cells = 20%), the optimizer decided a sequential scan was cheaper. The query went from 6 ms to 153 ms.
| probes | Execution Time | Plan |
|---|---|---|
| 1 | 1.0 ms | Index Scan |
| 2 | 3.7 ms | Index Scan |
| 3 | 4.4 ms | Index Scan |
| 4 | 5.9 ms | Index Scan |
| 5 | 153 ms | Seq Scan |
Takeaway: For IVFFlat with 25 lists, the optimizer flips between probes=4 and probes=5. With sqrt(25) = 5, you’re right at the tipping point. Use SET LOCAL ivfflat.probes = 3 or 4 for a good recall/speed balance. On larger tables (100K+ rows), the flip happens at much higher probe counts because sequential scans become proportionally more expensive.
SET LOCAL: The Production Pattern
Never set ivfflat.probes at the session level in production. Use SET LOCAL inside a transaction:
BEGIN;
SET LOCAL ivfflat.probes = 3;
SELECT id, title, content_halfvec <=> (
SELECT content_halfvec FROM articles WHERE id = 1
) AS distance
FROM articles
ORDER BY content_halfvec <=> (
SELECT content_halfvec FROM articles WHERE id = 1
)
LIMIT 10;
COMMIT;
-- Verify: probes reverted to default
SHOW ivfflat.probes; -- 1
The setting reverts automatically after COMMIT/ROLLBACK. No global state leakage. Do the same for hnsw.ef_search.
DO NOT PLAY AROUND WITH SESSION PARAMETERS IN PRODUCTION. Use SET LOCAL inside a transaction, or set them at the function/procedure level. Session-level changes persist until disconnect and can affect every query on that connection, including connection pooler reuse. Hinting is not a strategy.
DiskANN: Leveraging the B-tree Principle
DiskANN is provided by the pgvectorscale project from Timescale (SQL extension name: vectorscale). It implements the DiskANN algorithm with Statistical Binary Quantization (SBQ) compression built in.
No 2,000-Dimension Wall
Unlike HNSW and IVFFlat, DiskANN supports the vector type natively up to 16,000 dimensions — no halfvec workaround needed:
CREATE INDEX idx_content_diskann ON articles USING diskann (content_vector)
WITH (storage_layout = memory_optimized);
NOTICE: Starting index build with num_neighbors=50, search_list_size=100,
max_alpha=1.2, storage_layout=SbqCompression. -- memory_optimized maps to SBQ
NOTICE: Indexed 24700 tuples
Time: 49140.736 ms (00:49.141)
SELECT pg_size_pretty(pg_relation_size('idx_content_diskann'));
-- 21 MB
49 seconds to build, but 21 MB. Compare:
indexname | size | size_bytes
--------------------------+--------+------------
idx_content_diskann | 21 MB | 22,511,616
idx_content_hnsw_halfvec | 193 MB | 202,350,592
idx_content_ivfflat | 193 MB | 202,522,624
DiskANN is 9x smaller than HNSW and IVFFlat on the same data. SBQ compression is the default — storage_layout = memory_optimized is what you get if you don’t specify a layout. Specifying it explicitly (as in the CREATE INDEX above) is good practice for readability. The alternative plain layout stores full vectors in the index and does not compress.
Query performance is competitive:
Index Scan using idx_content_diskann on articles
Order By: (content_vector <=> (InitPlan 1).col1)
Buffers: shared hit=1437 read=132
Execution Time: 2.915 ms
3 ms for a 3,072-dimension nearest neighbor search on 25K articles. On the same data, HNSW does it in 2-6 ms and IVFFlat in 2-10 ms. All three are in the same ballpark for query speed.
Takeaway: DiskANN is the right choice when your HNSW index outgrows shared_buffers. At 193 MB for 25K rows, HNSW on halfvec(3072) would reach ~77 GB at 10 million rows, well beyond what most buffer pools can keep hot. DiskANN’s 9x compression keeps the navigational structure cached while full vectors stay in the heap, fetched only for the final rescore of a few dozen candidates. Same access pattern as a B-tree: compact index in memory, selective heap lookups on demand. The trade-off is longer build times and fewer operator class options (vector type only, no halfvec/bit/sparsevec).
CREATE INDEX CONCURRENTLY
As of pgvectorscale 0.9.0, DiskANN supports CONCURRENTLY:
CREATE INDEX CONCURRENTLY idx_content_diskann
ON articles USING diskann (content_vector)
WITH (storage_layout = memory_optimized);
This is critical for production — you can build the index without locking the table. HNSW and IVFFlat also support CREATE INDEX CONCURRENTLY.
Why These Dimension Limits Exist: Buffer Pages and Bits
If the 2,000 / 4,000 / 16,000 dimension limits seem arbitrary, they’re not. The intuition comes from how PostgreSQL stores data: in 8 KB buffer pages (8,192 bytes). Every index tuple — including the vector representation in a vector index — has to fit within a page. The fewer bytes per dimension, the more dimensions you can pack.
Here’s the back-of-the-envelope arithmetic:
| Encoding | Bytes per dimension | Theoretical max in 8 KB |
|---|---|---|
| vector (float32) | 4 bytes | 8,192 / 4 = 2,048 |
| halfvec (float16) | 2 bytes | 8,192 / 2 = 4,096 |
| 4-bit quantized (PQ) | 0.5 bytes | 8,192 * 2 = 16,384 |
| 1-bit binary (SBQ) | 0.125 bytes | 8,192 * 8 = 65,536 |
The theoretical numbers explain the intuition — why halfvec doubles the limit and quantized encodings push it further. The actual limits are slightly lower because of page headers, tuple overhead, and the index metadata stored alongside each vector. For HNSW, each page also stores neighbor connection lists (up to m * 2 neighbor IDs per node); for IVFFlat, each page carries centroid references and list pointers. These eat into the available space, which is why pgvector’s HNSW sets 2,000 (not 2,048) and pgvectorscale’s DiskANN sets 16,000 (not 16,384). But the pattern is unmistakable: the fewer bits per dimension, the more dimensions you can fit in a page.
This is why DiskANN can handle 16,000 dimensions where HNSW on vector tops out at 2,000 — DiskANN stores a compressed representation in the index page, not the full vector. And why halfvec doubled the HNSW/IVFFlat limit from 2,000 to 4,000: half the bytes per dimension, twice the capacity.
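The arithmetic above is easy to sanity-check in a few lines of Python (theoretical ceilings only; the real limits subtract header, tuple, and neighbor-list overhead):

```python
PAGE_BITS = 8192 * 8  # one PostgreSQL buffer page, in bits

def theoretical_max_dims(bits_per_dim: int) -> int:
    """Upper bound on dimensions whose encoded form fits a single 8 KB page,
    ignoring page headers, tuple overhead, and graph metadata."""
    return PAGE_BITS // bits_per_dim

limits = {name: theoretical_max_dims(bits)
          for name, bits in [("float32", 32), ("float16", 16),
                             ("4-bit PQ", 4), ("1-bit SBQ", 1)]}
print(limits)  # {'float32': 2048, 'float16': 4096, '4-bit PQ': 16384, '1-bit SBQ': 65536}
```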
This is more than enough for the vast majority of use cases. Most embedding models in production today default to 768–1536 dimensions, well within the 2,000-dimension limit. It also shows how future-proof the current vector store implementation in PostgreSQL is.
How pgvectorscale Compresses: Statistical Binary Quantization (SBQ)
pgvectorscale’s DiskANN, from Timescale (now TigerData), uses a method called Statistical Binary Quantization (SBQ). The idea is deceptively simple: for each dimension, replace the float value with a 1- or 2-bit code.
1-bit mode (default for dimensions >= 900): Each dimension is compressed to a single bit. But not naively — standard binary quantization uses 0.0 as the threshold (positive = 1, negative = 0), which works poorly because real embedding distributions are rarely centered on zero. SBQ instead computes the per-dimension mean across all vectors during index build and uses that as the threshold:
if value > mean_of_this_dimension → 1
else → 0
A 3,072-dimension float32 vector (12,288 bytes) becomes a 3,072-bit string (384 bytes). That’s 32x compression. During search, the query vector is also SBQ-encoded, and distances are computed using XOR + popcount on the bit strings — which modern CPUs execute in a single instruction.
2-bit mode (default for dimensions < 900): Each dimension gets two bits, encoding four “zones” based on the z-score (how many standard deviations from the mean). This gives finer granularity at 16x compression instead of 32x.
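Here is a toy NumPy sketch of the 1-bit mode (illustrative only, not pgvectorscale’s actual implementation):

```python
import numpy as np

def sbq_encode(vectors: np.ndarray):
    """1-bit SBQ sketch: threshold each dimension at its per-dimension mean
    (computed over the whole dataset), then pack 8 bits into each byte."""
    means = vectors.mean(axis=0)
    bits = (vectors > means).astype(np.uint8)
    return np.packbits(bits, axis=1), means

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Distance between two packed codes: XOR, then popcount."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(42)
vecs = rng.standard_normal((100, 3072)).astype(np.float32)  # 12,288 bytes per vector
codes, means = sbq_encode(vecs)
print(codes.shape)  # (100, 384) -- 384 bytes per vector, 32x compression
```

The hardware version of `hamming` is what makes this fast: XOR and popcount are single instructions on modern CPUs, where NumPy has to unpack the bits.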
The accuracy loss is real but small. On common benchmarks, SBQ achieves 96-99% recall compared to exact search. The rescore step (controlled by diskann.query_rescore, default 50) compensates: after the graph traversal finds the top-50 candidates using quantized distances, pgvectorscale fetches the full-precision vectors from the heap and re-computes exact distances to produce the final top-10.
What’s stored where:
| Location | What’s stored | Accessed when |
|---|---|---|
| Index pages | SBQ-compressed vectors + graph edges | Every query (graph traversal) |
| Heap (table) | Full float32 vectors | Only during rescore (top-N candidates) |
This two-tier architecture is why DiskANN achieves 21 MB index size: the index only stores 384-byte compressed vectors, not 12 KB originals. The full vectors stay in the table, fetched only for the final re-ranking of a few dozen candidates.
Microsoft’s pg_diskann: A Different Approach
There’s a second DiskANN implementation for PostgreSQL: Microsoft’s pg_diskann, currently documented and distributed for Azure Database for PostgreSQL Flexible Server. It uses the same Vamana graph algorithm but a fundamentally different compression strategy: Product Quantization (PQ).
Where SBQ asks “is this dimension above or below the mean?”, Product Quantization asks “which of 16 codewords best represents this group of dimensions?”
Here’s how PQ works, step by step:
- Divide the vector into chunks. A 3,072-dimension vector is split into, say, 1,024 chunks of 3 dimensions each.
- Train a codebook per chunk. For each chunk, k-means clustering finds 16 representative codewords (centroids). Why 16? Because 16 values fit in 4 bits — a single hex digit.
- Encode each chunk as a 4-bit symbol. During index build, each chunk of 3 dimensions is replaced by the index (0-15) of its closest codeword. The entire 3,072-dimension vector becomes 1,024 symbols of 4 bits each = 512 bytes.
- Decode via lookup table at query time. To compute the distance to a query vector, you pre-compute the distance from the query to all 16 codewords for each chunk, creating a 16-row x 1,024-column lookup table. Then for each stored vector, you sum up the table entries corresponding to its symbols. No floating-point multiplication needed — just table lookups and additions.
The compression is dramatic: 3,072 dimensions * 4 bytes = 12,288 bytes → 512 bytes with PQ. That’s 24x compression, in the same ballpark as SBQ’s 32x.
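The encode and lookup steps can be sketched in NumPy (a toy model with a random pre-built codebook; the real implementation trains the codebooks with k-means during index build):

```python
import numpy as np

def pq_encode(vec: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """codebooks: (n_chunks, 16, chunk_dim). Replace each chunk of vec
    with the index (0-15) of its nearest codeword."""
    n_chunks, _, chunk_dim = codebooks.shape
    chunks = vec.reshape(n_chunks, chunk_dim)
    dists = ((codebooks - chunks[:, None, :]) ** 2).sum(axis=2)  # (n_chunks, 16)
    return dists.argmin(axis=1)

def pq_distance(query: np.ndarray, codes: np.ndarray, codebooks: np.ndarray) -> float:
    """Asymmetric distance: precompute a query-to-codeword table per chunk,
    then sum table lookups -- no per-stored-vector float multiplication."""
    n_chunks, _, chunk_dim = codebooks.shape
    table = ((codebooks - query.reshape(n_chunks, 1, chunk_dim)) ** 2).sum(axis=2)
    return float(table[np.arange(n_chunks), codes].sum())

rng = np.random.default_rng(0)
codebooks = rng.standard_normal((1024, 16, 3))       # 3,072 dims -> 1,024 chunks of 3
codes = rng.integers(0, 16, size=1024)
vec = codebooks[np.arange(1024), codes].reshape(-1)  # a vector built exactly from codewords
```

For a vector assembled exactly from codewords, `pq_encode` recovers the symbols and `pq_distance` to itself is zero; real vectors land near, not on, their codewords, which is the quantization error the rescore step corrects.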
Comparing the Two DiskANN Implementations
| | pgvectorscale (Timescale) | pg_diskann (Microsoft) |
|---|---|---|
| Compression | SBQ (1-2 bits/dim) | Product Quantization (4 bits/chunk) |
| Compression ratio | 32x (1-bit) or 16x (2-bit) | ~24x (depends on chunks) |
| How it works | Per-dimension thresholding | Codebook lookup per chunk |
| Distance computation | XOR + popcount (very fast) | Table lookup + sum |
| Trainable | Minimal (just means + stddev) | Heavy (k-means per chunk) |
| Max dimensions | 16,000 | 16,000 |
| Availability | Open source, self-hosted | Azure Database for PostgreSQL |
| License | PostgreSQL License | Distributed via Azure |
| Iterative scan | No | Yes (relaxed/strict, ON by default) |
| PG version | PG 14-18 (self-hosted) | Azure DB for PostgreSQL |
Both implementations share the core DiskANN algorithm (Vamana graph) and the two-phase search pattern (compressed scan + full-precision rescore). The difference is how they compress:
- SBQ is simpler and faster to build (just compute means). It’s a blunt instrument — 1 bit per dimension loses a lot of information, but XOR + popcount is blazingly fast, and the rescore step recovers accuracy.
- PQ is more sophisticated and retains more information per bit (a 4-bit symbol captures relationships between groups of dimensions). It’s slower to build (k-means training) but can achieve better recall at the same compression ratio, especially for vectors with correlated dimensions.
Takeaway: If you’re self-hosting PostgreSQL, pgvectorscale is your DiskANN option — open source, well-maintained, and the SBQ compression is effective. If you’re on Azure Database for PostgreSQL, you also have pg_diskann, whose PQ compression may give better recall on very high-dimensional data. The underlying algorithm is the same; the compression strategy is the difference.
Iterative Scans
This is the most important query-time feature added to pgvector since HNSW support. If you take one thing from this post, let it be this section.
The Problem
Vector search indexes return the K nearest neighbors, then PostgreSQL applies your WHERE clause. If your filter is selective, you get fewer results than requested.
-- Our dataset has 9 categories with varying selectivity
category | cnt | % of 25K
-------------+-------+---------
History | 8719 | 34.9%
General | 6221 | 24.9%
Science | 2232 | 8.9%
Mathematics | 116 | 0.5%
When you search for the 10 nearest “Science” articles, the HNSW index returns its ef_search nearest neighbors (say, 40), PostgreSQL filters to keep only Science articles, and you get however many matched. For Science (8.9%), you’ll usually get your 10 results. For Mathematics (0.5%), you won’t.
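A back-of-the-envelope estimator makes this concrete. It assumes filter matches are spread uniformly among the neighbors, which real data often violates (here, the query article is itself in Science, so its neighborhood is Science-rich):

```python
import math

def expected_candidates(k: int, selectivity: float) -> int:
    """Rough number of index candidates to examine before k of them pass
    a filter with the given selectivity (uniform-mixing assumption)."""
    return math.ceil(k / selectivity)

print(expected_candidates(10, 0.089))  # 113  -> Science: a modest ef_search bump suffices
print(expected_candidates(10, 0.005))  # 2000 -> Mathematics: far beyond ef_search=40
```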
Let’s prove it. Here’s a search for Mathematics articles with ef_search=40, forcing the HNSW index path. (We disable sequential scan here to force the index path on our small 25K dataset. On production tables with 100K+ rows, the optimizer would naturally choose the index without this hint.)
SET enable_seqscan = off;
SET hnsw.iterative_scan = 'off';
SET hnsw.ef_search = 40;
SELECT id, title, category,
content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1) AS distance
FROM articles
WHERE category = 'Mathematics'
ORDER BY content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1)
LIMIT 10;
Index Scan using idx_content_hnsw_halfvec on articles
Order By: (content_halfvec <=> (InitPlan 1).col1)
Filter: (category = 'Mathematics'::text)
Rows Removed by Filter: 40
Index Searches: 1
Execution Time: 0.766 ms
(0 rows)
Zero rows returned. The index fetched 40 candidates, all 40 were filtered out (none were Mathematics), and the query silently returned an empty result set. This is the core problem: your application asked for 10 results and got 0.
Before iterative scans, you had two bad options:
- Over-fetch (LIMIT 1000) and hope enough rows match — wasteful and unreliable
- Sequential scan — correct but slow on large tables
The Solution: Iterative Index Scans
pgvector 0.8.0 introduced iterative scans. Instead of fetching a fixed batch and filtering, the index keeps fetching more candidates until the filter is satisfied.
SET enable_seqscan = off;
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 40;
SELECT id, title, category,
content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1) AS distance
FROM articles
WHERE category = 'Mathematics'
ORDER BY content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1)
LIMIT 10;
Index Scan using idx_content_hnsw_halfvec on articles
Order By: (content_halfvec <=> (InitPlan 1).col1)
Filter: (category = 'Mathematics'::text)
Rows Removed by Filter: 6809
Index Searches: 171
Execution Time: 4235.805 ms
(10 rows)
10 rows returned. The index scanned 171 times, examined 6,819 candidates, filtered out 6,809 non-Mathematics rows, and delivered exactly 10 results. It took 4.2 seconds — much slower than the 0.8 ms empty result — but you got a correct answer instead of silence.
At 6,819 tuples scanned, this was well within the default max_scan_tuples of 20,000. On a table with millions of rows and the same 0.5% selectivity, the scan might hit the 20,000 limit before finding 10 matches — you’d get a partial result set. That’s the trade-off the safety valve makes: bounded latency vs guaranteed result count.
That 4.2-second cost reflects the extreme selectivity: Mathematics is only 0.5% of the data, so the index had to traverse deep into the graph. For moderate selectivity like Science (8.9%), the overhead is negligible:
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 40;
SELECT ... WHERE category = 'Science' ORDER BY ... LIMIT 10;
Index Scan using idx_content_hnsw_halfvec on articles
Order By: (content_halfvec <=> (InitPlan 1).col1)
Filter: (category = 'Science'::text)
Rows Removed by Filter: 26
Index Searches: 1
Execution Time: 1.058 ms
Same feature, but for Science the index found 10 matching rows in a single search pass — no extra work needed.
Two Modes
- relaxed_order: Results are approximately ordered by distance. Slightly faster. Good enough for most use cases.
- strict_order: Results are exactly ordered by distance. Slightly slower. Use when ranking precision matters.
SET hnsw.iterative_scan = 'strict_order';
-- Execution Time: 0.885 ms
The Safety Valve: max_scan_tuples
To prevent runaway scans on extremely selective filters (imagine filtering for a category that has 1 row in 10 million), there’s a safety limit:
SET hnsw.max_scan_tuples = 500; -- Restrictive: stop after 500 index tuples
SET hnsw.max_scan_tuples = 20000; -- Default: generous enough for most workloads
SET hnsw.max_scan_tuples = 0; -- Unlimited (use with caution)
IVFFlat Too
Same concept, different GUC prefix:
SET ivfflat.iterative_scan = 'relaxed_order';
SET ivfflat.probes = 3;
SELECT ... WHERE category = 'Science' ORDER BY ... LIMIT 10;
-- Execution Time: 2.001 ms
Filtered Results in Action
With iterative scan, every result matches the filter:
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 100;
SELECT id, title, category,
round((content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1))::numeric, 4) AS distance
FROM articles
WHERE category = 'Science'
ORDER BY content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1)
LIMIT 10;
id | title | category | distance
-------+----------+----------+----------
1 | April | Science | 0.0000
7862 | April 23 | Science | 0.2953
9878 | April 25 | Science | 0.3076
469 | May | Science | 0.3082
9880 | April 24 | Science | 0.3451
402 | July | Science | 0.3453
5156 | April 4 | Science | 0.3531
9530 | April 7 | Science | 0.3588
34906 | 2013 | Science | 0.3674
9149 | April 8 | Science | 0.3690
All 10 results are Science. All sorted by cosine distance. No over-fetching, no sequential scan. (The titles are Wikipedia date articles — “April”, “May”, etc. — that happen to be classified under Science in this dataset.)
Multi-Filter Combinations
Iterative scans work with compound WHERE clauses too:
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 100;
SELECT id, title, category, word_count
FROM articles
WHERE category = 'Science' AND word_count > 1000
ORDER BY content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1)
LIMIT 10;
Limit (actual time=7.098..7.105 rows=10.00 loops=1)
-> Sort (actual time=7.097..7.101 rows=10.00 loops=1)
Sort Key: ((content_halfvec <=> (InitPlan 1).col1))
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on articles (actual time=0.690..7.027 rows=444.00 loops=1)
Recheck Cond: ((word_count > 1000) AND (category = 'Science'::text))
-> BitmapAnd (actual time=0.641..0.643 rows=0.00 loops=1)
-> Bitmap Index Scan on idx_articles_word_count (rows=1806.00)
-> Bitmap Index Scan on idx_articles_category (rows=2232.00)
Execution Time: 7.542 ms
Here the PostgreSQL optimizer did something clever: it combined two B-tree indexes (BitmapAnd) instead of using the HNSW iterative scan. On a 25K-row table that fits in memory, this is often cheaper. At scale (millions of rows), the iterative scan path wins because the bitmap approach requires reading too many heap pages.
Takeaway: Iterative scans are the answer to “vector search + WHERE doesn’t work.” Enable them with
SET LOCAL hnsw.iterative_scan = 'relaxed_order'inside transactions. The safety valvemax_scan_tuples = 20000is a sensible default.
Quantization and Storage
With 3,072-dimension embeddings, storage is the elephant in the room. Here’s what each representation costs per row:
SELECT
pg_size_pretty(avg(pg_column_size(content_vector))) AS vector_size,
pg_size_pretty(avg(pg_column_size(content_halfvec))) AS halfvec_size,
pg_size_pretty(avg(pg_column_size(content_bq))) AS binary_size
FROM articles WHERE content_vector IS NOT NULL;
vector_size | halfvec_size | binary_size
-------------+--------------+-------------
12 kB | 6148 bytes | 392 bytes
| Type | Bytes per dimension | Per-row (3072d) | Savings |
|---|---|---|---|
| vector(3072) | 4 bytes (float32) | 12,288 bytes | baseline |
| halfvec(3072) | 2 bytes (float16) | 6,144 bytes | 50% |
| bit(3072) | 1/8 byte (1 bit) | 384 bytes | 97% |
At 1 million rows:
| Type | Column storage | HNSW index |
|---|---|---|
| vector(3072) | ~12 GB | N/A (2000-dim limit) |
| halfvec(3072) | ~6 GB | ~7.7 GB |
| bit(3072) | ~0.4 GB | ~0.8 GB (bit_hamming_ops) |
The storage math is brutal for high-dimensional embeddings. This is why DiskANN’s built-in SBQ compression matters: it gets you to 21 MB where HNSW on halfvec costs 193 MB.
Binary Quantize + Re-Ranking
Binary quantization crushes each dimension to a single bit (positive = 1, negative = 0). It’s lossy, but very fast for coarse filtering. The pattern is a two-phase search:
-- Phase 1: Hamming distance on binary (fast, coarse) → 100 candidates
-- Phase 2: Cosine distance on full vector (precise) → 10 results
WITH coarse AS (
SELECT id, title, content_vector
FROM articles
WHERE content_bq IS NOT NULL
ORDER BY content_bq <~> (
SELECT binary_quantize(content_vector)::bit(3072)
FROM articles WHERE id = 1
)
LIMIT 100
)
SELECT id, title,
content_vector <=> (SELECT content_vector FROM articles WHERE id = 1) AS distance
FROM coarse
ORDER BY content_vector <=> (SELECT content_vector FROM articles WHERE id = 1)
LIMIT 10;
Limit (actual time=29.021..29.026 rows=10.00 loops=1)
-> Sort on coarse (actual time=29.020..29.023 rows=10.00 loops=1)
-> Subquery Scan (actual time=27.805..28.999 rows=100.00 loops=1)
-> Sort by content_bq <~> (actual time=27.744..27.754 rows=100.00 loops=1)
-> Seq Scan on articles (actual time=0.435..24.288 rows=24700.00 loops=1)
Execution Time: 29.127 ms
29 ms without an index on content_bq. With an HNSW index using bit_hamming_ops, Phase 1 would be sub-millisecond. The re-ranking in Phase 2 only touches 100 full vectors instead of 25,000.
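The same two-phase pattern, sketched in NumPy on synthetic data (illustrative only; the shapes, counts, and naive 0.0 threshold are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.standard_normal((25_000, 64)).astype(np.float32)
query = vecs[0]

# Phase 1 (coarse): sign-bit binary codes + Hamming distance, keep 100 candidates
codes = np.packbits(vecs > 0, axis=1)  # 64 dims -> 8 bytes per row
qcode = np.packbits(query > 0)
ham = np.unpackbits(codes ^ qcode, axis=1).sum(axis=1)
candidates = np.argsort(ham)[:100]

# Phase 2 (precise): exact cosine distance on the 100 survivors only
sub = vecs[candidates]
cos = 1 - (sub @ query) / (np.linalg.norm(sub, axis=1) * np.linalg.norm(query))
top10 = candidates[np.argsort(cos)[:10]]
print(top10[0])  # 0 -- the query vector itself ranks first
```

Phase 2 touches 100 vectors instead of 25,000, which is exactly the economy the SQL version above buys.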
DBA Takeaway: Use halfvec as your default for 3072-dimension embeddings. Use binary quantize + re-ranking when you need to search billions of rows and can tolerate a two-phase approach. DiskANN’s SBQ gives you similar compression automatically.
Operators and Sargability
pgvector provides four distance operators:
| Operator | Distance | Operator Class | Use When |
|---|---|---|---|
| <=> | Cosine | vector_cosine_ops | Normalized embeddings (most common) |
| <-> | L2 (Euclidean) | vector_l2_ops | Absolute distance matters |
| <#> | Inner Product (negative) | vector_ip_ops | Pre-normalized, slight speed edge |
| <+> | L1 (Manhattan) | vector_l1_ops | Sparse-like behavior on dense vectors |
Wrong Operator = No Index
This is the single most common mistake. If your index uses halfvec_cosine_ops but your query uses <-> (L2), the index cannot be used:
-- CORRECT: cosine operator on cosine index → Index Scan
EXPLAIN (COSTS OFF)
SELECT ... ORDER BY content_halfvec <=> (...) LIMIT 10;
Limit
-> Index Scan using idx_content_hnsw_halfvec on articles
Order By: (content_halfvec <=> (InitPlan 1).col1)
-- WRONG: L2 operator on cosine index → Seq Scan!
EXPLAIN (COSTS OFF)
SELECT ... ORDER BY content_halfvec <-> (...) LIMIT 10;
Limit
-> Sort
Sort Key: ((articles.content_halfvec <-> (InitPlan 1).col1))
-> Seq Scan on articles
The planner can’t use a cosine-distance index for an L2-distance query. They’re different metrics with different orderings. If you see an unexpected Seq Scan on a vector query, check your operator first.
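Why can't the planner just reuse the index? Because cosine and L2 can rank the same candidates in opposite orders, so an ordering baked into one index is meaningless for the other metric. A toy illustration with made-up 2-D vectors:

```python
import math

def l2(a, b):
    """Euclidean distance (the <-> operator)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_dist(a, b):
    """Cosine distance (the <=> operator)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

q = [1.0, 0.0]
a = [10.0, 0.0]  # same direction as q, but far away in space
b = [0.9, 0.5]   # nearby in space, but pointing elsewhere

# Cosine ranks a first (perfect angular match); L2 ranks b first.
print(cosine_dist(q, a), cosine_dist(q, b))  # 0.0 vs ~0.126
print(l2(q, a), l2(q, b))                    # 9.0 vs ~0.51
```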
Sargable Queries: The Cross-Join Trap
This is the pattern I see most often in the wild, and it’s wrong:
-- BAD: cross-join prevents index use
SELECT a.id, a.title,
a.content_halfvec <=> b.content_halfvec AS distance
FROM articles a, articles b
WHERE b.id = 1
ORDER BY a.content_halfvec <=> b.content_halfvec
LIMIT 10;
Limit
-> Sort
Sort Key: ((a.content_halfvec <=> b.content_halfvec))
-> Nested Loop
-> Index Scan using articles_pkey on articles b
Index Cond: (id = 1)
-> Seq Scan on articles a ← NO INDEX!
The planner sees a.content_halfvec <=> b.content_halfvec as a join condition, not an index-scan ordering. It can’t push the ORDER BY into the vector index because the right-hand side comes from a different table reference.
The fix: use a scalar subquery:
-- GOOD: scalar subquery → index used
SELECT id, title,
content_halfvec <=> (
SELECT content_halfvec FROM articles WHERE id = 1
) AS distance
FROM articles
ORDER BY content_halfvec <=> (
SELECT content_halfvec FROM articles WHERE id = 1
)
LIMIT 10;
Limit
InitPlan 1
-> Index Scan using articles_pkey on articles articles_1
Index Cond: (id = 1)
-> Index Scan using idx_content_hnsw_halfvec on articles
Order By: (content_halfvec <=> (InitPlan 1).col1)
The scalar subquery is evaluated once (InitPlan), then the result is treated as a constant. Now the planner can push the ORDER BY into the HNSW index scan. Same results, but with the index instead of a sequential scan.
Takeaway: For this nearest-neighbor ORDER BY pattern, always use scalar subqueries for vector distance calculations, never cross-joins. This is the vector-search equivalent of writing sargable WHERE clauses for B-tree indexes.
Partial Indexes
If you frequently filter by a specific category, a partial index is dramatically more efficient:
CREATE INDEX idx_content_hnsw_science
ON articles USING hnsw (content_halfvec halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64)
WHERE category = 'Science';
Time: 1420.207 ms (00:01.420)
indexname | size
--------------------------+--------
idx_content_hnsw_halfvec | 193 MB ← full table (25,000 rows)
idx_content_hnsw_science | 17 MB ← Science only (2,232 rows)
11x smaller, 20x faster to build. Note that the planner only uses a partial index when the query's WHERE clause matches the index predicate (here, category = 'Science'). If your application queries consistently filter by a known set of categories, partial indexes are the single biggest optimization available. Build one per high-value category.
Monitoring
Index Inventory
SELECT indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
indexdef
FROM pg_indexes
WHERE tablename = 'articles'
ORDER BY pg_relation_size(indexname::regclass) DESC;
indexname | size | indexdef
-------------------------+---------+-------------------------------------------
idx_content_hnsw_halfvec | 193 MB | USING hnsw ... WITH (m='16', ef_construction='64')
idx_content_ivfflat | 193 MB | USING ivfflat ... WITH (lists='25')
idx_content_diskann | 21 MB | USING diskann ... WITH (storage_layout=memory_optimized)
articles_pkey | 1384 kB | USING btree (id)
idx_articles_category | 1192 kB | USING btree (category)
idx_articles_word_count | 904 kB | USING btree (word_count)
The three vector indexes add up to 407 MB (193 + 193 + 21), plus ~3 MB of B-tree indexes, for a total index footprint of about 410 MB on a 90 MB table. The indexes are ~4.5x the data. This is typical for high-dimensional vector data, so plan your storage accordingly.
Note: In practice you’d pick one vector index, not all three. With just HNSW + B-tree indexes, the ratio drops to ~2x.
Settings
Know what settings exist and what they default to:
SELECT name, setting, short_desc
FROM pg_settings
WHERE name ~ '^(hnsw|ivfflat|diskann)\.'
ORDER BY name;
name | setting | short_desc
--------------------------------------------+---------+-------------------------------------------
hnsw.ef_search | 40 | Dynamic candidate list size for search
hnsw.iterative_scan | off | Mode for iterative scans
hnsw.max_scan_tuples | 20000 | Max tuples to visit for iterative scans
hnsw.scan_mem_multiplier | 1 | Multiple of work_mem for iterative scans
ivfflat.iterative_scan | off | Mode for iterative scans
ivfflat.max_probes | 32768 | Max probes for iterative scans
ivfflat.probes | 1 | Number of probes
diskann.query_search_list_size | 100 | Search list size for queries
diskann.query_rescore | 50 | Rescore candidates
Takeaway:
hnsw.iterative_scan and ivfflat.iterative_scan default to off. If your application relies on vector search with WHERE clauses, you need to explicitly enable iterative scans.
Access Method Capabilities
Not sure which operator class works with which index type? Query the catalog:
SELECT am.amname AS access_method,
opc.opcname AS operator_class,
t.typname AS data_type
FROM pg_opclass opc
JOIN pg_am am ON am.oid = opc.opcmethod
JOIN pg_type t ON t.oid = opc.opcintype
WHERE am.amname IN ('hnsw', 'ivfflat', 'diskann')
ORDER BY am.amname, opc.opcname;
| Access Method | Data Types | Operator Classes |
|---|---|---|
| HNSW | vector, halfvec, bit, sparsevec | 18 classes (broadest support) |
| IVFFlat | vector, halfvec, bit | 7 classes |
| DiskANN | vector only (+ label filtering) | 4 classes |
HNSW is the most versatile. DiskANN is the most constrained. IVFFlat falls in between.
Build Progress Monitoring
While building a large index:
SELECT phase,
round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct_done,
tuples_done, tuples_total
FROM pg_stat_progress_create_index;
This works for HNSW and IVFFlat builds (pgvector reports progress). DiskANN builds from pgvectorscale don’t currently report to this view.
Decision Guidelines
Here’s how to choose:
Start with HNSW unless you have a reason not to. It has the broadest operator support, good recall, and predictable performance. Use halfvec if your dimensions exceed 2,000.
Choose IVFFlat if build time matters more than query time. IVFFlat builds 5-6x faster than HNSW. If your data distribution shifts materially, plan a REINDEX to refresh clustering quality (centroid drift). A practical signal: if recall drops without any change in query patterns, or if you’ve inserted more than ~30% new rows since the last build, the centroids are likely stale. For a deeper look at how embedding model upgrades trigger reindexing at scale, see Embedding Versioning with pgvector.
Choose DiskANN if storage is the constraint. The 9x compression is decisive at scale. It handles high dimensions natively (no halfvec needed) and supports CONCURRENTLY for production deployments.
Enable iterative scans for vector search + filtering in production. The trade-off is potentially higher latency on very selective filters (the index does more work to find matching rows), but that’s usually the right trade-off for correctness. Tune max_scan_tuples / max_probes to bound worst-case work. Use relaxed_order by default, strict_order when ranking precision matters.
Use partial indexes for category-specific searches. An 11x size reduction and 20x faster build is hard to argue with.
Use SET LOCAL for all vector parameter changes in production, and only after you have tested them.
Summary
| Feature | HNSW | IVFFlat | DiskANN (pgvectorscale) | DiskANN (pg_diskann) |
|---|---|---|---|---|
| Provider | pgvector | pgvector | Timescale (open source) | Microsoft (Azure DB for PG) |
| Max dims (vector) | 2,000 | 2,000 | 16,000 | 16,000 |
| Max dims (halfvec) | 4,000 | 4,000 | N/A | N/A |
| Compression | Via halfvec | Via halfvec | SBQ (1-2 bits/dim) | PQ (4 bits/chunk) |
| Build time (25K, 3072d) | 29s | 5s | 49s | N/A |
| Index size | 193 MB | 193 MB | 21 MB | Similar |
| Query time | 2-6 ms | 2-10 ms | 3 ms | ~3 ms |
| Key tuning param | ef_search | probes | query_search_list_size | search list / PQ params |
| Iterative scan | Yes | Yes | No | Yes (ON by default) |
| CONCURRENTLY | Yes | Yes | Yes (0.9.0+) | Yes |
| Data types | vector, halfvec, bit, sparsevec | vector, halfvec, bit | vector only | vector only |
Dimension limits are largely explained by PostgreSQL’s 8 KB page size and encoding density (exact cutoffs are implementation-defined):
| Encoding | Bits/dim | Theoretical max | Actual limit | Who uses it |
|---|---|---|---|---|
| float32 (vector) | 32 | 2,048 | 2,000 | HNSW, IVFFlat |
| float16 (halfvec) | 16 | 4,096 | 4,000 | HNSW, IVFFlat |
| PQ symbol | 4 | 16,384 | 16,000 | pg_diskann |
| SBQ binary | 1 | 65,536 | 16,000 | pgvectorscale |
The compression story is ultimately a story about how many bits of information you need per dimension to navigate the index. pgvector stores full-precision values; DiskANN stores just enough to find the right neighborhood, then goes back to the heap for exact distances.
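The theoretical maxima in the table above are, to a first approximation, just the 8 KB page divided by the encoding width. A quick sketch (simplified: tuple and page headers also consume space, which is roughly why the shipped limits sit below these maxima):

```python
# Theoretical max dimensions per 8 KB page at each encoding width.
# Simplified: tuple and page headers also consume space, which is why
# the shipped limits (2,000 / 4,000 / 16,000) sit below these maxima.
PAGE_BITS = 8 * 1024 * 8  # an 8 KB page holds 65,536 bits

encodings = [("float32", 32), ("float16", 16), ("PQ symbol", 4), ("SBQ binary", 1)]
for name, bits_per_dim in encodings:
    print(f"{name:<10} -> {PAGE_BITS // bits_per_dim:,} dims")
```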
But here’s the thing: exact results are not necessarily what you want. In a RAG pipeline, the retrieved documents are context for a language model that will synthesize and rephrase an answer — not return rows verbatim. Whether your top-10 results are ranked by exact cosine distance or by a 96%-accurate approximation rarely changes the generated answer. The same is true for recommendations, semantic deduplication, and most classification workflows: the downstream consumer is tolerant of small ranking variations.
The real production trade-off is not precision vs approximation — it’s the balance between retrieval speed, resource efficiency, and result quality at your scale. An HNSW index that doesn’t fit in shared_buffers and hits disk on every query will give you worse effective results than a DiskANN index that stays cached and returns slightly less precise distances in a fraction of the time. The best retrieval is the one that actually runs within your latency budget.
The lab, with all SQL scripts and the Python embedding pipeline, is available here: lab/06_pgvector_indexes.
All benchmarks: PostgreSQL 18, pgvector 0.8.1, pgvectorscale 0.9.0, 25K Wikipedia articles, 3072d text-embedding-3-large embeddings (4 vCPUs, 8 GB RAM).