Introduction

In Part 1 of this series, we covered what pgvector is, how embeddings work, and how to store them in PostgreSQL. We ended with a working similarity search — but on a sequential scan. That works fine for a demo table with 1,000 rows. It does not work for production.

This post is about what comes next: indexes. Specifically, the three index families in the pgvector ecosystem as of February 2026 (HNSW, IVFFlat, and DiskANN, the latter with two implementations targeting different deployment models): what they’re good at, where they break, and the patterns you need, whether you’re the DBA tuning them or the developer trying to understand the strengths of PostgreSQL as a vector store.

Everything in this post was tested on a public dataset: 25,000 Wikipedia articles embedded with OpenAI’s text-embedding-3-large at 3,072 dimensions, the maximum the model supports. The high dimension count is deliberate: it surfaces limitations that are useful for pedagogical reasons. You would be fine running the tests with lower dimensions or other embedding models; for guidance on that, see the RAG series, and I will probably write a post on testing embedding models against your own data sets.
The environment is PostgreSQL 18 with pgvector 0.8.1 and pgvectorscale 0.9.0.

All the SQL scripts, Python code, and Docker configuration are in the companion lab: lab/06_pgvector_indexes.


The Index Types

Before we dive in, here’s the landscape. pgvector ships with two built-in index types (HNSW and IVFFlat), and two DiskANN implementations are available from different vendors:

                         | HNSW              | IVFFlat                   | DiskANN (pgvectorscale)  | DiskANN (pg_diskann)
-------------------------+-------------------+---------------------------+--------------------------+-------------------------
 Provider                | pgvector          | pgvector                  | Timescale                | Microsoft
 Availability            | Built-in          | Built-in                  | Open source, self-hosted | Azure DB for PostgreSQL
 Algorithm               | Multi-layer graph | Voronoi cell partitioning | Vamana graph + SBQ       | Vamana graph + PQ
 Best for                | General purpose   | Fast build                | Storage-constrained      | Azure + high recall
 Build time (25K, 3072d) | 29s               | 5s                        | 49s                      | N/A (Azure)
 Index size              | 193 MB            | 193 MB                    | 21 MB                    | Similar
 Query time              | 2-6 ms            | 2-10 ms                   | 3 ms                     | ~3 ms

That pgvectorscale number is not a typo: 21 MB vs 193 MB for the same data.

Note: This post uses pgvectorscale for all DiskANN benchmarks since it’s the open-source, self-hosted option. We’ll compare both DiskANN implementations in detail later in this post. pg_diskann is available only on Azure Database for PostgreSQL Flexible Server, Microsoft’s managed instance service.


HNSW: The Default Choice

HNSW (Hierarchical Navigable Small World) is the most commonly recommended index type for vector search. It builds a multi-layered graph where each node connects to its nearest neighbors, and search navigates this graph from top to bottom.

The 2,000-Dimension Wall

Here’s the first thing you’ll hit with modern embedding models:

CREATE INDEX idx_content_hnsw
ON articles USING hnsw (content_vector vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

ERROR:  column cannot have more than 2000 dimensions for hnsw index

The vector type in pgvector has a 2,000-dimension limit for HNSW indexes. If you’re using text-embedding-3-large (3,072 dimensions), or voyage-3-large at its 2,048-dimension setting, this is a blocker.

The workaround: halfvec.

-- Step 1: Store a half-precision copy
ALTER TABLE articles ADD COLUMN content_halfvec halfvec(3072);
UPDATE articles SET content_halfvec = content_vector::halfvec;

-- Step 2: Index the halfvec column (limit: 4,000 dimensions)
CREATE INDEX idx_content_hnsw_halfvec
ON articles USING hnsw (content_halfvec halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64);

Time: 28974.392 ms (00:28.974)

SELECT pg_size_pretty(pg_relation_size('idx_content_hnsw_halfvec')) AS hnsw_size;

 hnsw_size
-----------
 193 MB

29 seconds to build, 193 MB for 25,000 articles at 3,072 dimensions. That’s roughly 8 KB per row in the index alone.

Important: halfvec stores each dimension in 2 bytes instead of 4. You lose some floating-point precision, but in practice the recall difference is negligible for similarity search. The storage savings are real: your halfvec column is 6 KB per row vs 12 KB for the full vector.
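The precision trade-off is easy to see in pure Python: struct’s 'e' format is IEEE 754 half precision, the same 2-byte-per-value encoding pgvector documents for halfvec. A quick round-trip shows what survives:

```python
import struct

def to_half(x: float) -> float:
    """Round-trip a value through 2-byte IEEE 754 half precision."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_half(0.5))          # exactly representable, comes back unchanged
print(to_half(0.123456789))  # ~3 significant decimal digits survive
```

A relative error on the order of 1e-4 per dimension is noise compared to the variation between embedding vectors, which is why recall barely moves.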

Alternative: Instead of a separate column, you can create an expression index that casts on the fly: CREATE INDEX ... ON articles USING hnsw ((content_vector::halfvec(3072)) halfvec_cosine_ops); The trade-off is that your queries must use the matching expression (content_vector::halfvec(3072) <=> ...) for the planner to pick it up, which is harder to read in application code. The separate column approach gives cleaner queries.

Tuning ef_search: The Recall vs Speed Dial

ef_search controls how many candidates the HNSW algorithm considers during search. Higher values mean more candidates examined, better recall, but more work. The default is 40.

-- ef_search = 40 (default)
SET hnsw.ef_search = 40;
EXPLAIN ANALYZE
SELECT id, title, content_halfvec <=> (
    SELECT content_halfvec FROM articles WHERE id = 1
) AS distance
FROM articles
ORDER BY content_halfvec <=> (
    SELECT content_halfvec FROM articles WHERE id = 1
)
LIMIT 10;

Index Scan using idx_content_hnsw_halfvec on articles
  Order By: (content_halfvec <=> (InitPlan 1).col1)
  Index Searches: 1
  Buffers: shared hit=551
Execution Time: 6.004 ms

-- ef_search = 100
SET hnsw.ef_search = 100;

Index Scan using idx_content_hnsw_halfvec on articles
  Buffers: shared hit=716
Execution Time: 2.365 ms

-- ef_search = 200
SET hnsw.ef_search = 200;

Index Scan using idx_content_hnsw_halfvec on articles
  Buffers: shared hit=883
Execution Time: 2.542 ms

Wait — ef_search=100 was faster than ef_search=40? Not really. Those numbers came from a warm cache (shared hit=551, zero disk reads). The apparent speedup is a cache warming effect, not a property of the algorithm. To prove this, I restarted PostgreSQL and ran the full sweep from cold:

 ef_search | Execution Time | Buffers           | Read (disk)
-----------+----------------+-------------------+---------------------
 40 (cold) | 91 ms          | hit=189, read=362 | 362 pages from disk
 100       | 33 ms          | hit=616, read=132 | fewer cold pages
 200       | 22 ms          | hit=850, read=65  | even fewer
 40 (warm) | 0.8 ms         | hit=551, read=0   | all cached

The second run at ef_search=40 clocked 0.8 ms — faster than any ef_search=100 or 200 run. On a warm cache, all three (40/100/200) land in the 0.8-5 ms range. The variance is cache state, not algorithmic shortcuts. The real cost cliff is at ef_search=400 where the optimizer switches plans entirely:

-- ef_search = 400
SET hnsw.ef_search = 400;

Sort  (actual time=364.693..364.695 rows=10.00 loops=1)
  Sort Key: ((articles.content_halfvec <=> (InitPlan 1).col1))
  Sort Method: top-N heapsort  Memory: 25kB
  ->  Seq Scan on articles  (actual time=0.143..356.482 rows=24700.00 loops=1)
        Filter: (content_halfvec IS NOT NULL)
        Rows Removed by Filter: 300
Execution Time: 364.724 ms

The planner chose a Seq Scan + Sort path. At ef_search=400, PostgreSQL’s cost model estimated the index scan would be more expensive than reading every row sequentially, and the query went from 2.5 ms to 365 ms.

This is your optimizer flip-flop. It’s not a bug — it’s the planner doing its job. But it means you need to be aware of the threshold.

 ef_search | Execution Time | Plan       | Buffers
-----------+----------------+------------+---------
 40        | 6.0 ms         | Index Scan | 551
 100       | 2.4 ms         | Index Scan | 716
 200       | 2.5 ms         | Index Scan | 883
 400       | 365 ms         | Seq Scan   | 209,192

DBA Takeaway: For HNSW on halfvec(3072), stay in the ef_search 40-200 range. Past that, you’re fighting the optimizer. If you need ef_search > 200 for recall, you probably need a bigger m parameter at build time.

Build Parameters: m and ef_construction

m is the number of connections per node in the graph. ef_construction is the candidate list size during build. Higher values = better graph quality but slower builds and (potentially) larger indexes.

-- m=16, ef_construction=64 (default-ish)
Time: 28,974 ms    Size: 193 MB

-- m=32, ef_construction=128
Time: 54,077 ms    Size: 193 MB

At 25K rows, doubling m doubled the build time but didn’t change the index size. The size effect becomes more visible at larger scales. In general: start with m=16, ef_construction=64 and only increase if recall is insufficient after tuning ef_search.


IVFFlat: Fast Build

IVFFlat (Inverted File with Flat quantization) partitions the vector space into Voronoi cells using k-means clustering, then searches only the cells closest to the query vector.

Same Dimension Limit

IVFFlat has the same 2,000-dimension limit as HNSW for the vector type. Same workaround:

CREATE INDEX idx_content_ivfflat
ON articles USING ivfflat (content_halfvec halfvec_cosine_ops)
WITH (lists = 25);

Time: 5008.765 ms (00:05.009)

SELECT pg_size_pretty(pg_relation_size('idx_content_ivfflat'));
-- 193 MB

5 seconds to build vs 29 for HNSW. The index sizes are essentially identical (193 MB each), but IVFFlat builds 5.8x faster.

The lists parameter controls the number of Voronoi cells. The pgvector documentation recommends: rows / 1000 for tables up to 1M rows, sqrt(rows) for larger tables. For 25,000 articles: 25000 / 1000 = 25 lists, giving roughly 1,000 rows per cell. A common mistake is applying sqrt(rows) to small tables — that gives 158 lists here, creating cells with only ~167 rows each, which fragments the index and causes the optimizer to flip to sequential scan at surprisingly low probe counts.
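The rule of thumb condenses to a few lines. This is a sketch of the documented heuristic, not an official API:

```python
import math

def recommended_lists(rows: int) -> int:
    """pgvector's rule of thumb: rows/1000 up to 1M rows, sqrt(rows) beyond."""
    if rows <= 1_000_000:
        return max(rows // 1000, 1)
    return round(math.sqrt(rows))

print(recommended_lists(25_000))     # 25 lists, ~1,000 rows per cell
print(recommended_lists(4_000_000))  # 2000 lists
```

Applying the sqrt branch to the 25K table would give round(sqrt(25000)) = 158 lists, the fragmentation mistake described above.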

probes controls how many Voronoi cells are searched at query time. Default is 1 — fast but low recall. A good starting point is sqrt(lists). Here’s the sweep with lists=25:

SET ivfflat.probes = 1;
-- Index Scan, Execution Time: 1.0 ms, Buffers: 571

SET ivfflat.probes = 2;
-- Index Scan, Execution Time: 3.7 ms, Buffers: 1,944

SET ivfflat.probes = 3;
-- Index Scan, Execution Time: 4.4 ms, Buffers: 2,793

SET ivfflat.probes = 4;
-- Index Scan, Execution Time: 5.9 ms, Buffers: 3,937

And then:

SET ivfflat.probes = 5;

Sort  (actual time=152.548..152.549 rows=10.00 loops=1)
  Sort Key: ((articles.content_halfvec <=> (InitPlan 1).col1))
  ->  Seq Scan on articles  (actual time=0.144..148.844 rows=24700.00 loops=1)
        Filter: (content_halfvec IS NOT NULL)
Execution Time: 152.584 ms

Same story as HNSW. At probes=5 (searching 5 of 25 cells = 20%), the optimizer decided a sequential scan was cheaper. The query went from 6 ms to 153 ms.

 probes | Execution Time | Plan
--------+----------------+------------
 1      | 1.0 ms         | Index Scan
 2      | 3.7 ms         | Index Scan
 3      | 4.4 ms         | Index Scan
 4      | 5.9 ms         | Index Scan
 5      | 153 ms         | Seq Scan

Takeaway: For IVFFlat with 25 lists, the optimizer flips between probes=4 and probes=5. With sqrt(25) = 5, you’re right at the tipping point. Use SET LOCAL ivfflat.probes = 3 or 4 for a good recall/speed balance. On larger tables (100K+ rows), the flip happens at much higher probe counts because sequential scans become proportionally more expensive.

SET LOCAL: The Production Pattern

Never set ivfflat.probes at the session level in production. Use SET LOCAL inside a transaction:

BEGIN;
    SET LOCAL ivfflat.probes = 3;
    SELECT id, title, content_halfvec <=> (
        SELECT content_halfvec FROM articles WHERE id = 1
    ) AS distance
    FROM articles
    ORDER BY content_halfvec <=> (
        SELECT content_halfvec FROM articles WHERE id = 1
    )
    LIMIT 10;
COMMIT;

-- Verify: probes reverted to default
SHOW ivfflat.probes;  -- 1

The setting reverts automatically after COMMIT/ROLLBACK. No global state leakage. Do the same for hnsw.ef_search.

DO NOT PLAY AROUND WITH SESSION PARAMETERS IN PRODUCTION. Use SET LOCAL inside a transaction, or set them at the function/procedure level. Session-level changes persist until disconnect and can affect every query on that connection, including queries from other clients after connection pooler reuse. Hinting is not a strategy.
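In application code, the same pattern can be wrapped in a small helper. This is a hypothetical sketch against a generic DB-API cursor (psycopg or similar); search_with_probes is not part of any library. Note that SET LOCAL does not accept bind parameters, which is why the probe count is interpolated after an int() guard:

```python
def search_with_probes(cur, query_sql, params, probes=3):
    """Run a vector search with a transaction-scoped ivfflat.probes.

    `cur` is any DB-API cursor. SET LOCAL lasts only until COMMIT/ROLLBACK,
    so nothing leaks into the pooled connection afterwards.
    """
    cur.execute("BEGIN")
    try:
        # SET LOCAL cannot take bind parameters; int() keeps it injection-safe
        cur.execute("SET LOCAL ivfflat.probes = %d" % int(probes))
        cur.execute(query_sql, params)
        rows = cur.fetchall()
        cur.execute("COMMIT")
        return rows
    except Exception:
        cur.execute("ROLLBACK")
        raise
```

The same wrapper works for hnsw.ef_search and hnsw.iterative_scan; only the GUC name changes.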


DiskANN: Leveraging the B-tree Index Principle

DiskANN is provided by the pgvectorscale project from Timescale (SQL extension name: vectorscale). It implements the DiskANN algorithm with Statistical Binary Quantization (SBQ) compression built in.

No 2,000-Dimension Wall

Unlike HNSW and IVFFlat, DiskANN supports the vector type natively up to 16,000 dimensions — no halfvec workaround needed:

CREATE INDEX idx_content_diskann ON articles USING diskann (content_vector)
WITH (storage_layout = memory_optimized);

NOTICE:  Starting index build with num_neighbors=50, search_list_size=100,
         max_alpha=1.2, storage_layout=SbqCompression.  -- memory_optimized maps to SBQ
NOTICE:  Indexed 24700 tuples
Time: 49140.736 ms (00:49.141)

SELECT pg_size_pretty(pg_relation_size('idx_content_diskann'));
-- 21 MB

49 seconds to build, but 21 MB. Compare:

        indexname         |  size  | size_bytes
--------------------------+--------+------------
 idx_content_diskann      | 21 MB  |   22,511,616
 idx_content_hnsw_halfvec | 193 MB |  202,350,592
 idx_content_ivfflat      | 193 MB |  202,522,624

DiskANN is 9x smaller than HNSW and IVFFlat on the same data. SBQ compression is the default — storage_layout = memory_optimized is what you get if you don’t specify a layout. Specifying it explicitly (as in the CREATE INDEX above) is good practice for readability. The alternative plain layout stores full vectors in the index and does not compress.

Query performance is competitive:

Index Scan using idx_content_diskann on articles
  Order By: (content_vector <=> (InitPlan 1).col1)
  Buffers: shared hit=1437 read=132
Execution Time: 2.915 ms

3 ms for a 3,072-dimension nearest neighbor search on 25K articles. On the same data, HNSW does it in 2-6 ms and IVFFlat in 2-10 ms. All three are in the same ballpark for query speed.

Takeaway: DiskANN is the right choice when your HNSW index outgrows shared_buffers. At 193 MB for 25K rows, HNSW on halfvec(3072) would reach ~77 GB at 10 million rows, well beyond what most buffer pools can keep hot. DiskANN’s 9x compression keeps the navigational structure cached while full vectors stay in the heap, fetched only for the final rescore of a few dozen candidates. Same access pattern as a B-tree: compact index in memory, selective heap lookups on demand. The trade-off is longer build times and fewer operator class options (vector type only, no halfvec/bit/sparsevec).

CREATE INDEX CONCURRENTLY

As of pgvectorscale 0.9.0, DiskANN supports CONCURRENTLY:

CREATE INDEX CONCURRENTLY idx_content_diskann
ON articles USING diskann (content_vector)
WITH (storage_layout = memory_optimized);

This is critical for production — you can build the index without locking the table. HNSW and IVFFlat also support CREATE INDEX CONCURRENTLY.

Why These Dimension Limits Exist: Buffer Pages and Bits

If the 2,000 / 4,000 / 16,000 dimension limits seem arbitrary, they’re not. The intuition comes from how PostgreSQL stores data: in 8 KB buffer pages (8,192 bytes). Every index tuple — including the vector representation in a vector index — has to fit within a page. The fewer bytes per dimension, the more dimensions you can pack.

Here’s the back-of-the-envelope arithmetic:

 Encoding             | Bytes per dimension | Theoretical max in 8 KB
----------------------+---------------------+-------------------------
 vector (float32)     | 4 bytes             | 8,192 / 4 = 2,048
 halfvec (float16)    | 2 bytes             | 8,192 / 2 = 4,096
 4-bit quantized (PQ) | 0.5 bytes           | 8,192 * 2 = 16,384
 1-bit binary (SBQ)   | 0.125 bytes         | 8,192 * 8 = 65,536

The theoretical numbers explain the intuition — why halfvec doubles the limit and quantized encodings push it further. The actual limits are slightly lower because of page headers, tuple overhead, and the index metadata stored alongside each vector. For HNSW, each page also stores neighbor connection lists (up to m * 2 neighbor IDs per node); for IVFFlat, each page carries centroid references and list pointers. These eat into the available space, which is why pgvector’s HNSW sets 2,000 (not 2,048) and pgvectorscale’s DiskANN sets 16,000 (not 16,384). But the pattern is unmistakable: the fewer bits per dimension, the more dimensions you can fit in a page.
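The page-budget arithmetic is mechanical enough to write down. A minimal sketch of the theoretical ceilings (the real limits subtract header, tuple, and neighbor-list overhead, as just described):

```python
PAGE_SIZE = 8192  # PostgreSQL buffer page size in bytes

def theoretical_max_dims(bytes_per_dim: float) -> int:
    """Dimension ceiling if an index page held nothing but one vector."""
    return int(PAGE_SIZE / bytes_per_dim)

print(theoretical_max_dims(4))      # vector (float32):  2048 -> pgvector caps at 2,000
print(theoretical_max_dims(2))      # halfvec (float16): 4096 -> capped at 4,000
print(theoretical_max_dims(0.5))    # 4-bit PQ:         16384 -> DiskANN caps at 16,000
print(theoretical_max_dims(0.125))  # 1-bit SBQ:        65536
```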

This is why DiskANN can handle 16,000 dimensions where HNSW on vector tops out at 2,000 — DiskANN stores a compressed representation in the index page, not the full vector. And why halfvec doubled the HNSW/IVFFlat limit from 2,000 to 4,000: half the bytes per dimension, twice the capacity.
This is more than enough for the vast majority of use cases. Most embedding models in production today default to 768–1536 dimensions, well within the 2,000-dimension limit. It also shows how much headroom PostgreSQL’s current vector store implementation has.

How pgvectorscale Compresses: Statistical Binary Quantization (SBQ)

pgvectorscale’s DiskANN (from Timescale, now TigerData) uses a method called Statistical Binary Quantization (SBQ). The idea is deceptively simple: for each dimension, replace the float value with a 1- or 2-bit code.

1-bit mode (default for dimensions >= 900): Each dimension is compressed to a single bit. But not naively — standard binary quantization uses 0.0 as the threshold (positive = 1, negative = 0), which works poorly because real embedding distributions are rarely centered on zero. SBQ instead computes the per-dimension mean across all vectors during index build and uses that as the threshold:

if value > mean_of_this_dimension → 1
else → 0

A 3,072-dimension float32 vector (12,288 bytes) becomes a 3,072-bit string (384 bytes). That’s 32x compression. During search, the query vector is also SBQ-encoded, and distances are computed using XOR + popcount on the bit strings — which modern CPUs execute in a single instruction.
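The 1-bit scheme is compact enough to sketch in a few lines. This is an illustration of the idea, not pgvectorscale’s actual code; the real implementation packs bits into machine words and uses hardware popcount:

```python
def sbq_encode(vectors):
    """1-bit SBQ: threshold each dimension at its per-dimension mean."""
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    codes = []
    for v in vectors:
        bits = 0
        for d in range(dims):
            bits = (bits << 1) | (1 if v[d] > means[d] else 0)
        codes.append(bits)
    return codes, means

def hamming(a: int, b: int) -> int:
    """Quantized distance: XOR the bit strings, then count set bits."""
    return bin(a ^ b).count("1")

# Toy 3-dimension vectors: two similar, one outlier
vecs = [[0.9, -0.2, 0.4], [0.8, -0.1, 0.5], [-0.7, 0.6, -0.3]]
codes, _ = sbq_encode(vecs)
print(hamming(codes[0], codes[1]))  # 0 -- similar vectors collapse to the same code
print(hamming(codes[0], codes[2]))  # 3 -- the outlier flips every bit
```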

2-bit mode (default for dimensions < 900): Each dimension gets two bits, encoding four “zones” based on the z-score (how many standard deviations from the mean). This gives finer granularity at 16x compression instead of 32x.

The accuracy loss is real but small. On common benchmarks, SBQ achieves 96-99% recall compared to exact search. The rescore step (controlled by diskann.query_rescore, default 50) compensates: after the graph traversal finds the top-50 candidates using quantized distances, pgvectorscale fetches the full-precision vectors from the heap and re-computes exact distances to produce the final top-10.

What’s stored where:

 Location     | What's stored                        | Accessed when
--------------+--------------------------------------+----------------------------------------
 Index pages  | SBQ-compressed vectors + graph edges | Every query (graph traversal)
 Heap (table) | Full float32 vectors                 | Only during rescore (top-N candidates)

This two-tier architecture is why DiskANN achieves 21 MB index size: the index only stores 384-byte compressed vectors, not 12 KB originals. The full vectors stay in the table, fetched only for the final re-ranking of a few dozen candidates.

Microsoft’s pg_diskann: A Different Approach

There’s a second DiskANN implementation for PostgreSQL: Microsoft’s pg_diskann, currently documented and distributed for Azure Database for PostgreSQL Flexible Server. It uses the same Vamana graph algorithm but a fundamentally different compression strategy: Product Quantization (PQ).

Where SBQ asks “is this dimension above or below the mean?”, Product Quantization asks “which of 16 codewords best represents this group of dimensions?”

Here’s how PQ works, step by step:

  1. Divide the vector into chunks. A 3,072-dimension vector is split into, say, 1,024 chunks of 3 dimensions each.
  2. Train a codebook per chunk. For each chunk, k-means clustering finds 16 representative codewords (centroids). Why 16? Because 16 values fit in 4 bits — a single hex digit.
  3. Encode each chunk as a 4-bit symbol. During index build, each chunk of 3 dimensions is replaced by the index (0-15) of its closest codeword. The entire 3,072-dimension vector becomes 1,024 symbols of 4 bits each = 512 bytes.
  4. Decode via lookup table at query time. To compute the distance to a query vector, you pre-compute the distance from the query to all 16 codewords for each chunk, creating a 16-row x 1,024-column lookup table. Then for each stored vector, you sum up the table entries corresponding to its symbols. No floating-point multiplication needed — just table lookups and additions.

The compression is dramatic: 3,072 dimensions * 4 bytes = 12,288 bytes → 512 bytes with PQ. That’s 24x compression, in the same ballpark as SBQ’s 32x.
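The steps above can be sketched in miniature. A toy example with 2 codewords per chunk instead of pg_diskann’s 16, and hand-picked codebooks rather than k-means-trained ones (illustrative only):

```python
def build_lookup(query, codebooks):
    """Step 4a: pre-compute squared L2 distance from the query to every
    codeword of every chunk -- one small table per query."""
    chunk = len(codebooks[0][0])
    table = []
    for c, words in enumerate(codebooks):
        q = query[c * chunk:(c + 1) * chunk]
        table.append([sum((qi - wi) ** 2 for qi, wi in zip(q, w)) for w in words])
    return table

def pq_distance(table, code):
    """Step 4b: distance to a stored vector = sum of the table entries
    for its symbols. No float multiplication, just lookups and adds."""
    return sum(table[c][s] for c, s in enumerate(code))

# 4 dimensions split into 2 chunks of 2, with 2 codewords per chunk
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # chunk 0
    [(0.0, 1.0), (1.0, 0.0)],  # chunk 1
]
query = [1.0, 1.0, 0.0, 1.0]
table = build_lookup(query, codebooks)
print(pq_distance(table, [1, 0]))  # 0.0 -- both chunks hit their nearest codeword
print(pq_distance(table, [0, 1]))  # 4.0 -- both chunks mismatch
```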

Comparing the Two DiskANN Implementations

                      | pgvectorscale (Timescale)     | pg_diskann (Microsoft)
----------------------+-------------------------------+-------------------------------------
 Compression          | SBQ (1-2 bits/dim)            | Product Quantization (4 bits/chunk)
 Compression ratio    | 32x (1-bit) or 16x (2-bit)    | ~24x (depends on chunks)
 How it works         | Per-dimension thresholding    | Codebook lookup per chunk
 Distance computation | XOR + popcount (very fast)    | Table lookup + sum
 Trainable            | Minimal (just means + stddev) | Heavy (k-means per chunk)
 Max dimensions       | 16,000                        | 16,000
 Availability         | Open source, self-hosted      | Azure Database for PostgreSQL
 License              | PostgreSQL License            | Distributed via Azure
 Iterative scan       | No                            | Yes (relaxed/strict, ON by default)
 PG version           | PG 14-18 (self-hosted)        | Azure DB for PostgreSQL

Both implementations share the core DiskANN algorithm (Vamana graph) and the two-phase search pattern (compressed scan + full-precision rescore). The difference is how they compress:

  • SBQ is simpler and faster to build (just compute means). It’s a blunt instrument — 1 bit per dimension loses a lot of information, but XOR + popcount is blazingly fast, and the rescore step recovers accuracy.
  • PQ is more sophisticated and retains more information per bit (a 4-bit symbol captures relationships between groups of dimensions). It’s slower to build (k-means training) but can achieve better recall at the same compression ratio, especially for vectors with correlated dimensions.

Takeaway: If you’re self-hosting PostgreSQL, pgvectorscale is your DiskANN option — open source, well-maintained, and the SBQ compression is effective. If you’re on Azure Database for PostgreSQL, you also have pg_diskann, whose PQ compression may give better recall on very high-dimensional data. The underlying algorithm is the same; the compression strategy is the difference.


Iterative Scans

This is the most important query-time feature added to pgvector since HNSW support. If you take one thing from this post, let it be this section.

The Problem

Vector search indexes return the K nearest neighbors, then PostgreSQL applies your WHERE clause. If your filter is selective, you get fewer results than requested.

-- Our dataset has 9 categories with varying selectivity
  category   | cnt   | % of 25K
-------------+-------+---------
 History     | 8719  | 34.9%
 General     | 6221  | 24.9%
 Science     | 2232  |  8.9%
 Mathematics |  116  |  0.5%

When you search for the 10 nearest “Science” articles, the HNSW index returns its ef_search nearest neighbors (say, 40), PostgreSQL filters to keep only Science articles, and you get however many matched. For Science (8.9%), you’ll usually get your 10 results. For Mathematics (0.5%), you won’t.
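The back-of-the-envelope version of that claim, assuming the filter is independent of vector distance (a pessimistic simplification: topically clustered data often needs fewer candidates, which is part of why Science tends to succeed even at ef_search=40):

```python
import math

def candidates_needed(k: int, selectivity: float) -> int:
    """Rough number of index candidates needed to expect k matching rows,
    if matches were spread uniformly through the candidate stream."""
    return math.ceil(k / selectivity)

print(candidates_needed(10, 0.089))  # Science (8.9%):      113 candidates
print(candidates_needed(10, 0.005))  # Mathematics (0.5%): 2000 candidates
```

With a fixed ef_search of 40, Mathematics is hopeless by a factor of 50, which is exactly what the next experiment shows.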

Let’s prove it. Here’s a search for Mathematics articles with ef_search=40, forcing the HNSW index path. (We disable sequential scan here to force the index path on our small 25K dataset. On production tables with 100K+ rows, the optimizer would naturally choose the index without this hint.)

SET enable_seqscan = off;
SET hnsw.iterative_scan = 'off';
SET hnsw.ef_search = 40;

SELECT id, title, category,
       content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1) AS distance
FROM articles
WHERE category = 'Mathematics'
ORDER BY content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1)
LIMIT 10;

Index Scan using idx_content_hnsw_halfvec on articles
  Order By: (content_halfvec <=> (InitPlan 1).col1)
  Filter: (category = 'Mathematics'::text)
  Rows Removed by Filter: 40
  Index Searches: 1
Execution Time: 0.766 ms
(0 rows)

Zero rows returned. The index fetched 40 candidates, all 40 were filtered out (none were Mathematics), and the query silently returned an empty result set. This is the core problem: your application asked for 10 results and got 0.

Before iterative scans, you had two bad options:

  1. Over-fetch (LIMIT 1000) and hope enough rows match — wasteful and unreliable
  2. Sequential scan — correct but slow on large tables

The Solution: Iterative Index Scans

pgvector 0.8.0 introduced iterative scans. Instead of fetching a fixed batch and filtering, the index keeps fetching more candidates until the filter is satisfied.

SET enable_seqscan = off;
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 40;

SELECT id, title, category,
       content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1) AS distance
FROM articles
WHERE category = 'Mathematics'
ORDER BY content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1)
LIMIT 10;

Index Scan using idx_content_hnsw_halfvec on articles
  Order By: (content_halfvec <=> (InitPlan 1).col1)
  Filter: (category = 'Mathematics'::text)
  Rows Removed by Filter: 6809
  Index Searches: 171
Execution Time: 4235.805 ms
(10 rows)

10 rows returned. The index scanned 171 times, examined 6,819 candidates, filtered out 6,809 non-Mathematics rows, and delivered exactly 10 results. It took 4.2 seconds — much slower than the 0.8 ms empty result — but you got a correct answer instead of silence.

At 6,819 tuples scanned, this was well within the default max_scan_tuples of 20,000. On a table with millions of rows and the same 0.5% selectivity, the scan might hit the 20,000 limit before finding 10 matches — you’d get a partial result set. That’s the trade-off the safety valve makes: bounded latency vs guaranteed result count.
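On the application side, a short row count is the only signal that the cap was hit, so it is worth checking for explicitly. A hypothetical sketch (ann_query and exact_query are stand-ins for your two query paths, injected as callables so the pattern is testable without a database):

```python
def search_with_fallback(ann_query, exact_query, k: int):
    """Guarantee k results: if the iterative scan came back short
    (max_scan_tuples was likely hit), rerun with an exact scan."""
    rows = ann_query(k)
    if len(rows) >= k:
        return rows
    return exact_query(k)
```

The exact path is the expensive one, so you only pay for it on the rare, highly selective filters that actually truncate.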

That 4.2-second cost reflects the extreme selectivity: Mathematics is only 0.5% of the data, so the index had to traverse deep into the graph. For moderate selectivity like Science (8.9%), the overhead is negligible:

SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 40;

SELECT ... WHERE category = 'Science' ORDER BY ... LIMIT 10;

Index Scan using idx_content_hnsw_halfvec on articles
  Order By: (content_halfvec <=> (InitPlan 1).col1)
  Filter: (category = 'Science'::text)
  Rows Removed by Filter: 26
  Index Searches: 1
Execution Time: 1.058 ms

Same feature, but for Science the index found 10 matching rows in a single search pass — no extra work needed.

Two Modes

  • relaxed_order: Results are approximately ordered by distance. Slightly faster. Good enough for most use cases.
  • strict_order: Results are exactly ordered by distance. Slightly slower. Use when ranking precision matters.

SET hnsw.iterative_scan = 'strict_order';
-- Execution Time: 0.885 ms

The Safety Valve: max_scan_tuples

To prevent runaway scans on extremely selective filters (imagine filtering for a category that has 1 row in 10 million), there’s a safety limit:

SET hnsw.max_scan_tuples = 500;   -- Restrictive: stop after 500 index tuples
SET hnsw.max_scan_tuples = 20000; -- Default: generous enough for most workloads
SET hnsw.max_scan_tuples = 0;     -- Unlimited (use with caution)

IVFFlat Too

Same concept, different GUC prefix:

SET ivfflat.iterative_scan = 'relaxed_order';
SET ivfflat.probes = 3;

SELECT ... WHERE category = 'Science' ORDER BY ... LIMIT 10;
-- Execution Time: 2.001 ms

Filtered Results in Action

With iterative scan, every result matches the filter:

SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 100;

SELECT id, title, category,
       round((content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1))::numeric, 4) AS distance
FROM articles
WHERE category = 'Science'
ORDER BY content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1)
LIMIT 10;

  id   |  title   | category | distance
-------+----------+----------+----------
     1 | April    | Science  |   0.0000
  7862 | April 23 | Science  |   0.2953
  9878 | April 25 | Science  |   0.3076
   469 | May      | Science  |   0.3082
  9880 | April 24 | Science  |   0.3451
   402 | July     | Science  |   0.3453
  5156 | April 4  | Science  |   0.3531
  9530 | April 7  | Science  |   0.3588
 34906 | 2013     | Science  |   0.3674
  9149 | April 8  | Science  |   0.3690

All 10 results are Science. All sorted by cosine distance. No over-fetching, no sequential scan. (The titles are Wikipedia date articles — “April”, “May”, etc. — that happen to be classified under Science in this dataset.)

Multi-Filter Combinations

Iterative scans work with compound WHERE clauses too:

SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 100;

SELECT id, title, category, word_count
FROM articles
WHERE category = 'Science' AND word_count > 1000
ORDER BY content_halfvec <=> (SELECT content_halfvec FROM articles WHERE id = 1)
LIMIT 10;

Limit  (actual time=7.098..7.105 rows=10.00 loops=1)
  ->  Sort  (actual time=7.097..7.101 rows=10.00 loops=1)
        Sort Key: ((content_halfvec <=> (InitPlan 1).col1))
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Bitmap Heap Scan on articles  (actual time=0.690..7.027 rows=444.00 loops=1)
              Recheck Cond: ((word_count > 1000) AND (category = 'Science'::text))
              ->  BitmapAnd  (actual time=0.641..0.643 rows=0.00 loops=1)
                    ->  Bitmap Index Scan on idx_articles_word_count  (rows=1806.00)
                    ->  Bitmap Index Scan on idx_articles_category    (rows=2232.00)
Execution Time: 7.542 ms

Here the PostgreSQL optimizer did something clever: it combined two B-tree indexes (BitmapAnd) instead of using the HNSW iterative scan. On a 25K-row table that fits in memory, this is often cheaper. At scale (millions of rows), the iterative scan path wins because the bitmap approach requires reading too many heap pages.

Takeaway: Iterative scans are the answer to “vector search + WHERE doesn’t work.” Enable them with SET LOCAL hnsw.iterative_scan = 'relaxed_order' inside transactions. The safety valve max_scan_tuples = 20000 is a sensible default.


Quantization and Storage

With 3,072-dimension embeddings, storage is the elephant in the room. Here’s what each representation costs per row:

SELECT
  pg_size_pretty(avg(pg_column_size(content_vector)))  AS vector_size,
  pg_size_pretty(avg(pg_column_size(content_halfvec))) AS halfvec_size,
  pg_size_pretty(avg(pg_column_size(content_bq)))      AS binary_size
FROM articles WHERE content_vector IS NOT NULL;

 vector_size | halfvec_size | binary_size
-------------+--------------+-------------
 12 kB       | 6148 bytes   | 392 bytes

 Type          | Bytes per dimension | Per-row (3072d) | Savings
---------------+---------------------+-----------------+----------
 vector(3072)  | 4 bytes (float32)   | 12,288 bytes    | baseline
 halfvec(3072) | 2 bytes (float16)   | 6,144 bytes     | 50%
 bit(3072)     | 1/8 byte (1 bit)    | 384 bytes       | 97%

At 1 million rows:

 Type          | Column storage | HNSW index
---------------+----------------+---------------------------
 vector(3072)  | ~12 GB         | N/A (2000-dim limit)
 halfvec(3072) | ~6 GB          | ~7.7 GB
 bit(3072)     | ~0.4 GB        | ~0.8 GB (bit_hamming_ops)

The storage math is brutal for high-dimensional embeddings. This is why DiskANN’s built-in SBQ compression matters: it gets you to 21 MB where HNSW on halfvec costs 193 MB.

Binary Quantize + Re-Ranking

Binary quantization crushes each dimension to a single bit (positive = 1, negative = 0). It’s lossy, but very fast for coarse filtering. The pattern is a two-phase search:

-- Phase 1: Hamming distance on binary (fast, coarse) → 100 candidates
-- Phase 2: Cosine distance on full vector (precise) → 10 results

WITH coarse AS (
    SELECT id, title, content_vector
    FROM articles
    WHERE content_bq IS NOT NULL
    ORDER BY content_bq <~> (
        SELECT binary_quantize(content_vector)::bit(3072)
        FROM articles WHERE id = 1
    )
    LIMIT 100
)
SELECT id, title,
       content_vector <=> (SELECT content_vector FROM articles WHERE id = 1) AS distance
FROM coarse
ORDER BY content_vector <=> (SELECT content_vector FROM articles WHERE id = 1)
LIMIT 10;

Limit  (actual time=29.021..29.026 rows=10.00 loops=1)
  ->  Sort on coarse  (actual time=29.020..29.023 rows=10.00 loops=1)
        ->  Subquery Scan  (actual time=27.805..28.999 rows=100.00 loops=1)
              ->  Sort by content_bq <~>  (actual time=27.744..27.754 rows=100.00 loops=1)
                    ->  Seq Scan on articles  (actual time=0.435..24.288 rows=24700.00 loops=1)
Execution Time: 29.127 ms

29 ms without an index on content_bq. With an HNSW index using bit_hamming_ops, Phase 1 would be sub-millisecond. The re-ranking in Phase 2 only touches 100 full vectors instead of 25,000.
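The same two-phase pattern, sketched client-side in Python so the mechanics are visible. Toy vectors and a brute-force phase 1; in the database, phase 1 would be the bit_hamming_ops index scan:

```python
def binary_quantize(v):
    """Sign-threshold quantization, like pgvector's binary_quantize()."""
    return sum((1 if x > 0 else 0) << i for i, x in enumerate(v))

def hamming(a, b):
    return bin(a ^ b).count("1")

def cosine_distance(a, b):
    norm = lambda v: sum(x * x for x in v) ** 0.5
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm(a) * norm(b))

def two_phase_search(query, rows, coarse_k=100, k=10):
    """Phase 1: Hamming on binary codes (coarse). Phase 2: exact cosine rerank."""
    qb = binary_quantize(query)
    coarse = sorted(rows, key=lambda r: hamming(qb, binary_quantize(r)))[:coarse_k]
    return sorted(coarse, key=lambda r: cosine_distance(query, r))[:k]

rows = [[1.0, 1.0, 1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, -1.0], [0.9, 0.8, 1.2]]
print(two_phase_search([1.0, 1.0, 1.0], rows, coarse_k=3, k=1))  # [[1.0, 1.0, 1.0]]
```

Phase 2 touches only coarse_k full vectors, which is the entire point: the expensive exact metric runs on 100 rows, not 25,000.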

DBA Takeaway: Use halfvec as your default for 3072-dimension embeddings. Use binary quantize + re-ranking when you need to search billions of rows and can tolerate a two-phase approach. DiskANN’s SBQ gives you similar compression automatically.


Operators and Sargability

pgvector provides four distance operators:

 Operator | Distance                 | Operator class    | Use when
----------+--------------------------+-------------------+----------------------------------------
 <=>      | Cosine                   | vector_cosine_ops | Normalized embeddings (most common)
 <->      | L2 (Euclidean)           | vector_l2_ops     | Absolute distance matters
 <#>      | Inner product (negative) | vector_ip_ops     | Pre-normalized, slight speed edge
 <+>      | L1 (Manhattan)           | vector_l1_ops     | Sparse-like behavior on dense vectors

Wrong Operator = No Index

This is the single most common mistake. If your index uses halfvec_cosine_ops but your query uses <-> (L2), the index cannot be used:

-- CORRECT: cosine operator on cosine index → Index Scan
EXPLAIN (COSTS OFF)
SELECT ... ORDER BY content_halfvec <=> (...) LIMIT 10;

 Limit
   ->  Index Scan using idx_content_hnsw_halfvec on articles
         Order By: (content_halfvec <=> (InitPlan 1).col1)

-- WRONG: L2 operator on cosine index → Seq Scan!
EXPLAIN (COSTS OFF)
SELECT ... ORDER BY content_halfvec <-> (...) LIMIT 10;

 Limit
   ->  Sort
         Sort Key: ((articles.content_halfvec <-> (InitPlan 1).col1))
         ->  Seq Scan on articles

The planner can’t use a cosine-distance index for an L2-distance query. They’re different metrics with different orderings. If you see an unexpected Seq Scan on a vector query, check your operator first.
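To see that the two metrics genuinely produce different orderings, and therefore why one index cannot serve the other, here is a toy illustration with deliberately unnormalized 2-D vectors. (For normalized vectors the cosine and L2 orderings actually coincide, which is one reason normalized embeddings are so convenient.)

```python
import math

def l2(a, b):
    return math.dist(a, b)

def cosine_dist(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))

query = (1.0, 0.0)
a = (10.0, 0.0)  # same direction as the query, but far away
b = (1.0, 1.0)   # different direction, but close in absolute terms

print(cosine_dist(query, a) < cosine_dist(query, b))  # True: cosine prefers a
print(l2(query, a) < l2(query, b))                    # False: L2 prefers b
```

Since the two metrics disagree on which candidate is "nearer", an index graph built for one ordering is simply the wrong data structure for the other.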

Sargable Queries: The Cross-Join Trap

This is the pattern I see most often in the wild, and it’s wrong:

-- BAD: cross-join prevents index use
SELECT a.id, a.title,
       a.content_halfvec <=> b.content_halfvec AS distance
FROM articles a, articles b
WHERE b.id = 1
ORDER BY a.content_halfvec <=> b.content_halfvec
LIMIT 10;

Limit
  ->  Sort
        Sort Key: ((a.content_halfvec <=> b.content_halfvec))
        ->  Nested Loop
              ->  Index Scan using articles_pkey on articles b
                    Index Cond: (id = 1)
              ->  Seq Scan on articles a     ← NO INDEX!

The planner sees a.content_halfvec <=> b.content_halfvec as a join condition, not an index-scan ordering. It can’t push the ORDER BY into the vector index because the right-hand side comes from a different table reference.

The fix: use a scalar subquery:

-- GOOD: scalar subquery → index used
SELECT id, title,
       content_halfvec <=> (
           SELECT content_halfvec FROM articles WHERE id = 1
       ) AS distance
FROM articles
ORDER BY content_halfvec <=> (
    SELECT content_halfvec FROM articles WHERE id = 1
)
LIMIT 10;

Limit
  InitPlan 1
    ->  Index Scan using articles_pkey on articles articles_1
          Index Cond: (id = 1)
  ->  Index Scan using idx_content_hnsw_halfvec on articles
        Order By: (content_halfvec <=> (InitPlan 1).col1)

The scalar subquery is evaluated once (InitPlan), then the result is treated as a constant. Now the planner can push the ORDER BY into the HNSW index scan. Same results, but with the index instead of a sequential scan.

Takeaway: For this nearest-neighbor ORDER BY pattern, always use scalar subqueries for vector distance calculations, never cross-joins. This is the vector-search equivalent of writing sargable WHERE clauses for B-tree indexes.

Partial Indexes

If you frequently filter by a specific category, a partial index is dramatically more efficient:

CREATE INDEX idx_content_hnsw_science
ON articles USING hnsw (content_halfvec halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64)
WHERE category = 'Science';

Time: 1420.207 ms (00:01.420)

        indexname         |  size
--------------------------+--------
 idx_content_hnsw_halfvec | 193 MB   ← full table (25,000 rows)
 idx_content_hnsw_science |  17 MB   ← Science only (2,232 rows)

11x smaller, 20x faster to build. If your application queries consistently filter by a known set of categories, partial indexes are the single biggest optimization available. Build one per high-value category.


Monitoring

Index Inventory

SELECT indexname,
       pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
       indexdef
FROM pg_indexes
WHERE tablename = 'articles'
ORDER BY pg_relation_size(indexname::regclass) DESC;

        indexname        |  size   | indexdef
-------------------------+---------+-------------------------------------------
 idx_content_hnsw_halfvec| 193 MB  | USING hnsw ... WITH (m='16', ef_construction='64')
 idx_content_ivfflat     | 193 MB  | USING ivfflat ... WITH (lists='25')
 idx_content_diskann     | 21 MB   | USING diskann ... WITH (storage_layout=memory_optimized)
 articles_pkey           | 1384 kB | USING btree (id)
 idx_articles_category   | 1192 kB | USING btree (category)
 idx_articles_word_count | 904 kB  | USING btree (word_count)

Adding up the three vector indexes (193 + 193 + 21 = 407 MB) plus B-tree indexes (3 MB), the total index footprint is over 410 MB for a 90 MB table. The indexes are ~4.5x the data. This is typical for high-dimensional vector data — plan your storage accordingly.

Note: In practice you’d pick one vector index, not all three. With just HNSW + B-tree indexes, the ratio drops to ~2x.

Settings

Know what settings exist and what they default to:

SELECT name, setting, short_desc
FROM pg_settings
WHERE name ~ '^(hnsw|ivfflat|diskann)\.'
ORDER BY name;

                    name                    | setting | short_desc
--------------------------------------------+---------+-------------------------------------------
 hnsw.ef_search                             | 40      | Dynamic candidate list size for search
 hnsw.iterative_scan                        | off     | Mode for iterative scans
 hnsw.max_scan_tuples                       | 20000   | Max tuples to visit for iterative scans
 hnsw.scan_mem_multiplier                   | 1       | Multiple of work_mem for iterative scans
 ivfflat.iterative_scan                     | off     | Mode for iterative scans
 ivfflat.max_probes                         | 32768   | Max probes for iterative scans
 ivfflat.probes                             | 1       | Number of probes
 diskann.query_search_list_size             | 100     | Search list size for queries
 diskann.query_rescore                      | 50      | Rescore candidates

Takeaway: hnsw.iterative_scan and ivfflat.iterative_scan default to off. If your application relies on vector search with WHERE clauses, you need to explicitly enable iterative scans.
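A quick simulation shows why the default matters. With iterative scans off, a filtered vector query post-filters the index's candidate list, so a selective WHERE clause can starve the result set; iterative mode keeps pulling candidates until enough rows match. A pure-Python sketch of the two behaviors on toy data, with 10% of rows in the target category:

```python
import random

random.seed(7)
N, K = 10_000, 40  # K mirrors hnsw.ef_search's default of 40

# Toy rows: ~10% are 'Science', each with a random distance to the query.
rows = [{"id": i,
         "category": random.choice(["Science"] + ["Other"] * 9),
         "distance": random.random()}
        for i in range(N)]
by_distance = sorted(rows, key=lambda r: r["distance"])

# Non-iterative scan: the index returns the K nearest rows first,
# THEN the WHERE clause filters them (post-filtering).
nearest = by_distance[:K]
survivors = [r for r in nearest if r["category"] == "Science"]
print(len(survivors))  # typically ~4: far fewer than a LIMIT 10 asked for

# Iterative scan: keep pulling candidates in distance order
# until 10 matching rows are found.
matches = []
for r in by_distance:
    if r["category"] == "Science":
        matches.append(r)
        if len(matches) == 10:
            break
print(len(matches))  # 10
```

The trade-off is visible in the second loop: the more selective the filter, the deeper the iterative scan has to dig, which is what max_scan_tuples and max_probes are there to bound.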

Access Method Capabilities

Not sure which operator class works with which index type? Query the catalog:

SELECT am.amname AS access_method,
       opc.opcname AS operator_class,
       t.typname AS data_type
FROM pg_opclass opc
JOIN pg_am am ON am.oid = opc.opcmethod
JOIN pg_type t ON t.oid = opc.opcintype
WHERE am.amname IN ('hnsw', 'ivfflat', 'diskann')
ORDER BY am.amname, opc.opcname;

 Access method | Data types                      | Operator classes
---------------+---------------------------------+-------------------------------
 HNSW          | vector, halfvec, bit, sparsevec | 18 classes (broadest support)
 IVFFlat       | vector, halfvec, bit            | 7 classes
 DiskANN       | vector only (+ label filtering) | 4 classes

HNSW is the most versatile. DiskANN is the most constrained. IVFFlat falls in between.

Build Progress Monitoring

While building a large index:

SELECT phase,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct_done,
       tuples_done, tuples_total
FROM pg_stat_progress_create_index;

This works for HNSW and IVFFlat builds (pgvector reports progress). DiskANN builds from pgvectorscale don’t currently report to this view.


Decision Guidelines

Here’s how to choose:

Start with HNSW unless you have a reason not to. It has the broadest operator support, good recall, and predictable performance. Use halfvec if your dimensions exceed 2,000.

Choose IVFFlat if build time matters more than query time. IVFFlat builds 5-6x faster than HNSW. If your data distribution shifts materially, plan a REINDEX to refresh clustering quality (centroid drift). A practical signal: if recall drops without any change in query patterns, or if you’ve inserted more than ~30% new rows since the last build, the centroids are likely stale. For a deeper look at how embedding model upgrades trigger reindexing at scale, see Embedding Versioning with pgvector.

Choose DiskANN if storage is the constraint. The 9x compression is decisive at scale. It handles high dimensions natively (no halfvec needed) and supports CONCURRENTLY for production deployments.

Enable iterative scans for vector search + filtering in production. The trade-off is potentially higher latency on very selective filters (the index does more work to find matching rows), but that’s usually the right trade-off for correctness. Tune max_scan_tuples / max_probes to bound worst-case work. Use relaxed_order by default, strict_order when ranking precision matters.

Use partial indexes for category-specific searches. An 11x size reduction and 20x faster build is hard to argue with.

Use SET LOCAL for all vector parameter changes in production, after you have tested them.


Summary

 Feature                 | HNSW                            | IVFFlat              | DiskANN (pgvectorscale) | DiskANN (pg_diskann)
-------------------------+---------------------------------+----------------------+-------------------------+-----------------------------
 Provider                | pgvector                        | pgvector             | Timescale (open source) | Microsoft (Azure DB for PG)
 Max dims (vector)       | 2,000                           | 2,000                | 16,000                  | 16,000
 Max dims (halfvec)      | 4,000                           | 4,000                | N/A                     | N/A
 Compression             | Via halfvec                     | Via halfvec          | SBQ (1-2 bits/dim)      | PQ (4 bits/chunk)
 Build time (25K, 3072d) | 29s                             | 5s                   | 49s                     | N/A
 Index size              | 193 MB                          | 193 MB               | 21 MB                   | Similar
 Query time              | 2-6 ms                          | 2-10 ms              | 3 ms                    | ~3 ms
 Key tuning param        | ef_search                       | probes               | query_search_list_size  | search list / PQ params
 Iterative scan          | Yes                             | Yes                  | No                      | Yes (on by default)
 CONCURRENTLY            | Yes                             | Yes                  | Yes (0.9.0+)            | Yes
 Data types              | vector, halfvec, bit, sparsevec | vector, halfvec, bit | vector only             | vector only

Dimension limits are largely explained by PostgreSQL’s 8 KB page size and encoding density (exact cutoffs are implementation-defined):

 Encoding          | Bits/dim | Theoretical max | Actual limit | Who uses it
-------------------+----------+-----------------+--------------+----------------
 float32 (vector)  | 32       | 2,048           | 2,000        | HNSW, IVFFlat
 float16 (halfvec) | 16       | 4,096           | 4,000        | HNSW, IVFFlat
 PQ symbol         | 4        | 16,384          | 16,000       | pg_diskann
 SBQ binary        | 1        | 65,536          | 16,000       | pgvectorscale
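The "theoretical max" column is just the 8 KB page divided by the bits each encoding spends per dimension. This is a simplification: the practical limits sit a bit lower, and as noted above the exact cutoffs are implementation-defined.

```python
# Theoretical dimension ceilings from an 8 KB heap page:
# page size in bits / bits per dimension.
PAGE_BITS = 8192 * 8  # 8 KB page

encodings = {
    "float32 (vector)":  32,
    "float16 (halfvec)": 16,
    "PQ symbol":          4,
    "SBQ binary":         1,
}

for name, bits in encodings.items():
    print(f"{name}: {PAGE_BITS // bits}")  # 2048, 4096, 16384, 65536
```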

The compression story is ultimately a story about how many bits of information you need per dimension to navigate the index. pgvector stores full-precision values; DiskANN stores just enough to find the right neighborhood, then goes back to the heap for exact distances.
But here’s the thing: exact results are not necessarily what you want. In a RAG pipeline, the retrieved documents are context for a language model that will synthesize and rephrase an answer — not return rows verbatim. Whether your top-10 results are ranked by exact cosine distance or by a 96%-accurate approximation rarely changes the generated answer. The same is true for recommendations, semantic deduplication, and most classification workflows: the downstream consumer is tolerant of small ranking variations.

The real production trade-off is not precision vs approximation — it’s the balance between retrieval speed, resource efficiency, and result quality at your scale. An HNSW index that doesn’t fit in shared_buffers and hits disk on every query will give you worse effective results than a DiskANN index that stays cached and returns slightly less precise distances in a fraction of the time. The best retrieval is the one that actually runs within your latency budget.

The lab, with all SQL scripts and the Python embedding pipeline, is available here: lab/06_pgvector_indexes.


All benchmarks: PostgreSQL 18, pgvector 0.8.1, pgvectorscale 0.9.0, 25K Wikipedia articles, 3072d text-embedding-3-large embeddings (4 vCPUs, 8 GB RAM).