Introduction
This is Part 2 of the embedding versioning series. In Part 1, I covered the theory: why event-driven embedding refresh matters, the three levels of architecture (triggers, logical replication, Flink CDC), and how to detect and skip insignificant changes. If you haven’t read it, go there first; this post won’t rehash the reasoning behind the designs, it just demonstrates how they work in practice.
Here, I’m going to run the whole thing on the Wikipedia dataset from the pgvector_RAG_search_lab repository. 25,000 articles, triggers, OpenAI API calls, real numbers.
The goal is to answer the questions you’d actually have when implementing this:
- How do you adapt the schema to an existing table that wasn’t designed for versioning?
- What do the SKIP vs EMBED decisions actually look like with real data?
- Does SELECT FOR UPDATE SKIP LOCKED really work with concurrent workers?
- What does the freshness monitoring report show in practice?
- How does the quality feedback loop close the circle?
All the code is in the lab/05_embedding_versioning/ directory of the repository.
What’s in the lab directory
Before diving in, here’s what each file does — so you know what you’re running:
lab/05_embedding_versioning/
├── schema.sql # DDL: tables, triggers, indexes
├── worker.py # Embedding worker (claims queue items, calls OpenAI, writes vectors)
├── change_detector.py # Compares new vs old embeddings to decide SKIP or EMBED
├── freshness_monitor.py # Generates a full health report on embedding staleness
└── examples/
├── simulate_document_changes.py # Generates a realistic mix of article mutations
├── targeted_mutations.py # Applies specific change types to specific articles
├── demo_skip_locked.py # Demonstrates concurrent worker queue distribution
├── demo_trigger_flow.py # End-to-end: UPDATE → trigger → queue → embed
└── demo_quality_drift.py # Simulates declining search quality + automatic re-queuing
Every script connects to the local wikipedia database and uses the same embedding queue. They’re designed to run sequentially — each step builds on the state left by the previous one.
The Starting Point
My lab environment runs PostgreSQL 17.6 with pgvector 0.8.0 and pgvectorscale (DiskANN). The articles table already has 25,000 Wikipedia articles with dense and sparse embeddings from the previous labs (the sparsevec(30522) column holds SPLADE sparse vectors — 30,522 is the BERT WordPiece vocabulary size):
wikipedia=# \d articles
Table "public.articles"
Column | Type | Collation | Nullable | Default
------------------------+------------------+-----------+----------+---------
id | integer | | not null |
url | text | | |
title | text | | |
content | text | | |
title_vector | vector(1536) | | |
content_vector | vector(1536) | | |
vector_id | integer | | |
content_tsv | tsvector | | |
title_content_tsvector | tsvector | | |
content_sparse | sparsevec(30522) | | |
title_vector_3072 | vector(3072) | | |
content_vector_3072 | vector(3072) | | |
Indexes:
"articles_pkey" PRIMARY KEY, btree (id)
"articles_content_3072_diskann" diskann (content_vector_3072)
"articles_sparse_hnsw" hnsw (content_sparse sparsevec_cosine_ops)
"articles_title_vector_3072_diskann" diskann (title_vector_3072)
"idx_articles_content_tsv" gin (content_tsv)
"idx_articles_title_content_tsvector" gin (title_content_tsvector)
Triggers:
tsvectorupdate BEFORE INSERT OR UPDATE ON articles FOR EACH ROW ...
tsvupdate BEFORE INSERT OR UPDATE ON articles FOR EACH ROW ...
No content_hash, no updated_at, no versioned embeddings. This is the reality of most existing deployments — you need to retrofit versioning without breaking what’s already working.
Step 1: Apply the Versioning Schema
What schema.sql does
The schema file adapts the generic pattern from Part 1 to the existing articles table. It runs inside a single transaction and performs these operations in order:
- Adds two columns to articles: content_hash TEXT and updated_at TIMESTAMPTZ DEFAULT now()
- Creates a BEFORE trigger (trg_content_hash) that automatically computes md5(content) before every INSERT or UPDATE of the content column — this is our change detection fingerprint
- Backfills content_hash for all 25,000 existing articles with UPDATE articles SET content_hash = md5(content)
- Creates article_embeddings_versioned — the versioned embeddings table with model_name, model_version, source_hash, is_current, and a partial DiskANN index on WHERE is_current = true
- Creates embedding_queue — the work queue with status, content_hash, change_type, claimed_at, and retry tracking
- Creates embedding_change_log — records every SKIP/EMBED decision with similarity scores for audit
- Creates retrieval_quality_log — for the quality feedback loop (Step 9b)
- Creates an AFTER trigger (trg_queue_embedding) that fires on INSERT OR UPDATE OF content and inserts a queue entry automatically
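As a sanity check outside the database, the fingerprint the BEFORE trigger computes can be reproduced in a few lines of Python (hashlib.md5 yields the same hex digest as PostgreSQL's md5() for UTF-8 text; the function name here is illustrative):

```python
import hashlib

def content_fingerprint(content: str) -> str:
    # Same 32-char hex digest as the trigger's md5(content),
    # assuming a UTF-8 database encoding.
    return hashlib.md5(content.encode("utf-8")).hexdigest()

# Any content change, even a one-character append, flips the fingerprint.
before = content_fingerprint("April is the fourth month of the year.")
after = content_fingerprint("April is the fourth month of the year. [test trigger]")
assert before != after and len(before) == 32
```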
Key differences from the “clean” schema in Part 1:
- No generated column for content_hash: GENERATED ALWAYS AS (md5(content)) STORED would rewrite the entire 25K-row table. The BEFORE trigger achieves the same result without a table rewrite — important for large production tables.
- Column-targeted trigger: AFTER UPDATE OF content instead of AFTER UPDATE. The trigger only fires when the content column is touched — title-only or metadata-only updates are ignored at the PostgreSQL level, not inside application code.
- Table naming: article_embeddings_versioned (not document_embeddings) to match the existing articles table naming convention.
psql -d wikipedia -f lab/05_embedding_versioning/schema.sql
BEGIN
ALTER TABLE
CREATE FUNCTION
DROP TRIGGER
CREATE TRIGGER
UPDATE 25000
CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE TABLE
CREATE TABLE
CREATE FUNCTION
DROP TRIGGER
CREATE TRIGGER
COMMIT
Let me walk through the important lines:
- ALTER TABLE — adds the content_hash and updated_at columns
- CREATE FUNCTION + CREATE TRIGGER (first pair) — the BEFORE trigger that computes md5(content)
- UPDATE 25000 — the backfill. This is the most expensive line: PostgreSQL computes MD5 for every article and writes the hash. On 25K rows it takes a few seconds; on millions of rows, plan a maintenance window
- CREATE TABLE + CREATE INDEX (×3) — the versioned embeddings table with its partial DiskANN index, version lookup index, and staleness detection index
- CREATE FUNCTION + CREATE TRIGGER (second pair) — the AFTER trigger that queues embedding work
After applying, the table now has versioning infrastructure:
wikipedia=# \d articles
Column | Type | Nullable | Default
------------------------+--------------------------+----------+---------
...existing columns...
content_hash | text | |
updated_at | timestamp with time zone | | now()
Referenced by:
TABLE "article_embeddings_versioned" CONSTRAINT ... FOREIGN KEY (article_id) ...
TABLE "embedding_queue" CONSTRAINT ... FOREIGN KEY (article_id) ...
Triggers:
trg_content_hash BEFORE INSERT OR UPDATE OF content ON articles ...
trg_queue_embedding AFTER INSERT OR UPDATE OF content ON articles ...
tsvectorupdate BEFORE INSERT OR UPDATE ON articles ...
tsvupdate BEFORE INSERT OR UPDATE ON articles ...
Two new triggers alongside the existing tsvector triggers. They coexist without conflict because trg_content_hash is BEFORE (updates the hash) and trg_queue_embedding is AFTER (queues the embedding work using the already-computed hash).
Four new tables — article_embeddings_versioned, embedding_queue, embedding_change_log, retrieval_quality_log — plus their supporting indexes.
Step 2: Test the Trigger Manually
Before running anything complex, verify the trigger actually works. This is just a sanity check — one UPDATE, then look at the queue:
wikipedia=# SELECT id, title, content_hash FROM articles WHERE id = 1;
id | title | content_hash
----+-------+----------------------------------
1 | April | 47761052aee1158134fc07f3f7337952
wikipedia=# UPDATE articles SET content = content || ' [test trigger]' WHERE id = 1;
UPDATE 1
wikipedia=# SELECT id, article_id, status, content_hash, change_type, queued_at
FROM embedding_queue ORDER BY queued_at DESC LIMIT 5;
id | article_id | status | content_hash | change_type | queued_at
----+------------+---------+----------------------------------+----------------+-------------------------------
1 | 1 | pending | 59e5ebe6fa9fce7ab87beccf6523dda6 | content_update | 2026-02-18 14:38:01.626792+00
What happened here, step by step:
- We checked article 1 (“April”) — its content_hash was 4776...
- We appended ' [test trigger]' to its content
- The BEFORE trigger (trg_content_hash) fired first, recomputing content_hash to 59e5... (the new MD5)
- The AFTER trigger (trg_queue_embedding) fired next, inserting a row into embedding_queue with the new hash and change_type = 'content_update'
- The queue entry has status = 'pending' — nothing has processed it yet
The change_type column is important: it’s how we’ll later distinguish content-triggered re-embeddings from quality-triggered ones (Step 9b).
Step 3: Simulate 50 Document Mutations
Real knowledge bases don’t get 1 change at a time. The simulate_document_changes.py script generates a realistic mix of changes to random articles.
What the script does
The script picks 50 random articles from the database and applies one of five mutation types to each, chosen by a weighted random distribution that mimics real-world editing patterns:
- typo_fix (most common): appends a period or fixes a word — the kind of minor edit that shouldn’t trigger re-embedding
- paragraph_add: appends a substantial paragraph (3-5 sentences) — new information that changes the semantic content
- section_rewrite: replaces a portion of the article with new text — significant semantic shift
- major_rewrite: rewrites most of the article — entirely new embedding needed
- metadata_only: changes only the title (not the content) — should NOT trigger the embedding pipeline at all
python examples/simulate_document_changes.py --count 50
Mutation Summary:
----------------------------------------
major_rewrite 3
metadata_only 6
paragraph_add 15
section_rewrite 4
typo_fix 22
TOTAL 50
This distribution is realistic: most changes are minor fixes, a smaller portion adds new content, and a few are major rewrites. The 6 metadata_only changes simulate edits to fields other than content — think correcting a title or updating a URL.
Now check the queue:
wikipedia=# SELECT status, count(*) FROM embedding_queue GROUP BY status;
status | count
---------+-------
pending | 44
50 mutations, but only 44 queue entries. Where did the other 6 go?
The 6 metadata_only mutations changed only the title (not content), so the trigger — which fires on UPDATE OF content — didn’t fire for them. Those 6 changes never reached the embedding pipeline. This is the first cost optimization, and it happens at the PostgreSQL trigger level with zero application code, zero API calls, zero overhead.
Why this matters: In a real knowledge base, a meaningful fraction of updates are metadata-only — tags, categories, status flags, author fields (in some orgs, 30-50% of all UPDATEs). Filtering them at the trigger level means your embedding worker never even sees them.
Step 4: Change Detection Without a Baseline
Now let’s run the change detector to see which items should be embedded vs skipped.
What change_detector.py does
The change detector is the “smart filter” in our pipeline. For each pending queue item, it:
- Fetches the article’s current content from the articles table
- Looks up the most recent embedding for that article in article_embeddings_versioned
- If no previous embedding exists: marks the item as EMBED (similarity = 0.0) — there’s nothing to compare against
- If a previous embedding exists: generates a new embedding for the current content via OpenAI, computes the cosine similarity between old and new embeddings, and applies the threshold:
  - Similarity ≥ 0.95 → SKIP (the semantic meaning barely changed, re-embedding would be wasteful)
  - Similarity < 0.95 → EMBED (the meaning shifted enough to warrant a new embedding)
- Logs every decision to embedding_change_log with the similarity score — this is your audit trail
Multi-chunk articles: When an article has multiple chunks (like “Dean Martin” with 3), the detector compares against chunk_index = 0 only — the lead section, which concentrates the article’s core topic. This is a deliberate tradeoff: it’s fast (one comparison, not N), and for Wikipedia-style content where the introduction summarizes the whole article, it’s a reliable proxy. For corpora where meaning is spread more evenly across chunks, you’d want a centroid approach (average the L2-normalized chunk vectors) or max pairwise similarity across corresponding chunks. The threshold may need recalibration depending on which strategy you choose.
The --analyze-queue flag tells it to analyze all pending items without actually embedding anything. Think of it as a dry run that records decisions.
python change_detector.py --analyze-queue
2026-02-18 14:43:22 [DETECTOR] INFO Analyzing 44 pending queue items (threshold=0.95)
2026-02-18 14:43:22 [DETECTOR] INFO Article 6607: EMBED (similarity=0.0000)
2026-02-18 14:43:22 [DETECTOR] INFO Article 36870: EMBED (similarity=0.0000)
...all 44 show similarity=0.0000...
2026-02-18 14:43:22 [DETECTOR] INFO Results: 44 EMBED, 0 SKIP
Every single article shows similarity=0.0. Why?
Because article_embeddings_versioned is empty. There are no previous embeddings to compare against. The change detector hit step 3 for every article: “no previous embedding exists → must EMBED.”
This is an important operational insight: the change detector needs a baseline to work. On the very first run — or when you deploy to a new system — everything must be embedded. The SKIP optimization only kicks in on subsequent changes, after embeddings exist to compare against. If you’re migrating from a system that already has embeddings in a different format, you’d need to populate the source_hash column from those existing embeddings first to bootstrap the comparison.
Step 5: Create Baseline Embeddings
Now we need to establish that baseline. Let’s run the worker for one small batch.
What worker.py does
The worker is the component that actually calls the OpenAI API and writes embeddings to PostgreSQL. Here’s its internal flow:
- Claim items from the queue using SELECT ... FOR UPDATE SKIP LOCKED — this is the concurrency primitive from Part 1. Multiple workers can run simultaneously, and each gets a non-overlapping set of items.
- For each claimed item: fetch the article content, split it into chunks (2000-character windows with overlap), and call the OpenAI text-embedding-3-small API to generate a 1536-dimensional vector for each chunk.
- Write the embeddings to article_embeddings_versioned with is_current = true, model_name, model_version, and source_hash (the content’s MD5 at the moment of embedding).
- Mark old embeddings for the same article as is_current = false (soft delete — they’re kept for rollback).
- Update the queue item to status = 'completed' with processed_at = now().
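The chunking step can be sketched as a simple sliding window (the 200-character overlap is an assumed value; the post only specifies 2000-character windows "with overlap"):

```python
def chunk_text(content: str, window: int = 2000, overlap: int = 200) -> list[str]:
    # Fixed-size character windows; each chunk becomes one embedding,
    # stored with its own chunk_index. Overlap of 200 is an assumption.
    if len(content) <= window:
        return [content]
    chunks, start, step = [], 0, window - overlap
    while start < len(content):
        chunks.append(content[start:start + window])
        if start + window >= len(content):
            break
        start += step
    return chunks
```

This matches the worker output above: articles under 2000 characters produce 1 chunk, longer ones produce 2 or 3.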
The --once flag means “process one batch and exit” (instead of running in an infinite polling loop). The --batch-size 10 flag means “claim up to 10 items at a time.”
python worker.py --once --batch-size 10
2026-02-18 14:45:58 [8526] INFO Worker worker-once claimed 10 items
2026-02-18 14:46:02 [8526] INFO Article 6607: embedded 1 chunks
2026-02-18 14:46:02 [8526] INFO Article 36870: embedded 1 chunks
2026-02-18 14:46:05 [8526] INFO Article 19078: embedded 1 chunks
2026-02-18 14:46:05 [8526] INFO Article 7947: embedded 1 chunks
2026-02-18 14:46:05 [8526] INFO Article 75802: embedded 2 chunks
2026-02-18 14:46:05 [8526] INFO Article 5150: embedded 1 chunks
2026-02-18 14:46:06 [8526] INFO Article 55579: embedded 1 chunks
2026-02-18 14:46:06 [8526] INFO Article 92697: embedded 1 chunks
2026-02-18 14:46:06 [8526] INFO Article 49417: embedded 3 chunks
2026-02-18 14:46:06 [8526] INFO Article 70595: embedded 1 chunks
Processed 10 items
Reading the output:
- claimed 10 items — the worker took 10 items from the queue using SKIP LOCKED. If another worker ran simultaneously, it would get different items.
- Article 6607: embedded 1 chunks — this article’s content fit within a single 2000-character chunk. One API call, one embedding vector stored.
- Article 75802: embedded 2 chunks — “Brandenburg Gate” was longer and required two chunks. Two API calls, two embedding vectors, both linked to the same article with chunk_index 0 and 1.
- Article 49417: embedded 3 chunks — “Dean Martin” was the longest article in this batch, requiring three chunks.
Let’s verify the data in PostgreSQL:
wikipedia=# SELECT count(DISTINCT article_id) AS articles, count(*) AS chunks
FROM article_embeddings_versioned WHERE is_current = true;
articles | chunks
----------+--------
10 | 13
10 articles, 13 chunks. The numbers match the worker output.
Total time: ~8 seconds for 10 articles. The bottleneck is the OpenAI API call (~300-600ms per embedding request), not PostgreSQL. In this lab, the trigger overhead, queue operations, and embedding writes were all negligible compared to API latency. If you need faster throughput, the answer is more workers (see Step 8) or a local embedding model — not database optimization.
Step 6: The Real Demo — SKIP vs EMBED
Now we have a baseline: 10 articles with embeddings and known source_hash values. This is the step where the change detector can finally do its job properly.
What targeted_mutations.py does
This script applies specific, known mutation types to the 10 articles we just embedded. Unlike simulate_document_changes.py (which picks random articles and random mutations), this script is deterministic — we control exactly what changes happen so we can verify the detector’s decisions:
- 5 articles: append a single period character (.) to the content — the smallest possible content change. This is a typo-level edit that should not change the semantic meaning at all.
- 3 articles: append a substantial paragraph (~100 words of new information) — this adds genuine semantic content that should shift the embedding.
- 2 articles: rewrite the second half of the content — a major structural change that dramatically alters the meaning.
python examples/targeted_mutations.py
Embedded articles: [5150, 6607, 7947, 19078, 36870, 49417, 55579, 70595, 75802, 92697]
Article 5150: appended period (typo fix)
Article 6607: appended period (typo fix)
Article 7947: appended period (typo fix)
Article 19078: appended period (typo fix)
Article 36870: appended period (typo fix)
Article 49417: appended major paragraph
Article 55579: appended major paragraph
Article 70595: appended major paragraph
Article 75802: rewrote second half
Article 92697: rewrote second half
Done - 10 targeted mutations applied
Each of these UPDATEs fires the trigger, which creates a new queue entry. But now — unlike Step 4 — we have existing embeddings to compare against.
Now run the change detector again:
What happens inside the detector this time
For each of the 10 mutated articles, the detector:
- Takes the article’s current (modified) content
- Generates a new embedding via OpenAI
- Retrieves the existing embedding from article_embeddings_versioned
- Computes cosine similarity between old and new
- Applies the 0.95 threshold
For the 34 other pending items (from Step 3, still without baseline embeddings), it still returns similarity=0.0.
python change_detector.py --analyze-queue
...34 articles without baseline still show EMBED (similarity=0.0000)...
2026-02-18 14:54:03 [DETECTOR] INFO Article 5150: SKIP (similarity=0.9981)
2026-02-18 14:54:03 [DETECTOR] INFO Article 6607: SKIP (similarity=0.9994)
2026-02-18 14:54:03 [DETECTOR] INFO Article 7947: SKIP (similarity=0.9993)
2026-02-18 14:54:03 [DETECTOR] INFO Article 19078: SKIP (similarity=0.9993)
2026-02-18 14:54:04 [DETECTOR] INFO Article 36870: SKIP (similarity=0.9997)
2026-02-18 14:54:04 [DETECTOR] INFO Article 49417: EMBED (similarity=0.9263)
2026-02-18 14:54:04 [DETECTOR] INFO Article 55579: EMBED (similarity=0.9255)
2026-02-18 14:54:04 [DETECTOR] INFO Article 70595: EMBED (similarity=0.9369)
2026-02-18 14:54:04 [DETECTOR] INFO Article 75802: EMBED (similarity=0.6256)
2026-02-18 14:54:04 [DETECTOR] INFO Article 92697: EMBED (similarity=0.5090)
2026-02-18 14:54:04 [DETECTOR] INFO Results: 39 EMBED, 5 SKIP
Reading the results
What the similarity numbers mean:
- 0.998–0.999 (typo fixes): The old and new embeddings are nearly identical. Adding a period barely shifts the vector in 1536-dimensional space. The detector correctly says: “this content hasn’t meaningfully changed — skip the re-embed.” That avoids 5 unnecessary write operations, index churn, and version flips.
- 0.925–0.937 (paragraph additions): Adding 100 words of new information shifts the embedding enough to drop below 0.95. The detector correctly says: “the semantic content changed — re-embed.” The new paragraph about Dean Martin’s film career or Brandenburg Gate’s Cold War history needs to be reflected in the vector.
- 0.509–0.626 (section rewrites): Rewriting half the article dramatically changes the meaning. These similarities are far below the threshold — clearly needing re-embedding.
- 0.0 (no baseline): The 34 articles from Step 3 that still have no embeddings. Can’t compare what doesn’t exist yet.
Cost honesty note: The detector uses embedding similarity, which means it calls OpenAI once per article to generate the comparison vector — even for articles it ultimately SKIPs. So SKIP doesn’t eliminate API spend; it eliminates unnecessary writes, index churn, and version flips. For single-chunk articles (the majority in this lab), the detection call is the same cost as the embedding call itself. The real savings show up with multi-chunk articles: the detector spends 1 API call to decide, versus N calls to re-embed all chunks. In production, you’d add cheaper pre-filters first: content_hash comparison (free, catches identical content), text diff ratio (cheap, catches typos), and reserve embedding-similarity checks for borderline cases where the content changed but the semantic impact is unclear. That’s the graduation path Part 1 describes.
The key insight: there’s a clean gap between the typo group (lowest: 0.9981) and the paragraph group (highest: 0.9369). That gap from 0.937 to 0.998 is where our 0.95 threshold sits. It doesn’t fall in ambiguous territory. The change types cluster naturally, which is what makes threshold-based detection practical in the real world.
The queue now reflects the decisions:
wikipedia=# SELECT status, count(*) FROM embedding_queue GROUP BY status;
status | count
-----------+-------
skipped | 5
completed | 10
pending | 39
- 5 skipped: the typo-level changes — unnecessary writes avoided, no quality loss
- 10 completed: the baseline embeddings from Step 5
- 39 pending: 34 no-baseline articles + 5 newly-detected EMBED items, waiting for the worker
The skipped status is an audit trail — you can always go back and see what was skipped, when, and at what similarity score (recorded in embedding_change_log).
Step 7: Freshness Monitoring Report
In production, you need a dashboard — not individual log lines. The freshness_monitor.py script consolidates all the monitoring queries from Part 1 into a single diagnostic report.
What freshness_monitor.py does
The script runs five monitoring queries against the database and formats them into a human-readable report:
- Freshness summary: How many articles have embeddings? How many are stale (content changed since last embedding)?
- Stale articles detail: Which specific articles have drifted — showing both the current content hash and the embedding’s source hash so you can see the mismatch
- Queue health: Breakdown by status with timestamps — tells you if items are stuck or if the queue is draining properly
- Version coverage: Which embedding models are in use and how many articles/chunks each covers
- Change detection decisions: Aggregated SKIP/EMBED statistics with average similarity scores
python freshness_monitor.py --report
Embedding Freshness Report — 2026-02-18 14:57:53
============================================================
Freshness Summary
============================================================
Total articles: 25000
With embeddings: 10 (0.0%)
Without embeddings: 24990
Stale embeddings: 10 (100.0%)
Reading this: Only 10 of 25,000 articles have versioned embeddings (from Step 5). All 10 are “stale” because we just mutated all of them in Step 6. In a real deployment, you’d see something like “23,450 with embeddings (93.8%), 312 stale (1.3%)” — and you’d alert if stale exceeded, say, 5%.
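That alerting rule is worth encoding explicitly. A sketch (the function name and the 5% budget are illustrative):

```python
def staleness_alert(total: int, with_embeddings: int, stale: int,
                    max_stale_pct: float = 5.0) -> dict:
    # Stale percentage is measured against embedded articles, matching
    # the report above ("10 (100.0%)" when all 10 embedded articles drifted).
    coverage_pct = 100.0 * with_embeddings / total if total else 0.0
    stale_pct = 100.0 * stale / with_embeddings if with_embeddings else 0.0
    return {
        "coverage_pct": round(coverage_pct, 1),
        "stale_pct": round(stale_pct, 1),
        "alert": stale_pct > max_stale_pct,
    }
```

With the lab's numbers (25,000 articles, 10 embedded, 10 stale) the alert fires; with the "real deployment" numbers from the paragraph above it stays quiet.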
============================================================
Stale Articles (content changed since embedding)
============================================================
ID | Title | Current Hash | Embed Hash | ...
------+----------------------------------------+------------------+------------------+----
5150 | 1787 | 5b14bc4a2d... | 11e81bc4de... | ...
6607 | Needle | 3ebb3c3cbb... | 5c5290b5a7... | ...
49417 | Dean Martin | 7061f1803f... | f7fd9f30e6... | ...
75802 | Brandenburg Gate | 7da53df7a0... | 5a2dcc01f9... | ...
...6 more...
The Current Hash and Embed Hash columns are the two MD5 fingerprints. When they don’t match, it means the article’s content has changed since we last generated its embedding. Article 5150 (“1787”) shows different hashes even though we only appended a period — the MD5 captures any content change, even trivial ones. The change detector is what decides whether the difference matters semantically (and it said SKIP for this one).
============================================================
Queue Health
============================================================
Status | Count | Oldest | Newest
----------+-------+------------------------+------------------------
pending | 39 | 2026-02-18 14:39:59 | 2026-02-18 14:53:55
completed | 10 | 2026-02-18 14:39:58 | 2026-02-18 14:39:59
skipped | 5 | 2026-02-18 14:53:55 | 2026-02-18 14:53:55
The queue is healthy but has a backlog. 39 items pending, oldest from ~15 minutes ago. In production, you’d watch the gap between “Oldest” and “Newest” — if the oldest item keeps getting older while new items are added, your workers can’t keep up. That’s when you scale up workers (see Step 8) or increase batch size.
The 10 completed items are from Step 5, the 5 skipped from Step 6’s change detector.
============================================================
Embedding Version Coverage
============================================================
Model Version | Articles | Chunks | Current
-----------------------+----------+--------+--------
text-embedding-3-small | 10 | 13 | 13
Only one model version in use, covering 10 articles with 13 chunks, all current. During a blue-green model upgrade (Part 1’s model versioning section), you’d see two rows here — v1 and v2 — and track coverage convergence.
============================================================
Change Detection Decisions
============================================================
Decision | Count | Avg Similarity
---------+-------+---------------
EMBED | 83 | 0.0473
SKIP | 5 | 0.9992
The average similarity for EMBED decisions is 0.0473 because most of those 83 decisions had similarity=0.0 (no baseline). The 5 SKIPs have an average of 0.9992 — confirming these were truly trivial changes. In a mature deployment, the EMBED average similarity would be higher (0.7–0.9 range) and the SKIP/EMBED ratio would tell you how efficient your threshold is.
Step 8: SKIP LOCKED — Multi-Worker Concurrency
This is the demo that proves the theory from Part 1’s deep dive on SELECT FOR UPDATE SKIP LOCKED.
What demo_skip_locked.py does
The script launches multiple Python threads that each behave like independent embedding workers. Each thread:
- Opens its own database connection
- Runs UPDATE embedding_queue SET status='processing' WHERE queue_id IN (SELECT queue_id FROM embedding_queue WHERE status='pending' ORDER BY queued_at FOR UPDATE SKIP LOCKED LIMIT n) — the exact same claim query the real worker uses (note the ORDER BY queued_at — without it, selection order is not deterministic and oldest-first is not guaranteed)
- Records which queue_id values it got
- Does NOT actually call OpenAI (this is a concurrency demo, not an embedding demo)
After all threads finish, the script checks for overlap: did any two workers claim the same item? The answer should always be zero.
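The invariant the script checks, each item claimed exactly once, can be illustrated with an in-memory analogue where a mutex stands in for PostgreSQL's row locks (this is not the demo script itself, just the shape of the check; all names are illustrative):

```python
import threading

def claim(shared: dict, worker_id: str, n: int, results: dict) -> None:
    # Atomically take up to n unclaimed items. In PostgreSQL the atomicity
    # comes from row locks plus SKIP LOCKED; here a mutex plays that role.
    with shared["lock"]:
        take = shared["pending"][:n]
        del shared["pending"][:n]
    results[worker_id] = take

def run_demo(items: int = 39, workers: int = 4) -> dict:
    shared = {"pending": list(range(1, items + 1)), "lock": threading.Lock()}
    results: dict = {}
    per_worker = -(-items // workers)  # ceiling division
    threads = [threading.Thread(target=claim,
                                args=(shared, f"w{i}", per_worker, results))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    claimed = [i for batch in results.values() for i in batch]
    # The overlap check: every item claimed exactly once, none twice.
    assert len(claimed) == len(set(claimed)) == items
    return results
```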
python examples/demo_skip_locked.py --workers 4 --items 39
============================================================
Demo: SKIP LOCKED Multi-Worker Concurrency
Workers: 4 | Target items: 39
============================================================
Launching 4 workers (each requesting up to 14 items)...
demo-worker-0: claimed 14 items (articles: [96746, 37330, 67708, 32834, 46541]...)
demo-worker-1: claimed 14 items (articles: [57924, 20028, 65749, 92016, 24921]...)
demo-worker-2: claimed 11 items (articles: [66390, 27221, 30148, 97917, 30449]...)
demo-worker-3: claimed 0 items (articles: [])
========================================
Total items claimed: 39
Unique articles: 39
Elapsed time: 0.05s
ZERO OVERLAP — SKIP LOCKED working correctly!
============================================================
Reading the output
- 14 + 14 + 11 + 0 = 39 — every pending item was claimed exactly once
- Zero overlap — no item was processed by more than one worker
- 0.05 seconds — the entire distribution happened in 50 milliseconds
- Worker 3 got 0 items: This is actually the ideal outcome. The first 3 workers were fast enough to drain the queue before Worker 3’s SELECT ... SKIP LOCKED could find any unlocked rows. In a real deployment where each item takes 300-500ms (OpenAI API call), all 4 workers would stay busy and you’d see approximately even distribution.
Why SKIP LOCKED and not regular FOR UPDATE? With regular FOR UPDATE, Worker 1 would lock rows and Worker 2 would wait (block) until Worker 1’s transaction commits. With SKIP LOCKED, Worker 2 skips the locked rows and grabs the next available ones immediately. No blocking, no deadlocks, no coordination.
This is pure PostgreSQL. No Redis, no RabbitMQ, no SQS. One SQL query, one feature (SKIP LOCKED), and you have a production-grade concurrent work queue. If you need to process your embedding queue faster, just add workers — throughput scales linearly.
Step 9a: End-to-End Trigger Flow
Every previous step ran parts of the pipeline in isolation. This demo shows the complete lifecycle of a single article change — from UPDATE to searchable embedding.
What demo_trigger_flow.py does
The script picks one article and walks through the full pipeline synchronously:
- Checks the queue for this article (should be empty)
- Updates the article’s content (appending demo text)
- Verifies the trigger fired by checking the queue again (should now have a pending entry)
- Shows the article’s metadata (new content_hash, updated_at)
- Runs the worker for exactly this one item (calls OpenAI, writes embeddings)
- Verifies the embeddings are in article_embeddings_versioned
python examples/demo_trigger_flow.py
============================================================
Demo: End-to-End Trigger Flow
Article: [86698] Thin film transistor liquid crystal display
============================================================
1. Queue entries (pending) for article 86698 BEFORE update: 0
Nothing in the queue yet — clean starting state.
2. Updated article content (appended demo text)
An UPDATE articles SET content = content || '...' WHERE id = 86698 just ran. Two triggers fired: trg_content_hash (BEFORE, recomputed the MD5) and trg_queue_embedding (AFTER, inserted a queue entry).
3. Trigger fired! Queue entry created:
Queue ID: 57
Status: pending
Content Hash: b5a7c0820832fd54...
Queued At: 2026-02-18 15:05:29.303062+00:00
The trigger did its job. A new pending item is in the queue with the article’s current content hash. Note the timestamp — in this lab, the trigger overhead was negligible.
4. Article metadata updated:
Content Hash: b5a7c0820832fd54...
Updated At: 2026-02-18 15:05:29.303062+00:00
The article’s content_hash matches the queue entry’s hash — they were set by the same trigger. This hash will later be stored as source_hash on the embedding, creating the audit chain: “this embedding was generated from this exact version of the content.”
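The hash itself is nothing exotic. Here is a minimal Python equivalent of what the BEFORE trigger computes — assuming, as `trg_content_hash` does in this lab, a plain MD5 over the content text:

```python
import hashlib

def content_hash(content: str) -> str:
    """Mirror of the trigger's hash: MD5 over the raw content text."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()

# Any content change yields a new hash, which is what makes the
# queue entry -> source_hash audit chain trustworthy.
before = content_hash("Thin film transistor article body...")
after = content_hash("Thin film transistor article body... (demo text)")
assert before != after
```

Because the same function produces both the article's `content_hash` and the embedding's `source_hash`, comparing the two later tells you exactly whether an embedding still matches the content it was generated from.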
5. Running worker for one batch...
Article 86698: embedded 3 chunks
Processed 1 items
The worker claimed this item, called OpenAI 3 times (3 chunks), and wrote the embeddings to article_embeddings_versioned.
6. Embeddings for article 86698:
Current chunks: 3
Last created: 2026-02-18 15:05:29.388968+00:00
============================================================
Demo complete!
============================================================
The complete flow — from content modification to searchable embeddings — took about 1 second. The latency breakdown: ~50ms for PostgreSQL (trigger + queue + insert), ~900ms for OpenAI (3 embedding API calls). In a production system with a continuously running worker, this latency would be the norm for every content change.
Step 9b: Quality Feedback Loop
The final piece, and the one that closes the architecture. Everything so far reacts to content changes. But what if the embeddings are technically “fresh” (content hasn’t changed) yet search quality is degrading? Maybe the model isn’t capturing certain topics well, or the chunking strategy doesn’t work for some article types.
What demo_quality_drift.py does
This script simulates the quality feedback loop described in Part 1’s monitoring section. It works in four phases:
Phase 1 — Simulate retrieval quality logs: The script generates 20 fake search queries with associated quality metrics (nDCG, precision@k, user satisfaction scores). It deliberately creates a pattern where quality metrics decline for certain topics — simulating what would happen if embeddings for some subject areas became less effective over time.
Phase 2 — Quality analysis: The script scans retrieval_quality_log looking for queries with poor results: low nDCG scores (below a configurable threshold) or negative user feedback. It identifies 8 queries where quality dropped.
Phase 3 — Article correlation: For each poor-performing query, the script finds related articles using title ILIKE '%keyword%' matching. This is a simplified version of what a production system would do (where you’d use the query’s actual retrieved results instead of keyword matching). It identifies 29 articles that might be causing poor search results.
Phase 4 — Automatic re-queuing: All 29 articles are inserted into embedding_queue with change_type = 'quality_reembed' instead of the usual 'content_update'. This distinction is critical — it means the re-embedding is happening not because the content changed, but because the quality metrics flagged a problem.
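The re-queuing insert can be sketched in SQL. The column list follows this lab's `embedding_queue`; the `ON CONFLICT` guard is an assumption (it only has an effect if a suitable unique constraint exists), included to show how you would avoid piling up duplicate pending entries:

```sql
-- Re-queue a flagged article because quality dropped,
-- not because its content changed.
INSERT INTO embedding_queue (article_id, change_type, status)
VALUES (86698, 'quality_reembed', 'pending')
ON CONFLICT DO NOTHING;
```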
python examples/demo_quality_drift.py
The demo runs through all four phases and produces a final queue state:
wikipedia=# SELECT change_type, status, count(*)
FROM embedding_queue GROUP BY change_type, status ORDER BY change_type, status;
change_type | status | count
-----------------+-----------+-------
content_update | completed | 50
content_update | skipped | 5
quality_reembed | pending | 29
Reading the queue state
Three distinct categories tell the full pipeline story:
- 50 content_update/completed: the normal pipeline flow — content changed, trigger fired, worker embedded. This is Layers 1 and 2 doing their job.
- 5 content_update/skipped: the typo-level changes from Step 6 — the change detector said “not worth re-embedding.” This is Layer 2’s cost optimization.
- 29 quality_reembed/pending: the feedback loop’s contribution — these articles weren’t re-queued because their content changed (it may not have). They were re-queued because search quality dropped for queries related to them.
Why the quality_reembed change type matters: When the worker processes these items, it bypasses the change significance detector. If the detector were to analyze them, it might say “similarity=0.998 → SKIP” because the content barely changed. But that’s the whole point — the content didn’t change, yet the embeddings aren’t serving search well. The quality feedback overrides the filter.
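In pseudocode, the worker's decision comes down to a single guard. A sketch, assuming the detector returns a cosine similarity (or `None` when no baseline embedding exists) and using the 0.95 threshold from earlier — the function name is illustrative, not the one in `worker.py`:

```python
SIMILARITY_THRESHOLD = 0.95

def should_embed(change_type: str, similarity=None) -> bool:
    """Decide whether a queue item gets re-embedded."""
    if change_type == "quality_reembed":
        return True  # quality feedback overrides the cost filter
    if similarity is None:
        return True  # no baseline yet: always embed
    return similarity < SIMILARITY_THRESHOLD

assert should_embed("quality_reembed", 0.998) is True  # bypasses the filter
assert should_embed("content_update", 0.998) is False  # trivial change: SKIP
assert should_embed("content_update", 0.51) is True    # rewrite: EMBED
```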
This is the three-layer architecture from Part 1 working in practice:
- Triggers (Layer 1): react to content changes immediately — the broadest net
- Change significance (Layer 2): filter out trivial changes, saving API cost — the optimization layer
- Quality feedback (Layer 3): catch what the filter missed or what wasn’t about content changes at all — the safety net
Each layer compensates for the blind spots of the previous one.
Key Takeaways
1. The trigger is smarter than you think. Using UPDATE OF content means metadata-only changes never touch the embedding pipeline. In our test, 12% of mutations (6 out of 50) were filtered out at the trigger level, before any Python code ran. In a real knowledge base with tag edits, status changes, and metadata updates, this fraction could be substantially higher.
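The filtering happens in the trigger definition itself. A sketch using the trigger name from this lab (the function name `queue_embedding_refresh` is an assumption — the real body lives in `schema.sql`):

```sql
-- UPDATE OF content: fires only when the content column appears in
-- the UPDATE's SET clause, so metadata-only writes never reach the
-- embedding pipeline.
CREATE TRIGGER trg_queue_embedding
    AFTER UPDATE OF content ON articles
    FOR EACH ROW
    EXECUTE FUNCTION queue_embedding_refresh();
```

One caveat worth knowing: `UPDATE OF content` fires whenever the column is *listed* in the `SET` clause, even if the new value equals the old one — which is exactly why the content-hash comparison downstream still earns its keep.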
2. The change detector needs a baseline. On the first run, every article shows similarity=0.0 because there’s nothing to compare against. This is correct behavior, but you need to plan for the initial backfill being 100% EMBED. Budget the API cost and time accordingly.
3. The 0.95 threshold is validated. Typo-level changes (appending a period) scored 0.998+, paragraph additions scored ~0.93, and section rewrites scored 0.51–0.63. There’s a clear gap between “trivial” and “significant” that the threshold exploits. You don’t need machine learning or complex heuristics — cosine similarity with a simple threshold works.
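The comparison itself is one dot product. A self-contained sketch of the SKIP/EMBED decision — the vectors here are toy 3-dimensional stand-ins for the real embedding vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def decision(old, new, threshold=0.95):
    return "SKIP" if cosine_similarity(old, new) >= threshold else "EMBED"

# Identical vectors: similarity 1.0 -> SKIP
assert decision([1.0, 0.0, 0.5], [1.0, 0.0, 0.5]) == "SKIP"
# Orthogonal vectors: similarity 0.0 -> EMBED
assert decision([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]) == "EMBED"
```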
4. SKIP LOCKED is production-ready. 4 workers, 39 items, zero overlap, 0.05 seconds. No external dependencies, no coordination service. This is the simplest correct way to build a concurrent work queue in PostgreSQL. Need more throughput? Add workers.
5. Quality metrics close the loop. The change significance filter reduces unnecessary writes and index churn, but it can’t know if a small change was semantically important — or if the embedding was poor to begin with. The quality feedback loop catches those cases by correlating low-quality retrievals with specific articles and forcing re-embedding. Three layers, each compensating for the blind spots of the previous one.
6. The bottleneck is the API, not PostgreSQL. 10 articles embedded in ~8 seconds, with each OpenAI call taking 300-600ms. In this lab, PostgreSQL’s trigger + queue overhead was negligible compared to API latency. If you need faster throughput, add workers (SKIP LOCKED scales linearly) or switch to a local embedding model like nomic-embed-text via Ollama.
Running It Yourself
git clone https://github.com/boutaga/pgvector_RAG_search_lab.git
cd pgvector_RAG_search_lab
# Ensure Wikipedia database is loaded (see Lab 2 in README)
# You'll need: PostgreSQL 17+, pgvector, pgvectorscale, an OpenAI API key
# Step 1: Apply schema
psql -d wikipedia -f lab/05_embedding_versioning/schema.sql
# Step 3: Simulate changes (Step 2 is a manual SQL test)
python lab/05_embedding_versioning/examples/simulate_document_changes.py --count 50
# Step 4: Run change detector (all EMBED on first run — no baseline yet)
python lab/05_embedding_versioning/change_detector.py --analyze-queue
# Step 5: Create baseline embeddings (requires OPENAI_API_KEY env var)
python lab/05_embedding_versioning/worker.py --once --batch-size 10
# Step 6: Apply targeted mutations, then re-run detector
python lab/05_embedding_versioning/examples/targeted_mutations.py
python lab/05_embedding_versioning/change_detector.py --analyze-queue
# Step 7: Full freshness report
python lab/05_embedding_versioning/freshness_monitor.py --report
# Step 8: SKIP LOCKED concurrency demo
python lab/05_embedding_versioning/examples/demo_skip_locked.py --workers 4
# Step 9a: End-to-end trigger flow
python lab/05_embedding_versioning/examples/demo_trigger_flow.py
# Step 9b: Quality feedback loop
python lab/05_embedding_versioning/examples/demo_quality_drift.py
What’s Next
In the next post, I’ll explore benchmarking pgvectorscale’s StreamingDiskANN at scale — with real numbers on query latency, recall, index build time, and memory footprint at different dataset sizes. We’ll use the same Wikipedia dataset and the versioned embedding infrastructure from this lab.