{"id":43038,"date":"2026-02-22T15:23:42","date_gmt":"2026-02-22T14:23:42","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/?p=43038"},"modified":"2026-02-22T22:41:24","modified_gmt":"2026-02-22T21:41:24","slug":"rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/","title":{"rendered":"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-introduction\">Introduction<\/h2>\n\n\n\n<p>&#8220;Make it simple.&#8221; This is a principle I keep repeating, and I&#8217;ll repeat it again here. Because when it comes to keeping your RAG system&#8217;s embeddings fresh, the industry has somehow made it complicated. External orchestrators, custom Python cron jobs, microservices that call microservices, Airflow DAGs with 47 tasks,  all to answer a simple question: <strong>when my source data changes, how do I update the corresponding embeddings?<\/strong><\/p>\n\n\n\n<p>If you&#8217;ve followed this RAG series from <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-naive-rag\/\">Naive RAG<\/a> through <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/\">Hybrid Search<\/a>, <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-adaptive-rag-understanding-confidence-precision-ndcg\/\">Adaptive RAG<\/a>, and <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-agentic-rag\/\">Agentic RAG<\/a>, you&#8217;ve seen how retrieval quality is the backbone of any RAG system. 
But here&#8217;s what I didn&#8217;t cover explicitly: <strong>what happens when your retrieval quality silently degrades because your embeddings are stale?<\/strong><\/p>\n\n\n\n<p>This is the silent killer of RAG in production. Nobody complains about the embedding pipeline; they complain that the chatbot gives wrong answers. And by the time you trace it back to stale embeddings, the trust is already gone.<\/p>\n\n\n\n<p>In this post, I want to bridge two worlds that I&#8217;ve been working in simultaneously: the <strong>CDC\/event-driven pipelines<\/strong> I demonstrated in my <a href=\"https:\/\/www.dbi-services.com\/blog\/postgresql-cdc-to-jdbc-sink-minimal-event-driven-architecture\/\">PostgreSQL CDC to JDBC Sink<\/a> and <a href=\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\">Oracle to PostgreSQL Migration with Flink CDC<\/a> posts, and the <strong>RAG\/pgvector<\/strong> world from this series.<\/p>\n\n\n\n<p>The thesis is straightforward: <strong>if you&#8217;re serious about production RAG, you need event-driven embedding refresh. Batch re-embedding is technical debt waiting to happen.<\/strong> Event-driven architecture and data pipelines are a precondition to hosting similarity search. Organizations that still run pure batch processing are steadily migrating towards event-driven architectures, usually driven by the need for live KPIs instead of daily refreshes, a shift made much easier by the maturity of today&#8217;s tooling. The &#8220;hidden&#8221; bonus of streaming data from your sources into a data lake and on to your data marts is that embedding refreshes become almost free as well.<\/p>\n\n\n\n<p>This is Part 1, covering the architecture and design patterns I find most relevant. 
In <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-lab\/\" target=\"_blank\" rel=\"noreferrer noopener\">Part 2<\/a>, I walk through a hands-on LAB on 25,000 Wikipedia articles with real output, actual numbers, and some of the edge cases you would encounter applying this in practice.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-problem-stale-embeddings\">The Problem: Stale Embeddings<\/h2>\n\n\n\n<p>Let me paint a picture that I&#8217;ve seen in real consulting engagements.<\/p>\n\n\n\n<p>A company builds a RAG system for internal documentation. Knowledge base: 50,000 documents in PostgreSQL. Embeddings generated with <code>text-embedding-3-small<\/code>, stored in pgvector. Everything works great on day one.<\/p>\n\n\n\n<p>Three months later:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>2,000 documents have been updated<\/li>\n\n\n\n<li>500 new documents have been added<\/li>\n\n\n\n<li>300 documents have been deprecated<\/li>\n\n\n\n<li>The embedding pipeline? It ran once during initial setup. Maybe someone re-ran it manually last month. Maybe not.<\/li>\n<\/ul>\n\n\n\n<p>The result: <strong>your vector index is lying to you<\/strong>. Similarity search returns chunks from outdated documents. The LLM generates answers based on stale context. Users lose trust.<\/p>\n\n\n\n<p>This is not a hypothetical. 
This is the reality of most RAG deployments I&#8217;ve encountered.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-why-batch-re-embedding-doesn-t-scale\">Why batch re-embedding doesn&#8217;t scale<\/h3>\n\n\n\n<p>The naive approach is: &#8220;just re-embed everything periodically.&#8221; Let&#8217;s do the math.<\/p>\n\n\n\n<p>For 50,000 documents, assuming an average of 10 chunks per document:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>500,000 chunks<\/strong> to embed, ~500 tokens each \u2014 that&#8217;s 250 million tokens<\/li>\n\n\n\n<li>At ~$0.02 per 1M tokens with <code>text-embedding-3-small<\/code>: <strong>~$5 per full re-embed<\/strong> (not terrible)<\/li>\n\n\n\n<li>The OpenAI embeddings endpoint accepts <strong>arrays of inputs<\/strong>, so you can batch ~100 chunks per request. That&#8217;s ~5,000 requests. At Tier 1&#8217;s 3,000 RPM, RPM isn&#8217;t the bottleneck \u2014 <strong>TPM is<\/strong>. Depending on your tier&#8217;s token-per-minute limit (check your <a href=\"https:\/\/platform.openai.com\/settings\/organization\/limits\">project limits<\/a>), the real constraint is how fast the API will accept 250M tokens. Depending on your usage tier, this could take <strong>anywhere from under an hour to several hours<\/strong> of wall-clock time.<\/li>\n\n\n\n<li>During which, if you&#8217;re replacing embeddings in-place (the typical batch approach), your index is in a <strong>partially-stale state<\/strong> \u2014 some embeddings are new, some are old. The versioned schema I&#8217;ll show below avoids this, but most batch implementations don&#8217;t bother with versioning.<\/li>\n\n\n\n<li>In our lab experience, heavy churn from bulk re-inserts can degrade <strong>StreamingDiskANN recall<\/strong> (pgvectorscale). 
The index handles incremental updates well, but re-embedding 500K rows at once is not &#8220;incremental&#8221; \u2014 validate this on your own workload and treat large backfills as an operational event.<\/li>\n<\/ul>\n\n\n\n<p>Now multiply this by:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Multiple embedding models you might want to test <\/li>\n\n\n\n<li>Multiple environments (dev, staging, production)<\/li>\n\n\n\n<li>Frequency: weekly? daily? hourly?<\/li>\n<\/ul>\n\n\n\n<p>The cost isn&#8217;t the API calls. The cost is the <strong>operational complexity<\/strong>: coordinating the backfill, monitoring progress, handling rate limit errors, and \u2014 critically \u2014 the lack of <strong>observability<\/strong> into which documents actually changed. Batch treats every document the same, whether it was modified yesterday or hasn&#8217;t been touched in six months.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-deeper-problem-you-can-t-fix-what-you-don-t-measure\">The deeper problem: you can&#8217;t fix what you don&#8217;t measure<\/h3>\n\n\n\n<p>But there&#8217;s a problem that comes before stale embeddings, and in my consulting experience, it&#8217;s far more common: <strong>most organizations don&#8217;t measure retrieval quality at all.<\/strong> They deploy a RAG system, it works in demo, it goes to production, and then nobody instruments it. There is no precision@k, no nDCG, no confidence scoring. 
The embedding pipeline might be stale, or it might be fine \u2014 they literally cannot tell.<\/p>\n\n\n\n<p>In the <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-adaptive-rag-understanding-confidence-precision-ndcg\/\">Adaptive RAG<\/a> post, I introduced the metrics framework that makes retrieval quality measurable: <strong>precision@k<\/strong> (are the retrieved documents relevant?), <strong>recall@k<\/strong> (are we finding all the relevant documents?), <strong>nDCG@k<\/strong> (are the best results ranked first?), and <strong>confidence scores<\/strong> (how certain is the system about its top result?). In the <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-agentic-rag\/\">Agentic RAG<\/a> post, I added decision metrics on top of that \u2014 tracking whether the agent made the right call about when to retrieve. The evaluation framework in the <a href=\"https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\">pgvector_RAG_search_lab repository<\/a> (<code>lab\/evaluation\/metrics.py<\/code>, <code>compare_search_configs.py<\/code>, <code>k_balance_experiment.py<\/code>) implements all of this concretely.<\/p>\n\n\n\n<p>These metrics were originally designed to compare search strategies and tune parameters. But here&#8217;s the connection to embedding freshness that I want to make explicit: <strong>the same metrics that tell you whether your search is working also tell you whether your embeddings are drifting.<\/strong> If your weekly nDCG is declining, if your confidence distribution is shifting toward lower values, if precision@10 is dropping for a subset of queries \u2014 those are the leading indicators that your embeddings are falling behind your content. Not the queue depth, not the pipeline latency. The quality metrics.<\/p>\n\n\n\n<p>I have seen architectures where teams built elaborate embedding pipelines \u2014 cron jobs, Airflow DAGs, custom orchestrators \u2014 but never implemented the measurement layer. 
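<\/p>\n\n\n\n<p>That measurement layer is genuinely little code. As a minimal, dependency-free sketch (binary relevance, standard formulas; illustrative, not the exact code from the lab repository), <strong>precision@k<\/strong> and <strong>nDCG@k<\/strong> fit in a dozen lines:<\/p>\n\n\n

```python
import math

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: discounted gain of this ranking vs. the ideal one."""
    dcg = sum(1.0 / math.log2(i + 2)  # rank 1 -> log2(2), rank 2 -> log2(3), ...
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```

\n\n\n<p>Tracked weekly, per topic cluster, against a fixed ground-truth query set, these two numbers are the early-warning signal for the embedding drift described above.<\/p>\n\n\n\n<p>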
The pipeline runs on schedule, embeddings get refreshed, and everyone assumes it&#8217;s working. But without retrieval quality metrics, you have no way to know if you are going in the right direction. You might be re-embedding documents that don&#8217;t need it (wasting API spend) and missing documents that do (degrading search quality). Worse, I have seen setups where the metrics exist but are so poorly instrumented \u2014 wrong ground truth sets, no temporal dimension, no per-topic breakdown \u2014 that the numbers are misleading. An aggregate nDCG of 0.82 can hide the fact that an entire topic cluster has dropped to 0.45.<\/p>\n\n\n\n<p>Building the pipeline is one thing. Proving you&#8217;re going in the right direction is everything.<\/p>\n\n\n\n<p>This is why this post covers both. The first two-thirds address the pipeline: how to detect changes, how to queue and process them, how to decide what&#8217;s worth re-embedding. But the final section \u2014 <a href=\"#h-monitoring-embedding-freshness\">Monitoring Embedding Freshness<\/a> \u2014 is where it all comes together. That&#8217;s where the retrieval quality metrics from the Adaptive RAG post become <strong>operational canaries<\/strong> for embedding drift. The pipeline reacts to content changes; the monitoring layer tells you whether the pipeline is actually keeping your RAG system healthy. 
You need both.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-solution-event-driven-embedding-refresh\">The Solution: Event-Driven Embedding Refresh<\/h2>\n\n\n\n<p>The answer is the same pattern I demonstrated in the CDC posts: <strong>react to changes as they happen.<\/strong><\/p>\n\n\n\n<p>Instead of asking &#8220;when should I re-embed?&#8221;, the question becomes: <strong>&#8220;a row changed \u2014 which embeddings need updating?&#8221;<\/strong><\/p>\n\n\n\n<p>Here&#8217;s the architecture I&#8217;m proposing:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"680\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-9-1024x680.png\" alt=\"\" class=\"wp-image-43062\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-9-1024x680.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-9-300x199.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-9-768x510.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-9-1536x1019.png 1536w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-9.png 1588w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>There are three levels of sophistication here, and I want to walk through each one because <strong>not every project needs the most complex solution<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-level-1-postgresql-triggers-the-simplest-path\">Level 1: PostgreSQL Triggers \u2014 The Simplest Path<\/h2>\n\n\n\n<p>If your source data and embeddings live in the same PostgreSQL instance (which they probably do if you&#8217;ve been 
following this series), you don&#8217;t need Flink. You don&#8217;t need Kafka. You need a trigger.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-schema-design-with-versioning\">Schema design with versioning<\/h3>\n\n\n\n<p>First, let&#8217;s design a proper embedding table that supports versioning. This is the piece most tutorials skip:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Source table (your knowledge base)\nCREATE TABLE documents (\n    doc_id          BIGSERIAL PRIMARY KEY,\n    title           TEXT NOT NULL,\n    content         TEXT NOT NULL,\n    category        TEXT,\n    content_hash    TEXT GENERATED ALWAYS AS (md5(content)) STORED,\n    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now(),\n    is_active       BOOLEAN NOT NULL DEFAULT true\n);\n\n-- Embedding table with versioning support\nCREATE TABLE document_embeddings (\n    embedding_id    BIGSERIAL PRIMARY KEY,\n    doc_id          BIGINT NOT NULL REFERENCES documents(doc_id) ON DELETE CASCADE,\n    chunk_index     INT NOT NULL,\n    chunk_text      TEXT NOT NULL,\n    embedding       vector(1536),       -- text-embedding-3-small\n    model_name      TEXT NOT NULL DEFAULT &#039;text-embedding-3-small&#039;,\n    model_version   TEXT NOT NULL DEFAULT &#039;v1&#039;,\n    source_hash     TEXT NOT NULL,       -- md5 of the source content at embed time\n    embedded_at     TIMESTAMPTZ NOT NULL DEFAULT now(),\n    is_current      BOOLEAN NOT NULL DEFAULT true\n    -- Note: no table-level UNIQUE on (doc_id, chunk_index, model_name, model_version).\n    -- Superseded versions stay in the table with is_current = false, so that tuple\n    -- repeats by design; uniqueness of the *current* chunk set is enforced by the\n    -- partial unique index uq_doc_current_per_space below.\n);\n\n-- Index for similarity search (only current embeddings)\n-- Using pgvectorscale&#039;s StreamingDiskANN for better performance at scale\nCREATE INDEX idx_embeddings_diskann ON document_embeddings \n    USING diskann (embedding vector_cosine_ops)\n    WHERE is_current = true;\n\n-- Index for version lookups\nCREATE INDEX idx_embeddings_version ON document_embeddings (doc_id, model_version, 
is_current);\n\n-- Index for staleness detection\nCREATE INDEX idx_embeddings_staleness ON document_embeddings (source_hash, is_current)\n    WHERE is_current = true;\n\n-- Safety: prevent two &quot;current&quot; chunk sets for the same doc + model space\nCREATE UNIQUE INDEX uq_doc_current_per_space\n    ON document_embeddings (doc_id, model_name, model_version, chunk_index)\n    WHERE is_current;\n\n<\/pre><\/div>\n\n\n<p>A few things to notice here:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><code>content_hash<\/code><\/strong>: a generated column that gives us a fast way to detect if content actually changed (not just <code>updated_at<\/code>). If you&#8217;re adding this to an existing table with data, note that <code>ALTER TABLE ... ADD COLUMN ... GENERATED ALWAYS AS ... STORED<\/code> requires touching\/recomputing all rows \u2014 plan a maintenance window, or use a <code>BEFORE INSERT OR UPDATE<\/code> trigger with <code>NEW.content_hash := md5(NEW.content)<\/code> instead. For change detection, the two approaches are functionally equivalent.<\/li>\n\n\n\n<li><strong><code>source_hash<\/code><\/strong> on the embedding: captures what the source content looked like when the embedding was generated<\/li>\n\n\n\n<li><strong><code>is_current<\/code><\/strong>: soft versioning \u2014 old embeddings are kept for rollback. The partial unique index <code>uq_doc_current_per_space<\/code> guarantees at the database level that you can never have two &#8220;current&#8221; chunk sets for the same document within the same model space \u2014 even if your application has a bug.<\/li>\n\n\n\n<li><strong>Partial DiskANN index<\/strong>: only indexes current embeddings, so similarity search is clean and performant at scale. Partial indexes (<code>CREATE INDEX ... 
WHERE ...<\/code>) are standard PostgreSQL \u2014 validated in our lab with pgvectorscale&#8217;s StreamingDiskANN (see <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-lab\/\" target=\"_blank\" rel=\"noreferrer noopener\">Part 2 \u2014 Lab Walkthrough<\/a>). If your pgvectorscale version doesn&#8217;t support partial predicates, pgvector&#8217;s HNSW partial index is an equivalent fallback.<\/li>\n\n\n\n<li><strong><code>model_version<\/code><\/strong>: critical for model upgrades (more on this later)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-embedding-queue-pattern\">The embedding queue pattern<\/h3>\n\n\n\n<p>Rather than embedding synchronously in a trigger (which would block the transaction and hit external APIs), we use a <strong>queue pattern<\/strong>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Queue table for pending embedding work\nCREATE TABLE embedding_queue (\n    queue_id        BIGSERIAL PRIMARY KEY,\n    doc_id          BIGINT NOT NULL,   -- deliberately no FK: &#039;delete&#039; events are queued after the source row is gone\n    change_type     TEXT NOT NULL DEFAULT &#039;content_update&#039;,\n    content_hash    TEXT,\n    queued_at       TIMESTAMPTZ NOT NULL DEFAULT now(),\n    claimed_at      TIMESTAMPTZ,            -- set when a worker claims the item\n    processed_at    TIMESTAMPTZ,\n    status          TEXT NOT NULL DEFAULT &#039;pending&#039; \n                    CHECK (status IN (&#039;pending&#039;, &#039;processing&#039;, &#039;completed&#039;, &#039;failed&#039;, &#039;skipped&#039;)),\n    error_message   TEXT,\n    retry_count     INT NOT NULL DEFAULT 0\n);\n\nCREATE INDEX idx_queue_pending ON embedding_queue (status, queued_at) \n    WHERE status = &#039;pending&#039;;\n\n<\/pre><\/div>\n\n\n<p>Note the <strong><code>skipped<\/code><\/strong> status \u2014 this is used by the change significance detector (covered later) when it determines that a content change is 
too minor to warrant re-embedding. The item stays in the queue for audit purposes, but no embedding API call is made.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-trigger\">The trigger<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE OR REPLACE FUNCTION fn_queue_embedding_update()\nRETURNS TRIGGER AS $$\nBEGIN\n    IF TG_OP = &#039;INSERT&#039; THEN\n        INSERT INTO embedding_queue (doc_id, change_type, content_hash)\n        VALUES (NEW.doc_id, &#039;content_update&#039;, NEW.content_hash);\n        RETURN NEW;\n        \n    ELSIF TG_OP = &#039;UPDATE&#039; THEN\n        -- Deactivation should retire embeddings from search\n        IF OLD.is_active AND NOT NEW.is_active THEN\n            INSERT INTO embedding_queue (doc_id, change_type, content_hash)\n            VALUES (NEW.doc_id, &#039;delete&#039;, NEW.content_hash);\n        -- Re-embed on reactivation, or if content actually changed (not just metadata)\n        ELSIF (NEW.is_active AND NOT OLD.is_active)\n           OR OLD.content_hash IS DISTINCT FROM NEW.content_hash THEN\n            INSERT INTO embedding_queue (doc_id, change_type, content_hash)\n            VALUES (NEW.doc_id, &#039;content_update&#039;, NEW.content_hash);\n        END IF;\n        RETURN NEW;\n        \n    ELSIF TG_OP = &#039;DELETE&#039; THEN\n        INSERT INTO embedding_queue (doc_id, change_type, content_hash)\n        VALUES (OLD.doc_id, &#039;delete&#039;, OLD.content_hash);\n        RETURN OLD;\n    END IF;\n    RETURN NULL;  -- unreachable for the three DML ops above, but keeps plpgsql happy\nEND;\n$$ LANGUAGE plpgsql;\n\nCREATE TRIGGER trg_embedding_queue\n    AFTER INSERT OR UPDATE OR DELETE ON documents\n    FOR EACH ROW\n    EXECUTE FUNCTION fn_queue_embedding_update();\n\n<\/pre><\/div>\n\n\n<p><strong>The key insight here<\/strong> is the <code>content_hash<\/code> comparison on UPDATE. If someone updates the <code>category<\/code> or <code>title<\/code> but the actual content hasn&#8217;t changed, we don&#8217;t waste an API call re-embedding identical text. The UPDATE branch also covers the <code>is_active<\/code> lifecycle: deactivating a document retires its embeddings from search, and reactivating it queues a re-embed. This is a simple optimization but it saves real money at scale. 
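<\/p>\n\n\n\n<p>The hash comparison answers &#8220;did the content change at all?&#8221;; the change significance detector that feeds the <code>skipped<\/code> queue status asks &#8220;did it change enough to be worth an API call?&#8221;. A minimal sketch of that idea, assuming a <code>difflib<\/code>-based similarity ratio and an arbitrary 5% threshold (both worth tuning on your own corpus):<\/p>\n\n\n

```python
import difflib

# Assumption: edits touching less than 5% of the text are cosmetic (typos, punctuation)
SIGNIFICANCE_THRESHOLD = 0.05

def change_ratio(old: str, new: str) -> float:
    """Share of the document that changed, per difflib's similarity ratio."""
    return 1.0 - difflib.SequenceMatcher(None, old, new).ratio()

def should_reembed(old: str, new: str) -> bool:
    """Decision for the worker: mark the queue item 'skipped' on False, re-embed on True."""
    return change_ratio(old, new) > SIGNIFICANCE_THRESHOLD
```

\n\n\n<p>The trigger still writes the queue row either way; the worker simply marks it <code>skipped<\/code> instead of spending tokens when <code>should_reembed<\/code> says the edit was cosmetic.<\/p>\n\n\n\n<p>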
In my lab tests on 25K Wikipedia articles, 12% of simulated mutations were metadata-only \u2014 the trigger correctly skipped all of them.<\/p>\n\n\n\n<p>An alternative approach that&#8217;s even more targeted: use <code>AFTER INSERT OR UPDATE OF content<\/code> to only fire the trigger when the content column is modified. This is what I did in the LAB (see <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-lab\/\" target=\"_blank\" rel=\"noreferrer noopener\">Part 2<\/a>) because the <code>articles<\/code> table didn&#8217;t have a <code>content_hash<\/code> column originally. Both approaches achieve the same goal.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>DBA note on <code>UPDATE OF<\/code><\/strong>: PostgreSQL&#8217;s column-specific trigger fires based on the <code>SET<\/code> list of the <code>UPDATE<\/code> command, not the actual row diff. If a <code>BEFORE UPDATE<\/code> trigger on another function silently modifies <code>NEW.content<\/code> without <code>content<\/code> appearing in the original <code>SET<\/code> clause, an <code>AFTER UPDATE OF content<\/code> trigger won&#8217;t fire \u2014 the content changed, but PostgreSQL doesn&#8217;t know. This is <a href=\"https:\/\/www.postgresql.org\/docs\/current\/sql-createtrigger.html\">documented behavior<\/a>. The <code>content_hash<\/code> comparison approach above doesn&#8217;t have this blind spot, because it compares actual values regardless of which columns were in the <code>SET<\/code> list.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-worker-python\">The worker (Python)<\/h3>\n\n\n\n<p>The worker process polls the queue and generates embeddings. 
This is intentionally simple \u2014 no frameworks, no dependencies beyond <code>psycopg<\/code> and <code>openai<\/code>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n#!\/usr\/bin\/env python3\n&quot;&quot;&quot;\nembedding_worker.py \u2014 polls the embedding_queue and processes pending items.\nRun as: python3 embedding_worker.py\n&quot;&quot;&quot;\n\nimport os, time, hashlib\nimport psycopg\nfrom openai import OpenAI\n\nDB_URL = os.environ&#x5B;&quot;DATABASE_URL&quot;]\nclient = OpenAI()\n\nMODEL_NAME    = &quot;text-embedding-3-small&quot;\nMODEL_VERSION = &quot;v1&quot;\nCHUNK_SIZE    = 500   # tokens (approximate via chars \/ 4)\nCHUNK_OVERLAP = 50\nBATCH_SIZE    = 10    # queue items per cycle\nPOLL_INTERVAL = 5     # seconds\n\n\ndef chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -&gt; list&#x5B;str]:\n    &quot;&quot;&quot;Simple character-based chunking. 
Replace with your preferred strategy.&quot;&quot;&quot;\n    char_size = size * 4  # rough token-to-char ratio\n    char_overlap = overlap * 4\n    chunks = &#x5B;]\n    start = 0\n    while start &lt; len(text):\n        end = start + char_size\n        chunks.append(text&#x5B;start:end])\n        start = end - char_overlap\n    return chunks\n\n\ndef generate_embeddings(texts: list&#x5B;str]) -&gt; list&#x5B;list&#x5B;float]]:\n    &quot;&quot;&quot;Batch embedding call to OpenAI.&quot;&quot;&quot;\n    response = client.embeddings.create(\n        input=texts,\n        model=MODEL_NAME\n    )\n    return &#x5B;item.embedding for item in response.data]\n\n\ndef process_insert_or_update(conn, doc_id: str, content_hash: str):\n    &quot;&quot;&quot;Generate fresh embeddings for a document.&quot;&quot;&quot;\n    with conn.cursor() as cur:\n        # Fetch current document content\n        cur.execute(\n            &quot;SELECT content FROM documents WHERE doc_id = %s AND is_active = true&quot;,\n            (doc_id,)\n        )\n        row = cur.fetchone()\n        if not row:\n            return  # document was deleted or deactivated since queuing\n        \n        content = row&#x5B;0]\n        \n        # Verify content hasn&#039;t changed again since queuing\n        current_hash = hashlib.md5(content.encode()).hexdigest()\n        if current_hash != content_hash:\n            return  # content changed again, a newer queue entry will handle it\n        \n        # Check if embeddings already exist for this hash (idempotency)\n        # Scoped to model_name + model_version so parallel shadow-mode workers\n        # don&#039;t falsely consider each other&#039;s embeddings as &quot;already done&quot;\n        cur.execute(\n            &quot;&quot;&quot;SELECT 1 FROM document_embeddings \n               WHERE doc_id = %s AND source_hash = %s\n                 AND model_name = %s AND model_version = %s\n                 AND is_current = true\n               LIMIT 
1&quot;&quot;&quot;,\n            (doc_id, content_hash, MODEL_NAME, MODEL_VERSION)\n        )\n        if cur.fetchone():\n            return  # already embedded with this content\n        \n        # Chunk and embed\n        chunks = chunk_text(content)\n        embeddings = generate_embeddings(chunks)\n        \n        # Mark old embeddings as not current \u2014 scoped to this model space only\n        # so shadow-mode v2 embeddings aren&#039;t flipped by v1 workers (or vice versa)\n        cur.execute(\n            &quot;&quot;&quot;UPDATE document_embeddings \n               SET is_current = false \n               WHERE doc_id = %s\n                 AND model_name = %s AND model_version = %s\n                 AND is_current = true&quot;&quot;&quot;,\n            (doc_id, MODEL_NAME, MODEL_VERSION)\n        )\n        \n        # Insert new embeddings\n        for idx, (chunk, emb) in enumerate(zip(chunks, embeddings)):\n            cur.execute(\n                &quot;&quot;&quot;INSERT INTO document_embeddings \n                   (doc_id, chunk_index, chunk_text, embedding, \n                    model_name, model_version, source_hash)\n                   VALUES (%s, %s, %s, %s, %s, %s, %s)&quot;&quot;&quot;,\n                (doc_id, idx, chunk, emb, MODEL_NAME, MODEL_VERSION, content_hash)\n            )\n        \n        conn.commit()\n\n\ndef process_delete(conn, doc_id: str):\n    &quot;&quot;&quot;Mark embeddings as not current when source is deleted.&quot;&quot;&quot;\n    with conn.cursor() as cur:\n        cur.execute(\n            &quot;&quot;&quot;UPDATE document_embeddings \n               SET is_current = false \n               WHERE doc_id = %s\n                 AND model_name = %s AND model_version = %s\n                 AND is_current = true&quot;&quot;&quot;,\n            (doc_id, MODEL_NAME, MODEL_VERSION)\n        )\n        conn.commit()\n\n\ndef poll_and_process():\n    &quot;&quot;&quot;Main loop: claim a batch, process, 
repeat.&quot;&quot;&quot;\n    with psycopg.connect(DB_URL) as conn:\n        while True:\n            with conn.cursor() as cur:\n                # Claim a batch (SELECT FOR UPDATE SKIP LOCKED)\n                cur.execute(&quot;&quot;&quot;\n                    UPDATE embedding_queue \n                    SET status = &#039;processing&#039;, claimed_at = now()\n                    WHERE queue_id IN (\n                        SELECT queue_id FROM embedding_queue\n                        WHERE status = &#039;pending&#039;\n                        ORDER BY queued_at\n                        LIMIT %s\n                        FOR UPDATE SKIP LOCKED\n                    )\n                    RETURNING queue_id, doc_id, change_type, content_hash\n                &quot;&quot;&quot;, (BATCH_SIZE,))\n                \n                batch = cur.fetchall()\n                conn.commit()\n            \n            if not batch:\n                time.sleep(POLL_INTERVAL)\n                continue\n            \n            for queue_id, doc_id, change_type, content_hash in batch:\n                try:\n                    if change_type in (&#039;content_update&#039;,):\n                        process_insert_or_update(conn, doc_id, content_hash)\n                    elif change_type == &#039;delete&#039;:\n                        process_delete(conn, doc_id)\n                    \n                    with conn.cursor() as cur:\n                        cur.execute(\n                            &quot;&quot;&quot;UPDATE embedding_queue \n                               SET status = &#039;completed&#039;, processed_at = now()\n                               WHERE queue_id = %s&quot;&quot;&quot;,\n                            (queue_id,)\n                        )\n                        conn.commit()\n                        \n                except Exception as e:\n                    conn.rollback()\n                    with conn.cursor() as cur:\n                        
cur.execute(\n                            &quot;&quot;&quot;UPDATE embedding_queue \n                               SET status = CASE WHEN retry_count &gt;= 3 THEN &#039;failed&#039; ELSE &#039;pending&#039; END,\n                                   retry_count = retry_count + 1,\n                                   error_message = %s\n                               WHERE queue_id = %s&quot;&quot;&quot;,\n                            (str(e), queue_id)\n                        )\n                        conn.commit()\n                    print(f&quot;Error processing queue_id={queue_id}: {e}&quot;)\n\n\nif __name__ == &quot;__main__&quot;:\n    print(&quot;Embedding worker started. Polling...&quot;)\n    poll_and_process()\n\n<\/pre><\/div>\n\n\n<p><strong>What I like about this pattern:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It&#8217;s <strong>transactional<\/strong>: the trigger and the queue insert are in the same transaction. If the INSERT\/UPDATE fails, no queue entry is created.<\/li>\n\n\n\n<li>It&#8217;s <strong>idempotent<\/strong>: the worker checks <code>content_hash<\/code> before embedding, so duplicate queue entries are harmless.<\/li>\n\n\n\n<li>It uses <strong><code>SELECT FOR UPDATE SKIP LOCKED<\/code><\/strong> for safe concurrency (see below).<\/li>\n\n\n\n<li>It handles <strong>retries<\/strong> gracefully: failed items go back to pending with a counter.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-deep-dive-select-for-update-skip-locked\">Deep dive: SELECT FOR UPDATE SKIP LOCKED<\/h3>\n\n\n\n<p>This is the core of why this queue pattern works, and it&#8217;s a PostgreSQL feature that most people underuse. Let me explain it properly because it&#8217;s one of those things that looks simple in the SQL but has profound implications for how you scale workers.<\/p>\n\n\n\n<p>The problem: you want to run <strong>multiple embedding workers in parallel<\/strong> to process the queue faster. 
But if two workers pick the same queue item, you&#8217;ve wasted an API call (double embedding) or worse, you get race conditions on the <code>document_embeddings<\/code> table.<\/p>\n\n\n\n<p>The classic solutions are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>External locking<\/strong> (Redis, ZooKeeper): adds infrastructure, adds failure modes<\/li>\n\n\n\n<li><strong>Application-level partitioning<\/strong> (worker 1 handles doc_id % 3 = 0, worker 2 handles doc_id % 3 = 1&#8230;): rigid, doesn&#8217;t adapt to load<\/li>\n\n\n\n<li><strong>SELECT FOR UPDATE<\/strong>: locks the rows, but the second worker <strong>blocks and waits<\/strong> until the first one commits. This serializes your workers \u2014 you&#8217;re back to single-threaded throughput.<\/li>\n<\/ul>\n\n\n\n<p><code>SKIP LOCKED<\/code> changes everything. Here&#8217;s what happens step by step:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\nTimeline:\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n\nWorker A (t=0):\n    BEGIN;\n    UPDATE embedding_queue SET status = &#039;processing&#039;, claimed_at = now()\n    WHERE queue_id IN (\n        SELECT queue_id FROM embedding_queue\n        WHERE status = &#039;pending&#039;\n        ORDER BY queued_at\n        LIMIT 5\n        FOR UPDATE SKIP LOCKED    -- \u2190 locks rows 1,2,3,4,5\n    )\n    RETURNING queue_id, doc_id, change_type, content_hash;\n    \n    \u2192 Returns: queue_id 1, 2, 3, 4, 5\n    \u2192 These 5 rows are now locked by Worker A&#039;s transaction\n\nWorker B (t=1, while Worker A is still processing):\n    
BEGIN;\n    UPDATE embedding_queue SET status = &#039;processing&#039;, claimed_at = now()\n    WHERE queue_id IN (\n        SELECT queue_id FROM embedding_queue\n        WHERE status = &#039;pending&#039;\n        ORDER BY queued_at\n        LIMIT 5\n        FOR UPDATE SKIP LOCKED    -- \u2190 sees rows 1-5 are locked, SKIPS them\n    )\n    RETURNING queue_id, doc_id, change_type, content_hash;\n    \n    \u2192 Returns: queue_id 6, 7, 8, 9, 10\n    \u2192 No blocking, no waiting, no conflict\n\nWorker C (t=2):\n    \u2192 Gets queue_id 11, 12, 13, 14, 15\n    \u2192 Same story: zero contention\n\n<\/pre><\/div>\n\n\n<p>The key behaviors:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><code>FOR UPDATE<\/code><\/strong>: tells PostgreSQL &#8220;I intend to modify these rows, lock them for me&#8221;<\/li>\n\n\n\n<li><strong><code>SKIP LOCKED<\/code><\/strong>: tells PostgreSQL &#8220;if a row is already locked by someone else, <strong>don&#8217;t wait<\/strong> \u2014 just pretend it doesn&#8217;t exist and move to the next one&#8221;<\/li>\n<\/ul>\n\n\n\n<p>This means:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Workers <strong>never block each other<\/strong> \u2014 no waiting, no deadlocks<\/li>\n\n\n\n<li>Workers <strong>never process the same item<\/strong> \u2014 each item is claimed by exactly one worker<\/li>\n\n\n\n<li>You can <strong>scale horizontally<\/strong> just by starting more worker processes<\/li>\n\n\n\n<li>If a worker crashes mid-processing, its transaction is rolled back, the locks are released, and the rows become visible to other workers again (the <code>status<\/code> was already set to <code>'processing'<\/code> via the UPDATE, so you&#8217;d need a cleanup mechanism for crashed workers \u2014 more on that below)<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-what-happens-without-skip-locked\">What happens without SKIP LOCKED?<\/h4>\n\n\n\n<p>Let&#8217;s compare. 
With plain <code>FOR UPDATE<\/code> (no SKIP LOCKED):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\nWorker A (t=0):  SELECT ... FOR UPDATE LIMIT 5;  \u2192 gets rows 1-5, locks them\nWorker B (t=1):  SELECT ... FOR UPDATE LIMIT 5;  \u2192 tries row 1... BLOCKED \u23f3\n                                                    waits for Worker A to COMMIT\nWorker A (t=10): COMMIT;                          \u2192 releases locks\nWorker B (t=10): \u2192 finally gets rows 1-5 (but they&#039;re already processed!)\n                 \u2192 returns empty set because status is no longer &#039;pending&#039;\n                 \u2192 wasted 10 seconds waiting for nothing\n\n<\/pre><\/div>\n\n\n<p>With <code>SKIP LOCKED<\/code>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\nWorker A (t=0):  SELECT ... FOR UPDATE SKIP LOCKED LIMIT 5;  \u2192 gets rows 1-5\nWorker B (t=1):  SELECT ... FOR UPDATE SKIP LOCKED LIMIT 5;  \u2192 gets rows 6-10 instantly\n                 \u2192 zero wait time, immediate useful work\n\n<\/pre><\/div>\n\n\n<p>This is exactly the behavior you want for a work queue.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-the-crash-recovery-problem\">The crash recovery problem<\/h4>\n\n\n\n<p>There&#8217;s one subtlety: if Worker A claims rows 1-5, sets their <code>status = 'processing'<\/code>, and then crashes (process killed, OOM, network failure), those rows are stuck in <code>'processing'<\/code> forever. 
The PostgreSQL locks are released (transaction was rolled back), but the status column still says <code>'processing'<\/code>.<\/p>\n\n\n\n<p>You need a reaper \u2014 a periodic cleanup that reclaims stale items:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Reclaim items stuck in &#039;processing&#039; for more than 5 minutes\n-- (embedding should never take that long)\n-- Uses claimed_at, not queued_at \u2014 an item queued 30 minutes ago\n-- but claimed 10 seconds ago should NOT be reclaimed\nUPDATE embedding_queue \nSET status = &#039;pending&#039;,\n    retry_count = retry_count + 1,\n    error_message = &#039;reclaimed: worker timeout after 5 minutes&#039;\nWHERE status = &#039;processing&#039; \nAND claimed_at &lt; now() - INTERVAL &#039;5 minutes&#039;;\n\n<\/pre><\/div>\n\n\n<p>Run this every minute via <code>pg_cron<\/code> or a simple cron job. It&#8217;s a safety net, not the primary flow.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-why-this-is-better-than-external-queue-systems\">Why this is better than external queue systems<\/h4>\n\n\n\n<p>For this specific use case (embedding queue), <code>SKIP LOCKED<\/code> gives you an <strong>in-database work queue<\/strong> with:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>ACID guarantees<\/strong>: the queue and the embeddings are in the same database, same transaction scope<\/li>\n\n\n\n<li><strong>No external dependencies<\/strong>: no Redis, no RabbitMQ, no SQS<\/li>\n\n\n\n<li><strong>Exactly-once semantics<\/strong>: combined with the <code>content_hash<\/code> idempotency check<\/li>\n\n\n\n<li><strong>Observability<\/strong>: it&#8217;s just a table \u2014 <code>SELECT count(*) FROM embedding_queue WHERE status = 'pending'<\/code> is your queue depth, queryable from any SQL client or monitoring tool<\/li>\n<\/ul>\n\n\n\n<p>The limitation is throughput: if you&#8217;re processing millions of queue items per second, 
you want Kafka or SQS. For an embedding queue where each item takes 100-500ms to process (API call), PostgreSQL can easily handle thousands of items per minute. That&#8217;s more than enough for any knowledge base I&#8217;ve seen in production.<\/p>\n\n\n\n<p><strong>What I don&#8217;t like:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It&#8217;s <strong>polling-based<\/strong>: the worker checks every 5 seconds. For most use cases this is fine, but if you need sub-second latency, you want <code>LISTEN\/NOTIFY<\/code>.<\/li>\n\n\n\n<li>It requires a <strong>separate process<\/strong> to run. In production, that means a systemd service or a Kubernetes deployment.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-upgrading-to-listen-notify\">Upgrading to LISTEN\/NOTIFY<\/h3>\n\n\n\n<p>If you want to eliminate polling and react instantly, PostgreSQL&#8217;s <code>LISTEN\/NOTIFY<\/code> mechanism is your friend. Add this to the trigger:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Add to fn_queue_embedding_update(), after each INSERT INTO embedding_queue:\n-- Use NEW.doc_id for INSERT\/UPDATE, OLD.doc_id for DELETE\nPERFORM pg_notify(&#039;embedding_work&#039;, json_build_object(\n    &#039;doc_id&#039;, COALESCE(NEW.doc_id, OLD.doc_id),\n    &#039;operation&#039;, TG_OP\n)::text);\n\n<\/pre><\/div>\n\n\n<p>And in the worker, replace the <code>time.sleep()<\/code> loop with:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport select\n\nconn = psycopg.connect(DB_URL, autocommit=True)\nconn.execute(&quot;LISTEN embedding_work&quot;)\n\nwhile True:\n    if select.select(&#x5B;conn], &#x5B;], &#x5B;], 5.0) == (&#x5B;], &#x5B;], &#x5B;]):\n        # Timeout \u2014 check for any missed items anyway\n        process_pending_batch(conn)\n    else:\n        conn.execute(&quot;SELECT 1&quot;)  # consume 
notifications\n        for notify in conn.notifies():\n            process_pending_batch(conn)\n\n<\/pre><\/div>\n\n\n<p>This gives you near-real-time embedding refresh with zero polling overhead.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-level-2-logical-replication-cross-database-embedding-sync\">Level 2: Logical Replication \u2014 Cross-Database Embedding Sync<\/h2>\n\n\n\n<p>Now let&#8217;s go a level up. What if your source data lives in a different PostgreSQL instance than your vector store? Or what if the team that manages the knowledge base doesn&#8217;t want triggers on their production tables?<\/p>\n\n\n\n<p>This is where <strong>PostgreSQL logical replication<\/strong> becomes the CDC mechanism. It&#8217;s built into PostgreSQL, it reads the WAL, and it has near-zero impact on the source.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-setup\">The setup<\/h3>\n\n\n\n<p>On the <strong>source<\/strong> (knowledge base database):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Ensure WAL is configured for logical replication\nALTER SYSTEM SET wal_level = &#039;logical&#039;;\nALTER SYSTEM SET max_replication_slots = 10;\nALTER SYSTEM SET max_wal_senders = 10;\n-- Restart required\n\n-- Create a publication for the documents table\nCREATE PUBLICATION pub_documents FOR TABLE documents;\n\n<\/pre><\/div>\n\n\n<p>On the <strong>target<\/strong> (vector database, different instance):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Create the same documents table structure (or a subset)\nCREATE TABLE documents_replica (\n    doc_id          BIGINT PRIMARY KEY,\n    title           TEXT NOT NULL,\n    content         TEXT NOT NULL,\n    content_hash    TEXT,\n    updated_at      TIMESTAMPTZ,\n    is_active       BOOLEAN\n);\n\n-- Create 
the subscription\nCREATE SUBSCRIPTION sub_documents\n    CONNECTION &#039;host=source-db port=5432 dbname=knowledge_base user=replicator password=...&#039;\n    PUBLICATION pub_documents\n    WITH (copy_data = true);  -- initial snapshot\n\n<\/pre><\/div>\n\n\n<p>Now the <code>documents_replica<\/code> table on your vector database is automatically kept in sync via WAL streaming. Every INSERT, UPDATE, DELETE on the source is replicated in near-real-time.<\/p>\n\n\n\n<p>From here, you add the same <strong>trigger + queue + worker<\/strong> pattern from Level 1, but on the <code>documents_replica<\/code> table. The source database team doesn&#8217;t need to know or care about your embedding pipeline.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-architecture\">Architecture<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"383\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-10-1024x383.png\" alt=\"\" class=\"wp-image-43063\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-10-1024x383.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-10-300x112.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-10-768x287.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-10-1536x575.png 1536w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-10.png 1596w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Why this is powerful:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Zero impact on source<\/strong>: no triggers, no extra connections, just WAL reading<\/li>\n\n\n\n<li><strong>Separation of concerns<\/strong>: the DBA managing the knowledge base doesn&#8217;t need to understand 
embeddings<\/li>\n\n\n\n<li><strong>Built-in catch-up<\/strong>: if the embedding worker goes down, logical replication buffers changes in the WAL. When it comes back, all changes are processed in order<\/li>\n\n\n\n<li><strong>No external dependencies<\/strong>: this is pure PostgreSQL, no Kafka, no Flink, no cloud services<\/li>\n<\/ul>\n\n\n\n<p><strong>Limitations:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Logical replication is <strong>PG \u2192 PG only<\/strong> (unlike Flink CDC which can source from Oracle, MySQL, etc.)<\/li>\n\n\n\n<li><strong>DDL is not replicated<\/strong>: if the source adds a column, you need to handle it manually<\/li>\n\n\n\n<li>The replication slot retains WAL until consumed \u2014 <strong>\u26a0\ufe0f Production pitfall<\/strong>: if the subscriber is down for too long, WAL can fill up the source disk. Set <code>max_slot_wal_keep_size<\/code> (PG 13+) to cap retention, and monitor <code>pg_replication_slots<\/code> for inactive slots. DBAs: this is the #1 risk with logical replication.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-level-3-flink-cdc-when-the-source-isn-t-postgresql-and-when-to-skip-re-embedding\">Level 3: Flink CDC \u2014 When the Source Isn&#8217;t PostgreSQL (and When to Skip Re-embedding)<\/h2>\n\n\n\n<p>If your knowledge base lives in Oracle, MySQL, or you need to fan out to multiple targets (pgvector + Elasticsearch + data lake), then we&#8217;re back in the territory of my CDC posts.<\/p>\n\n\n\n<p>But here&#8217;s where it gets really interesting. Flink CDC gives us something that the trigger and logical replication approaches don&#8217;t: <strong>access to both the before and after images of every row change<\/strong>. Debezium, which Flink CDC uses under the hood, captures the full row state before and after the UPDATE. 
This means we can <strong>evaluate whether a change is significant enough to warrant re-embedding<\/strong> \u2014 directly inside the pipeline, before hitting any embedding API.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-why-this-matters\">Why this matters<\/h3>\n\n\n\n<p>Not every UPDATE to a document requires a new embedding. Think about it:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Someone fixes a typo: &#8220;PostgreSLQ&#8221; \u2192 &#8220;PostgreSQL&#8221; \u2014 <strong>probably not worth re-embedding<\/strong><\/li>\n\n\n\n<li>Someone updates a metadata field (status, last_reviewed_by) \u2014 <strong>definitely not worth re-embedding<\/strong> (metadata filtering should be done in the WHERE clause)<\/li>\n\n\n\n<li>Someone rewrites two paragraphs and adds a new section \u2014 <strong>yes, re-embed<\/strong><\/li>\n\n\n\n<li>Someone changes a single KPI number in a financial report \u2014 <strong>depends on context, but the semantic meaning shifted<\/strong><\/li>\n<\/ul>\n\n\n\n<p>In a busy knowledge base, most row-level changes are minor. If your pipeline blindly re-embeds on every UPDATE, you&#8217;re burning API credits, creating unnecessary load on the embedding worker, and churning your DiskANN index for no semantic gain. 
The question is: <strong>can we be smarter about this?<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-architecture-with-change-significance-filtering\">The architecture with change significance filtering<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"934\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-11-1024x934.png\" alt=\"\" class=\"wp-image-43064\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-11-1024x934.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-11-300x274.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-11-768x701.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-11.png 1189w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The key insight: <strong>separate the data replication (all changes) from the embedding trigger (only significant changes)<\/strong>. The data mart gets everything \u2014 it&#8217;s a faithful replica. But the embedding queue only receives changes where the content actually shifted enough to matter semantically.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-change-significance-the-approaches\">Change significance: the approaches<\/h3>\n\n\n\n<p>There are several ways to evaluate whether a change is &#8220;significant enough&#8221; for re-embedding. I want to walk through each one because they have very different trade-offs.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-approach-1-column-aware-filtering-simplest-start-here\">Approach 1: Column-aware filtering (simplest, start here)<\/h4>\n\n\n\n<p>The cheapest filter: only trigger re-embedding when specific content columns change. 
If someone updates <code>status<\/code>, <code>last_reviewed_by<\/code>, <code>category<\/code>, or any metadata field, skip the embedding entirely.<\/p>\n\n\n\n<p>In Flink SQL, Debezium CDC exposes <code>op<\/code> (operation type) and you can access both the old and new values. Here&#8217;s how to implement it:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- CDC source table with before\/after access\nCREATE TABLE src_documents (\n    doc_id          BIGINT,\n    title           STRING,\n    content         STRING,\n    category        STRING,\n    status          STRING,\n    updated_at      TIMESTAMP(3),\n    PRIMARY KEY (doc_id) NOT ENFORCED\n) WITH (\n    &#039;connector&#039; = &#039;postgres-cdc&#039;,\n    &#039;hostname&#039; = &#039;172.19.0.4&#039;,\n    &#039;port&#039; = &#039;5432&#039;,\n    &#039;username&#039; = &#039;postgres&#039;,\n    &#039;password&#039; = &#039;...&#039;,\n    &#039;database-name&#039; = &#039;knowledge_base&#039;,\n    &#039;schema-name&#039; = &#039;public&#039;,\n    &#039;table-name&#039; = &#039;documents&#039;,\n    &#039;slot.name&#039; = &#039;flink_documents_slot&#039;,\n    &#039;decoding.plugin.name&#039; = &#039;pgoutput&#039;,\n    &#039;scan.incremental.snapshot.enabled&#039; = &#039;true&#039;\n);\n\n-- JDBC sink for ALL changes (data mart replication)\nCREATE TABLE dm_documents (\n    doc_id          BIGINT,\n    title           STRING,\n    content         STRING,\n    category        STRING,\n    status          STRING,\n    updated_at      TIMESTAMP(3),\n    PRIMARY KEY (doc_id) NOT ENFORCED\n) WITH (\n    &#039;connector&#039; = &#039;jdbc&#039;,\n    &#039;url&#039; = &#039;jdbc:postgresql:\/\/172.20.0.4:5432\/vector_db&#039;,\n    &#039;table-name&#039; = &#039;documents_replica&#039;,\n    &#039;username&#039; = &#039;postgres&#039;,\n    &#039;password&#039; = &#039;...&#039;,\n    &#039;driver&#039; = 
&#039;org.postgresql.Driver&#039;\n);\n\n-- Replicate everything to the data mart\nINSERT INTO dm_documents SELECT * FROM src_documents;\n\n<\/pre><\/div>\n\n\n<p>For the embedding queue, we need to be selective. This is where a Flink SQL view or a <code>ProcessFunction<\/code> comes in. Since Flink SQL CDC doesn&#8217;t natively expose the before-image in the SELECT, the simplest approach is to use the <strong>content_hash<\/strong> strategy from Level 1: the trigger on <code>documents_replica<\/code> compares <code>content_hash<\/code> and only queues when it actually changed.<\/p>\n\n\n\n<p>But if you want the filtering to happen <strong>inside Flink<\/strong> (before hitting the target at all), you need a UDF.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-approach-2-text-diff-ratio-udf-the-sweet-spot\">Approach 2: Text diff ratio (UDF \u2014 the sweet spot)<\/h4>\n\n\n\n<p>This is where it gets interesting. We register a custom Flink UDF that computes the <strong>similarity ratio<\/strong> between the old and new content, and only emits the row to the embedding queue when the change exceeds a threshold.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\n\/**\n * Flink UDF: computes text change ratio between two strings.\n * Returns a value between 0.0 (completely different) and 1.0 (identical).\n * \n * Uses a simplified approach: character-level diff ratio.\n * For production, consider token-level or sentence-level comparison.\n *\/\n@FunctionHint(output = @DataTypeHint(&quot;DOUBLE&quot;))\npublic class TextChangeRatio extends ScalarFunction {\n    \n    public Double eval(String before, String after) {\n        if (before == null || after == null) return 0.0;\n        if (before.equals(after)) return 1.0;\n        \n        \/\/ Longest Common Subsequence ratio\n        int lcs = lcsLength(before, after);\n        int maxLen = Math.max(before.length(), after.length());\n        \n     
   return maxLen == 0 ? 1.0 : (double) lcs \/ maxLen;\n    }\n    \n    private int lcsLength(String a, String b) {\n        \/\/ Optimized for streaming: use rolling array, not full matrix\n        int&#x5B;] prev = new int&#x5B;b.length() + 1];\n        int&#x5B;] curr = new int&#x5B;b.length() + 1];\n        for (int i = 1; i &lt;= a.length(); i++) {\n            for (int j = 1; j &lt;= b.length(); j++) {\n                if (a.charAt(i-1) == b.charAt(j-1)) {\n                    curr&#x5B;j] = prev&#x5B;j-1] + 1;\n                } else {\n                    curr&#x5B;j] = Math.max(prev&#x5B;j], curr&#x5B;j-1]);\n                }\n            }\n            int&#x5B;] tmp = prev; prev = curr; curr = tmp;\n            java.util.Arrays.fill(curr, 0);\n        }\n        return prev&#x5B;b.length()];\n    }\n}\n\n<\/pre><\/div>\n\n\n<p>Register and use it in Flink SQL:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n-- Register the UDF\nCREATE FUNCTION text_change_ratio AS &#039;com.example.TextChangeRatio&#039;;\n\n<\/pre><\/div>\n\n\n<p>Now, the challenge here is that <strong>Flink SQL CDC doesn&#8217;t directly expose the &#8220;before&#8221; image as a column you can SELECT<\/strong>. The changelog stream has INSERT (+I), UPDATE_BEFORE (-U), UPDATE_AFTER (+U), and DELETE (-D) operations, but in a standard <code>SELECT * FROM cdc_table<\/code>, you only see the latest state.<\/p>\n\n\n\n<p>To access both before and after, you have two options:<\/p>\n\n\n\n<p><strong>Option A: Stateful ProcessFunction (Java\/Python)<\/strong><\/p>\n\n\n\n<p>This is the cleanest approach. 
You write a <code>KeyedProcessFunction<\/code> that maintains the previous state of each document in Flink&#8217;s managed state, and compares it with the incoming change:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Pseudocode for the ProcessFunction approach\nclass ChangeSignificanceFilter(KeyedProcessFunction):\n    \n    def __init__(self, threshold=0.95):\n        self.threshold = threshold  # 0.95 = skip if 95%+ similar\n    \n    def open(self, runtime_context):\n        # Flink managed state: stores last known content per doc_id\n        self.last_content = runtime_context.get_state(\n            ValueStateDescriptor(&quot;last_content&quot;, Types.STRING())\n        )\n    \n    def process_element(self, row, ctx):\n        doc_id = row&#x5B;&#039;doc_id&#039;]\n        new_content = row&#x5B;&#039;content&#039;]\n        old_content = self.last_content.value()\n        \n        # Always update state\n        self.last_content.update(new_content)\n        \n        if old_content is None:\n            # INSERT: always emit (new document)\n            yield Row(doc_id=doc_id, needs_embedding=True, \n                      change_ratio=0.0, operation=&#039;INSERT&#039;)\n            return\n        \n        if old_content == new_content:\n            # Content identical: metadata-only change, skip\n            return\n        \n        # Compute change ratio\n        ratio = text_similarity(old_content, new_content)\n        \n        if ratio &lt; self.threshold:\n            # Significant change: emit for re-embedding\n            yield Row(doc_id=doc_id, needs_embedding=True,\n                      change_ratio=round(1.0 - ratio, 4), \n                      operation=&#039;UPDATE&#039;)\n        else:\n            # Minor change (typo fix, formatting): skip embedding\n            # Optionally log for monitoring\n            yield Row(doc_id=doc_id, needs_embedding=False,\n     
                 change_ratio=round(1.0 - ratio, 4),\n                      operation=&#039;SKIP&#039;)\n\n\ndef text_similarity(a: str, b: str) -&gt; float:\n    &quot;&quot;&quot;Fast similarity using difflib SequenceMatcher.&quot;&quot;&quot;\n    from difflib import SequenceMatcher\n    return SequenceMatcher(None, a, b).ratio()\n\n<\/pre><\/div>\n\n\n<p><strong>Option B: Self-join with temporal table (Flink SQL)<\/strong><\/p>\n\n\n\n<p>If you want to stay in pure SQL, you can maintain a &#8220;previous version&#8221; table and join against it:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Maintain a snapshot of previous content in a JDBC-backed table\nCREATE TABLE content_snapshots (\n    doc_id      BIGINT,\n    content     STRING,\n    content_md5 STRING,\n    PRIMARY KEY (doc_id) NOT ENFORCED\n) WITH (\n    &#039;connector&#039; = &#039;jdbc&#039;,\n    &#039;url&#039; = &#039;jdbc:postgresql:\/\/172.20.0.4:5432\/vector_db&#039;,\n    &#039;table-name&#039; = &#039;content_snapshots&#039;,\n    &#039;username&#039; = &#039;postgres&#039;,\n    &#039;password&#039; = &#039;...&#039;,\n    &#039;driver&#039; = &#039;org.postgresql.Driver&#039;\n);\n\n-- Write ALL changes to the snapshot table (upsert)\nINSERT INTO content_snapshots\nSELECT doc_id, content, MD5(content) FROM src_documents;\n\n<\/pre><\/div>\n\n\n<p>Then in the embedding trigger on the target side, compare the incoming <code>content_md5<\/code> against the previously stored one. If they differ, queue for embedding. 
This is essentially what the Level 1 trigger does, but now the CDC pipeline is handling the cross-database transport.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-approach-3-structural-change-analysis-most-sophisticated\">Approach 3: Structural change analysis (most sophisticated)<\/h4>\n\n\n\n<p>For knowledge bases with structured content (Markdown, HTML, technical documentation), you can go deeper than raw text diff. Analyze <strong>what kind of change<\/strong> happened:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndef analyze_change_significance(old_content: str, new_content: str) -&gt; dict:\n    &quot;&quot;&quot;\n    Analyze the structural significance of a content change.\n    Returns a dict with metrics to decide whether re-embedding is needed.\n    &quot;&quot;&quot;\n    import re\n    from difflib import SequenceMatcher\n    \n    result = {\n        &#039;char_ratio&#039;: SequenceMatcher(None, old_content, new_content).ratio(),\n        &#039;paragraphs_added&#039;: 0,\n        &#039;paragraphs_removed&#039;: 0,\n        &#039;paragraphs_modified&#039;: 0,\n        &#039;heading_changed&#039;: False,\n        &#039;needs_embedding&#039;: False\n    }\n    \n    # Split into paragraphs\n    old_paras = &#x5B;p.strip() for p in old_content.split(&#039;\\n\\n&#039;) if p.strip()]\n    new_paras = &#x5B;p.strip() for p in new_content.split(&#039;\\n\\n&#039;) if p.strip()]\n    \n    old_set = set(old_paras)\n    new_set = set(new_paras)\n    \n    result&#x5B;&#039;paragraphs_added&#039;] = len(new_set - old_set)\n    result&#x5B;&#039;paragraphs_removed&#039;] = len(old_set - new_set)\n    \n    # Check if headings changed (strong signal for semantic shift)\n    old_headings = set(re.findall(r&#039;^#{1,3}\\s+(.+)$&#039;, old_content, re.MULTILINE))\n    new_headings = set(re.findall(r&#039;^#{1,3}\\s+(.+)$&#039;, new_content, re.MULTILINE))\n    
result&#x5B;&#039;heading_changed&#039;] = old_headings != new_headings\n    \n    # Decision logic\n    if result&#x5B;&#039;heading_changed&#039;]:\n        result&#x5B;&#039;needs_embedding&#039;] = True\n        result&#x5B;&#039;reason&#039;] = &#039;heading_changed&#039;\n    elif result&#x5B;&#039;paragraphs_added&#039;] &gt; 0 or result&#x5B;&#039;paragraphs_removed&#039;] &gt; 0:\n        result&#x5B;&#039;needs_embedding&#039;] = True\n        result&#x5B;&#039;reason&#039;] = &#039;structural_change&#039;\n    elif result&#x5B;&#039;char_ratio&#039;] &lt; 0.90:\n        result&#x5B;&#039;needs_embedding&#039;] = True\n        result&#x5B;&#039;reason&#039;] = &#039;significant_text_change&#039;\n    else:\n        result&#x5B;&#039;needs_embedding&#039;] = False\n        result&#x5B;&#039;reason&#039;] = &#039;minor_change&#039;\n    \n    return result\n\n<\/pre><\/div>\n\n\n<p>The idea here is that structural changes (new headings, added\/removed sections) almost always shift the semantic meaning enough to warrant re-embedding, while inline text modifications need to cross a threshold.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-choosing-the-right-threshold\">Choosing the right threshold<\/h3>\n\n\n\n<p>This is the part where I have to be honest: <strong>I don&#8217;t have a definitive answer on the optimal threshold<\/strong>. 
It depends on your data, your embedding model, and your quality requirements.<\/p>\n\n\n\n<p>What I can tell you from experimentation:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Change Type<\/th><th>Text Diff Ratio<\/th><th>Should Re-embed?<\/th><th>Why<\/th><\/tr><\/thead><tbody><tr><td>Typo fix (&#8220;PostgreSLQ&#8221; \u2192 &#8220;PostgreSQL&#8221;)<\/td><td>0.99+<\/td><td>No<\/td><td>Semantic meaning unchanged<\/td><\/tr><tr><td>Reformatting (whitespace, punctuation)<\/td><td>0.95+<\/td><td>No<\/td><td>Embedding models are robust to formatting<\/td><\/tr><tr><td>Single sentence rewritten<\/td><td>0.85-0.95<\/td><td>Maybe<\/td><td>Depends on the sentence&#8217;s importance<\/td><\/tr><tr><td>Paragraph added\/removed<\/td><td>0.70-0.85<\/td><td>Yes<\/td><td>New information or removed context<\/td><\/tr><tr><td>Major rewrite (&gt;30% changed)<\/td><td>&lt;0.70<\/td><td>Absolutely<\/td><td>Different document semantically<\/td><\/tr><tr><td>Metadata-only (status, category)<\/td><td>1.0 (content)<\/td><td>No<\/td><td>Content columns unchanged<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>My <strong>starting recommendation<\/strong>: set the threshold at <strong>0.95<\/strong> (i.e., re-embed when more than 5% of the text changed). Then monitor your RAG quality metrics (nDCG, retrieval precision from <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-adaptive-rag-understanding-confidence-precision-ndcg\/\" target=\"_blank\" rel=\"noreferrer noopener\">the Adaptive RAG post<\/a>) and adjust. If you&#8217;re missing relevant results, lower the threshold. If you&#8217;re burning too many API credits on trivial changes, raise it.<\/p>\n\n\n\n<p>I validated these numbers on the Wikipedia dataset in <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-lab\/\" target=\"_blank\" rel=\"noreferrer noopener\">Part 2 of this post<\/a>. 
The results cleanly confirmed the 0.95 threshold: typo fixes scored 0.998+ (SKIP), paragraph additions scored ~0.93 (EMBED), and section rewrites scored 0.51\u20130.63 (definitely EMBED).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-monitoring-table\">The monitoring table<\/h3>\n\n\n\n<p>Whatever approach you choose, log the decisions. This is invaluable for tuning:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE TABLE embedding_change_log (\n    log_id          BIGSERIAL PRIMARY KEY,\n    doc_id          BIGINT NOT NULL,\n    similarity      NUMERIC(5,4),       -- 0.0000 to 1.0000\n    decision        TEXT NOT NULL,       -- &#039;EMBED&#039; or &#039;SKIP&#039;\n    reason          TEXT,                -- &#039;structural_change&#039;, &#039;minor_change&#039;, etc.\n    old_content_md5 TEXT,\n    new_content_md5 TEXT,\n    details         JSONB,              -- optional: paragraph_similarity, char_diff, etc.\n    decided_at      TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n\n-- How many re-embeddings are we avoiding?\nSELECT decision, count(*), \n       round(100.0 * count(*) \/ sum(count(*)) OVER (), 1) AS pct\nFROM embedding_change_log\nWHERE decided_at &gt; now() - INTERVAL &#039;7 days&#039;\nGROUP BY decision;\n\n-- Result example:\n--  decision | count | pct\n-- ----------+-------+------\n--  SKIP     |  1847 | 73.2\n--  EMBED    |   675 | 26.8\n\n<\/pre><\/div>\n\n\n<p>In this example, 73% of the content changes were minor enough to skip. That&#8217;s 73% fewer embedding API calls, 73% less index churn, and a quieter, more stable RAG system.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-a-note-on-baseline-the-first-run\">A note on baseline: the first run<\/h3>\n\n\n\n<p>One thing that&#8217;s not obvious until you deploy this: <strong>the change detector needs existing embeddings to compare against<\/strong>. 
On the very first run, or for any document that has never been embedded, the similarity will be 0.0 (no previous embedding to compare), and the decision will always be EMBED. The SKIP optimization only kicks in on subsequent changes after a baseline exists.<\/p>\n\n\n\n<p>This is correct behavior, but it means your initial backfill will process everything regardless of the threshold setting. Plan for it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-full-architecture-recap\">Full architecture recap<\/h3>\n\n\n\n<p>I won&#8217;t repeat the full Flink setup here; refer to my <a href=\"https:\/\/www.dbi-services.com\/blog\/postgresql-cdc-to-jdbc-sink-minimal-event-driven-architecture\/\">CDC to JDBC Sink<\/a> and <a href=\"https:\/\/www.dbi-services.com\/blog\/oracle-to-postgresql-migration-with-flink-cdc\/\">Oracle to PostgreSQL Migration with Flink CDC<\/a> posts for the step-by-step LAB. The addition here is the significance filter sitting between the CDC source and the embedding sink.<\/p>\n\n\n\n<p>One option I want to flag, though I haven&#8217;t fully tested it at scale: <strong>embedding directly in the Flink pipeline<\/strong>. You could write a custom <code>ProcessFunction<\/code> that calls the embedding API and writes both the source data and the embeddings to the target in one atomic checkpoint. This eliminates the queue entirely. The concern is rate limiting and latency of embedding API calls within a streaming pipeline: if the API is slow, it creates backpressure all the way to the CDC source. 
For now, I&#8217;d recommend the <strong>JDBC sink + trigger + worker<\/strong> approach as the safer pattern, and explore inline embedding only if you have a local embedding model (like Ollama) with predictable latency.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-model-versioning-the-upgrade-problem\">Model Versioning: The Upgrade Problem<\/h2>\n\n\n\n<p>Everything above handles <strong>content changes<\/strong>. But there&#8217;s another dimension: <strong>model changes<\/strong>.<\/p>\n\n\n\n<p>When you upgrade from <code>text-embedding-3-small<\/code> to <code>text-embedding-3-large<\/code>, or from <code>v1<\/code> to <code>v2<\/code> of any model, <strong>all your existing embeddings become incompatible<\/strong>. This is not optional. Different models produce different vector spaces. You cannot mix embeddings from different models in the same index \u2014 the similarity scores become meaningless.<\/p>\n\n\n\n<p>This is why the <code>model_version<\/code> column in our schema matters. 
Here&#8217;s the upgrade procedure:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-1-deploy-new-embeddings-alongside-old-ones\">Step 1: Deploy new embeddings alongside old ones<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Create a new worker (or update the config) with the new model\n# MODEL_VERSION = &quot;v2&quot;\n# MODEL_NAME = &quot;text-embedding-3-large&quot;\n\n# The worker will populate document_embeddings with model_version = &#039;v2&#039;\n# while model_version = &#039;v1&#039; embeddings remain untouched and is_current = true\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-2-build-a-separate-index-for-the-new-model\">Step 2: Build a separate index for the new model<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- New partial index for v2 embeddings (3072 dimensions for text-embedding-3-large)\nCREATE INDEX idx_embeddings_diskann_v2 ON document_embeddings \n    USING diskann (embedding vector_cosine_ops)\n    WHERE is_current = true AND model_version = &#039;v2&#039;;\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-3-run-both-in-parallel-shadow-mode\">Step 3: Run both in parallel (shadow mode)<\/h3>\n\n\n\n<p>During shadow mode, <strong>both v1 and v2 have <code>is_current = true<\/code><\/strong>; that&#8217;s intentional. Your search queries must always scope by <code>model_version<\/code>, not just <code>is_current<\/code>. 
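<\/p>\n\n\n\n<p>To make that scoping concrete, here is a hedged sketch of a version-scoped similarity query built from Python with psycopg-style named parameters. The helper function is illustrative glue code of mine; the table and columns are the ones from the versioned schema in this post.<\/p>\n\n\n

```python
# Hedged sketch: always pin similarity search to one model version.
# Execute with psycopg, adding the query embedding to the params dict.
def build_search_query(model_version, k=10):
    sql = """
        SELECT doc_id,
               1 - (embedding <=> %(query_embedding)s) AS similarity
        FROM document_embeddings
        WHERE is_current = true
          AND model_version = %(model_version)s  -- the critical predicate
        ORDER BY embedding <=> %(query_embedding)s
        LIMIT %(k)s
    """
    params = {'model_version': model_version, 'k': k}
    return sql, params

sql, params = build_search_query('v2')
```

\n\n\n<p>Omitting the <code>model_version<\/code> predicate during shadow mode would mix v1 and v2 vectors in one result set and make the similarity scores meaningless.<\/p>\n\n\n\n<p>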
Each partial index covers one version, so PostgreSQL uses the correct index when the query includes <code>AND model_version = 'v2'<\/code>.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# In your RAG query pipeline, query both and compare\nresults_v1 = search(query, model_version=&#039;v1&#039;)\nresults_v2 = search(query, model_version=&#039;v2&#039;)\n\n# Log both, serve v1 to users, compare nDCG scores\nlog_comparison(query, results_v1, results_v2)\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-4-cut-over\">Step 4: Cut over<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- Once confident, mark v1 as not current\nUPDATE document_embeddings \nSET is_current = false \nWHERE model_version = &#039;v1&#039;;\n\n-- Drop old index\nDROP INDEX idx_embeddings_diskann;\n\n-- Optionally archive old embeddings\n-- DELETE FROM document_embeddings WHERE model_version = &#039;v1&#039;;\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-5-update-the-worker-config\">Step 5: Update the worker config<\/h3>\n\n\n\n<p>Switch the worker to produce <code>v2<\/code> embeddings for all new changes going forward.<\/p>\n\n\n\n<p><strong>The point is<\/strong>: with the versioned schema and partial indexes, model upgrades become a <strong>blue-green deployment for embeddings<\/strong>. No downtime, no inconsistent state, full rollback capability. 
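<\/p>\n\n\n\n<p>The nDCG comparison from the shadow-mode step is small enough to sketch inline. This assumes binary relevance judgments; the document IDs below are made up for illustration.<\/p>\n\n\n

```python
import math

def ndcg_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: 1.0 means all relevant docs ranked on top."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked_doc_ids[:k])
              if doc in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

relevant = {101, 205}                # ground-truth relevant documents
results_v1 = [301, 101, 404, 205]    # v1 ranks them 2nd and 4th
results_v2 = [101, 205, 301, 404]    # v2 ranks them 1st and 2nd

ndcg_v1 = ndcg_at_k(results_v1, relevant)
ndcg_v2 = ndcg_at_k(results_v2, relevant)
```

\n\n\n<p>If <code>ndcg_v2<\/code> consistently beats <code>ndcg_v1<\/code> across your logged queries, you have the evidence you need for the cut-over.<\/p>\n\n\n\n<p>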
This is exactly the same principle as <a href=\"https:\/\/www.dbi-services.com\/blog\/postgresql-17-to-18-migration-blue-green\/\">the PostgreSQL 17\u219218 blue-green upgrade<\/a> I wrote about, applied to vector data.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-a-note-on-pgai-vectorizer\">A Note on pgai Vectorizer<\/h2>\n\n\n\n<p>I want to mention <strong>pgai Vectorizer<\/strong> by Timescale because it solves a lot of what I&#8217;ve described above out of the box. It uses PostgreSQL triggers internally, handles automatic synchronization, supports chunking configuration, and manages the embedding lifecycle with a declarative SQL command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT ai.create_vectorizer(\n    &#039;documents&#039;::regclass,\n    loading     =&gt; ai.loading_column(&#039;content&#039;),\n    destination =&gt; ai.destination_table(&#039;document_embeddings&#039;),\n    embedding   =&gt; ai.embedding_openai(&#039;text-embedding-3-small&#039;, 768),\n    chunking    =&gt; ai.chunking_recursive_character_text_splitter(500, 50)\n);\n\n<\/pre><\/div>\n\n\n<p>After this, any INSERT\/UPDATE\/DELETE on <code>documents<\/code> automatically triggers re-embedding. The vectorizer worker handles batching, rate limit retries, and error recovery. It&#8217;s essentially the Level 1 pattern I described, but packaged as a production-ready tool, and since April 2025, it works with any PostgreSQL database (not just Timescale Cloud) via a Python library.<\/p>\n\n\n\n<p><strong>Why I still showed you the manual approach first<\/strong>: because in consulting, I rarely see a greenfield setup. Most projects have constraints: specific PostgreSQL versions, restricted extensions, air-gapped environments, or the need to integrate with an existing CDC pipeline. 
Understanding the underlying pattern lets you adapt it to your context. pgai Vectorizer is excellent if it fits your deployment, but the principles remain the same regardless of the tooling.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-monitoring-embedding-freshness\">Monitoring Embedding Freshness<\/h2>\n\n\n\n<p>One more thing that nobody talks about: <strong>how do you know your embeddings are stale?<\/strong><\/p>\n\n\n\n<p>There are two categories of signals: <strong>infrastructure signals<\/strong> (is the pipeline healthy?) and <strong>quality signals<\/strong> (is retrieval degrading?). Most teams only monitor the first. The second is what actually matters to your users.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-infrastructure-signals-pipeline-health\">Infrastructure signals: pipeline health<\/h3>\n\n\n\n<p>Here are the queries I use in production to monitor the embedding pipeline itself:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- 1. Documents with no current embeddings\nSELECT d.doc_id, d.title, d.updated_at\nFROM documents d\nLEFT JOIN document_embeddings e \n    ON d.doc_id = e.doc_id AND e.is_current = true\nWHERE e.embedding_id IS NULL AND d.is_active = true\nORDER BY d.updated_at DESC;\n\n-- 2. 
Documents where content changed since last embedding\n-- Uses LATERAL join to pick one representative row per document deterministically,\n-- avoiding edge cases where chunks have mixed source_hash values (partial retries, etc.)\nSELECT d.doc_id, d.title,\n       d.updated_at AS doc_updated,\n       e.embedded_at AS last_embedded,\n       d.updated_at - e.embedded_at AS staleness\nFROM documents d\nJOIN LATERAL (\n    SELECT embedded_at, source_hash\n    FROM document_embeddings\n    WHERE doc_id = d.doc_id\n      AND is_current = true\n    ORDER BY embedded_at DESC\n    LIMIT 1\n) e ON true\nWHERE d.is_active = true\n  AND d.content_hash IS DISTINCT FROM e.source_hash\nORDER BY staleness DESC;\n\n-- 3. Queue health check\nSELECT status, count(*), \n       avg(EXTRACT(EPOCH FROM (processed_at - claimed_at)))::int AS avg_processing_sec,\n       avg(EXTRACT(EPOCH FROM (claimed_at - queued_at)))::int AS avg_wait_sec\nFROM embedding_queue\nWHERE queued_at &gt; now() - INTERVAL &#039;24 hours&#039;\nGROUP BY status;\n\n-- 4. Embedding coverage by model version\nSELECT model_version, \n       count(DISTINCT doc_id) AS documents,\n       count(*) AS total_chunks,\n       count(*) FILTER (WHERE is_current) AS current_chunks\nFROM document_embeddings\nGROUP BY model_version;\n\n<\/pre><\/div>\n\n\n<p>Put these in a Grafana dashboard or your monitoring of choice. The staleness query (#2) is your early warning system \u2014 if documents are drifting from their embeddings, something is wrong in your pipeline.<\/p>\n\n\n\n<p>But here&#8217;s the thing: <strong>a healthy pipeline doesn&#8217;t guarantee good retrieval<\/strong>. Your queue could be empty, your workers could be processing in sub-second latency, and your embeddings could still be degraded. Why? 
Because the pipeline only tells you that <em>something was embedded<\/em> \u2014 not that <em>the embeddings are good<\/em>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-quality-signals-when-your-rag-tells-you-embeddings-are-stale\">Quality signals: when your RAG tells you embeddings are stale<\/h3>\n\n\n\n<p>This is the section I promised earlier when I said building the pipeline is one thing, but proving you&#8217;re going in the right direction is everything. This is where the work we did in the <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-adaptive-rag-understanding-confidence-precision-ndcg\/\">Adaptive RAG<\/a> post becomes critical. The metrics we introduced there, <strong>precision@k, recall@k, nDCG@k, and confidence scores<\/strong>, are not just evaluation tools for tuning your search weights. They are early warning signals for embedding drift. <\/p>\n\n\n\n<p>Think about what happens when embeddings go stale:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A document was updated with important new information, but the embedding still reflects the old content<\/li>\n\n\n\n<li>Similarity search retrieves the document (the old embedding is close enough), but the chunk text no longer matches the query&#8217;s intent<\/li>\n\n\n\n<li>The LLM generates an answer based on outdated context<\/li>\n\n\n\n<li><strong>Precision drops<\/strong>: retrieved documents are less relevant<\/li>\n\n\n\n<li><strong>nDCG drops<\/strong>: the ranking quality degrades because truly relevant (updated) documents are ranked lower than stale ones that happen to have closer embeddings<\/li>\n\n\n\n<li><strong>Confidence drops<\/strong>: the gap between top results narrows and the system becomes less certain<\/li>\n<\/ul>\n\n\n\n<p>The pattern is subtle but measurable. 
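<\/p>\n\n\n\n<p>As a small illustration of that last bullet, the confidence signals can be derived from the top-k similarity scores alone. The exact blend below is an assumption of mine for this sketch, not the scoring formula from the Adaptive RAG post.<\/p>\n\n\n

```python
def confidence_signals(similarities):
    """similarities: cosine scores of the top-k results, descending."""
    top1 = similarities[0]
    gap = top1 - similarities[1] if len(similarities) > 1 else top1
    # A narrow gap means the system cannot separate its candidates,
    # which is the signature of drift described above. The 0.7/0.3
    # blend is illustrative only.
    confidence = round(0.7 * top1 + 0.3 * min(gap * 10, 1.0), 3)
    return {'top1_score': top1, 'score_gap': round(gap, 3),
            'confidence': confidence}

fresh = confidence_signals([0.91, 0.74, 0.71])  # clear winner
stale = confidence_signals([0.78, 0.76, 0.75])  # everything looks alike
```

\n\n\n<p>These three values map directly onto the <code>top1_score<\/code>, <code>score_gap<\/code> and <code>confidence<\/code> columns of the logging table.<\/p>\n\n\n\n<p>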
Here&#8217;s how to capture it.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-retrieval-quality-logging-table\">Retrieval quality logging table<\/h4>\n\n\n\n<p>Extend the evaluation log from the Adaptive RAG post to include a timestamp dimension that allows you to track drift over time:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE TABLE retrieval_quality_log (\n    log_id          BIGSERIAL PRIMARY KEY,\n    query_text      TEXT NOT NULL,\n    query_type      TEXT,                -- &#039;factual&#039;, &#039;conceptual&#039;, &#039;exploratory&#039;\n    search_method   TEXT NOT NULL,       -- &#039;adaptive&#039;, &#039;hybrid&#039;, &#039;naive&#039;\n    confidence      NUMERIC(4,3),        -- 0.000 to 1.000\n    precision_at_10 NUMERIC(4,3),\n    recall_at_10    NUMERIC(4,3),\n    ndcg_at_10      NUMERIC(4,3),\n    avg_similarity  NUMERIC(4,3),        -- average cosine similarity of top-10\n    top1_score      NUMERIC(4,3),        -- score of the #1 result\n    score_gap       NUMERIC(4,3),        -- gap between #1 and #2 (confidence signal)\n    logged_at       TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n\n-- Index for time-series analysis\nCREATE INDEX idx_quality_log_time ON retrieval_quality_log (logged_at DESC);\n\n<\/pre><\/div>\n\n\n<h4 class=\"wp-block-heading\" id=\"h-the-drift-detection-queries\">The drift detection queries<\/h4>\n\n\n\n<p>Now the interesting part. These queries detect embedding staleness <em>through retrieval quality degradation<\/em>, not through pipeline metrics:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\n-- 5. 
Weekly nDCG trend \u2014 is ranking quality degrading over time?\nSELECT date_trunc(&#039;week&#039;, logged_at) AS week,\n       round(avg(ndcg_at_10), 3) AS avg_ndcg,\n       round(avg(precision_at_10), 3) AS avg_precision,\n       round(avg(confidence), 3) AS avg_confidence,\n       count(*) AS queries\nFROM retrieval_quality_log\nWHERE logged_at &gt; now() - INTERVAL &#039;3 months&#039;\nGROUP BY week\nORDER BY week;\n\n-- What you&#039;re looking for:\n-- A slow, steady decline in avg_ndcg and avg_confidence over weeks\n-- This is the signature of embedding drift \u2014 the pipeline is running,\n-- but the embeddings are gradually falling behind the content\n\n-- 6. Confidence distribution shift \u2014 are more queries becoming uncertain?\nSELECT date_trunc(&#039;week&#039;, logged_at) AS week,\n       round(100.0 * count(*) FILTER (WHERE confidence &gt;= 0.7) \/ count(*), 1) \n           AS pct_high_confidence,\n       round(100.0 * count(*) FILTER (WHERE confidence BETWEEN 0.5 AND 0.7) \/ count(*), 1) \n           AS pct_medium_confidence,\n       round(100.0 * count(*) FILTER (WHERE confidence &lt; 0.5) \/ count(*), 1) \n           AS pct_low_confidence\nFROM retrieval_quality_log\nWHERE logged_at &gt; now() - INTERVAL &#039;3 months&#039;\nGROUP BY week\nORDER BY week;\n\n-- If pct_low_confidence is climbing week over week, your embeddings \n-- are losing alignment with the actual content\n\n<\/pre><\/div>\n\n\n<h4 class=\"wp-block-heading\" id=\"h-closing-the-loop-from-quality-signal-to-re-embedding-trigger\">Closing the loop: from quality signal to re-embedding trigger<\/h4>\n\n\n\n<p>Here&#8217;s where this connects back to the event-driven architecture. 
The quality metrics don&#8217;t just sit in a dashboard, they can <strong>trigger re-embedding<\/strong> for documents that the pipeline&#8217;s change significance filter might have skipped.<\/p>\n\n\n\n<p>Remember the threshold discussion from Level 3: we set a 0.95 similarity ratio as the default, meaning changes under 5% are skipped. But what if a 3% change in a critical document is causing retrieval failures?<\/p>\n\n\n\n<p>The feedback loop:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"705\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-12-1024x705.png\" alt=\"\" class=\"wp-image-43065\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-12-1024x705.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-12-300x207.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-12-768x529.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2026\/02\/image-12.png 1474w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>In practice, you would implement this as a periodic job (daily or weekly) that correlates low-quality retrievals with stale documents. The correlation can be as simple as ILIKE matching query terms against document titles, or as sophisticated as tracking which document IDs were returned in low-confidence results. The key is that <code>change_type = 'quality_reembed'<\/code> is a distinct signal in your queue \u2014 it tells you the re-embedding was triggered by quality degradation, not by a content change event.<\/p>\n\n\n\n<p>This is the complete picture: the event-driven pipeline handles the primary flow (react to data changes), the change significance filter optimizes it (skip trivial changes), and the quality monitoring loop catches what the filter missed. 
Three layers, each progressively more sophisticated, each compensating for the blind spots of the previous one.<\/p>\n\n\n\n<p>As I wrote in the <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-adaptive-rag-understanding-confidence-precision-ndcg\/\">Adaptive RAG post<\/a>, an old BI principle is to know your KPI: what it really measures, but also when it fails to measure. The infrastructure metrics (queue depth, latency, skip rate) measure pipeline health. The quality metrics (precision, nDCG, confidence) measure <strong>what your users experience<\/strong>. You need both.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-summary\">Summary<\/h2>\n\n\n\n<p>Throughout this post, we&#8217;ve covered a progression from simple to complex:<\/p>\n\n\n\n<p><strong>Level 1 \u2014 Triggers + Queue<\/strong>: Best for single-database setups. Zero external dependencies. PostgreSQL does the heavy lifting. Use <code>LISTEN\/NOTIFY<\/code> for sub-second latency. This covers 80% of use cases.<\/p>\n\n\n\n<p><strong>Level 2 \u2014 Logical Replication<\/strong>: Best when source and vector databases are separate PostgreSQL instances. The source team doesn&#8217;t need to modify anything. Built-in WAL-based CDC with automatic catch-up.<\/p>\n\n\n\n<p><strong>Level 3 \u2014 Flink CDC + Change Significance Filtering<\/strong>: Best for heterogeneous sources (Oracle, MySQL) or fan-out to multiple targets. The change significance filter is the key addition \u2014 by comparing before\/after images in the pipeline, you skip re-embedding for minor changes (typo fixes, metadata-only updates, formatting), which in practice eliminates 60-80% of unnecessary embedding API calls. 
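<\/p>\n\n\n\n<p>To recap what the first stage of that filter looks like, here is a minimal column-aware check; the column names are illustrative and should match your own schema.<\/p>\n\n\n

```python
# Hedged sketch: only changes to content-bearing columns should reach
# the embedding stage. Adjust CONTENT_COLUMNS to your own schema.
CONTENT_COLUMNS = {'title', 'content'}

def is_significant(before, after):
    """True if any content-bearing column changed in this CDC event."""
    return any(before.get(col) != after.get(col) for col in CONTENT_COLUMNS)

# Metadata-only update: status changed, content untouched -> skip
meta_only = is_significant(
    {'title': 'CDC', 'content': 'abc', 'status': 'draft'},
    {'title': 'CDC', 'content': 'abc', 'status': 'published'})

# Content update -> passes on to the text diff ratio stage
content_change = is_significant(
    {'title': 'CDC', 'content': 'abc', 'status': 'draft'},
    {'title': 'CDC', 'content': 'abcd', 'status': 'draft'})
```

\n\n\n<p>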
Start with column-aware filtering, graduate to text diff ratio with a 0.95 threshold, and tune based on your RAG quality metrics.<\/p>\n\n\n\n<p><strong>Model Versioning<\/strong>: Regardless of which level you choose, version your embeddings. Track <code>model_name<\/code>, <code>model_version<\/code>, and <code>source_hash<\/code>. Use partial DiskANN indexes (pgvectorscale). Treat model upgrades like blue-green deployments.<\/p>\n\n\n\n<p><strong>Measurement<\/strong>: None of the above matters if you don&#8217;t instrument retrieval quality. The precision@k, recall@k, nDCG@k, and confidence metrics from the <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-adaptive-rag-understanding-confidence-precision-ndcg\/\">Adaptive RAG<\/a> post aren&#8217;t a nice-to-have \u2014 they&#8217;re the only way to know whether your pipeline is actually keeping your RAG system healthy. Track them over time. Break them down by topic. Watch for drift. If you build the pipeline without the measurement layer, you&#8217;re flying blind. The evaluation framework in the <a href=\"https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\">pgvector_RAG_search_lab repository<\/a> (<code>lab\/evaluation\/<\/code>) gives you a concrete starting point.<\/p>\n\n\n\n<p><strong>The core principle<\/strong>: Event-driven architecture is a precondition for production RAG \u2014 but it&#8217;s not sufficient. The moment you accept batch re-embedding as &#8220;good enough,&#8221; you&#8217;re accepting that your RAG system will silently degrade between batches. The trigger\/CDC approach doesn&#8217;t just keep embeddings fresh \u2014 it gives you <strong>observability<\/strong> into what changed, when it was embedded, and whether the change was significant enough to matter. But the pipeline only proves that work was done. The quality metrics prove the work was effective. Log every decision. Measure the skip rate. Tune the threshold. Track nDCG weekly. 
This is how you operationalize RAG.<\/p>\n\n\n\n<p>If you&#8217;ve been building RAG systems without thinking about embedding freshness, now is the time to retrofit it. And if you&#8217;re starting a new RAG project \u2014 please, design the embedding pipeline as event-driven from day one. Your future self will thank you. One thing I didn&#8217;t mention is: what should be embedded? This is not really a technical question, in the sense that it is more linked to your knowledge of the data, your business, your applications, your data workflows&#8230; <\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-s-next\">What&#8217;s Next<\/h2>\n\n\n\n<p>In <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-lab\/\" target=\"_blank\" rel=\"noreferrer noopener\">Part 2<\/a>, I apply everything from this post to the <strong>25,000-article Wikipedia dataset<\/strong> from the <a href=\"https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\">pgvector_RAG_search_lab<\/a> repository. You&#8217;ll see:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>How to adapt the schema to an existing table (no greenfield luxury)<\/li>\n\n\n\n<li>Real SKIP vs EMBED decisions with actual similarity scores<\/li>\n\n\n\n<li>The <code>SKIP LOCKED<\/code> multi-worker demo with 4 concurrent workers and zero overlap<\/li>\n\n\n\n<li>A complete freshness monitoring report<\/li>\n\n\n\n<li>The quality feedback loop triggering re-embeddings automatically<\/li>\n<\/ul>\n\n\n\n<p>A future blog post will probably cover benchmarks with pgvector and DiskANN indexes&#8230; <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction &#8220;Make it simple.&#8221; This is a principle I keep repeating, and I&#8217;ll repeat it again here. Because when it comes to keeping your RAG system&#8217;s embeddings fresh, the industry has somehow made it complicated. 
External orchestrators, custom Python cron jobs, microservices that call microservices, Airflow DAGs with 47 tasks, all to answer a simple [&hellip;]<\/p>\n","protected":false},"author":153,"featured_media":40270,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[83],"tags":[3523,77,3678],"type_dbi":[3869,2749,3868],"class_list":["post-43038","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postgresql","tag-pgvector","tag-postgresql","tag-rag","type-pgvector","type-postgresql","type-rag"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows - dbi Blog<\/title>\n<meta name=\"description\" content=\"Learn how to keep RAG embeddings fresh with event-driven pipelines in PostgreSQL. Covers triggers, logical replication, Flink CDC, pgvector versioning, and quality monitoring with nDCG.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows\" \/>\n<meta property=\"og:description\" content=\"Learn how to keep RAG embeddings fresh with event-driven pipelines in PostgreSQL. 
Covers triggers, logical replication, Flink CDC, pgvector versioning, and quality monitoring with nDCG.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-22T14:23:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-22T21:41:24+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1344\" \/>\n\t<meta property=\"og:image:height\" content=\"768\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Adrien Obernesser\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Adrien Obernesser\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"25 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/\"},\"author\":{\"name\":\"Adrien Obernesser\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"headline\":\"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows\",\"datePublished\":\"2026-02-22T14:23:42+00:00\",\"dateModified\":\"2026-02-22T21:41:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/\"},\"wordCount\":5548,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png\",\"keywords\":[\"pgvector\",\"PostgreSQL\",\"RAG\"],\"articleSection\":[\"PostgreSQL\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architect
ure-is-a-precondition-to-ai-data-workflows\/\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/\",\"name\":\"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows - dbi Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png\",\"datePublished\":\"2026-02-22T14:23:42+00:00\",\"dateModified\":\"2026-02-22T21:41:24+00:00\",\"author\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"description\":\"Learn how to keep RAG embeddings fresh with event-driven pipelines in PostgreSQL. 
Covers triggers, logical replication, Flink CDC, pgvector versioning, and quality monitoring with nDCG.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#primaryimage\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png\",\"contentUrl\":\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png\",\"width\":1344,\"height\":768},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.dbi-services.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/\",\"name\":\"dbi 
Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd\",\"name\":\"Adrien Obernesser\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"caption\":\"Adrien Obernesser\"},\"url\":\"https:\/\/www.dbi-services.com\/blog\/author\/adrienobernesser\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows - dbi Blog","description":"Learn how to keep RAG embeddings fresh with event-driven pipelines in PostgreSQL. 
Covers triggers, logical replication, Flink CDC, pgvector versioning, and quality monitoring with nDCG.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/","og_locale":"en_US","og_type":"article","og_title":"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows","og_description":"Learn how to keep RAG embeddings fresh with event-driven pipelines in PostgreSQL. Covers triggers, logical replication, Flink CDC, pgvector versioning, and quality monitoring with nDCG.","og_url":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/","og_site_name":"dbi Blog","article_published_time":"2026-02-22T14:23:42+00:00","article_modified_time":"2026-02-22T21:41:24+00:00","og_image":[{"width":1344,"height":768,"url":"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png","type":"image\/png"}],"author":"Adrien Obernesser","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Adrien Obernesser","Est. 
reading time":"25 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/"},"author":{"name":"Adrien Obernesser","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"headline":"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows","datePublished":"2026-02-22T14:23:42+00:00","dateModified":"2026-02-22T21:41:24+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/"},"wordCount":5548,"commentCount":0,"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png","keywords":["pgvector","PostgreSQL","RAG"],"articleSection":["PostgreSQL"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/","url":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/"
,"name":"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows - dbi Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#primaryimage"},"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png","datePublished":"2026-02-22T14:23:42+00:00","dateModified":"2026-02-22T21:41:24+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"description":"Learn how to keep RAG embeddings fresh with event-driven pipelines in PostgreSQL. Covers triggers, logical replication, Flink CDC, pgvector versioning, and quality monitoring with 
nDCG.","breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#primaryimage","url":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png","contentUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/elephant.png","width":1344,"height":768},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-embedding-versioning-with-pgvector-why-event-driven-architecture-is-a-precondition-to-ai-data-workflows\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"RAG Series \u2013 Embedding Versioning with pgvector: Why Event-Driven Architecture Is a Precondition to AI data workflows"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd","name":"Adrien 
Obernesser","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","caption":"Adrien Obernesser"},"url":"https:\/\/www.dbi-services.com\/blog\/author\/adrienobernesser\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/43038","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/153"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=43038"}],"version-history":[{"count":27,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/43038\/revisions"}],"predecessor-version":[{"id":43099,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/43038\/revisions\/43099"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media\/40270"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=43038"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?post=43038"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=43038"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=43038"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}"
,"templated":true}]}}