{"id":40355,"date":"2025-09-28T17:12:08","date_gmt":"2025-09-28T15:12:08","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/?p=40355"},"modified":"2025-09-28T17:12:10","modified_gmt":"2025-09-28T15:12:10","slug":"rag-series-naive-rag","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/rag-series-naive-rag\/","title":{"rendered":"RAG Series &#8211; Naive RAG"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-introduction\"><strong>Introduction<\/strong><\/h2>\n\n\n\n<p>Since my last series on pgvector I had quite a fun time to work on RAG workflows on pgvector and learned some valuable lessons and decided to share some of it in a blog post series on the matter. <br>We will discover together where all RAG best practices are landing for the past 2 years and how can as a DBA or &#8220;AI workflow engineer&#8221; improve your designs to be production fit. <br>We start this series with Na\u00efve RAG, this is quite known but important and foundational for the next posts of this series.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-is-retrieval-augmented-generation-rag\"><strong>What is Retrieval-Augmented Generation (RAG)?<\/strong><\/h2>\n\n\n\n<p><strong>Retrieval-Augmented Generation (RAG)<\/strong> is a technique that combines the power of large language models (LLMs) with information retrieval. Instead of relying solely on an LLM\u2019s internal knowledge (which may be outdated or limited, and prone to hallucinations), a RAG system retrieves relevant external documents and provides them as context for the LLM to generate a response. In practice, this means when a user asks a question, the system will <strong>retrieve<\/strong> a set of relevant text snippets (often from a knowledge base or database) and <strong>augment<\/strong> the LLM\u2019s input with those snippets, so that the answer can be grounded in real data. 
This technique is key for integrating business or organizational data with LLM capabilities, because it allows you to implement business rules, guidelines, governance, data privacy constraints, etc. <br>Na\u00efve RAG is the first logical step to understanding how the retrieval part works and how it can impact the LLM output.<\/p>\n\n\n\n<p>A RAG pipeline typically involves the following steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Document Embedding Storage<\/strong> \u2013 Your knowledge base documents are split into chunks and transformed into vector embeddings, which are stored in a vector index or database.<\/li>\n\n\n\n<li><strong>Query Embedding &amp; Retrieval<\/strong> \u2013 The user\u2019s query is converted into an embedding and the system performs a similarity search in the vector index to retrieve the top-k most relevant chunks.<\/li>\n\n\n\n<li><strong>Generation using LLM<\/strong> \u2013 The retrieved chunks (as context) plus the query are given to an LLM which generates the final answer.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Try It Yourself<\/strong><\/h2>\n\n\n\n<p>Clone the repository and explore this implementation:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\ngit clone https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\ncd pgvector_RAG_search_lab\n<\/pre><\/div>\n\n\n<p>The lab includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Streamlit interface<\/strong> for testing different search methods<\/li>\n\n\n\n<li><strong>n8n workflows<\/strong> for orchestrating the RAG pipeline<\/li>\n\n\n\n<li><strong>Embedding generation scripts<\/strong> supporting multiple models<\/li>\n\n\n\n<li><strong>Performance comparison tools<\/strong> to evaluate different approaches<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Semantic Vector Search vs. 
Traditional SQL\/Full-Text Search<\/strong><\/h2>\n\n\n\n<p>Before diving deeper, it\u2019s worth contrasting the vector-based semantic search used in RAG with traditional keyword-based search techniques (like SQL <code>LIKE<\/code> queries or full-text search indexes). This is especially important for DBAs who are familiar with SQL and may wonder why a vector approach is needed.<\/p>\n\n\n\n<p><strong>Traditional Search<\/strong> (SQL LIKE, full-text): Matches literal terms or boolean combinations. Precise for exact matches but fails when queries use different wording. A search for &#8220;car&#8221; won&#8217;t find documents about &#8220;automobiles&#8221; without explicit synonym handling.<\/p>\n\n\n\n<p><strong>Semantic Vector Search<\/strong>: Converts queries and documents into high-dimensional vectors encoding semantic meaning. Finds documents whose embeddings are closest to the query&#8217;s embedding in vector space, enabling retrieval based on context rather than exact matches.<br><br><strong>The key advantage<\/strong>: semantic search improves <strong>recall <\/strong>when wording varies and excels with natural language queries. However, traditional search still has value for exact phrases or specific identifiers. Many production systems implement hybrid search combining both approaches (covered in a later post). 
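<\/p>\n\n\n\n<p>To make the &#8220;car&#8221; vs. &#8220;automobiles&#8221; contrast concrete, here is a tiny, self-contained Python sketch. The three-dimensional &#8220;embeddings&#8221; are hand-assigned purely for illustration (real models produce hundreds or thousands of dimensions); under that assumption, a literal keyword match misses the synonym while a cosine-similarity lookup finds it:<\/p>

```python
# Toy illustration: literal keyword match vs. semantic (vector) match.
# The 3-d "embeddings" below are hand-assigned for illustration only;
# real models such as text-embedding-3-small produce 1536+ dimensions.
import math

docs = {
    "doc1": "Automobiles were invented in the late 19th century.",
    "doc2": "Bananas are rich in potassium.",
}

# Hand-made toy vectors: doc1 is deliberately placed close to the query.
embeddings = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.0, 0.2, 0.9],
}
query_vector = [0.85, 0.15, 0.05]  # pretend embedding of the query "car history"

def keyword_hits(term: str) -> list[str]:
    """Literal substring match, like SQL LIKE '%term%'."""
    return [d for d, text in docs.items() if term.lower() in text.lower()]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_top1(qv: list[float]) -> str:
    return max(embeddings, key=lambda d: cosine(qv, embeddings[d]))

print(keyword_hits("car"))          # [] -- "automobiles" never literally matches "car"
print(semantic_top1(query_vector))  # doc1 -- closest embedding wins
```

<p>The same principle drives real semantic search: the query never needs to share a single keyword with the document, only a nearby position in embedding space.<\/p>\n\n\n\n<p>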
<br><br>I am not going to go through all the search types available in PostgreSQL, but here is a diagram showing the historical and logical steps the field has gone through over the past decades.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2220\" height=\"1358\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-12.png\" alt=\"\" class=\"wp-image-40363\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-12.png 2220w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-12-300x184.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-12-1024x626.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-12-768x470.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-12-1536x940.png 1536w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-12-2048x1253.png 2048w\" sizes=\"auto, (max-width: 2220px) 100vw, 2220px\" \/><\/figure>\n\n\n\n<p><em><strong>Key point:<\/strong><\/em> moving to vector search enables semantic retrieval that goes beyond what SQL <code>LIKE<\/code> or standard full-text indexes can achieve. It allows your RAG system to find the right information even when queries use different phrasing, making it far more robust for knowledge-based Q&amp;A.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Building a Na\u00efve RAG Pipeline (Step by Step)<\/h2>\n\n\n\n<p>Let\u2019s break down how to implement a Na\u00efve RAG pipeline properly, using the example from the <strong>pgvector_RAG_search_lab<\/strong> repository. 
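<\/p>\n\n\n\n<p>Before detailing each component, the three steps can be sketched end to end in a few lines of plain Python. Everything below is an in-memory stand-in: <code>embed()<\/code> fakes an embedding model with bag-of-words counts, and the LLM call is reduced to building the prompt; in the lab these steps are real calls to an embedding API and to Postgres\/pgvector:<\/p>

```python
# In-memory sketch of the three naive RAG steps (no DB, no API):
# 1) chunk + embed documents, 2) embed the query + retrieve top-k,
# 3) stuff the retrieved context into the LLM prompt.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a call to a real embedding model (e.g. text-embedding-3-small).
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: store (chunk, embedding) pairs -- the "vectorized database".
chunks = [
    "WAL (write-ahead logging) ensures durability in PostgreSQL.",
    "pgvector adds a vector type and ANN indexes to PostgreSQL.",
]
index = [(c, embed(c)) for c in chunks]

# Step 2: embed the query and retrieve the top-k most similar chunks.
def retrieve(query: str, k: int = 1) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

# Step 3: build the prompt for the LLM (the generation call itself is omitted).
def build_prompt(query: str, contexts: list[str]) -> str:
    return (
        "Use the following context to answer the question.\n"
        "Context:\n" + "\n".join(contexts) + "\n"
        "Question: " + query + "\nAnswer:"
    )

top = retrieve("What is WAL in PostgreSQL?")
print(build_prompt("What is WAL in PostgreSQL?", top))
```

<p>Swapping the stand-ins for a real embedding model, a pgvector similarity query, and an LLM call turns this sketch into the actual pipeline.<\/p>\n\n\n\n<p>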
We\u2019ll go through the major components and discuss best practices at each step: document chunking, embedding generation, vector indexing, the retrieval query, and finally the generation step.<\/p>\n\n\n\n<p>Here is a diagram of the entire data process:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"808\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-13-scaled.png\" alt=\"\" class=\"wp-image-40365\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-13-scaled.png 2560w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-13-300x95.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-13-1024x323.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-13-768x242.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-13-1536x485.png 1536w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-13-2048x646.png 2048w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">1. Document Ingestion \u2013 Chunking and Embeddings<\/h3>\n\n\n\n<p><strong>Chunking documents:<\/strong> Large documents (e.g. long articles, manuals, etc.) need to be split into smaller pieces called <em>chunks<\/em> before embedding. Choosing the right chunking strategy is crucial. If chunks are too large, they may include irrelevant text along with relevant info; if too small, you might lose context needed to answer questions.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Chunk size<\/strong>: test on your own data, but a rule of thumb is 100-150 tokens for factoid queries and 300+ for contextual queries; sentence- or paragraph-based chunking is also an option. 
<\/li>\n<\/ul>\n\n\n\n<p><strong>Generating embeddings:<\/strong> Once the documents are chunked, each chunk is converted to a vector embedding by an embedding model. The choice of <strong>embedding model<\/strong> has a big impact on your RAG system\u2019s effectiveness and is generally coupled with the LLM you are going to choose. Since I am using GPT-5, I went for OpenAI embedding models with 3072 (large) and 1536 (small) dimensions. The lab supports OpenAI&#8217;s text-embedding-3-large and text-embedding-3-small (used for the dvdrental db), as well as open-source alternatives. Check <a href=\"https:\/\/huggingface.co\/spaces\/mteb\/leaderboard\">MTEB benchmarks<\/a> for model selection.<\/p>\n\n\n\n<p>In the lab repository you can generate the embeddings on the wikipedia database with the following Python script <em><mark class=\"has-inline-color has-cyan-bluish-gray-color\">(Note: in the example below the SPLADE model is loaded but not used for dense vectors; the script handles both dense and sparse embedding generation, which we will cover in the next blog post)<\/mark> <\/em>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\n(.venv) 12:56:43 postgres@PG1:\/home\/postgres\/RAG_lab_demo\/lab\/embeddings\/ &#x5B;PG17] python generate_embeddings.py --source wikipedia --type dense\n2025-09-27 12:57:28,407 - __main__ - INFO - Loading configuration...\n2025-09-27 12:57:28,408 - __main__ - INFO - Initializing services...\n2025-09-27 12:57:28,412 - lab.core.database - INFO - Database pool initialized with 1-20 connections\n2025-09-27 12:57:29,093 - lab.core.embeddings - INFO - Initialized OpenAI embedder with model: text-embedding-3-large\n2025-09-27 12:57:33,738 - lab.core.embeddings - INFO - Loading SPLADE model: naver\/splade-cocondenser-ensembledistil on device: cpu\nSome weights of BertModel were not initialized from the model checkpoint at naver\/splade-cocondenser-ensembledistil and are newly initialized: 
&#x5B;&#039;pooler.dense.bias&#039;, &#039;pooler.dense.weight&#039;]\nYou should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n2025-09-27 12:57:34,976 - lab.core.embeddings - INFO - SPLADE embedder initialized on cpu\n\n============================================================\nEMBEDDING GENERATION JOB SUMMARY\n============================================================\nSource: wikipedia\nEmbedding Type: dense\nUpdate Existing: False\n============================================================\n\nCURRENT STATUS:\nWikipedia Articles: 25000 total, 25000 with title embeddings\n                   25000 with content embeddings\n\nProceed with embedding generation? (y\/N): y\n\n============================================================\nEXECUTING JOB 1\/1\nTable: articles\nType: dense\nColumns: &#x5B;&#039;title&#039;, &#039;content&#039;] -&gt; &#x5B;&#039;title_vector_3072&#039;, &#039;content_vector_3072&#039;]\n============================================================\n2025-09-27 12:57:41,367 - lab.embeddings.embedding_manager - INFO - Starting embedding generation job: wikipedia - dense\n2025-09-27 12:57:41,389 - lab.embeddings.embedding_manager - WARNING - No items found for embedding generation\n\nJob 1 completed:\n  Successful: 0\n  Failed: 0\n\n============================================================\nFINAL SUMMARY\n============================================================\nTotal items processed: 0\nSuccessful: 0\nFailed: 0\n\nFINAL EMBEDDING STATUS:\n\n============================================================\nEMBEDDING GENERATION JOB SUMMARY\n============================================================\nSource: wikipedia\nEmbedding Type: dense\nUpdate Existing: False\n============================================================\n\nCURRENT STATUS:\nWikipedia Articles: 25000 total, 25000 with title embeddings\n                   25000 with content embeddings\n2025-09-27 12:57:41,422 - 
__main__ - INFO - Embedding generation completed successfully\n2025-09-27 12:57:41,422 - lab.core.database - INFO - Database connection pool closed\n(.venv) 12:57:42 postgres@PG1:\/home\/postgres\/RAG_lab_demo\/lab\/embeddings\/ &#x5B;PG17]\n<\/pre><\/div>\n\n\n<p>After these steps, you will have a <strong>vectorized database<\/strong>: each document chunk is represented by a vector, stored in a table (if using a DB like Postgres\/pgvector) or in a vector store. Now it\u2019s ready to be queried.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. Vector Indexing and Search in Postgres (pgvector + DiskANN)<\/strong><\/h3>\n\n\n\n<p>For production-scale RAG, how you index and search your vectors is critical for performance. In a naive setup, you might simply do a brute-force nearest neighbor search over all embeddings \u2013 which is fine for a small dataset or testing, but too slow for large collections. Instead, you should use an <strong>Approximate Nearest Neighbor (ANN) index<\/strong> to speed up retrieval. The pgvector extension for PostgreSQL allows you to create such indexes in the database itself.<\/p>\n\n\n\n<p><strong>Using pgvector in Postgres:<\/strong> pgvector stores vectors and supports IVFFlat and HNSW for ANN. For larger-than-RAM or cost-sensitive workloads, add the <strong>pgvectorscale<\/strong> extension, which introduces a <strong>StreamingDiskANN<\/strong> index inspired by Microsoft\u2019s DiskANN, plus compression and filtered search. <\/p>\n\n\n\n<p><mark class=\"has-inline-color has-luminous-vivid-orange-color\">! 
Not all specialized vector databases or vector stores have this feature; if your data needs to scale, this is a critical aspect !<\/mark><\/p>\n\n\n\n<p>For our Na\u00efve RAG example:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nCREATE INDEX idx_articles_content_vec\nON articles\nUSING diskann (content_vector vector_cosine_ops);\n\n<\/pre><\/div>\n\n\n<p><strong>Why use Postgres for RAG?<\/strong> For many organizations, using Postgres with pgvector is convenient because it keeps the vectors alongside other relational data and leverages existing operational familiarity (backup, security, etc.). It avoids introducing a separate vector database. Storing vectors in your existing operational DB can eliminate the complexity of syncing data with a separate vector store, while still enabling semantic search (not just keyword search) on that data. With extensions like pgvector (and vectorscale), Postgres can achieve performance close to specialized vector DBs, if not better. Of course, specialized solutions (Pinecone, Weaviate, etc.) are also options \u2013 but the pgvector approach is very appealing for DBAs who want everything in one familiar ecosystem.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. Query Processing and Retrieval<\/h3>\n\n\n\n<p>With the data indexed, the runtime query flow of Na\u00efve RAG is straightforward:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Query embedding:<\/strong> When a user question comes in (for example: \u201cWhat is WAL in PostgreSQL and why is it important?\u201d), we first transform that query into an embedding vector using the same model we used for documents. This could be a real-time call to an API (if using an external service like OpenAI for embeddings) or a local model inference. Caching can be applied for repeated queries, though user queries are often unique. 
Ensure the text of the query is cleaned or processed in the same way document text was (e.g., if you did lowercasing, removal of certain stopwords, etc., apply consistently if needed \u2013 though modern embedding models typically handle raw text well without special preprocessing).<\/li>\n\n\n\n<li><strong>Vector similarity search:<\/strong> We then perform the ANN search in the vector index with the query embedding. In SQL, this is an <code>ORDER BY vector &lt;=&gt; query_vector LIMIT k<\/code> type query (or the equivalent call in your vector DB\u2019s client). The result is the top <strong>k<\/strong> most similar chunks to the query. Choosing <strong>k (the number of chunks)<\/strong> to retrieve is another design parameter: common values are in the range 3\u201310. You want enough pieces of context to cover the answer, but not so many that you overwhelm the LLM or introduce irrelevant noise. A typical default is <code>k=5<\/code>. In the example lab workflow, the <code>top_k<\/code> defaults to 5. If you retrieve too few, you might miss part of the answer; too many, and the prompt to the LLM becomes long and could confuse it with extraneous info.<\/li>\n<\/ul>\n\n\n\n<p>The outcome of the retrieval step is a set of top-k text chunks (contexts) that hopefully contain the information needed to answer the user\u2019s question.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. LLM Answer Generation<\/h3>\n\n\n\n<p>Finally, the retrieved chunks are fed into the prompt of a large language model to generate the answer. This step is often implemented with a prompt template such as:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cUse the following context to answer the question. 
If the context does not have the answer, say you don\u2019t know.<\/em><br><strong>Context:<\/strong><br>[Chunk 1 text]<br>[Chunk 2 text]<br>&#8230;<br><strong>Question:<\/strong> [User\u2019s query]<br><strong>Answer:<\/strong>\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>The LLM (which could be GPT-5, or an open-source model depending on your choice) will then produce an answer, hopefully drawing facts from the provided context rather than hallucinating. <strong>Na\u00efve RAG<\/strong> doesn\u2019t include complex prompt strategies or multiple prompt stages; it\u2019s usually a single prompt that includes all top chunks at once (this is often called the \u201cstuffing\u201d approach \u2013 stuffing the context into the prompt). This is simple and works well when the amount of context is within the model\u2019s input limit.<\/p>\n\n\n\n<p><strong>Best practices for the generation step:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Order and formatting of contexts:<\/strong> Usually, chunks can be simply concatenated. It can help to separate them with headings or bullet points, or any delimiter that clearly resets context. Some frameworks sort retrieved chunks by similarity score (highest first) under the assumption that the first few are most relevant \u2013 this makes sense so that if the prompt gets truncated or the model gives more weight to earlier context (which can happen), the best info is first.<\/li>\n\n\n\n<li><strong>Avoid exceeding token limits:<\/strong> If each chunk is, say, ~100 tokens and you include 5 chunks, that\u2019s ~500 tokens of context plus the prompt overhead and question. This should fit in most LLMs with 4k+ token contexts. But if your chunks or k are larger, be mindful not to exceed the model\u2019s max token limit for input. If needed, reduce k or chunk size, or consider splitting the question into sub-queries (advanced strategy) to handle very broad asks. 
<\/li>\n\n\n\n<li><strong>Prompt instructions:<\/strong> In naive usage, you rely on the model to use the context well. It\u2019s important to instruct the model clearly to <strong>only use the provided context<\/strong> for answering, and to indicate if the context doesn\u2019t have the answer. This mitigates hallucination. For example, tell it explicitly: \u201cIf you don\u2019t find the answer in the context, respond that you are unsure or that it\u2019s not in the provided data.\u201d This way, if retrieval ever fails (e.g., our top-k didn\u2019t actually contain the needed info), the model won\u2019t fabricate an answer. It will either abstain or say \u201cI don\u2019t know.\u201d Depending on your application, you might handle that case by increasing k or falling back to another search method.<\/li>\n\n\n\n<li><strong>Citing sources:<\/strong> A nice practice, especially for production QA systems, is to have the LLM output the source of the information (like document titles or IDs). Since your retrieval returns chunk metadata, you can either have the model include them in the answer or attach them after the fact. This builds trust with users and helps for debugging. For instance, the lab workflow tracks the titles of retrieved articles and could enable showing which Wikipedia article an answer came from. In a naive setup, you might just append a list of sources (\u201cSource: [Title of Article]\u201d) to the answer.<\/li>\n<\/ul>\n\n\n\n<p>With that, the Na\u00efve RAG pipeline is complete: the user\u2019s query is answered by the LLM using real data fetched from your database. 
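<\/p>\n\n\n\n<p>The ordering, token-budget, and source-citation practices above can be combined in one small helper. This is only a sketch under simple assumptions: chunks arrive as <code>(score, title, text)<\/code> tuples, and a crude four-characters-per-token estimate stands in for a real tokenizer such as tiktoken:<\/p>

```python
# Sketch: build the final prompt from scored chunks, highest similarity
# first, under a rough token budget, keeping titles for source citation.
# Assumption: ~4 characters per token (use a real tokenizer in production).

def rough_tokens(text: str) -> int:
    return len(text) // 4

def assemble_prompt(question, scored_chunks, budget=500):
    picked, sources, used = [], [], 0
    # Best chunks first, so the most relevant context survives the budget cut.
    for score, title, text in sorted(scored_chunks, key=lambda c: c[0], reverse=True):
        cost = rough_tokens(text)
        if used + cost > budget:
            break  # naive strategy: stop stuffing once the budget is reached
        picked.append(text)
        sources.append(title)
        used += cost
    prompt = (
        "Use the following context to answer the question. "
        "If the context does not have the answer, say you don't know.\n"
        "Context:\n" + "\n\n".join(picked) + "\n"
        "Question: " + question + "\nAnswer:"
    )
    return prompt, sources

chunks = [
    (0.91, "Write-ahead logging", "WAL ensures durability in PostgreSQL."),
    (0.42, "Vacuum", "x" * 4000),  # too big for the remaining budget
]
prompt, sources = assemble_prompt("What is WAL?", chunks)
print(sources)  # ['Write-ahead logging'] -- the oversized chunk was dropped
```

<p>If nothing fits the budget, the context stays empty and the &#8220;say you don&#8217;t know&#8221; instruction becomes the safety net.<\/p>\n\n\n\n<p>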
Despite its simplicity, this approach can already dramatically improve the factual accuracy of answers and allow your system to handle queries about information that the base LLM was never trained on (for example, very recent or niche knowledge).<\/p>\n\n\n\n<p>In our lab setup on n8n, the workflow (without the chunking and embedding generation steps) looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1955\" height=\"426\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-16.png\" alt=\"\" class=\"wp-image-40394\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-16.png 1955w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-16-300x65.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-16-1024x223.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-16-768x167.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-16-1536x335.png 1536w\" sizes=\"auto, (max-width: 1955px) 100vw, 1955px\" \/><\/figure>\n\n\n\n<p>In the Streamlit interface, also provided in the repo, we have the following:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1775\" height=\"1110\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-17.png\" alt=\"\" class=\"wp-image-40396\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-17.png 1775w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-17-300x188.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-17-1024x640.png 1024w, 
https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-17-768x480.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image-17-1536x961.png 1536w\" sizes=\"auto, (max-width: 1775px) 100vw, 1775px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Monitoring and Improving RAG in Production<\/h2>\n\n\n\n<p>Implementing the pipeline is only part of the story. In a production setting, we need to <strong>monitor the system\u2019s performance and gather feedback<\/strong> to continuously improve it. <br>The comparison workflow allows side-by-side testing of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Different embedding models<\/li>\n\n\n\n<li>Chunk sizes and strategies<\/li>\n\n\n\n<li>Retrieval parameters (top-k values)<\/li>\n\n\n\n<li>LLM prompting approaches<\/li>\n<\/ul>\n\n\n\n<p>In summary, treat your RAG system as an evolving product: <strong>monitor retrieval relevance, answer accuracy (groundedness), and system performance<\/strong>. Use a combination of automated metrics and human review to ensure quality. Tools like LangSmith can provide infrastructure for logging queries and scoring outputs on metrics like faithfulness or relevance, flagging issues like \u201cbad retrievals\u201d or hallucinated responses. By keeping an eye on these aspects, you can iterate and improve your Na\u00efve RAG system continuously, making it more robust and trustworthy. Although LangSmith is very useful, be careful with the pitfalls that come along with any abstraction. A good rule of thumb is to keep your core logic in custom code while leveraging LangSmith tools for peripherals.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion and Next Steps<\/h2>\n\n\n\n<p>Na\u00efve RAG provides the basic blueprint of how to augment LLMs with external knowledge using semantic search. 
We discussed how to implement it using PostgreSQL with pgvector, covering best practices in chunking your data, selecting suitable embeddings, indexing with advanced methods like DiskANN for speed, and ensuring that you monitor the system\u2019s effectiveness in production. This straightforward dense retrieval approach is often <strong>the first step toward building a production-grade QA system<\/strong>. It\u2019s relatively easy to set up and already yields substantial gains in answer accuracy and currency of information.<\/p>\n\n\n\n<p>However, as powerful as Na\u00efve RAG is, it has its limitations. Pure dense vector similarity can sometimes miss exact matches (like precise figures or rare terms) that a keyword search would catch, and it might bring in semantically relevant but not factually useful context in some cases. In the upcoming posts of this series, we\u2019ll explore more advanced RAG techniques that address these issues:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Hybrid RAG:<\/strong> combining dense vectors with sparse (lexical) search to get the best of both worlds \u2013 we\u2019ll see how a hybrid approach can improve recall and precision by weighting semantic and keyword signals <a href=\"https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\/blob\/6b97d7d2daee5b1315787533a9f2fa995c2c2d5b\/lab\/workflows\/README.md#L12-L20\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a>.<\/li>\n\n\n\n<li><strong>Adaptive RAG:<\/strong> introducing intelligent query classification and dynamic retrieval strategies \u2013 for example, automatically detecting when to favor lexical vs. semantic, or how to route certain queries to specialized retrievers. 
<\/li>\n\n\n\n<li>and other more trendy RAG types like <strong>Self RAG<\/strong> or <strong>Agentic RAG<\/strong>&#8230;<\/li>\n<\/ul>\n\n\n\n<p>As you experiment with the provided lab or your own data, remember the core best practices: ensure your retrieval is solid (that often solves most problems), and always ground the LLM\u2019s output in real, retrieved data. The repository will introduce other RAG and lab examples over time; the hybrid and adaptive workflows are already built in. <br>With all the fuss around AI and LLMs, the field may look chaotic from the outside, lacking stability and maturity. Yet there is a good chance that the fundamental RAG components will still be around in the coming years, if not decades: they have already lasted two years and proved useful. We may simply see them consolidated and integrated into other systems, especially alongside components that provide normative monitoring and evaluation, which remains the big open subject. We can&#8217;t say that today&#8217;s RAG patterns are mature, but we know for sure they are fundamental to what comes next.<\/p>\n\n\n\n<p>Stay tuned for the next part, where we dive into <strong>Hybrid RAG<\/strong> and demonstrate how combining search strategies can boost performance beyond what naive semantic search can do alone.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Since my last series on pgvector I had quite a fun time to work on RAG workflows on pgvector and learned some valuable lessons and decided to share some of it in a blog post series on the matter. 
We will discover together where all RAG best practices are landing for the past 2 [&hellip;]<\/p>\n","protected":false},"author":153,"featured_media":37679,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[83],"tags":[3524,3677,3678],"type_dbi":[2749],"class_list":["post-40355","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postgresql","tag-ai-ml","tag-llm","tag-rag","type-postgresql"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.4) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>RAG Series - Naive RAG - dbi Blog<\/title>\n<meta name=\"description\" content=\"Hands-on guide to Na\u00efve RAG on Postgres using pgvector and DiskANN. Learn chunking, embeddings, top-k retrieval, and orchestrate it with an n8n template.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-naive-rag\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"RAG Series - Naive RAG\" \/>\n<meta property=\"og:description\" content=\"Hands-on guide to Na\u00efve RAG on Postgres using pgvector and DiskANN. 
[Post metadata: "RAG Series - Naive RAG" by Adrien Obernesser, dbi Blog, published 2025-09-28, estimated reading time 13 minutes. Description: Hands-on guide to Naïve RAG on Postgres using pgvector and DiskANN; learn chunking, embeddings, top-k retrieval, and orchestrate it with an n8n template.]