{"id":40687,"date":"2025-10-05T21:49:50","date_gmt":"2025-10-05T19:49:50","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/?p=40687"},"modified":"2025-10-05T21:49:52","modified_gmt":"2025-10-05T19:49:52","slug":"rag-series-hybrid-search-with-re-ranking","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/","title":{"rendered":"RAG Series &#8211; Hybrid Search with Re-ranking"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-introduction\">Introduction<\/h2>\n\n\n\n<p>In the <a href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-naive-rag\/\">first part<\/a> of this RAG series, we established the fundamentals of Naive RAG with dense vector embeddings on PostgreSQL using pgvector. That foundation works well for conceptual queries, but production systems quickly reveal a limitation: pure semantic search misses the exact matches that a LIKE predicate or a full-text search would find. When someone searches for &#8220;PostgreSQL 17 performance improvements,&#8221; pure vector search might return general performance topics while completely missing the specific version number. This is where hybrid search helps: it combines the semantic understanding of dense embeddings with the precision of traditional keyword search.<\/p>\n\n\n\n<p>We will explore how to implement hybrid sparse-dense search with PostgreSQL and pgvector, looking briefly at the mathematics behind score fusion, at practical implementation patterns from the pgvector_RAG_search_lab repository, and at re-ranking techniques that can boost retrieval accuracy by 15-30%. We are building on the Wikipedia dataset (25,000 articles) from the previous post, but this time we will critically examine our embedding choice and optimization strategies. The aim is not for you to follow this guide blindly, but to understand its limitations and make your own choices based on your own experiments. 
<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">What You\u2019ll Learn<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sparse vs dense embeddings: when and why to use each<\/li>\n\n\n\n<li>Hybrid architecture: SPLADE + dense + SQL<\/li>\n\n\n\n<li>Reciprocal Rank Fusion (RRF)<\/li>\n\n\n\n<li>Cross-encoder re-ranking for production precision<\/li>\n\n\n\n<li>Efficiency tips: tuning, storage, cost, latency<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p>You can try everything using the same GitHub repo from Part 1:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ngit clone https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\ncd pgvector_RAG_search_lab\n<\/pre><\/div>\n\n\n<p>Explore hybrid implementations in <code>lab\/search\/<\/code>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>hybrid_rrf.py<\/code>: dense + sparse search + RRF<\/li>\n\n\n\n<li><code>hybrid_rerank.py<\/code>: hybrid + cross-encoder rerank<\/li>\n\n\n\n<li>Streamlit UI: <code>streamlit run streamlit_demo.py<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Dense vs Sparse Embeddings<\/h2>\n\n\n\n<p><strong>Dense embeddings<\/strong> (e.g., <code>text-embedding-3-large<\/code>) represent semantic meaning well. But for a corpus like 25K Wikipedia articles, the 3072-dim model is likely <strong>overkill<\/strong>. We implement it here only for testing purposes. <\/p>\n\n\n\n<p>\u2705 You can use <code>text-embedding-3-small<\/code> (1536 dim) instead: it\u2019s cheaper, faster, and nearly as accurate for homogeneous datasets. 
You can also check this leaderboard to see how much OpenAI&#8217;s <code>text-embedding-3-large<\/code> gains over <code>text-embedding-3-small<\/code>: <a href=\"https:\/\/huggingface.co\/spaces\/mteb\/leaderboard\">MTEB Leaderboard &#8211; a Hugging Face Space by mteb<\/a><\/p>\n\n\n\n<p><strong>Sparse embeddings<\/strong>, like those from SPLADE, model exact keyword importance via high-dimensional sparse vectors (30K+ dims). They bridge the gap between semantic and lexical search.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">SPLADE vs BM25<\/h2>\n\n\n\n<p>SPLADE outperforms BM25 across benchmarks like MS MARCO and TREC for semantic-heavy queries. But BM25 is still useful when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Exact match is critical (codes, legal texts)<\/li>\n\n\n\n<li>Simplicity and explainability matter<\/li>\n\n\n\n<li>You need CPU-only solutions<\/li>\n<\/ul>\n\n\n\n<p>If you want the strongest retrieval quality available, SPLADE might be what you are looking for. Full-text search (FTS), however, is often overlooked, and even today it could by itself improve many legacy applications with minimal effort.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">PostgreSQL Hybrid Search Schema<\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nwikipedia=# \\d articles\n                          Table &quot;public.articles&quot;\n         Column         |       Type       | Collation | Nullable | Default\n------------------------+------------------+-----------+----------+---------\n id                     | integer          |           | not null |\n url                    | text             |           |          |\n title                  | text             |           |          |\n content                | text             |           |          |\n title_vector           | vector(1536)     |           |          |\n content_vector         | vector(1536)     |           |          |\n vector_id              | integer          |     
      |          |\n content_tsv            | tsvector         |           |          |\n title_content_tsvector | tsvector         |           |          |\n content_sparse         | sparsevec(30522) |           |          |\n title_vector_3072      | vector(3072)     |           |          |\n content_vector_3072    | vector(3072)     |           |          |\nIndexes:\n    &quot;articles_pkey&quot; PRIMARY KEY, btree (id)\n    &quot;articles_content_3072_diskann&quot; diskann (content_vector_3072)\n    &quot;articles_sparse_hnsw&quot; hnsw (content_sparse sparsevec_cosine_ops) WITH (m=&#039;16&#039;, ef_construction=&#039;64&#039;)\n    &quot;articles_title_vector_3072_diskann&quot; diskann (title_vector_3072) WITH (storage_layout=memory_optimized, num_neighbors=&#039;50&#039;, search_list_size=&#039;100&#039;, max_alpha=&#039;1.2&#039;)\n    &quot;idx_articles_content_tsv&quot; gin (content_tsv)\n    &quot;idx_articles_title_content_tsvector&quot; gin (title_content_tsvector)\nTriggers:\n    tsvectorupdate BEFORE INSERT OR UPDATE ON articles FOR EACH ROW EXECUTE FUNCTION articles_tsvector_trigger()\n    tsvupdate BEFORE INSERT OR UPDATE ON articles FOR EACH ROW EXECUTE FUNCTION articles_tsvector_trigger()\n\n<\/pre><\/div>\n\n\n<p>This is the table structure I implemented to test full-text search against SPLADE, with or without dense similarity search. You can also try different indexes and compare the results. <br><em>One interesting thing to note here is that you can create dense and sparse embeddings for the content, but for the title field they might be unnecessary, since the title is very likely to be related to the content.  That said, the Wikipedia database has some content where neither solution gets you anywhere, and only a proper chunking strategy or FTS will help you with specific terms. 
That edge case is, for example, the year articles, like &#8216;2007&#8217;, where the title is the year and the content is just a list of dates and what happened on each day. There is no relation between the year itself, the dates, and the events, because the year and the dates are never mentioned together and the title and content embeddings are separate. In this case, the best approach would be to filter on the year with a normal WHERE clause first and then run a similarity search on the content. <\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Rank Fusion with RRF<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"850\" height=\"1000\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-1.png\" alt=\"\" class=\"wp-image-40741\" style=\"width:727px;height:auto\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-1.png 850w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-1-255x300.png 255w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-1-768x904.png 768w\" sizes=\"auto, (max-width: 850px) 100vw, 850px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"has-text-align-right\"><em>Source: www.researchgate.net<br><\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p>We use <strong>Reciprocal Rank Fusion<\/strong> (RRF) to combine dense and sparse results without normalizing scores, because it ranks by position, not by raw score value. 
<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ndef reciprocal_rank_fusion(rankings, k=60):\n    scores = {} # doc_id -&gt; cumulative RRF score across lists\n    for ranking in rankings:  # each &#039;ranking&#039; is a list of doc_ids in order\n        for rank, doc_id in enumerate(ranking, 1): # ranks start at 1\n            scores&#x5B;doc_id] = scores.get(doc_id, 0) + 1 \/ (k + rank)\n    return sorted(scores.items(), key=lambda x: x&#x5B;1], reverse=True)\n<\/pre><\/div>\n\n\n<p>Key insight: RRF is the more robust option, while weighted fusion allows fine-tuning.<br><strong>RRF<\/strong> = <em>robust default<\/em>. No calibration needed. This is a great method when score scales are incomparable or volatile across queries, or when you\u2019re fusing many signals.<br>\u2705 Very low tuning burden (pick k\u224850\u2013100).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-hybrid-search-with-rrf\">Hybrid Search with RRF<\/h2>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom collections import defaultdict\nfrom typing import Any, Callable, List, Optional\n\n# RankedResult is defined elsewhere (not shown in this excerpt).\n\n\nclass RankingService:\n    &quot;&quot;&quot;\n    Service for ranking and merging search results.\n    \n    Provides multiple ranking strategies:\n    - Reciprocal Rank Fusion (RRF)\n    - Weighted linear combination\n    - Score normalization\n    - Custom ranking functions\n    &quot;&quot;&quot;\n    \n    def __init__(self, default_k: int = 60):\n        &quot;&quot;&quot;\n        Initialize ranking service.\n        \n        Args:\n            default_k: Default k parameter for RRF (typically 60)\n        &quot;&quot;&quot;\n        self.default_k = default_k\n    \n    def reciprocal_rank_fusion(\n        self,\n        result_lists: List&#x5B;List&#x5B;Any]],\n        k: Optional&#x5B;int] = None,\n        id_func: Optional&#x5B;Callable] = None,\n        score_func: Optional&#x5B;Callable] = None\n    ) -&gt; List&#x5B;RankedResult]:\n        
&quot;&quot;&quot;\n        Merge multiple result lists using Reciprocal Rank Fusion.\n        \n        RRF score = sum(1 \/ (k + rank_i)) for each list i\n        \n        Args:\n            result_lists: List of result lists to merge\n            k: RRF parameter (default: 60)\n            id_func: Function to extract ID from result\n            score_func: Function to extract score from result\n            \n        Returns:\n            Merged and ranked results\n        &quot;&quot;&quot;\n        k = k or self.default_k\n        id_func = id_func or (lambda x: x.id if hasattr(x, &#039;id&#039;) else x.get(&#039;id&#039;))\n        score_func = score_func or (lambda x: x.score if hasattr(x, &#039;score&#039;) else x.get(&#039;score&#039;, 0))\n        \n        # Calculate RRF scores\n        rrf_scores = defaultdict(float)\n        result_map = {}\n        source_map = defaultdict(list)\n        \n        for list_idx, results in enumerate(result_lists):\n            for rank, result in enumerate(results, 1):\n                result_id = id_func(result)\n                rrf_scores&#x5B;result_id] += 1.0 \/ (k + rank)\n                result_map&#x5B;result_id] = result\n                source_map&#x5B;result_id].append(f&quot;list_{list_idx}&quot;)\n        \n        # Sort by RRF score\n        sorted_ids = sorted(rrf_scores.keys(), key=lambda x: rrf_scores&#x5B;x], reverse=True)\n        \n        # Create ranked results\n        ranked_results = &#x5B;]\n        for rank, result_id in enumerate(sorted_ids, 1):\n            result = result_map&#x5B;result_id]\n            \n            # Extract content based on result type\n            if hasattr(result, &#039;content&#039;):\n                content = result.content\n            elif isinstance(result, dict) and &#039;content&#039; in result:\n                content = result&#x5B;&#039;content&#039;]\n            else:\n                content = str(result)\n            \n            # Extract 
metadata\n            if hasattr(result, &#039;metadata&#039;):\n                metadata = result.metadata\n            elif isinstance(result, dict):\n                metadata = {key: value for key, value in result.items() if key not in &#x5B;&#039;id&#039;, &#039;content&#039;, &#039;score&#039;]}\n            else:\n                metadata = {}\n            \n            ranked_result = RankedResult(\n                id=result_id,\n                content=content,\n                score=rrf_scores&#x5B;result_id],\n                rank=rank,\n                metadata=metadata,\n                sources=source_map&#x5B;result_id]\n            )\n            ranked_results.append(ranked_result)\n        \n        return ranked_results\n<\/pre><\/div>\n\n\n<p>On top of RRF, you can apply a weight to each side to favor one type of embedding in the ranking, for example:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nif weight_combinations is None:\n            weight_combinations = &#x5B;\n                # Each tuple is (dense_weight, sparse_weight)\n                (1.0, 0.0),  # Dense only\n                (0.7, 0.3),  # Dense heavy\n                (0.5, 0.5),  # Balanced\n                (0.3, 0.7),  # Sparse heavy\n                (0.0, 1.0)   # Sparse only\n            ]\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">When to Use Hybrid + Re-ranking<\/h2>\n\n\n\n<p>\u2705 Use it when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Queries mix keywords and concepts<\/li>\n\n\n\n<li>The domain has specialized or rare terms<\/li>\n\n\n\n<li>Precision matters (compliance, recommendations)<\/li>\n\n\n\n<li>The dataset is diverse or multi-modal<\/li>\n<\/ul>\n\n\n\n<p>\u274c Stick with pure dense when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Queries are exploratory<\/li>\n\n\n\n<li>Low latency is essential<\/li>\n\n\n\n<li>You\u2019re just prototyping<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 
class=\"wp-block-heading\">Final Takeaways<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Hybrid search improves accuracy 8\u201315% over pure methods.<\/li>\n\n\n\n<li>PostgreSQL with <code>pgvector<\/code> handles it natively\u2014no need for external vector DBs.<\/li>\n\n\n\n<li>RRF is simple, effective, and production-safe.<\/li>\n\n\n\n<li>Cross-encoder re-ranking is optional but powerful.<\/li>\n\n\n\n<li><strong>Start small<\/strong>: tune <code>sparse_boost<\/code>, use 1536 dims, monitor recall\/failure rates.<\/li>\n<\/ul>\n\n\n\n<p>In Part 3, we will explore <strong>adaptive RAG<\/strong>\u2014dynamic query routing, confidence-based fallbacks, and agentic workflows. Hybrid search sets the foundation for those advanced RAG techniques that can help you reach your goals of integrating AI\/LLM capabilities in your organization. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Try It Yourself<\/h2>\n\n\n\n<p>\ud83d\udd17 GitHub: <a href=\"https:\/\/github.com\/boutaga\/pgvector_RAG_search_lab\">pgvector_RAG_search_lab<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2423\" height=\"1163\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image.png\" alt=\"\" class=\"wp-image-40733\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image.png 2423w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-300x144.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-1024x492.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-768x369.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-1536x737.png 1536w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/10\/image-2048x983.png 2048w\" sizes=\"auto, (max-width: 2423px) 100vw, 2423px\" 
\/><\/figure>\n\n\n\n<p>Use the included Streamlit app to compare the following modes (<em>just be sure to embed your query with the same model and number of dimensions as your content<\/em>):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\ud83d\udd0d Semantic-only<\/li>\n\n\n\n<li>\ud83d\udd0d Hybrid (RRF)<\/li>\n\n\n\n<li>\ud83d\udd0d Hybrid + re-ranking<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In the first part of this RAG series, we established the fundamentals of Naive RAG with dense vector embeddings on PostgreSQL using pgvector. That foundation works well for conceptual queries, but production systems quickly reveal a limitation: pure semantic search misses the exact matches that a LIKE predicate or a full-text search would find. When someone [&hellip;]<\/p>\n","protected":false},"author":153,"featured_media":37679,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[83],"tags":[3685,3523,3678,3686],"type_dbi":[2749],"class_list":["post-40687","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postgresql","tag-ai-llm","tag-pgvector","tag-rag","tag-rag-search-2","type-postgresql"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.4) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>RAG Series - Hybrid Search with Re-ranking - dbi Blog<\/title>\n<meta name=\"description\" content=\"Build production-grade hybrid RAG pipelines in PostgreSQL using pgvector, SPLADE sparse vectors and Reciprocal Rank Fusion (RRF).\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" 
\/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"RAG Series - Hybrid Search with Re-ranking\" \/>\n<meta property=\"og:description\" content=\"Build production-grade hybrid RAG pipelines in PostgreSQL using pgvector, SPLADE sparse vectors and Reciprocal Rank Fusion (RRF).\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2025-10-05T19:49:50+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-10-05T19:49:52+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Adrien Obernesser\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Adrien Obernesser\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/\"},\"author\":{\"name\":\"Adrien Obernesser\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"headline\":\"RAG Series &#8211; Hybrid Search with Re-ranking\",\"datePublished\":\"2025-10-05T19:49:50+00:00\",\"dateModified\":\"2025-10-05T19:49:52+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/\"},\"wordCount\":895,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2025\\\/03\\\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\",\"keywords\":[\"AI\\\/LLM\",\"pgvector\",\"RAG\",\"RAG-Search\"],\"articleSection\":[\"PostgreSQL\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/\",\"name\":\"RAG Series - Hybrid Search with Re-ranking - dbi 
Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2025\\\/03\\\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\",\"datePublished\":\"2025-10-05T19:49:50+00:00\",\"dateModified\":\"2025-10-05T19:49:52+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"description\":\"Build production-grade hybrid RAG pipelines in PostgreSQL using pgvector, SPLADE sparse vectors and Reciprocal Rank Fusion (RRF).\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2025\\\/03\\\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\",\"contentUrl\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2025\\\/03\\\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png\",\"width\":1024,\"height\":1024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/rag-series-hybrid-search-with-re-ranking\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"A
ccueil\",\"item\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"RAG Series &#8211; Hybrid Search with Re-ranking\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/\",\"name\":\"dbi Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/fd2ab917212ce0200c7618afaa7fdbcd\",\"name\":\"Adrien Obernesser\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"caption\":\"Adrien Obernesser\"},\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/author\\\/adrienobernesser\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"RAG Series - Hybrid Search with Re-ranking - dbi Blog","description":"Build production-grade hybrid RAG pipelines in PostgreSQL using pgvector, SPLADE sparse vectors and Reciprocal Rank Fusion (RRF).","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/","og_locale":"en_US","og_type":"article","og_title":"RAG Series - Hybrid Search with Re-ranking","og_description":"Build production-grade hybrid RAG pipelines in PostgreSQL using pgvector, SPLADE sparse vectors and Reciprocal Rank Fusion (RRF).","og_url":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/","og_site_name":"dbi Blog","article_published_time":"2025-10-05T19:49:50+00:00","article_modified_time":"2025-10-05T19:49:52+00:00","og_image":[{"width":1024,"height":1024,"url":"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","type":"image\/png"}],"author":"Adrien Obernesser","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Adrien Obernesser","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/"},"author":{"name":"Adrien Obernesser","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"headline":"RAG Series &#8211; Hybrid Search with Re-ranking","datePublished":"2025-10-05T19:49:50+00:00","dateModified":"2025-10-05T19:49:52+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/"},"wordCount":895,"commentCount":0,"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","keywords":["AI\/LLM","pgvector","RAG","RAG-Search"],"articleSection":["PostgreSQL"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/","url":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/","name":"RAG Series - Hybrid Search with Re-ranking - dbi 
Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/#primaryimage"},"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","datePublished":"2025-10-05T19:49:50+00:00","dateModified":"2025-10-05T19:49:52+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"description":"Build production-grade hybrid RAG pipelines in PostgreSQL using pgvector, SPLADE sparse vectors and Reciprocal Rank Fusion (RRF).","breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/#primaryimage","url":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","contentUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/03\/pixlr-image-generator-5f64d780-c578-477a-9419-7ddcdb807c83.png","width":1024,"height":1024},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/rag-series-hybrid-search-with-re-ranking\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"RAG Series &#8211; Hybrid Search with 
Re-ranking"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd","name":"Adrien Obernesser","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","caption":"Adrien 
Obernesser"},"url":"https:\/\/www.dbi-services.com\/blog\/author\/adrienobernesser\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/40687","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/153"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=40687"}],"version-history":[{"count":52,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/40687\/revisions"}],"predecessor-version":[{"id":40760,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/40687\/revisions\/40760"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media\/37679"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=40687"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?post=40687"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=40687"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=40687"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}