{"id":37080,"date":"2025-02-09T20:19:22","date_gmt":"2025-02-09T19:19:22","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/?p=37080"},"modified":"2025-03-20T16:25:30","modified_gmt":"2025-03-20T15:25:30","slug":"pgvector-a-guide-for-dba-part1-lab-demo","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/","title":{"rendered":"pgvector, a guide for DBA &#8211; Part1: LAB DEMO"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-introduction\">Introduction<\/h2>\n\n\n\n<p>The first release of pgvector occurred in April 2021. Since then a lot has changed and is still changing. Even though we could say that the level of maturity of AI stacks and usage is currently really amazing, things are still changing fast and will probably continue to do so for a while. This is quite exciting because as a DBA consultant, I am no longer getting questions about theory on vectors, but more about how we can leverage this PostgreSQL extension and deploy it in production data hubs or enterprise applications. <br>We have reached a level of simplicity of use and maturity of the AI\/ML tools that implementing it is no longer reserved for huge organizations. Anyone can build its own stack or use a cloud provider at a fairly decent price. Since I am going to give some talks on that subject throughout the year, I thought it was useful to make a small series of blog posts and share some knowledge on how a &#8220;traditional&#8221; PostgreSQL user can extend his SQL knowledge and solve problems that would have been out of reach otherwise. <br>In this first part, I will talk about the basics, how it works and how you can play with it in a simple LAB DEMO. In the next parts we will dig a bit deeper into performance tuning, scalability, index tuning and internals of vector workflows in our beloved open source RDBMS.<br><br>As a reminder here is the <strong><a href=\"https:\/\/github.com\/boutaga\/Movies_pgvector_lab\">Git repo<\/a><\/strong> with the instructions and all the links for the other parts of this series. <br><a href=\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part2-indexes\/\">Part 2 : pgvector, a guide for DBA \u2013 Part2 Indexes<\/a><br><a href=\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part3-ai-agent-and-workflows\/\">Part 3 : pgvector, a guide for DBA \u2013 Part3 AI Agent and workflows<\/a><br>Part 4 : pgvector, a guide for DBA &#8211; Part 4 AI Workflows Best practices ( not released yet )<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-why\">Why ?<\/h2>\n\n\n\n<p>Before jumping into the LAB DEMO, let&#8217;s talk about why you would use pgvector. If you&#8217;re not familiar with this, we could say simply that pgvector is an extension of PostgreSQL that adds a vector data type, with facilities (indexes, operators, &#8230;) to store embeddings alongside standard known relational data. <br>Embeddings are mathematical representations of other objects (like text, images, videos,&#8230; etc) into multi-dimensional vectors. The more dimensions your vector has, the more you will be able to encapsulate &#8220;meanings&#8221; into it. This comes with a cost of complexity of the mathematical operations hence, more compute and token consumption. Embeddings are still kinda magic to me, the simplicity and the elegance of putting words meaning into numbers blows my mind. If you don&#8217;t know how it works then it&#8217;s magic \ud83d\ude42<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"274\" height=\"252\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/magic-unicorn.gif\" alt=\"\" class=\"wp-image-37130\" \/><\/figure>\n<\/div>\n\n\n<p>Here are a few use cases of vectors: similarity search, semantic text search, hybrid search, recommendation system, geospatial + vector search, RAG search, &#8230;<br>I would suggest looking at the <a href=\"https:\/\/www.youtube.com\/@pgeu\">PostgreSQLconf.eu Youtube Channel<\/a> and particularly this <a href=\"https:\/\/youtu.be\/n1Qaz4Qi64E?si=SWF_yuxn_L6vImXD\">talk<\/a> from Varun Dhawan&nbsp;which I had the chance to see live in Athens last October. <br><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-lab-demo\">LAB DEMO<\/h2>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-architecture\">Architecture<\/h3>\n\n\n\n<p>We have initially 2 sample databases, dvdrental and netflix_shows. The idea is to merge both databases into one the be able to query tables from both. This can be associated to our standard DWH process.<\/p>\n\n\n\n<p>Then we are going to create new fields on the film and the netflix_shows tables to host our embeddings (this is possible because of the pgvector extension).<br>Embeddings are mathematical representations of the movies descriptions in our case.<br>We are going to use a Python script to connect to OpenAI embedding generation model to inserts those in our tables. Then with another script, we will query those embedding and look for similarities.<br>The aim here is to propose to DVDRental&#8217;s users, Netflix shows they might like based on their rental activity. Similar processes can be found on numerous modern applications in order to propose something that would fit your taste or mood. This practical case can help you see the underlying complexity or in this case the simplicity of extending PostgreSQL. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"997\" height=\"1024\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/Drawing-2025-02-07-14.26.40.excalidraw-997x1024.png\" alt=\"\" class=\"wp-image-37081\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/Drawing-2025-02-07-14.26.40.excalidraw-997x1024.png 997w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/Drawing-2025-02-07-14.26.40.excalidraw-292x300.png 292w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/Drawing-2025-02-07-14.26.40.excalidraw-768x789.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/Drawing-2025-02-07-14.26.40.excalidraw-1495x1536.png 1495w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/Drawing-2025-02-07-14.26.40.excalidraw-1993x2048.png 1993w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/Drawing-2025-02-07-14.26.40.excalidraw.png 2008w\" sizes=\"auto, (max-width: 997px) 100vw, 997px\" \/><\/figure>\n\n\n\n<p><br><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-installation-steps\">Installation steps <br><\/h3>\n\n\n\n<p><strong>Here are the steps to build your environment : <\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Provision a Linux server that will host your PostgreSQL instance.<\/li>\n\n\n\n<li>Install PostgreSQL 17.2 and pgvector 0.8.0 extension with it.<\/li>\n\n\n\n<li>Import both dvdrental and netflix_shows databases into one database (this makes it easier to query both sources at the same time).<\/li>\n\n\n\n<li>Add the vector fields in the target tables.<\/li>\n\n\n\n<li>On your user environment create the two python scripts to create the embeddings and to run the query search for similarity.<\/li>\n\n\n\n<li>Setup your Python virtual environment.<\/li>\n\n\n\n<li>Get an API key for your AI model, in my case OpenAI, may even host locally your own LLM.<\/li>\n\n\n\n<li>Run the script that creates the embeddings.<\/li>\n\n\n\n<li>Run the script that queries your vectors and look for similarities.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-installing-pgvector\">Installing pgvector<\/h3>\n\n\n\n<p>Link to the source and official documentation : <a href=\"https:\/\/github.com\/pgvector\/pgvector\">pgvector\/pgvector: Open-source vector similarity search for Postgres<\/a>.<\/p>\n\n\n\n<p><strong>Clone the Repository:<\/strong><br>Clone the pgvector extension from GitHub: <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>cd \/tmp <\/code>\n<code>git clone https:\/\/github.com\/pgvector\/pgvector.git <\/code>\n<code>cd pgvector<\/code><\/code><\/pre>\n\n\n\n<p><strong>Build and Install:<\/strong><br>Use make to compile the extension and then install it: <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>make <\/code>\n<code>sudo make install<\/code><\/code><\/pre>\n\n\n\n<p><strong>Verify Installation:<\/strong><br>In psql, connect to the dvdrental database and run: <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>CREATE EXTENSION IF NOT EXISTS vector; <\/code>\n<code>SELECT * FROM pg_extension WHERE extname = 'vector';<\/code> \n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-import-data-and-prepare-the-database\">Import data and prepare the database. <\/h3>\n\n\n\n<p><strong>Download the Data:<\/strong><br>Visit <a href=\"https:\/\/www.kaggle.com\/datasets\/shivamb\/netflix-shows\">Kaggle\u2019s Netflix Shows dataset<\/a> and download the CSV file (e.g., <code>netflix_titles.csv<\/code>).<br>Visit <a href=\"https:\/\/neon.tech\/postgresql\/postgresql-getting-started\/postgresql-sample-database\">Neon\u2019s PostgreSQL Sample Database<\/a> and download the SQL dump file.<\/p>\n\n\n\n<p><strong>Create the Database and make it ready to receive the embeddings :<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>--psql \nCREATE DATABASE dvdrental;\n\n--bash\nunzip dvdrental.zip\n  \npg_restore -U postgres -d dvdrental dvdrental.tar\n\n\n--psql    \n\\c dvdrental  \n    \nCREATE TABLE netflix_shows (                                                                                        \n    show_id TEXT PRIMARY KEY,\n    type VARCHAR(20),\n    title TEXT,\n    director TEXT,\n    \"cast\" TEXT,\n    country TEXT,\n    date_added TEXT,      -- stored as text; later you can convert to DATE with TO_DATE if desired\n    release_year INTEGER,\n    rating VARCHAR(10),\n    duration VARCHAR(20),\n    listed_in TEXT,\n    description TEXT\n);\n\n--import from the csv file\nCOPY netflix_shows(show_id, type, title, director, \"cast\", country, date_added, release_year, rating, duration, listed_in, description)\nFROM '\/home\/postgres\/LAB_vector\/netflix_titles.csv'\nWITH (\n    FORMAT csv,\n    HEADER true,\n    DELIMITER ',',\n    NULL ''\n);\n\n    \nALTER TABLE film ADD COLUMN embedding vector(1536);\nALTER TABLE netflix_shows ADD COLUMN embedding vector(1536);\n    \nCREATE INDEX IF NOT EXISTS film_embedding_idx ON film USING hnsw (embedding vector_l2_ops);   \nCREATE INDEX IF NOT EXISTS netflix_shows_embedding_idx ON netflix_shows USING hnsw (embedding vector_l2_ops);\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-creating-the-embeddings-and-getting-the-first-recommendations\">Creating the embeddings and getting the first recommendations. <\/h2>\n\n\n\n<p>The execution of the create_emb.py script took around 10 minutes on the default batch size of 30 and OpenAI API dashboard showed an input of around 280 000 tokens (on my test plan this equates to $0.03).<\/p>\n\n\n\n<p><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\n14:18:02 postgres@PG1:\/home\/postgres\/LAB_vector\/ &#x5B;PG17] source pgvector_lab\/bin\/activate\n(pgvector_lab) 14:33:36 postgres@PG1:\/home\/postgres\/LAB_vector\/ &#x5B;PG17] export DATABASE_URL=&quot;postgresql:\/\/postgres@localhost\/dvdrental&quot;\n(pgvector_lab) 14:33:46 postgres@PG1:\/home\/postgres\/LAB_vector\/ &#x5B;PG17] export OPENAI_API_KEY=&quot;sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&quot;\n(pgvector_lab) 14:34:34 postgres@PG1:\/home\/postgres\/LAB_vector\/ &#x5B;PG17] python create_emb.py\nUpdating film embeddings...\nProcessing 1000 rows from film in 34 batches (batch size = 30)...\nUpdated batch for film: &#x5B;11, 382, 400, 418, 426, 436, 448, 463, 621, 622, 474, 626, 641, 493, 511, 528, 530, 546, 12, 25, 33, 26, 566, 568, 585, 597, 604, 664, 678, 697]\nUpdated batch for film: &#x5B;745, 752, 771, 787, 806, 213, 824, 842, 859, 876, 54, 932, 949, 950, 951, 969, 976, 116, 143, 170, 198, 293, 15, 133, 58, 60, 61, 63, 66, 52]\nUpdated batch for film: &#x5B;404, 500, 654, 914, 717, 808, 5, 234, 353, 4, 456, 484, 659, 679, 896, 8, 87, 89, 90, 91, 93, 94, 95, 96, 1, 2, 3, 131, 132, 134]\nUpdated batch for film: &#x5B;135, 6, 7, 9, 10, 164, 13, 179, 14, 98, 28, 218, 232, 233, 217, 294, 295, 296, 297, 298, 299, 307, 308, 291, 29, 31, 267, 268, 269, 270]\nUpdated batch for film: &#x5B;271, 255, 253, 32, 16, 27, 292, 300, 301, 302, 303, 304, 305, 306, 17, 19, 331, 332, 333, 20, 347, 346, 374, 375, 376, 377, 379, 380, 381, 365]\nUpdated batch for film: &#x5B;385, 386, 387, 388, 389, 390, 21, 685, 22, 397, 398, 399, 383, 23, 24, 39, 450, 431, 432, 433, 434, 435, 419, 40, 41, 42, 466, 467, 468, 469]\nUpdated batch for film: &#x5B;470, 471, 472, 265, 43, 45, 496, 499, 501, 502, 503, 47, 50, 34, 563, 564, 565, 547, 38, 36, 596, 598, 599, 600, 601, 602, 603, 586, 44, 616]\nUpdated batch for film: &#x5B;46, 631, 632, 633, 634, 635, 636, 639, 48, 49, 37, 665, 666, 667, 55, 683, 680, 726, 727, 728, 729, 736, 737, 738, 739, 740, 741, 742, 743, 744]\nUpdated batch for film: &#x5B;746, 56, 57, 59, 730, 731, 732, 733, 62, 464, 64, 765, 766, 767, 768, 769, 770, 754, 65, 67, 797, 798, 799, 800, 801, 802, 803, 126, 53, 71]\nUpdated batch for film: &#x5B;830, 831, 832, 833, 73, 74, 77, 78, 894, 895, 877, 79, 80, 926, 927, 928, 929, 930, 931, 915, 81, 82, 83, 960, 961, 962, 963, 964, 965, 966]\nUpdated batch for film: &#x5B;84, 85, 72, 70, 75, 997, 998, 999, 1000, 76, 69, 99, 101, 102, 103, 104, 92, 100, 97, 88, 107, 108, 109, 110, 111, 112, 113, 114, 115, 117]\nUpdated batch for film: &#x5B;118, 119, 120, 121, 122, 106, 127, 128, 129, 130, 154, 136, 137, 138, 139, 140, 141, 125, 124, 144, 145, 147, 148, 149, 150, 151, 152, 153, 155, 157]\nUpdated batch for film: &#x5B;158, 159, 160, 146, 156, 163, 184, 165, 166, 167, 168, 169, 171, 172, 173, 174, 175, 176, 177, 162, 180, 181, 182, 183, 185, 186, 187, 188, 189, 191]\nUpdated batch for film: &#x5B;192, 193, 194, 195, 196, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 214, 215, 219, 220, 221, 222, 223, 224, 225, 226, 227]\nUpdated batch for film: &#x5B;228, 229, 230, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 235, 256, 257, 258, 259, 260, 261, 262, 263, 264, 266]\nUpdated batch for film: &#x5B;275, 276, 277, 278, 279, 280, 309, 327, 345, 364, 378, 281, 282, 702, 716, 734, 283, 284, 285, 878, 384, 286, 287, 288, 274, 273, 311, 312, 313, 314]\nUpdated batch for film: &#x5B;315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 310, 329, 328, 330, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 348, 349, 350]\nUpdated batch for film: &#x5B;351, 352, 554, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 366, 367, 368, 369, 370, 371, 372, 373, 391, 392, 393, 394, 449, 395, 396, 402, 403]\nUpdated batch for film: &#x5B;405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 401, 420, 421, 422, 423, 424, 425, 427, 428, 429, 430, 438, 439, 440, 441, 442, 443]\nUpdated batch for film: &#x5B;444, 445, 446, 447, 451, 452, 453, 454, 455, 437, 458, 459, 460, 461, 462, 465, 473, 457, 476, 477, 478, 479, 480, 481, 482, 483, 485, 486, 487, 488]\nUpdated batch for film: &#x5B;489, 490, 491, 492, 475, 495, 497, 498, 504, 505, 753, 506, 507, 508, 509, 510, 494, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 560, 524]\nUpdated batch for film: &#x5B;525, 526, 527, 512, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 561, 542, 543, 544, 545, 529, 548, 549, 550, 551, 552, 553, 555, 556, 557]\nUpdated batch for film: &#x5B;558, 559, 562, 569, 570, 580, 581, 582, 583, 584, 567, 571, 572, 573, 574, 575, 576, 577, 578, 579, 587, 588, 589, 590, 591, 592, 593, 594, 595, 606]\nUpdated batch for film: &#x5B;607, 608, 609, 610, 611, 612, 613, 614, 615, 617, 618, 619, 620, 605, 624, 625, 627, 628, 629, 630, 637, 638, 640, 623, 643, 644, 645, 646, 647, 648]\nUpdated batch for film: &#x5B;649, 650, 651, 652, 653, 655, 656, 657, 658, 642, 661, 662, 663, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 660, 681, 682, 684, 686, 687, 688]\nUpdated batch for film: &#x5B;689, 690, 691, 692, 693, 694, 695, 696, 699, 700, 701, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 698, 718, 719, 720, 721, 722]\nUpdated batch for film: &#x5B;723, 724, 725, 747, 748, 749, 750, 751, 735, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782]\nUpdated batch for film: &#x5B;783, 784, 785, 786, 788, 790, 791, 792, 793, 794, 795, 796, 804, 805, 789, 809, 810, 817, 818, 819, 820, 821, 822, 823, 811, 812, 813, 814, 815, 816]\nUpdated batch for film: &#x5B;807, 826, 827, 828, 829, 834, 835, 836, 837, 838, 839, 840, 841, 825, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 890, 854, 855, 856, 857, 858]\nUpdated batch for film: &#x5B;843, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 891, 872, 873, 874, 875, 860, 879, 880, 905, 881, 882, 883, 884, 885, 886, 887, 888, 889]\nUpdated batch for film: &#x5B;892, 893, 898, 899, 900, 901, 902, 903, 904, 906, 907, 908, 909, 910, 911, 912, 913, 897, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 934, 935]\nUpdated batch for film: &#x5B;936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 933, 953, 954, 955, 956, 957, 958, 959, 967, 968, 952, 971, 972, 973, 974, 975, 977]\nUpdated batch for film: &#x5B;978, 979, 992, 980, 981, 982, 983, 984, 985, 986, 970, 993, 994, 995, 996, 987, 989, 990, 988, 991, 18, 30, 35, 51, 68, 86, 105, 123, 142, 161]\nUpdated batch for film: &#x5B;178, 190, 197, 216, 231, 252, 254, 272, 289, 290]\nUpdating netflix_shows embeddings...\nProcessing 8807 rows from netflix_shows in 294 batches (batch size = 30)...\nUpdated batch for netflix_shows: &#x5B;&#039;s2236&#039;, &#039;s1398&#039;, &#039;s1399&#039;, &#039;s1400&#039;, &#039;s1401&#039;, &#039;s1403&#039;, &#039;s1404&#039;, &#039;s1405&#039;, &#039;s2484&#039;, &#039;s2485&#039;, &#039;s2486&#039;, &#039;s2487&#039;, &#039;s2488&#039;, &#039;s2489&#039;, &#039;s2490&#039;, &#039;s2492&#039;, &#039;s2494&#039;, &#039;s2495&#039;, &#039;s1406&#039;, &#039;s31&#039;, &#039;s32&#039;, &#039;s33&#039;, &#039;s34&#039;, &#039;s35&#039;, &#039;s36&#039;, &#039;s1407&#039;, &#039;s1408&#039;, &#039;s1409&#039;, &#039;s1410&#039;, &#039;s1411&#039;]\nUpdated batch for netflix_shows: &#x5B;&#039;s1412&#039;, &#039;s1413&#039;, &#039;s1414&#039;, &#039;s1415&#039;, &#039;s2253&#039;, &#039;s1417&#039;, &#039;s1418&#039;, &#039;s1423&#039;,\n....\n\n\n(pgvector_lab) 14:44:08 postgres@PG1:\/home\/postgres\/LAB_vector\/ &#x5B;PG17] python recommend_netflix.py\nEnter customer id: 524\n\nTop 5 Netflix recommendations for customer 524:\n1. Alarmoty in the Land of Fire (similarity distance: 0.4505)\n2. Baaghi (similarity distance: 0.4518)\n3. Into the Badlands (similarity distance: 0.4526)\n4. Antidote (similarity distance: 0.4527)\n5. Duplicate (similarity distance: 0.4553)\n(pgvector_lab) 14:46:16 postgres@PG1:\/home\/postgres\/LAB_vector\/ &#x5B;PG17] python recommend_netflix.py\nEnter customer id: 10\n\nTop 5 Netflix recommendations for customer 10:\n1. Antidote (similarity distance: 0.4406)\n2. Krutant (similarity distance: 0.4441)\n3. Kung Fu Yoga (similarity distance: 0.4450)\n4. Hunt for the Wilderpeople (similarity distance: 0.4469)\n5. The Truth (similarity distance: 0.4476)\n(pgvector_lab) 14:46:21 postgres@PG1:\/home\/postgres\/LAB_vector\/ &#x5B;PG17]\n<\/pre><\/div>\n\n\n<p><br>Once the embeddings are created and inserted into the tables, we can check if the recommendations works. In my case I use the text-embedding-ada-002 embedding model in the inference process part and then I use pgvector to look for similarities based in SQL. <br>This has the advantages of being very simple, cost-effective and really powerful at scale although you have to develop your own application part. You could use Chatpgt to replace your application at the cost of doubling the inference.   <br><br><br>That&#8217;s it for now. If you want to have access to the source files, scripts I used and full procedure, you can check this repository : <a href=\"https:\/\/github.com\/boutaga\/Movies_pgvector_lab\">Movies_pgvector_lab.<\/a> <br><br>Enjoy !<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction The first release of pgvector occurred in April 2021. Since then a lot has changed and is still changing. Even though we could say that the level of maturity of AI stacks and usage is currently really amazing, things are still changing fast and will probably continue to do so for a while. This [&hellip;]<\/p>\n","protected":false},"author":153,"featured_media":37151,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[83],"tags":[3524,2832,3523,77],"type_dbi":[2749],"class_list":["post-37080","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postgresql","tag-ai-ml","tag-openai","tag-pgvector","tag-postgresql","type-postgresql"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.5) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>pgvector, a guide for DBA - Part1: LAB DEMO - dbi Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"pgvector, a guide for DBA - Part1: LAB DEMO\" \/>\n<meta property=\"og:description\" content=\"Introduction The first release of pgvector occurred in April 2021. Since then a lot has changed and is still changing. Even though we could say that the level of maturity of AI stacks and usage is currently really amazing, things are still changing fast and will probably continue to do so for a while. This [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-09T19:19:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-03-20T15:25:30+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/elephant-1822636_1280.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"873\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Adrien Obernesser\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Adrien Obernesser\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/\"},\"author\":{\"name\":\"Adrien Obernesser\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"headline\":\"pgvector, a guide for DBA &#8211; Part1: LAB DEMO\",\"datePublished\":\"2025-02-09T19:19:22+00:00\",\"dateModified\":\"2025-03-20T15:25:30+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/\"},\"wordCount\":1029,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2025\\\/02\\\/elephant-1822636_1280.jpg\",\"keywords\":[\"AI\\\/ML\",\"openai\",\"pgvector\",\"PostgreSQL\"],\"articleSection\":[\"PostgreSQL\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/\",\"name\":\"pgvector, a guide for DBA - Part1: LAB DEMO - dbi Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2025\\\/02\\\/elephant-1822636_1280.jpg\",\"datePublished\":\"2025-02-09T19:19:22+00:00\",\"dateModified\":\"2025-03-20T15:25:30+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/fd2ab917212ce0200c7618afaa7fdbcd\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2025\\\/02\\\/elephant-1822636_1280.jpg\",\"contentUrl\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2025\\\/02\\\/elephant-1822636_1280.jpg\",\"width\":1280,\"height\":873},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/pgvector-a-guide-for-dba-part1-lab-demo\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"pgvector, a guide for DBA &#8211; Part1: LAB DEMO\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/\",\"name\":\"dbi Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/fd2ab917212ce0200c7618afaa7fdbcd\",\"name\":\"Adrien Obernesser\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g\",\"caption\":\"Adrien Obernesser\"},\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/author\\\/adrienobernesser\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"pgvector, a guide for DBA - Part1: LAB DEMO - dbi Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/","og_locale":"en_US","og_type":"article","og_title":"pgvector, a guide for DBA - Part1: LAB DEMO","og_description":"Introduction The first release of pgvector occurred in April 2021. Since then a lot has changed and is still changing. Even though we could say that the level of maturity of AI stacks and usage is currently really amazing, things are still changing fast and will probably continue to do so for a while. This [&hellip;]","og_url":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/","og_site_name":"dbi Blog","article_published_time":"2025-02-09T19:19:22+00:00","article_modified_time":"2025-03-20T15:25:30+00:00","og_image":[{"width":1280,"height":873,"url":"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/elephant-1822636_1280.jpg","type":"image\/jpeg"}],"author":"Adrien Obernesser","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Adrien Obernesser","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/"},"author":{"name":"Adrien Obernesser","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"headline":"pgvector, a guide for DBA &#8211; Part1: LAB DEMO","datePublished":"2025-02-09T19:19:22+00:00","dateModified":"2025-03-20T15:25:30+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/"},"wordCount":1029,"commentCount":0,"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/elephant-1822636_1280.jpg","keywords":["AI\/ML","openai","pgvector","PostgreSQL"],"articleSection":["PostgreSQL"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/","url":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/","name":"pgvector, a guide for DBA - Part1: LAB DEMO - dbi Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/#primaryimage"},"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/#primaryimage"},"thumbnailUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/elephant-1822636_1280.jpg","datePublished":"2025-02-09T19:19:22+00:00","dateModified":"2025-03-20T15:25:30+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd"},"breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/#primaryimage","url":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/elephant-1822636_1280.jpg","contentUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/02\/elephant-1822636_1280.jpg","width":1280,"height":873},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/pgvector-a-guide-for-dba-part1-lab-demo\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"pgvector, a guide for DBA &#8211; Part1: LAB DEMO"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/fd2ab917212ce0200c7618afaa7fdbcd","name":"Adrien Obernesser","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dc9316c729e50107159e0a1e631b9c1742ce8898576887d0103c83b1ca3bc9e6?s=96&d=mm&r=g","caption":"Adrien Obernesser"},"url":"https:\/\/www.dbi-services.com\/blog\/author\/adrienobernesser\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/37080","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/153"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=37080"}],"version-history":[{"count":58,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/37080\/revisions"}],"predecessor-version":[{"id":37776,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/37080\/revisions\/37776"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media\/37151"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=37080"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?post=37080"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=37080"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=37080"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}