{"id":14401,"date":"2020-07-05T19:45:59","date_gmt":"2020-07-05T17:45:59","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/the-myth-of-nosql-vs-rdbms-joins-dont-scale\/"},"modified":"2023-07-13T16:19:59","modified_gmt":"2023-07-13T14:19:59","slug":"the-myth-of-nosql-vs-rdbms-joins-dont-scale","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/the-myth-of-nosql-vs-rdbms-joins-dont-scale\/","title":{"rendered":"The myth of NoSQL (vs. RDBMS) &#8220;joins dont scale&#8221;"},"content":{"rendered":"<h2>By Franck Pachot<\/h2>\n<p>.<br \/>\nI&#8217;ll reference <a href=\"https:\/\/twitter.com\/alexbdebrie\" target=\"_blank\" rel=\"noopener noreferrer\">Alex DeBrie<\/a> article &#8220;<a href=\"https:\/\/www.alexdebrie.com\/posts\/dynamodb-no-bad-queries\/#sql-joins-have-bad-time-complexity\" target=\"_blank\" rel=\"noopener noreferrer\">SQL, NoSQL, and Scale: How DynamoDB scales where relational databases don&#8217;t<\/a>&#8220;, especially the paragraph about <em>&#8220;Why relational databases don\u2019t scale&#8221;<\/em>. But I want to make clear that my post here is not against this article, but against a very common myth that even precedes NoSQL databases. Actually, I&#8217;m taking this article as reference because the author, in his <a href=\"https:\/\/www.dynamodbguide.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">website<\/a> and <a href=\"https:\/\/www.dynamodbbook.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">book<\/a>, has really good points about data modeling in NoSQL. And because AWS DynamoDB is probably the strongest NoSQL database today. I&#8217;m challenging some widespread assertions and better do it based on good quality content and product.<\/p>\n<p>There are many use cases where NoSQL is a correct alternative, but moving from a relational database system to a key-value store because you heard that &#8220;joins don&#8217;t scale&#8221; is probably not a good reason. 
You should choose a solution for the features it offers (the problem it solves), not because you are unaware of what your current platform is capable of.<\/p>\n<h3><em>time complexity of joins<\/em><\/h3>\n<p>The idea in this article, taken from the popular Rick Houlihan talk, is that, by joining tables, you read more data and that this is a CPU-intensive operation. The time complexity of joins is <em>&#8220;( O (M + N) ) or worse&#8221;<\/em> according to Alex DeBrie&#8217;s article, or &#8220;<em>O(log(N))+Nlog(M)<\/em>&#8221; according to Rick Houlihan&#8217;s slide. This makes reference to &#8220;time complexity&#8221; and &#8220;Big O&#8221; notation. You can read about it. But this supposes that the cost of a join depends on the size of the tables involved (represented by N and M). That would be right for a full scan of non-partitioned tables. But all relational databases come with B*Tree indexes. And when we compare to a key-value store, we are obviously retrieving a few rows, and the join method will be a Nested Loop with an Index Range Scan. This access method does not depend at all on the size of the tables. That&#8217;s why relational databases are the kings of OLTP applications.<\/p>\n<h3>Let&#8217;s test it<\/h3>\n<p>The article claims that <em>&#8220;there\u2019s one big problem with relational databases: performance is like a black box. There are a ton of factors that impact how quickly your queries will return.&#8221;<\/em> with an example of a query like: <em>&#8220;SELECT * FROM orders JOIN users ON &#8230; WHERE user.id = &#8230; GROUP BY &#8230; LIMIT 20&#8221;<\/em>. 
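As a back-of-the-envelope illustration of the nested-loop argument above, here is a sketch comparing the pages a join must read with and without indexes. The fanout and rows-per-page constants are assumptions for illustration, not PostgreSQL measurements:

```python
import math

FANOUT = 300        # assumed index entries per 8KB branch block (illustrative)
ROWS_PER_PAGE = 75  # assumed ORDERS rows per 8KB heap page (illustrative)

def btree_levels(rows):
    """Assumed B*Tree descent cost: root-to-leaf page reads."""
    return max(1, math.ceil(math.log(rows, FANOUT)))

def nested_loop_pages(n_users, n_orders, orders_per_user=100):
    """Index nested loop for one user: two index descents
    plus the heap pages holding that user's orders."""
    return (btree_levels(n_users)
            + btree_levels(n_orders)
            + orders_per_user // ROWS_PER_PAGE + 1)

def full_scan_pages(n_users, n_orders):
    """Join without any index: read everything, O(N + M)."""
    return (n_users + n_orders) // ROWS_PER_PAGE

# grow both tables 64x: the indexed cost barely moves, the full scan explodes
for scale in (1, 4, 16, 64):
    n, m = 10_000 * scale, 1_000_000 * scale
    print(f"{scale:>2}x: nested loop ~{nested_loop_pages(n, m)} pages,"
          f" full scan ~{full_scan_pages(n, m):,} pages")
```

Under these assumptions the indexed plan reads a handful of pages at any scale, while the index-less join grows linearly with the tables; the measured tests below follow the same shape.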
The article adds that <em>&#8220;As the size of your tables grow, these operations will get slower and slower.&#8221;<\/em><\/p>\n<p>I will build those tables, in PostgreSQL here, because that&#8217;s my preferred Open Source RDBMS, and show that:<\/p>\n<ul>\n<li>Performance is <strong>not<\/strong> a black box: all RDBMS have an EXPLAIN command that displays exactly the algorithm used (even CPU and memory access) and you can estimate the cost easily from it.<\/li>\n<li>Join and Group By here do <strong>not<\/strong> depend on the size of the tables at all, only on the rows selected. You can multiply your tables by several orders of magnitude before the number of rows impacts the response time by even a tenth of a millisecond.<\/li>\n<\/ul>\n<h3>Create the tables<\/h3>\n<p>I will create small tables here because, when reading the execution plan, I don&#8217;t need large tables to estimate the cost and the response-time scalability. You can run the same with larger tables if you want to see how it scales, and then maybe investigate further the features for big tables (partitioning, parallel query,&#8230;). But let&#8217;s start with very simple tables without specific optimization.<\/p>\n<pre><code>\n\\timing\ncreate table USERS ( USER_ID bigserial primary key, FIRST_NAME text, LAST_NAME text)\n;\nCREATE TABLE\ncreate table ORDERS ( ORDER_ID bigserial primary key, ORDER_DATE timestamp, AMOUNT numeric(10,2), USER_ID bigint, DESCRIPTION text)\n;<\/code><\/pre>\n<p>I have created the two tables that I&#8217;ll join, both with an auto-generated primary key for simplicity.<\/p>\n<h3>Insert test data<\/h3>\n<p>It is important to build the tables as they would be in real life. USERS probably come without specific order. ORDERS come by date. This means that ORDERS from one USER are scattered throughout the table. 
The query would be much faster with clustered data, and each RDBMS has some ways to achieve this, but I want to show the cost of joining two tables here without specific optimisation.<\/p>\n<pre><code>\ninsert into  USERS (FIRST_NAME, LAST_NAME)\n with\n  random_words as (\n\n  select generate_series id,\n   translate(md5(random()::text),'-0123456789','aeioughij') as word\n\n  from generate_series(1,100)\n\n  )\n select\n  words1.word ,words2.word\n\n  from\n   random_words words1\n  cross join\n   random_words words2\n  order by words1.id+words2.id\n\n;\nINSERT 0 10000\nTime: 28.641 ms\n\nselect * from USERS order by user_id fetch first 10 rows only;\n user_id |           first_name           |           last_name\n---------+--------------------------------+--------------------------------\n       1 | iooifgiicuiejiaeduciuccuiogib  | iooifgiicuiejiaeduciuccuiogib\n       2 | iooifgiicuiejiaeduciuccuiogib  | dgdfeeiejcohfhjgcoigiedeaubjbg\n       3 | dgdfeeiejcohfhjgcoigiedeaubjbg | iooifgiicuiejiaeduciuccuiogib\n       4 | ueuedijchudifefoedbuojuoaudec  | iooifgiicuiejiaeduciuccuiogib\n       5 | iooifgiicuiejiaeduciuccuiogib  | ueuedijchudifefoedbuojuoaudec\n       6 | dgdfeeiejcohfhjgcoigiedeaubjbg | dgdfeeiejcohfhjgcoigiedeaubjbg\n       7 | iooifgiicuiejiaeduciuccuiogib  | jbjubjcidcgugubecfeejidhoigdob\n       8 | jbjubjcidcgugubecfeejidhoigdob | iooifgiicuiejiaeduciuccuiogib\n       9 | ueuedijchudifefoedbuojuoaudec  | dgdfeeiejcohfhjgcoigiedeaubjbg\n      10 | dgdfeeiejcohfhjgcoigiedeaubjbg | ueuedijchudifefoedbuojuoaudec\n(10 rows)\n\nTime: 0.384 ms\n<\/code><\/pre>\n<p>This generated 10000 users in the USERS table with random names.<\/p>\n<pre><code>\ninsert into ORDERS ( ORDER_DATE, AMOUNT, USER_ID, DESCRIPTION)\n with\n  random_amounts as (\n      ------------&gt; this generates 10 random order amounts\n  select\n   1e6*random() as AMOUNT\n  from generate_series(1,10)\n )\n ,random_dates as (\n      ------------&gt; this generates 10 random order 
dates\n  select\n   now() - random() * interval '1 year' as ORDER_DATE\n  from generate_series(1,10)\n )\nselect ORDER_DATE, AMOUNT, USER_ID, md5(random()::text) DESCRIPTION from\n  random_dates\n cross join\n  random_amounts\n cross join\n  users\n order by ORDER_DATE\n      ------------&gt; I sort this by date because that's what happens in real life. Clustering by users would not be a fair test.\n;\nINSERT 0 1000000\nTime: 4585.767 ms (00:04.586)\n<\/code><\/pre>\n<p>I now have one million orders, generated as 100 orders for each user in the last year. You can play with the numbers to generate more and see how it scales. This is fast: 5 seconds to generate 1 million orders here, and there are many ways to load faster if you feel the need to test on very large data sets. But that&#8217;s not the point here.<\/p>\n<pre><code>\ncreate index ORDERS_BY_USER on ORDERS(USER_ID);\nCREATE INDEX\nTime: 316.322 ms\nalter table ORDERS add constraint ORDERS_BY_USER foreign key (USER_ID) references USERS;\nALTER TABLE\nTime: 194.505 ms\n<\/code><\/pre>\n<p>As, in my data model, I want to navigate from USERS to ORDERS, I add an index for it. And I declare the referential integrity to avoid logical corruption in case there is a bug in my application or a human mistake during some ad-hoc fixes. As soon as the constraint is declared, I have no need to check this assertion in my code anymore. The constraint also lets the query planner know about this one-to-many relationship, which may open up some optimizations. 
That&#8217;s the beauty of relational databases: we declare those things rather than having to code their implementation.<\/p>\n<pre><code>\nvacuum analyze;\nVACUUM\nTime: 429.165 ms\n\nselect version();\n                                                 version\n---------------------------------------------------------------------------------------------------------\n PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit\n(1 row)\n<\/code><\/pre>\n<p>I&#8217;m in PostgreSQL, version 12, on a Linux box with only 2 CPUs. I&#8217;ve run a manual VACUUM to get a reproducible testcase rather than relying on autovacuum kicking in after loading those tables.<\/p>\n<pre><code>\nselect relkind,relname,reltuples,relpages,lpad(pg_size_pretty(relpages::bigint*8*1024),20) \"relpages * 8k\" from pg_class natural join (select oid relnamespace,nspname from pg_namespace) nsp where nspname='public' order by relpages;\n\n relkind |       relname       | reltuples | relpages |    relpages * 8k\n---------+---------------------+-----------+----------+----------------------\n S       | orders_order_id_seq |         1 |        1 |           8192 bytes\n S       | users_user_id_seq   |         1 |        1 |           8192 bytes\n i       | users_pkey          |     10000 |       30 |               240 kB\n r       | users               |     10000 |      122 |               976 kB\n i       | orders_pkey         |     1e+06 |     2745 |                21 MB\n i       | orders_by_user      |     1e+06 |     2749 |                21 MB\n r       | orders              |     1e+06 |    13334 |               104 MB\n<\/code><\/pre>\n<p>Here are the sizes of my tables: the 1 million orders take 104MB and the 10 thousand users take 1MB. I&#8217;ll increase the sizes later, but I don&#8217;t need that to understand the performance predictability.<\/p>\n<p>When understanding the size, remember that the size of the table is only the size of the data. 
Metadata (like column names) are not repeated for each row. They are stored once in the RDBMS catalog describing the table. As PostgreSQL has a native JSON datatype you can also test with a non-relational model here.<\/p>\n<h3>Execute the query<\/h3>\n<pre><code>\nselect order_month,sum(amount),count(*)\nfrom\n  (\n  select date_trunc('month',ORDER_DATE) order_month, USER_ID, AMOUNT\n  from ORDERS join USERS using(USER_ID)\n  ) user_orders\nwhere USER_ID=42\ngroup by order_month\n;\n\n     order_month     |     sum     | count\n---------------------+-------------+-------\n 2019-11-01 00:00:00 |  5013943.57 |    10\n 2019-09-01 00:00:00 |  5013943.57 |    10\n 2020-04-01 00:00:00 |  5013943.57 |    10\n 2019-08-01 00:00:00 | 15041830.71 |    30\n 2020-02-01 00:00:00 | 10027887.14 |    20\n 2020-06-01 00:00:00 |  5013943.57 |    10\n 2019-07-01 00:00:00 |  5013943.57 |    10\n(7 rows)\n\nTime: 2.863 ms\n<\/code><\/pre>\n<p>Here is the query mentioned in the article: get all orders for one user, and do some aggregation on them. Looking at the time is not really interesting as it depends on the RAM to cache the buffers at database or filesystem level, and disk latency. 
But we are developers and, by looking at the operations and loops that were executed, we can understand how it scales and where we are in the &#8220;time complexity&#8221; and &#8220;Big O&#8221; order of magnitude.<\/p>\n<h3>Explain the execution<\/h3>\n<p>This is as easy as running the query with EXPLAIN:<\/p>\n<pre><code>\nexplain (analyze,verbose,costs,buffers)\nselect order_month,sum(amount)\nfrom\n  (\n  select date_trunc('month',ORDER_DATE) order_month, USER_ID, AMOUNT\n  from ORDERS join USERS using(USER_ID)\n  ) user_orders\nwhere USER_ID=42\ngroup by order_month\n;\n                                                                   QUERY PLAN\n\n------------------------------------------------------------------------------------------------------------------------------------------------\n GroupAggregate  (cost=389.34..390.24 rows=10 width=40) (actual time=0.112..0.136 rows=7 loops=1)\n   Output: (date_trunc('month'::text, orders.order_date)), sum(orders.amount)\n   Group Key: (date_trunc('month'::text, orders.order_date))\n   Buffers: shared hit=19\n   -&gt;  Sort  (cost=389.34..389.59 rows=100 width=17) (actual time=0.104..0.110 rows=100 loops=1)\n         Output: (date_trunc('month'::text, orders.order_date)), orders.amount\n         Sort Key: (date_trunc('month'::text, orders.order_date))\n         Sort Method: quicksort  Memory: 32kB\n         Buffers: shared hit=19\n         -&gt;  Nested Loop  (cost=5.49..386.02 rows=100 width=17) (actual time=0.022..0.086 rows=100 loops=1)\n               Output: date_trunc('month'::text, orders.order_date), orders.amount\n               Buffers: shared hit=19\n               -&gt;  Index Only Scan using users_pkey on public.users  (cost=0.29..4.30 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)\n                     Output: users.user_id\n                     Index Cond: (users.user_id = 42)\n                     Heap Fetches: 0\n                     Buffers: shared hit=3\n               -&gt;  Bitmap 
Heap Scan on public.orders  (cost=5.20..380.47 rows=100 width=25) (actual time=0.012..0.031 rows=100 loops=1)\n                     Output: orders.order_id, orders.order_date, orders.amount, orders.user_id, orders.description\n                     Recheck Cond: (orders.user_id = 42)\n                     Heap Blocks: exact=13\n                     Buffers: shared hit=16\n                     -&gt;  Bitmap Index Scan on orders_by_user  (cost=0.00..5.17 rows=100 width=0) (actual time=0.008..0.008 rows=100 loops=1)\n                           Index Cond: (orders.user_id = 42)\n                           Buffers: shared hit=3\n Planning Time: 0.082 ms\n Execution Time: 0.161 ms\n<\/code><\/pre>\n<p>Here is the execution plan with the execution statistics. We have read 3 pages from USERS to get our USER_ID (Index Only Scan) and navigated, with a Nested Loop, to ORDERS to get the 100 orders for this user. This has read 3 pages from the index (Bitmap Index Scan) and 13 pages from the table. That&#8217;s a total of 19 pages. Those pages are 8k blocks.<\/p>\n<h3>4x initial size scales with same performance: 19 page reads<\/h3>\n<p>Let&#8217;s increase the size of the tables by a factor of 4. I change the &#8220;generate_series(1,100)&#8221; to &#8220;generate_series(1,200)&#8221; in the USERS generation and run the same. 
That generates 40000 users and 4 million orders.<\/p>\n<pre><code>\nselect relkind,relname,reltuples,relpages,lpad(pg_size_pretty(relpages::bigint*8*1024),20) \"relpages * 8k\" from pg_class natural join (select\noid relnamespace,nspname from pg_namespace) nsp where nspname='public' order by relpages;\n\n relkind |       relname       |  reltuples   | relpages |    relpages * 8k\n---------+---------------------+--------------+----------+----------------------\n S       | orders_order_id_seq |            1 |        1 |           8192 bytes\n S       | users_user_id_seq   |            1 |        1 |           8192 bytes\n i       | users_pkey          |        40000 |      112 |               896 kB\n r       | users               |        40000 |      485 |              3880 kB\n i       | orders_pkey         |        4e+06 |    10969 |                86 MB\n i       | orders_by_user      |        4e+06 |    10985 |                86 MB\n r       | orders              | 3.999995e+06 |    52617 |               411 MB\n\nexplain (analyze,verbose,costs,buffers)\nselect order_month,sum(amount)\nfrom\n  (\n  select date_trunc('month',ORDER_DATE) order_month, USER_ID, AMOUNT\n  from ORDERS join USERS using(USER_ID)\n  ) user_orders\nwhere USER_ID=42\ngroup by order_month\n;\n                                                                   QUERY PLAN                                                                 \n------------------------------------------------------------------------------------------------------------------------------------------------\n GroupAggregate  (cost=417.76..418.69 rows=10 width=40) (actual time=0.116..0.143 rows=9 loops=1)\n   Output: (date_trunc('month'::text, orders.order_date)), 
sum(orders.amount)\n   Group Key: (date_trunc('month'::text, orders.order_date))\n   Buffers: shared hit=19\n   -&gt;  Sort  (cost=417.76..418.02 rows=104 width=16) (actual time=0.108..0.115 rows=100 loops=1)\n         Output: (date_trunc('month'::text, orders.order_date)), orders.amount\n         Sort Key: (date_trunc('month'::text, orders.order_date))\n         Sort Method: quicksort  Memory: 32kB\n         Buffers: shared hit=19\n         -&gt;  Nested Loop  (cost=5.53..414.27 rows=104 width=16) (actual time=0.044..0.091 rows=100 loops=1)\n               Output: date_trunc('month'::text, orders.order_date), orders.amount\n               Buffers: shared hit=19\n               -&gt;  Index Only Scan using users_pkey on public.users  (cost=0.29..4.31 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1)\n                     Output: users.user_id\n                     Index Cond: (users.user_id = 42)\n                     Heap Fetches: 0\n                     Buffers: shared hit=3\n               -&gt;  Bitmap Heap Scan on public.orders  (cost=5.24..408.66 rows=104 width=24) (actual time=0.033..0.056 rows=100 loops=1)\n                     Output: orders.order_id, orders.order_date, orders.amount, orders.user_id, orders.description\n                     Recheck Cond: (orders.user_id = 42)\n                     Heap Blocks: exact=13\n                     Buffers: shared hit=16\n                     -&gt;  Bitmap Index Scan on orders_by_user  (cost=0.00..5.21 rows=104 width=0) (actual time=0.029..0.029 rows=100 loops=1)\n                           Index Cond: (orders.user_id = 42)\n                           Buffers: shared hit=3\n Planning Time: 0.084 ms\n Execution Time: 0.168 ms\n(27 rows)\n\nTime: 0.532 ms\n<\/code><\/pre>\n<p>Look how we scaled here: I multiplied the size of the tables by 4x and I have exactly the same cost: 3 pages from each index and 13 from the table. 
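The 4x growth with identical buffer counts follows from the B*Tree shape: the height grows only with the logarithm of the row count, and the base of that logarithm is large. Here is a sketch with an assumed fanout of 300 entries per 8KB branch block (an illustrative assumption, not measured from PostgreSQL):

```python
# rows a B*Tree of a given height can address, with an assumed
# fanout of 300 entries per 8KB branch block (illustrative)
FANOUT = 300

for height in range(1, 5):
    print(f"height {height}: up to {FANOUT ** height:,} rows")
```

With this assumed fanout, three levels address up to 27 million rows, so growing ORDERS from 1 to 4 million rows cannot add a level; only somewhere around the 16x and 64x steps does one extra page read per descent appear, consistent with the measured buffer counts.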
This is the beauty of B*Tree: the cost depends only on the height of the tree and not the size of the table. And you can increase the table exponentially before the height of the index increases.<\/p>\n<h3>16x initial size scales with 21\/19=1.1 cost factor<\/h3>\n<p>Let&#8217;s go further and x4 the size of the tables again. I run with &#8220;generate_series(1,400)&#8221; in the USERS generations and run the same. That generates 160 thousand users and 16 million orders.<\/p>\n<pre><code>\nselect relkind,relname,reltuples,relpages,lpad(pg_size_pretty(relpages::bigint*8*1024),20) \"relpages * 8k\" from pg_class natural join (select\noid relnamespace,nspname from pg_namespace) nsp where nspname='public' order by relpages;\n relkind |       relname       |  reltuples   | relpages |    relpages * 8k\n---------+---------------------+--------------+----------+----------------------\n S       | orders_order_id_seq |            1 |        1 |           8192 bytes\n S       | users_user_id_seq   |            1 |        1 |           8192 bytes\n i       | users_pkey          |       160000 |      442 |              3536 kB\n r       | users               |       160000 |     1925 |                15 MB\n i       | orders_pkey         |      1.6e+07 |    43871 |               343 MB\n i       | orders_by_user      |      1.6e+07 |    43935 |               343 MB\n r       | orders              | 1.600005e+07 |   213334 |              1667 MB\n\nexplain (analyze,verbose,costs,buffers)\nselect order_month,sum(amount)\nfrom\n  (\n  select date_trunc('month',ORDER_DATE) order_month, USER_ID, AMOUNT\n  from ORDERS join USERS using(USER_ID)\n  ) user_orders\nwhere USER_ID=42\ngroup by order_month\n;\n                                                                   QUERY PLAN                                                                 
\n------------------------------------------------------------------------------------------------------------------------------------------------\n GroupAggregate  (cost=567.81..569.01 rows=10 width=40) (actual time=0.095..0.120 rows=8 loops=1)\n   Output: (date_trunc('month'::text, orders.order_date)), sum(orders.amount)\n   Group Key: (date_trunc('month'::text, orders.order_date))\n   Buffers: shared hit=21\n   -&gt;  Sort  (cost=567.81..568.16 rows=140 width=17) (actual time=0.087..0.093 rows=100 loops=1)\n         Output: (date_trunc('month'::text, orders.order_date)), orders.amount\n         Sort Key: (date_trunc('month'::text, orders.order_date))\n         Sort Method: quicksort  Memory: 32kB\n         Buffers: shared hit=21\n         -&gt;  Nested Loop  (cost=6.06..562.82 rows=140 width=17) (actual time=0.026..0.070 rows=100 loops=1)\n               Output: date_trunc('month'::text, orders.order_date), orders.amount\n               Buffers: shared hit=21\n               -&gt;  Index Only Scan using users_pkey on public.users  (cost=0.42..4.44 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)\n                     Output: users.user_id\n                     Index Cond: (users.user_id = 42)\n                     Heap Fetches: 0\n                     Buffers: shared hit=4\n               -&gt;  Bitmap Heap Scan on public.orders  (cost=5.64..556.64 rows=140 width=25) (actual time=0.014..0.035 rows=100 loops=1)\n                     Output: orders.order_id, orders.order_date, orders.amount, orders.user_id, orders.description\n                     Recheck Cond: (orders.user_id = 42)\n                     Heap Blocks: exact=13\n                     Buffers: shared hit=17\n                     -&gt;  Bitmap Index Scan on orders_by_user  (cost=0.00..5.61 rows=140 width=0) (actual time=0.010..0.010 rows=100 loops=1)\n                           Index Cond: (orders.user_id = 42)\n                           Buffers: shared hit=4\n Planning Time: 0.082 ms\n 
Execution Time: 0.159 ms\n<\/code><\/pre>\n<p>Yes, by increasing the table size a lot, the index may have to split the root block to add a new branch level. The consequence is minimal: two additional pages to read here. The additional execution time is in tens of microseconds when cached in RAM, up to milliseconds if read from mechanical disk. With 10x or 100x larger tables, you will do some physical I\/O and only the top index branches will be in memory. You can expect about 20 I\/O calls then. With SSD on Fibre Channel (like with AWS RDS PostgreSQL for example), this will still be in single-digit milliseconds. The cheapest AWS EBS Provisioned IOPS for RDS is 1000 IOPS, so you can estimate the number of queries per second you can run. The number of pages here (&#8220;Buffers&#8221;) is the right metric to estimate the cost of the query, to be compared with DynamoDB RCUs.<\/p>\n<h3>64x initial size scales with 23\/19=1.2 cost factor<\/h3>\n<p>Last test for me, but you can go further: I change to &#8220;generate_series(1,800)&#8221; in the USERS generation and run the same. 
That generates 640 thousand users and 64 million orders.<\/p>\n<pre><code>\n\nselect relkind,relname,reltuples,relpages,lpad(pg_size_pretty(relpages::bigint*8*1024),20) \"relpages * 8k\" from pg_class natural join (select\noid relnamespace,nspname from pg_namespace) nsp where nspname='public' order by relpages;\n\n relkind |       relname       |   reltuples   | relpages |    relpages * 8k\n---------+---------------------+---------------+----------+----------------------\n S       | orders_order_id_seq |             1 |        1 |           8192 bytes\n S       | users_user_id_seq   |             1 |        1 |           8192 bytes\n i       | users_pkey          |        640000 |     1757 |                14 MB\n r       | users               |        640000 |     7739 |                60 MB\n i       | orders_pkey         |       6.4e+07 |   175482 |              1371 MB\n i       | orders_by_user      |       6.4e+07 |   175728 |              1373 MB\n r       | orders              | 6.4000048e+07 |   853334 |              6667 MB\n\n\nexplain (analyze,verbose,costs,buffers)\nselect order_month,sum(amount)\nfrom\n  (\n  select date_trunc('month',ORDER_DATE) order_month, USER_ID, AMOUNT\n  from ORDERS join USERS using(USER_ID)\n  ) user_orders\nwhere USER_ID=42\ngroup by order_month\n;\n                                                                QUERY PLAN\n------------------------------------------------------------------------------------------------------------------------------------------\n HashAggregate  (cost=1227.18..1227.33 rows=10 width=40) (actual time=0.102..0.105 rows=7 loops=1)\n   Output: (date_trunc('month'::text, orders.order_date)), sum(orders.amount)\n   Group Key: date_trunc('month'::text, orders.order_date)\n   Buffers: shared hit=23\n   -&gt;  Nested Loop  (cost=7.36..1225.65 rows=306 width=17) (actual time=0.027..0.072 rows=100 loops=1)\n         Output: date_trunc('month'::text, orders.order_date), orders.amount\n         Buffers: 
shared hit=23\n         -&gt;  Index Only Scan using users_pkey on public.users  (cost=0.42..4.44 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)\n               Output: users.user_id\n               Index Cond: (users.user_id = 42)\n               Heap Fetches: 0\n               Buffers: shared hit=4\n         -&gt;  Bitmap Heap Scan on public.orders  (cost=6.94..1217.38 rows=306 width=25) (actual time=0.015..0.037 rows=100 loops=1)\n               Output: orders.order_id, orders.order_date, orders.amount, orders.user_id, orders.description\n               Recheck Cond: (orders.user_id = 42)\n               Heap Blocks: exact=15\n               Buffers: shared hit=19\n               -&gt;  Bitmap Index Scan on orders_by_user  (cost=0.00..6.86 rows=306 width=0) (actual time=0.011..0.011 rows=100 loops=1)\n                     Index Cond: (orders.user_id = 42)\n                     Buffers: shared hit=4\n Planning Time: 0.086 ms\n Execution Time: 0.134 ms\n(22 rows)\n\nTime: 0.521 ms\n<\/code><\/pre>\n<p>You see how the cost increases slowly, with two additional pages to read here.<\/p>\n<p>Please, test with more rows. When looking at the time (less than 1 millisecond), keep in mind that it is CPU time here, as all pages are in the memory cache. On purpose, because the article mentions CPU and RAM (<em>&#8220;joins require a lot of CPU and memory&#8221;<\/em>). Anyway, 20 disk reads will still be within a millisecond on modern storage.<br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-41520\" src=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2022\/04\/SQL-Join-Scale-scaled.jpg\" alt=\"\" width=\"1024\" height=\"625\" \/><\/p>\n<h3>Time complexity of B*Tree is actually very light<\/h3>\n<p>Let&#8217;s get back to the &#8220;Big O&#8221; notation. We are definitely not in &#8220;O (M + N)&#8221;. That would have been without any index, where all pages from all tables would have been fully scanned. 
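To make the difference between these cost shapes concrete, here is a hedged numeric sketch. It only compares the growth of the two formulas discussed here, with an assumed constant of 100 matching orders per user; the constants are illustrative, not measurements:

```python
import math

def claimed_cost(n, m):
    """Cost shape quoted from the talk: O(log(N) + N*log(M))."""
    return math.log2(n) + n * math.log2(m)

def indexed_cost(n, m, k=100):
    """Cost shape with an index nested loop: O(log(N) + log(M) + k),
    where k is the (constant) number of matching ORDERS rows."""
    return math.log2(n) + math.log2(m) + k

# the same 64x growth as in the tests here: USERS 10k -> 640k, ORDERS 1M -> 64M
small = (10_000, 1_000_000)
big = (640_000, 64_000_000)
print("claimed cost grows", round(claimed_cost(*big) / claimed_cost(*small)), "x")
print("indexed cost grows", round(indexed_cost(*big) / indexed_cost(*small), 2), "x")
```

The claimed formula grows roughly in proportion to the table sizes, while the indexed one stays almost flat, matching the measured buffer counts going from 19 to 23 pages across a 64x growth.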
Any database maintains hashed or sorted structures to improve high-selectivity queries. This has nothing to do with SQL vs. NoSQL or with RDBMS vs. Hierarchical DB. In DynamoDB this structure on the primary key is HASH partitioning (and additional sorting). In RDBMS the most common structure is a sorted index (with the possibility of additional partitioning, zone maps, clustering,&#8230;). We are not in &#8220;O(log(N))+Nlog(M)&#8221; but more like &#8220;O(log(N))+log(M)&#8221; because the size of the driving table (N) has no influence on the inner loop cost. The first filtering, the one with &#8220;O(log(N))&#8221;, has already filtered out those N rows. Using &#8220;N&#8221; and ignoring the outer selectivity is a mistake. Anyway, this O(log) for B*Tree is really scalable, as we have seen, because the base of this log() function is really large (thanks to large branch blocks &#8211; 8KB), so the logarithm stays small.<\/p>\n<p>Anyway, the point is not there. As you have seen, the cost of a Nested Loop is in the blocks that are fetched by the inner loop, and this depends on the selectivity and the clustering factor (the correlation of the table&#8217;s physical order with the index order), and some prefetching techniques. This O(log) for going through the index is often insignificant on Range Scans.<\/p>\n<h3>40 years of RDBMS optimizations<\/h3>\n<p>If you have a database size that is several orders of magnitude beyond those million rows, and where this millisecond latency is a problem, the major RDBMS offer many features and structures to go further in performance and predictability of response time. As we have seen, the cost of this query is not about the Index, Nested Loop or Group By. It is about the scattered items: ingested by date but queried by user.<\/p>\n<p>Through its 40 years of optimizing OLTP applications, the <strong>Oracle Database<\/strong> has a lot of features, with their popularity depending on the IT trends. 
From the first versions, in the 80&#8217;s, because the hierarchical databases were still in people&#8217;s mind, and the fear of &#8220;joins are expensive&#8221; was already there, Oracle has implemented a structure to store the tables together, pre-joined. This is called <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/20\/sqlrf\/CREATE-CLUSTER.html#GUID-4DBC701F-AFC3-486D-AA32-B5CB1D6946F7\" target=\"_blank\" rel=\"noopener noreferrer\">CLUSTER<\/a>. It was, from the beginning, indexed with a sorted structure (INDEX CLUSTER) or a hashed structure (HASH CLUSTER). Parallel Query was quickly implemented to scale-up and, with Parallel Server now known as RAC, to scale-out. Partitioning was introduced to scale further, again with sorted (PARTITION by RANGE) or hashed (PARTITION BY HASH) segments. This can scale further the index access (LOCAL INDEX with partition pruning). And it also scales joins (PARTITION-WISE JOIN) to reduce CPU and RAM to join large datasets. When more agility was needed, B*Tree sorted structures were the winner and Index Organized Tables were introduced to cluster data (to avoid the most expensive operation as we have seen in the examples above). But Heap Tables and B*Tree are still the winners given their agility and because the Join operations were rarely a bottleneck. Currently, partitioning has improved (even scaling horizontally beyond the clusters with sharding), as well as data clustering (<a href=\"https:\/\/www.dbi-services.com\/blog\/oracle-12c-clustering\/\" target=\"_blank\" rel=\"noopener noreferrer\">Attribute Clustering<\/a>, Zone Maps and Storage Index). And there&#8217;s always the possibility for Index Only access, just by adding more columns to the index, removing the major cost of table access. 
And materialized views can also act as <a href=\"https:\/\/www.dbi-services.com\/blog\/oracle-index-on-joins\/\" target=\"_blank\" rel=\"noopener noreferrer\">pre-built join index<\/a> and will also reduce the CPU and RAM required for GROUP BY aggregation.<\/p>\n<p>I listed a lot of Oracle features, but Microsoft <strong>SQL Server<\/strong> has also many similar features. Clustered tables, covering indexes, Indexed Views, and of course partitioning. Open source databases are evolving a lot. <strong>PostgreSQL<\/strong> has partitioning, parallel query, clustering,.. Still Open Source, you can web-scale it to a distributed database with <strong>YugabyteDB<\/strong>. <strong>MySQL<\/strong> is evolving a lot recently. I&#8217;ll not list all databases and all features. You can port the testcase to other databases easily.<\/p>\n<p>Oh, and by the way, if you want to test it on <strong>Oracle Database<\/strong> you will quickly realize that the query from the article may not even do any Join operation. Because when you query with the USER_ID you probably don&#8217;t need to get back to the USERS table. And thanks to the declared foreign key, the optimizer knows that the join is not necessary: <a href=\"https:\/\/dbfiddle.uk\/?rdbms=oracle_18&amp;fiddle=a5afeba2fdb27dec7533545ab0a6eb0e\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/dbfiddle.uk\/?rdbms=oracle_18&amp;fiddle=a5afeba2fdb27dec7533545ab0a6eb0e<\/a>.<\/p>\n<p>NoSQL databases are good for many use-cases. But remember that it is a key-value optimized data store. When you start to split a document into multiple entities with their own keys, remember that relational databases were invented for that. I blogged about another myth recently: <a href=\"https:\/\/www.dbi-services.com\/blog\/the-myth-of-nosql-vs-rdbms-agility-adding-attributes\/\" target=\"_blank\" rel=\"noopener noreferrer\">The myth of NoSQL (vs. RDBMS) agility: adding attributes<\/a>. 
All feedback welcome, here or on Twitter:<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">A new blog post on another myth in NoSQL vs. RDBMS:<br \/>  \u201cjoins dont scale\u201d<a href=\"https:\/\/t.co\/u4aBE4HXb7\">https:\/\/t.co\/u4aBE4HXb7<\/a><\/p>\n<p>&mdash; Franck Pachot (@FranckPachot) <a href=\"https:\/\/twitter.com\/FranckPachot\/status\/1279864643029291010?ref_src=twsrc%5Etfw\">July 5, 2020<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><br \/>\nAlex DeBrie also shared interesting thoughts about it on Twitter:<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">Franck Pachot (<a href=\"https:\/\/twitter.com\/FranckPachot?ref_src=twsrc%5Etfw\">@FranckPachot<\/a> ) is an RDBMS expert and wrote a really nice response to an earlier post I wrote about how DynamoDB scales where relational databases don&#39;t. I highly recommend reading it.<\/p>\n<p>Link below, with 3 thoughts from me to follow.<a href=\"https:\/\/t.co\/DjOK3RWJZc\">https:\/\/t.co\/DjOK3RWJZc<\/a><\/p>\n<p>&mdash; Alex DeBrie (@alexbdebrie) <a href=\"https:\/\/twitter.com\/alexbdebrie\/status\/1280958926436077568?ref_src=twsrc%5Etfw\">July 8, 2020<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>