{"id":10437,"date":"2017-09-08T19:13:21","date_gmt":"2017-09-08T17:13:21","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/create-constraints-in-your-datawarehouse-why-and-how\/"},"modified":"2017-09-08T19:13:21","modified_gmt":"2017-09-08T17:13:21","slug":"create-constraints-in-your-datawarehouse-why-and-how","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/create-constraints-in-your-datawarehouse-why-and-how\/","title":{"rendered":"Create constraints in your datawarehouse &#8211; why and how"},"content":{"rendered":"<h2>By Franck Pachot<\/h2>\n<p>.<br \/>\nWe still see some developers not declaring referential integrity constraints in datawarehouse databases because they think they don&#8217;t need it (integrity of data has been validated by the ETL). Here is a small demo I did to show why you need to declare them, and how to do it to avoid any overhead on the ETL.<br \/>\n<!--more--><\/p>\n<h3>Test case<\/h3>\n<p>I create 3 dimension tables and 1 fact table:<\/p>\n<pre><code>\n21:01:18 SQL&gt; create table DIM1 (DIM1_ID number, DIM1_ATT1 varchar2(20));\nTable DIM1 created.\n&nbsp;\n21:01:19 SQL&gt; create table DIM2 (DIM2_ID number, DIM2_ATT1 varchar2(20));\nTable DIM2 created.\n&nbsp;\n21:01:20 SQL&gt; create table DIM3 (DIM3_ID number, DIM3_ATT1 varchar2(20));\nTable DIM3 created.\n&nbsp;\n21:01:21 SQL&gt; create table FACT (DIM1_ID number, DIM2_ID number, DIM3_ID number,MEAS1 number);\nTable FACT created.\n<\/code><\/pre>\n<p>I insert 10 million rows into the fact table:<\/p>\n<pre><code>\n21:01:22 SQL&gt; insert into FACT select mod(rownum,3),mod(rownum,5),mod(rownum,10),rownum from xmltable('1 to 10000000');\n10,000,000 rows inserted.\n&nbsp;\nElapsed: 00:00:18.983\n<\/code><\/pre>\n<p>and fill the dimension tables from it:<\/p>\n<pre><code>\n21:01:42 SQL&gt; insert into DIM1 select distinct DIM1_ID,'...'||DIM1_ID from FACT;\n3 rows inserted.\n&nbsp;\nElapsed: 00:00:01.540\n&nbsp;\n21:01:52 SQL&gt; insert into DIM2 
select distinct DIM2_ID,'...'||DIM2_ID from FACT;\n5 rows inserted.\n&nbsp;\nElapsed: 00:00:01.635\n&nbsp;\n21:01:57 SQL&gt; insert into DIM3 select distinct DIM3_ID,'...'||DIM3_ID from FACT;\n10 rows inserted.\n&nbsp;\nElapsed: 00:00:01.579\n&nbsp;\n21:01:58 SQL&gt; commit;\nCommit complete.\n<\/code><\/pre>\n<h3>Query joining fact with one dimension<\/h3>\n<p>I&#8217;ll run the following query:<\/p>\n<pre><code>\n21:01:58 SQL&gt; select count(*) from FACT join DIM1 using(DIM1_ID);\n&nbsp;\nCOUNT(*)\n--------\n10000000\n&nbsp;\nElapsed: 00:00:01.015\n<\/code><\/pre>\n<p>Here is the execution plan:<\/p>\n<pre><code>\n21:02:12 SQL&gt; select * from dbms_xplan.display_cursor();\n&nbsp;\nPLAN_TABLE_OUTPUT\n-----------------\nSQL_ID  4pqjrjkc7sn17, child number 0\n-------------------------------------\nselect count(*) from FACT join DIM1 using(DIM1_ID)\n&nbsp;\nPlan hash value: 1826335751\n&nbsp;\n----------------------------------------------------------------------------\n| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |\n----------------------------------------------------------------------------\n|   0 | SELECT STATEMENT    |      |       |       |  7514 (100)|          |\n|   1 |  SORT AGGREGATE     |      |     1 |    26 |            |          |\n|*  2 |   HASH JOIN         |      |    10M|   253M|  7514   (2)| 00:00:01 |\n|   3 |    TABLE ACCESS FULL| DIM1 |     3 |    39 |     3   (0)| 00:00:01 |\n|   4 |    TABLE ACCESS FULL| FACT |    10M|   126M|  7482   (1)| 00:00:01 |\n----------------------------------------------------------------------------\n&nbsp;\nPredicate Information (identified by operation id):\n---------------------------------------------------\n&nbsp;\n   2 - access(\"FACT\".\"DIM1_ID\"=\"DIM1\".\"DIM1_ID\")\n&nbsp;\nNote\n-----\n   - dynamic statistics used: dynamic sampling (level=2)\n<\/code><\/pre>\n<p>Actually, we don&#8217;t need that join. 
A dimension table has two goals:<\/p>\n<ul>\n<li>filter facts on the dimension attributes. Example: filter on customer last name<\/li>\n<li>add dimension attributes to the result. Example: add customer first name<\/li>\n<\/ul>\n<p>Here, there is no WHERE clause on DIM1 columns, and no columns from DIM1 selected. We don&#8217;t need to join to DIM1. However, we often see those useless joins for two reasons:<\/p>\n<ul>\n<li>We query a view that joins the fact with all dimensions<\/li>\n<li>The query is generated by a reporting tool which always joins to the dimensions<\/li>\n<\/ul>\n<h3>Join elimination<\/h3>\n<p>The Oracle optimizer is able to remove those kinds of unnecessary joins. But one piece of information is missing here for the optimizer. We know that all rows in the fact table have a matching row in each dimension, but Oracle doesn&#8217;t know that. And if there is no matching row, then the inner join must not return the row. For this reason, the join must be done.<\/p>\n<p>Let&#8217;s give this information to the optimizer: declare the foreign key from FACT to DIM1 so that Oracle knows that there is a many-to-one relationship:<\/p>\n<pre><code>\n21:02:17 SQL&gt; alter table DIM1 add constraint DIM1_PK primary key(DIM1_ID);\nTable DIM1 altered.\n&nbsp;\nElapsed: 00:00:00.051\n&nbsp;\n21:02:20 SQL&gt; alter table FACT add constraint DIM1_FK foreign key(DIM1_ID) references DIM1;\nTable FACT altered.\n&nbsp;\nElapsed: 00:00:03.210\n<\/code><\/pre>\n<p>I&#8217;ve spent 3 seconds here to create this foreign key (it would have been much longer with a real fact table with lots of columns and rows) but now, the optimizer is able to eliminate the join:<\/p>\n<pre><code>\n21:02:24 SQL&gt; select count(*) from FACT join DIM1 using(DIM1_ID);\n&nbsp;\nCOUNT(*)\n--------\n10000000\n&nbsp;\n21:02:25 SQL&gt; select * from dbms_xplan.display_cursor();\n&nbsp;\nPLAN_TABLE_OUTPUT\n-----------------\nSQL_ID  4pqjrjkc7sn17, child number 0\n-------------------------------------\nselect 
count(*) from FACT join DIM1 using(DIM1_ID)\n&nbsp;\nPlan hash value: 3735838348\n&nbsp;\n---------------------------------------------------------------------------\n| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |\n---------------------------------------------------------------------------\n|   0 | SELECT STATEMENT   |      |       |       |  7488 (100)|          |\n|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |\n|*  2 |   TABLE ACCESS FULL| FACT |    10M|   126M|  7488   (1)| 00:00:01 |\n---------------------------------------------------------------------------\n&nbsp;\nPredicate Information (identified by operation id):\n---------------------------------------------------\n&nbsp;\n   2 - filter(\"FACT\".\"DIM1_ID\" IS NOT NULL)\n&nbsp;\nNote\n-----\n   - dynamic statistics used: dynamic sampling (level=2)\n<\/code><\/pre>\n<p>No join is needed here, and the query is faster. This is exactly the point of this blog post: to show you that declaring constraints improves the performance of queries. It adds information for the optimizer, like statistics do. Statistics give estimated cardinalities; foreign keys give an exact cardinality (many-to-one).<\/p>\n<h3>No validate<\/h3>\n<p>When loading a datawarehouse, you usually don&#8217;t need to validate the constraints because the data was bulk loaded from a staging area where all data validation has been done. You don&#8217;t want to spend time validating constraints (the 3 seconds in my small example above) and this is why some datawarehouse developers do not declare constraints.<\/p>\n<p>However, we can declare constraints without validating them. 
Let&#8217;s do that for the second dimension table:<\/p>\n<pre><code>\n21:02:34 SQL&gt; alter table DIM2 add constraint DIM2_PK primary key(DIM2_ID) novalidate;\nTable DIM2 altered.\n&nbsp;\nElapsed: 00:00:00.018\n&nbsp;\n21:02:35 SQL&gt; alter table FACT add constraint DIM2_FK foreign key(DIM2_ID) references DIM2 novalidate;\nTable FACT altered.\n&nbsp;\nElapsed: 00:00:00.009\n<\/code><\/pre>\n<p>That was much faster than the 3 seconds we had for the &#8216;validate&#8217; constraint, which is the default. Creating a constraint in NOVALIDATE is immediate and does not depend on the size of the table.<\/p>\n<p>However, this is not sufficient to get the join elimination:<\/p>\n<pre><code>\n21:02:39 SQL&gt; select count(*) from FACT join DIM2 using(DIM2_ID);\n&nbsp;\nCOUNT(*)\n--------\n10000000\n&nbsp;\n21:02:40 SQL&gt; select * from dbms_xplan.display_cursor();\n&nbsp;\nPLAN_TABLE_OUTPUT\n-----------------\nSQL_ID  4t9g2n6duw0jf, child number 0\n-------------------------------------\nselect count(*) from FACT join DIM2 using(DIM2_ID)\n&nbsp;\nPlan hash value: 3858910383\n&nbsp;\n-------------------------------------------------------------------------------\n| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |\n-------------------------------------------------------------------------------\n|   0 | SELECT STATEMENT    |         |       |       |  7518 (100)|          |\n|   1 |  SORT AGGREGATE     |         |     1 |    26 |            |          |\n|*  2 |   HASH JOIN         |         |    10M|   253M|  7518   (2)| 00:00:01 |\n|   3 |    INDEX FULL SCAN  | DIM2_PK |     5 |    65 |     1   (0)| 00:00:01 |\n|   4 |    TABLE ACCESS FULL| FACT    |    10M|   126M|  7488   (1)| 00:00:01 |\n-------------------------------------------------------------------------------\n&nbsp;\nPredicate Information (identified by operation id):\n---------------------------------------------------\n&nbsp;\n   2 - 
access(\"FACT\".\"DIM2_ID\"=\"DIM2\".\"DIM2_ID\")\n&nbsp;\nNote\n-----\n   - dynamic statistics used: dynamic sampling (level=2)\n<\/code><\/pre>\n<p>The constraint ensures that no new rows will be inserted without a matching row in the dimension. However, because Oracle has not validated the existing rows itself, it does not apply the join elimination, just in case a previously existing row has no matching dimension.<\/p>\n<h3>Rely novalidate<\/h3>\n<p>If you want the optimizer to do the join elimination on a &#8216;novalidate&#8217; constraint, then it has to trust you and rely on the constraint you have declared.<\/p>\n<p>RELY is an attribute of the constraint that you can set:<\/p>\n<pre><code>\n21:02:44 SQL&gt; alter table DIM2 modify constraint DIM2_PK rely;\nTable DIM2 altered.\n&nbsp;\nElapsed: 00:00:00.016\n&nbsp;\n21:02:45 SQL&gt; alter table FACT modify constraint DIM2_FK rely;\nTable FACT altered.\n&nbsp;\nElapsed: 00:00:00.010\n<\/code><\/pre>\n<p>But this is not sufficient. You told Oracle to rely on your constraint, but Oracle must also be told to trust you.<\/p>\n<h3>Trusted<\/h3>\n<p>The join elimination is a rewrite of the query and, by default, rewrite is enabled only when integrity is enforced by Oracle:<\/p>\n<pre><code>\n21:02:50 SQL&gt; show parameter query_rewrite\nNAME                    TYPE   VALUE\n----------------------- ------ --------\nquery_rewrite_enabled   string TRUE\nquery_rewrite_integrity string ENFORCED\n<\/code><\/pre>\n<p>Let&#8217;s allow rewrite transformations in our session to trust our RELY constraints:<\/p>\n<pre><code>\n21:02:52 SQL&gt; alter session set query_rewrite_integrity=trusted;\nSession altered.\n<\/code><\/pre>\n<p>Now, joining to DIM2 without using DIM2 columns outside of the join allows join elimination:<\/p>\n<pre><code>\n21:02:57 SQL&gt; select count(*) from FACT join DIM2 using(DIM2_ID);\n&nbsp;\nCOUNT(*)\n--------\n10000000\n&nbsp;\nElapsed: 00:00:00.185\n21:02:58 SQL&gt; select * from 
dbms_xplan.display_cursor();\n&nbsp;\nPLAN_TABLE_OUTPUT\n-----------------\nSQL_ID  4t9g2n6duw0jf, child number 0\n-------------------------------------\nselect count(*) from FACT join DIM2 using(DIM2_ID)\n&nbsp;\nPlan hash value: 3735838348\n&nbsp;\n---------------------------------------------------------------------------\n| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |\n---------------------------------------------------------------------------\n|   0 | SELECT STATEMENT   |      |       |       |  7494 (100)|          |\n|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |\n|*  2 |   TABLE ACCESS FULL| FACT |    10M|   126M|  7494   (1)| 00:00:01 |\n---------------------------------------------------------------------------\n&nbsp;\nPredicate Information (identified by operation id):\n---------------------------------------------------\n&nbsp;\n   2 - filter(\"FACT\".\"DIM2_ID\" IS NOT NULL)\n&nbsp;\nNote\n-----\n   - dynamic statistics used: dynamic sampling (level=2)\n   - rely constraint used for this statement\n<\/code><\/pre>\n<p>In 12.2 the execution plan has a note to show that the plan depends on a RELY constraint.<\/p>\n<p>From this example, you can see that you can, and should, create RELY NOVALIDATE constraints on tables where you know the existing data is valid. They are created immediately, without any overhead on the load process, and they help to improve the queries generated on your dimensional model.<\/p>\n<h3>Rely Disable<\/h3>\n<p>I said that a NOVALIDATE constraint has no overhead when created, but you may have further inserts or updates in your datawarehouse. And then, those constraints will have to be verified, and this may add a little overhead. 
In this case, you can go further and disable the constraint:<\/p>\n<pre><code>\n21:03:04 SQL&gt; alter table DIM3 add constraint DIM3_PK primary key(DIM3_ID) rely;\nTable DIM3 altered.\n&nbsp;\nElapsed: 00:00:00.059\n&nbsp;\n21:03:05 SQL&gt; alter table FACT add constraint DIM3_FK foreign key(DIM3_ID) references DIM3 rely disable novalidate;\nTable FACT altered.\n&nbsp;\nElapsed: 00:00:00.014\n<\/code><\/pre>\n<p>Note that I had to set the referenced constraint DIM3_PK to RELY here, even if it is enabled and validated, or I would get: ORA-25158: Cannot specify RELY for foreign key if the associated primary key is NORELY.<\/p>\n<p>My session still trusts RELY constraints for query rewrite:<\/p>\n<pre><code>\n21:03:07 SQL&gt; show parameter query_rewrite\n&nbsp;\nNAME                    TYPE   VALUE\n----------------------- ------ -------\nquery_rewrite_enabled   string TRUE\nquery_rewrite_integrity string TRUSTED\n<\/code><\/pre>\n<p>Now, the join elimination occurs:<\/p>\n<pre><code>\n21:03:08 SQL&gt; select count(*) from FACT join DIM3 using(DIM3_ID);\n&nbsp;\nCOUNT(*)\n--------\n10000000\n&nbsp;\n21:03:09 SQL&gt; select * from dbms_xplan.display_cursor();\n&nbsp;\nPLAN_TABLE_OUTPUT\n-----------------\nSQL_ID  3bhs523zyudf0, child number 0\n-------------------------------------\nselect count(*) from FACT join DIM3 using(DIM3_ID)\n&nbsp;\nPlan hash value: 3735838348\n&nbsp;\n---------------------------------------------------------------------------\n| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |\n---------------------------------------------------------------------------\n|   0 | SELECT STATEMENT   |      |       |       |  7505 (100)|          |\n|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |\n|*  2 |   TABLE ACCESS FULL| FACT |    11M|   138M|  7505   (1)| 00:00:01 |\n---------------------------------------------------------------------------\n&nbsp;\nPredicate Information (identified by operation 
id):\n---------------------------------------------------\n&nbsp;\n   2 - filter(\"FACT\".\"DIM3_ID\" IS NOT NULL)\n&nbsp;\nNote\n-----\n   - dynamic statistics used: dynamic sampling (level=2)\n   - rely constraint used for this statement\n<\/code><\/pre>\n<p>So, we can still benefit from the query optimization even with a RELY DISABLE NOVALIDATE constraint.<\/p>\n<p>But I would not recommend this. Be careful. Here are my foreign key constraints:<\/p>\n<pre><code>\n21:03:15 SQL&gt; select table_name,constraint_type,constraint_name,status,validated,rely from all_constraints where owner='DEMO' and table_name='FACT' order by 4 desc,5 desc,6 nulls last;\n&nbsp;\nTABLE_NAME  CONSTRAINT_TYPE  CONSTRAINT_NAME  STATUS    VALIDATED      RELY\n----------  ---------------  ---------------  ------    ---------      ----\nFACT        R                DIM1_FK          ENABLED   VALIDATED\nFACT        R                DIM2_FK          ENABLED   NOT VALIDATED  RELY\nFACT        R                DIM3_FK          DISABLED  NOT VALIDATED  RELY\n<\/code><\/pre>\n<p>For DIM1_FK and DIM2_FK the constraints protect us from inconsistencies:<\/p>\n<pre><code>\n21:03:17 SQL&gt; insert into FACT(DIM1_ID)values(666);\n&nbsp;\nError starting at line : 1 in command -\ninsert into FACT(DIM1_ID)values(666)\nError report -\nORA-02291: integrity constraint (DEMO.DIM1_FK) violated - parent key not found\n<\/code><\/pre>\n<p>But the disabled one will allow inconsistencies:<\/p>\n<pre><code>\n21:03:19 SQL&gt; insert into FACT(DIM3_ID)values(666);\n1 row inserted.\n<\/code><\/pre>\n<p>That&#8217;s bad. I roll this back immediately:<\/p>\n<pre><code>\n21:03:20 SQL&gt; rollback;\nRollback complete.\n<\/code><\/pre>\n<h3>Star transformation<\/h3>\n<p>Join elimination is not the only transformation that needs to know about the many-to-one relationship between fact tables and dimensions. 
You usually create a bitmap index on each foreign key to the dimensions, to get the best selectivity when accessing the fact rows from a combination of criteria on the dimension attributes.<\/p>\n<pre><code>\n21:03:24 SQL&gt; create bitmap index FACT_DIM1 on FACT(DIM1_ID);\nIndex FACT_DIM1 created.\n&nbsp;\n21:03:29 SQL&gt; create bitmap index FACT_DIM2 on FACT(DIM2_ID);\nIndex FACT_DIM2 created.\n&nbsp;\n21:03:33 SQL&gt; create bitmap index FACT_DIM3 on FACT(DIM3_ID);\nIndex FACT_DIM3 created.\n<\/code><\/pre>\n<p>Here is the kind of query with predicates on each dimension&#8217;s attributes:<\/p>\n<pre><code>\n21:03:35 SQL&gt; select count(*) from FACT\n  2  join DIM1 using(DIM1_ID) join DIM2 using(DIM2_ID) join DIM3 using(DIM3_ID)\n  3  where dim1_att1='...0' and dim2_att1='...0' and dim3_att1='...0';\n&nbsp;\nCOUNT(*)\n--------\n333333\n<\/code><\/pre>\n<p>By default, the optimizer applies those predicates on the dimensions and does a cartesian join to get all accepted combinations of dimension IDs. 
Then the rows can be fetched from the table:<\/p>\n<pre><code>\n21:03:37 SQL&gt; select * from dbms_xplan.display_cursor();\n&nbsp;\nPLAN_TABLE_OUTPUT\n-----------------\nSQL_ID  01jmjv0sz1dpq, child number 0\n-------------------------------------\nselect count(*) from FACT  join DIM1 using(DIM1_ID) join DIM2\nusing(DIM2_ID) join DIM3 using(DIM3_ID)  where dim1_att1='...0' and\ndim2_att1='...0' and dim3_att1='...0'\n&nbsp;\nPlan hash value: 1924236134\n&nbsp;\n-------------------------------------------------------------------------------------------\n| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |\n-------------------------------------------------------------------------------------------\n|   0 | SELECT STATEMENT              |           |       |       |  5657 (100)|          |\n|   1 |  SORT AGGREGATE               |           |     1 |   114 |            |          |\n|   2 |   NESTED LOOPS                |           | 55826 |  6215K|  5657   (1)| 00:00:01 |\n|   3 |    MERGE JOIN CARTESIAN       |           |     1 |    75 |     9   (0)| 00:00:01 |\n|   4 |     MERGE JOIN CARTESIAN      |           |     1 |    50 |     6   (0)| 00:00:01 |\n|*  5 |      TABLE ACCESS FULL        | DIM1      |     1 |    25 |     3   (0)| 00:00:01 |\n|   6 |      BUFFER SORT              |           |     1 |    25 |     3   (0)| 00:00:01 |\n|*  7 |       TABLE ACCESS FULL       | DIM2      |     1 |    25 |     3   (0)| 00:00:01 |\n|   8 |     BUFFER SORT               |           |     1 |    25 |     6   (0)| 00:00:01 |\n|*  9 |      TABLE ACCESS FULL        | DIM3      |     1 |    25 |     3   (0)| 00:00:01 |\n|  10 |    BITMAP CONVERSION COUNT    |           | 55826 |  2126K|  5657   (1)| 00:00:01 |\n|  11 |     BITMAP AND                |           |       |       |            |          |\n|* 12 |      BITMAP INDEX SINGLE VALUE| FACT_DIM3 |       |       |            |          |\n|* 13 |      BITMAP INDEX SINGLE VALUE| 
FACT_DIM2 |       |       |            |          |\n|* 14 |      BITMAP INDEX SINGLE VALUE| FACT_DIM1 |       |       |            |          |\n-------------------------------------------------------------------------------------------\n&nbsp;\nPredicate Information (identified by operation id):\n---------------------------------------------------\n&nbsp;\n   5 - filter(\"DIM1\".\"DIM1_ATT1\"='...0')\n   7 - filter(\"DIM2\".\"DIM2_ATT1\"='...0')\n   9 - filter(\"DIM3\".\"DIM3_ATT1\"='...0')\n  12 - access(\"FACT\".\"DIM3_ID\"=\"DIM3\".\"DIM3_ID\")\n  13 - access(\"FACT\".\"DIM2_ID\"=\"DIM2\".\"DIM2_ID\")\n  14 - access(\"FACT\".\"DIM1_ID\"=\"DIM1\".\"DIM1_ID\")\n<\/code><\/pre>\n<p>Here, rows are fetched from the fact table through a nested loop from the cartesian join on the dimensions, using the bitmap index access for each loop. If there are a lot of rows to fetch, the optimizer will choose a hash join and will then have to full scan the fact table, which is expensive. To lower that cost, the optimizer can add an &#8216;IN (SELECT DIM_ID FROM DIM WHERE DIM_ATT)&#8217; subquery for very selective dimensions. 
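<\/p>\n<p>Conceptually (this is a hand-written sketch of the idea, on this demo&#8217;s tables, not the optimizer&#8217;s actual internal rewrite), the transformed query looks like this:<\/p>\n<pre><code>\nselect count(*) from FACT\nwhere DIM1_ID in (select DIM1_ID from DIM1 where DIM1_ATT1='...0')\n  and DIM2_ID in (select DIM2_ID from DIM2 where DIM2_ATT1='...0')\n  and DIM3_ID in (select DIM3_ID from DIM3 where DIM3_ATT1='...0');\n<\/code><\/pre>\n<p>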
This is the star transformation, and it relies on the foreign key constraints.<\/p>\n<p>It is not enabled by default:<\/p>\n<pre><code>\n21:03:43 SQL&gt; show parameter star\nNAME                         TYPE    VALUE\n---------------------------- ------- -----\nstar_transformation_enabled  string  FALSE\n<\/code><\/pre>\n<p>We can enable it, and then it is a cost-based transformation:<\/p>\n<pre><code>\n21:03:45 SQL&gt; alter session set star_transformation_enabled=true;\nSession altered.\n<\/code><\/pre>\n<p>Here is my example:<\/p>\n<pre><code>\n21:03:47 SQL&gt; select count(*) from FACT\n  2  join DIM1 using(DIM1_ID) join DIM2 using(DIM2_ID) join DIM3 using(DIM3_ID)\n  3  where dim1_att1='...0' and dim2_att1='...0' and dim3_att1='...0';\n&nbsp;\nCOUNT(*)\n--------\n333333\n<\/code><\/pre>\n<p>The star transformation, changing a join to an &#8216;IN()&#8217;, is possible only when we know that there is a many-to-one relationship. We have all the constraints for that: disabled or not, validated or not, but all RELY. 
Then Star Transformation can occur:<\/p>\n<pre><code>\n21:03:51 SQL&gt; select * from dbms_xplan.display_cursor();\n&nbsp;\nPLAN_TABLE_OUTPUT\n-----------------\nSQL_ID  01jmjv0sz1dpq, child number 1\n-------------------------------------\nselect count(*) from FACT  join DIM1 using(DIM1_ID) join DIM2\nusing(DIM2_ID) join DIM3 using(DIM3_ID)  where dim1_att1='...0' and\ndim2_att1='...0' and dim3_att1='...0'\n&nbsp;\nPlan hash value: 1831539117\n&nbsp;\n--------------------------------------------------------------------------------------------------\n| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |\n--------------------------------------------------------------------------------------------------\n|   0 | SELECT STATEMENT                |                |       |       |    68 (100)|          |\n|   1 |  SORT AGGREGATE                 |                |     1 |    38 |            |          |\n|*  2 |   HASH JOIN                     |                |     2 |    76 |    68   (0)| 00:00:01 |\n|*  3 |    TABLE ACCESS FULL            | DIM2           |     1 |    25 |     3   (0)| 00:00:01 |\n|   4 |    VIEW                         | VW_ST_62BA0C91 |     8 |   104 |    65   (0)| 00:00:01 |\n|   5 |     NESTED LOOPS                |                |     8 |   608 |    56   (0)| 00:00:01 |\n|   6 |      BITMAP CONVERSION TO ROWIDS|                |     8 |   427 |    22   (5)| 00:00:01 |\n|   7 |       BITMAP AND                |                |       |       |            |          |\n|   8 |        BITMAP MERGE             |                |       |       |            |          |\n|   9 |         BITMAP KEY ITERATION    |                |       |       |            |          |\n|* 10 |          TABLE ACCESS FULL      | DIM1           |     1 |    25 |     3   (0)| 00:00:01 |\n|* 11 |          BITMAP INDEX RANGE SCAN| FACT_DIM1      |       |       |            |          |\n|  12 |        BITMAP MERGE             |   
             |       |       |            |          |\n|  13 |         BITMAP KEY ITERATION    |                |       |       |            |          |\n|* 14 |          TABLE ACCESS FULL      | DIM2           |     1 |    25 |     3   (0)| 00:00:01 |\n|* 15 |          BITMAP INDEX RANGE SCAN| FACT_DIM2      |       |       |            |          |\n|  16 |        BITMAP MERGE             |                |       |       |            |          |\n|  17 |         BITMAP KEY ITERATION    |                |       |       |            |          |\n|* 18 |          TABLE ACCESS FULL      | DIM3           |     1 |    25 |     3   (0)| 00:00:01 |\n|* 19 |          BITMAP INDEX RANGE SCAN| FACT_DIM3      |       |       |            |          |\n|  20 |      TABLE ACCESS BY USER ROWID | FACT           |     1 |    25 |    43   (0)| 00:00:01 |\n--------------------------------------------------------------------------------------------------\n&nbsp;\nPredicate Information (identified by operation id):\n---------------------------------------------------\n&nbsp;\n   2 - access(\"ITEM_1\"=\"DIM2\".\"DIM2_ID\")\n   3 - filter(\"DIM2\".\"DIM2_ATT1\"='...0')\n  10 - filter(\"DIM1\".\"DIM1_ATT1\"='...0')\n  11 - access(\"FACT\".\"DIM1_ID\"=\"DIM1\".\"DIM1_ID\")\n  14 - filter(\"DIM2\".\"DIM2_ATT1\"='...0')\n  15 - access(\"FACT\".\"DIM2_ID\"=\"DIM2\".\"DIM2_ID\")\n  18 - filter(\"DIM3\".\"DIM3_ATT1\"='...0')\n  19 - access(\"FACT\".\"DIM3_ID\"=\"DIM3\".\"DIM3_ID\")\n&nbsp;\nNote\n-----\n   - dynamic statistics used: dynamic sampling (level=2)\n   - star transformation used for this statement\n   - this is an adaptive plan\n<\/code><\/pre>\n<p>Here, each dimension drives a range scan on the bitmap index: the predicate on the dimension table returns the dimension ID for the index lookup on the fact table. The big advantage of bitmap indexes here is that when this access path is used for several dimensions, the bitmap results can be combined before going to the table. 
This transformation avoids the join, and this is why you must be sure that there is a many-to-one relationship.<\/p>\n<h3>In summary<\/h3>\n<p>As you should rely on the integrity of the data in your datawarehouse, you should set the following parameters for the sessions that query fact-dimension schemas:<\/p>\n<pre><code>\nNAME                         TYPE     VALUE\n---------------------------- ------- ------\nquery_rewrite_enabled        string    TRUE\nquery_rewrite_integrity      string TRUSTED\nstar_transformation_enabled  string    TRUE\n<\/code><\/pre>\n<p>And you should define all constraints. When you are sure about the integrity of the data, those constraints can be created RELY ENABLE NOVALIDATE. If some processing must be optimized by not enforcing the constraint verification, then you may create those constraints as RELY DISABLE NOVALIDATE, but the gain will probably be minimal and the risk high. And remember that well-controlled processes are not the only things that update data. You may one day have to do a manual update to fix something, and enabled constraints can prevent terrible errors.<\/p>\n<p>I have not covered all the optimizer transformations that rely on constraints. When using materialized views, the query rewrite capability also relies on constraints. Relationship cardinality is one of the most important pieces of information in database design, and it must be known by the optimizer.<\/p>\n<h3>Added 9-OCT-2017<\/h3>\n<p>Re-reading this, I realize that I forgot to mention one important thing about disabled constraints. I recommend having the constraints enabled in case there is an update. But when you bulk insert (insert \/*+ append *\/) you will disable it, or the insert will not be done in direct-path mode. So the idea is to disable it before the load and enable it RELY NOVALIDATE after the load. 
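<\/p>\n<p>As a sketch of that load cycle on this demo&#8217;s DIM2_FK constraint (the STAGING table and the elided column list are placeholders, not part of the demo):<\/p>\n<pre><code>\nalter table FACT modify constraint DIM2_FK disable;\n&nbsp;\ninsert \/*+ append *\/ into FACT select ... from STAGING;\ncommit;\n&nbsp;\nalter table FACT modify constraint DIM2_FK rely enable novalidate;\n<\/code><\/pre>\n<p>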
And while we are there, I can mention that inconsistencies can happen only in NOVALIDATE DISABLE because with VALIDATE DISABLE you cannot insert\/update\/delete.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Franck Pachot . We still see some developers not declaring referential integrity constraints in datawarehouse databases because they think they don&#8217;t need it (integrity of data has been validated by the ETL). Here is a small demo I did to show why you need to declare them, and how to do it to avoid [&hellip;]<\/p>\n","protected":false},"author":27,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[294,59],"tags":[496,876,96],"type_dbi":[],"class_list":["post-10437","post","type-post","status-publish","format-standard","hentry","category-business-intelligence","category-oracle","tag-bi","tag-constraint","tag-oracle"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Create constraints in your datawarehouse - why and how - dbi Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/create-constraints-in-your-datawarehouse-why-and-how\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Create constraints in your datawarehouse - why and how\" \/>\n<meta property=\"og:description\" content=\"By Franck Pachot . We still see some developers not declaring referential integrity constraints in datawarehouse databases because they think they don&#8217;t need it (integrity of data has been validated by the ETL). 