{"id":14898,"date":"2020-11-03T15:31:50","date_gmt":"2020-11-03T14:31:50","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-distribution-keys\/"},"modified":"2020-11-03T15:31:50","modified_gmt":"2020-11-03T14:31:50","slug":"getting-started-with-exasol-distribution-keys","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-distribution-keys\/","title":{"rendered":"Getting started with Exasol &#8211; Distribution keys"},"content":{"rendered":"<p>After the previous posts about Exasol, which covered installation, administration, backup &amp; restore, loading data, and general notes on transactions and sessions, we&#8217;ll now go into more detail about how Exasol manages data. Here is the list of the previous posts:<\/p>\n<ol>\n<li><a href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-setting-up-an-environment\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Exasol \u2013 Setting up an environment <\/a><\/li>\n<li><a href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-loading-data-from-postgresql\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Exasol \u2013 Loading data from PostgreSQL<\/a><\/li>\n<li><a href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-some-words-about-indexes-and-transactions\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Exasol \u2013 Some words about indexes and transactions<\/a><\/li>\n<li><a href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-sessions-and-auditing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Exasol \u2013 Sessions and auditing <\/a><\/li>\n<li><a href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-installation-from-iso\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Exasol \u2013 Installation from ISO 
<\/a><\/li>\n<li><a href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-backup-and-restore-1\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Exasol \u2013 Backup and restore (1)<\/a><\/li>\n<li><a href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-backup-and-restore-2\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Exasol \u2013 Backup and restore (2) <\/a><\/li>\n<li><a href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-a-multi-node-cluster-in-aws\/\" target=\"_blank\" rel=\"noopener noreferrer\">Getting started with Exasol \u2013 A multi node cluster in AWS<\/a><\/li>\n<\/ol>\n<p><!--more--><\/p>\n<p>Having a two-node cluster running in AWS is a good starting point to explore an important concept in Exasol: distribution keys. As soon as you have more than one data node in the cluster, data gets distributed across the cluster nodes automatically once a certain size limit is reached. To start with, let&#8217;s generate a CSV file we can import into Exasol. 
I&#8217;ll use this simple bash script to generate a file with 10&#8217;000&#8217;000 rows:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\ndwe@dwe:~\/Downloads$ cat gen_data.sh \n#!\/bin\/bash\n\nFILE=\"\/home\/dwe\/Downloads\/sample.csv\"\nrm -rf ${FILE}\n\nfor i in {1..10000000}; do\n    echo \"${i},firstname${i},lastname${i},xxx${i}@xxx.com,street${i},country${i},description${i}\" &gt;&gt; ${FILE}\ndone\n<\/pre>\n<p>This results in a file of around 1GB:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\ndwe@dwe:~\/Downloads$ chmod +x gen_data.sh \ndwe@dwe:~\/Downloads$ .\/gen_data.sh \ndwe@dwe:~\/Downloads$ head -5 sample.csv \n1,firstname1,lastname1,xxx1@xxx.com,street1,country1,description1\n2,firstname2,lastname2,xxx2@xxx.com,street2,country2,description2\n3,firstname3,lastname3,xxx3@xxx.com,street3,country3,description3\n4,firstname4,lastname4,xxx4@xxx.com,street4,country4,description4\n5,firstname5,lastname5,xxx5@xxx.com,street5,country5,description5\ndwe@dwe:~\/Downloads$ ls -lha sample.csv \n-rw-rw-r-- 1 dwe dwe 1023M Oct 17 19:27 sample.csv\n<\/pre>\n<p>We&#8217;ll import that using the <a href=\"https:\/\/docs.exasol.com\/sql\/import.htm?Highlight=import\" target=\"_blank\" rel=\"noopener noreferrer\">import<\/a> command as we already did in the previous posts. 
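<\/p>
<p>A side note on generating the file: the bash loop above reopens the file for every single row, which is slow for millions of rows. As an illustration (my own sketch, not part of the original setup), a buffered generator in Python produces the same layout much faster:<\/p>

```python
# Sketch only: build the same id,firstname,...,description layout with buffered writes.
# Assumptions: Python 3 stdlib; for real use, write to a file instead of a StringIO buffer.
import csv
import io

def render_sample(rows: int) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for i in range(1, rows + 1):
        writer.writerow([i, f"firstname{i}", f"lastname{i}", f"xxx{i}@xxx.com",
                         f"street{i}", f"country{i}", f"description{i}"])
    return buf.getvalue()

print(render_sample(3), end="")
```

<p>Buffered writes avoid the per-row open-and-append that makes the shell loop slow; the output lines are identical to what gen_data.sh produces.<\/p>
<p>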
If you have multiple nodes, you can list all of them in the connection string:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\n\/home\/dwe\/EXAplus-7.0.0\/exaplus -c 18.193.84.41,54.93.49.6:8563 -u sys -p xxxxx\n<\/pre>\n<p>Creating the schema and the table that will hold the data we want to import:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; create schema demo;\nEXA: create schema demo;\n\nRows affected: 0\n\nSQL_EXA&gt; open schema demo;\nEXA: open schema demo;\n\nRows affected: 0\n\nSQL_EXA&gt; create table sample ( id int primary key\n                             , firstname varchar(20)\n                             , lastname varchar(20)\n                             , email varchar(20)\n                             , street varchar(20)\n                             , country varchar(20)\n                             , description varchar(20)\n                             );\nEXA: create table sample ( id int primary key...\n\nRows affected: 0\n\n<\/pre>\n<p>Importing the sample CSV created above:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; IMPORT INTO sample  \n         FROM LOCAL CSV FILE '\/home\/dwe\/Downloads\/sample.csv'\n         COLUMN SEPARATOR = ',';\nEXA: IMPORT INTO sample  ...\n\nRows affected: 10000000\n<\/pre>\n<p>If we check whether this table already has a distribution key assigned, we&#8217;ll notice that there is none:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; select TABLE_NAME,TABLE_HAS_DISTRIBUTION_KEY from exa_dba_tables where TABLE_NAME = 'SAMPLE';\nEXA: select TABLE_NAME,TABLE_HAS_DISTRIBUTION_KEY from exa_dba_tables where...\n\nTABLE_NAME           TABLE\n-------------------- -----\nSAMPLE               FALSE\n\n1 row in resultset.\n<\/pre>\n<p>If no distribution key is specified, the distribution of the rows is random. 
Using the <a href=\"https:\/\/docs.exasol.com\/6.2\/sql_references\/functions\/alphabeticallistfunctions\/iproc.htm?Highlight=iproc\" target=\"_blank\" rel=\"noopener noreferrer\">IPROC<\/a> function, you can ask on which node a given row is stored. If we do that for our sample table, we&#8217;ll see this distribution of data:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; select count(*) from (select id, iproc() from sample) where iproc=0;\nEXA: select count(*) from (select id, iproc() from sample) where iproc=0;\n\nCOUNT(*)             \n---------------------\n              5002896\n\n1 row in resultset.\n\nSQL_EXA&gt; select count(*) from (select id, iproc() from sample) where iproc=1;\nEXA: select count(*) from (select id, iproc() from sample) where iproc=1;\n\nCOUNT(*)             \n---------------------\n              4997104\n\n1 row in resultset.\n<\/pre>\n<p>As we currently have two data nodes, we see an approximately even distribution of the rows across the two nodes. To know if that distribution is a good choice, you need to know which columns will be used in joins. 
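<\/p>
<p>Exasol does not document its exact distribution function, but the mental model is simple: the value of the distribution key (or an internal row identifier, when no key is set) is hashed, and the hash determines the node, which is what iproc() then reports per row. A toy model of this (my own sketch; the hash and the counts are illustrative, not Exasol&#8217;s internals):<\/p>

```python
# Toy model: key-based row distribution across a two-node cluster.
# Assumption: md5 stands in for Exasol's internal distribution hash.
import hashlib

NODES = 2

def node_for(key) -> int:
    # Deterministic: equal key values always land on the same node.
    return hashlib.md5(str(key).encode()).digest()[0] % NODES

counts = [0] * NODES
for row_id in range(1, 100_001):
    counts[node_for(row_id)] += 1

print(counts)  # close to an even split, like the iproc() counts above
```

<p>The important property is determinism: two rows with the same key value always end up on the same node, which is exactly what makes key-based distribution useful for joins.<\/p>
<p>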
To demonstrate this let&#8217;s create a copy of the table, again without specifying a distribution key:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; create table sample2 as select * from sample;\nEXA: create table sample2 as select * from sample;\n\nRows affected: 10000000\n\nSQL_EXA&gt; alter table sample2 add constraint sample_pk primary key (id);\nEXA: alter table sample2 add constraint sample_pk primary key (id);\n\nRows affected: 0\n<\/pre>\n<p>Suppose we want to join these two tables on the primary key like this:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; select sample.id, sample.firstname \n              from sample join sample2 on sample.id = sample2.id \n             where sample.id &lt; 20;\nEXA: select sample.id, sample.firstname from sample join sample2 on sample....\n\nID                    FIRSTNAME           \n--------------------- --------------------\n                    1 firstname1          \n                    2 firstname2          \n                    3 firstname3          \n                    4 firstname4          \n                    5 firstname5          \n                    6 firstname6          \n                    7 firstname7          \n                    8 firstname8          \n                    9 firstname9          \n                   10 firstname10         \n                   11 firstname11         \n                   12 firstname12         \n                   13 firstname13         \n                   14 firstname14         \n                   15 firstname15         \n                   16 firstname16         \n                   17 firstname17         \n                   18 firstname18         \n                   19 firstname19         \n\n19 rows in resultset.\n<\/pre>\n<p>How can we know if the distribution of the rows is good for this query? 
In PostgreSQL we would look at the execution plan, but there is no command to generate an execution plan in Exasol (at least not in the way you know it from other databases). To get details about what the cost-based optimizer is doing, you need to turn on <a href=\"https:\/\/docs.exasol.com\/performance\/profiling.htm\" target=\"_blank\" rel=\"noopener noreferrer\">profiling<\/a>. Let&#8217;s do a simple example: we&#8217;ll create a simple table, select from it, and then check what happened in the background. As the first step, let&#8217;s create the table and add one row:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; create table dummy ( a int, b varchar(20), c date );\nEXA: create table dummy ( a int, b varchar(20), c date );\n\nRows affected: 0\n\nSQL_EXA&gt; insert into dummy values (1,'aaaa',sysdate);\nEXA: insert into dummy values (1,'aaaa',sysdate);\n\nRows affected: 1\n<\/pre>\n<p>Profiling a statement requires a few steps: turn on profiling (it is off by default), disable the query cache (otherwise subsequent executions of the same SQL statement might be answered from the cache), run the statement, flush the statistics, and finally query the profiling information to see what happened:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; alter session set profile='on';\nEXA: alter session set profile='on';\n\nRows affected: 0\n\nSQL_EXA&gt; alter session set query_cache='off';\nEXA: alter session set query_cache='off';\n\nRows affected: 0\n\nSQL_EXA&gt; set autocommit off;\nSQL_EXA&gt; select count(*) from dummy;\nEXA: select count(*) from dummy;\n\nCOUNT(*)             \n---------------------\n                    1\n\n1 row in resultset.\n\nSQL_EXA&gt; flush statistics;\nEXA: flush statistics;\n\nRows affected: 0\n\nSQL_EXA&gt; commit;\nEXA: commit;\n\nRows affected: 0\n\nSQL_EXA&gt; select 
STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_ROWS,OUT_ROWS from EXA_DBA_PROFILE_LAST_DAY where session_id=current_session and command_name = 'SELECT' preferring high stmt_id order by part_id; \nEXA: select STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,...\n\nSTMT_ID      COMMAND_NAME                             PART_ID   PART_NAME            PART_INFO            OBJECT_SCHEMA   OBJECT_NAME                    OBJECT_ROWS           OUT_ROWS             \n------------ ---------------------------------------- --------- -------------------- -------------------- --------------- ------------------------------ --------------------- ---------------------\n           8 SELECT                                           1 COMPILE \/ EXECUTE                                                                                                                   \n           8 SELECT                                           2 SCAN                                      DEMO            DUMMY                                              1                     1\n           8 SELECT                                           3 GROUP BY             GLOBAL on TEMPORARY                  tmp_subselect0                                     0                     1\n                                                                                     table                                                                                                          \n\n3 rows in resultset.\n\nSQL_EXA&gt; \n<\/pre>\n<p>What can we see here: Three steps have been executed:<\/p>\n<ol>\n<li>COMPILE \/ EXECUTE: This is always the first step, the time it takes to compile the query<\/li>\n<li>SCAN: This step does the scan over the dummy table<\/li>\n<li>GROUP BY: Finally a group by to get the aggregation<\/li>\n<\/ol>\n<p>There is much more information in <a 
href=\"https:\/\/docs.exasol.com\/sql_references\/metadata\/statistical_system_table.htm?Highlight=EXA_DBA_PROFILE_LAST_DAY#EXA_DBA_PROFILE_LAST_DAY\" target=\"_blank\" rel=\"noopener noreferrer\">EXA_DBA_PROFILE_LAST_DAY<\/a>, but we&#8217;ll ignore that for now. Now, that we know how to get that information we can go back to our initial query, what do we see here?<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; select sample.id, sample.firstname \n              from sample join sample2 on sample.id = sample2.id \n             where sample.id &lt; 20;\nEXA: select sample.id, sample.firstname from sample join sample2 on sample....\n\nID                    FIRSTNAME           \n--------------------- --------------------\n                    1 firstname1          \n                    2 firstname2          \n                    3 firstname3          \n                    4 firstname4          \n                    5 firstname5          \n                    6 firstname6          \n                    7 firstname7          \n                    8 firstname8          \n                    9 firstname9          \n                   10 firstname10         \n                   11 firstname11         \n                   12 firstname12         \n                   13 firstname13         \n                   14 firstname14         \n                   15 firstname15         \n                   16 firstname16         \n                   17 firstname17         \n                   18 firstname18         \n                   19 firstname19         \n\n19 rows in resultset.\n\nSQL_EXA&gt; flush statistics;\nEXA: flush statistics;\n\n\nRows affected: 0\n\nSQL_EXA&gt; \nSQL_EXA&gt; commit;\nEXA: commit;\n\nRows affected: 0\n\nSQL_EXA&gt; select STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_ROWS,OUT_ROWS,DURATION from EXA_DBA_PROFILE_LAST_DAY where session_id=current_session and command_name = 'SELECT' preferring 
high stmt_id order by part_id; \nEXA: select STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,...\n\nSTMT_ID      COMMAND_NAME                             PART_ID   PART_NAME            PART_INFO            OBJECT_SCHEMA   OBJECT_NAME                    OBJECT_ROWS           OUT_ROWS              DURATION   \n------------ ---------------------------------------- --------- -------------------- -------------------- --------------- ------------------------------ --------------------- --------------------- -----------\n          43 SELECT                                           1 COMPILE \/ EXECUTE                                                                                                                          0.199\n          43 SELECT                                           2 SCAN                                      DEMO            SAMPLE                                      10000000                    19       0.028\n          43 SELECT                                           3 JOIN                 GLOBAL               DEMO            SAMPLE2                                     10000000                    19       0.000\n          43 SELECT                                           4 INSERT               on TEMPORARY table                   tmp_subselect0                                     0                    19       0.000\n\n4 rows in resultset.\n<\/pre>\n<p>What you usually don&#8217;t want to see is a global join but exactly this is happening (I&#8217;ve also added the duration so we can compare timings later). 
If a join partner cannot be found on the local node, a global join happens (communication with the node that holds the join partner), and this introduces traffic on the private network, which takes time.<\/p>\n<p>Let&#8217;s change the distribution key of the two tables (this will rewrite the tables, of course), so that IDs with the same hash value will be located on the same node:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; alter table sample distribute by id;\nEXA: alter table sample distribute by id;\n\nRows affected: 0\n\nSQL_EXA&gt; alter table sample2 distribute by id;\nEXA: alter table sample2 distribute by id;\n\nRows affected: 0\n\nSQL_EXA&gt; commit;\nEXA: commit;\n\nRows affected: 0\n<\/pre>\n<p>Does that change the execution steps?<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; select sample.id, sample.firstname \n              from sample join sample2 on sample.id = sample2.id \n             where sample.id &lt; 20;\nEXA: select sample.id, sample.firstname from sample join sample2 on sample....\n\nID                    FIRSTNAME           \n--------------------- --------------------\n                    1 firstname1          \n                    2 firstname2          \n                    3 firstname3          \n                    4 firstname4          \n                    5 firstname5          \n                    6 firstname6          \n                    7 firstname7          \n                    8 firstname8          \n                    9 firstname9          \n                   10 firstname10         \n                   11 firstname11         \n                   12 firstname12         \n                   13 firstname13         \n                   14 firstname14         \n                   15 firstname15         \n                   16 firstname16         \n                   17 firstname17         \n                   18 firstname18         \n                   19 firstname19       
  \n\n19 rows in resultset.\n\n\nSQL_EXA&gt; flush statistics;\nEXA: flush statistics;\n\nRows affected: 0\n\nSQL_EXA&gt; commit;\nEXA: commit;\n\nRows affected: 0\n\nSQL_EXA&gt; select STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_ROWS,OUT_ROWS,DURATION from EXA_DBA_PROFILE_LAST_DAY where session_id=current_session and command_name = 'SELECT' preferring high stmt_id order by part_id; \nEXA: select STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,...\n\nSTMT_ID      COMMAND_NAME                             PART_ID   PART_NAME            PART_INFO            OBJECT_SCHEMA   OBJECT_NAME                    OBJECT_ROWS           OUT_ROWS              DURATION   \n------------ ---------------------------------------- --------- -------------------- -------------------- --------------- ------------------------------ --------------------- --------------------- -----------\n          51 SELECT                                           1 COMPILE \/ EXECUTE                                                                                                                          0.011\n          51 SELECT                                           2 SCAN                                      DEMO            SAMPLE                                      10000000                    19       0.028\n          51 SELECT                                           3 JOIN                                      DEMO            SAMPLE2                                     10000000                    19       0.000\n          51 SELECT                                           4 INSERT               on TEMPORARY table                   tmp_subselect0                                     0                    19       0.000\n\n4 rows in resultset.\n<\/pre>\n<p>Yes, definitely. We do not see a global join anymore, but a local join and this improves performance. 
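<\/p>
<p>The difference between the two cases is easy to model: when both tables place their rows by the same function of the join column, every join partner is on the local node; when one side is placed independently (here: randomly), about half of the partners are remote and must be fetched over the network. A toy sketch (my own illustration, not Exasol internals):<\/p>

```python
# Toy model: local vs. global join under different row placements.
# Assumption: key % NODES stands in for the engine's distribution hash.
import random

NODES = 2

def keyed_node(key: int) -> int:
    # Placement when the table is distributed by the join column.
    return key % NODES

keys = list(range(1, 10_001))

# Both tables distributed by id: join partners are always co-located.
remote_keyed = sum(1 for k in keys if keyed_node(k) != keyed_node(k))

# One table placed randomly: roughly half the partners live on the other node.
random_node = {k: random.randrange(NODES) for k in keys}
remote_random = sum(1 for k in keys if keyed_node(k) != random_node[k])

print(remote_keyed, remote_random)  # 0 remote lookups vs. roughly half of them
```

<p>With two nodes, random placement already sends about half of the join probes across the network; with more nodes, the remote share grows, so matching distribution keys pay off even more.<\/p>
<p>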
So, with Exasol, you should avoid global joins because they increase traffic on the private network. Try to distribute the data on the join columns to get local joins. <\/p>\n<p>By the way: you can see the distribution key if you do a full describe against a table:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; desc full sample;\nEXA: desc full sample;\n\nCOLUMN_NAME                    SQL_TYPE                                 NULLABLE DISTRIBUTION_KEY PARTITION_KEY    COLUMN_COMMENT                                    \n------------------------------ ---------------------------------------- -------- ---------------- ---------------- --------------------------------------------------\nID                             DECIMAL(18,0)                            FALSE    TRUE             FALSE                                                              \nFIRSTNAME                      VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nLASTNAME                       VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nEMAIL                          VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nSTREET                         VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nCOUNTRY                        VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nDESCRIPTION                    VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                        \n<\/pre>\n<p>To show the effect of global joins on the network, let&#8217;s redo the example, but this time also add the NET column from <a 
href=\"https:\/\/docs.exasol.com\/sql_references\/metadata\/statistical_system_table.htm?Highlight=EXA_DBA_PROFILE_LAST_DAY#EXA_DBA_PROFILE_LAST_DAY\" target=\"_blank\" rel=\"noopener noreferrer\">EXA_DBA_PROFILE_LAST_DAY<\/a>, which shows the network traffic in MiB per second. An easy way to get to our initial state is to drop the distribution keys and then manually re-organize the tables:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; alter table sample drop distribution keys;\nEXA: alter table sample drop distribution keys;\n\nRows affected: 0\n\nSQL_EXA&gt; alter table sample2 drop distribution keys;\nEXA: alter table sample2 drop distribution keys;\n\nRows affected: 0\n\nSQL_EXA&gt; reorganize table sample;\nEXA: reorganize table sample;\n\nRows affected: 0\n\nSQL_EXA&gt; reorganize table sample2;\nEXA: reorganize table sample2;\n\nRows affected: 0\n\nSQL_EXA&gt; commit;\nEXA: commit;\n\nRows affected: 0\n\nSQL_EXA&gt; desc full sample;\nEXA: desc full sample;\n\nCOLUMN_NAME                    SQL_TYPE                                 NULLABLE DISTRIBUTION_KEY PARTITION_KEY    COLUMN_COMMENT                                    \n------------------------------ ---------------------------------------- -------- ---------------- ---------------- --------------------------------------------------\nID                             DECIMAL(18,0)                            FALSE    FALSE            FALSE                                                              \nFIRSTNAME                      VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nLASTNAME                       VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nEMAIL                          VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                   
           \nSTREET                         VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nCOUNTRY                        VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nDESCRIPTION                    VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \n\n7 rows in resultset.\n\nSQL_EXA&gt; desc full sample2;\nEXA: desc full sample2;\n\nCOLUMN_NAME                    SQL_TYPE                                 NULLABLE DISTRIBUTION_KEY PARTITION_KEY    COLUMN_COMMENT                                    \n------------------------------ ---------------------------------------- -------- ---------------- ---------------- --------------------------------------------------\nID                             DECIMAL(18,0)                            FALSE    FALSE            FALSE                                                              \nFIRSTNAME                      VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nLASTNAME                       VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nEMAIL                          VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nSTREET                         VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nCOUNTRY                        VARCHAR(20) UTF8                         TRUE     FALSE            FALSE                                                              \nDESCRIPTION                    VARCHAR(20) UTF8                  
       TRUE     FALSE            FALSE                                                              \n\n7 rows in resultset.\n<\/pre>\n<p>This is how we started. Let&#8217;s have a look at the first case once more (please note that I&#8217;ll execute the statement twice, as the first execution will create the index), but we also change the statement to go through the whole table:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; select count(*)\n           from sample join sample2 on sample.id = sample2.id;\nEXA: select count(*)...\n\nCOUNT(*)             \n---------------------\n             10000000\n\n1 row in resultset.\n\nSQL_EXA&gt; select count(*)\n           from sample join sample2 on sample.id = sample2.id;\nEXA: select count(*)...\n\nCOUNT(*)             \n---------------------\n             10000000\n\n1 row in resultset.\n\nSQL_EXA&gt; flush statistics;\nEXA: flush statistics;\n\nRows affected: 0\n\nSQL_EXA&gt; commit;\nEXA: commit;\n\nRows affected: 0\n\nSQL_EXA&gt; select STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_ROWS,OUT_ROWS,DURATION,NET from EXA_DBA_PROFILE_LAST_DAY where session_id=current_session and command_name = 'SELECT' preferring high stmt_id order by part_id;\nEXA: select STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,...\n\nSTMT_ID      COMMAND_NAME                             PART_ID   PART_NAME            PART_INFO            OBJECT_SCHEMA   OBJECT_NAME                    OBJECT_ROWS           OUT_ROWS              DURATION    NET      \n------------ ---------------------------------------- --------- -------------------- -------------------- --------------- ------------------------------ --------------------- --------------------- ----------- ---------\n         123 SELECT                                           1 COMPILE \/ EXECUTE                                                                                                                          
0.031     129.9\n         123 SELECT                                           2 SCAN                                      DEMO            SAMPLE                                      10000000              10000000       0.007     166.4\n         123 SELECT                                           3 JOIN                 GLOBAL               DEMO            SAMPLE2                                     10000000              10000000       1.128          \n         123 SELECT                                           4 GROUP BY             GLOBAL on TEMPORARY                  tmp_subselect0                                     0                     1       0.009     155.6\n                                                                                     table                                                                                                                                \n\n4 rows in resultset.\n\nSQL_EXA&gt; \n<\/pre>\n<p>Changing the distribution keys again, and re-do the test:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1\">\nSQL_EXA&gt; alter table sample distribute by id;\nEXA: alter table sample distribute by id;\n\nRows affected: 0\n\nSQL_EXA&gt; alter table sample2 distribute by id;\nEXA: alter table sample2 distribute by id;\n\nRows affected: 0\n\nSQL_EXA&gt; commit;\nEXA: commit;\n\nRows affected: 0\n\nSQL_EXA&gt; select count(*)\n           from sample join sample2 on sample.id = sample2.id;\nEXA: select count(*)...\n\nCOUNT(*)             \n---------------------\n             10000000\n\n1 row in resultset.\n\nSQL_EXA&gt; select count(*)\n           from sample join sample2 on sample.id = sample2.id;\nEXA: select count(*)...\n\nCOUNT(*)             \n---------------------\n             10000000\n\n1 row in resultset.\n\nSQL_EXA&gt; flush statistics;\nEXA: flush statistics;\n\nRows affected: 0\n\nSQL_EXA&gt; commit;\nEXA: commit;\n\nRows affected: 0\n\nSQL_EXA&gt; select 
STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_ROWS,OUT_ROWS,DURATION,NET from EXA_DBA_PROFILE_LAST_DAY where session_id=current_session and command_name = 'SELECT' preferring high stmt_id order by part_id;\nEXA: select STMT_ID,COMMAND_NAME,PART_ID,PART_NAME,PART_INFO,OBJECT_SCHEMA,...\n\nSTMT_ID      COMMAND_NAME                             PART_ID   PART_NAME            PART_INFO            OBJECT_SCHEMA   OBJECT_NAME                    OBJECT_ROWS           OUT_ROWS              DURATION    NET      \n------------ ---------------------------------------- --------- -------------------- -------------------- --------------- ------------------------------ --------------------- --------------------- ----------- ---------\n         131 SELECT                                           1 COMPILE \/ EXECUTE                                                                                                                          0.006       0.2\n         131 SELECT                                           2 SCAN                                      DEMO            SAMPLE                                      10000000              10000000       0.005       0.0\n         131 SELECT                                           3 JOIN                                      DEMO            SAMPLE2                                     10000000              10000000       0.330          \n         131 SELECT                                           4 GROUP BY             GLOBAL on TEMPORARY                  tmp_subselect0                                     0                     1       0.004       0.1\n                                                                                     table                                                                                                                                \n\n4 rows in resultset.\n\nSQL_EXA&gt; \n\n<\/pre>\n<p>We come down to almost no network traffic and this is what you want to see. 
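<\/p>
<p>The DURATION column tells the same story as NET: comparing the two profiles above, the join step alone dropped from 1.128 seconds (global join, statement 123) to 0.330 seconds (local join, statement 131). A quick sanity check of the speedup (plain arithmetic on the numbers shown, nothing Exasol-specific):<\/p>

```python
# Join-step durations taken from the two profiles above (seconds).
global_join_s = 1.128  # statement 123, GLOBAL join
local_join_s = 0.330   # statement 131, local join

speedup = global_join_s / local_join_s
print(f"{speedup:.1f}x")  # about 3.4x faster once the join is local
```

<p>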
<\/p>\n<p>In the next post we&#8217;ll look into partitioning.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After the previous posts about Exasol, which have been more around installation, administration, backup &amp; restore, loading data and general notes around transactions and sessions we&#8217;ll now go into more details how Exasol manages data. Here is the list of the previous posts: Getting started with Exasol \u2013 Setting up an environment Getting started with [&hellip;]<\/p>\n","protected":false},"author":29,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1865,229,198],"tags":[133,2127,2128],"type_dbi":[],"class_list":["post-14898","post","type-post","status-publish","format-standard","hentry","category-aws","category-database-administration-monitoring","category-database-management","tag-aws","tag-exasol","tag-mpp"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.5) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Getting started with Exasol - Distribution keys - dbi Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-distribution-keys\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Getting started with Exasol - Distribution keys\" \/>\n<meta property=\"og:description\" content=\"After the previous posts about Exasol, which have been more around installation, administration, backup &amp; restore, loading data and general notes around transactions and sessions we&#8217;ll now go into more details how Exasol manages data. 
Here is the list of the previous posts: Getting started with Exasol \u2013 Setting up an environment Getting started with [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/getting-started-with-exasol-distribution-keys\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2020-11-03T14:31:50+00:00\" \/>\n<meta name=\"author\" content=\"Daniel Westermann\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@westermanndanie\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Westermann\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/\"},\"author\":{\"name\":\"Daniel Westermann\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/8d08e9bd996a89bd75c0286cbabf3c66\"},\"headline\":\"Getting started with Exasol &#8211; Distribution keys\",\"datePublished\":\"2020-11-03T14:31:50+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/\"},\"wordCount\":947,\"commentCount\":0,\"keywords\":[\"AWS\",\"Exasol\",\"MPP\"],\"articleSection\":[\"AWS\",\"Database Administration &amp; Monitoring\",\"Database 
management\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/\",\"name\":\"Getting started with Exasol - Distribution keys - dbi Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#website\"},\"datePublished\":\"2020-11-03T14:31:50+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/8d08e9bd996a89bd75c0286cbabf3c66\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/getting-started-with-exasol-distribution-keys\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Getting started with Exasol &#8211; Distribution keys\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/\",\"name\":\"dbi 
Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/8d08e9bd996a89bd75c0286cbabf3c66\",\"name\":\"Daniel Westermann\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g\",\"caption\":\"Daniel Westermann\"},\"description\":\"Daniel Westermann is Principal Consultant and Technology Leader Open Infrastructure at dbi services. He has more than 15 years of experience in management, engineering and optimization of databases and infrastructures, especially on Oracle and PostgreSQL. Since the beginning of his career, he has specialized in Oracle Technologies and is Oracle Certified Professional 12c and Oracle Certified Expert RAC\\\/GridInfra. Over time, Daniel has become increasingly interested in open source technologies, becoming \u201cTechnology Leader Open Infrastructure\u201d and PostgreSQL expert. \u00a0Based on community or EnterpriseDB tools, he develops and installs complex high available solutions with PostgreSQL. He is also a certified PostgreSQL Plus 9.0 Professional and a Postgres Advanced Server 9.4 Professional. He is a regular speaker at PostgreSQL conferences in Switzerland and Europe. 
Today Daniel is also supporting our customers on AWS services such as AWS RDS, database migrations into the cloud, EC2 and automated infrastructure management with AWS SSM (System Manager). He is a certified AWS Solutions Architect Professional. Prior to dbi services, Daniel was Management System Engineer at LC SYSTEMS-Engineering AG in Basel. Before that, he worked as Oracle Developper &amp;\u00a0Project Manager at Delta Energy Solutions AG in Basel (today Powel AG). Daniel holds a diploma in Business Informatics (DHBW, Germany). His branch-related experience mainly covers the pharma industry, the financial sector, energy, lottery and telecommunications.\",\"sameAs\":[\"https:\\\/\\\/x.com\\\/westermanndanie\"],\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/author\\\/daniel-westermann\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->"}