There often is a discussion if tablespaces should be used in PostgreSQL. People used to work with Oracle are often a bit surprised by this discussion as there is no way around using tablespaces in that technology. PostgreSQL does not force you to use tablespaces at all and I am pretty sure that there are plenty of PostgreSQL installations out there that do not use them and never even considered using them and that is absolutely fine. In our PostgreSQL basics workshop we cover tablespaces but recommend not to use them if there are no good reasons. Implementing tablespaces in PostgreSQL just because you are forced to use them in other databases is not the way to go. In this post I’d like to talk about one reason where tablespaces can make sense, more precisely: temp_tablespaces.
As the name implies, temporary tablespaces are there for temporary objects. The question is then, what exactly is considered as a temporary object? Well, the obvious case is temporary tables. Without any temporary tablespaces, when you create a temporary table, the temporary files get created in the default tablespace of the current database. In a standard PostgreSQL setup it looks like this:
postgres=# l+ List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description -----------+----------+----------+-------------+-------------+-----------------------+---------+------------+-------------------------------------------- postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 872 MB | pg_default | default administrative connection database template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| 7393 kB | pg_default | unmodifiable empty database | | | | | postgres=CTc/postgres | | | template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| 7393 kB | pg_default | default template for new databases | | | | | postgres=CTc/postgres | | |
All the databases have a default tablespace called “pg_default” which is a kind of pseudo tablespace as it does not really exist. Asking the catalog about the location of that tablespace will show an empty location:
postgres=# select spcname AS "Name" postgres-# , pg_catalog.pg_get_userbyid(spcowner) AS "Owner" postgres-# , pg_catalog.pg_tablespace_location(oid) AS "Location" postgres-# from pg_catalog.pg_tablespace postgres-# where pg_catalog.pg_tablespace.spcname = 'pg_default' postgres-# order by 1; Name | Owner | Location ------------+----------+---------- pg_default | postgres | (1 row)
If we create temporary objects, where will the files be created then?
postgres=# create temporary table tmp1 ( a int, b text, c date ); CREATE TABLE postgres=# select pg_relation_filepath('tmp1'); pg_relation_filepath ---------------------- base/12732/t3_16436 (1 row)
This is the standard directory of my “postgres” database:
postgres@centos8pg:/home/postgres/ [pgdev] cd $PGDATA postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] oid2name All databases: Oid Database Name Tablespace ---------------------------------- 12732 postgres pg_default 12731 template0 pg_default 1 template1 pg_default postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] ls -l base/12732/t3_16436 -rw-------. 1 postgres postgres 0 Mar 12 18:17 base/12732/t3_16436
So, by default, files required for temporary tables go to the same location as all the other files that make up the specific database. If we populate the temporary table the files will grow, of course:
postgres=# insert into tmp1 (a,b,c) select i,i::text,now() from generate_series(1,100) i; INSERT 0 100 postgres=# ! ls -la $PGDATA/base/12732/t3_16436 -rw-------. 1 postgres postgres 8192 Mar 12 18:41 /u02/pgdata/DEV/base/12732/t3_16436 postgres=# insert into tmp1 (a,b,c) select i,i::text,now() from generate_series(1,1000) i; INSERT 0 1000 postgres=# ! ls -la $PGDATA/base/12732/t3_16436 -rw-------. 1 postgres postgres 49152 Mar 12 18:42 /u02/pgdata/DEV/base/12732/t3_16436 postgres=#
The consequence of that is, that I/O for temporary tables will compete with I/O for all the other objects in this PostgreSQL cluster. Another consequence is, that temporary tables can potentially fill your file system until it is full and no one will be be able to work from then on. This is the first reason for creating one or more dedicated temporary tablespaces: By doing this you can avoid that temporary tables going crazy impact your whole cluster as long as the temporary tablespace is on it’s own file system.
Creating a temporary tablespace is not different from creating a normal tablespace as it is actually exactly the same:
postgres=# ! mkdir /var/tmp/tbstmp postgres=# create tablespace tbstmp location '/var/tmp/tbstmp'; CREATE TABLESPACE postgres=# db+ List of tablespaces Name | Owner | Location | Access privileges | Options | Size | Description ------------+----------+-----------------+-------------------+---------+---------+------------- pg_default | postgres | | | | 886 MB | pg_global | postgres | | | | 575 kB | tbstmp | postgres | /var/tmp/tbstmp | | | 0 bytes | (3 rows)
Once we have the new tablespace we can tell PostgreSQL to use it as the default for temporary objects:
postgres=# alter system set temp_tablespaces = 'tbstmp'; ALTER SYSTEM postgres=# select pg_reload_conf(); pg_reload_conf ---------------- t (1 row) postgres=# show temp_tablespaces ; temp_tablespaces ------------------ tbstmp (1 row)
Creating another temporary table will result in the files being created in the new location:
postgres=# create temporary table tmp2 ( a int, b text, c date ); CREATE TABLE postgres=# select pg_relation_filepath('tmp2'); pg_relation_filepath ------------------------------------------------ pg_tblspc/16442/PG_13_202003051/12732/t3_16443 (1 row) postgres=# ! ls -la $PGDATA/pg_tblspc/ total 4 drwx------. 2 postgres postgres 19 Mar 12 18:50 . drwx------. 20 postgres postgres 4096 Mar 12 18:54 .. lrwxrwxrwx. 1 postgres postgres 15 Mar 12 18:50 16442 -> /var/tmp/tbstmp postgres=# ! ls -la $PGDATA/pg_tblspc/16442/ total 0 drwx------. 3 postgres postgres 29 Mar 12 18:50 . drwxrwxrwt. 7 root root 163 Mar 12 18:49 .. drwx------. 3 postgres postgres 19 Mar 12 18:53 PG_13_202003051 postgres=# ! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/ total 0 drwx------. 3 postgres postgres 19 Mar 12 18:53 . drwx------. 3 postgres postgres 29 Mar 12 18:50 .. drwx------. 2 postgres postgres 54 Mar 12 18:53 12732 postgres=# ! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/12732/ total 8 drwx------. 2 postgres postgres 54 Mar 12 18:53 . drwx------. 3 postgres postgres 19 Mar 12 18:53 .. -rw-------. 1 postgres postgres 0 Mar 12 18:53 t3_16443 -rw-------. 1 postgres postgres 0 Mar 12 18:53 t3_16446 -rw-------. 1 postgres postgres 8192 Mar 12 18:53 t3_16448
If you are wondering why there are three files, here is the answer:
postgres=# select relname from pg_class where oid in (16443,16446,16448); relname ---------------------- pg_toast_16443 pg_toast_16443_index tmp2 (3 rows)
The toast objects get created as well as I have a “text” column in my temporary table. Creating a temporary table with data types that do not require toast objects will result in one file only:
postgres=# create temporary table tmp3 ( a int, b date ); CREATE TABLE postgres=# select pg_relation_filepath('tmp3'); pg_relation_filepath ------------------------------------------------ pg_tblspc/16442/PG_13_202003051/12732/t3_16449 (1 row) postgres=# ! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/12732/ total 8 drwx------. 2 postgres postgres 70 Mar 12 19:07 . drwx------. 3 postgres postgres 19 Mar 12 18:53 .. -rw-------. 1 postgres postgres 0 Mar 12 18:53 t3_16443 -rw-------. 1 postgres postgres 0 Mar 12 18:53 t3_16446 -rw-------. 1 postgres postgres 8192 Mar 12 18:53 t3_16448 -rw-------. 1 postgres postgres 0 Mar 12 19:07 t3_16449
So for now we know that all temporary tables will go to the new temporary table space. What else will go there from now on. There is a parameter log_temp_files which can be used to report temp file usage into the PostgreSQL log file and this comes quite handy if you want to know what goes there. The default setting is “-1” which means of, “0” means log everything, all other values greater than “1” specify the minimum size of the temp files for being logged. Setting it to “0” will, as said, log all temp files being created in the background so let’s do that:
postgres=# alter system set log_temp_files = 0; ALTER SYSTEM postgres=# select pg_reload_conf(); pg_reload_conf ---------------- t (1 row)
Creating another temporary table and then checking the log file will confirm that this is working and we’ll get the information we want:
postgres=# create temporary table tmp4 as select * from generate_series(1,1000000); SELECT 1000000
The entry in the log file will look like this and it confirms that the temporary files have been written to the temporary tablespsace we created above:
2020-03-13 02:33:35.658 CET - 1 - 10535 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10535.0", size 14000000 2020-03-13 02:33:35.658 CET - 2 - 10535 - [local] - postgres@postgres STATEMENT: create temporary table tmp4 as select * from generate_series(1,1000000);
But: If you create, with the default configuration of PostgreSQL, this temporary table:
postgres=# create temporary table tmp4 as select * from generate_series(1,100000); SELECT 10000
… you will not see any lines in the log file for this. Why? Because there is temp_buffers and temporary files will be only be reported in the log file if they exceed the value of this parameter. In the default configuration this is ‘8MB’ and that is not enough for the smaller temporary table to be logged. Decreasing the parameter will log the temporary files for the smaller table as well:
postgres=# set temp_buffers = '1024kB'; SET postgres=# create temporary table tmp5 as select * from generate_series(1,100000); SELECT 100000
So now we know how to log the creation of temporary files to the PostgreSQL log file. What other operations will cause temporary file to be created? Sorts?
postgres=# set work_mem = '64kB'; SET postgres=# select * from generate_series(1,1000000) order by random();
Yes, definitely:
2020-03-13 02:47:14.297 CET - 19 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.2", size 26083328 2020-03-13 02:47:14.297 CET - 20 - 10609 - [local] - postgres@postgres STATEMENT: select * from generate_series(1,1000000) order by random(); 2020-03-13 02:47:14.298 CET - 21 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1", size 14000000 2020-03-13 02:47:14.298 CET - 22 - 10609 - [local] - postgres@postgres STATEMENT: select * from generate_series(1,1000000) order by random(); 2020-03-13 02:47:14.298 CET - 23 - 10609 - [local] - postgres@postgres LOG: duration: 2994.386 ms statement: select * from generate_series(1,1000000) order by random();
What about creating indexes?
postgres=# create table tt1 as select * from generate_series(1,1000000); SELECT 1000000 postgres=# create index ii1 on tt1(generate_series ); CREATE INDEX
Yes, that creates temporary files as well:
2020-03-13 02:54:00.933 CET - 33 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.0.sharedfileset/0.0", size 12394496 2020-03-13 02:54:00.933 CET - 34 - 10609 - [local] - postgres@postgres STATEMENT: create index ii1 on tt1(generate_series ); 2020-03-13 02:54:00.934 CET - 35 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.0.sharedfileset/1.0", size 7659520 2020-03-13 02:54:00.934 CET - 36 - 10609 - [local] - postgres@postgres STATEMENT: create index ii1 on tt1(generate_series ); 2020-03-13 02:54:00.948 CET - 37 - 10609 - [local] - postgres@postgres LOG: duration: 1149.625 ms statement: create index ii1 on tt1(generate_series );
Foreign keys?
postgres=# create table ttt1 as select * from generate_series(1,1000000) a; SELECT 1000000 postgres=# create unique index iii1 on ttt1(a); CREATE INDEX postgres=# insert into ttt2 select a,a from generate_series(1,1000000) a; INSERT 0 1000000 postgres=# alter table ttt2 add constraint fk_t foreign key (b) references ttt1(a); ALTER TABLE
Yes, that as well:
2020-03-13 03:01:07.127 CET - 65 - 10609 - [local] - postgres@postgres LOG: duration: 1127.768 ms statement: alter table ttt2 add constraint fk_t foreign key (b) references ttt1(a); 2020-03-13 03:01:15.375 CET - 66 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.6", size 67374
So quite a few operations that generate temporary files in background. Separating this on a separate mount point actually can make a lot of sense. From a performance perspective (if I/O spread on the storage layer as well) but also from a security perspective as huge operations that require temporary files will not affect “normal” operations on the instance.
There is another case generating temporary files which is not maybe not clear to everybody. Consider this:
postgres=# create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b; SELECT 1000000
This will create many temporary files in the background as well:
... 2020-03-13 03:11:03.721 CET - 4146 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1289", size 22704 2020-03-13 03:11:03.721 CET - 4147 - 10609 - [local] - postgres@postgres STATEMENT: create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b; 2020-03-13 03:11:03.722 CET - 4148 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.143", size 23136 2020-03-13 03:11:03.722 CET - 4149 - 10609 - [local] - postgres@postgres STATEMENT: create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b; 2020-03-13 03:11:03.723 CET - 4150 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1166", size 23136 2020-03-13 03:11:03.723 CET - 4151 - 10609 - [local] - postgres@postgres STATEMENT: create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b; 2020-03-13 03:11:03.723 CET - 4152 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.792", size 23640 ...
And even refreshes consume temporary files:
postgres=# create unique index mv_i1 on mv1(a); CREATE INDEX postgres=# refresh materialized view concurrently mv1; REFRESH MATERIALIZED VIEW postgres=#
From the log file:
... 2020-03-13 03:14:05.866 CET - 20543 - 10609 - [local] - postgres@postgres STATEMENT: refresh materialized view concurrently mv1; 2020-03-13 03:14:05.866 CET - 20544 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.8193", size 26768 2020-03-13 03:14:05.866 CET - 20545 - 10609 - [local] - postgres@postgres CONTEXT: SQL statement "CREATE TEMP TABLE pg_temp_3.pg_temp_16513_2 AS SELECT mv.ctid AS tid, newdata FROM public.mv1 mv FULL JOIN pg_temp_3.pg_temp_16513 newdata ON (newdata.a OPERATOR(pg_catalog.=) mv.a AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid" 2020-03-13 03:14:05.866 CET - 20546 - 10609 - [local] - postgres@postgres STATEMENT: refresh materialized view concurrently mv1; 2020-03-13 03:14:05.866 CET - 20547 - 10609 - [local] - postgres@postgres LOG: temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.6147", size 28487 2020-03-13 03:14:05.866 CET - 20548 - 10609 - [local] - postgres@postgres CONTEXT: SQL statement "CREATE TEMP TABLE pg_temp_3.pg_temp_16513_2 AS SELECT mv.ctid AS tid, newdata FROM public.mv1 mv FULL JOIN pg_temp_3.pg_temp_16513 newdata ON (newdata.a OPERATOR(pg_catalog.=) mv.a AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid" ...
There are more operations that require temporary files in the background but for the scope of this post we stop here. One last thing: The amount of temporary files generated can also be limited by temp_file_limit:
postgres=# set temp_file_limit='1MB'; SET postgres=# refresh materialized view concurrently mv1; ERROR: temporary file size exceeds temp_file_limit (1024kB) postgres=#