In the last posts in this little series we looked at how to get a ZFS file system up and running on a CentOS 7 host and how snapshots and clones can be used to simply processes such as testing and cloning PostgreSQL instances. In this post we’ll look at another feature of zfs: Compression.
The current status of my ZFS file systems is:
1 2 3 4 | [root@centos7 ~] zfs list NAME USED AVAIL REFER MOUNTPOINT pgpool 170M 9.46G 20.5K /pgpool pgpool /pgdata 169M 9.46G 169M /pgpool/pgdata |
To check if compression is enabled:
1 2 3 | [root@centos7 ~] zfs get compression pgpool /pgdata NAME PROPERTY VALUE SOURCE pgpool /pgdata compression off default |
Lets create another file system and enable compression for it:
1 2 3 4 5 6 7 8 9 10 11 12 13 | [root@centos7 ~] zfs create pgpool /pgdatacompressed [root@centos7 ~] zfs list NAME USED AVAIL REFER MOUNTPOINT pgpool 170M 9.46G 20.5K /pgpool pgpool /pgdata 169M 9.46G 169M /pgpool/pgdata pgpool /pgdatacompressed 19K 9.46G 19K /pgpool/pgdatacompressed [root@centos7 ~] zfs get compression pgpool /pgdatacompressed NAME PROPERTY VALUE SOURCE pgpool /pgdatacompressed compression off default [root@centos7 ~] zfs set compression=on pgpool /pgdatacompressed [root@centos7 ~] zfs get compression pgpool /pgdatacompressed NAME PROPERTY VALUE SOURCE pgpool /pgdatacompressed compression on local |
You can ask zfs to report the compression ratio for a file system:
1 2 3 4 | [root@centos7 ~] zfs get compressratio pgpool /pgdatacompressed NAME PROPERTY VALUE SOURCE pgpool /pgdatacompressed compressratio 1.00x - [root@centos7 ~] chown postgres:postgres /pgpool/pgdatacompressed/ |
The ratio is 1 which is because we do not have any data yet. Lets copy the PostgreSQL cluster from the uncompressed file system into our new compressed file system:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | postgres@centos7: /home/postgres/ [PG1] cp - pr /pgpool/pgdata/ * /pgpool/pgdatacompressed/ postgres@centos7: /home/postgres/ [PG1] ls -l /pgpool/pgdatacompressed/ total 30 drwx------. 6 postgres postgres 6 Sep 29 14:00 base drwx------. 2 postgres postgres 54 Sep 29 14:27 global drwx------. 2 postgres postgres 3 Sep 28 15:11 pg_clog drwx------. 2 postgres postgres 2 Sep 28 15:11 pg_commit_ts drwx------. 2 postgres postgres 2 Sep 28 15:11 pg_dynshmem -rw-------. 1 postgres postgres 4468 Sep 28 15:11 pg_hba.conf -rw-------. 1 postgres postgres 1636 Sep 28 15:11 pg_ident.conf drwxr-xr-x. 2 postgres postgres 2 Sep 28 15:11 pg_log drwx------. 4 postgres postgres 4 Sep 28 15:11 pg_logical drwx------. 4 postgres postgres 4 Sep 28 15:11 pg_multixact drwx------. 2 postgres postgres 3 Sep 29 14:27 pg_notify drwx------. 2 postgres postgres 2 Sep 28 15:11 pg_replslot drwx------. 2 postgres postgres 2 Sep 28 15:11 pg_serial drwx------. 2 postgres postgres 2 Sep 28 15:11 pg_snapshots drwx------. 2 postgres postgres 5 Sep 29 14:46 pg_stat drwx------. 2 postgres postgres 2 Sep 29 14:46 pg_stat_tmp drwx------. 2 postgres postgres 3 Sep 28 15:11 pg_subtrans drwx------. 2 postgres postgres 2 Sep 28 15:11 pg_tblspc drwx------. 2 postgres postgres 2 Sep 28 15:11 pg_twophase -rw-------. 1 postgres postgres 4 Sep 28 15:11 PG_VERSION drwx------. 3 postgres postgres 8 Sep 29 14:26 pg_xlog -rw-------. 1 postgres postgres 88 Sep 28 15:11 postgresql.auto.conf -rw-------. 1 postgres postgres 21270 Sep 28 15:11 postgresql.conf -rw-------. 1 postgres postgres 69 Sep 29 14:27 postmaster.opts |
We already should see a difference, shouldn’t we?
1 2 3 | postgres@centos7: /home/postgres/ [PG1] df -h | grep pgdata pgpool /pgdata 9.6G 170M 9.4G 2% /pgpool/pgdata pgpool /pgdatacompressed 9.5G 82M 9.4G 1% /pgpool/pgdatacompressed |
Not bad, less than half of the size. We should see another compression ratio than 1 now:
1 2 3 | [root@centos7 ~] zfs get compressratio pgpool /pgdatacompressed NAME PROPERTY VALUE SOURCE pgpool /pgdatacompressed compressratio 1.93x - |
Lets generate some data in our two PostgreSQL instances and check the time it takes as well as the size of the file systems afterwards. As in the last post the second instance just gets a different port, everything else is identical:
1 2 3 4 5 | postgres@centos7: /home/postgres/ [PG1] pg_ctl start -D /pgpool/pgdata postgres@centos7: /home/postgres/ [PG1] sed -i 's/#port = 5432/port=5433/g' /pgpool/pgdatacompressed/postgresql .conf postgres@centos7: /home/postgres/ [PG1] FATAL: data directory "/pgpool/pgdatacompressed" has group or world access postgres@centos7: /home/postgres/ [PG1] chmod o-rwx,g-rwx /pgpool/pgdatacompressed/ postgres@centos7: /home/postgres/ [PG1] pg_ctl start -D /pgpool/pgdatacompressed/ |
This is the script to generate some data:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | timing c postgres drop database if exists dataload; create database dataload; c dataload create table dataload ( a bigint , b varchar (100) , c timestamp ); with data_generator_num as ( select * from generate_series ( 1 , 1000000 ) nums ) insert into dataload select data_generator_num.nums , md5(data_generator_num.nums:: varchar ) , current_date +data_generator_num.nums from data_generator_num; |
I will run the script two times on each instance. For the instance on the uncompressed file system:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | -- FIRST RUN postgres=# i generate_data.sql Timing is on . You are now connected to database "postgres" as user "postgres" . DROP DATABASE Time : 720.626 ms CREATE DATABASE Time : 4631.212 ms You are now connected to database "dataload" as user "postgres" . CREATE TABLE Time : 6.517 ms INSERT 0 1000000 Time : 28668.343 ms -- SECOND RUN dataload=# i generate_data.sql Timing is on . You are now connected to database "postgres" as user "postgres" . DROP DATABASE Time : 774.061 ms CREATE DATABASE Time : 2721.169 ms You are now connected to database "dataload" as user "postgres" . CREATE TABLE Time : 7.374 ms INSERT 0 1000000 Time : 32168.043 ms dataload=# |
For the instance on the compressed file system:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | -- FIRST RUN postgres=# i generate_data.sql Timing is on . You are now connected to database "postgres" as user "postgres" . psql:generate_data.sql:3: NOTICE: database "dataload" does not exist, skipping DROP DATABASE Time : 0.850 ms CREATE DATABASE Time : 4281.965 ms You are now connected to database "dataload" as user "postgres" . CREATE TABLE Time : 5.120 ms INSERT 0 1000000 Time : 30606.966 ms -- SECOND RUN dataload=# i generate_data.sql Timing is on . You are now connected to database "postgres" as user "postgres" . DROP DATABASE Time : 2359.120 ms CREATE DATABASE Time : 3267.151 ms You are now connected to database "dataload" as user "postgres" . CREATE TABLE Time : 8.665 ms INSERT 0 1000000 Time : 23474.290 ms dataload=# |
Despite that the numbers are quite bad (5 seconds to create an empty table) the fastest load was the second one on the compressed file system. So at least it is not slower. I have to admit that I did not do any tuning on the file systems and my VM does not have much memory (512m) which is far too less if you work with ZFS (ZFS needs much memory, at least 1gb).
So, what about the size of the data. First lets check what PostgreSQL is telling us:
1 2 3 4 5 6 7 8 9 10 11 12 | -- instance on the uncompressed file system dataload=# select * from pg_size_pretty ( pg_relation_size ( 'dataload' )); pg_size_pretty ---------------- 81 MB (1 row) -- instance on the compressed file system dataload=# select * from pg_size_pretty ( pg_relation_size ( 'dataload' )); pg_size_pretty ---------------- 81 MB (1 row) |
Exactly the same, which is not surprising as PostgreSQL sees the files as if they would be uncompressed (please be aware that the my_app_table from the last post is still there which is why the file system usage in total is larger than you might expect). It is quite funny on how the size is reported on the compressed file system depending on how you ask.
You can use oid2name to map the file name to a table name:
1 2 3 4 5 | postgres@centos7:/pgpool/pgdatacompressed/base/24580/ [PG1] oid2name -d dataload -p 5433 -f 24581 From database "dataload" : Filenode Table Name ---------------------- 24581 dataload |
File 24581 is the table we generated. When you ask for the size by using “du” you get:
1 2 | postgres@centos7:/pgpool/pgdatacompressed/base/24580/ [PG1] du -h 24581 48M 24581 |
This is the compressed size. When you use “ls” you get the uncompressed size:
1 2 | postgres@centos7:/pgpool/pgdatacompressed/base/24580/ [PG1] ls -lh 24581 -rw -------. 1 postgres postgres 81M Sep 30 10:43 24581 |
What does “df” tell us:
1 2 3 | postgres@centos7: /home/postgres/ [PG1] df -h | grep pgdata pgpool /pgdata 9.5G 437M 9.1G 5% /pgpool/pgdata pgpool /pgdatacompressed 9.2G 165M 9.1G 2% /pgpool/pgdatacompressed |
Not bad, 437M of uncompressed data which is 165M compressed. So, if you are short on space this really can be an option.