In PostgreSQL a row or tuple can not span multiple pages (a page is typically 8kB), but of course you can store larger rows and PostgreSQL brakes and compresses these rows into smaller chunks by using a technique called TOAST. Once your table contains a toast-able data type a so-called toast table is created automatically. Up to PostgreSQL 13 you had no choice how the data is compressed, but a recent commit brings the option to use LZ4 as compression method for TOAST. As you can read in the Wikipedia article, LZ4 is all about compression and decompression speed, so let’s have a look.
If you want to make use of this new feature, you need to compile PostgreSQL with support for it:
postgres@debian10pg:/home/postgres/postgresql/ [pgdev] ./configure --help | grep LZ4 --with-lz4 build with LZ4 support LZ4_CFLAGS C compiler flags for LZ4, overriding pkg-config LZ4_LIBS linker flags for LZ4, overriding pkg-config
For that to work, the operating system needs to have the corresponding libraries installed. For Debian this is:
postgres@debian10pg:/home/postgres/postgresql/ [pgdev] apt search liblz4-dev Sorting... Done Full Text Search... Done liblz4-dev/stable,now 1.8.3-1 amd64 [installed] Fast LZ compression algorithm library - development files
Having that ready and PostgreSQL compiled and installed, lets create two tables: One with the default compression and one with the new LZ4 compression:
postgres=# create table t1 ( a text ); CREATE TABLE postgres=# create table t2 ( a text compression LZ4 ); CREATE TABLE postgres=# d+ t1 Table "public.t1" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+------+-----------+----------+---------+----------+-------------+--------------+------------- a | text | | | | extended | pglz | | Access method: heap postgres=# d+ t2 Table "public.t2" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+------+-----------+----------+---------+----------+-------------+--------------+------------- a | text | | | | extended | lz4 | | Access method: heap
Both tables automatically got a toast table attached:
postgres=# select reltoastrelid from pg_class where relname in ('t1','t2'); reltoastrelid --------------- 16387 16392 (2 rows) postgres=# select oid,relname from pg_class where oid in (16387,16392 ); oid | relname -------+---------------- 16387 | pg_toast_16384 16392 | pg_toast_16389 (2 rows) postgres=# d+ pg_toast.pg_toast_16384 TOAST table "pg_toast.pg_toast_16384" Column | Type | Storage ------------+---------+--------- chunk_id | oid | plain chunk_seq | integer | plain chunk_data | bytea | plain Owning table: "public.t1" Indexes: "pg_toast_16384_index" PRIMARY KEY, btree (chunk_id, chunk_seq) Access method: heap postgres=# d+ pg_toast.pg_toast_16389 TOAST table "pg_toast.pg_toast_16389" Column | Type | Storage ------------+---------+--------- chunk_id | oid | plain chunk_seq | integer | plain chunk_data | bytea | plain Owning table: "public.t2" Indexes: "pg_toast_16389_index" PRIMARY KEY, btree (chunk_id, chunk_seq) Access method: heap
Let’s check if there is a difference if we populate those tables with some dummy data:
postgres=# timing on Timing is on. postgres=# insert into t1(a) select lpad('a',1000000,'a') from generate_series(1,1000); INSERT 0 1000 Time: 4643.583 ms (00:04.644) postgres=# insert into t2(a) select lpad('a',1000000,'a') from generate_series(1,1000); INSERT 0 1000 Time: 314.107 ms postgres=# truncate table t1,t2; TRUNCATE TABLE Time: 4143.579 ms (00:04.144) postgres=# insert into t1(a) select lpad('a',1000000,'a') from generate_series(1,1000); INSERT 0 1000 Time: 4759.809 ms (00:04.760) postgres=# insert into t2(a) select lpad('a',1000000,'a') from generate_series(1,1000); INSERT 0 1000 Time: 1011.221 ms (00:01.011) postgres=# truncate table t1,t2; TRUNCATE TABLE Time: 41.449 ms postgres=# insert into t1(a) select lpad('a',1000000,'a') from generate_series(1,1000); INSERT 0 1000 Time: 4507.416 ms (00:04.507) postgres=# insert into t2(a) select lpad('a',1000000,'a') from generate_series(1,1000); INSERT 0 1000 Time: 299.458 ms
This is a huge difference, and I’ve repeated that test several times and got almost the same numbers. That’s really a great improvement regarding speed. But what about the size on disk?
postgres=# select relname,reltoastrelid from pg_class where relname in ('t1','t2'); relname | reltoastrelid ---------+--------------- t1 | 16387 t2 | 16392 (2 rows) Time: 3.091 ms postgres=# select oid,relname from pg_class where oid in (16387,16392); oid | relname -------+---------------- 16387 | pg_toast_16384 16392 | pg_toast_16389 (2 rows) Time: 0.836 ms postgres=# select pg_size_pretty(pg_relation_size('pg_toast.pg_toast_16384')); pg_size_pretty ---------------- 12 MB (1 row) Time: 0.506 ms postgres=# select pg_size_pretty(pg_relation_size('pg_toast.pg_toast_16389')); pg_size_pretty ---------------- 4000 kB (1 row) Time: 0.743 ms
Quite impressive. In addition to the speed, we also get a great reduction on disk. Because compression is better with LZ4, we see fewer rows in the toast table for t2:
postgres=# select count(*) from pg_toast.pg_toast_16384; count ------- 6000 (1 row) Time: 5.807 ms postgres=# select count(*) from pg_toast.pg_toast_16389; count ------- 2000 (1 row)
Of course the string I used here is not very representative, but this new feature really looks promising. There is also a new parameter if you want to change to this behavior globally;
postgres=# show default_toast_compression ; default_toast_compression --------------------------- pglz (1 row) postgres=# alter system set default_toast_compression = 'lz4'; ALTER SYSTEM postgres=# select pg_reload_conf(); pg_reload_conf ---------------- t (1 row)