There have been several posts on how to load 1m rows into a database in the last days:
Variations on 1M rows insert (1): bulk insert
Variations on 1M rows insert(2): commit write
Variations on 1M rows insert (1): bulk insert – PostgreSQL
Variations on 1M rows insert(2): commit write – PostgreSQL
Variations on 1M rows insert (3): TimesTen
In this post I’ll focus on how to prepare a PostgreSQL database for bulk loading in more detail. The test system is the same which was used in the previous posts.
cat /etc/oracle-release Oracle Linux Server release 7.1 cat /proc/cpuinfo | grep proc processor : 0 cat /proc/meminfo | head -1 MemTotal: 502612 kB
I did not tweak any kernel parameters, just the default:
cat /usr/lib/sysctl.d/00-system.conf | grep -v "#" net.bridge.bridge-nf-call-ip6tables = 0 net.bridge.bridge-nf-call-iptables = 0 net.bridge.bridge-nf-call-arptables = 0 kernel.shmmax = 4294967295 kernel.shmall = 268435456
All is on standard xfs filesystems. No tweaking there, too:
df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/ol-root 48G 4.5G 44G 10% / devtmpfs 237M 0 237M 0% /dev tmpfs 246M 4.0K 246M 1% /dev/shm tmpfs 246M 4.4M 242M 2% /run tmpfs 246M 0 246M 0% /sys/fs/cgroup /dev/sda1 497M 255M 243M 52% /boot mount | grep /dev/mapper/ol-root /dev/mapper/ol-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
The postgresql version still is 9.4.1:
postgres=# select version(); version -------------------------------------------------------------------------------------------------------------------------- PostgreSQL 9.4.1dbi services on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit (1 row)
I’ll adjust several postgresql parameters to get best performance:
fsync='off' synchronous_commit='off' full_page_writes='off' bgwriter_lru_maxpages=0 wal_level='minimal' archive_mode='off' work_mem='64MB' checkpoint_segments=32 max_wal_senders=0 maintenance_work_mem='64MB' shared_buffers='128MB'
For a detailed description of the parameters head over to the official documentation. Just to be clear again: Turning fsync and synchronous_commit off is not crash safe. This means if the server crashes during the load for some reason the database will be corrupted. See the reference section at the end of this post for more details. The fastest method to load or unload data in PostgreSQL is the copy command. I’ll use the same file to load the data as in the previous post. Some notes about indexes and foreign keys: It is usually faster to create the indexes and foreign keys after the bulk load and to delete or disable them before the load. I’ll do a test case on this below.
Lets prepare the system:
postgres=# alter system set fsync='off'; ALTER SYSTEM postgres=# alter system set synchronous_commit='off'; ALTER SYSTEM postgres=# alter system set full_page_writes='off'; ALTER SYSTEM postgres=# alter system set bgwriter_lru_maxpages=0; ALTER SYSTEM postgres=# alter system set wal_level='minimal'; ALTER SYSTEM postgres=# alter system set archive_mode='off'; ALTER SYSTEM postgres=# alter system set work_mem='64MB'; ALTER SYSTEM postgres=# alter system set checkpoint_segments=32; ALTER SYSTEM postgres=# alter system set max_wal_senders=0; ALTER SYSTEM postgres=# alter system set maintenance_work_mem='64MB'; ALTER SYSTEM postgres=# alter system set shared_buffers='128MB'; ALTER SYSTEM
As some of the parameters require a restart of the database lets do this:
pg_ctl -D /u02/pgdata/PG1/ restart -m fast
Ready for the test cases. As said earlier I’ll use the same file to load the data (1000001 records):
postgres@oel7:/home/postgres/ [PG1] ls -la demo.txt -rw-r--r--. 1 postgres postgres 16690526 May 4 15:53 demo.txt postgres@oel7:/home/postgres/ [PG1] cat demo.txt | wc -l 1000001 postgres@oel7:/home/postgres/ [PG1] head -2 demo.txt 0 Marc 564 1 Bill 345
First test case: No indexes
The first test case will be without any indexes and looks like this:
\timing off drop table demo; create table demo ( id int , text varchar(15) , number int ); \timing on copy DEMO from '/home/postgres/demo.txt';
How long does it take? I’ll execute the test several times to get an average:
postgres=# \i 1mio_copy_no_indexes.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 467.219 ms postgres=# \i 1mio_copy_no_indexes.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 460.730 ms postgres=# \i 1mio_copy_no_indexes.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 680.538 ms postgres=# \i 1mio_copy_no_indexes.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 444.932 ms postgres=# select count(*) from demo; count --------- 1000001 (1 row) Time: 108.792 ms
Between 0,4 and and 0,7 seconds for 1000001 rows.
Second test case: Primary key exists prior to the load
This test case creates a primary key (and thus an index) before the load is started:
\timing off drop table demo; create table demo ( id int , text varchar(15) , number int , constraint demo_pk primary key ( id ) ); \timing on copy DEMO from '/home/postgres/demo.txt';
The results ( Again several executions to get an average ):
postgres=# \i 1mio_copy_pk_before_load.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 5721.706 ms postgres=# \i 1mio_copy_pk_before_load.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 6653.489 ms postgres=# \i 1mio_copy_pk_before_load.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 3979.691 ms
Something between 3,9 and 6,6 seconds.
Third test case: Primary key after the load
In this case the primary key will be added after the load:
\timing off drop table demo; create table demo ( id int , text varchar(15) , number int ); \timing on copy DEMO from '/home/postgres/demo.txt'; alter table demo add constraint demo_pk primary key (id);
The results ( Again several executions to get an average ):
postgres=# \i 1mio_copy_pk_after_load.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 519.758 ms ALTER TABLE Time: 555.851 ms postgres=# \i 1mio_copy_pk_after_load.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 626.097 ms ALTER TABLE Time: 654.467 ms postgres=# \i 1mio_copy_pk_after_load.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 756.139 ms ALTER TABLE Time: 694.572 ms
Something between 1,2 and 1,5 seconds.
Conclusion:
If you need indexes (and who does not) it is definitely faster do drop or disable from before the load and create them afterwards.
References:
Populating a Database
PostgreSQL as an In-Memory Only Database