The last post took a look at how to efficiently load 1m rows into a table in PostgreSQL. In this post I’ll take a look on how to do the same with semi structured data.
Again it is the same system as in the last post using the same PostgreSQL parameters.
Using the hstore datatype
For storing key/value pairs there is the hstore extension in PostgreSQL.
Using the same sample data as in the last post this is the script to load 1m of key/value rows:
\timing off create extension if not exists hstore; drop table demo_hstore; create table demo_hstore ( document hstore ); \timing on DO $$DECLARE l_people_array varchar(15)[12] := '{"Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy","Greg", "Steve", "Kristina", "Katie", "Jeff"}'; n int; lv_hstore_string varchar(80); BEGIN for i in 0..1e6 loop n:=trunc(random()*1000+1); lv_hstore_string := 'id => '||i||', text => "'||l_people_array[floor((random()*11))+1::int]||'", number => '||n; insert into DEMO_HSTORE values(lv_hstore_string::hstore); end loop; END$$;
Lets execute it:
postgres=# \i 1mio_rows_no_pk_hstore.sql Timing is off. psql:1mio_rows_no_pk_hstore.sql:2: NOTICE: extension "hstore" already exists, skipping CREATE EXTENSION DROP TABLE CREATE TABLE Timing is on. DO Time: 5833.749 ms postgres=# \i 1mio_rows_no_pk_hstore.sql Timing is off. psql:1mio_rows_no_pk_hstore.sql:2: NOTICE: extension "hstore" already exists, skipping CREATE EXTENSION DROP TABLE CREATE TABLE Timing is on. DO Time: 5741.429 ms postgres=# \i 1mio_rows_no_pk_hstore.sql Timing is off. psql:1mio_rows_no_pk_hstore.sql:2: NOTICE: extension "hstore" already exists, skipping CREATE EXTENSION DROP TABLE CREATE TABLE Timing is on. DO Time: 5789.512 ms
Around 6 seconds. Lets unload this with copy and reload it using copy:
postgres=# copy demo_hstore to '/home/postgres/demo_hstore.txt'; COPY 1000001 Time: 285.805 ms
Loading it with copy should be a lot faster:
\timing off drop table demo_hstore; create table demo_hstore ( document hstore ); \timing on copy DEMO_HSTORE from '/home/postgres/demo_hstore.txt';
How much faster?
postgres=# \i 1mio_copy_no_indexes_hstore.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 2736.709 ms postgres=# \i 1mio_copy_no_indexes_hstore.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 2882.477 ms postgres=# \i 1mio_copy_no_indexes_hstore.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 3626.356 ms
More than 2 times faster. Your data now looks like this:
postgres=# select * from demo_hstore limit 5; document ---------------------------------------------- "id"=>"0", "text"=>"Bill", "number"=>"424" "id"=>"1", "text"=>"George", "number"=>"681" "id"=>"2", "text"=>"Eliot", "number"=>"79" "id"=>"3", "text"=>"George", "number"=>"472" "id"=>"4", "text"=>"Greg", "number"=>"966" (5 rows) Time: 15.175 ms
Using the jsonb datatype
PostgreSQL provides two data types for storing json. json and jsonb. jsonb is the binary format data type introduced in 9.4 while json is based on text.
So, lets do the same tests:
\timing off drop table demo_jsonb; create table demo_jsonb ( document jsonb ); \timing on DO $$DECLARE l_people_array varchar(15)[12] := '{"Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy","Greg", "Steve", "Kristina", "Katie", "Jeff"}'; n int; lv_json_string varchar(80); BEGIN for i in 0..1e6 loop n:=trunc(random()*1000+1); lv_json_string := '{"id":"'||i||'", "text":"'||l_people_array[floor((random()*11))+1::int]||'", "number":"'||n||'"}'; insert into DEMO_JSONB values(lv_json_string::jsonb); end loop; END$$;
How long does it take in a loop? :
postgres=# \i 1mio_rows_no_pk_jsonb.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. DO Time: 6229.134 ms postgres=# \i 1mio_rows_no_pk_jsonb.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. DO Time: 6210.610 ms postgres=# \i 1mio_rows_no_pk_jsonb.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. DO Time: 6237.734 ms
Around 6 seconds. And by using copy?
postgres=# copy demo_jsonb to '/home/postgres/demo_jsonb.txt'; COPY 1000001 Time: 501.950 ms
Lets load it:
postgres=# \i 1mio_copy_no_indexes_jsonb.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 3739.681 ms postgres=# \i 1mio_copy_no_indexes_jsonb.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 4369.161 ms postgres=# \i 1mio_copy_no_indexes_jsonb.sql Timing is off. DROP TABLE CREATE TABLE Timing is on. COPY 1000001 Time: 5000.136 ms
Between 4 and 5 seconds. And how does the data look like?
postgres=# select * from demo_jsonb limit 5; document ----------------------------------------------- {"id": "0", "text": "George", "number": "51"} {"id": "1", "text": "Marc", "number": "884"} {"id": "2", "text": "Bill", "number": "951"} {"id": "3", "text": "Steve", "number": "200"} {"id": "4", "text": "Eliot", "number": "722"} (5 rows)
How do you select on this table if you want to limit the results?
postgres=# SELECT document->'id', document->'text', document->'number' FROM demo_jsonb WHERE document @> '{"number": "884"}' limit 5; ?column? | ?column? | ?column? ----------+----------+---------- "1" | "Marc" | "884" "64" | "Katie" | "884" "201" | "Tracy" | "884" "1041" | "George" | "884" "2074" | "Greg" | "884" (5 rows)
Check the documentation for more examples and on how to index jsonb columns.
Looking forward to see Franck and David to load the same data …