Another great feature just got committed for PostgreSQL 17 and hopefully will not be reverted until the final release later this year. You probably already know copy. Whenever you want to load something into or unload something out of PostgreSQL, this is the tool to use. When loading into PostgreSQL there is something to consider up to PostgreSQL 16 which might be very annoying: Whenever there is a line copy cannot write into the target table the whole process will be aborted and nothing will be loaded at all. Maybe this is something you want to have, but for other use cases the preferred behavior would be either to just ignore those lines and/or write the error messages somewhere else. In both cases you would not loose the rest of the data and don’t need to start from scratch after fixing the data to be loaded.
As usual lets start with a little test case: Suppose we have this text file:
1 2 3 4 5 6 7 8 9 | postgres=# \! cat data.txt 1 aaaa 1 aaaa 2 bbbb 2 bbbb 3 cccc 3 cccc 4 dddd 4 dddd 5 eeee 5 eeee 6 ffff 6 ffff 7 gggg 7 gggg 8 hhhh 8 hhhh |
A table which can hold this data might look like this:
1 2 3 4 5 6 7 8 9 10 | postgres=# create table import ( a int , b varchar (4), c int , d varchar (4)); CREATE TABLE postgres=# \d import Table "public.import" Column | Type | Collation | Nullable | Default --------+----------------------+-----------+----------+--------- a | integer | | | b | character varying (4) | | | c | integer | | | d | character varying (4) | | | |
All you need to do, to get that data from the text file into the table is this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | postgres=# copy import from '/home/postgres/data.txt' with (delimiter ' ' ); COPY 8 postgres=# select * from import; a | b | c | d ---+------+---+------ 1 | aaaa | 1 | aaaa 2 | bbbb | 2 | bbbb 3 | cccc | 3 | cccc 4 | dddd | 4 | dddd 5 | eeee | 5 | eeee 6 | ffff | 6 | ffff 7 | gggg | 7 | gggg 8 | hhhh | 8 | hhhh (8 rows ) |
Pretty simple. The trouble might start if the file to be loaded does look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | postgres=# truncate import; TRUNCATE TABLE postgres=# \! cat data.txt2 1 aaaa 1 aaaa 2 bbbb 2 bbbb 3 cccc 3 cccc 4 dddd 4 dddd 5 eeee 5 eeee 6 ffff 6 ffff 7 gggg 7 gggg 8 hhhh 8 hhhh 9 jjjjjjjjj 9 jjjj 10 kkkk 10 kkkk 11 llll 11 lllllll 12 mmmm 12 mmmm 13 nnnn 13 nnnn |
Lines 9 and 12 will definitely not fit into the table and when you try to load this the whole process will fail and nothing will be loaded at all:
1 2 3 4 5 6 7 | postgres=# copy import from '/home/postgres/data.txt2' with (delimiter ' ' ); ERROR: value too long for type character varying (4) CONTEXT: COPY import, line 9, column b: "jjjjjjjjj" postgres=# select * from import; a | b | c | d ---+---+---+--- (0 rows ) |
Starting with PostgreSQL 17 you will have another option:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | postgres=# \h copy Command: COPY Description: copy data between a file and a table Syntax: COPY table_name [ ( column_name [, ...] ) ] FROM { 'filename' | PROGRAM 'command' | STDIN } [ [ WITH ] ( option [, ...] ) ] [ WHERE condition ] COPY { table_name [ ( column_name [, ...] ) ] | ( query ) } TO { 'filename' | PROGRAM 'command' | STDOUT } [ [ WITH ] ( option [, ...] ) ] where option can be one of: FORMAT format_name FREEZE [ boolean ] DELIMITER 'delimiter_character' NULL 'null_string' DEFAULT 'default_string' HEADER [ boolean | MATCH ] QUOTE 'quote_character' ESCAPE 'escape_character' FORCE_QUOTE { ( column_name [, ...] ) | * } FORCE_NOT_NULL { ( column_name [, ...] ) | * } FORCE_NULL { ( column_name [, ...] ) | * } SAVE_ERROR_TO 'location' ENCODING 'encoding_name' |
Using the option “SAVE_ERROR_TO” you can ask PostgreSQL to just ignore any data which does not fit into the target table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | postgres=# copy import from '/home/postgres/data.txt2' with (delimiter ' ' , save_error_to 'none' ); COPY 11 postgres=# select * from import; a | b | c | d ----+------+----+------ 1 | aaaa | 1 | aaaa 2 | bbbb | 2 | bbbb 3 | cccc | 3 | cccc 4 | dddd | 4 | dddd 5 | eeee | 5 | eeee 6 | ffff | 6 | ffff 7 | gggg | 7 | gggg 8 | hhhh | 8 | hhhh 10 | kkkk | 10 | kkkk 12 | mmmm | 12 | mmmm 13 | nnnn | 13 | nnnn (11 rows ) |
Everything is there, except lines 9 and 11. Great. For now you can only ignore the errors or stop on the first error (which is the default, as before). There might be more to come in the future. The commit is here.
Update 25-JAN-2024: Please note that the option for copy has been renamed to “ON_ERROR” and the parameters now are “stop” and “ignore”, see here.