dbi Blog

Another great feature just got committed for PostgreSQL 17 and hopefully will not be reverted until the final release later this year. You probably already know copy. Whenever you want to load something into or unload something out of PostgreSQL, this is the tool to use. When loading into PostgreSQL there is something to consider up to PostgreSQL 16 which might be very annoying: Whenever there is a line copy cannot write into the target table the whole process will be aborted and nothing will be loaded at all. Maybe this is something you want to have, but for other use cases the preferred behavior would be either to just ignore those lines and/or write the error messages somewhere else. In both cases you would not loose the rest of the data and don’t need to start from scratch after fixing the data to be loaded.

As usual lets start with a little test case: Suppose we have this text file:

postgres=# \! cat data.txt
1 aaaa 1 aaaa
2 bbbb 2 bbbb
3 cccc 3 cccc
4 dddd 4 dddd
5 eeee 5 eeee
6 ffff 6 ffff
7 gggg 7 gggg
8 hhhh 8 hhhh

A table which can hold this data might look like this:

postgres=# create table import ( a int, b varchar(4), c int, d varchar(4));
CREATE TABLE
postgres=# \d import
                     Table "public.import"
 Column |         Type         | Collation | Nullable | Default 
--------+----------------------+-----------+----------+---------
 a      | integer              |           |          | 
 b      | character varying(4) |           |          | 
 c      | integer              |           |          | 
 d      | character varying(4) |           |          |

All you need to do, to get that data from the text file into the table is this:

postgres=# copy import from '/home/postgres/data.txt' with (delimiter ' ');
COPY 8
postgres=# select * from import;
 a |  b   | c |  d   
---+------+---+------
 1 | aaaa | 1 | aaaa
 2 | bbbb | 2 | bbbb
 3 | cccc | 3 | cccc
 4 | dddd | 4 | dddd
 5 | eeee | 5 | eeee
 6 | ffff | 6 | ffff
 7 | gggg | 7 | gggg
 8 | hhhh | 8 | hhhh
(8 rows)

Pretty simple. The trouble might start if the file to be loaded does look like this:

postgres=# truncate import;
TRUNCATE TABLE
postgres=# \! cat data.txt2
1 aaaa 1 aaaa
2 bbbb 2 bbbb
3 cccc 3 cccc
4 dddd 4 dddd
5 eeee 5 eeee
6 ffff 6 ffff
7 gggg 7 gggg
8 hhhh 8 hhhh
9 jjjjjjjjj 9 jjjj
10 kkkk 10 kkkk
11 llll 11 lllllll
12 mmmm 12 mmmm
13 nnnn 13 nnnn

Lines 9 and 12 will definitely not fit into the table and when you try to load this the whole process will fail and nothing will be loaded at all:

postgres=# copy import from '/home/postgres/data.txt2' with (delimiter ' ');
ERROR:  value too long for type character varying(4)
CONTEXT:  COPY import, line 9, column b: "jjjjjjjjj"
postgres=# select * from import;
 a | b | c | d 
---+---+---+---
(0 rows)

Starting with PostgreSQL 17 you will have another option:

postgres=# \h copy
Command:     COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:

    FORMAT format_name
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    DEFAULT 'default_string'
    HEADER [ boolean | MATCH ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL { ( column_name [, ...] ) | * }
    FORCE_NULL { ( column_name [, ...] ) | * }
    SAVE_ERROR_TO 'location'
    ENCODING 'encoding_name'

URL: https://www.postgresql.org/docs/devel/sql-copy.html

Using the option “SAVE_ERROR_TO” you can ask PostgreSQL to just ignore any data which does not fit into the target table:

postgres=# copy import from '/home/postgres/data.txt2' with (delimiter ' ', save_error_to 'none');
COPY 11
postgres=# select * from import;
 a  |  b   | c  |  d   
----+------+----+------
  1 | aaaa |  1 | aaaa
  2 | bbbb |  2 | bbbb
  3 | cccc |  3 | cccc
  4 | dddd |  4 | dddd
  5 | eeee |  5 | eeee
  6 | ffff |  6 | ffff
  7 | gggg |  7 | gggg
  8 | hhhh |  8 | hhhh
 10 | kkkk | 10 | kkkk
 12 | mmmm | 12 | mmmm
 13 | nnnn | 13 | nnnn
(11 rows)

Everything is there, except lines 9 and 11. Great. For now you can only ignore the errors or stop on the first error (which is the default, as before). There might be more to come in the future. The commit is here.

Update 25-JAN-2024: Please note that the option for copy has been renamed to “ON_ERROR” and the parameters now are “stop” and “ignore”, see here.

Post Views: 3,942

PostgreSQL 17: copy and SAVE_ERROR_TO

Daniel Westermann

Leave a Reply:

Related blog articles

Daniel Westermann

Leave a Reply:

Related blog articles

Commercial PostgreSQL distributions with TDE (1) Fujitsu Enterprise Postgres (2) TDE

SQL Server 2022 CU23: Database Mail is broken, but your alerts shouldn’t be

Commercial PostgreSQL distributions with TDE (1) Fujitsu Enterprise Postgres (1) Setup

PostgreSQL: Transaction IDs and Virtual Transaction IDs