In the last post we’ve looked at how you can tell initdb which character set and locale it should use for the initial databases. In this post we’ll look at how you can tell PostgreSQL how you want to have strings sorted.

Looking at my instance, this is what I currently have:

postgres=# \l
                                                        List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | ICU Locale  | ICU Rules |   Access privileges   
-----------+----------+----------+-----------------+-------------+-------------+-------------+-----------+-----------------------
 postgres  | postgres | UTF8     | icu             | en_US.UTF-8 | en_US.UTF-8 | en-US-x-icu |           | 
 template0 | postgres | UTF8     | icu             | en_US.UTF-8 | en_US.UTF-8 | en-US-x-icu |           | =c/postgres          +
           |          |          |                 |             |             |             |           | postgres=CTc/postgres
 template1 | postgres | UTF8     | icu             | en_US.UTF-8 | en_US.UTF-8 | en-US-x-icu |           | =c/postgres          +
           |          |          |                 |             |             |             |           | postgres=CTc/postgres
(3 rows)

Checking how strings are sorted in this setup can easily be verified by creating a small table with a couple of rows:

postgres=# create table t ( a text );
CREATE TABLE
postgres=# insert into t values ('a'),('A'),('ii'),('II'),('*'),('1.1'),('-');
INSERT 0 7
postgres=# select * from t;
  a  
-----
 a
 A
 ii
 II
 *
 1.1
 -
(7 rows)

Sorting this right now gives the following result:

postgres=# select * from t order by 1;
  a  
-----
 -
 *
 1.1
 a
 A
 ii
 II
(7 rows)

Special characters come first, the number afterwards and characters last, lower case before upper case. What happens if we want to sort this with a Swiss/German locale? As mentioned in the previous post: If you want to use the locales provided by glibc, then those need to be installed on the operating system:

postgres=# \! locale -a | grep de_CH
de_CH.utf8
postgres=# select * from t order by a collate "de_CH.utf8";
  a  
-----
 *
 -
 1.1
 a
 A
 ii
 II
(7 rows)

This gives exactly the same result. What about “C” or “POSIX”?

postgres=# select * from t order by a collate "POSIX";
  a  
-----
 *
 -
 1.1
 A
 II
 a
 ii
(7 rows)

Now upper case characters sort before lower characters and the order is not anymore the same. You can also use the ICU collations listed in pg_collation:

postgres=# select * from t order by a collate "de-BE-x-icu";
  a  
-----
 -
 *
 1.1
 a
 A
 ii
 II
(7 rows)

To summarize this: Depending on how you want to have your strings sorted, you need to use the correct collation for this. Not all languages follow the same rules and if you want to support multiple languages things might become a bit more tricky.