By Mouhamadou Diaw
I was searching a tool for anonymizing data in a PostgreSQL database and I have tested the extension pg_anonymizer.
PostgreSQL_anonymizer is a set of SQL functions that remove personally identifiable values from a PostgreSQL table and replace them with random-but-plausible values. The goal is to avoid any identification from the data record while remaining suitable for testing, data analysis and data processing.
In this blog I am showing how this extension can be used. I am using a PostgreSQL 10 database.
The first step is to install the extension pg_anonymizer. In my case I did it with with pgxn client
1
2
3
4
5
6
7
8
9
10
11
12
|
[postgres@pgserver2 ~]$ pgxn install postgresql_anonymizer --pg_config /u01/app/postgres/product/10/db_1/bin/pg_config INFO: best version: postgresql_anonymizer 0.0.3 INFO: saving /tmp/tmpVf3psT/postgresql_anonymizer-0 .0.3.zip INFO: unpacking: /tmp/tmpVf3psT/postgresql_anonymizer-0 .0.3.zip INFO: building extension gmake: Nothing to be done for `all'. INFO: installing extension /usr/bin/mkdir -p '/u01/app/postgres/product/10/db_1/share/extension' /usr/bin/mkdir -p '/u01/app/postgres/product/10/db_1/share/extension/anon' /usr/bin/install -c -m 644 . //anon .control '/u01/app/postgres/product/10/db_1/share/extension/' /usr/bin/install -c -m 644 . //anon/anon--0 .0.3.sql '/u01/app/postgres/product/10/db_1/share/extension/anon/' [postgres@pgserver2 ~]$ |
We can then verify that under /u01/app/postgres/product/10/db_1/share/extension we have a file anon.control and a directory named anon
1
2
3
4
5
6
7
8
|
[postgres@pgserver2 extension]$ ls -ltra anon* -rw-r--r--. 1 postgres postgres 167 Sep 13 10:54 anon.control anon: total 18552 drwxrwxr-x. 3 postgres postgres 12288 Sep 13 10:54 .. drwxrwxr-x. 2 postgres postgres 28 Sep 13 10:54 . -rw-r--r--. 1 postgres postgres 18980156 Sep 13 10:54 anon--0.0.3.sql |
Let’s create a database named prod and let’s create the required extensions. tsm_system_rows should delivered by the contrib.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
prod=# c prod You are now connected to database "prod" as user "postgres" . prod=# prod=# CREATE EXTENSION tsm_system_rows;; CREATE EXTENSION prod=# prod=# CREATE EXTENSION anon; CREATE EXTENSION prod=# prod=# dx List of installed extensions Name | Version | Schema | Description -----------------+---------+------------+---------------------------------------------------- -------- anon | 0.0.3 | anon | Data anonymization tools plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language tsm_system_rows | 1.0 | public | TABLESAMPLE method which accepts number of rows as a limit (3 rows ) prod=# |
The extension will create following functions in the schema anon. These functions can be used to mask some data.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
|
prod=# set search_path=anon; SET prod=# df List of functions Schema | Name | Result data type | Argu ment data types | Type --------+--------------------------+--------------------------+------------------------------ ------------------------------------------+-------- anon | random_city | text | | normal anon | random_city_in_country | text | country_name text | normal anon | random_company | text | | normal anon | random_country | text | | normal anon | random_date | timestamp with time zone | | normal anon | random_date_between | timestamp with time zone | date_start timestamp with tim e zone, date_end timestamp with time zone | normal anon | random_email | text | | normal anon | random_first_name | text | | normal anon | random_iban | text | | normal anon | random_int_between | integer | int_start integer , int_stop integer | normal anon | random_last_name | text | | normal anon | random_phone | text | phone_prefix text DEFAULT '0' ::text | normal anon | random_region | text | | normal anon | random_region_in_country | text | country_name text | normal anon | random_siren | text | | normal anon | random_siret | text | | normal anon | random_string | text | l integer | normal anon | random_zip | text | | normal (18 rows ) prod=# |
Now in the database prod let’s create a table with some data.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
prod=# d customers Table "public.customers" Column | Type | Collation | Nullable | Default ------------+-----------------------+-----------+----------+--------- first_name | character varying (30) | | | last_name | character varying (30) | | | email_add | character varying (30) | | | country | character varying (60) | | | iban | character varying (60) | | | amount | integer | | | prod=# prod=# table customers; first_name | last_name | email_add | country | iban | amount ------------+-----------+-------------------------+--------------+----------------------------+------------ Michel | Delco | [email protected] | FRANCE | FR763000600001123456890189 | 5000000 Denise | Blanchot | [email protected] | GERMANY | DE91100000000123456789 | 1000000000 Farid | Dim | [email protected] | Saudi Arabia | SA4420000001234567891234 | 2500000 (3 rows ) prod=# |
Let’s say that I want some people to access to all data for this table, but I don’t want them to see the real email, the real country and the real iban of the customers.
One solution should be to create a view with anonymous data for these columns. This will replace them with random-but-plausible values for these columns
1
2
|
prod=# create view Customers_anon as select first_name as Firstname ,last_name as Lastnmame,anon.random_email() as Email ,anon.random_country() as Country, anon.random_iban() as Iban ,amount as Amount from customers; CREATE VIEW |
And then grant the access privilege to concerned people
1
2
3
4
5
6
7
8
9
|
prod=# select * from customers_anon ; firstname | lastnmame | email | country | iban | amount -----------+-----------+-------------------------------+---------+----------------------------+------------ Michel | Delco | [email protected] | Spain | AD1111112222C3C3C3C3C3C3 | 5000000 Denise | Blanchot | [email protected] | Myanmar | AD1111112222C3C3C3C3C3C3 | 1000000000 Farid | Dim | [email protected] | India | AD1111112222C3C3C3C3C3C3 | 2500000 (3 rows ) prod=# |
Thanks to Damien for this project. Just not that this project is in its early development and should be used with care (see Damien comment below).