PostgreSQL, as all other relational database systems, needs to persist data on disk, either by writing the writing the write ahead log or by synchronization the data files on disk whenever there is a checkpoint. When it comes to the data files PostgreSQL creates a new one once a relation reaches segment_size, which is 1GB by default. Because of this a relation can consist of many files if the relation is growing. The question I want to look at in this post is, if there can be orphaned files. A file would be orphaned if it is not part/referenced by any relation but the file is still there on disk. That would be a waste of disk space on the one hand but it would also be inconsistent with what is stored in PostgreSQL catalog.
To start with, lets create an empty table:
1
2
|
postgres=# create table t1 ( a int ); CREATE TABLE |
This already created a file on disk and we can ask PostgreSQL which file that actually is:
1
2
3
4
5
|
postgres=# select pg_relation_filepath( 't1' ); pg_relation_filepath ---------------------- base/12724/24577 (1 row) |
Of course this file is also there on the file system and standard operating system utilities can be used to verify this:
1
2
|
postgres@centos8pg: /u02/pgdata/12/ [121] ls -la $PGDATA /base/12724/24577 -rw-------. 1 postgres postgres 0 Nov 13 16:53 /u02/pgdata/12/base/12724/24577 |
Because there is noting in the table yet, the file is empty. PostgreSQL comes with a small utility called oid2name which also can be used to check the location on disk for a given relation:
1
2
3
4
5
|
postgres@centos8pg: /u02/pgdata/12/ [121] oid2name -t t1 -x From database "postgres" : Filenode Table Name Oid Schema Tablespace ------------------------------------------------- 24577 t1 24577 public pg_default |
Once we populate the table with data the file will be growing and once it reaches the segment_site PostgreSQL will add another file to the relation (please notice that my segment_size is 2GB, instead of the default of 1GB):
1
2
3
4
5
6
7
8
9
10
|
postgres=# insert into t1 select * from generate_series(1,1000000); INSERT 0 1000000 postgres=# ! ls -lha $PGDATA/base/12724/24577 -rw -------. 1 postgres postgres 35M Nov 13 17:03 /u02/pgdata/12/base/12724/24577 postgres=# insert into t1 select * from generate_series(1,100000000); INSERT 0 100000000 postgres=# ! ls -la $PGDATA/base/12724/24577* -rw -------. 1 postgres postgres 2147483648 Nov 13 17:07 /u02/pgdata/12/base/12724/24577 -rw -------. 1 postgres postgres 1513545728 Nov 13 17:08 /u02/pgdata/12/base/12724/24577.1 -rw -------. 1 postgres postgres 917504 Nov 13 17:07 /u02/pgdata/12/base/12724/24577_fsm |
The “*_fsm” file is the free space map which will not be discussed here (it is used to track the available space in the data files).
Now that it is clear what happens on disk we’ll come back to the initial question of this post: Can it happen that files on disk do not belong to any relation? Consider this example: In one session we start a new transaction and create an empty table without committing the transaction. Additionally we get the PID of the session and the location on disk:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
postgres=# begin ; BEGIN postgres=# create table t2 ( a int ); CREATE TABLE postgres=# select pg_relation_filepath( 't2' ); pg_relation_filepath ---------------------- base/12724/24580 (1 row) postgres=# select * from pg_backend_pid(); pg_backend_pid ---------------- 7170 (1 row) |
Without committing we can already check the file on disk, as PostgreSQL already created it:
1
2
|
postgres=# ! ls -la $PGDATA/base/12724/24580 -rw -------. 1 postgres postgres 0 Nov 13 17:17 /u02/pgdata/12/base/12724/24580 |
If the server is crashing now, or the out of memory killer kicks in and kills the session: what happens then? We can simulate that quite easy by killing the session without giving it any chance to cleanup (which is signal 9):
1
|
postgres@centos8pg: /home/postgres/ [121] kill -9 7170 |
Going back to the psql session it lost the connection but then immediately reconnects. :
1
2
3
4
5
6
7
8
9
10
|
postgres-# select 1; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. postgres=# select 1; ? column ? ---------- 1 (1 row) |
To summarize: The session was killed before the transaction was able to complete and the transaction already created a table. What must happen is, that the transaction is rolled back and the table must not be there:
1
2
3
|
postgres=# select * from t2; ERROR: relation "t2" does not exist LINE 1: select * from t2; |
This is fine and what is expected. But do we still see the file on disk?
1
2
|
postgres@centos8pg: /home/postgres/ [121] ls -lha $PGDATA /base/12724/24580 -rw-------. 1 postgres postgres 0 Nov 13 17:17 /u02/pgdata/12/base/12724/24580 |
Yes, now we really have an orphaned file which does not belong to any relation PostgreSQL knows of:
1
2
3
4
|
postgres=# select relname from pg_class where oid = '24580' ; relname --------- (0 rows ) |
So there might be cases when you are required to cleanup the files on disk. Image you did huge load like this and just before the load completes the session is killed:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
postgres=# begin ; BEGIN postgres=# create table t3 ( a int ); CREATE TABLE postgres=# select pg_relation_filepath( 't3' ); pg_relation_filepath ---------------------- base/12724/32769 (1 row) postgres=# select * from pg_backend_pid(); pg_backend_pid ---------------- 7577 (1 row) postgres=# insert into t3 select * from generate_series(1,10000000); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. |
Now the file is not empty anymore and disk space is consumed that is for nothing:
1
2
|
postgres@centos8pg: /home/postgres/ [121] ls -lha $PGDATA /base/12724/32769 -rw-------. 1 postgres postgres 235M Nov 13 17:42 /u02/pgdata/12/base/12724/32769 |
In the worst case that could be gigabytes or terabytes of wasted space. Is there at least a way to detect such orphaned files? You would need to compare what PostgreSQL has recorded in the catalog against the file system and then delete everything PostgreSQL is not aware of and this needs to be done very, very carefully.
First of all you’d need to get the OID of the database you want to check:
1
2
3
4
5
|
postgres=# select oid from pg_database where datname = 'postgres' ; oid ------- 12724 (1 row) |
Once you have that you know the location on disk, which is $PGDATA/base/[OID_OF_THE_DATABASE] (I am not considering tablespaces here). From here on you can list all the files that should belong to a relation (the regular expression is testing for numbers as only those are important here):
1
2
3
|
postgres=# select * from pg_ls_dir ( '/u02/pgdata/12/base/12724' ) as file where file ~ '^[0-9]* |
All of these should have entry in pg_class (otherwise PostgreSQL is not aware of them).
Finally, getting the list of orphaned files:
1
|
postgres=# select * from pg_ls_dir ( '/u02/pgdata/12/base/12724' ) as file where file ~ '^[0-9]* |
That would be the files you need to check carefully. If you are sure it is an orphaned file you may delete it (after you make a backup, of course).
1
2
3
4
5
6
7
8
9
|
; file ------- 1255 1247 1249 1259 ... |
All of these should have entry in pg_class (otherwise PostgreSQL is not aware of them).
Finally, getting the list of orphaned files:
That would be the files you need to check carefully. If you are sure it is an orphaned file you may delete it (after you make a backup, of course).
1
|
and file::text not in ( select oid::text from pg_class ); file ------- 16385 24580 32769 (3 rows) |
That would be the files you need to check carefully. If you are sure it is an orphaned file you may delete it (after you make a backup, of course).
; file ——- 1255 1247 1249 1259 …
All of these should have entry in pg_class (otherwise PostgreSQL is not aware of them).
Finally, getting the list of orphaned files:
That would be the files you need to check carefully. If you are sure it is an orphaned file you may delete it (after you make a backup, of course).
jimis
27.02.2024Thank you for this post, it's very helpful for understanding what's going on with GBs of orphaned files, after the WAL writer process died because of full disk.
However you might want to check the last 4 code segments in the post. They don't execute cleanly and they appear mixed up, for example (and I copy paste from your post):
Finally, getting the list of orphaned files:
```
postgres=# select * from pg_ls_dir ( '/u02/pgdata/12/base/12724' ) as file where file ~ '^[0-9]*
```