{"id":15655,"date":"2021-02-23T10:54:48","date_gmt":"2021-02-23T09:54:48","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/"},"modified":"2021-02-23T10:54:48","modified_gmt":"2021-02-23T09:54:48","slug":"8000-postgresql-clusters-on-one-host","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/","title":{"rendered":"8000 PostgreSQL clusters on one host?"},"content":{"rendered":"<p>Coming back to the last post: <a href=\"https:\/\/www.dbi-services.com\/blog\/8000-databases-in-one-postgresql-cluster\/\" target=\"_blank\" rel=\"noopener\">8000 databases in one PostgreSQL cluster?<\/a>. As we&#8217;ve seen this is doable but comes with a lot of pain points. Another solution for this kind of requirement would be to host 8000 PostgreSQL clusters on one host. Again, I am not saying this is something you should do, and again, this is not how it finally will be implemented. This is more about if you can do it, and what is means for operating such a setup. Eight thousand is a huge number, no matter if it is about this number of databases in a single cluster, or this number of clusters on a single node.<\/p>\n<p><!--more--><\/p>\n<p>The default PostgreSQL configuration on Linux comes with 128MB of <a href=\"https:\/\/www.postgresql.org\/docs\/13\/runtime-config-resource.html#GUC-SHARED-BUFFERS\" target=\"_blank\" rel=\"noopener\">shared_buffers<\/a>. Again, if we do the math, that would be 1&#8217;024&#8217;000MB for 8000 clusters on a single host. Surprisingly I do not have such a machine available right now. We&#8217;ll use a AWS EC2 <a href=\"https:\/\/aws.amazon.com\/ec2\/instance-types\/\" target=\"_blank\" rel=\"noopener\">t3a.xlarge<\/a> which comes with 4vCPUs and 16GB of memory. If we reduce shared_buffers to 128kB (which is the minimum), we come to 1024000kB, which is around 8GB of shared_buffers for all clusters. 
This should be doable, not counting any other memory-related parameters.<\/p>\n<p>The next bit we need to calculate is the required disk space. A freshly initialized PGDATA for PostgreSQL 13.1 consumes around 39MB. For 8000 clusters we will therefore need 312&#8217;000MB of disk space. So, here we go:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] grep MemTotal \/proc\/meminfo \nMemTotal:       16284668 kB\npostgres@pg13:\/home\/postgres\/ [pg131] grep proc \/proc\/cpuinfo \nprocessor       : 0\nprocessor       : 1\nprocessor       : 2\nprocessor       : 3\npostgres@pg13:\/home\/postgres\/ [pg131] df -h\nFilesystem       Size  Used Avail Use% Mounted on\nudev             7.8G     0  7.8G   0% \/dev\ntmpfs            1.6G  8.4M  1.6G   1% \/run\n\/dev\/nvme0n1p1   492G  2.9G  469G   1% \/\ntmpfs            7.8G     0  7.8G   0% \/dev\/shm\ntmpfs            5.0M     0  5.0M   0% \/run\/lock\ntmpfs            7.8G     0  7.8G   0% \/sys\/fs\/cgroup\n\/dev\/nvme0n1p15  124M  262K  124M   1% \/boot\/efi\ntmpfs            1.6G     0  1.6G   0% \/run\/user\/1000\n<\/pre>\n<p>Initializing 8000 clusters will take some time, but the procedure is quite simple. We need a different directory for each PGDATA and a different port for each cluster. In addition we need to reduce shared_buffers to 128kB and then start up all the clusters. 
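Two more back-of-the-envelope numbers before we start: the disk footprint (assuming roughly 39MB per fresh PGDATA, consistent with the 312&#8217;000MB total) and the port range the clusters will occupy:

```shell
# Disk: 8000 data directories at roughly 39MB each
echo "$(( 8000 * 39 )) MB"          # 312000 MB
echo "$(( 8000 * 39 / 1024 )) GB"   # roughly 304 GB
# Ports: one per cluster, starting at 5432
echo $(( 5432 + 8000 - 1 ))         # 13431, the port of the last cluster
```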
First, let&#8217;s create all the data directories and check if we run into any issues with this already:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\nport=\"5432\"\nfor i in {1..8000}\ndo\n    PGDATA=\"\/u02\/pgdata\/pgdata${i}\"\n    initdb -D ${PGDATA} &gt; \/dev\/null 2&gt;&amp;1\n    echo \"port=${port}\" &gt;&gt; ${PGDATA}\/postgresql.auto.conf\n    echo \"shared_buffers=128kB\" &gt;&gt; ${PGDATA}\/postgresql.auto.conf\n    (( port += 1 ))\ndone\n<\/pre>\n<p>If you want to monitor the progress, you might do something like this:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] watch \"ls -l \/u02\/pgdata | grep -v total | wc -l\"\nEvery 2.0s: ls -l \/u02\/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 10:50:41 2021\n\n146\n...\nEvery 2.0s: ls -l \/u02\/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 11:06:14 2021\n\n1197\n...\nEvery 2.0s: ls -l \/u02\/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 11:13:27 2021\n\n1696\nEvery 2.0s: ls -l \/u02\/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 11:54:09 2021\n\n4452\nEvery 2.0s: ls -l \/u02\/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 12:19:44 2021\n\n6116\nEvery 2.0s: ls -l \/u02\/pgdata | grep -v total | wc -l                                                 pg13: Fri Feb 12 13:35:24 2021\n\n8000\n<\/pre>\n<p>That takes quite some time. How much space are we consuming?<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] du -sh \/u02\/pgdata\/\n308G    \/u02\/pgdata\/\n<\/pre>\n<p>Not far from what we calculated above. Before we try to start all the clusters, there is another bit we need to think about. 
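Dividing the observed usage by the number of clusters confirms the per-PGDATA size:

```shell
# 308GB observed for 8000 freshly initialized data directories
echo "$(( 308 * 1024 / 8000 )) MB per PGDATA"   # roughly 39 MB each
```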
If we start up a PostgreSQL 13.1 instance, how many processes do we get by default?<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@debian10pg:\/home\/postgres\/ [pg131] ps -ef | egrep \"postgres -D|postgres:\"\npostgres 28408     1  0 07:42 ?        00:00:00 \/u01\/app\/postgres\/product\/DEV\/db_1\/bin\/postgres -D \/u02\/pgdata\/DEV\npostgres 28409 28408  0 07:42 ?        00:00:00 postgres: logger \npostgres 28411 28408  0 07:42 ?        00:00:00 postgres: checkpointer \npostgres 28412 28408  0 07:42 ?        00:00:00 postgres: background writer \npostgres 28413 28408  0 07:42 ?        00:00:00 postgres: walwriter \npostgres 28414 28408  0 07:42 ?        00:00:00 postgres: autovacuum launcher \npostgres 28415 28408  0 07:42 ?        00:00:00 postgres: stats collector \npostgres 28416 28408  0 07:42 ?        00:00:00 postgres: logical replication launcher \npostgres@debian10pg:\/home\/postgres\/ [pg131] ps -ef | egrep \"postgres -D|postgres:\" | wc -l\n8\n<\/pre>\n<p>That&#8217;s eight per cluster; multiplied by 8000, that makes 64&#8217;000 processes. 
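That number alone is already telling when compared with the default per-user process limit shown in the next listing:

```shell
# 8000 clusters at 8 processes each
echo $(( 8000 * 8 ))   # 64000 -- already above the default
                       # "max user processes" limit of 63541
```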
In the default configuration on Debian we have these limits:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1; highlight: [9,15]\">\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -a\ncore file size          (blocks, -c) 0\ndata seg size           (kbytes, -d) unlimited\nscheduling priority             (-e) 0\nfile size               (blocks, -f) unlimited\npending signals                 (-i) 63541\nmax locked memory       (kbytes, -l) 65536\nmax memory size         (kbytes, -m) unlimited\nopen files                      (-n) 1024\npipe size            (512 bytes, -p) 8\nPOSIX message queues     (bytes, -q) 819200\nreal-time priority              (-r) 0\nstack size              (kbytes, -s) 8192\ncpu time               (seconds, -t) unlimited\nmax user processes              (-u) 63541\nvirtual memory          (kbytes, -v) unlimited\nfile locks                      (-x) unlimited\n<\/pre>\n<p>This will for sure not work, as we can easily verify:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\nport=\"5432\"\nfor i in {1..8000}\ndo\n    PGDATA=\"\/u02\/pgdata\/pgdata${i}\"\n    pg_ctl start -D ${PGDATA} -l \/dev\/null &gt; \/dev\/null 2&gt;&amp;1\n    (( port += 1 ))\ndone\n<\/pre>\n<p>This will go fine until around 1500 postmasters have been started:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\nunable to fork process: Resource temporarily unavailable\n<\/pre>\n<p>&#8230; so we are clearly hitting some limits here. 
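To see exactly which cluster is the first one to fail, the start loop can be wrapped so it stops and reports the failing index. This is a sketch; `start_all` is a hypothetical helper, the pg_ctl invocation and paths are the same as in the loops above:

```shell
# start_all N: try to start clusters 1..N and print the index of the
# first one that fails to start (prints 0 if all N started)
start_all() {
    local n=$1 i
    for i in $(seq 1 "$n"); do
        PGDATA="/u02/pgdata/pgdata${i}"
        if ! pg_ctl start -D "${PGDATA}" -l /dev/null > /dev/null 2>&1; then
            echo "$i"
            return
        fi
    done
    echo 0
}
```

On the t3a.xlarge, `start_all 8000` should report an index around 1500, matching the fork failures above.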
Let&#8217;s increase the limits on the maximum number of open files and processes for the postgres user (-1 means unlimited, only hard limits here):<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] cat \/etc\/security\/limits.conf  | egrep -v \"^#|^$\"\npostgres         hard     nproc          -1\npostgres         hard     nofile         -1\n<\/pre>\n<p>After restarting the session we can check the actual value of the hard limits:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -H -n\n1048576\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -H -u\nunlimited\n<\/pre>\n<p>This means we should be able to create an unlimited number of user processes, and we should be able to open 1048576 files (this seems to be the maximum on Debian). As these are the hard limits, we need to increase the soft limits as well:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -S -u unlimited\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -S -n 1048576\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -a\ncore file size          (blocks, -c) 0\ndata seg size           (kbytes, -d) unlimited\nscheduling priority             (-e) 0\nfile size               (blocks, -f) unlimited\npending signals                 (-i) 63540\nmax locked memory       (kbytes, -l) 65536\nmax memory size         (kbytes, -m) unlimited\nopen files                      (-n) 1048576\npipe size            (512 bytes, -p) 8\nPOSIX message queues     (bytes, -q) 819200\nreal-time priority              (-r) 0\nstack size              (kbytes, -s) 8192\ncpu time               (seconds, -t) unlimited\nmax user processes              (-u) unlimited\nvirtual memory          (kbytes, -v) unlimited\nfile locks                      (-x) unlimited\n<\/pre>\n<p>Let&#8217;s check if that is sufficient for starting the 8000 clusters:<\/p>\n<pre class=\"brush: bash; gutter: 
true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] port=\"5432\"\npostgres@pg13:\/home\/postgres\/ [pg131] for i in {1..8000}\n&gt; do\n&gt;     PGDATA=\"\/u02\/pgdata\/pgdata${i}\"\n&gt;     pg_ctl start -D ${PGDATA}\/  -l \/dev\/null 2&gt;&amp;1 &gt; \/dev\/null\n&gt;     (( port += 1 ))\n&gt; done\n<\/pre>\n<p>No, we&#8217;ll still hit limits (again around 1500 clusters):<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\nno data was returned by command \"\"\/u01\/app\/postgres\/product\/13\/db_1\/bin\/postgres\" -V\"\nThe program \"postgres\" is needed by pg_ctl but was not found in the\nsame directory as \"\/u01\/app\/postgres\/product\/13\/db_1\/bin\/pg_ctl\".\nCheck your installation.\nsh: 1: Cannot fork\n<\/pre>\n<p>Let&#8217;s try a larger instance type: <a href=\"https:\/\/aws.amazon.com\/ec2\/instance-types\/c5\/\" target=\"_blank\" rel=\"noopener\">c5.9xlarge<\/a>. This one comes with 36 vCPUs and 72GiB of memory. Will that work?<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] cat \/proc\/cpuinfo | grep proc | wc -l\n36\npostgres@pg13:\/home\/postgres\/ [pg131] cat \/proc\/meminfo | grep Mem\nMemTotal:       72033208 kB\nMemFree:        71577940 kB\nMemAvailable:   71139240 kB\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -S -u unlimited\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -S -n 1048576\npostgres@pg13:\/home\/postgres\/ [pg131] for i in {1..8000}\n&gt; do\n&gt;     PGDATA=\"\/u02\/pgdata\/pgdata${i}\"\n&gt;     pg_ctl start -D ${PGDATA}\/  -l \/dev\/null 2&gt;&amp;1 &gt; \/dev\/null\n&gt;     (( port += 1 ))\n&gt; done\n<\/pre>\n<p>We can go a bit further and come to approx. 
1740 running clusters, but this is far from 8000:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] ps -ef | grep postgres | grep \"postgres -D\" | wc -l\n1736\n<\/pre>\n<p>If we increase even more and go to <a href=\"https:\/\/aws.amazon.com\/ec2\/instance-types\/c5\/\" target=\"_blank\" rel=\"noopener\">c5a.24xlarge<\/a> (96 vCPUs and 192GiB of memory) we&#8217;ll come to:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] cat \/proc\/cpuinfo | grep proc | wc -l\n96\npostgres@pg13:\/home\/postgres\/ [pg131] cat \/proc\/meminfo | grep Mem\nMemTotal:       195998200 kB\nMemFree:        195223608 kB\nMemAvailable:   194067016 kB\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -S -u unlimited\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -S -n 1048576\npostgres@pg13:\/home\/postgres\/ [pg131] for i in {1..8000}; do     PGDATA=\"\/u02\/pgdata\/pgdata${i}\";     pg_ctl start -D ${PGDATA}\/  -l \/dev\/null 2&gt;&amp;1 &gt; \/dev\/null;     (( port += 1 )); done\n<\/pre>\n<p>This will give us around 4096 clusters, which is a bit more than half of the 8000 we wanted. But this also means that there is another limit we&#8217;re hitting. Four thousand clusters will consume around 32&#8217;000 processes, and we&#8217;re not close to the limit we&#8217;ve set above. What is it then? Increasing the number of vCPUs clearly gave us more running clusters, so there must be a relation between the number of vCPUs and the number of processes we can start. 
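The relation comes from how the kernel sizes the default pid_max: roughly 1024 PIDs per CPU (PIDS_PER_CPU_DEFAULT in include/linux/threads.h), with a floor of 32768:

```shell
# Default pid_max scales with the CPU count, but never drops below 32768
echo $(( 96 * 1024 ))   # 98304 -- the value seen on the 96-vCPU instance
echo $(( 4 * 1024 ))    # 4096, below the floor, so such a box gets 32768
```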
There is a kernel parameter which limits the maximum number of processes one can start on a Linux system:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] cat \/proc\/sys\/kernel\/pid_max\n98304\npostgres@pg13:\/home\/postgres\/ [pg131] man proc 5\n       \/proc\/sys\/kernel\/pid_max (since Linux 2.5.34)\n              This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID).  PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on  the  total\n              number  of  processes and threads.  The default value for this file, 32768, results in the same range of PIDs as on earlier kernels.  On 32-bit platforms, 32768 is the maximum value for pid_max.  On 64-bit systems, pid_max can be set to any value up\n              to 2^22 (PID_MAX_LIMIT, approximately 4 million).\n\n<\/pre>\n<p>We&#8217;re already higher than the default, which is 32768, and this value depends on the number of CPUs (more information <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/master\/include\/linux\/threads.h\" target=\"_blank\" rel=\"noopener\">here<\/a>). This is not a hard wall at exactly that number: once it is reached, PID allocation wraps around and free numbers are reused, but if no free PIDs are left, no additional process can be started. If we scale down our instance to c5.large (4vCPUs and 8GiB of memory) we&#8217;ll again get the default:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] cat \/proc\/sys\/kernel\/pid_max\n32768\n<\/pre>\n<p>Although we should not hit the limit of 98304, let&#8217;s try to increase it anyway and do another run. 
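The new value is simply double the current one, leaving plenty of headroom over the 64&#8217;000 PostgreSQL processes plus whatever else runs on the box:

```shell
# Doubling the current pid_max of 98304 gives comfortable headroom
echo $(( 2 * 98304 ))   # 196608, the value set in the next step
```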
This time we&#8217;ll also have a look at the memory usage:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] sudo bash\nroot@pg13:\/home\/postgres$ echo \"196608\" &gt; \/proc\/sys\/kernel\/pid_max\nroot@pg13:\/home\/postgres$ exit\npostgres@pg13:\/home\/postgres\/ [pg131] cat \/proc\/sys\/kernel\/pid_max\n196608\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -S -u unlimited\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -S -n 1048576\npostgres@pg13:\/home\/postgres\/ [pg131] for i in {1..8000}\n&gt; do\n&gt;     PGDATA=\"\/u02\/pgdata\/pgdata${i}\"\n&gt;     pg_ctl start -D ${PGDATA}\/  -l \/dev\/null 2&gt;&amp;1 &gt; \/dev\/null\n&gt;     (( port += 1 ))\n&gt; done\n<\/pre>\n<p>In a second session monitor memory usage:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] watch \"free -m\"\n## with approx.1000 clusters\nEvery 2.0s: free -m                                                                                   pg13: Tue Feb 16 13:16:50 2021\n\n              total        used        free      shared  buff\/cache   available\nMem:         189152        6412      177810        4183        4929      177353\nSwap:             0           0           0\n\n## with approx. 2000 clusters\nEvery 2.0s: free -m                                                                                   pg13: Tue Feb 16 13:18:38 2021\n\n              total        used        free      shared  buff\/cache   available\nMem:         189152       12098      167470        8276        9583      167516\nSwap:             0           0           0\n\n## with approx. 
3000 clusters\nEvery 2.0s: free -m                                                                                   pg13: Tue Feb 16 13:20:26 2021\n\n              total        used        free      shared  buff\/cache   available\nMem:         189152       17803      157119       12369       14229      157664\nSwap:             0           0           0\n\n## with approx. 4000 clusters\nEvery 2.0s: free -m                                                                                   pg13: Tue Feb 16 13:22:16 2021\n\n              total        used        free      shared  buff\/cache   available\nMem:         189152       23673      146495       16540       18983      147562\nSwap:             0           0           0\n<\/pre>\n<p>&#8230; and it still fails with a bit more than 4000 clusters. The issue is not the overall memory consumption; we still have plenty of memory left. The answer is in the PostgreSQL startup log:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\n2021-02-16 13:43:17.990 UTC [97614] FATAL:  could not create shared memory segment: No space left on device\n2021-02-16 13:43:17.990 UTC [97614] DETAIL:  Failed system call was shmget(key=21380032, size=56, 03600).\n2021-02-16 13:43:17.990 UTC [97614] HINT:  This error does *not* mean that you have run out of disk space.  It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, or because the system's overall limit for shared memory has been reached.\n        The PostgreSQL documentation contains more information about shared memory configuration.\n2021-02-16 13:43:17.990 UTC [97614] LOG:  database system is shut down\n<\/pre>\n<p>So we need to increase <a href=\"https:\/\/www.postgresql.org\/docs\/13\/kernel-resources.html\" target=\"_blank\" rel=\"noopener\">shmmni<\/a> so that enough shared memory segments are available. 
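The numbers line up: each postmaster allocates one small System V shared memory segment at startup, and the Linux default for kernel.shmmni is 4096, which matches the ceiling of roughly 4000 clusters we just hit:

```shell
# Each postmaster creates one small SysV shared memory segment, so the
# number of clusters is capped by kernel.shmmni (4096 by default)
cat /proc/sys/kernel/shmmni
```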
Next try:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] sudo sysctl -w kernel.shmmni=8192\nkernel.shmmni = 8192\npostgres@pg13:\/home\/postgres\/ [pg131] sysctl -p\npostgres@pg13:\/home\/postgres\/ [pg131] sudo sysctl -a | grep kernel.shmmni\nkernel.shmmni = 8192\npostgres@pg13:\/home\/postgres\/ [pg131] for i in {1..8000}\n&gt; do\n&gt;     PGDATA=\"\/u02\/pgdata\/pgdata${i}\"\n&gt;     pg_ctl start -D ${PGDATA}\/  -l \/dev\/null 2&gt;&amp;1 &gt; \/dev\/null\n&gt;     (( port += 1 ))\n&gt; done\n\nsh: 1: Cannot fork\nno data was returned by command \"\"\/u01\/app\/postgres\/product\/13\/db_1\/bin\/postgres\" -V\"\nThe program \"postgres\" is needed by pg_ctl but was not found in the\nsame directory as \"\/u01\/app\/postgres\/product\/13\/db_1\/bin\/pg_ctl\".\nCheck your installation.\n<\/pre>\n<p>Next limit hit: now we come to around 4600 clusters, which is better, but still not the 8000. This time we are hitting the number of open files:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] sudo lsof -u postgres 2&gt;\/dev\/null | wc -l\n1884863\npostgres@pg13:\/home\/postgres\/ [pg131] ulimit -Hn\n1048576\n<\/pre>\n<p>But even if we increase the system-wide limit we&#8217;ll run into the same issue, because we cannot raise the per-user limit above 1048576:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] sudo sysctl -w fs.file-max=100000000\nfs.file-max = 100000000\npostgres@pg13:\/home\/postgres\/ [pg131] sudo sysctl -a | grep fs.file-max\nfs.file-max = 100000000\n<\/pre>\n<p>The reason is in the sshd_config:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131]  sudo cat \/etc\/ssh\/sshd_config | grep -i UsePam\nUsePAM yes\n<\/pre>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Linux_PAM\" target=\"_blank\" rel=\"noopener\">PAM<\/a> is 
responsible for setting the user limits. Let&#8217;s skip that by turning UsePAM off and restarting sshd:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@pg13:\/home\/postgres\/ [pg131] sudo sed -i 's\/UsePAM yes\/UsePAM no\/g' \/etc\/ssh\/sshd_config\npostgres@pg13:\/home\/postgres\/ [pg131] grep UsePAM \/etc\/ssh\/sshd_config\nUsePAM no\npostgres@pg13:\/home\/postgres\/ [pg131] sudo systemctl restart sshd\n<\/pre>\n<p>Once sshd has been restarted, let&#8217;s try to open a new ssh session:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\ndwe@ltdwe:~$ ssh -i \/home\/dwe\/Documents\/aws\/dwe-key-pair.pem admin@18.195.216.2\nadmin@18.195.216.2: Permission denied (publickey).\n<\/pre>\n<p>Next issue: we are not able to log in anymore. So UsePAM has to stay enabled; the way to do it is to comment out the line in \/etc\/pam.d\/sshd which enforces the limits:<\/p>\n<pre class=\"brush: sql; gutter: true; first-line: 1; highlight: [6]\">\npostgres@debian10pg:\/home\/postgres\/ [pg14] grep -A 1 \"limits\" \/etc\/pam.d\/sshd \n# access limits that are hard to express in sshd_config.\n# account  required     pam_access.so\n--\n# Set up user limits from \/etc\/security\/limits.conf.\n# session    required     pam_limits.so\n<\/pre>\n<p>After restarting sshd, let&#8217;s try again:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\nport=\"5432\"\nfor i in {1..8000}\ndo\n    PGDATA=\"\/u02\/pgdata\/pgdata${i}\"\n    pg_ctl start -D ${PGDATA}\/  -l \/home\/postgres\/${i}log.log\n    (( port += 1 ))\ndone\n<\/pre>\n<p>&#8230; 
and it still fails:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\n2021-02-19 15:49:38.457 UTC [109716] LOG:  listening on Unix socket \"\/tmp\/.s.PGSQL.10063\"\n2021-02-19 15:49:38.461 UTC [109717] LOG:  database system was shut down at 2021-02-19 15:39:59 UTC\n2021-02-19 15:49:38.465 UTC [109716] LOG:  could not fork worker process: Resource temporarily unavailable\n2021-02-19 15:49:38.465 UTC [109716] LOG:  database system is ready to accept connections\n2021-02-19 15:49:50.889 UTC [109716] LOG:  received smart shutdown request\n<\/pre>\n<p>The next test is to check if we are hitting a user\/session limit or a system-wide limit. For that we will start the first 4000 clusters with the postgres user we already have. For the remaining 4000 clusters we&#8217;ll create another user, change $PGDATA to be owned by that user, and then try to start them. If that works, we can at least start 8000 clusters using two separate OS users:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\nroot@ip-10-0-1-21:\/home\/admin# useradd -g postgres -m postgres2\nroot@ip-10-0-1-21:\/home\/admin# for i in {4001..8000}\n&gt; do\n&gt;     PGDATA=\"\/u02\/pgdata\/pgdata${i}\"\n&gt;     chown -R postgres2:postgres ${PGDATA}\n&gt; done\n<\/pre>\n<p>First 4000 clusters:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres@ip-10-0-1-21:\/home\/postgres\/ [pg132] for i in {1..4000}; do     PGDATA=\"\/u02\/pgdata\/pgdata${i}\";     pg_ctl start -D ${PGDATA}\/  -l \/home\/postgres\/${i}log.log; done\nEvery 2.0s: ps -ef | egrep 'postgres -D' | wc -l                                                                                                                           ip-10-0-1-21: Tue Feb 23 10:38:49 2021\n\n4000\n<\/pre>\n<p>Let&#8217;s 
check if we can start another 4000 using the other user:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">\npostgres2@ip-10-0-1-21:\/home\/postgres2\/ [pg132] id -a\nuid=1002(postgres2) gid=1001(postgres) groups=1001(postgres)\npostgres2@ip-10-0-1-21:\/home\/postgres2\/ [pg132] for i in {4001..8000}; do     PGDATA=\"\/u02\/pgdata\/pgdata${i}\";     pg_ctl start -D ${PGDATA}\/  -l \/home\/postgres2\/${i}log.log; done\npostgres2@ip-10-0-1-21:\/home\/postgres2\/ [pg132] cat 4632log.log\n2021-02-23 10:45:38.573 UTC [61580] LOG:  starting PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit\n2021-02-23 10:45:38.573 UTC [61580] LOG:  listening on IPv4 address \"127.0.0.1\", port 10063\n2021-02-23 10:45:38.578 UTC [61580] LOG:  listening on Unix socket \"\/tmp\/.s.PGSQL.10063\"\n2021-02-23 10:45:38.585 UTC [61581] LOG:  database system was shut down at 2021-02-21 17:53:04 UTC\n2021-02-23 10:45:38.597 UTC [61580] LOG:  could not fork worker process: Resource temporarily unavailable\n2021-02-23 10:45:38.597 UTC [61580] LOG:  database system is ready to accept connections\n<\/pre>\n<p>This leads to the same issue once we reach around 4600 clusters. At this point I decided to stop the experiment, as the conclusion is clear: Don&#8217;t do it, for obvious reasons:<\/p>\n<ul>\n<li>You&#8217;ll run into all kinds of limits<\/li>\n<li>Managing 8000 clusters on one host is not doable; you&#8217;ll for sure make a mistake at some point, and then one or more clusters are damaged<\/li>\n<li>Only looking at the number of open files and the number of processes should already tell you, again: Don&#8217;t do it<\/li>\n<li>Many, many other reasons (backup\/restore\/patching\/upgrades)<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Coming back to the last post: 8000 databases in one PostgreSQL cluster?. As we&#8217;ve seen this is doable but comes with a lot of pain points. 
Another solution for this kind of requirement would be to host 8000 PostgreSQL clusters on one host. Again, I am not saying this is something you should do, and [&hellip;]<\/p>\n","protected":false},"author":29,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1865,229],"tags":[77],"type_dbi":[],"class_list":["post-15655","post","type-post","status-publish","format-standard","hentry","category-aws","category-database-administration-monitoring","tag-postgresql"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>8000 PostgreSQL clusters on one host? - dbi Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"8000 PostgreSQL clusters on one host?\" \/>\n<meta property=\"og:description\" content=\"Coming back to the last post: 8000 databases in one PostgreSQL cluster?. As we&#8217;ve seen this is doable but comes with a lot of pain points. Another solution for this kind of requirement would be to host 8000 PostgreSQL clusters on one host. 
Again, I am not saying this is something you should do, and [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-02-23T09:54:48+00:00\" \/>\n<meta name=\"author\" content=\"Daniel Westermann\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@westermanndanie\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Westermann\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/\"},\"author\":{\"name\":\"Daniel Westermann\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d08e9bd996a89bd75c0286cbabf3c66\"},\"headline\":\"8000 PostgreSQL clusters on one host?\",\"datePublished\":\"2021-02-23T09:54:48+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/\"},\"wordCount\":1131,\"commentCount\":0,\"keywords\":[\"PostgreSQL\"],\"articleSection\":[\"AWS\",\"Database Administration &amp; 
Monitoring\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/\",\"name\":\"8000 PostgreSQL clusters on one host? - dbi Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\"},\"datePublished\":\"2021-02-23T09:54:48+00:00\",\"author\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d08e9bd996a89bd75c0286cbabf3c66\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.dbi-services.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"8000 PostgreSQL clusters on one host?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/\",\"name\":\"dbi Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d08e9bd996a89bd75c0286cbabf3c66\",\"name\":\"Daniel 
Westermann\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g\",\"caption\":\"Daniel Westermann\"},\"description\":\"Daniel Westermann is Principal Consultant and Technology Leader Open Infrastructure at dbi services. He has more than 15 years of experience in management, engineering and optimization of databases and infrastructures, especially on Oracle and PostgreSQL. Since the beginning of his career, he has specialized in Oracle Technologies and is Oracle Certified Professional 12c and Oracle Certified Expert RAC\/GridInfra. Over time, Daniel has become increasingly interested in open source technologies, becoming \u201cTechnology Leader Open Infrastructure\u201d and PostgreSQL expert. \u00a0Based on community or EnterpriseDB tools, he develops and installs complex high available solutions with PostgreSQL. He is also a certified PostgreSQL Plus 9.0 Professional and a Postgres Advanced Server 9.4 Professional. He is a regular speaker at PostgreSQL conferences in Switzerland and Europe. Today Daniel is also supporting our customers on AWS services such as AWS RDS, database migrations into the cloud, EC2 and automated infrastructure management with AWS SSM (System Manager). He is a certified AWS Solutions Architect Professional. Prior to dbi services, Daniel was Management System Engineer at LC SYSTEMS-Engineering AG in Basel. Before that, he worked as Oracle Developper &amp;\u00a0Project Manager at Delta Energy Solutions AG in Basel (today Powel AG). Daniel holds a diploma in Business Informatics (DHBW, Germany). 
His branch-related experience mainly covers the pharma industry, the financial sector, energy, lottery and telecommunications.\",\"sameAs\":[\"https:\/\/x.com\/westermanndanie\"],\"url\":\"https:\/\/www.dbi-services.com\/blog\/author\/daniel-westermann\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"8000 PostgreSQL clusters on one host? - dbi Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/","og_locale":"en_US","og_type":"article","og_title":"8000 PostgreSQL clusters on one host?","og_description":"Coming back to the last post: 8000 databases in one PostgreSQL cluster?. As we&#8217;ve seen this is doable but comes with a lot of pain points. Another solution for this kind of requirement would be to host 8000 PostgreSQL clusters on one host. Again, I am not saying this is something you should do, and [&hellip;]","og_url":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/","og_site_name":"dbi Blog","article_published_time":"2021-02-23T09:54:48+00:00","author":"Daniel Westermann","twitter_card":"summary_large_image","twitter_creator":"@westermanndanie","twitter_misc":{"Written by":"Daniel Westermann","Est. 
reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/"},"author":{"name":"Daniel Westermann","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d08e9bd996a89bd75c0286cbabf3c66"},"headline":"8000 PostgreSQL clusters on one host?","datePublished":"2021-02-23T09:54:48+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/"},"wordCount":1131,"commentCount":0,"keywords":["PostgreSQL"],"articleSection":["AWS","Database Administration &amp; Monitoring"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/","url":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/","name":"8000 PostgreSQL clusters on one host? 
- dbi Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"datePublished":"2021-02-23T09:54:48+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d08e9bd996a89bd75c0286cbabf3c66"},"breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/8000-postgresql-clusters-on-one-host\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"8000 PostgreSQL clusters on one host?"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d08e9bd996a89bd75c0286cbabf3c66","name":"Daniel Westermann","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/31350ceeecb1dd8986339a29bf040d4cd3cd087d410deccd8f55234466d6c317?s=96&d=mm&r=g","caption":"Daniel Westermann"},"description":"Daniel Westermann is Principal Consultant and Technology Leader Open Infrastructure at dbi services. 
He has more than 15 years of experience in management, engineering and optimization of databases and infrastructures, especially on Oracle and PostgreSQL. Since the beginning of his career, he has specialized in Oracle Technologies and is Oracle Certified Professional 12c and Oracle Certified Expert RAC\/GridInfra. Over time, Daniel has become increasingly interested in open source technologies, becoming \u201cTechnology Leader Open Infrastructure\u201d and PostgreSQL expert. \u00a0Based on community or EnterpriseDB tools, he develops and installs complex high available solutions with PostgreSQL. He is also a certified PostgreSQL Plus 9.0 Professional and a Postgres Advanced Server 9.4 Professional. He is a regular speaker at PostgreSQL conferences in Switzerland and Europe. Today Daniel is also supporting our customers on AWS services such as AWS RDS, database migrations into the cloud, EC2 and automated infrastructure management with AWS SSM (System Manager). He is a certified AWS Solutions Architect Professional. Prior to dbi services, Daniel was Management System Engineer at LC SYSTEMS-Engineering AG in Basel. Before that, he worked as Oracle Developper &amp;\u00a0Project Manager at Delta Energy Solutions AG in Basel (today Powel AG). Daniel holds a diploma in Business Informatics (DHBW, Germany). 
His branch-related experience mainly covers the pharma industry, the financial sector, energy, lottery and telecommunications.","sameAs":["https:\/\/x.com\/westermanndanie"],"url":"https:\/\/www.dbi-services.com\/blog\/author\/daniel-westermann\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/15655","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/29"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=15655"}],"version-history":[{"count":0,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/15655\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=15655"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?post=15655"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=15655"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=15655"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}