As many of you know, about a year ago I wrote some critical posts about Microsoft's Flexible Server for PostgreSQL. Now it is time to take another look.

First of all, many promised features were delivered with a delay of about a year or more, for example AAD integration or customer-managed keys.

One topic was Compute V4: the Cascade Lake Xeon CPUs were not listed on the Ubuntu compatibility list. Now, six weeks before the EOL of Ubuntu 18.04, Microsoft made sure these CPUs got listed, whatever happened in the background. But this does not change anything about the poor performance of V4.

On two current customer projects I was able to run performance tests again, comparing the PaaS offering with our own VM based on EL8 and a well-configured installation from the PostgreSQL.org RPMs.

The results were a bit surprising: yes, performance improvements are “visible”, which does not mean that the performance is actually good. Also, the scaling with the storage IOPS is really poor.

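The output below corresponds to a pgbench run along these lines (reconstructed from the reported parameters; connection options are omitted and the database name is a placeholder):

pgbench -i -s 1000 bench              # initialize with scale factor 1000
pgbench -c 30 -j 2 -t 10000 bench     # 30 clients, 2 threads, 10000 transactions per client
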
Flexible Server, 8vCPU, 64GB RAM, 512GB Storage (2300 IOPS) running pgbench:

pgbench (14.6)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 30
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 300000/300000
latency average = 15.409 ms
initial connection time = 188.984 ms
tps = 1946.934585 (without initial connection time)

Flexible Server, 8vCPU, 64GB RAM, 2TB Storage (7500 IOPS) running pgbench:

pgbench (14.6)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 30
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 300000/300000
latency average = 13.577 ms
initial connection time = 174.816 ms
tps = 2209.566073 (without initial connection time)

Three times the IOPS, and as a result only a 13% performance improvement.

VM, EL8 with PostgreSQL.org RPMs, memory and CPU parameters adapted for 8vCPU and 64GB RAM; for 7500 IOPS only 500GB of storage is needed:

pgbench (14.6)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 30
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 300000/300000
latency average = 7.876 ms
initial connection time = 106.050 ms
tps = 3808.850163 (without initial connection time)

Compared to the 512GB Flexible Server that is almost twice the performance; compared to the 2TB one it is about 73% faster.
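
As a reminder, an installation from the PostgreSQL.org RPMs on EL8 boils down to a few standard PGDG repository steps; a minimal sketch (run as root, PostgreSQL 14 as in the tests above; the actual setup additionally had memory, CPU and storage parameters adapted):

dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
dnf -qy module disable postgresql     # disable the distribution's own PostgreSQL module
dnf install -y postgresql14-server
/usr/pgsql-14/bin/postgresql-14-setup initdb
systemctl enable --now postgresql-14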

Now a tuned VM, meaning an adapted tuned.conf and adapted storage parameters in postgresql.auto.conf, again 8vCPU, 64GB RAM and 500GB Storage (7500 IOPS):

pgbench (14.6)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 30
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 300000/300000
latency average = 5.619 ms
initial connection time = 227.165 ms
tps = 5338.680841 (without initial connection time)

142% faster than Microsoft's PaaS service.
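
To give an idea of what is meant by the adapted parameters, here is a sketch with illustrative values only; the exact settings depend on the storage and the workload. On the OS side a throughput-oriented tuned profile:

tuned-adm profile throughput-performance

and on the PostgreSQL side SSD-friendly planner and checkpoint settings via ALTER SYSTEM, which writes them to postgresql.auto.conf:

ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_io_concurrency = 200;
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET max_wal_size = '8GB';
SELECT pg_reload_conf();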

Feedback from Microsoft was that we should do the same testing on the new Compute V5 (Ice Lake Xeons), which is in customer preview but currently not available in Switzerland North. I will do so as soon as I have access to it.

Microsoft is also replacing the underlying Linux: V3 and V4 are based on Ubuntu, V5 is based on Microsoft's Mariner Linux, which looks to me like another Fedora / EL9 clone.

One big issue with Microsoft's PaaS is that some parameters do not follow best practices; max_connections, for example, defaults to 5000!

Another issue is that memory parameters are set as a number of 8kB blocks and not in GB or MB as we normally do, which is the reason for many misconfigurations.
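
For example, shared_buffers = 2097152 does not mean 2MB or 2GB, it means 2097152 × 8kB = 16GB. One way to check such block-based settings directly in SQL:

SELECT name, setting, unit,
       pg_size_pretty(setting::bigint * 8192) AS size
FROM   pg_settings
WHERE  unit = '8kB';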

I have created a small Excel sheet which makes it possible to check the parameters and to correct them if needed.

And yes, these parameters are a good starting point; the right values can differ depending on the workload on the system, including the relationship between max_connections and work_mem.
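
A quick worst-case calculation shows why this relationship matters: with max_connections = 5000 and, say, work_mem = 16MB, sorts and hashes alone could in theory claim 5000 × 16MB = 80GB, more than the 64GB of RAM on the instances tested here.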

For managing VMs on different cloud providers such as AWS, Azure and OCI, or on-premises, dbi offers Yak; with the Yak Components the deployments follow dbi best practices for maximum performance and security.

https://www.dbi-services.com/products/yak/

Feel free to take a look at it to prevent a slow PaaS.