PostgreSQL, an open-source relational database management system, offers robust features and capabilities for data storage and management. Deploying PostgreSQL within the Oracle Cloud Infrastructure (OCI) provides a scalable and reliable environment. In this guide, we’ll explore the process of setting up PostgreSQL in OCI. We will also push a bit more to learn how backup/restore and failover works.

Let’s discover the new PostgreSQL service available in OCI 🙂

Prerequisite

For this training, you need an OCI account, of course. You also need a VM with the OCI CLI, as not everything is manageable from the GUI, like the possibility to a do a failover for example.
I’m personally running an instance with a Rocky 9 image.

Deploying PostgreSQL in OCI with the GUI

From the sidebar navigation, go to Databases > PostgreSQL > databases and then click on “Create PostgreSQL Database system”. As we are doing a project from scratch, you can continue with the default installation “Create new Database system”.
Now, you can update all the necessary fields: Database system configuration, Network configuration, Database system administrator credentials.
The PostgreSQL service necessitates a private IP as it can’t be deployed on a public subnet. To connect to it, you’ll need to create a computed instance in a public subnet. In your VCN, don’t forget to add an ingress rule to allow the traffic with port 5432 for your private subnet.

You can also notice that the only available version for PostgreSQL is the major version 14, which is two major version behind the current version.

Note that during this process you can also chose the number of nodes you want to deploy. By default you have only one node, which will be the primary of your PostgreSQL cluster.

At the same time, you can initiate the creation of an instance with a public IP. It will be used later to connect to the PostgreSQL cluster and to issue OCI related commands.
The DBSystem creation took me approximately 8 minutes.

Installation of OCI CLI and PostgreSQL CLI

Once the computed instance with a public IP is deployed, the next step involves installing the OCI CLI and the PostgreSQL CLI.

[rocky@pg-service-jumphost-frj ~]$ bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"
[rocky@pg-service-jumphost-frj ~]$ sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
[rocky@pg-service-jumphost-frj ~]$ sudo dnf install -y postgresql14-server

To issue commands on your OCI environment, you need to setup the OCI config file.
The easiest way to do it is to go to your OCI user profile and then into the API keys section. When creating a new key, it will give you a config file template. You then just need to edit the path to your private key file.

[rocky@pg-service-jumphost-frj ~]$ cat .oci/config
[DEFAULT]
user=ocid1.user.oc1..aaa*********************************************7dg4zf4****ma
fingerprint=c3:3c:*************************************:ea:4e:89
tenancy=ocid1.tenancy.oc1..aa***aa**********************************************uiofira
region=eu-zurich-1
key_file=/home/rocky/.ssh/joan.***********************************************8Z.pem

Retrieving IP Endpoint and Connection Attempt

Once your database system is ready, fetch the IP endpoint from the PostgreSQL Database system. Try to connect to your cluster with `psql` using the command:

[rocky@pg-service-jumphost-frj ~]$ psql -h 10.0.1.173 -U postgres -d postgres
Password for user postgres:
psql (14.10, server 14.9)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=>

If the connection fails, double check your network configuration and security policy.

Creating a Replica Node

Creating a replica by adding a node is a crucial step in PostgreSQL deployment in OCI. This process generally takes around 8 minutes. In PostgreSQL>Database systems>Database system details click the button “Add node”.

Once created, if you go to the detail of each node, you can see that they have a private IP address. You can directly connect to the desired node using this IP. By default, if you use the endpoint IP of the DBSystem, it’ll connect you to the primary instance.

To verify the replication between the two nodes, create a database on your primary instance.

postgres=> create database sephiroth;
CREATE DATABASE
postgres=> \l
                                         List of databases
   Name    |     Owner     | Encoding |   Collate   |    Ctype    |        Access privileges
-----------+---------------+----------+-------------+-------------+---------------------------------
 postgres  | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 sephiroth | postgres      | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/oci_superuser               +
           |               |          |             |             | oci_superuser=CTc/oci_superuser
 template1 | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/oci_superuser               +
           |               |          |             |             | oci_superuser=CTc/oci_superuser

The database should have been replicated to the second node.

[rocky@pg-service-jumphost-frj ~]$ psql -h 10.0.1.125 -U postgres -d postgres
Password for user postgres:
psql (14.10, server 14.9)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

postgres=> \l
                                         List of databases
   Name    |     Owner     | Encoding |   Collate   |    Ctype    |        Access privileges
-----------+---------------+----------+-------------+-------------+---------------------------------
 postgres  | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 sephiroth | postgres      | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/oci_superuser               +
           |               |          |             |             | oci_superuser=CTc/oci_superuser
 template1 | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/oci_superuser               +
           |               |          |             |             | oci_superuser=CTc/oci_superuser

From what I could observe and find on various blogs/documentation, the PostgreSQL service from OCI use an asynchronous physical standby replication inside the same Availability Domain. 
At the end of this blog, I will add some useful links that speak about this topic.

Backup and restore

Once connected successfully, various database operations can be performed. Let’s try to see how backup and restore performs.

Still on the OCI website, click on the “Create backup” button. You can control the status of your backup.

When the backup is done, connect to your cluster and drop the database you just created. Click on the Restore button to restore your backup and verify that the database has been restored.

Bonus: If you stay connected to a PostgreSQL node while starting the restore process, it will automatically terminate the existing connections and attempt to reset them after the restore.

postgres=> drop database sephiroth ;
DROP DATABASE
postgres=> \l
FATAL:  terminating connection due to administrator command
SSL connection has been closed unexpectedly
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.10, server 14.9)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
postgres=> \l
                                         List of databases
   Name    |     Owner     | Encoding |   Collate   |    Ctype    |        Access privileges
-----------+---------------+----------+-------------+-------------+---------------------------------
 postgres  | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 sephiroth | postgres      | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/oci_superuser               +
           |               |          |             |             | oci_superuser=CTc/oci_superuser
 template1 | oci_superuser | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/oci_superuser               +
           |               |          |             |             | oci_superuser=CTc/oci_superuser
(4 rows) 

Failover and Recovery

The failover process is critical for ensuring high availability. It’s important to test it before setting up your cluster in production.

You don’t have an option to do a switchover or a failover through the OCI Website. I decided to try to delete the primary node, so that the replica becomes the new primary, but OCI implemented a protection and it’s impossible to delete a primary node. I also noticed that it’s impossible to delete a node if your Database system is not in an “Active” state.

You can do a failover with the OCI CLI. The only required parameter is the DB system ID that you are trying to failover. You can use various option for the failover process, please have a look at the documentation.

https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.37.0/oci_cli_docs/cmdref/psql/db-system/failover.html

In our case, we will also specify the Node ID of the instance we want to promote as the new primary. Quick reminder, here is our current setup:

From your rocky jump host, initiate the failover command:

[rocky@pg-service-jumphost-frj ~]$ oci psql db-system failover --db-system-id "ocid1.postgresqldbsystem.oc1.eu-zurich-1.amaaaaaa4p5qca********************************************wi7xhknw7q" --db-instance-id "5a0***ac-************************4a5c2706"
{
  "opc-work-request-id": "ocid1.postgresqlworkrequest.oc1.eu-zurich-1.amaaaaaa4p5q*******************************sxy44qa7xjq"
}

When I went to the Work requests to check the status of my failover, I saw the message “Waiting for DbSystem to become ACTIVE again” again and again. After a certain time, the status of the db system just change to Failed.
After my multiple tests, I had this issue every time I tried a failover after I tried a backup/restore. When I tried the failover process directly after creating my DB system with two nodes, I faced no issues.
If you are encountering the same issue, I found no other way to fix it, than deleting the whole DB system, starting from scratch, and testing a failover after the DB system creation.

Conclusion

Deploying PostgreSQL in the Oracle Cloud Infrastructure demands meticulous configuration and management steps. It involves provisioning, connectivity setup, replication testing, and failover management to ensure a robust and highly available database service. I feel like, at the moment, the PostgreSQL service is not totally operational but I’m pretty sure that the issues I faced will be fixed in the future.

https://blogs.oracle.com/cloud-infrastructure/post/first-principles-optimizing-postgresql-for-the-cloud
https://blogs.oracle.com/cloud-infrastructure/post/oci-database-postgres
https://medium.com/oracledevs/oci-database-with-postgresql-cheat-sheet-ee8173a0a2ba