Introduction to Persistent Memory

Over the last few years, storage media have become steadily faster and more advanced. Today, SSDs are almost standard, and if an Oracle database has a truly I/O-critical workload, NVMe storage is often used.
The latest evolution in database storage is persistent memory (PMEM), which improves latency and I/O throughput compared to NVMe. Oracle has also recognized the advantages of PMEM and uses it in the latest generation of Exadata.
If you want to know more about persistent memory, I recommend this blog.
One of the most important features of an Oracle database on persistent memory is that there is no physical I/O between the SGA and the data files. Instead, the blocks are copied directly in memory, which is unbeatably fast.

In the rest of this article I will show how to create an Oracle 19c database on persistent memory and which restrictions apply.

Restrictions with Oracle 19c and persistent Memory

There are still a few limitations when using PMEM with Oracle 19c. The configuration has been simplified in 21c, but 21c is an innovation release, so most databases are still running on 19c. For this reason, the 19c setup is shown here. The following restrictions apply:

  1. You have to be on at least version 19.12
  2. Your database has to run on Enterprise Edition
  3. In 19c, PMEM in App Direct mode is supported only with the Oracle Memory Speed (OMS) file system. OMS is not mandatory in 21c
  4. You can mount a maximum of 2 OMS file systems per database
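
The version requirement can be checked in a script before any PMEM work starts. The following is only a minimal sketch: version_at_least is a hypothetical helper name (not an Oracle tool), and it assumes GNU sort with the -V option is available. The version string is hard-coded here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check that a release-update version string is at least 19.12.
# version_at_least is a hypothetical helper, not part of any Oracle tooling.
version_at_least() {
    local minimum="$1" actual="$2"
    # sort -V compares version strings component by component.
    # If the minimum sorts first (or both are equal), the actual version is new enough.
    [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# Example with a hard-coded version string (assumption for illustration).
if version_at_least 19.12 19.14.0.0.0; then
    echo "version ok"
else
    echo "version too old for OMS"
fi
```

In practice the second argument could be taken from the output of "opatch lspatches" or from the VERSION_FULL column of v$instance.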

Configuration

So now let's start with the technical part: how to prepare the OS and the database to run on persistent memory.

Check PMEM Configuration

We are running on a 4-socket server, and each socket has 1 PMEM module of 3 TB. These modules can be used in App Direct mode, which allows us to store database files on them.
The following commands show the server configuration. You can see that the server has 3 TB of RAM and, in addition, 12 TB of PMEM in App Direct mode (4 x 3 TB).

ipmctl show -memoryresources
MemoryType 	| DDR 		| PMemModule 	| Total
=============================================================
Volatile 	| 3072.000 GiB 	| 0.000 GiB 	| 3072.000 GiB
AppDirect 	| - 		| 12168.000 GiB | 12168.000 GiB
Cache 		| 0.000 GiB 	| - 		| 0.000 GiB
Inaccessible 	| 0.000 GiB 	| 17.244 GiB 	| 17.244 GiB
Physical 	| 3072.000 GiB 	| 12185.244 GiB | 15257.244 GiB
ipmctl show -region
SocketID | ISetID | PersistentMemoryType | Capacity | FreeCapacity | HealthState
==================================================================================================
0x0000 | 0xfb7a7f48d30e2ccc | AppDirect | 3042.000 GiB | 0.000 GiB | Healthy
0x0001 | 0xdd287f48160d2ccc | AppDirect | 3042.000 GiB | 0.000 GiB | Healthy
0x0002 | 0xa8bd7f4889eb2ccc | AppDirect | 3042.000 GiB | 0.000 GiB | Healthy
0x0003 | 0xf5be7f48a2102ccc | AppDirect | 3042.000 GiB | 0.000 GiB | Healthy
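
If you want to check the App Direct capacity from a script, the table printed by ipmctl can be parsed with awk. This is a rough sketch that assumes the column layout shown above (it may differ between ipmctl versions). appdirect_total_gib is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the "Total" column of the AppDirect row.
# Expects the text of "ipmctl show -memoryresources" on stdin.
# appdirect_total_gib is a hypothetical helper name.
appdirect_total_gib() {
    awk -F'|' '$1 ~ /^[ \t]*AppDirect/ {
        gsub(/[ \t]+|GiB/, "", $4)   # strip whitespace and the unit
        print $4
    }'
}

# Demo with the rows from the output shown above.
appdirect_total_gib <<'EOF'
MemoryType   | DDR          | PMemModule    | Total
Volatile     | 3072.000 GiB | 0.000 GiB     | 3072.000 GiB
AppDirect    | -            | 12168.000 GiB | 12168.000 GiB
EOF
# prints 12168.000
```

On a real server the call would be: ipmctl show -memoryresources | appdirect_total_gib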

To keep the configuration in this blog simple, I will configure only 1 module.

Configure the Namespaces 

PMEM has no partitions; instead, we have to configure a namespace on the PMEM module. To run an Oracle database on it, we have to use the default mode “fsdax”, which allows us to use the DAX (Direct Access) capabilities of Linux file systems (XFS / ext4).

ndctl create-namespace

# Check created Namespace
ndctl list -u
[
 {
 "dev":"namespace1.0",
 "mode":"fsdax",
 ...

# Reconfiguring the NVDIMM namespaces from a different mode to fsdax can be done with this command
ndctl create-namespace --force --reconfig=namespace1.0 --mode=fsdax --map=mem
ndctl list -u
 [
 {
 "dev":"namespace1.0",
 "mode":"fsdax",
 ...
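
The mode check can also be scripted. The sketch below simply greps the JSON for "mode" fields and is therefore fragile. It assumes the compact key/value formatting shown above, and all_namespaces_fsdax is a hypothetical helper name. A real JSON parser would be the more robust choice.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every namespace in the "ndctl list" JSON on stdin
# reports mode fsdax. all_namespaces_fsdax is a hypothetical helper name.
all_namespaces_fsdax() {
    local modes
    # Collect all "mode":"..." pairs; fail if there is none at all.
    modes="$(grep -o '"mode" *: *"[a-z]*"')" || return 1
    # Fail if any collected mode is something other than fsdax.
    ! printf '%s\n' "$modes" | grep -qv '"fsdax"'
}

# Demo with a shortened version of the JSON shown above.
all_namespaces_fsdax <<'EOF' && echo "all namespaces are in fsdax mode"
[
 {
  "dev":"namespace1.0",
  "mode":"fsdax"
 }
]
EOF
```

On a real server the call would be: ndctl list | all_namespaces_fsdax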

After creating the namespace on the first PMEM device, we will see a device /dev/pmem0.

Setting Up Direct Access (DAX) Filesystem

The XFS file system features DAX-capable mounts. Ensure that XFS creates mappings from the physical pages in the PMEM device to virtual pages using HugePages mappings.
To achieve that, we have to format the device with a stripe unit (su) of 2 MB and a stripe width (sw) of 1.

# reflink option must be disabled, because reflink and dax do not work together
mkfs.xfs -m reflink=0 -d su=2m,sw=1 /dev/pmem0 

# Mount Device with DAX option. 
# Don't forget to add the mount to /etc/fstab so that it is still available after a reboot
mount -o dax /dev/pmem0 /mnt/pmem0
 
# verify Mount
mount | grep dax
 
# Sample output. It's important that the mount option is dax=always
mount | grep dax
/dev/pmem0 on /mnt/pmem0 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,sunit=4096,swidth=4096,noquota)
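
The DAX check can be automated so that a setup script stops early if the option is missing. This is a minimal sketch. mounted_with_dax is a hypothetical helper name, and it assumes the usual "device on mountpoint type fs (options)" layout of the mount output, each entry on one line. It accepts both "dax=always" and the plain "dax" that older kernels print.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the given mount point is mounted with DAX enabled.
# Expects "mount"-style lines on stdin. mounted_with_dax is a hypothetical name.
mounted_with_dax() {
    local mountpoint="$1"
    awk -v mp="$mountpoint" '
        # $3 is the mount point, $6 the option list in parentheses
        $2 == "on" && $3 == mp && $6 ~ /[(,]dax(=always)?[,)]/ { found = 1 }
        END { exit !found }
    '
}

# Demo with the sample line from above.
if mounted_with_dax /mnt/pmem0 <<'EOF'
/dev/pmem0 on /mnt/pmem0 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,sunit=4096,swidth=4096,noquota)
EOF
then
    echo "DAX is active on /mnt/pmem0"
else
    echo "DAX is NOT active on /mnt/pmem0"
fi
```

On a real server the call would be: mount | mounted_with_dax /mnt/pmem0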

Create Uber File for Oracle Memory Speed Filesystem (OMS)

An uber file is similar to a volume in a traditional kernel-based file system. The uber file stores all the metadata and data for the Oracle Memory Speed file system.
The uber file has a one-to-one relationship with an Oracle instance. You formalize the association by ending the uber file name with the Oracle instance identifier (SID). In our example we will create the database ORAPMEM.

# create uber file with fallocate (size must be an exact multiple of 2 MB)
fallocate -l 800G /mnt/pmem0/omsuberfile.ORAPMEM

# Change ownership of the OMS uber file to oracle:oinstall and set its permissions
chown oracle:oinstall /mnt/pmem0/omsuberfile.ORAPMEM 
chmod 644 /mnt/pmem0/omsuberfile.ORAPMEM 

# Check that extents are correctly aligned on 2MB boundaries 
xfs_bmap /mnt/pmem0/omsuberfile.ORAPMEM 

# Sample Output (Extent Num:[Start Offset..End Offset]:Start block..End Block) 
/mnt/pmem0/omsuberfile.test: 
0: [0..4095]: 1765978112..1765982207 
1: [4096..16773119]: 1765982208..1782751231 
2: [16773120..20971519]: 1782751232..17869496
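
Checking the alignment by eye gets tedious with many extents. xfs_bmap reports offsets and blocks in units of 512 bytes, so a 2 MB boundary corresponds to a start block that is divisible by 4096. The sketch below checks exactly that. extents_2mib_aligned is a hypothetical helper name, and only the physical start block of each extent is tested.

```shell
#!/usr/bin/env bash
# Sketch: verify that every extent starts on a 2 MiB boundary.
# Expects "xfs_bmap <file>" output on stdin (units of 512-byte blocks,
# so 2 MiB = 4096 blocks). extents_2mib_aligned is a hypothetical name.
extents_2mib_aligned() {
    awk '
        # Extent lines look like: "0: [0..4095]: 1765978112..1765982207"
        $1 ~ /^[0-9]+:$/ && $3 != "hole" {
            split($3, range, /\.\./)          # range[1] = start block
            if (range[1] % 4096 != 0) {
                print "misaligned extent: " $0
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Demo with the first two extents from the sample output above.
extents_2mib_aligned <<'EOF' && echo "all extents are 2 MiB aligned"
/mnt/pmem0/omsuberfile.test:
0: [0..4095]: 1765978112..1765982207
1: [4096..16773119]: 1765982208..1782751231
EOF
```

On a real server the call would be: xfs_bmap /mnt/pmem0/omsuberfile.ORAPMEM | extents_2mib_aligned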

Create Mountpoint for OMS Filesystem

The next step is to prepare a mountpoint for the oms filesystem. The mount point must meet the following requirements:

  1. The mount directory must be named after the SID of the database
  2. The directory must be read-only. With read/write permissions we get an error during the mount command

mkdir -p /u03/oms_fs1/oracle/oradata/ORAPMEM
chmod -wx /u03/oms_fs1/oracle/oradata/ORAPMEM
chown -R oracle:oinstall /u03
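
Both requirements can be verified before the mount is attempted. The following is a sketch under a few assumptions: check_oms_mountpoint is a hypothetical helper name, GNU stat is available, and "read-only" is interpreted as "no write bit set for anyone". The demo uses a temporary directory and "chmod a-wx" so that it runs anywhere.

```shell
#!/usr/bin/env bash
# Sketch: check that a directory is usable as an OMS mount point.
# check_oms_mountpoint is a hypothetical helper name.
check_oms_mountpoint() {
    local dir="$1" sid="$2"
    [ -d "$dir" ] || { echo "not a directory: $dir"; return 1; }
    # Requirement 1: the last path component must equal the SID.
    [ "$(basename "$dir")" = "$sid" ] || { echo "directory name must be $sid"; return 1; }
    # Requirement 2: no write bit. stat -c %A prints e.g. "dr--r--r--".
    case "$(stat -c %A "$dir")" in
        *w*) echo "directory must be read-only: $dir"; return 1 ;;
    esac
    echo "mount point ok: $dir"
}

# Demo in a temporary directory (paths are for illustration only).
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/ORAPMEM"
chmod a-wx "$demo_root/ORAPMEM"
check_oms_mountpoint "$demo_root/ORAPMEM" ORAPMEM
rmdir "$demo_root/ORAPMEM" "$demo_root"
```

For the directory created above the call would be: check_oms_mountpoint /u03/oms_fs1/oracle/oradata/ORAPMEM ORAPMEM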

Prepare database creation scripts with DBCA

The next step is to generate the database creation scripts with DBCA. It’s important to know that at this point DBCA can only generate the scripts, not create the database itself. The reason is that the database does not yet exist in /etc/oratab, and as long as the database is not present on the server, we are not able to create and mount the OMS file system for the database files. This restriction applies only to Oracle 19c, because with Oracle 21c the OMS file system is no longer mandatory.
The important steps during the setup with DBCA, according to Oracle Support Note 2795728.1:

  1. Choose Advanced Configuration
  2. Choose Single Instance Database
  3. Choose the template General Purpose or Transaction Processing
  4. Set the DB name, in this example LI99DB02
  5. Create as non Container Database
  6. Set Storage Attribute to Filesystem and as Database file location your prepared mountpoint /u03/omsfs_1/oradata/{DB_UNIQUE_NAME}
  7. Use OMF
  8. Deselect “Create database” and “Save as DB Template”
  9. Select “Generate Database creation scripts”
  10. Click Finish to store the Scripts

Check the init.ora that was created and verify that the “db_create_file_dest” parameter is set to the mount point of the OMS file system (/u03/omsfs_1/oradata/{DB_UNIQUE_NAME}).
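
This check is easy to script. The Oracle initialization parameter that controls the OMF location is named db_create_file_dest. The sketch below extracts its value from a pfile. pfile_create_dest is a hypothetical helper name, and the code assumes simple "name=value" lines without spaces inside the path. The demo writes its own small pfile, because the location of the generated init.ora depends on your DBCA settings.

```shell
#!/usr/bin/env bash
# Sketch: print the value of db_create_file_dest from a pfile (init.ora).
# pfile_create_dest is a hypothetical helper name.
pfile_create_dest() {
    # Skip comment lines, match the parameter name (with or without "*." prefix),
    # strip blanks and quotes, and keep the last setting found in the file.
    awk -F= '$1 !~ /^[ \t]*#/ && tolower($1) ~ /db_create_file_dest[ \t]*$/ { print $2 }' "$1" \
        | tr -d " \t\"'" | tail -n 1
}

# Demo with a temporary pfile (content is an assumption for illustration).
demo_pfile="$(mktemp)"
printf '%s\n' 'db_name="ORAPMEM"' \
              'db_create_file_dest="/u03/oms_fs1/oracle/oradata/ORAPMEM"' > "$demo_pfile"

expected=/u03/oms_fs1/oracle/oradata/ORAPMEM
if [ "$(pfile_create_dest "$demo_pfile")" = "$expected" ]; then
    echo "db_create_file_dest points to the OMS mount point"
else
    echo "db_create_file_dest does NOT point to the OMS mount point"
fi
rm -f "$demo_pfile"
```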

Link OMS in Oracle Binaries 

We have to link the OMS feature into the Oracle kernel, because OMS is not enabled by default in the Oracle binaries.

export ORACLE_HOME=/u00/app/oracle/product/19.0.0.0.0.F
export ORACLE_SID=ORAPMEM
export LD_LIBRARY_PATH=/u00/app/oracle/product/19.0.0.0.0.F/lib
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk oms_on

Start OMS Daemon

At startup, the OMS daemon creates a shared memory segment for the exclusive use of OMS. One daemon is created per instance.
For that reason, you have to set the instance identifier (SID) before you start the OMS daemon.
You should not manually delete the shared memory segment or delete/edit the configuration file.
The trace files of the daemon are in $ORACLE_BASE/diag/oms.
All daemons that were running before a server reboot will be restarted after the reboot.

Verify that LD_LIBRARY_PATH is set to $ORACLE_HOME/lib; otherwise you get the following error during daemon startup:
./oms_daemon: error while loading shared libraries: libclntsh.so.19.1: cannot open shared object file: No such file or directory

export ORACLE_SID=ORAPMEM
$ORACLE_HOME/bin/oms_daemon
 
# Sample Output
Starting daemon as a detached background.
OMS binary located at /u00/app/oracle/product/19.0.0.0.0.F/bin/oms_daemon
OMS daemon startup: oms_test successfully created
OMS daemon creating tracefile
/u00/app/oracle/diag/oms/ORAPMEM_oms_20758.trc
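
To avoid the shared library error, a small pre-flight check can be run before the daemon is started. This is only a sketch: path_list_contains is a hypothetical helper name, and it compares entries literally (an entry written with a trailing slash would not match).

```shell
#!/usr/bin/env bash
# Sketch: test whether a colon-separated path list contains a directory.
# path_list_contains is a hypothetical helper name.
path_list_contains() {
    local list="$1" wanted="${2%/}"   # drop a trailing slash from the wanted dir
    case ":$list:" in
        *":$wanted:"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Demo with the values used in this article.
example_home=/u00/app/oracle/product/19.0.0.0.0.F
example_ld_path=/u00/app/oracle/product/19.0.0.0.0.F/lib

if path_list_contains "$example_ld_path" "$example_home/lib"; then
    echo "LD_LIBRARY_PATH contains ORACLE_HOME/lib"
else
    echo "LD_LIBRARY_PATH is missing ORACLE_HOME/lib"
fi
```

In a real session the check would be: path_list_contains "${LD_LIBRARY_PATH:-}" "$ORACLE_HOME/lib" || echo "set LD_LIBRARY_PATH first"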

Create and Mount OMS Filesystem

Now we are ready to create and mount the OMS file system. To do that, Oracle provides a utility called “omsfscmds”.
When I tried to start the tool from the $ORACLE_HOME/bin directory, I got an error that some libraries were missing.
After a bit of research I found out that you have to install an additional package. This is not documented by Oracle Support.

yum install librdmacm

Now we can run omsfscmds to create and mount the file system.

# create and mount file system 
$ORACLE_HOME/bin/omsfscmds
OMS> mkfs /mnt/pmem0/omsuberfile.ORAPMEM
OMS:mkfs:No blocksize specified, using 4K
OMS:mkfs: Device /mnt/pmem0/omsuberfile.ORAPMEM formatted with blocksize 4096

OMS> mount /mnt/pmem0/omsuberfile.ORAPMEM /u03/oms_fs1/oracle/oradata/ORAPMEM
OMS:mount: Mounted /mnt/pmem0/omsuberfile.ORAPMEM at /u03/oms_fs1/oracle/oradata/ORAPMEM
# Check if Filesystem is correctly mounted
OMS> lsmount
fsindex : 0
Mountpt : /u03/oms_fs1/oracle/oradata/ORAPMEM
Deviceid: /mnt/pmem0/omsuberfile.ORAPMEM

Create Database and enjoy performance 🙂

If the file system is successfully mounted, you can execute the prepared scripts to create the database on the OMS file system.
In the next few days I will post a follow-up with some benchmark numbers for Oracle on PMEM.


by
Alain Fuhrer