After recently deploying a new customer system running Oracle SE2 databases in a disaster recovery setup using dbvisit StandbyMP v11.4.1 on Oracle Linux 8.8, I ran a switchover as a final test. The switchover failed with the following error:
ORA-27125: unable to create shared memory segment
Problem description
The switchover command failed:
oracle@srv01:~/ [MYDB] /u01/app/dbvisit/standbymp/oracle/dbvctl -d MYDB -o switchover
=============================================================
Dbvisit Standby Database Technology (11.4.1) (pid 9997)
dbvctl started on srv01: Thu Dec 14 11:48:41 2023
=============================================================
>>> Starting Switchover between srv01 and srv02
Running pre-checks ... done
=>Enter Custom User Script to run after Switchover is complete on srv01 (leave blank for no script): []:
=>Enter Custom User Script to run after Switchover is complete on srv02 (leave blank for no script): []:
=>Do you want to proceed with Graceful Switchover? [no]: yes
Your input: 1
Is this correct? [Yes]:
Pre processing ... done
Processing primary ... done
Processing standby ... failed
Rolling back Primary ... done
Rolling back Standby ... failed
>>> Database on server srv01 is still a Primary Database
>>> Database on server srv02 is in unknown state
<<<>>>
PID:9997
TRACEFILE:9997_dbvctl_switchover_MYDB_202312141148.trc
>>>> Dbvisit Standby terminated <<<<
Looking in the trace file named in the output, I found the Oracle error:
oracle@srv01:/u01/app/dbvisit/standbymp/oracle/trace/ [MYDB] cat 9997_dbvctl_switchover_MYDB_202312141148.trc
...
...
...
117978 20231214 11:50:39 main::UTIL_rsh_cmd: command:"/u01/app/dbvisit/standbymp/oracle/dbvctl" -d MYDB -f gs_process_standby --show_trace
117979 20231214 11:50:39 main::UTIL_rsh_cmd: noerror=
117979 20231214 11:50:39 main::UTIL_run_command: ORACLE_HOME: /u01/app/oracle/product/19_20_0_0_RU230718_v0
117979 20231214 11:50:39 main::UTIL_run_command: PATH: /usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/sbin:/sbin:/u01/app/oracle/product/19_20_0_0_RU230718_v0/bin
117979 20231214 11:50:39 main::UTIL_run_command: LD_LIBRARY_PATH: /u01/app/dbvisit/standbymp/oracle/lib:/u01/app/oracle/product/19_20_0_0_RU230718_v0/lib
117979 20231214 11:50:39 main::UTIL_run_command: "/u01/app/dbvisit/standbymp/dbvnet/dbvnet" -c "/u01/app/dbvisit/standbymp/dbvnet/conf/dbvnetd.conf" -a srv02 -p 7890 -e "\"/u01/app/dbvisit/standbymp/oracle/dbvctl\" -d MYDB -f gs_process_standby --show_trace " 2>&1
117979 20231214 11:50:39 main::UTIL_run_command: noerror=1
123500 20231214 11:50:45 main::UTIL_run_command: return_code=256
123500 20231214 11:50:45 main::UTIL_run_command: ===>command output start
123500 20231214 11:50:45 main::UTIL_run_command: DBVISIT_TRACEFILE=/u01/app/dbvisit/standbymp/oracle/trace/235662_18_dbvctl_f_gs_process_standby_MYDB_202312141150.trc
123500 20231214 11:50:45 main::UTIL_run_command:
123500 20231214 11:50:45 main::UTIL_run_command: <<<>>>
123500 20231214 11:50:45 main::UTIL_run_command:
123500 20231214 11:50:45 main::UTIL_run_command: PID:235662
123500 20231214 11:50:45 main::UTIL_run_command: TRACEFILE:235662_18_dbvctl_f_gs_process_standby_MYDB_202312141150.trc
123500 20231214 11:50:45 main::UTIL_run_command: SERVER:srv02
123501 20231214 11:50:45 main::UTIL_run_command: ERROR_CODE:8002
123501 20231214 11:50:45 main::UTIL_run_command: Error executing RMAN command.
123501 20231214 11:50:45 main::UTIL_run_command: RMAN-03002: failure of startup command at 12/14/2023 11:50:45
123501 20231214 11:50:45 main::UTIL_run_command: RMAN-04014: startup failed: ORA-27125: unable to create shared memory segment
123501 20231214 11:50:45 main::UTIL_run_command: Linux-x86_64 Error: 1: Operation not permitted
123501 20231214 11:50:45 main::UTIL_run_command: See tracefile for details
123501 20231214 11:50:45 main::UTIL_run_command: >>>> Dbvisit Standby terminated <<<command output end
123501 20231214 11:50:45 main::cmn_extract_trace_name: tracefile=/u01/app/dbvisit/standbymp/oracle/trace/235662_18_dbvctl_f_gs_process_standby_MYDB_202312141150.trc (remote)
123501 20231214 11:50:45 main::UTIL_rsh_cmd: ===>source_output start
123501 20231214 11:50:45 main::UTIL_rsh_cmd:
123501 20231214 11:50:45 main::UTIL_rsh_cmd: <<<>>>
...
...
...
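Dbvisit trace files are long, so it can help to pull out only the Oracle and RMAN error codes. The following is a minimal sketch: `extract_oracle_errors` is a helper name made up for illustration (it is not part of Dbvisit), and the pattern only assumes the usual `ORA-nnnnn` / `RMAN-nnnnn` code format.

```shell
# Print each distinct ORA-/RMAN- error code found in the given trace
# file(s), or in stdin when no file is given.
# extract_oracle_errors is a hypothetical helper, not a Dbvisit command.
extract_oracle_errors() {
    grep -oE '(ORA|RMAN)-[0-9]{5}' "$@" | sort -u
}
```

For the trace above this could be called as `extract_oracle_errors 9997_dbvctl_switchover_MYDB_202312141148.trc`.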
Troubleshooting
It seems that when dbvisit tried to start the standby database, it could not get enough shared memory from the hugepages, although everything looked fine on the OS configuration side. Stopping and starting both the primary and standby databases manually with sqlplus worked perfectly.
On Oracle Linux, the resource limits for the oracle user are configured when the Oracle Preinstallation RPM is installed. The required resource limits are also described in the following Oracle documentation: https://docs.oracle.com/en/database/oracle/oracle-database/19/ladbi/checking-resource-limits-for-oracle-software-installation-users.html#GUID-293874BD-8069-470F-BEBF-A77C06618D5A
The soft and hard limits for open file descriptors (resource nofile) for the oracle user seemed ok:
oracle@srv01:~/ [rdbms1900] ulimit -Sn
1024
oracle@srv01:~/ [rdbms1900] ulimit -Hn
65536
The soft and hard limits for the number of processes available to a single user (resource nproc) seemed ok:
oracle@srv01:~/ [rdbms1900] ulimit -Su
16384
oracle@srv01:~/ [rdbms1900] ulimit -Hu
16384
The soft and hard limits for the size of the stack segment of the process (resource stack) seemed ok:
oracle@srv01:~/ [rdbms1900] ulimit -Ss
10240
oracle@srv01:~/ [rdbms1900] ulimit -Hs
32768
The soft and hard limits for maximum locked memory (resource memlock) seemed ok:
oracle@srv01:~/ [rdbms1900] ulimit -Sl
134217728
oracle@srv01:~/ [rdbms1900] ulimit -Hl
134217728
oracle@srv01:~/ [rdbms1900] grep memlock /etc/security/limits.conf
# - memlock - max locked-in-memory address space (KB)
oracle soft memlock 38797312
oracle hard memlock 38797312
The memlock limit configured for the oracle user in the limits.conf file is 38797312 KB, which is 37 GB.
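The memlock limit reported by the ulimit command for the oracle user is 134217728 KB, which is 128 GB. This does not match limits.conf, so the effective value is presumably set elsewhere, for example in a file under /etc/security/limits.d/ such as the one the Oracle Preinstallation RPM creates.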
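The two conversions can be double-checked with shell arithmetic. This is a small sketch; `kb_to_gib` is just an illustrative helper name, and it rounds down to whole units of 1024 x 1024 KB.

```shell
# Convert a size in KB (the unit used by ulimit -l and limits.conf)
# to whole GB (1 GB = 1024 * 1024 KB here).
kb_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}

kb_to_gib 38797312     # value from /etc/security/limits.conf, prints 37
kb_to_gib 134217728    # value reported by ulimit -l, prints 128
```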
In any case, all those limits are fine. By the way, all of them can be displayed with a single command:
oracle@bvboracle01:~/ [rdbms1900] ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 255582
max locked memory (kbytes, -l) 134217728
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 16384
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
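Instead of comparing each value by eye, a limit can be checked against a required minimum with a few lines of shell. This is only a sketch: `check_min` is a hypothetical helper, and the minimum to pass in should be taken from the Oracle documentation linked above.

```shell
# check_min NAME ACTUAL REQUIRED
# Report whether a limit meets a required minimum; "unlimited" always passes.
# Hypothetical helper for illustration only.
check_min() {
    if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
        echo "$1 ok ($2 >= $3)"
    else
        echo "$1 TOO LOW ($2 < $3)"
        return 1
    fi
}
```

For example, `check_min "nofile hard" "$(ulimit -Hn)" 65536` checks the hard limit for open files in the current shell.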
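So the problem more likely comes from dbvagentmanager, which does not pick up the limits configured in /etc/security/limits.conf when it starts. This fits with how systemd works: limits.conf is applied by pam_limits to login sessions, whereas a service started by systemd gets its limits from the systemd defaults and from the Limit* directives of its unit.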
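The limits a running process actually has can be read from /proc, which makes it possible to compare the agent with a login shell. A minimal sketch, where `proc_limit` is a helper name I made up:

```shell
# proc_limit PID NAME
# Print the row of /proc/<pid>/limits whose name starts with NAME,
# e.g. "Max locked memory". Hypothetical helper; /proc/<pid>/limits
# itself is a standard Linux interface.
proc_limit() {
    grep -i "^$2" "/proc/$1/limits"
}
```

For the agent, the PID can be taken from the `ps -ef | grep [d]bv` output and passed as `proc_limit <pid> "Max locked memory"`. The configured value of the unit can also be displayed with `systemctl show dbvagentmanager -p LimitMEMLOCK`.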
Solution
The dbvagentmanager systemd service had been created as per the installation guide, as follows:
[root@srv01 ~]# systemctl list-units --type=service | grep -i dbv
[root@srv01 ~]# /u01/app/dbvisit/standbymp/bin/dbvagentmanager service install --user oracle
[root@srv01 ~]# systemctl list-units --type=service | grep -i dbv
dbvagentmanager.service loaded activating auto-restart The Dbvisit StandbyMP Agent provides connectivity to databases on this computer. This Agent is used & managed by the Dbvisit StandbyMP Control Center.
[root@srv01 ~]# /u01/app/dbvisit/standbymp/bin/dbvagentmanager service start
[root@srv01 ~]# ps -ef | grep [d]bv
oracle 4000 1 0 15:38 ? 00:00:00 /u01/app/dbvisit/standbymp/bin/dbvagentmanager service run
I then decided to add the following limits to the dbvagentmanager service itself:
- LimitNOFILE=65536
- LimitNPROC=16384
- LimitMEMLOCK=infinity
On both primary and standby servers, I updated the dbvagentmanager systemd service accordingly:
[root@srv01 ~]# sudo systemctl stop dbvagentmanager
[root@srv01 ~]# sudo systemctl disable dbvagentmanager
Removed /etc/systemd/system/multi-user.target.wants/dbvagentmanager.service.
[root@srv01 ~]# cat /etc/systemd/system/dbvagentmanager.service
[Unit]
Description=The Dbvisit StandbyMP Agent provides connectivity to databases on this computer. This Agent is used & managed by the Dbvisit StandbyMP Control Center.
ConditionFileIsExecutable=/u01/app/dbvisit/standbymp/bin/dbvagentmanager
[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/u01/app/dbvisit/standbymp/bin/dbvagentmanager "service" "run"
User=oracle
Restart=always
KillMode=process
RestartSec=120
EnvironmentFile=-/etc/sysconfig/dbvagentmanager
[Install]
WantedBy=multi-user.target
[root@srv01 ~]# vi /etc/systemd/system/dbvagentmanager.service
[root@srv01 ~]# cat /etc/systemd/system/dbvagentmanager.service
[Unit]
Description=The Dbvisit StandbyMP Agent provides connectivity to databases on this computer. This Agent is used & managed by the Dbvisit StandbyMP Control Center.
ConditionFileIsExecutable=/u01/app/dbvisit/standbymp/bin/dbvagentmanager
[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/u01/app/dbvisit/standbymp/bin/dbvagentmanager "service" "run"
User=oracle
Restart=always
KillMode=process
RestartSec=120
EnvironmentFile=-/etc/sysconfig/dbvagentmanager
LimitNOFILE=65536
LimitNPROC=16384
LimitMEMLOCK=infinity
[Install]
WantedBy=multi-user.target
[root@srv01 ~]# sudo systemctl daemon-reload
[root@srv01 ~]# sudo systemctl start dbvagentmanager
[root@srv01 ~]# sudo systemctl enable dbvagentmanager
Created symlink /etc/systemd/system/multi-user.target.wants/dbvagentmanager.service → /etc/systemd/system/dbvagentmanager.service.
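As an alternative to editing the unit file in place, the same three limits could be kept in a systemd drop-in file, so that they live apart from the unit file generated by the installer. This is a sketch of that approach and not what I did on this system: `write_limits_dropin` and the file name `limits.conf` are my own choices, and whether re-running `dbvagentmanager service install` would overwrite an edited unit file is an assumption I have not verified.

```shell
# write_limits_dropin DIR
# Write a systemd drop-in with the three limits into DIR.
# For the real service DIR would be
# /etc/systemd/system/dbvagentmanager.service.d (run as root).
# Hypothetical helper for illustration only.
write_limits_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/limits.conf" <<'EOF'
[Service]
LimitNOFILE=65536
LimitNPROC=16384
LimitMEMLOCK=infinity
EOF
}
```

After calling `write_limits_dropin /etc/systemd/system/dbvagentmanager.service.d`, a `systemctl daemon-reload` followed by `systemctl restart dbvagentmanager` would be needed for the agent to pick up the new limits.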
I found the same explanation in a dbvisit support article about a similar problem when creating the Oracle standby database with dbvisit: https://support.dbvisit.com/hc/en-us/articles/6488001083023-CSD-fails-with-RMAN-04014-startup-failed-ORA-27125-unable-to-create-shared-memory-segment-
Final switchover test
And I could finally run the switchover successfully:
oracle@srv01:~/ [MYDB] /u01/app/dbvisit/standbymp/oracle/dbvctl -d MYDB -o switchover
=============================================================
Dbvisit Standby Database Technology (11.4.1) (pid 12677)
dbvctl started on srv01: Thu Dec 14 12:12:09 2023
=============================================================
>>> Starting Switchover between srv01 and srv02
Running pre-checks ... done
=>Enter Custom User Script to run after Switchover is complete on srv01 (leave blank for no script): []:
=>Enter Custom User Script to run after Switchover is complete on srv02 (leave blank for no script): []:
=>Do you want to proceed with Graceful Switchover? [no]: yes
Your input: 1
Is this correct? [Yes]:
Pre processing ... done
Processing primary ... done
Processing standby ... done
Converting standby ... done
Converting primary ... done
Completing ... done
Synchronizing ... done
Post processing ... done
>>> Graceful switchover completed.
Primary Database Server: srv02
Standby Database Server: srv01
>>> Dbvisit Standby can be run as per normal:
dbvctl -d MYDB
As part of the Switchover process, the primary and standby controlfiles have been exchanged.
Unless you are using RMAN catalog database, you may need to cross-check all backups and review RMAN settings using the SHOW ALL command on the new Primary/Standby databases. Confirm the path set for the SNAPSHOT CONTROLFILE NAME TO setting is valid on both sides.
PID:12677
TRACE:12677_dbvctl_switchover_MYDB_202312141212.trc
=============================================================
dbvctl ended on srv01: Thu Dec 14 12:15:13 2023
=============================================================