I recently blogged about how to install and configure Dynamic Scaling as an Oracle Linux service daemon. See my previous blog: https://www.dbi-services.com/blog/how-to-automatically-manage-ocpu-on-exacc-using-dynamic-scaling/
In this blog, I would like to show how to configure it as an Oracle Grid HA resource instead. This is the approach we chose at the customer site. The installation is done on a 2-node ExaCC cluster.
Requirements
All of the following steps, described in the previous blog, need to be run on both nodes:
- RPM package installation
- oci-cli installation
- OCI config file
- OCI test
- Dynamic Scaling test with the check and getocpu options, to ensure the installation is successful
- Log file rotation, configured on both nodes for the same shared log file
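To make sure these prerequisites are in place on each node before going further, a small pre-flight check can help. The sketch below is a hypothetical helper, not part of Dynamic Scaling: it only checks that the binary used later in this blog is executable, that the oci command is found (in PATH or in /home/opc/bin, the directory added to PATH in the resource definition), and that an OCI config file exists in the default ~/.oci/config location of the opc user. These paths are assumptions to adapt to your installation.

```shell
# Hypothetical pre-flight check, not part of Dynamic Scaling.
# Paths are assumptions based on this blog: adapt them to your installation.
DYNSCAL_BIN=${DYNSCAL_BIN:-/opt/dynamicscaling/dynamicscaling.bin}
OCI_BIN_DIR=${OCI_BIN_DIR:-/home/opc/bin}        # directory added to PATH in the resource
OCI_CONFIG=${OCI_CONFIG:-/home/opc/.oci/config}  # default oci-cli config location for opc

check_prereqs() {
  local rc=0
  # Dynamic Scaling binary installed by the RPM package
  if [ ! -x "$DYNSCAL_BIN" ]; then
    echo "MISSING: $DYNSCAL_BIN" >&2
    rc=1
  fi
  # oci-cli either in OCI_BIN_DIR or somewhere in the current PATH
  if [ ! -x "$OCI_BIN_DIR/oci" ] && ! command -v oci >/dev/null 2>&1; then
    echo "MISSING: oci (oci-cli)" >&2
    rc=1
  fi
  # OCI config file read by oci-cli
  if [ ! -r "$OCI_CONFIG" ]; then
    echo "MISSING: $OCI_CONFIG" >&2
    rc=1
  fi
  return "$rc"
}
```

Usage, as root on each node: source the file, then run check_prereqs && echo OK. Every missing item is reported, not only the first one.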
For this cluster, we used the following Dynamic Scaling parameters:
--interval 60
--maxthreshold 75
--minthreshold 60
--maxocpu 48
--minocpu 4
--ocpu 4
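Before these values go into the resource definition, a quick consistency check can catch typos. The function below is a hypothetical helper: it assumes the thresholds are CPU-usage percentages (the log reports "CPU usage is under minthreshold"), so it checks that every value is a positive integer, that minthreshold is lower than maxthreshold, that maxthreshold does not exceed 100, and that minocpu does not exceed maxocpu.

```shell
# Hypothetical sanity check of the parameters used in this blog.
# Assumption: minthreshold/maxthreshold are CPU-usage percentages.
INTERVAL=${INTERVAL:-60}
MAXTHRESHOLD=${MAXTHRESHOLD:-75}
MINTHRESHOLD=${MINTHRESHOLD:-60}
MAXOCPU=${MAXOCPU:-48}
MINOCPU=${MINOCPU:-4}
OCPU=${OCPU:-4}

validate_thresholds() {
  local v
  for v in "$INTERVAL" "$MAXTHRESHOLD" "$MINTHRESHOLD" "$MAXOCPU" "$MINOCPU" "$OCPU"; do
    # every value must be a positive integer
    if ! [[ $v =~ ^[1-9][0-9]*$ ]]; then
      echo "not a positive integer: '$v'" >&2
      return 1
    fi
  done
  if (( MINTHRESHOLD >= MAXTHRESHOLD )); then
    echo "minthreshold ($MINTHRESHOLD) must be lower than maxthreshold ($MAXTHRESHOLD)" >&2
    return 1
  fi
  if (( MAXTHRESHOLD > 100 )); then
    echo "maxthreshold ($MAXTHRESHOLD) is expected to be a percentage" >&2
    return 1
  fi
  if (( MINOCPU > MAXOCPU )); then
    echo "minocpu ($MINOCPU) must not exceed maxocpu ($MAXOCPU)" >&2
    return 1
  fi
  return 0
}
```

With the values above, validate_thresholds succeeds; overlapping thresholds would otherwise make scale-up and scale-down conditions contradict each other.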
Configure Dynamic Scaling as Grid HA resource
Check that no Dynamic Scaling process is running on either node.
[root@ExaCC-cl01n1 ~]# ps -ef | grep -i [d]ynamic
[root@ExaCC-cl01n1 ~]#
[root@ExaCC-cl01n2 ~]# ps -ef | grep -i [d]ynamic
[root@ExaCC-cl01n2 ~]#
Check the existing Grid resource for the ACFS file system, on which the Dynamic Scaling resource will depend (the log files are written to ACFS).
[root@ExaCC-cl01n1 ~]# cd /u01/app/19.0.0.0/grid/bin/
[root@ExaCC-cl01n1 bin]# ./crsctl stat res -t | grep -i .acfsvol01.acfs
ora.datac1.acfsvol01.acfs
Check that no Dynamic Scaling resource exists yet.
[root@ExaCC-cl01n1 bin]# ./crsctl stat res -t | grep -i dynamicscaling
[root@ExaCC-cl01n1 bin]#
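Both checks can be combined into a single step before creating the resource. The sketch below is a hypothetical helper that reads the output of crsctl stat res -t on stdin; it assumes, as in the output above, that each resource name is printed as the first field of its own line. The resource names are those of this cluster and must be adapted.

```shell
# Hypothetical pre-check, reading "crsctl stat res -t" output on stdin.
# Resource names are those of this cluster: adapt them.
ACFS_RES=${ACFS_RES:-ora.datac1.acfsvol01.acfs}
DYNSCAL_RES=${DYNSCAL_RES:-dynamicscaling.srv}

# Succeeds if the given resource name appears as the first field of a line.
has_resource() {
  awk -v r="$1" '$1 == r { found = 1 } END { exit !found }'
}

# Succeeds if the ACFS resource exists and the Dynamic Scaling one does not.
ready_to_add() {
  local out
  out=$(cat)
  if ! has_resource "$ACFS_RES" <<<"$out"; then
    echo "ACFS resource $ACFS_RES not found" >&2
    return 1
  fi
  if has_resource "$DYNSCAL_RES" <<<"$out"; then
    echo "resource $DYNSCAL_RES already exists" >&2
    return 1
  fi
  return 0
}
```

Usage: ./crsctl stat res -t | ready_to_add && echo "OK to create the resource".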
Create the Grid resource for Dynamic Scaling. The vm-cluster-id and the resource name dynamicscaling.srv need to be adapted for each cluster. The hard start and stop dependencies on the ACFS resource ensure that Dynamic Scaling is only started when the ACFS file system used for its logs is online, and that it is stopped if this file system is stopped.
[root@ExaCC-cl01n1 bin]# /u01/app/19.0.0.0/grid/bin/crsctl add resource dynamicscaling.srv \
> -type generic_application \
> -attr \
> "START_PROGRAM='/opt/dynamicscaling/dynamicscaling.bin \
> --ocicli \
> --vm-cluster-id ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na \
> --ociprofile DEFAULT \
> --interval 60 \
> --maxthreshold 75 \
> --minthreshold 60 \
> --maxocpu 48 \
> --minocpu 4 \
> --ocpu 4 \
> --logpath /acfs01/dynscal_logs',
> STOP_PROGRAM='/opt/dynamicscaling/dynamicscaling.bin stop_resource',
> CLEAN_PROGRAM='/opt/dynamicscaling/dynamicscaling.bin stop_resource',
> PID_FILES='/tmp/.dynamicscaling.pid',
> START_DEPENDENCIES='hard(ora.datac1.acfsvol01.acfs)',
> STOP_DEPENDENCIES='hard(ora.datac1.acfsvol01.acfs)',
> ENVIRONMENT_VARS='PATH=$PATH:/home/opc/bin,HOME=/home/opc,HTTP_PROXY=http://webproxy.domain.com:XXXX,HTTPS_PROXY=http://webproxy.domain.com:XXXX'"
[root@ExaCC-cl01n1 bin]#
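Since the vm-cluster-id, the ACFS resource, the log path and the proxy differ from one cluster to another, the command can be kept in a small script with those values in variables. This is a hypothetical wrapper, not an official tool: VM_CLUSTER_OCID and PROXY are placeholders, and the attribute string is built on a single line in the usual comma-separated name=value format of the crsctl -attr option. As in the command above, $PATH is expanded by the shell when the resource is created, and the single quotes keep the commas of ENVIRONMENT_VARS inside one attribute value.

```shell
# Hypothetical wrapper around the "crsctl add resource" command shown above.
# Cluster-specific values: replace the placeholders before use.
GRID_HOME=${GRID_HOME:-/u01/app/19.0.0.0/grid}
RES_NAME=${RES_NAME:-dynamicscaling.srv}
VM_CLUSTER_OCID=${VM_CLUSTER_OCID:-ocid1.vmcluster.oc1.REPLACE_ME}   # placeholder
ACFS_RES=${ACFS_RES:-ora.datac1.acfsvol01.acfs}
LOG_PATH=${LOG_PATH:-/acfs01/dynscal_logs}
PROXY=${PROXY:-http://webproxy.domain.com:XXXX}                      # placeholder
DYNSCAL_BIN=/opt/dynamicscaling/dynamicscaling.bin

# Print the -attr value on a single line (comma-separated name=value pairs).
build_attr() {
  local start="$DYNSCAL_BIN --ocicli --vm-cluster-id $VM_CLUSTER_OCID --ociprofile DEFAULT"
  start+=" --interval 60 --maxthreshold 75 --minthreshold 60"
  start+=" --maxocpu 48 --minocpu 4 --ocpu 4 --logpath $LOG_PATH"
  local attr="START_PROGRAM='$start'"
  attr+=",STOP_PROGRAM='$DYNSCAL_BIN stop_resource'"
  attr+=",CLEAN_PROGRAM='$DYNSCAL_BIN stop_resource'"
  attr+=",PID_FILES='/tmp/.dynamicscaling.pid'"
  attr+=",START_DEPENDENCIES='hard($ACFS_RES)'"
  attr+=",STOP_DEPENDENCIES='hard($ACFS_RES)'"
  # $PATH is expanded now, at creation time, as in the original command
  attr+=",ENVIRONMENT_VARS='PATH=$PATH:/home/opc/bin,HOME=/home/opc,HTTP_PROXY=$PROXY,HTTPS_PROXY=$PROXY'"
  printf '%s\n' "$attr"
}

add_dynscal_resource() {
  "$GRID_HOME/bin/crsctl" add resource "$RES_NAME" -type generic_application -attr "$(build_attr)"
}
```

Usage: source the file, set VM_CLUSTER_OCID and PROXY, check the output of build_attr, then run add_dynscal_resource as root.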
Check the Dynamic Scaling resource.
[root@ExaCC-cl01n1 bin]# ./crsctl stat res -t | grep -i dynamicscaling
dynamicscaling.srv
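Beyond the resource name, the stored attributes can be reviewed with crsctl stat res dynamicscaling.srv -p, which prints the static profile of the resource as ATTRIBUTE=value lines. The hypothetical helper below extracts one attribute from that output; it splits on the first "=" only, because values such as ENVIRONMENT_VARS contain "=" themselves. The exact formatting of the values (for example whether quotes are kept) is not shown in this blog, so check it on your own system.

```shell
# Hypothetical helper: print the value of one attribute from a resource
# profile read on stdin ("crsctl stat res <name> -p" prints ATTRIBUTE=value lines).
profile_attr() {
  # split on the first "=" only: values such as ENVIRONMENT_VARS contain "=" too
  awk -v a="$1" 'index($0, a "=") == 1 { print substr($0, length(a) + 2) }'
}

# Show the attributes that matter most for this resource.
show_key_attrs() {
  local profile a
  profile=$(cat)
  for a in START_PROGRAM PID_FILES START_DEPENDENCIES STOP_DEPENDENCIES ENVIRONMENT_VARS; do
    printf '%s => %s\n' "$a" "$(profile_attr "$a" <<<"$profile")"
  done
}
```

Usage: ./crsctl stat res dynamicscaling.srv -p | show_key_attrs.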
Start Dynamic Scaling resource
Let’s start the Dynamic Scaling resource.
[root@ExaCC-cl01n1 bin]# ps -ef | grep -i [d]ynamic
[root@ExaCC-cl01n1 bin]# ./crsctl start resource dynamicscaling.srv
CRS-2672: Attempting to start 'dynamicscaling.srv' on 'ExaCC-cl01n2'
CRS-2676: Start of 'dynamicscaling.srv' on 'ExaCC-cl01n2' succeeded
As per the output, Dynamic Scaling has been started on node 2, which is confirmed by the Linux processes:
[root@ExaCC-cl01n1 bin]# ps -ef | grep -i [d]ynamic
[root@ExaCC-cl01n1 bin]#
[root@ExaCC-cl01n2 bin]# ps -ef | grep -i [d]ynamic
root 95918 1 0 09:53 ? 00:00:00 /opt/dynamicscaling/dynamicscaling.bin --ocicli --vm-cluster-id ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na --ociprofile DEFAULT --interval 60 --maxthreshold 75 --minthreshold 60 --maxocpu 48 --minocpu 4 --ocpu 4 --logpath /acfs01/dynscal_logs
[root@ExaCC-cl01n2 bin]#
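To run this process check on both nodes from one place, the ps command can be wrapped in a loop. This sketch assumes passwordless SSH as root between the nodes, which may not be configured on your cluster; the node names are those of this blog, and RUN_ON is a hypothetical hook so that the remote command can be replaced (for example for testing).

```shell
# Hypothetical check run from one node: on which nodes is Dynamic Scaling running?
# Assumes passwordless SSH between the nodes; node names are those of this blog.
NODES=${NODES:-"ExaCC-cl01n1 ExaCC-cl01n2"}
RUN_ON=${RUN_ON:-ssh}   # command used to run "ps -ef" on a node (hypothetical hook)

# Print every node whose process list contains the Dynamic Scaling binary.
dynscal_nodes() {
  local node
  for node in $NODES; do
    if "$RUN_ON" "$node" "ps -ef" </dev/null | grep -qF /opt/dynamicscaling/dynamicscaling.bin; then
      echo "$node"
    fi
  done
}

# Succeeds only if exactly one node runs the process.
single_instance() {
  [ "$(dynscal_nodes | grep -c .)" -eq 1 ]
}
```

Usage: dynscal_nodes should print ExaCC-cl01n2 at this stage, and single_instance should succeed.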
Let’s stop it.
[root@ExaCC-cl01n1 bin]# ./crsctl stop resource dynamicscaling.srv
CRS-2673: Attempting to stop 'dynamicscaling.srv' on 'ExaCC-cl01n2'
CRS-2677: Stop of 'dynamicscaling.srv' on 'ExaCC-cl01n2' succeeded
[root@ExaCC-cl01n1 bin]# ./crsctl status resource dynamicscaling.srv
NAME=dynamicscaling.srv
TYPE=generic_application
TARGET=OFFLINE
STATE=OFFLINE
And start it again.
[root@ExaCC-cl01n1 bin]# ./crsctl start resource dynamicscaling.srv
CRS-2672: Attempting to start 'dynamicscaling.srv' on 'ExaCC-cl01n2'
CRS-2676: Start of 'dynamicscaling.srv' on 'ExaCC-cl01n2' succeeded
[root@ExaCC-cl01n1 bin]# ./crsctl status resource dynamicscaling.srv
NAME=dynamicscaling.srv
TYPE=generic_application
TARGET=ONLINE
STATE=ONLINE on ExaCC-cl01n2
The resource has been started on node 2 again.
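When scripting these operations, it is handy to wait until the resource is ONLINE and to find out on which node it runs. The sketch below parses the STATE line of the crsctl status resource output shown above (STATE=ONLINE on node). The Grid home path is the one of this cluster, and the 5-second polling interval and 120-second default timeout are arbitrary choices.

```shell
# Hypothetical helpers around "crsctl status resource <name>".
CRSCTL=${CRSCTL:-/u01/app/19.0.0.0/grid/bin/crsctl}

# Read the status output on stdin and print the node from "STATE=ONLINE on <node>".
# Prints nothing when the resource is OFFLINE.
online_node() {
  awk -F'=' '$1 == "STATE" && $2 ~ /^ONLINE on / { sub(/^ONLINE on /, "", $2); print $2 }'
}

# Poll every 5 seconds until the resource is ONLINE; print the node, or fail after the timeout.
wait_online() {
  local res=$1 timeout=${2:-120} waited=0 node
  while (( waited < timeout )); do
    node=$("$CRSCTL" status resource "$res" | online_node)
    if [ -n "$node" ]; then
      echo "$node"
      return 0
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  echo "$res not ONLINE after ${timeout}s" >&2
  return 1
}
```

Usage: ./crsctl start resource dynamicscaling.srv && wait_online dynamicscaling.srv 60.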
Relocate Grid resource for Dynamic Scaling to node 1
As we can see in the log, a scale-down is currently in progress. This should not prevent us from relocating the Grid resource.
[root@ExaCC-cl01n1 bin]# ls -ltrh /acfs01/dynscal_logs/
total 12K
-rw-r--r-- 1 root root 7.0K Jul 26 09:56 dynamicscaling.log
[root@ExaCC-cl01n1 bin]# tail -f /acfs01/dynscal_logs/dynamicscaling.log
2024-07-26 09:56:06: Current OCPU=24
2024-07-26 09:56:08: Local host load ......: 23.9
2024-07-26 09:56:08: Current load is under/equal minimum threshold '60' for '60' secs
2024-07-26 09:56:08: Checking DB System status
2024-07-26 09:56:08: Getting lifecycle-state
2024-07-26 09:56:09: DB System status......: AVAILABLE
2024-07-26 09:56:09: Resetting consecutive DB System 'UPDATING' status count
2024-07-26 09:56:09: Requesting OCPU scale-Down by a factor of '4'
2024-07-26 09:56:09: Scaling-down the core-count...
2024-07-26 09:56:11: Scaling-down in progress, sleeping 180 secs...
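The latest measurements can also be extracted from the log without following it with tail -f. These two hypothetical helpers rely only on the log line formats shown above ("Current OCPU=" and "Local host load ......:"), which may change between Dynamic Scaling versions.

```shell
# Hypothetical helpers based on the log line formats shown above.
LOG=${LOG:-/acfs01/dynscal_logs/dynamicscaling.log}

# "2024-07-26 09:56:06: Current OCPU=24" -> 24
last_ocpu() {
  grep 'Current OCPU=' "$LOG" | tail -n 1 | sed 's/.*Current OCPU=//'
}

# "2024-07-26 09:56:08: Local host load ......: 23.9" -> 23.9
last_load() {
  grep 'Local host load' "$LOG" | tail -n 1 | awk '{ print $NF }'
}
```

Usage: echo "OCPU=$(last_ocpu) load=$(last_load)". With the log above, this would report 24 OCPUs and a load of 23.9.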
The relocation was executed successfully.
[root@ExaCC-cl01n1 bin]# ./crsctl status resource dynamicscaling.srv
NAME=dynamicscaling.srv
TYPE=generic_application
TARGET=ONLINE
STATE=ONLINE on ExaCC-cl01n2
[root@ExaCC-cl01n1 bin]# ./crsctl relocate resource dynamicscaling.srv -n ExaCC-cl01n1
CRS-2673: Attempting to stop 'dynamicscaling.srv' on 'ExaCC-cl01n2'
CRS-2677: Stop of 'dynamicscaling.srv' on 'ExaCC-cl01n2' succeeded
CRS-2672: Attempting to start 'dynamicscaling.srv' on 'ExaCC-cl01n1'
CRS-2676: Start of 'dynamicscaling.srv' on 'ExaCC-cl01n1' succeeded
[root@ExaCC-cl01n1 bin]# ./crsctl status resource dynamicscaling.srv
NAME=dynamicscaling.srv
TYPE=generic_application
TARGET=ONLINE
STATE=ONLINE on ExaCC-cl01n1
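In a 2-node cluster, relocating the resource always means moving it to the other node. The hypothetical helper below reads the current node from the crsctl status resource output and then calls crsctl relocate resource with -n, exactly as done above. The node names and the Grid home path are the ones of this cluster.

```shell
# Hypothetical helper: relocate a resource to the other node of a 2-node cluster.
CRSCTL=${CRSCTL:-/u01/app/19.0.0.0/grid/bin/crsctl}
NODES=${NODES:-"ExaCC-cl01n1 ExaCC-cl01n2"}

# Print the first cluster node that is not the given one.
other_node() {
  local n
  for n in $NODES; do
    if [ "$n" != "$1" ]; then
      echo "$n"
      return 0
    fi
  done
  return 1
}

relocate_to_other() {
  local res=$1 current target
  # current node taken from the "STATE=ONLINE on <node>" line
  current=$("$CRSCTL" status resource "$res" |
    awk -F'=' '$1 == "STATE" && $2 ~ /^ONLINE on / { sub(/^ONLINE on /, "", $2); print $2 }')
  if [ -z "$current" ]; then
    echo "$res is not ONLINE, nothing to relocate" >&2
    return 1
  fi
  target=$(other_node "$current") || return 1
  "$CRSCTL" relocate resource "$res" -n "$target"
}
```

Usage: relocate_to_other dynamicscaling.srv. With the resource running on ExaCC-cl01n2, this issues the same relocate command to ExaCC-cl01n1 as above.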
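The log confirms that Dynamic Scaling keeps running, now from node 1. As the scale-down requested before the relocation is still ongoing, the DB System status is UPDATING, so any further scale-down is postponed.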
[root@ExaCC-cl01n1 bin]# tail -f /acfs01/dynscal_logs/dynamicscaling.log
2024-07-26 09:58:34: Getting lifecycle-state
2024-07-26 09:58:35: DB System status......: UPDATING
2024-07-26 09:58:35: Consecutive DB System 'UPDATING' status count is 0
2024-07-26 09:58:35: Checking current core count
2024-07-26 09:58:35: Getting cpu core count for 'ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na' with oci-cli
2024-07-26 09:58:36: Running on ExaCC getting cpus enabled
2024-07-26 09:58:36: Current OCPU=24
2024-07-26 09:58:38: Local host load ......: 26.8
2024-07-26 09:58:38: CPU usage is under minthreshold
2024-07-26 09:58:38: Next measure in about 60 secs...
2024-07-26 09:59:38: Checking current core count
2024-07-26 09:59:38: Getting cpu core count for 'ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na' with oci-cli
2024-07-26 09:59:39: Running on ExaCC getting cpus enabled
2024-07-26 09:59:39: Current OCPU=24
2024-07-26 09:59:41: Local host load ......: 13.1
2024-07-26 09:59:41: Current load is under/equal minimum threshold '60' for '60' secs
2024-07-26 09:59:41: Checking DB System status
2024-07-26 09:59:41: Getting lifecycle-state
2024-07-26 09:59:42: DB System status......: UPDATING
2024-07-26 09:59:42: Scaling-Down currently not possible due to DB System status 'UPDATING'
2024-07-26 09:59:42: zzzzzz
2024-07-26 09:59:42: Checking DB System status
2024-07-26 09:59:42: Getting lifecycle-state
2024-07-26 09:59:43: DB System status......: UPDATING
2024-07-26 09:59:43: Consecutive DB System 'UPDATING' status count is 1
2024-07-26 09:59:43: Checking current core count
2024-07-26 09:59:43: Getting cpu core count for 'ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na' with oci-cli
2024-07-26 09:59:44: Running on ExaCC getting cpus enabled
2024-07-26 09:59:44: Current OCPU=24
2024-07-26 09:59:46: Local host load ......: 10.3
2024-07-26 09:59:46: CPU usage is under minthreshold
2024-07-26 09:59:46: Next measure in about 60 secs...
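The same verification can be scripted: check that the shared log keeps being updated after the relocation, and whether scale-down is currently postponed. This is a hypothetical sketch: MAX_AGE is set to 300 seconds because, as seen above, Dynamic Scaling may sleep 180 seconds during a scaling operation, and stat -c %Y assumes GNU coreutils, as found on Oracle Linux.

```shell
# Hypothetical post-relocation checks on the shared log file.
LOG=${LOG:-/acfs01/dynscal_logs/dynamicscaling.log}
MAX_AGE=${MAX_AGE:-300}  # seconds; Dynamic Scaling may sleep 180 s while scaling

# Succeeds if the log was modified within the last MAX_AGE seconds.
log_is_fresh() {
  local now mtime
  now=$(date +%s)
  mtime=$(stat -c %Y "$LOG") || return 1   # GNU stat: mtime in epoch seconds
  (( now - mtime <= MAX_AGE ))
}

# Succeeds if one of the last 50 log lines reports a postponed scale-down.
scaling_postponed() {
  tail -n 50 "$LOG" | grep -q "currently not possible due to DB System status 'UPDATING'"
}
```

Usage: log_is_fresh && echo "log is being written"; scaling_postponed && echo "scale-down waiting for the DB System".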
To wrap up
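Running Dynamic Scaling as a Grid HA resource makes even more sense for automatic OCPU scaling on an ExaCC cluster: Grid Infrastructure runs a single instance of the daemon in the cluster, starts it only when the ACFS file system used for its logs is online, and lets us relocate it from one node to another, even while a scaling operation is in progress.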