I recently blogged about how to install and configure Dynamic Scaling as an Oracle Linux service daemon. See my blog: https://www.dbi-services.com/blog/how-to-automatically-manage-ocpu-on-exacc-using-dynamic-scaling/

In this blog I would like to show how to configure it as an Oracle Grid HA resource instead. This is the approach we chose at a customer site. The installation is done on a 2-node ExaCC cluster.


Requirements

All of the following steps, described in the previous blog, need to be run on both nodes:

  • RPM package installation
  • oci-cli installation
  • OCI config file
  • OCI test
  • Dynamic Scaling test with the check and getocpu options to ensure the installation is successful
  • Log file rotation (both nodes write to the same log file)

For this cluster we used the following Dynamic Scaling thresholds:

--interval 60
--maxthreshold 75
--minthreshold 60
--maxocpu 48
--minocpu 4
--ocpu 4
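
To make the effect of these values easier to follow, here is a simplified model of the scaling decision. It is only a sketch and not the tool's actual code: the function name next_ocpu is hypothetical, and it assumes that the OCPU count goes up by --ocpu when the load is above --maxthreshold, goes down by --ocpu when the load is at or below --minthreshold, and always stays between --minocpu and --maxocpu. It also ignores the --interval during which the load has to stay beyond a threshold.

```shell
# Simplified model of the scaling decision (assumption, NOT the tool's code).
# Values are the ones used for this cluster.
next_ocpu() {
  # $1 = current OCPU count, $2 = measured load in percent
  awk -v cur="$1" -v load="$2" \
      -v maxthr=75 -v minthr=60 -v maxocpu=48 -v minocpu=4 -v step=4 '
    BEGIN {
      target = cur
      if (load > maxthr)       target = cur + step   # scale up
      else if (load <= minthr) target = cur - step   # scale down
      if (target > maxocpu) target = maxocpu         # keep within bounds
      if (target < minocpu) target = minocpu
      print target
    }'
}
```

For example, next_ocpu 24 23.9 prints 20 under these assumptions, and next_ocpu 24 70 prints 24 because the load lies between both thresholds.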

Configure Dynamic Scaling as Grid HA resource

Check that no Dynamic Scaling process is running on either node.

[root@ExaCC-cl01n1 ~]# ps -ef | grep -i [d]ynamic
[root@ExaCC-cl01n1 ~]#

[root@ExaCC-cl01n2 ~]# ps -ef | grep -i [d]ynamic
[root@ExaCC-cl01n2 ~]#

Check the name of the existing ACFS Grid resource. The Dynamic Scaling resource will depend on it, since the log directory is located on ACFS.

[root@ExaCC-cl01n1 ~]# cd /u01/app/19.0.0.0/grid/bin/

[root@ExaCC-cl01n1 bin]# ./crsctl stat res -t | grep -i .acfsvol01.acfs
ora.datac1.acfsvol01.acfs

Check that no Dynamic Scaling resource exists yet.

[root@ExaCC-cl01n1 bin]# ./crsctl stat res -t | grep -i dynamicscaling
[root@ExaCC-cl01n1 bin]#

Create the Grid resource for Dynamic Scaling. The vm-cluster-id and the resource name (dynamicscaling.srv) need to be adapted for each cluster.

[root@ExaCC-cl01n1 bin]# /u01/app/19.0.0.0/grid/bin/crsctl add resource dynamicscaling.srv \
> -type generic_application \
> -attr \
> "START_PROGRAM='/opt/dynamicscaling/dynamicscaling.bin \
> --ocicli \
> --vm-cluster-id ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na \
> --ociprofile DEFAULT \
> --interval 60 \
> --maxthreshold 75 \
> --minthreshold 60 \
> --maxocpu 48 \
> --minocpu 4 \
> --ocpu 4 \
> --logpath /acfs01/dynscal_logs',
> STOP_PROGRAM='/opt/dynamicscaling/dynamicscaling.bin stop_resource',
> CLEAN_PROGRAM='/opt/dynamicscaling/dynamicscaling.bin stop_resource',
> PID_FILES='/tmp/.dynamicscaling.pid',
> START_DEPENDENCIES='hard(ora.datac1.acfsvol01.acfs)',
> STOP_DEPENDENCIES='hard(ora.datac1.acfsvol01.acfs)',
> ENVIRONMENT_VARS='PATH=$PATH:/home/opc/bin,HOME=/home/opc,HTTP_PROXY=http://webproxy.domain.com:XXXX,HTTPS_PROXY=http://webproxy.domain.com:XXXX'"
[root@ExaCC-cl01n1 bin]#
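
Since the VM cluster OCID changes from one cluster to the next, one option is to assemble the START_PROGRAM value from variables rather than editing the long command by hand. The following is a minimal sketch: the helper name build_start_program and its two parameters are hypothetical, and it only uses the flags and values already shown above.

```shell
# Hypothetical helper: prints the START_PROGRAM value for one cluster.
# Only flags already used in this post appear here.
build_start_program() {
  vm_cluster_id=$1   # VM cluster OCID of the target cluster
  logpath=$2         # log directory, located on ACFS in this setup
  printf '%s --ocicli --vm-cluster-id %s --ociprofile DEFAULT' \
    /opt/dynamicscaling/dynamicscaling.bin "$vm_cluster_id"
  printf ' --interval 60 --maxthreshold 75 --minthreshold 60'
  printf ' --maxocpu 48 --minocpu 4 --ocpu 4 --logpath %s\n' "$logpath"
}
```

The printed string can be reviewed first and then placed inside the -attr value as START_PROGRAM='...'.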

Check the Dynamic Scaling resource.

[root@ExaCC-cl01n1 bin]# ./crsctl stat res -t | grep -i dynamicscaling
dynamicscaling.srv

Start Dynamic Scaling resource

Let’s start the Dynamic Scaling resource.

[root@ExaCC-cl01n1 bin]# ps -ef | grep -i [d]ynamic
[root@ExaCC-cl01n1 bin]# ./crsctl start resource dynamicscaling.srv
CRS-2672: Attempting to start 'dynamicscaling.srv' on 'ExaCC-cl01n2'
CRS-2676: Start of 'dynamicscaling.srv' on 'ExaCC-cl01n2' succeeded

As per the output, Dynamic Scaling has been started on node 2, which the Linux process lists confirm.

[root@ExaCC-cl01n1 bin]# ps -ef | grep -i [d]ynamic
[root@ExaCC-cl01n1 bin]#

[root@ExaCC-cl01n2 bin]# ps -ef | grep -i [d]ynamic
root      95918      1  0 09:53 ?        00:00:00 /opt/dynamicscaling/dynamicscaling.bin                                              --ocicli --vm-cluster-id ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na --ociprofile DEFAULT --interval 60 --maxthreshold 75 --minthreshold 60 --maxocpu 48 --minocpu 4 --ocpu 4 --logpath /acfs01/dynscal_logs
[root@ExaCC-cl01n2 bin]#

Let’s stop it.

[root@ExaCC-cl01n1 bin]# ./crsctl stop resource dynamicscaling.srv
CRS-2673: Attempting to stop 'dynamicscaling.srv' on 'ExaCC-cl01n2'
CRS-2677: Stop of 'dynamicscaling.srv' on 'ExaCC-cl01n2' succeeded

[root@ExaCC-cl01n1 bin]# ./crsctl status resource dynamicscaling.srv
NAME=dynamicscaling.srv
TYPE=generic_application
TARGET=OFFLINE
STATE=OFFLINE

And start it again.

[root@ExaCC-cl01n1 bin]# ./crsctl start resource dynamicscaling.srv
CRS-2672: Attempting to start 'dynamicscaling.srv' on 'ExaCC-cl01n2'
CRS-2676: Start of 'dynamicscaling.srv' on 'ExaCC-cl01n2' succeeded

[root@ExaCC-cl01n1 bin]# ./crsctl status resource dynamicscaling.srv
NAME=dynamicscaling.srv
TYPE=generic_application
TARGET=ONLINE
STATE=ONLINE on ExaCC-cl01n2

The resource has been started on node 2 again.

Relocate Grid resource for Dynamic Scaling to node 1

As we can see in the log, a scale-down is currently in progress. This should not be a problem for relocating the Grid resource.

[root@ExaCC-cl01n1 bin]# ls -ltrh /acfs01/dynscal_logs/
total 12K
-rw-r--r-- 1 root root 7.0K Jul 26 09:56 dynamicscaling.log

[root@ExaCC-cl01n1 bin]# tail -f /acfs01/dynscal_logs/dynamicscaling.log
2024-07-26 09:56:06: Current OCPU=24
2024-07-26 09:56:08: Local host load ......: 23.9
2024-07-26 09:56:08: Current load is under/equal minimum threshold '60' for '60' secs
2024-07-26 09:56:08: Checking DB System status
2024-07-26 09:56:08: Getting lifecycle-state
2024-07-26 09:56:09: DB System status......: AVAILABLE
2024-07-26 09:56:09: Resetting consecutive DB System 'UPDATING' status count
2024-07-26 09:56:09: Requesting OCPU scale-Down by a factor of '4'
2024-07-26 09:56:09: Scaling-down the core-count...
2024-07-26 09:56:11: Scaling-down in progress, sleeping 180 secs...

The resource is relocated successfully to node 1.

[root@ExaCC-cl01n1 bin]# ./crsctl status resource dynamicscaling.srv
NAME=dynamicscaling.srv
TYPE=generic_application
TARGET=ONLINE
STATE=ONLINE on ExaCC-cl01n2

[root@ExaCC-cl01n1 bin]# ./crsctl relocate resource dynamicscaling.srv -n ExaCC-cl01n1
CRS-2673: Attempting to stop 'dynamicscaling.srv' on 'ExaCC-cl01n2'
CRS-2677: Stop of 'dynamicscaling.srv' on 'ExaCC-cl01n2' succeeded
CRS-2672: Attempting to start 'dynamicscaling.srv' on 'ExaCC-cl01n1'
CRS-2676: Start of 'dynamicscaling.srv' on 'ExaCC-cl01n1' succeeded

[root@ExaCC-cl01n1 bin]# ./crsctl status resource dynamicscaling.srv
NAME=dynamicscaling.srv
TYPE=generic_application
TARGET=ONLINE
STATE=ONLINE on ExaCC-cl01n1
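
The relocation can also be wrapped in a small helper that first checks where the resource is currently running. This is only a sketch: the function name relocate_if_needed and the CRSCTL variable are hypothetical, and it assumes the single-node "STATE=ONLINE on <node>" line shown in the status output above. It does not handle an offline resource, which would first need to be started.

```shell
# Path to crsctl as used in this post; can be overridden via the environment.
CRSCTL=${CRSCTL:-/u01/app/19.0.0.0/grid/bin/crsctl}

# Hypothetical helper: relocate a resource unless it already runs on the target node.
relocate_if_needed() {
  # $1 = resource name, $2 = target node
  current=$("$CRSCTL" status resource "$1" | sed -n 's/^STATE=ONLINE on //p')
  if [ "$current" = "$2" ]; then
    echo "$1 already on $2"
  else
    "$CRSCTL" relocate resource "$1" -n "$2"
  fi
}
```

It would be called as relocate_if_needed dynamicscaling.srv ExaCC-cl01n1.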

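The log confirms that Dynamic Scaling keeps running after the relocation. It detects the DB System status 'UPDATING' left by the scale-down and postpones any further scaling.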

[root@ExaCC-cl01n1 bin]# tail -f /acfs01/dynscal_logs/dynamicscaling.log
2024-07-26 09:58:34: Getting lifecycle-state
2024-07-26 09:58:35: DB System status......: UPDATING
2024-07-26 09:58:35: Consecutive DB System 'UPDATING' status count is 0
2024-07-26 09:58:35: Checking current core count
2024-07-26 09:58:35: Getting cpu core count for 'ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na' with oci-cli
2024-07-26 09:58:36: Running on ExaCC getting cpus enabled
2024-07-26 09:58:36: Current OCPU=24
2024-07-26 09:58:38: Local host load ......: 26.8
2024-07-26 09:58:38: CPU usage is under minthreshold
2024-07-26 09:58:38: Next measure in about 60 secs...
2024-07-26 09:59:38: Checking current core count
2024-07-26 09:59:38: Getting cpu core count for 'ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na' with oci-cli
2024-07-26 09:59:39: Running on ExaCC getting cpus enabled
2024-07-26 09:59:39: Current OCPU=24
2024-07-26 09:59:41: Local host load ......: 13.1
2024-07-26 09:59:41: Current load is under/equal minimum threshold '60' for '60' secs
2024-07-26 09:59:41: Checking DB System status
2024-07-26 09:59:41: Getting lifecycle-state
2024-07-26 09:59:42: DB System status......: UPDATING
2024-07-26 09:59:42: Scaling-Down currently not possible due to DB System status 'UPDATING'
2024-07-26 09:59:42: zzzzzz
2024-07-26 09:59:42: Checking DB System status
2024-07-26 09:59:42: Getting lifecycle-state
2024-07-26 09:59:43: DB System status......: UPDATING
2024-07-26 09:59:43: Consecutive DB System 'UPDATING' status count is 1
2024-07-26 09:59:43: Checking current core count
2024-07-26 09:59:43: Getting cpu core count for 'ocid1.vmcluster.oc1.eu-zurich-1.an5he*****************************************47na' with oci-cli
2024-07-26 09:59:44: Running on ExaCC getting cpus enabled
2024-07-26 09:59:44: Current OCPU=24
2024-07-26 09:59:46: Local host load ......: 10.3
2024-07-26 09:59:46: CPU usage is under minthreshold
2024-07-26 09:59:46: Next measure in about 60 secs...
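
Rather than reading the whole log, the latest OCPU count and DB System status can be extracted with a few lines of awk. The sketch below assumes the log line format shown above ("Current OCPU=..." and "DB System status......: ..."); the function name log_summary is hypothetical.

```shell
# Hypothetical helper: print the most recent OCPU count and DB System status
# found in a Dynamic Scaling log file.
log_summary() {
  # $1 = path to dynamicscaling.log
  awk '
    /Current OCPU=/         { sub(/.*Current OCPU=/, ""); ocpu = $0 }
    /DB System status\.+: / { sub(/.*DB System status\.+: /, ""); status = $0 }
    END { printf "ocpu=%s status=%s\n", ocpu, status }
  ' "$1"
}
```

It would be called as log_summary /acfs01/dynscal_logs/dynamicscaling.log. Lines that only mention the status in passing, such as the "Consecutive DB System 'UPDATING' status count" messages, are not matched.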

To wrap up

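Running Dynamic Scaling as a Grid HA resource makes even more sense for automatic OCPU scaling on an ExaCC cluster: the clusterware starts it on one node only, and it can be relocated to the other node, even while a scale-down is in progress.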