I recently deployed a new ODA with the latest release, 19.14, at a customer site, and we ran into an interesting failure while deploying the appliance, which I would like to share with you.
Problem description
After reimaging the ODA and configuring the network interface with configure-firstnet, we created the appliance:
[root@oak ~]# odacli create-appliance -r /u01/patch/ODA01.json
Enter an initial password for Web Console account (oda-admin):
Retype the password for Web Console account (oda-admin):
User 'oda-admin' created successfully...
{
  "jobId" : "dacff86d-122c-4013-8f54-59ee29c919ca",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "March 09, 2022 19:25:01 PM UTC",
  "resourceList" : [ ],
  "description" : "Provisioning service creation",
  "updatedTime" : "March 09, 2022 19:25:01 PM UTC"
}
The provisioning job failed with the following error:
[root@oak ~]# odacli describe-job -i dacff86d-122c-4013-8f54-59ee29c919ca

Job details
----------------------------------------------------------------
ID: dacff86d-122c-4013-8f54-59ee29c919ca
Description: Provisioning service creation
Status: Failure
Created: March 9, 2022 8:25:01 PM CET
Message: DCS-10001:Internal error encountered: Failed to provision GI with RHP at the home: /u01/app/19.14.0.0/grid: DCS-10001:Internal error encountered: PRGH-1002 : Failed to copy files from /opt/oracle/rhp/RHPCheckpoints/rhptemp/grid4991782420745951208.rsp to /opt/oracle/rhp/RHPCheckpoints/wOraGrid191400
PRKC-1191 : Remote command execution setup check for node ODA01 using shell /usr/bin/ssh failed.
FIPS mode initializedNo ECDSA host key is known for ODA01 and you have requested strict checking.Host key verification failed...

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Storage discovery                        March 9, 2022 12:28:54 PM CET       March 9, 2022 12:33:21 PM CET       Success
Creating wallet for Root User            March 9, 2022 12:33:21 PM CET       March 9, 2022 12:33:25 PM CET       Success
Creating wallet for ASM Client           March 9, 2022 12:33:25 PM CET       March 9, 2022 12:33:28 PM CET       Success
Grid stack creation                      March 9, 2022 12:33:29 PM CET       March 9, 2022 12:34:29 PM CET       Failure
Provisioning GI with RHP                 March 9, 2022 12:33:29 PM CET       March 9, 2022 12:34:29 PM CET       Failure
Provisioning service creation            March 9, 2022 8:25:02 PM CET        March 9, 2022 12:34:29 PM CET       Failure
Provisioning service creation            March 9, 2022 8:25:02 PM CET        March 9, 2022 12:34:29 PM CET       Failure
Setting up Network                       March 9, 2022 8:25:03 PM CET        March 9, 2022 8:25:03 PM CET        Success
Setting up Vlan                          March 9, 2022 8:25:16 PM CET        March 9, 2022 8:25:16 PM CET        Success
Setting up Network                       March 9, 2022 8:25:30 PM CET        March 9, 2022 8:25:30 PM CET        Success
network update                           March 9, 2022 8:25:47 PM CET        March 9, 2022 8:26:01 PM CET        Success
updating network                         March 9, 2022 8:25:47 PM CET        March 9, 2022 8:26:01 PM CET        Success
attach to bridge                         March 9, 2022 8:25:47 PM CET        March 9, 2022 8:25:47 PM CET        Success
OS usergroup 'asmdba'creation            March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'asmoper'creation           March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'asmadmin'creation          March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'dba'creation               March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'dbaoper'creation           March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'oinstall'creation          March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS user 'grid'creation                   March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS user 'oracle'creation                 March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
Default backup policy creation           March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:02 PM CET        Success
Backup config metadata persist           March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
Grant permission to RHP files            March 9, 2022 8:26:02 PM CET        March 9, 2022 8:26:02 PM CET        Success
Add SYSNAME in Env                       March 9, 2022 8:26:02 PM CET        March 9, 2022 8:26:02 PM CET        Success
Install oracle-ahf                       March 9, 2022 8:26:02 PM CET        March 9, 2022 8:27:00 PM CET        Success
Grid home creation                       March 9, 2022 8:27:05 PM CET        March 9, 2022 12:28:54 PM CET       Success
Creating GI home directories             March 9, 2022 8:27:05 PM CET        March 9, 2022 8:27:05 PM CET        Success
Extract GI clone                         March 9, 2022 8:27:05 PM CET        March 9, 2022 12:28:54 PM CET       Success
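When a deployment is scripted, the overall result of a job can be read from the Status line of the describe-job output shown above instead of being checked by eye. The following is a minimal sketch, assuming the text layout shown in this post; `job_status` is a hypothetical helper name, not an ODA tool, and it only parses text passed on standard input.

```shell
#!/bin/sh
# job_status: read `odacli describe-job` text output on stdin and print the
# value of the first "Status:" line (for example "Failure" or "Success").
# Hypothetical helper for illustration; it never calls odacli itself.
# Exit code is 0 when a Status line was found, 1 otherwise.
job_status() {
    awk '
        $1 == "Status:" { print $2; found = 1; exit }
        END { exit !found }
    '
}
```

Typical use would be `odacli describe-job -i dacff86d-122c-4013-8f54-59ee29c919ca | job_status`, which in the failed run above should print Failure.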
Analysis
Looking at dcs-agent.log, we found the following error:
! com.oracle.dcs.commons.exception.DcsTaskFailureException: DCS-10001:Internal error encountered: Failed to provision GI with RHP at the home: /u01/app/19.14.0.0/grid: DCS-10001:Internal error encountered: PRGH-1002 : Failed to copy files from /opt/oracle/rhp/RHPCheckpoints/rhptemp/grid4108866401395331366.rsp to /opt/oracle/rhp/RHPCheckpoints/wOraGrid191400
! PRKC-1191 : Remote command execution setup check for node delphes1 using shell /usr/bin/ssh failed.
! FIPS mode initializedNo ECDSA host key is known for delphes1 and you have requested strict checking.Host key verification failed...
! at com.oracle.dcs.agent.task.TaskZJsonRpc.processJsonResponse(TaskZJsonRpc.java:124)
! at com.oracle.dcs.agent.task.TaskZJsonRpcExt.callInternal(TaskZJsonRpcExt.java:119)
! at com.oracle.dcs.agent.task.TaskZJsonRpc.call(TaskZJsonRpc.java:283)
! at com.oracle.dcs.agent.task.TaskZJsonRpc.call(TaskZJsonRpc.java:78)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.agent.task.TaskZLockWrapper.call(TaskZLockWrapper.java:122)
! at com.oracle.dcs.agent.task.TaskZLockWrapper.call(TaskZLockWrapper.java:64)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:59)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
! at java.lang.Thread.run(Thread.java:748)
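During the Grid Infrastructure installation, the SSH connectivity check fails because, according to the logs, no host key is known for the node and strict host key checking is requested.
This is confirmed on the system: the provisioning process did not configure the key, and root's .ssh directory contains no known_hosts file: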
[root@oak ~]# cd .ssh/
[root@oak .ssh]# ll
total 0
[root@oak .ssh]#
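The same check can be made before launching create-appliance, so that a missing host key is caught before the Grid Infrastructure step fails. The sketch below is only an illustration: `has_host_key` is a hypothetical helper, and it assumes unhashed known_hosts entries of the form `name,address keytype key`, as seen later in this post. It would not recognise hashed entries.

```shell
#!/bin/sh
# has_host_key FILE HOST
# Returns 0 if FILE contains an unhashed known_hosts entry naming HOST,
# 1 if the file is missing or no entry matches.
# Hypothetical helper for illustration; hashed entries are not understood.
has_host_key() {
    file=$1
    host=$2
    [ -f "$file" ] || return 1
    awk -v h="$host" '
        /^[[:space:]]*(#|$)/ { next }          # skip comments and blank lines
        {
            n = split($1, names, ",")          # first field: comma-separated names
            for (i = 1; i <= n; i++)
                if (names[i] == h) { found = 1; exit }
        }
        END { exit !found }
    ' "$file"
}
```

For example, `has_host_key /root/.ssh/known_hosts ODA01 || echo "no host key for ODA01"` would have reported the problem on this system before the first deployment attempt.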
Solution
Let’s open an SSH connection from the ODA to itself and accept the host key:
[root@oak .ssh]# ssh ODA01
FIPS mode initialized
The authenticity of host 'ODA01 (10.20.30.122)' can't be established.
ECDSA key fingerprint is SHA256:i2p0EBIgtEAYDJCF3pWhD2Fe6ITkcdiBnkT4u0BwC3s.
ECDSA key fingerprint is SHA1:o2TSYrpKihq47VJ5NUH0BnLgkFM.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'ODA01,10.20.30.122' (ECDSA) to the list of known hosts.
root@ODA01's password:
Last failed login: Wed Mar 9 17:01:42 CET 2022 on ttyS0
There was 1 failed login attempt since the last successful login.
Last login: Wed Mar 9 12:34:15 2022
[root@ODA01 ~]#
We now have a known_hosts file containing the appropriate key:
[root@oak .ssh]# more known_hosts
ODA01,10.20.30.122 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCor/4ewamjVk65VB0YJnxElNKPKXOMAsoSmVdc7jS7YMGJXplmY92rRObgV0s4oYpztPnWuqSKPdJwt5mavY+0=
[root@oak .ssh]#
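If the interactive prompt is not convenient, for instance in a scripted reimage, the same entry could probably be added with ssh-keyscan from OpenSSH. This is an alternative I am sketching here, not the method used in this post, and I have not verified it on an ODA. `seed_host_key` and the `KEYSCAN` override are hypothetical names introduced for the example, and it assumes unhashed known_hosts entries. A scanned key is trusted blindly, just as when answering yes at the prompt, so the fingerprint should still be compared with the one expected for the host.

```shell
#!/bin/sh
# seed_host_key HOST [FILE]
# Appends HOST's ECDSA public host key to FILE (default: ~/.ssh/known_hosts)
# unless an unhashed entry for HOST is already present.
# Hypothetical helper, not an ODA tool. KEYSCAN may name another command with
# the same arguments (useful for a dry run); it defaults to ssh-keyscan.
seed_host_key() {
    host=$1
    file=${2:-$HOME/.ssh/known_hosts}
    keyscan=${KEYSCAN:-ssh-keyscan}

    mkdir -p "$(dirname "$file")" || return 1

    # Nothing to do if the host is already listed in the first field.
    if [ -f "$file" ] &&
       cut -d' ' -f1 "$file" | tr ',' '\n' | grep -qxF -- "$host"; then
        return 0
    fi

    # Fetch only the ECDSA key, the type the failed check asked for.
    key=$("$keyscan" -t ecdsa "$host" 2>/dev/null) || return 1
    [ -n "$key" ] || return 1      # empty output: host unreachable or no such key
    printf '%s\n' "$key" >> "$file"
}
```

Calling `seed_host_key ODA01` as root would then play the role of the manual ssh session above, without the password prompt.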
We can now clean up the previously failed appliance installation:
[root@ODA01 ~]# perl /opt/oracle/oak/onecmd/cleanup.pl
INFO: Log file is /opt/oracle/oak/log/ODA01/cleanup/cleanup_2022-03-09_12-37-02.log
INFO: Log file is /opt/oracle/oak/log/ODA01/cleanup/dcsemu_diag_precleanup_2022-03-09_12-37-02.log
INFO: *******************************************************************
INFO: ** Starting process to cleanup provisioned host ODA01 **
INFO: *******************************************************************
INFO: Default mode being used to cleanup a provisioned system.
INFO: It will change all ASM disk status from MEMBER to FORMER
Do you want to continue (yes/no) : yes
...
...
...
Curiously, cleaning up the appliance now also removes the whole public network configuration:
[root@oak ~]# ip addr sh
1: lo: mtu 16436 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: p7p1: mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
3: p7p2: mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
4: em1: mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a8:69:8c:0c:91:14 brd ff:ff:ff:ff:ff:ff
5: bond0: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c6:6a:3e:ed:ad:8f brd ff:ff:ff:ff:ff:ff
6: btbond1: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
7: priv0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether aa:d8:c4:5e:f3:aa brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.24/28 scope global priv0
       valid_lft forever preferred_lft forever
This is most likely because, for the last few releases, a pubnet bridge interface has been created on top of the bonding interface during appliance creation.
In any case, the network configuration is gone, so let’s recreate it with configure-firstnet before creating the appliance again:
[root@oak ~]# /opt/oracle/dcs/bin/odacli configure-firstnet
Select the Interface to configure the network on (btbond1) [btbond1]:
Configure DHCP on btbond1 (yes/no) [no]:
INFO: You have chosen Static configuration
Use VLAN on btbond1 (yes/no) [no]:yes
Configure VLAN on btbond1, input VLAN ID [2 - 4094] 205
INFO: using network interface btbond1.205
Enter the IP address to configure : 10.20.30.122
Enter the Netmask address to configure : 255.255.255.0
Enter the Gateway address to configure [10.20.30.1] :
INFO: Restarting the network
INFO: Restarting the DCS agent
We can check that the VLAN interface is back with its IP address:
[root@oak ~]# ip addr sh
1: lo: mtu 16436 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: p7p1: mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
3: p7p2: mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
4: em1: mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a8:69:8c:0c:91:14 brd ff:ff:ff:ff:ff:ff
5: bond0: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c6:6a:3e:ed:ad:8f brd ff:ff:ff:ff:ff:ff
6: btbond1: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
7: priv0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether aa:d8:c4:5e:f3:aa brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.24/28 scope global priv0
       valid_lft forever preferred_lft forever
8: btbond1.205@btbond1: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
    inet 10.20.30.122/24 brd 10.20.30.255 scope global btbond1.205
       valid_lft forever preferred_lft forever
Now we can create the appliance again:
[root@oak ~]# odacli create-appliance -r /u01/patch/ODA01.json
Enter an initial password for Web Console account (oda-admin):
Retype the password for Web Console account (oda-admin):
User 'oda-admin' created successfully...
{
  "jobId" : "1d59cee5-b6c5-406f-aa4c-f6385d533b64",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "March 09, 2022 11:44:25 AM UTC",
  "resourceList" : [ ],
  "description" : "Provisioning service creation",
  "updatedTime" : "March 09, 2022 11:44:25 AM UTC"
}
This time the appliance creation is successful:
[root@ODA01 ~]# odacli describe-job -i 1d59cee5-b6c5-406f-aa4c-f6385d533b64

Job details
----------------------------------------------------------------
ID: 1d59cee5-b6c5-406f-aa4c-f6385d533b64
Description: Provisioning service creation
Status: Success
Created: March 9, 2022 11:44:25 AM CET
Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up Network                       March 9, 2022 11:44:27 AM CET       March 9, 2022 11:44:27 AM CET       Success
Setting up Vlan                          March 9, 2022 11:44:40 AM CET       March 9, 2022 11:44:41 AM CET       Success
Setting up Network                       March 9, 2022 11:44:54 AM CET       March 9, 2022 11:44:54 AM CET       Success
network update                           March 9, 2022 11:45:12 AM CET       March 9, 2022 11:45:26 AM CET       Success
updating network                         March 9, 2022 11:45:12 AM CET       March 9, 2022 11:45:26 AM CET       Success
attach to bridge                         March 9, 2022 11:45:12 AM CET       March 9, 2022 11:45:12 AM CET       Success
OS usergroup 'asmdba'creation            March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'asmoper'creation           March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'asmadmin'creation          March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'dba'creation               March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'dbaoper'creation           March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'oinstall'creation          March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS user 'grid'creation                   March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS user 'oracle'creation                 March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Default backup policy creation           March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Backup config metadata persist           March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Grant permission to RHP files            March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Add SYSNAME in Env                       March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Install oracle-ahf                       March 9, 2022 11:45:26 AM CET       March 9, 2022 11:46:26 AM CET       Success
Grid home creation                       March 9, 2022 11:46:30 AM CET       March 9, 2022 11:48:10 AM CET       Success
Creating GI home directories             March 9, 2022 11:46:30 AM CET       March 9, 2022 11:46:30 AM CET       Success
Extract GI clone                         March 9, 2022 11:46:30 AM CET       March 9, 2022 11:48:09 AM CET       Success
Storage discovery                        March 9, 2022 11:48:10 AM CET       March 9, 2022 11:52:39 AM CET       Success
Creating wallet for Root User            March 9, 2022 11:52:39 AM CET       March 9, 2022 11:52:43 AM CET       Success
Creating wallet for ASM Client           March 9, 2022 11:52:43 AM CET       March 9, 2022 11:52:46 AM CET       Success
Grid stack creation                      March 9, 2022 11:52:46 AM CET       March 9, 2022 12:02:29 PM CET       Success
Provisioning GI with RHP                 March 9, 2022 11:52:46 AM CET       March 9, 2022 11:59:14 AM CET       Success
Updating GIHome version                  March 9, 2022 11:59:15 AM CET       March 9, 2022 11:59:18 AM CET       Success
Post cluster OAKD configuration          March 9, 2022 12:02:29 PM CET       March 9, 2022 12:05:12 PM CET       Success
Disk group 'RECO'creation                March 9, 2022 12:05:20 PM CET       March 9, 2022 12:05:31 PM CET       Success
Setting ACL for disk groups              March 9, 2022 12:05:31 PM CET       March 9, 2022 12:05:35 PM CET       Success
Modify DB file attributes                March 9, 2022 12:05:35 PM CET       March 9, 2022 12:05:43 PM CET       Success
Register Scan and Vips to Public Network March 9, 2022 12:05:43 PM CET       March 9, 2022 12:05:45 PM CET       Success
Configure export clones resource         March 9, 2022 12:06:43 PM CET       March 9, 2022 12:06:44 PM CET       Success
Volume 'commonstore'creation             March 9, 2022 12:06:44 PM CET       March 9, 2022 12:06:58 PM CET       Success
ACFS File system 'DATA'creation          March 9, 2022 12:06:58 PM CET       March 9, 2022 12:07:10 PM CET       Success
Provisioning service creation            March 9, 2022 12:07:12 PM CET       March 9, 2022 12:07:12 PM CET       Success
persist new agent state entry            March 9, 2022 12:07:12 PM CET       March 9, 2022 12:07:12 PM CET       Success
persist new agent state entry            March 9, 2022 12:07:12 PM CET       March 9, 2022 12:07:12 PM CET       Success
Restart Zookeeper and DCS Agent          March 9, 2022 12:07:12 PM CET       March 9, 2022 12:07:13 PM CET       Success
Xavi
22.09.2023 Great !!!!! Same error in deploy ODA X9 -L
Marc Wagner
22.09.2023 Thanks Xavi for the reply. Happy my blog could help you. Take care.