I recently deployed a new ODA with the latest version, 19.14, at a customer site, and we ran into an interesting failure while deploying the appliance, which I would like to share with you.
Problem description
After reimaging the ODA and configuring the network interface with configure-firstnet, we created the appliance:
[root@oak ~]# odacli create-appliance -r /u01/patch/ODA01.json
Enter an initial password for Web Console account (oda-admin):
Retype the password for Web Console account (oda-admin):
User 'oda-admin' created successfully...
{
"jobId" : "dacff86d-122c-4013-8f54-59ee29c919ca",
"status" : "Created",
"message" : null,
"reports" : [ ],
"createTimestamp" : "March 09, 2022 19:25:01 PM UTC",
"resourceList" : [ ],
"description" : "Provisioning service creation",
"updatedTime" : "March 09, 2022 19:25:01 PM UTC"
}
The job failed with the following error:
[root@oak ~]# odacli describe-job -i dacff86d-122c-4013-8f54-59ee29c919ca
Job details
----------------------------------------------------------------
ID: dacff86d-122c-4013-8f54-59ee29c919ca
Description: Provisioning service creation
Status: Failure
Created: March 9, 2022 8:25:01 PM CET
Message: DCS-10001:Internal error encountered: Failed to provision GI with RHP at the home: /u01/app/19.14.0.0/grid: DCS-10001:Internal error encountered: PRGH-1002 : Failed to copy files from /opt/oracle/rhp/RHPCheckpoints/rhptemp/grid4991782420745951208.rsp to /opt/oracle/rhp/RHPCheckpoints/wOraGrid191400
PRKC-1191 : Remote command execution setup check for node ODA01 using shell /usr/bin/ssh failed.
FIPS mode initializedNo ECDSA host key is known for ODA01 and you have requested strict checking.Host key verification failed...
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Storage discovery March 9, 2022 12:28:54 PM CET March 9, 2022 12:33:21 PM CET Success
Creating wallet for Root User March 9, 2022 12:33:21 PM CET March 9, 2022 12:33:25 PM CET Success
Creating wallet for ASM Client March 9, 2022 12:33:25 PM CET March 9, 2022 12:33:28 PM CET Success
Grid stack creation March 9, 2022 12:33:29 PM CET March 9, 2022 12:34:29 PM CET Failure
Provisioning GI with RHP March 9, 2022 12:33:29 PM CET March 9, 2022 12:34:29 PM CET Failure
Provisioning service creation March 9, 2022 8:25:02 PM CET March 9, 2022 12:34:29 PM CET Failure
Provisioning service creation March 9, 2022 8:25:02 PM CET March 9, 2022 12:34:29 PM CET Failure
Setting up Network March 9, 2022 8:25:03 PM CET March 9, 2022 8:25:03 PM CET Success
Setting up Vlan March 9, 2022 8:25:16 PM CET March 9, 2022 8:25:16 PM CET Success
Setting up Network March 9, 2022 8:25:30 PM CET March 9, 2022 8:25:30 PM CET Success
network update March 9, 2022 8:25:47 PM CET March 9, 2022 8:26:01 PM CET Success
updating network March 9, 2022 8:25:47 PM CET March 9, 2022 8:26:01 PM CET Success
attach to bridge March 9, 2022 8:25:47 PM CET March 9, 2022 8:25:47 PM CET Success
OS usergroup 'asmdba'creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
OS usergroup 'asmoper'creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
OS usergroup 'asmadmin'creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
OS usergroup 'dba'creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
OS usergroup 'dbaoper'creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
OS usergroup 'oinstall'creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
OS user 'grid'creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
OS user 'oracle'creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
Default backup policy creation March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:02 PM CET Success
Backup config metadata persist March 9, 2022 8:26:01 PM CET March 9, 2022 8:26:01 PM CET Success
Grant permission to RHP files March 9, 2022 8:26:02 PM CET March 9, 2022 8:26:02 PM CET Success
Add SYSNAME in Env March 9, 2022 8:26:02 PM CET March 9, 2022 8:26:02 PM CET Success
Install oracle-ahf March 9, 2022 8:26:02 PM CET March 9, 2022 8:27:00 PM CET Success
Grid home creation March 9, 2022 8:27:05 PM CET March 9, 2022 12:28:54 PM CET Success
Creating GI home directories March 9, 2022 8:27:05 PM CET March 9, 2022 8:27:05 PM CET Success
Extract GI clone March 9, 2022 8:27:05 PM CET March 9, 2022 12:28:54 PM CET Success
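With a task list this long, it can help to reduce the job report to the steps that failed. The following is a minimal sketch, assuming the report layout shown above, where the status is the last field of each task line. `failed_tasks` is a hypothetical helper name, not part of odacli.

```shell
# failed_tasks: read an `odacli describe-job` report on stdin and print
# only the task lines whose last field is "Failure".
# Assumption: the status is the last whitespace-separated field, as in the
# report above. The "Status: Failure" header line is skipped on purpose.
failed_tasks() {
    awk '$NF == "Failure" && $1 != "Status:"'
}

# Usage on the appliance (job ID taken from the create-appliance output):
#   odacli describe-job -i dacff86d-122c-4013-8f54-59ee29c919ca | failed_tasks
```

In this case the filter would leave the "Grid stack creation", "Provisioning GI with RHP" and "Provisioning service creation" lines, which points directly at the GI provisioning step.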
Analysis
Looking at dcs-agent.log, we found the following error:
! com.oracle.dcs.commons.exception.DcsTaskFailureException: DCS-10001:Internal error encountered: Failed to provision GI with RHP at the home: /u01/app/19.14.0.0/grid: DCS-10001:Internal error encountered: PRGH-1002 : Failed to copy files from /opt/oracle/rhp/RHPCheckpoints/rhptemp/grid4108866401395331366.rsp to /opt/oracle/rhp/RHPCheckpoints/wOraGrid191400
! PRKC-1191 : Remote command execution setup check for node delphes1 using shell /usr/bin/ssh failed.
! FIPS mode initializedNo ECDSA host key is known for delphes1 and you have requested strict checking.Host key verification failed...
! at com.oracle.dcs.agent.task.TaskZJsonRpc.processJsonResponse(TaskZJsonRpc.java:124)
! at com.oracle.dcs.agent.task.TaskZJsonRpcExt.callInternal(TaskZJsonRpcExt.java:119)
! at com.oracle.dcs.agent.task.TaskZJsonRpc.call(TaskZJsonRpc.java:283)
! at com.oracle.dcs.agent.task.TaskZJsonRpc.call(TaskZJsonRpc.java:78)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.agent.task.TaskZLockWrapper.call(TaskZLockWrapper.java:122)
! at com.oracle.dcs.agent.task.TaskZLockWrapper.call(TaskZLockWrapper.java:64)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:59)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
! at java.lang.Thread.run(Thread.java:748)
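The stack trace is long, but the useful part is the chain of error codes at its head. The following is a minimal sketch for pulling those codes out of a log. The log path in the usage comment is an assumption (the usual DCS agent log location, which is not stated above), and `error_codes` is a hypothetical helper name.

```shell
# error_codes: read log text on stdin and print each distinct
# DCS-/PRGH-/PRKC- error code once, sorted.
# Relies on `grep -o`, which GNU and BSD grep both provide.
error_codes() {
    grep -oE '(DCS|PRGH|PRKC)-[0-9]+' | sort -u
}

# Usage (path is an assumption, adjust to where dcs-agent.log lives):
#   error_codes < /opt/oracle/dcs/log/dcs-agent.log
```

For the error above this would list DCS-10001, PRGH-1002 and PRKC-1191. The last one is the ssh setup check that actually failed.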
During the grid installation, the ssh connectivity check fails because, according to the logs, no host key is known for the node.
This is confirmed: the deployment process did not configure one, and no known_hosts file is present:
[root@oak ~]# cd .ssh/
[root@oak .ssh]# ll
total 0
[root@oak .ssh]#
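The same check can be scripted, which is handy before launching a long create-appliance job. The following is a minimal sketch, assuming known_hosts entries are stored in clear text, as in the file shown later in this post. `has_host_key` is a hypothetical helper name. For hashed entries, `ssh-keygen -F <host>` would be the tool to use instead.

```shell
# has_host_key FILE HOST: succeed if FILE contains an entry whose first
# column (a comma-separated list of names/addresses) includes HOST.
# Assumption: entries are not hashed (HashKnownHosts off).
has_host_key() {
    [ -f "$1" ] || return 1          # no file at all: no key
    awk -v host="$2" '
        {
            n = split($1, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == host) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Usage:
#   has_host_key /root/.ssh/known_hosts ODA01 || echo "no host key for ODA01"
```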
Solution
Let’s open an ssh connection to the ODA from the ODA itself, so that the host key gets recorded:
[root@oak .ssh]# ssh ODA01
FIPS mode initialized
The authenticity of host 'ODA01 (10.20.30.122)' can't be established.
ECDSA key fingerprint is SHA256:i2p0EBIgtEAYDJCF3pWhD2Fe6ITkcdiBnkT4u0BwC3s.
ECDSA key fingerprint is SHA1:o2TSYrpKihq47VJ5NUH0BnLgkFM.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'ODA01,10.20.30.122' (ECDSA) to the list of known hosts.
root@ODA01's password:
Last failed login: Wed Mar 9 17:01:42 CET 2022 on ttyS0
There was 1 failed login attempt since the last successful login.
Last login: Wed Mar 9 12:34:15 2022
[root@ODA01 ~]#
We now have a known_hosts file containing the appropriate host key:
[root@oak .ssh]# more known_hosts
ODA01,10.20.30.122 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCor/4ewamjVk65VB0YJnxElNKPKXOMAsoSmVdc7jS7YMGJXplmY92rRObgV0s4oYp ztPnWuqSKPdJwt5mavY+0=
[root@oak .ssh]#
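The interactive login above could also be replaced by a scripted step. The following is only a sketch and not what was done here. It assumes `ssh-keyscan` is available on the node, and it trusts whatever key the host presents, so the fingerprint should still be verified. `append_unique` is a hypothetical helper that avoids duplicate entries if the step is run more than once.

```shell
# append_unique FILE: append each line read from stdin to FILE,
# skipping lines that are already present verbatim.
append_unique() {
    file=$1
    touch "$file"
    while IFS= read -r line; do
        # -x: whole-line match, -F: fixed string, -q: quiet
        grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}

# Usage (assumption: run as root on the ODA, host name as in this post):
#   ssh-keyscan -t ecdsa ODA01 2>/dev/null | append_unique /root/.ssh/known_hosts
```

The usage line is left as a comment so that the block has no side effects when pasted into a shell.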
We can now clean up the previous failed appliance installation:
[root@ODA01 ~]# perl /opt/oracle/oak/onecmd/cleanup.pl
INFO: Log file is /opt/oracle/oak/log/ODA01/cleanup/cleanup_2022-03-09_12-37-02.log
INFO: Log file is /opt/oracle/oak/log/ODA01/cleanup/dcsemu_diag_precleanup_2022-03-09_12-37-02.log
INFO: *******************************************************************
INFO: ** Starting process to cleanup provisioned host ODA01 **
INFO: *******************************************************************
INFO: Default mode being used to cleanup a provisioned system.
INFO: It will change all ASM disk status from MEMBER to FORMER
Do you want to continue (yes/no) : yes
...
...
...
Curiously, cleaning up the appliance now also removes the whole network configuration:
[root@oak ~]# ip addr sh
1: lo: mtu 16436 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: p7p1: mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
3: p7p2: mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
4: em1: mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether a8:69:8c:0c:91:14 brd ff:ff:ff:ff:ff:ff
5: bond0: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether c6:6a:3e:ed:ad:8f brd ff:ff:ff:ff:ff:ff
6: btbond1: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
7: priv0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether aa:d8:c4:5e:f3:aa brd ff:ff:ff:ff:ff:ff
inet 192.168.16.24/28 scope global priv0
valid_lft forever preferred_lft forever
This is most likely because, for the last few versions, a pubnet bridge interface has been created on the bonding interface during appliance creation.
In any case, the whole network configuration is gone, so let’s create it again with configure-firstnet before creating the appliance again:
[root@oak ~]# /opt/oracle/dcs/bin/odacli configure-firstnet
Select the Interface to configure the network on (btbond1) [btbond1]:
Configure DHCP on btbond1 (yes/no) [no]:
INFO: You have chosen Static configuration
Use VLAN on btbond1 (yes/no) [no]:yes
Configure VLAN on btbond1, input VLAN ID [2 - 4094] 205
INFO: using network interface btbond1.205
Enter the IP address to configure : 10.20.30.122
Enter the Netmask address to configure : 255.255.255.0
Enter the Gateway address to configure [10.20.30.1] :
INFO: Restarting the network
INFO: Restarting the DCS agent
We can check that the network configuration is back:
[root@oak ~]# ip addr sh
1: lo: mtu 16436 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: p7p1: mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
3: p7p2: mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
4: em1: mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether a8:69:8c:0c:91:14 brd ff:ff:ff:ff:ff:ff
5: bond0: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether c6:6a:3e:ed:ad:8f brd ff:ff:ff:ff:ff:ff
6: btbond1: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
7: priv0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether aa:d8:c4:5e:f3:aa brd ff:ff:ff:ff:ff:ff
inet 192.168.16.24/28 scope global priv0
valid_lft forever preferred_lft forever
8: btbond1.205@btbond1: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
inet 10.20.30.122/24 brd 10.20.30.255 scope global btbond1.205
valid_lft forever preferred_lft forever
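Since a missing public address would make the next create-appliance run fail again, this check can be turned into a quick guard. The following is a minimal sketch that reads `ip addr` output and looks for a given IPv4 address. `has_ipv4` is a hypothetical helper name.

```shell
# has_ipv4 ADDR: read `ip addr` output on stdin and succeed if one of the
# "inet" lines carries ADDR (the prefix length after "/" is ignored).
has_ipv4() {
    awk -v want="$1" '
        $1 == "inet" {
            split($2, parts, "/")
            if (parts[1] == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage:
#   ip addr sh | has_ipv4 10.20.30.122 && echo "public address is configured"
```

Run against the output captured right after the cleanup, the guard fails. Run against the output above, it succeeds thanks to the `inet 10.20.30.122/24` line on btbond1.205.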
Now we can create the appliance again:
[root@oak ~]# odacli create-appliance -r /u01/patch/ODA01.json
Enter an initial password for Web Console account (oda-admin):
Retype the password for Web Console account (oda-admin):
User 'oda-admin' created successfully...
{
"jobId" : "1d59cee5-b6c5-406f-aa4c-f6385d533b64",
"status" : "Created",
"message" : null,
"reports" : [ ],
"createTimestamp" : "March 09, 2022 11:44:25 AM UTC",
"resourceList" : [ ],
"description" : "Provisioning service creation",
"updatedTime" : "March 09, 2022 11:44:25 AM UTC"
}
This time the appliance creation is successful:
[root@ODA01 ~]# odacli describe-job -i 1d59cee5-b6c5-406f-aa4c-f6385d533b64
Job details
----------------------------------------------------------------
ID: 1d59cee5-b6c5-406f-aa4c-f6385d533b64
Description: Provisioning service creation
Status: Success
Created: March 9, 2022 11:44:25 AM CET
Message:
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up Network March 9, 2022 11:44:27 AM CET March 9, 2022 11:44:27 AM CET Success
Setting up Vlan March 9, 2022 11:44:40 AM CET March 9, 2022 11:44:41 AM CET Success
Setting up Network March 9, 2022 11:44:54 AM CET March 9, 2022 11:44:54 AM CET Success
network update March 9, 2022 11:45:12 AM CET March 9, 2022 11:45:26 AM CET Success
updating network March 9, 2022 11:45:12 AM CET March 9, 2022 11:45:26 AM CET Success
attach to bridge March 9, 2022 11:45:12 AM CET March 9, 2022 11:45:12 AM CET Success
OS usergroup 'asmdba'creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
OS usergroup 'asmoper'creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
OS usergroup 'asmadmin'creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
OS usergroup 'dba'creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
OS usergroup 'dbaoper'creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
OS usergroup 'oinstall'creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
OS user 'grid'creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
OS user 'oracle'creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
Default backup policy creation March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
Backup config metadata persist March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
Grant permission to RHP files March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
Add SYSNAME in Env March 9, 2022 11:45:26 AM CET March 9, 2022 11:45:26 AM CET Success
Install oracle-ahf March 9, 2022 11:45:26 AM CET March 9, 2022 11:46:26 AM CET Success
Grid home creation March 9, 2022 11:46:30 AM CET March 9, 2022 11:48:10 AM CET Success
Creating GI home directories March 9, 2022 11:46:30 AM CET March 9, 2022 11:46:30 AM CET Success
Extract GI clone March 9, 2022 11:46:30 AM CET March 9, 2022 11:48:09 AM CET Success
Storage discovery March 9, 2022 11:48:10 AM CET March 9, 2022 11:52:39 AM CET Success
Creating wallet for Root User March 9, 2022 11:52:39 AM CET March 9, 2022 11:52:43 AM CET Success
Creating wallet for ASM Client March 9, 2022 11:52:43 AM CET March 9, 2022 11:52:46 AM CET Success
Grid stack creation March 9, 2022 11:52:46 AM CET March 9, 2022 12:02:29 PM CET Success
Provisioning GI with RHP March 9, 2022 11:52:46 AM CET March 9, 2022 11:59:14 AM CET Success
Updating GIHome version March 9, 2022 11:59:15 AM CET March 9, 2022 11:59:18 AM CET Success
Post cluster OAKD configuration March 9, 2022 12:02:29 PM CET March 9, 2022 12:05:12 PM CET Success
Disk group 'RECO'creation March 9, 2022 12:05:20 PM CET March 9, 2022 12:05:31 PM CET Success
Setting ACL for disk groups March 9, 2022 12:05:31 PM CET March 9, 2022 12:05:35 PM CET Success
Modify DB file attributes March 9, 2022 12:05:35 PM CET March 9, 2022 12:05:43 PM CET Success
Register Scan and Vips to Public Network March 9, 2022 12:05:43 PM CET March 9, 2022 12:05:45 PM CET Success
Configure export clones resource March 9, 2022 12:06:43 PM CET March 9, 2022 12:06:44 PM CET Success
Volume 'commonstore'creation March 9, 2022 12:06:44 PM CET March 9, 2022 12:06:58 PM CET Success
ACFS File system 'DATA'creation March 9, 2022 12:06:58 PM CET March 9, 2022 12:07:10 PM CET Success
Provisioning service creation March 9, 2022 12:07:12 PM CET March 9, 2022 12:07:12 PM CET Success
persist new agent state entry March 9, 2022 12:07:12 PM CET March 9, 2022 12:07:12 PM CET Success
persist new agent state entry March 9, 2022 12:07:12 PM CET March 9, 2022 12:07:12 PM CET Success
Restart Zookeeper and DCS Agent March 9, 2022 12:07:12 PM CET March 9, 2022 12:07:13 PM CET Success
Xavi
22.09.2023
Great !!!!! Same error in deploy ODA X9 -L
Marc Wagner
22.09.2023
Thanks Xavi for the reply. Happy my blog could help you. Take care.