I recently deployed a new ODA with the latest version, 19.14, at a customer site, and we hit an interesting failure while deploying the appliance, which I would like to share with you.

Problem description

After reimaging the ODA and configuring the network interface with configure-firstnet, we created the appliance:

[root@oak ~]# odacli create-appliance -r /u01/patch/ODA01.json
Enter an initial password for Web Console account (oda-admin):
Retype the password for Web Console account (oda-admin):
User 'oda-admin' created successfully...
{
  "jobId" : "dacff86d-122c-4013-8f54-59ee29c919ca",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "March 09, 2022 19:25:01 PM UTC",
  "resourceList" : [ ],
  "description" : "Provisioning service creation",
  "updatedTime" : "March 09, 2022 19:25:01 PM UTC"
}


The job then failed as follows:

[root@oak ~]# odacli describe-job -i dacff86d-122c-4013-8f54-59ee29c919ca

Job details
----------------------------------------------------------------
                     ID:  dacff86d-122c-4013-8f54-59ee29c919ca
            Description:  Provisioning service creation
                 Status:  Failure
                Created:  March 9, 2022 8:25:01 PM CET
                Message:  DCS-10001:Internal error encountered: Failed to provision GI with RHP at the home: /u01/app/19.14.0.0/grid: DCS-10001:Internal error encountered: PRGH-1002 : Failed to copy files from /opt/oracle/rhp/RHPCheckpoints/rhptemp/grid4991782420745951208.rsp to /opt/oracle/rhp/RHPCheckpoints/wOraGrid191400
PRKC-1191 : Remote command execution setup check for node ODA01 using shell /usr/bin/ssh failed.
FIPS mode initializedNo ECDSA host key is known for ODA01 and you have requested strict checking.Host key verification failed...

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Storage discovery                        March 9, 2022 12:28:54 PM CET       March 9, 2022 12:33:21 PM CET       Success
Creating wallet for Root User            March 9, 2022 12:33:21 PM CET       March 9, 2022 12:33:25 PM CET       Success
Creating wallet for ASM Client           March 9, 2022 12:33:25 PM CET       March 9, 2022 12:33:28 PM CET       Success
Grid stack creation                      March 9, 2022 12:33:29 PM CET       March 9, 2022 12:34:29 PM CET       Failure
Provisioning GI with RHP                 March 9, 2022 12:33:29 PM CET       March 9, 2022 12:34:29 PM CET       Failure
Provisioning service creation            March 9, 2022 8:25:02 PM CET        March 9, 2022 12:34:29 PM CET       Failure
Provisioning service creation            March 9, 2022 8:25:02 PM CET        March 9, 2022 12:34:29 PM CET       Failure
Setting up Network                       March 9, 2022 8:25:03 PM CET        March 9, 2022 8:25:03 PM CET        Success
Setting up Vlan                          March 9, 2022 8:25:16 PM CET        March 9, 2022 8:25:16 PM CET        Success
Setting up Network                       March 9, 2022 8:25:30 PM CET        March 9, 2022 8:25:30 PM CET        Success
network update                           March 9, 2022 8:25:47 PM CET        March 9, 2022 8:26:01 PM CET        Success
updating network                         March 9, 2022 8:25:47 PM CET        March 9, 2022 8:26:01 PM CET        Success
attach to bridge                         March 9, 2022 8:25:47 PM CET        March 9, 2022 8:25:47 PM CET        Success
OS usergroup 'asmdba'creation            March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'asmoper'creation           March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'asmadmin'creation          March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'dba'creation               March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'dbaoper'creation           March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS usergroup 'oinstall'creation          March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS user 'grid'creation                   March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
OS user 'oracle'creation                 March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
Default backup policy creation           March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:02 PM CET        Success
Backup config metadata persist           March 9, 2022 8:26:01 PM CET        March 9, 2022 8:26:01 PM CET        Success
Grant permission to RHP files            March 9, 2022 8:26:02 PM CET        March 9, 2022 8:26:02 PM CET        Success
Add SYSNAME in Env                       March 9, 2022 8:26:02 PM CET        March 9, 2022 8:26:02 PM CET        Success
Install oracle-ahf                       March 9, 2022 8:26:02 PM CET        March 9, 2022 8:27:00 PM CET        Success
Grid home creation                       March 9, 2022 8:27:05 PM CET        March 9, 2022 12:28:54 PM CET       Success
Creating GI home directories             March 9, 2022 8:27:05 PM CET        March 9, 2022 8:27:05 PM CET        Success
Extract GI clone                         March 9, 2022 8:27:05 PM CET        March 9, 2022 12:28:54 PM CET       Success
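
To spot the failing steps quickly in such a long listing, a small filter can help. This is only a sketch: failed_tasks is a hypothetical helper name, and it assumes the task name occupies the first 40 columns, as in the output above.

```shell
# Print the names of the tasks whose status column is "Failure".
# Reads `odacli describe-job` output on stdin.
# Assumption: the task name fills the first 40 columns of each task line.
failed_tasks() {
    awk 'NF > 2 && $NF == "Failure" {
        name = substr($0, 1, 40)          # task name column
        sub(/[ \t]+$/, "", name)          # drop the column padding
        if (!seen[name]++) print name     # list each task only once
    }'
}

# Example (not run here):
# odacli describe-job -i dacff86d-122c-4013-8f54-59ee29c919ca | failed_tasks
```

The NF > 2 condition skips the "Status:  Failure" line of the job header, so only task lines are reported.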


Analysis

In dcs-agent.log, we found the following error:

! com.oracle.dcs.commons.exception.DcsTaskFailureException: DCS-10001:Internal error encountered: Failed to provision GI with RHP at the home: /u01/app/19.14.0.0/grid: DCS-10001:Internal error encountered: PRGH-1002 : Failed to copy files from /opt/oracle/rhp/RHPCheckpoints/rhptemp/grid4108866401395331366.rsp to /opt/oracle/rhp/RHPCheckpoints/wOraGrid191400
! PRKC-1191 : Remote command execution setup check for node delphes1 using shell /usr/bin/ssh failed.
! FIPS mode initializedNo ECDSA host key is known for delphes1 and you have requested strict checking.Host key verification failed...
! at com.oracle.dcs.agent.task.TaskZJsonRpc.processJsonResponse(TaskZJsonRpc.java:124)
! at com.oracle.dcs.agent.task.TaskZJsonRpcExt.callInternal(TaskZJsonRpcExt.java:119)
! at com.oracle.dcs.agent.task.TaskZJsonRpc.call(TaskZJsonRpc.java:283)
! at com.oracle.dcs.agent.task.TaskZJsonRpc.call(TaskZJsonRpc.java:78)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.agent.task.TaskZLockWrapper.call(TaskZLockWrapper.java:122)
! at com.oracle.dcs.agent.task.TaskZLockWrapper.call(TaskZLockWrapper.java:64)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:63)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:65)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:36)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:115)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:59)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
! at java.lang.Thread.run(Thread.java:748)
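
The stack trace is long, but only the first lines matter here. A minimal sketch to pull them out of the log could look like the following. find_hostkey_errors is a hypothetical helper name, and the default log path is an assumption about where the DCS agent log lives on the appliance, so adjust it if needed.

```shell
# Print (with line numbers) the log lines that point at an SSH host key problem.
# $1: path to the DCS agent log.
# Assumption: the default location below; pass another path if yours differs.
find_hostkey_errors() {
    grep -n -E 'PRKC-1191|Host key verification failed' \
        "${1:-/opt/oracle/dcs/log/dcs-agent.log}"
}

# Example (not run here):
# find_hostkey_errors /opt/oracle/dcs/log/dcs-agent.log
```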


During the grid installation, the SSH connectivity check fails because, as per the logs, no host key is known for the node while strict host key checking is requested.

This is confirmed: the process did not configure it, and no known_hosts file is present:

[root@oak ~]# cd .ssh/

[root@oak .ssh]# ll
total 0
[root@oak .ssh]#
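
Instead of listing the directory, the same check can be scripted with ssh-keygen -F, which looks up a host in a known_hosts file. This is a sketch under assumptions: host_key_known is a hypothetical helper name, and it tests the command output rather than the exit status so that it does not depend on the OpenSSH version.

```shell
# Succeeds if the known_hosts file has an entry for the given host.
# $1: host name, $2: known_hosts file (defaults to the current user's file).
# A missing file simply counts as "no entry".
host_key_known() {
    [ -n "$(ssh-keygen -F "$1" -f "${2:-$HOME/.ssh/known_hosts}" 2>/dev/null)" ]
}

# Example (not run here):
# host_key_known ODA01 || echo "no host key recorded for ODA01"
```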


Solution

Let’s run ssh locally on the ODA, connecting to its own hostname, and accept the host key:

[root@oak .ssh]# ssh ODA01
FIPS mode initialized
The authenticity of host 'ODA01 (10.20.30.122)' can't be established.
ECDSA key fingerprint is SHA256:i2p0EBIgtEAYDJCF3pWhD2Fe6ITkcdiBnkT4u0BwC3s.
ECDSA key fingerprint is SHA1:o2TSYrpKihq47VJ5NUH0BnLgkFM.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'ODA01,10.20.30.122' (ECDSA) to the list of known hosts.
root@ODA01's password:
Last failed login: Wed Mar  9 17:01:42 CET 2022 on ttyS0
There was 1 failed login attempt since the last successful login.
Last login: Wed Mar  9 12:34:15 2022
[root@ODA01 ~]#


We now have a known_hosts file with the appropriate key:

[root@oak .ssh]# more known_hosts
ODA01,10.20.30.122 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCor/4ewamjVk65VB0YJnxElNKPKXOMAsoSmVdc7jS7YMGJXplmY92rRObgV0s4oYpztPnWuqSKPdJwt5mavY+0=
[root@oak .ssh]#
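
We accepted the key interactively, but the same entry could probably be seeded without a prompt by using ssh-keyscan. The sketch below is an alternative we did not use on this ODA. append_new_host_keys is a hypothetical helper name. Since ssh-keyscan trusts whatever answers on the network, the fingerprint should be verified before relying on it.

```shell
# Append host key lines read from stdin to a known_hosts file,
# skipping empty lines, comments and lines already present verbatim.
# $1: known_hosts file to update (created with mode 600 if missing).
append_new_host_keys() {
    file=$1
    mkdir -p "$(dirname "$file")"
    touch "$file"
    chmod 600 "$file"
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;
        esac
        grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}

# Example (not run here):
# ssh-keyscan -t ecdsa ODA01 | append_new_host_keys /root/.ssh/known_hosts
```

Reading the keys from stdin keeps the helper independent of how they were collected, and running it twice does not duplicate entries.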


We can now clean up the previously failed appliance installation:

[root@ODA01 ~]# perl /opt/oracle/oak/onecmd/cleanup.pl
INFO: Log file is /opt/oracle/oak/log/ODA01/cleanup/cleanup_2022-03-09_12-37-02.log

INFO: Log file is /opt/oracle/oak/log/ODA01/cleanup/dcsemu_diag_precleanup_2022-03-09_12-37-02.log

INFO: *******************************************************************
INFO: ** Starting process to cleanup provisioned host ODA01         **
INFO: *******************************************************************
INFO: Default mode being used to cleanup a provisioned system.
INFO: It will change all ASM disk status from MEMBER to FORMER
Do you want to continue (yes/no) : yes
...
...
...


Curiously, cleaning up the appliance now also removes the whole public network configuration:

[root@oak ~]# ip addr sh
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: p7p1:  mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
3: p7p2:  mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
4: em1:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a8:69:8c:0c:91:14 brd ff:ff:ff:ff:ff:ff
5: bond0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c6:6a:3e:ed:ad:8f brd ff:ff:ff:ff:ff:ff
6: btbond1:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
7: priv0:  mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether aa:d8:c4:5e:f3:aa brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.24/28 scope global priv0
       valid_lft forever preferred_lft forever


This is most likely because, in the last few versions, a pubnet bridge interface is created on top of the bonding interface during appliance creation.

In any case, the public network configuration is gone, so let’s create it again with configure-firstnet before creating the appliance again:

[root@oak ~]# /opt/oracle/dcs/bin/odacli configure-firstnet
Select the Interface to configure the network on (btbond1) [btbond1]:
Configure DHCP on btbond1 (yes/no) [no]:
INFO: You have chosen Static configuration
Use VLAN on btbond1 (yes/no) [no]:yes
Configure VLAN on btbond1, input VLAN ID [2 - 4094] 205
INFO: using network interface btbond1.205
Enter the IP address to configure : 10.20.30.122
Enter the Netmask address to configure : 255.255.255.0
Enter the Gateway address to configure [10.20.30.1] :
INFO: Restarting the network
INFO: Restarting the DCS agent


We can check the network configuration:

[root@oak ~]# ip addr sh
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: p7p1:  mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
3: p7p2:  mtu 1500 qdisc mq master btbond1 state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
4: em1:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a8:69:8c:0c:91:14 brd ff:ff:ff:ff:ff:ff
5: bond0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c6:6a:3e:ed:ad:8f brd ff:ff:ff:ff:ff:ff
6: btbond1:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
7: priv0:  mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether aa:d8:c4:5e:f3:aa brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.24/28 scope global priv0
       valid_lft forever preferred_lft forever
8: btbond1.205@btbond1:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e4:3d:1a:d4:b5:d0 brd ff:ff:ff:ff:ff:ff
    inet 10.20.30.122/24 brd 10.20.30.255 scope global btbond1.205
       valid_lft forever preferred_lft forever
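
Comparing the two listings by eye works, but the check can also be scripted before launching the appliance creation again. A minimal sketch, where has_ipv4 is a hypothetical helper name that only looks at the inet lines of the ip output:

```shell
# Succeeds if the given IPv4 address appears as an "inet" entry
# in `ip addr` output read from stdin.
# $1: address to look for, without the prefix length.
has_ipv4() {
    awk -v want="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Example (not run here):
# ip addr show | has_ipv4 10.20.30.122 || echo "public IP missing, run configure-firstnet"
```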


Now we can create the appliance again:

[root@oak ~]# odacli create-appliance -r /u01/patch/ODA01.json
Enter an initial password for Web Console account (oda-admin):
Retype the password for Web Console account (oda-admin):
User 'oda-admin' created successfully...
{
  "jobId" : "1d59cee5-b6c5-406f-aa4c-f6385d533b64",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "March 09, 2022 11:44:25 AM UTC",
  "resourceList" : [ ],
  "description" : "Provisioning service creation",
  "updatedTime" : "March 09, 2022 11:44:25 AM UTC"
}


This time, the appliance creation is successful:

[root@ODA01 ~]# odacli describe-job -i 1d59cee5-b6c5-406f-aa4c-f6385d533b64

Job details
----------------------------------------------------------------
                     ID:  1d59cee5-b6c5-406f-aa4c-f6385d533b64
            Description:  Provisioning service creation
                 Status:  Success
                Created:  March 9, 2022 11:44:25 AM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up Network                       March 9, 2022 11:44:27 AM CET       March 9, 2022 11:44:27 AM CET       Success
Setting up Vlan                          March 9, 2022 11:44:40 AM CET       March 9, 2022 11:44:41 AM CET       Success
Setting up Network                       March 9, 2022 11:44:54 AM CET       March 9, 2022 11:44:54 AM CET       Success
network update                           March 9, 2022 11:45:12 AM CET       March 9, 2022 11:45:26 AM CET       Success
updating network                         March 9, 2022 11:45:12 AM CET       March 9, 2022 11:45:26 AM CET       Success
attach to bridge                         March 9, 2022 11:45:12 AM CET       March 9, 2022 11:45:12 AM CET       Success
OS usergroup 'asmdba'creation            March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'asmoper'creation           March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'asmadmin'creation          March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'dba'creation               March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'dbaoper'creation           March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS usergroup 'oinstall'creation          March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS user 'grid'creation                   March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
OS user 'oracle'creation                 March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Default backup policy creation           March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Backup config metadata persist           March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Grant permission to RHP files            March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Add SYSNAME in Env                       March 9, 2022 11:45:26 AM CET       March 9, 2022 11:45:26 AM CET       Success
Install oracle-ahf                       March 9, 2022 11:45:26 AM CET       March 9, 2022 11:46:26 AM CET       Success
Grid home creation                       March 9, 2022 11:46:30 AM CET       March 9, 2022 11:48:10 AM CET       Success
Creating GI home directories             March 9, 2022 11:46:30 AM CET       March 9, 2022 11:46:30 AM CET       Success
Extract GI clone                         March 9, 2022 11:46:30 AM CET       March 9, 2022 11:48:09 AM CET       Success
Storage discovery                        March 9, 2022 11:48:10 AM CET       March 9, 2022 11:52:39 AM CET       Success
Creating wallet for Root User            March 9, 2022 11:52:39 AM CET       March 9, 2022 11:52:43 AM CET       Success
Creating wallet for ASM Client           March 9, 2022 11:52:43 AM CET       March 9, 2022 11:52:46 AM CET       Success
Grid stack creation                      March 9, 2022 11:52:46 AM CET       March 9, 2022 12:02:29 PM CET       Success
Provisioning GI with RHP                 March 9, 2022 11:52:46 AM CET       March 9, 2022 11:59:14 AM CET       Success
Updating GIHome version                  March 9, 2022 11:59:15 AM CET       March 9, 2022 11:59:18 AM CET       Success
Post cluster OAKD configuration          March 9, 2022 12:02:29 PM CET       March 9, 2022 12:05:12 PM CET       Success
Disk group 'RECO'creation                March 9, 2022 12:05:20 PM CET       March 9, 2022 12:05:31 PM CET       Success
Setting ACL for disk groups              March 9, 2022 12:05:31 PM CET       March 9, 2022 12:05:35 PM CET       Success
Modify DB file attributes                March 9, 2022 12:05:35 PM CET       March 9, 2022 12:05:43 PM CET       Success
Register Scan and Vips to Public Network March 9, 2022 12:05:43 PM CET       March 9, 2022 12:05:45 PM CET       Success
Configure export clones resource         March 9, 2022 12:06:43 PM CET       March 9, 2022 12:06:44 PM CET       Success
Volume 'commonstore'creation             March 9, 2022 12:06:44 PM CET       March 9, 2022 12:06:58 PM CET       Success
ACFS File system 'DATA'creation          March 9, 2022 12:06:58 PM CET       March 9, 2022 12:07:10 PM CET       Success
Provisioning service creation            March 9, 2022 12:07:12 PM CET       March 9, 2022 12:07:12 PM CET       Success
persist new agent state entry            March 9, 2022 12:07:12 PM CET       March 9, 2022 12:07:12 PM CET       Success
persist new agent state entry            March 9, 2022 12:07:12 PM CET       March 9, 2022 12:07:12 PM CET       Success
Restart Zookeeper and DCS Agent          March 9, 2022 12:07:12 PM CET       March 9, 2022 12:07:13 PM CET       Success
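
Since the provisioning job runs for a while, it can be convenient to wait for its final status instead of calling describe-job by hand. The following is only a sketch: job_status and wait_for_job are hypothetical helper names, the parsing relies on the "Status:" line of the job header shown above, and the 30-second default interval is arbitrary.

```shell
# Print the overall job status from `odacli describe-job` output on stdin.
job_status() {
    awk '$1 == "Status:" { print $2; exit }'
}

# Poll a job until it ends. Returns 0 on Success and 1 on Failure.
# $1: job id, $2: polling interval in seconds (default 30, an arbitrary choice).
wait_for_job() {
    while :; do
        status=$(odacli describe-job -i "$1" | job_status)
        case $status in
            Success) return 0 ;;
            Failure) return 1 ;;
        esac
        sleep "${2:-30}"
    done
}

# Example (not run here):
# wait_for_job 1d59cee5-b6c5-406f-aa4c-f6385d533b64 && echo "appliance created"
```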