I recently patched an Oracle Database Appliance (ODA) from 19.16 to 19.18. Detailed steps on how to patch to 19.18 are available here. During the update-server step I got the following error:
# odacli describe-job -i "859c6b58-19e5-453e-96a9-f29bcc70894a"
Job details
----------------------------------------------------------------
ID: 859c6b58-19e5-453e-96a9-f29bcc70894a
Description: Server Patching
Status: Failure
Created: April 24, 2023 4:28:10 PM CEST
Message: DCS-10001:Internal error encountered: Error in sshd_config.
Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Server patching                          April 24, 2023 4:28:25 PM CEST      April 24, 2023 4:37:54 PM CEST      Failure
Validating GI user metadata              April 24, 2023 4:28:25 PM CEST      April 24, 2023 4:28:25 PM CEST      Success
Validate ILOM server reachable           April 24, 2023 4:28:25 PM CEST      April 24, 2023 4:28:25 PM CEST      Success
Validate DCS Admin mTLS setup            April 24, 2023 4:28:25 PM CEST      April 24, 2023 4:28:25 PM CEST      Success
Server patching                          April 24, 2023 4:28:26 PM CEST      April 24, 2023 4:37:54 PM CEST      Failure
Configure export clones resource         April 24, 2023 4:28:26 PM CEST      April 24, 2023 4:28:27 PM CEST      Success
Creating repositories using yum          April 24, 2023 4:28:27 PM CEST      April 24, 2023 4:28:31 PM CEST      Success
Updating YumPluginVersionLock rpm        April 24, 2023 4:28:31 PM CEST      April 24, 2023 4:28:31 PM CEST      Success
Applying OS Patches                      April 24, 2023 4:28:31 PM CEST      April 24, 2023 4:37:54 PM CEST      Success
Server patching                          April 24, 2023 4:37:54 PM CEST      April 24, 2023 4:37:54 PM CEST      Failure
Modify sshd_config attribute             April 24, 2023 4:37:54 PM CEST      April 24, 2023 4:37:54 PM CEST      Failure
So something was clearly wrong with sshd_config. Whenever an odacli job fails on the ODA, I check the DCS agent log file /opt/oracle/dcs/log/dcs-agent.log. It contained the following:
2023-04-24 16:37:54,771 DEBUG [Modify sshd_config attribute : JobId=859c6b58-19e5-453e-96a9-f29bcc70894a] [] c.o.d.c.u.CommonsUtils: Output :
/etc/ssh/sshd_config line 147: Directive 'Protocol' is not allowed within a Match block
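The DCS agent log can be large, so it may help to filter it by the job ID reported by odacli describe-job. The following is a minimal sketch, not an Oracle tool: the function name job_log_lines is hypothetical, and the default log path is the one mentioned above. It prints one extra line after each match because, in the excerpt above, the sshd message sits on the line after the one tagged with the job ID.

```shell
# Print the dcs-agent log lines tagged with one job ID, plus the line that
# follows each match (command output is logged on the next line).
# Usage: job_log_lines JOB_ID [LOGFILE]
# job_log_lines is a hypothetical helper name, not part of odacli.
job_log_lines() {
    local job_id=$1
    local logfile=${2:-/opt/oracle/dcs/log/dcs-agent.log}
    # -F treats the job ID as a fixed string, -A 1 adds one trailing line.
    grep -F -A 1 "JobId=${job_id}" "$logfile"
}
```

For the failed job above the call would be job_log_lines 859c6b58-19e5-453e-96a9-f29bcc70894a.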
At the end of /etc/ssh/sshd_config I saw these lines:
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
# BEGIN ANSIBLE MANAGED BLOCK
Match User user41
PasswordAuthentication no
Match User user46
PasswordAuthentication no
# END ANSIBLE MANAGED BLOCK
So the customer had evidently adjusted the sshd_config file using Ansible. That change had been made a year earlier, and the ODA had been patched successfully since then. So why did I now get the error
Directive 'Protocol' is not allowed within a Match block
The line
Protocol 2
was already in sshd_config. Unfortunately Oracle rolls back all changes when a failure occurs, so I could not see what had happened to sshd_config. In dcs-agent.log I saw that two files were created temporarily to modify sshd_config:
/opt/oracle/dcs/bin/modifySshdConfScript.sh
/tmp/dcsfiles/temp.txt
Both files were deleted during the cleanup after the failure. So I wrote a simple loop that checks every second whether either of these files exists:
# while true
> do
> cat /opt/oracle/dcs/bin/modifySshdConfScript.sh /tmp/dcsfiles/temp.txt
> sleep 1
> done
cat: /opt/oracle/dcs/bin/modifySshdConfScript.sh: No such file or directory
cat: /tmp/dcsfiles/temp.txt: No such file or directory
...
While this was running I started my update-server again:
# /opt/oracle/dcs/bin/odacli update-server -v 19.18.0.0.0
REMARK: After the failure the clusterware was no longer running, so the update-server above immediately returned the message
DCS-10059:Oracle Clusterware is not running on all nodes.
To start the clusterware again I simply rebooted the server.
After the reboot I started my loop again, launched update-server in a separate session, and got this output:
# while true
> do
> cat /opt/oracle/dcs/bin/modifySshdConfScript.sh /tmp/dcsfiles/temp.txt
> sleep 1
> done
cat: /opt/oracle/dcs/bin/modifySshdConfScript.sh: No such file or directory
cat: /tmp/dcsfiles/temp.txt: No such file or directory
...
cat: /opt/oracle/dcs/bin/modifySshdConfScript.sh: No such file or directory
cat: /tmp/dcsfiles/temp.txt: No such file or directory
#!/bin/sh
SSHD_CONF_FILE=/etc/ssh/sshd_config
if [ -f "$SSHD_CONF_FILE" ]; then
...
sed -i 's/^[^#]*Protocol/#&/' $SSHD_CONF_FILE
...
echo 'AllowTcpForwarding no' >> $SSHD_CONF_FILE
echo 'Protocol 2' >> $SSHD_CONF_FILE
...
else
echo "Unable to locate sshd config file SSHD_CONF_FILE"
exit 1
fi
...
cat: /opt/oracle/dcs/bin/modifySshdConfScript.sh: No such file or directory
cat: /tmp/dcsfiles/temp.txt: No such file or directory
...
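The loop above only prints the files to the terminal, so the content is lost once the scrollback is gone. A variant that saves a timestamped copy of each file while it exists could look like the sketch below. The function name capture_files and its arguments are hypothetical. It copies a file again on every poll for as long as the file exists, which is wasteful but keeps the logic simple.

```shell
# Poll once per second for short-lived files and keep a timestamped copy
# of each one that exists at that moment.
# Usage: capture_files DEST_DIR MAX_POLLS FILE...
# capture_files is a hypothetical helper name.
capture_files() {
    local dest=$1
    local max=$2
    local i=0
    local f
    shift 2
    mkdir -p "$dest" || return 1
    while [ "$i" -lt "$max" ]; do
        for f in "$@"; do
            if [ -f "$f" ]; then
                # The file may vanish between the test and the copy.
                cp -p "$f" "$dest/$(basename "$f").$(date +%H%M%S)" 2>/dev/null
            fi
        done
        i=$((i + 1))
        sleep 1
    done
}
```

Started before update-server, a call such as capture_files /root/sshd_debug 900 /opt/oracle/dcs/bin/modifySshdConfScript.sh /tmp/dcsfiles/temp.txt would poll for about 15 minutes. The destination directory /root/sshd_debug is only an example.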
Now the issue became clear. The existing line
Protocol 2
was commented out, and new lines (including “Protocol 2”) were appended to the end of sshd_config. The line
Protocol 2
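is not allowed inside a Match block, and the block added by the Ansible update is “open-ended”: a Match block continues until the next Match line or the end of the file, so the appended line landed inside it.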
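The effect can be reproduced safely on a scratch copy. The sketch below applies only the two Protocol-related commands visible in the captured script (the other steps were elided and are not guessed here) and then reports every active Protocol line that ends up below an active Match line. The function name simulate_protocol_edit is hypothetical, and the awk check is a simplification: it is case-sensitive and knows only about Protocol, not the full list of directives sshd rejects inside a Match block.

```shell
# Apply the two captured Protocol edits to a scratch copy of CONFIG_FILE and
# report active "Protocol" lines located after an active "Match" line.
# Returns 1 if such a line is found, 0 otherwise. The original is untouched.
# simulate_protocol_edit is a hypothetical helper name.
simulate_protocol_edit() {
    local src=$1
    local scratch
    local rc
    scratch=$(mktemp) || return 1
    # Same expression as in the captured script, written to a copy
    # instead of editing in place with -i.
    sed 's/^[^#]*Protocol/#&/' "$src" > "$scratch"
    echo 'Protocol 2' >> "$scratch"
    awk '
        /^[ \t]*Match[ \t]/ { in_match = 1 }
        in_match && /^[ \t]*Protocol[ \t]/ {
            printf "line %d: Protocol inside Match block\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$scratch"
    rc=$?
    rm -f "$scratch"
    return $rc
}
```

The authoritative check is sshd itself: sshd -t -f FILE parses a configuration file in test mode and prints the same kind of message that appeared in dcs-agent.log.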
My workaround was to temporarily comment out the Match blocks in sshd_config, after which update-server ran through:
# BEGIN ANSIBLE MANAGED BLOCK
#Match User user41
# PasswordAuthentication no
#Match User user46
# PasswordAuthentication no
# END ANSIBLE MANAGED BLOCK
Afterwards I moved these lines back to the end of sshd_config and uncommented them, restoring their previous state. Then restart sshd or reboot the server.
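The manual steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure for an ODA: the function names are hypothetical, the marker lines are the ones shown in this file, and the restore step strips one leading “#” from every non-marker line in the block, so it assumes the block contained no comment lines of its own. Work on a backup copy first.

```shell
# Comment out every active line between the Ansible block markers.
# Usage: disable_managed_block CONFIG_FILE   (hypothetical helper name)
disable_managed_block() {
    local file=$1
    local tmp
    tmp=$(mktemp) || return 1
    sed '/^# BEGIN ANSIBLE MANAGED BLOCK/,/^# END ANSIBLE MANAGED BLOCK/ {
        /^#/! s/^/#/
    }' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Uncomment the block again and move it to the end of the file.
# Usage: restore_managed_block CONFIG_FILE   (hypothetical helper name)
restore_managed_block() {
    local file=$1
    local block
    local rest
    block=$(mktemp) || return 1
    rest=$(mktemp) || return 1
    # Extract the block, removing one leading "#" from non-marker lines.
    sed -n '/^# BEGIN ANSIBLE MANAGED BLOCK/,/^# END ANSIBLE MANAGED BLOCK/ {
        /ANSIBLE MANAGED BLOCK/! s/^#//
        p
    }' "$file" > "$block"
    # Everything except the block, followed by the block itself.
    sed '/^# BEGIN ANSIBLE MANAGED BLOCK/,/^# END ANSIBLE MANAGED BLOCK/d' \
        "$file" > "$rest"
    cat "$rest" "$block" > "$file"
    rm -f "$block" "$rest"
}
```

The intended order would be disable_managed_block /etc/ssh/sshd_config before update-server and restore_managed_block /etc/ssh/sshd_config after it, followed by sshd -t to validate the result before restarting sshd.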
I’ve found several blogs and discussions on how to end a Match block in sshd_config to avoid the errors I’ve seen. You may check here or here. Basically the customer modified sshd_config correctly, because Match blocks should be added at the end of the file. This can also be seen from the example at the end of the ODA's default sshd_config:
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
Workarounds that end a Match block with a bare “Match” line should no longer be used. Ending the block with “Match All” is better, but even that may not be correct. So is this a bug in Oracle's ODA patching software? Yes and no. Yes, because a tool that modifies a file should do so without breaking its syntax. No, because customizations in sshd_config should be undone before patching an ODA. It comes back to the general rule that you should not customize an Oracle Database Appliance. If you do anyway, undo the modifications before patching and reapply them after the patch has been installed successfully.