A few weeks ago, an issue was identified at a customer site that caused the value of “dm_server_config.r_server_version” to be reset to the GA version number (16.4 P00) instead of the currently deployed patch (16.4 P26), and this would happen seemingly at random on some of the Content Servers but not all. The Content Servers are all deployed on a Kubernetes cluster, use a single image (which contains the patched binaries) and rely on replicas to provide High Availability. In other words, it is the same image everywhere, yet some of the servers would, at some point, end up showing P00.
Every time the pods would start or restart, the “dm_server_config.r_server_version” would properly display the P26 version, without exception. I spent several hours testing, but I was never able to replicate the issue by simply restarting the Content Server. At startup, a Content Server updates the value of “dm_server_config.r_server_version” with the version of the binaries currently in use (as returned by the “documentum -v” command). After some discussions with OpenText Support, I also tried enabling the RPC and SQL traces, but they didn’t show anything useful.
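As a side note, comparing the version of the deployed binaries with the value recorded in the repository is straightforward; a minimal sketch, assuming a standard dmadmin environment where ${DOCBASE_NAME} is defined and the dmadmin password is at hand:

# Version of the binaries currently installed (what the Content Server should register at startup)
documentum -v

# Version currently recorded in the repository (what clients actually see)
iapi ${DOCBASE_NAME} -Udmadmin -Pxxx << EOC
?,c,select r_object_id, r_modify_date, r_server_version from dm_server_config;
EOC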
Since it appeared randomly and since I couldn’t replicate it by restarting the Content Server, I simply left an environment up and running without any user activity and checked the value of “dm_server_config.r_server_version” every day. During the week, nothing happened: P26 was consistently shown and the dm_server_config object wasn’t updated in any way. However, after the weekend, it was suddenly showing P00, so I started looking into the details:
[morgan@k8s_master ~]$ kubectl get pod cs-0
NAME   READY   STATUS    RESTARTS   AGE
cs-0   1/1     Running   0          6d22h
[morgan@k8s_master ~]$
[morgan@k8s_master ~]$ kubectl describe pod cs-0 | grep Started
      Started:      Mon, 31 Jan 2022 10:53:48 +0100
[morgan@k8s_master ~]$
[morgan@k8s_master ~]$ kubectl exec -it cs-0 -- bash -l
[dmadmin@cs-0 ~]$
[dmadmin@cs-0 ~]$ date; iapi ${DOCBASE_NAME} -Udmadmin -Pxxx << EOC
> retrieve,c,dm_server_config
> dump,c,l
> EOC
Mon Feb 7 07:57:00 UTC 2022
...
SYSTEM ATTRIBUTES
  r_object_type                : dm_server_config
  r_creation_date              : 7/15/2021 11:38:28
  r_modify_date                : 2/6/2022 03:19:25
  ...
  r_server_version             : 16.4.0000.0248 Linux64.Oracle
  r_host_name                  : cs-0
  ...
API> Bye
[dmadmin@cs-0 ~]$
[dmadmin@cs-0 ~]$ grep --color "16.4.0" $DOCUMENTUM/dba/log/${DOCBASE_NAME}.log
OpenText Documentum Content Server (version 16.4.0260.0296 Linux64.Oracle)
2022-01-31T09:59:36.943158  6994[6994]  0000000000000000  [DM_FULLTEXT_T_QUERY_PLUGIN_VERSION]info:  "Loaded FT Query Plugin: /app/dctm/server/product/16.4/bin/libDsearchQueryPlugin.so, API Interface version: 1.0, Build number: HEAD; Feb 14 2018 04:27:20, FT Engine version: xPlore version 16.4.0160.0089"
Mon Jan 31 10:00:50 2022 [INFORMATION] [AGENTEXEC 8215] Detected during program initialization: Version: 16.4.0260.0296 Linux64
2022-02-06T03:04:00.114945  23067[23067]  0000000000000000  [DM_FULLTEXT_T_QUERY_PLUGIN_VERSION]info:  "Loaded FT Query Plugin: /app/dctm/server/product/16.4/bin/libDsearchQueryPlugin.so, API Interface version: 1.0, Build number: HEAD; Feb 14 2018 04:27:20, FT Engine version: xPlore version 16.4.0160.0089"
2022-02-06T03:19:25.601602  31553[31553]  0000000000000000  [DM_FULLTEXT_T_QUERY_PLUGIN_VERSION]info:  "Loaded FT Query Plugin: /app/dctm/server/product/16.4/bin/libDsearchQueryPlugin.so, API Interface version: 1.0, Build number: HEAD; Feb 14 2018 04:27:20, FT Engine version: xPlore version 16.4.0160.0089"
[morgan@k8s_master ~]$
As you can see in the above output, the modification date of the dm_server_config object was Sunday 6-Feb-2022 at 03:19:25 UTC, while the repository started on Monday 31-Jan-2022 at 09:59 UTC. Until Friday 4-Feb-2022, the returned version was P26 (16.4.0260.0296), but after the weekend it was P00 (16.4.0000.0248). The grep command above was initially used to verify that the repository started with P26; that is the very first line of the repository log file: “OpenText Documentum Content Server (version 16.4.0260.0296 Linux64.Oracle)”. However, on the Sunday morning, two new messages related to the FT Query Plugin initialization suddenly appeared. This usually means that the Content Server was re-initialized, and that is indeed what happened:
[dmadmin@cs-0 ~]$ grep "DM_SESSION_I_INIT_BEGIN.*Initialize Server Configuration" $DOCUMENTUM/dba/log/${DOCBASE_NAME}.log 2022-01-31T09:59:34.302520 6994[6994] 0000000000000000 [DM_SESSION_I_INIT_BEGIN]info: "Initialize Server Configuration." 2022-02-06T03:03:56.846840 23067[23067] 0000000000000000 [DM_SESSION_I_INIT_BEGIN]info: "Initialize Server Configuration." 2022-02-06T03:19:23.644453 31553[31553] 0000000000000000 [DM_SESSION_I_INIT_BEGIN]info: "Initialize Server Configuration." [dmadmin@cs-0 ~]$
Therefore, something must have triggered the reinit, and that would most probably be a job: one that ran around 03:04 UTC and one around 03:20 UTC. Looking at the dm_sysobject objects created around that time gave me two good candidates for jobs that could have performed the reinit of the Content Server and might therefore be the cause of the switch from P26 to P00:
API> ?,c,select r_object_id, r_creation_date, r_modify_date, object_name from dm_sysobject where r_creation_date>=date('06.02.2022 03:00:00','dd.mm.yyyy hh:mi:ss') and r_creation_date<=date('06.02.2022 03:25:00','dd.mm.yyyy hh:mi:ss');
r_object_id       r_creation_date      r_modify_date        object_name
----------------  -------------------  -------------------  ----------------------------------
090f45158029f8a6  2/6/2022 03:03:13    2/6/2022 03:03:14    2/6/2022 03:03:11 dm_Initialize_WQ
090f45158029f8a9  2/6/2022 03:04:14    2/6/2022 03:04:15    2/6/2022 03:03:44 dm_DMClean
090f45158029f8aa  2/6/2022 03:04:11    2/6/2022 03:04:11    Result.dmclean
090f45158029f8ad  2/6/2022 03:04:15    2/6/2022 03:04:16    2/6/2022 03:04:14 dm_WfmsTimer
090f45158029f8b4  2/6/2022 03:11:46    2/6/2022 03:11:47    2/6/2022 03:11:42 dm_Initialize_WQ
090f45158029f8b7  2/6/2022 03:12:52    2/6/2022 03:12:53    2/6/2022 03:12:12 DocumentTraining
090f45158029fc1e  2/6/2022 03:16:15    2/6/2022 03:16:15    2/6/2022 03:16:13 dm_Initialize_WQ
090f45158029fc21  2/6/2022 03:22:17    2/6/2022 03:22:17    2/6/2022 03:19:13 dm_DMFilescan
090f45158029fc22  2/6/2022 03:22:14    2/6/2022 03:22:14    Result.dmfilescan
090f45158029fc28  2/6/2022 03:23:48    2/6/2022 03:23:48    2/6/2022 03:23:43 dm_Initialize_WQ
(10 rows affected)
As shown above, the first reinit was most probably triggered by the dm_DMClean job, while the second one most probably came from the dm_DMFilescan job: if you look at the start time of these jobs (embedded in the object_name) and their completion time (the Result.* line), the reinit falls right in the middle. Just in case, looking at the “a_last_completion” of these two jobs also confirmed the same:
API> ?,c,select r_object_id, a_last_completion, a_next_invocation, object_name from dm_job where object_name in ('dm_DMClean','dm_DMFilescan');
r_object_id       a_last_completion    a_next_invocation    object_name
----------------  -------------------  -------------------  -------------
080f45158000035b  2/6/2022 03:04:14    2/13/2022 03:00:00   dm_DMClean
080f45158000035c  2/6/2022 03:22:17    2/13/2022 03:15:00   dm_DMFilescan
With two good candidates identified, I obviously had to try to reproduce the issue manually. Therefore, I first restarted the repository to get back to P26:
[dmadmin@cs-0 ~]$ date; iapi ${DOCBASE_NAME} -Udmadmin -Pxxx << EOC
> ?,c,select r_object_id, r_creation_date, r_modify_date, r_server_version from dm_server_config;
> ?,c,select r_object_id, a_last_completion, a_next_invocation, object_name from dm_job where object_name in ('dm_DMClean','dm_DMFilescan');
> EOC
Mon Feb 7 08:45:13 UTC 2022
...
Session id is s0
API> r_object_id       r_creation_date      r_modify_date        r_server_version
----------------  -------------------  -------------------  --------------------------------
3d0f451580000102  7/15/2021 11:38:28   2/7/2022 08:41:02    16.4.0260.0296 Linux64.Oracle
(1 row affected)

API> r_object_id       a_last_completion    a_next_invocation    object_name
----------------  -------------------  -------------------  -------------
080f45158000035b  2/6/2022 03:04:14    2/13/2022 03:00:00   dm_DMClean
080f45158000035c  2/6/2022 03:22:17    2/13/2022 03:15:00   dm_DMFilescan
(2 rows affected)

API> Bye
[dmadmin@cs-0 ~]$
Then, I ran the dm_DMClean job manually (by updating the a_next_invocation of the job and the window_interval so that the job could start), checked that it performed a reinit of the Content Server and verified the “dm_server_config.r_server_version” value.
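The exact commands used at the time aren’t reproduced here, but forcing such a job to run ahead of its schedule can look like the following; a minimal sketch, assuming the agent exec window_interval is wide enough (or has been increased) for the new invocation date to be picked up:

# Bring the next invocation date of the job forward so the agent exec starts it on its next polling cycle
iapi ${DOCBASE_NAME} -Udmadmin -Pxxx << EOC
?,c,update dm_job objects set a_next_invocation=DATE(NOW) where object_name='dm_DMClean';
EOC

Once the job had completed, the verification gave the following: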
[dmadmin@cs-0 ~]$ date; iapi ${DOCBASE_NAME} -Udmadmin -Pxxx << EOC
> ?,c,select r_object_id, r_creation_date, r_modify_date, r_server_version from dm_server_config;
> ?,c,select r_object_id, a_last_completion, a_next_invocation, object_name from dm_job where object_name in ('dm_DMClean','dm_DMFilescan');
> EOC
Mon Feb 7 08:50:39 UTC 2022
...
Session id is s0
API> r_object_id       r_creation_date      r_modify_date        r_server_version
----------------  -------------------  -------------------  --------------------------------
3d0f451580000102  7/15/2021 11:38:28   2/7/2022 08:41:02    16.4.0260.0296 Linux64.Oracle
(1 row affected)

API> r_object_id       a_last_completion    a_next_invocation    object_name
----------------  -------------------  -------------------  -------------
080f45158000035b  2/7/2022 08:49:19    2/8/2022 03:00:00    dm_DMClean
080f45158000035c  2/6/2022 03:22:17    2/8/2022 03:15:00    dm_DMFilescan
(2 rows affected)

API> Bye
[dmadmin@cs-0 ~]$
The reinit of the Content Server happened, but it didn’t change “dm_server_config.r_modify_date”, maybe because the value already showed P26 and nothing had to be updated? The only thing that changed is “dm_job.a_last_completion”, obviously, since the job ran. This means that dm_DMClean is probably not the culprit, so I did the same for dm_DMFilescan:
[dmadmin@cs-0 ~]$ date; iapi ${DOCBASE_NAME} -Udmadmin -Pxxx << EOC
> ?,c,select r_object_id, r_creation_date, r_modify_date, r_server_version from dm_server_config;
> ?,c,select r_object_id, a_last_completion, a_next_invocation, object_name from dm_job where object_name in ('dm_DMClean','dm_DMFilescan');
> EOC
Mon Feb 7 08:59:23 UTC 2022
...
Session id is s0
API> r_object_id       r_creation_date      r_modify_date        r_server_version
----------------  -------------------  -------------------  --------------------------------
3d0f451580000102  7/15/2021 11:38:28   2/7/2022 08:52:34    16.4.0000.0248 Linux64.Oracle
(1 row affected)

API> r_object_id       a_last_completion    a_next_invocation    object_name
----------------  -------------------  -------------------  -------------
080f45158000035b  2/7/2022 08:49:19    2/8/2022 03:00:00    dm_DMClean
080f45158000035c  2/7/2022 08:58:44    2/8/2022 03:15:00    dm_DMFilescan
(2 rows affected)

API> Bye
[dmadmin@cs-0 ~]$
[dmadmin@cs-0 ~]$ grep Report $DOCUMENTUM/dba/log/${DOCBASE_NAME}/sysadmin/DMFilescanDoc.txt
DMFilescan Report For DocBase REPO01 As Of 2/7/2022 08:52:22
Report End  2/7/2022 08:58:43
[dmadmin@cs-0 ~]$
As you can see above, the dm_DMFilescan job did change “dm_server_config.r_server_version” back to P00, and “dm_server_config.r_modify_date” was therefore updated as well. The dm_DMFilescan job report shows that the job took around 6 minutes to complete, and the update of the dm_server_config object happened around 10 seconds after the start of the job.
Therefore, the reason why “dm_server_config.r_server_version” was being changed “randomly” from P26 back to P00 isn’t actually random: it is caused by the execution of the dm_DMFilescan job. In an HA environment, since this job can run on any of the available Content Servers, it gave a sense of randomness, but it isn’t random at all. This information was provided to OpenText and the bug CS-136387 was opened for it.
While doing further checks to try to understand the root cause, I saw the following for the dmbasic scripts of these methods:
[dmadmin@cs-0 ~]$ cd $DM_HOME/bin/
[dmadmin@cs-0 bin]$
[dmadmin@cs-0 bin]$ documentum -v
OpenText Documentum Release Version: 16.4.0260.0296 Linux64.Oracle
[dmadmin@cs-0 bin]$
[dmadmin@cs-0 bin]$ ls -ltr dmfilescan* dmclean*
-rwxr-x--- 1 dmadmin dmadmin 13749675 Nov 11 13:38 dmfilescan
-rwxr-x--- 1 dmadmin dmadmin 13769063 Nov 11 13:38 dmclean.patch.bak
-rwxr-x--- 1 dmadmin dmadmin 13866290 Nov 11 13:39 dmclean
[dmadmin@cs-0 bin]$
[dmadmin@cs-0 bin]$ strings dmfilescan | grep "16.4"
16.4.0000.0248
[dmadmin@cs-0 bin]$
[dmadmin@cs-0 bin]$ strings dmclean.patch.bak | grep "16.4"
16.4.0000.0248
[dmadmin@cs-0 bin]$
[dmadmin@cs-0 bin]$ strings dmclean | grep "16.4"
16.4.0260.0296
[dmadmin@cs-0 bin]$
As shown above, the reason lies in how the Documentum patching handles these binaries:
- dmclean: the dmbasic script is properly updated by the patch (the pre-patch version is kept as a “*.patch.bak” file) and the version number that appears to be hardcoded inside it reflects P26
- dmfilescan: the dmbasic script isn’t updated by the patch (there is no “*.patch.bak” file) and therefore it still contains the hardcoded P00 version (a quick way to scan for other such binaries is sketched right after this list)
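Based on the same idea, a quick loop over the content of $DM_HOME/bin can help spot any other executable still embedding the GA build string; a minimal sketch, reusing the strings/grep approach and the GA build number (16.4.0000.0248) seen in the output above:

cd $DM_HOME/bin/
# Report every regular executable file that still embeds the GA build string
for f in *; do
  if [[ -f "$f" && -x "$f" ]] && strings "$f" 2>/dev/null | grep -q "16.4.0000.0248"; then
    echo "Still referencing the GA version: $f"
  fi
done

In this environment, such a loop would be expected to flag dmfilescan (as well as the dmclean.patch.bak backup), which matches the findings above.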