A little more than two years ago, one of our customers started using Documentum 20.2 (and therefore xPlore 20.2) in custom containers, with all components configured in SSL/Secure mode. At first sight, everything seemed to work. However, whenever the Repository restarted (for any reason), searches stopped working completely. Basically, there was no problem if the Docbroker and the Repository started first and xPlore afterwards, but if it was the other way around, searches didn’t work at all.

Reproducing the issue was rather simple:

[[email protected] ~]$ ### At the moment, xPlore started after the Repository
[[email protected] ~]$ alias | grep -E "start|stop|ts"
alias start='$DOCUMENTUM/dba/dm_start_REPO1'
alias stop='$DOCUMENTUM/dba/dm_shutdown_REPO1'
alias ts='iapi REPO1 -Udmadmin -Pxxx -r/tmp/test_search.api'
[[email protected] ~]$
[[email protected] ~]$ cat /tmp/test_search.api
?,c,select count(*) from dm_document search document contains 'test'
exit
[[email protected] ~]$
[[email protected] ~]$ curl -k HTTPS://ft-service.dctm-ns1.svc.cluster.local:9302/dsearch/; echo
The xPlore instance PrimaryDsearch [version=20.2.0000.0015] normal
[[email protected] ~]$
[[email protected] ~]$ ts

OpenText Documentum iapi - Interactive API interface
Copyright (c) 2020. OpenText Corporation
All rights reserved.
Client Library Release 20.2.0000.0082

Connecting to Server using docbase REPO1
[DM_SESSION_I_SESSION_START]info: "Session 010f457a80050108 started for user dmadmin."

Connected to OpenText Documentum Server running Release 20.2.0000.0110 Linux64.Oracle
Session id is s0
API> count(*)
----------------------
                   270
(1 row affected)

API> Bye
[[email protected] ~]$
[[email protected] ~]$ ### Search is working, restarting the Repository
[[email protected] ~]$
[[email protected] ~]$ stop
Stopping Documentum server for repository: [REPO1]

OpenText Documentum iapi - Interactive API interface
Copyright (c) 2020. OpenText Corporation
All rights reserved.
Client Library Release 20.2.0000.0082

Connecting to Server using docbase REPO1.REPO1
[DM_SESSION_I_SESSION_START]info: "Session 010f457a8005010a started for user dmadmin."

Connected to OpenText Documentum Server running Release 20.2.0000.0110 Linux64.Oracle
Session id is s0
API> shutdown,c,T,T
...
OK
API> exit
Bye
Waiting for 90 seconds for server pid, 2152, to disappear.

Mon Aug 3 17:01:49 UTC 2020: Waiting for shutdown of repository: [REPO1]
Mon Aug 3 17:01:49 UTC 2020: checking for pid: 2152

repository: [REPO1] has been shutdown
checking that all children (2155 2205 2206 2218 2219 2221 2247 2264 2642) have shutdown
[[email protected] ~]$
[[email protected] ~]$ start
starting Documentum server for repository: [REPO1]
with server log: [$DOCUMENTUM/dba/log/REPO1.log]
server pid: 2854
[[email protected] ~]$
[[email protected] ~]$ ts
...
Connected to OpenText Documentum Server running Release 20.2.0000.0110 Linux64.Oracle
Session id is s0
API> [DM_FULLTEXT_E_SEARCH_NEW_FAIL]error: "dmFTSearchNew failed with error: ESS_DMSearch::ExecuteSearch: Communication Exception Failed to perform URL: HTTPS://ft-service.dctm-ns1.svc.cluster.local:9302/dsearch/IndexServerServlet?ftRequestHandler=GET-CONFIG, errorcode: 35: SSL connect error,General Exception, error code: -2"

API> Bye
[[email protected] ~]$
[[email protected] ~]$ curl -k HTTPS://ft-service.dctm-ns1.svc.cluster.local:9302/dsearch/; echo
The xPlore instance PrimaryDsearch [version=20.2.0000.0015] normal
[[email protected] ~]$

As you can see above, while the xPlore processes stayed up and running, the issue could be reproduced simply by restarting the Repository. This environment was using the supported Oracle OpenJDK 11 with the default “security.providers” on both the CS (Content Server = Documentum Server) and the FT (Full Text = Documentum xPlore). When the Repository started, it would by default try to contact the IndexAgent (this can be disabled with “start_index_agents=F” in server.ini), and this seemed to be what broke the SSL communication between the CS and the FT.
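To repeat this check after every restart without reading the iapi output by eye, the result of the `ts` alias can be classified programmatically. This is a minimal sketch, assuming Python 3.7+ and iapi output formatted like the sessions above. The helper names `classify_search_output` and `run_test_search` are hypothetical; only the iapi command line, the API script path and the `DM_FULLTEXT_E_SEARCH_NEW_FAIL` code come from the logs.

```python
import re
import subprocess

# Error code returned by the Repository when the search call to xPlore fails (see the log above).
FT_SEARCH_ERROR = "DM_FULLTEXT_E_SEARCH_NEW_FAIL"


def classify_search_output(output):
    """Return ("ok", count) or ("error", detail) for one run of /tmp/test_search.api."""
    for line in output.splitlines():
        if FT_SEARCH_ERROR in line:
            # Keep the whole error line, it contains the failing URL and the curl error code
            return ("error", line.strip())
    # In a working run, the count sits alone on the line following the dashes
    match = re.search(r"count\(\*\)\s*\n-+\s*\n\s*(\d+)", output)
    if match:
        return ("ok", int(match.group(1)))
    return ("error", "no count found in iapi output")


def run_test_search():
    """Hypothetical equivalent of the 'ts' alias; assumes iapi is in the PATH."""
    result = subprocess.run(
        # Same arguments as the alias, password masked as "xxx" like in the original
        ["iapi", "REPO1", "-Udmadmin", "-Pxxx", "-r/tmp/test_search.api"],
        capture_output=True,
        text=True,
    )
    return classify_search_output(result.stdout + result.stderr)
```

Calling `run_test_search()` right after `start` would return `("ok", 270)` for a working environment like the first session, and an `("error", ...)` tuple carrying the SSL connect error otherwise.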

I worked with OpenText on SR#4750851, for which we generated a lot of traces and debug logs for the SSL layers (Java SSL debug, tcpdump, Documentum RPC/fulltext traces, etc.), but in the end our suspicions shifted to the OS (Operating System) libraries. We tried on both RHEL 7.7 and RHEL 7.5, with OpenSSL 1.1.1d and 1.0.2k-fips respectively, and saw the same issue on both. Here are the libraries involved when a search is run on the CS:

[[email protected] ~]$ ldd $DM_HOME/bin/libDsearchQueryPlugin.so
        linux-vdso.so.1 =>  (0x00007ffd22aff000)
        libcurl.so.4 => /lib64/libcurl.so.4 (0x00007f1ad7b9f000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f1ad7997000)
        libxerces-c.so.28 => $DM_HOME/bin/libxerces-c.so.28 (0x00007f1ad73c1000)
        libssl.so.1.1 => $DM_HOME/bin/libssl.so.1.1 (0x00007f1ad7130000)
        libcrypto.so.1.1 => $DM_HOME/bin/libcrypto.so.1.1 (0x00007f1ad6c6f000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1ad6968000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f1ad6666000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1ad6450000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1ad6082000)
        libidn.so.11 => /lib64/libidn.so.11 (0x00007f1ad5e4f000)
        libssh2.so.1 => /lib64/libssh2.so.1 (0x00007f1ad5c22000)
        libssl3.so => /lib64/libssl3.so (0x00007f1ad59c9000)
        libsmime3.so => /lib64/libsmime3.so (0x00007f1ad57a1000)
        libnss3.so => /lib64/libnss3.so (0x00007f1ad5472000)
        libnssutil3.so => /lib64/libnssutil3.so (0x00007f1ad5242000)
        libplds4.so => /lib64/libplds4.so (0x00007f1ad503e000)
        libplc4.so => /lib64/libplc4.so (0x00007f1ad4e39000)
        libnspr4.so => /lib64/libnspr4.so (0x00007f1ad4bfb000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1ad49df000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1ad47db000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f1ad458e000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f1ad42a5000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f1ad4072000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f1ad3e6e000)
        liblber-2.4.so.2 => $DM_HOME/bin/liblber-2.4.so.2 (0x00007f1ad3c60000)
        libldap-2.4.so.2 => $DM_HOME/bin/libldap-2.4.so.2 (0x00007f1ad3a16000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f1ad3800000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1ad804d000)
        libssl.so.10 => /lib64/libssl.so.10 (0x00007f1ad358e000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f1ad312b000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f1ad2f1b000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f1ad2d17000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f1ad2afd000)
        libsasl2.so.2 => /lib64/libsasl2.so.2 (0x00007f1ad28e0000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f1ad26b9000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f1ad2482000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f1ad2220000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007f1ad201d000)
[[email protected] ~]$

When we compared the list above with the one from the containers provided by OpenText, which apparently didn’t have this issue, we saw a few differences, but the two most suspicious candidates were:

        libnss3.so => /lib64/libnss3.so (0x00007f1ad5472000)
        libnssutil3.so => /lib64/libnssutil3.so (0x00007f1ad5242000)
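To repeat that comparison between two images, the `ldd` outputs can be diffed programmatically. This is a minimal sketch, assuming Python 3 and the usual `ldd` line format shown above; the helper names and the list of NSS/NSPR library names are my own assumptions, not something provided by OpenText. It also includes a parser for the first line of `curl -V`, which names the TLS library the system curl was built with (as far as I know, NSS on RHEL 7). That binary usually uses the same system libcurl that `libDsearchQueryPlugin.so` links to, but this is not guaranteed on every image.

```python
import re

# Base names of Network Security Services / NSPR libraries (assumed list).
# Note: glibc's libnss_*.so (Name Service Switch) is unrelated and deliberately not matched.
NSS_LIBS = {"libnss3", "libnssutil3", "libsmime3", "libssl3", "libsoftokn3",
            "libfreebl3", "libnspr4", "libplc4", "libplds4"}


def parse_ldd(output):
    """Map each library name from 'ldd' output to its resolved path (None if unresolved)."""
    libs = {}
    for raw in output.splitlines():
        line = raw.strip()
        if "=>" in line:
            name, target = line.split("=>", 1)
            tokens = target.split()
            # linux-vdso has only "(0x...)"; missing libraries show "not found"
            if not tokens or tokens[0].startswith("(") or tokens[0] == "not":
                libs[name.strip()] = None
            else:
                libs[name.strip()] = tokens[0]
        elif line.startswith("/"):
            # Entries without '=>', such as the dynamic loader itself
            path = line.split()[0]
            libs[path.rsplit("/", 1)[-1]] = path
    return libs


def compare_ldd(ours, reference):
    """Return (only in ours, only in reference, NSS/NSPR libraries only in ours)."""
    a, b = parse_ldd(ours), parse_ldd(reference)
    only_ours = sorted(set(a) - set(b))
    only_ref = sorted(set(b) - set(a))
    nss_like = [name for name in only_ours if name.split(".so")[0] in NSS_LIBS]
    return only_ours, only_ref, nss_like


def curl_tls_backend(version_line):
    """Return (library, version) of the TLS backend named in the first line of 'curl -V'."""
    # Illustrative, not exhaustive, list of backends curl reports
    match = re.search(r"\b(NSS|OpenSSL|GnuTLS|LibreSSL)/(\S+)", version_line)
    return (match.group(1), match.group(2)) if match else None
```

Feeding it the `ldd` list above and the one from an OpenText-provided container should surface `libnss3.so` and `libnssutil3.so` in the NSS-only list, in line with the comparison described above.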

NSS stands for Network Security Services; it is a set of libraries designed to support cross-platform development of security-enabled client and server applications. These libraries appeared to come from the base OS image that the customer had to use (a big company with its own enterprise-wide images). NSS has a parameter that can sometimes help avoid such issues; for more details, look at the definition of “NSS_STRICT_NOFORK” in the NSS environment variables documentation. We tried to apply it and got the following behavior:

[[email protected] ~]$ ### Defining the environment variable and checking again
[[email protected] ~]$ echo $NSS_STRICT_NOFORK
[[email protected] ~]$
[[email protected] ~]$ export NSS_STRICT_NOFORK=DISABLED
[[email protected] ~]$ echo $NSS_STRICT_NOFORK
DISABLED
[[email protected] ~]$
[[email protected] ~]$ stop
Stopping Documentum server for repository: [REPO1]
...
repository: [REPO1] has been shutdown
checking that all children (2857 2898 2903 2904 2911 2914 2922 2933 2954) have shutdown
[[email protected] ~]$
[[email protected] ~]$ start
starting Documentum server for repository: [REPO1]
with server log: [$DOCUMENTUM/dba/log/REPO1.log]
server pid: 3049
[[email protected] ~]$
[[email protected] ~]$ ts
...
Connected to OpenText Documentum Server running Release 20.2.0000.0110 Linux64.Oracle
Session id is s0
API> count(*)
----------------------
                   270
(1 row affected)

API> Bye
[[email protected] ~]$

As you can see, with this environment variable defined, the issue no longer reproduces and it is now safe to restart the Repository while xPlore is running.

Going a little deeper, this issue might be a consequence of libraries defaulting to NSS on some operating systems (the ones showing the issue). As an example, Debian’s curl source package seems to list three flavours of libcurl: libcurl3-gnutls (GnuTLS flavour), libcurl3-nss (NSS flavour) and libcurl4 (OpenSSL flavour). In addition, judging by the dependencies of the curl package, it uses the OpenSSL flavour by default. Therefore, depending on the OS (Debian, RHEL, CentOS, …), the issue may or may not show up. In any case, adding this environment variable for Documentum should be a safe workaround if you do not want to change your OS or rebuild the libraries without NSS! If you have more experience with NSS, feel free to share.
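As far as I understand the NSS documentation, NSS_STRICT_NOFORK controls how the NSS softoken reacts when a PKCS#11 module initialized in a parent process is used again after a fork, and “DISABLED” switches that check off. That would fit with the Repository spawning child processes, but it is an interpretation, not something confirmed in the SR. Note also that the `export` in the session above only affects processes started from that shell: in a container, the variable has to be present in whatever environment actually starts the Repository (entrypoint, installation owner’s profile, etc., depending on the image). As a minimal sketch, assuming Python 3 and the `dm_start_REPO1` script used by the `start` alias, a wrapper could start the Repository with the variable set; the helper names are hypothetical.

```python
import os
import subprocess


def repository_env(base_env=None):
    """Return a copy of the environment with NSS's fork check disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["NSS_STRICT_NOFORK"] = "DISABLED"  # same value as in the session above
    return env


def start_repository(docbase="REPO1"):
    """Run $DOCUMENTUM/dba/dm_start_<docbase> with the workaround applied.

    Hypothetical helper: assumes DOCUMENTUM is set and the script is executable.
    """
    env = repository_env()
    script = os.path.join(env["DOCUMENTUM"], "dba", "dm_start_" + docbase)
    return subprocess.run([script], env=env, check=True)
```

Copying the environment instead of modifying `os.environ` keeps the workaround scoped to the Repository process and its children, which is where the search plugin, and therefore libcurl and NSS, run.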