I recently faced another interesting issue (cf. this one) with Documentum xPlore where the Dsearch would refuse to start because of a stuck online rebuild. It happened on a Kubernetes environment using custom images we built (for security and scalability). The K8s Nodes were split across two DataCenters for DR reasons. However, one of the DataCenters had a major issue which caused all VMs running there to freeze and become unresponsive. Some Documentum pods were “running” on these VMs at that point in time. When these K8s Nodes were taken offline, the pods were properly rescheduled on the other DataCenter. However, after that operation, one specific Dsearch pod couldn’t start anymore. This Documentum environment has two xPlore Federations, each containing one Dsearch (with embedded local CPS), one IndexAgent and two CPS-Only pods (so four xPlore pods per Federation). All pods were up and running properly, except for that particular Dsearch.
The startup logs of the Dsearch were as follows:
[xplore@ds1-0 ~]$ cdlogPrimaryDsearch
[xplore@ds1-0 logs]$ ls -ltr
total 108
-rw-r----- 1 xplore xplore 0 Oct 14 07:20 rest.log
-rw-r----- 1 xplore xplore 139 Oct 14 07:20 dsearchadminweb.log
-rw-r----- 1 xplore xplore 37722 Oct 14 07:21 cps_daemon.log
-rw-r----- 1 xplore xplore 1353 Oct 14 07:21 cps.log
-rw-r----- 1 xplore xplore 3218 Oct 14 07:21 xdb.log
-rw-r----- 1 xplore xplore 7445 Oct 14 07:21 dfc.log
-rw-r----- 1 xplore xplore 39287 Oct 14 07:21 dsearch.log
[xplore@ds1-0 logs]$
[xplore@ds1-0 logs]$ grep ERROR *
dfc.log:2024-10-14 07:21:18,279 ERROR [default task-1] io.undertow.servlet.request - UT015002: Stopping servlet IndexServerServlet due to permanent unavailability
dsearch.log:2024-10-14 07:21:09,858 ERROR [ServerService Thread Pool -- 76] c.e.d.c.f.i.core.collection.ESSCollection - Failed to create index on library REPO_NAME/dsearch/Data/col1.
dsearch.log:2024-10-14 07:21:09,860 ERROR [ServerService Thread Pool -- 76] c.e.d.core.fulltext.webapp.IndexServerServlet - Failed to start
dsearch.log:2024-10-14 07:21:18,278 ERROR [default task-1] c.e.d.core.fulltext.webapp.IndexServerServlet - Failed to start
dsearch.log:2024-10-14 07:21:45,963 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
dsearch.log:2024-10-14 07:21:45,966 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
dsearch.log:2024-10-14 07:21:45,968 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
dsearch.log:2024-10-14 07:21:45,968 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
dsearch.log:2024-10-14 07:21:45,970 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
[xplore@ds1-0 logs]$
[xplore@ds1-0 logs]$ cat dsearch.log
2024-10-14 07:20:59,035 INFO [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.core.ESSNode - Starting xPlore
2024-10-14 07:20:59,044 INFO [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.core.ESSContext - Initializing xPlore instance
2024-10-14 07:20:59,062 INFO [ServerService Thread Pool -- 76] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Initializing xDB federation and database
2024-10-14 07:21:00,265 INFO [ServerService Thread Pool -- 76] c.e.d.c.f.i.engine.xhive.impl.XhiveSchemaHandler - Not loading configuration from file system, revision 1.29 equal to xDB revision of indexserverconfig.xml
2024-10-14 07:21:00,725 INFO [ServerService Thread Pool -- 76] c.e.d.c.f.indexserver.core.ESSActiveNodesRegistry - Register instance PrimaryDsearch
2024-10-14 07:21:00,726 INFO [ServerService Thread Pool -- 76] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Starting xDB for instance primary
2024-10-14 07:21:00,967 INFO [ServerService Thread Pool -- 76] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Initialize xDB driver to local driver with cache pages: 524288 and Socket timeout :1000 ms
2024-10-14 07:21:01,153 INFO [ServerService Thread Pool -- 76] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - XDB is listening at 9330 successfully
2024-10-14 07:21:01,253 INFO [ServerService Thread Pool -- 76] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Initialize xDB driver to remote driver with cache pages: 524288 and Socket timeout :1000 ms
2024-10-14 07:21:01,253 INFO [ServerService Thread Pool -- 76] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - The XML database server is started successfully. {xDB version=xDB 10_7@4558357}
2024-10-14 07:21:01,261 DEBUG [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Begin to connect to CPS at (local) with connection 3
2024-10-14 07:21:08,559 INFO [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Connected to a local Content Processing Service with version [20.2.0000.0015].
2024-10-14 07:21:08,630 DEBUG [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Begin to connect to CPS at (https://cps1-0.cps1.dctm-ns1-name.svc.cluster.local:9302/cps/ContentProcessingService?wsdl) with connection 3
2024-10-14 07:21:09,077 INFO [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Connected to a remote Content Processing Service [https://cps1-0.cps1.dctm-ns1-name.svc.cluster.local:9302/cps/ContentProcessingService?wsdl] with version [20.2.0000.0015].
2024-10-14 07:21:09,244 DEBUG [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Begin to connect to CPS at (https://cps1-1.cps1.dctm-ns1-name.svc.cluster.local:9302/cps/ContentProcessingService?wsdl) with connection 3
2024-10-14 07:21:09,437 INFO [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Connected to a remote Content Processing Service [https://cps1-1.cps1.dctm-ns1-name.svc.cluster.local:9302/cps/ContentProcessingService?wsdl] with version [20.2.0000.0015].
2024-10-14 07:21:09,626 INFO [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.core.ESSContext - Initialize domain SystemData
2024-10-14 07:21:09,706 INFO [ServerService Thread Pool -- 76] c.e.d.core.fulltext.indexserver.core.ESSContext - Initialize domain REPO_NAME
2024-10-14 07:21:09,858 ERROR [ServerService Thread Pool -- 76] c.e.d.c.f.i.core.collection.ESSCollection - Failed to create index on library REPO_NAME/dsearch/Data/col1.
java.lang.NullPointerException: null
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ReindexStatus.reset(ReindexStatus.java:68)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.addLMPIReindexTaskIfNecessary(ESSCollection.java:2766)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.createIndexes(ESSCollection.java:2824)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.initIndex(ESSCollection.java:2665)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.init(ESSCollection.java:2347)
...
2024-10-14 07:21:09,860 ERROR [ServerService Thread Pool -- 76] c.e.d.core.fulltext.webapp.IndexServerServlet - Failed to start
com.emc.documentum.core.fulltext.common.exception.IndexServerRuntimeException: com.emc.documentum.core.fulltext.common.exception.IndexServerException: Failed to create index on library REPO_NAME/dsearch/Data/col1.
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.ESSContext.initialize(ESSContext.java:261)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.ESSNode.startUp(ESSNode.java:67)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.webapp.IndexServerServlet.init(IndexServerServlet.java:48)
at [email protected]//io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:117)
...
Caused by: com.emc.documentum.core.fulltext.common.exception.IndexServerException: Failed to create index on library REPO_NAME/dsearch/Data/col1.
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.createIndexes(ESSCollection.java:2864)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.initIndex(ESSCollection.java:2665)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.init(ESSCollection.java:2347)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.<init>(ESSCollection.java:145)
...
Caused by: java.lang.NullPointerException: null
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ReindexStatus.reset(ReindexStatus.java:68)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.addLMPIReindexTaskIfNecessary(ESSCollection.java:2766)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.createIndexes(ESSCollection.java:2824)
... 34 common frames omitted
2024-10-14 07:21:09,861 INFO [Index-Rebuilder-col1-0] c.e.d.c.f.i.core.collection.ESSCollection - Rebuilding index [dmftdoc] for collection [col1].
2024-10-14 07:21:09,863 INFO [Index-Rebuilder-col1-0] c.e.d.c.f.i.core.collection.ESSCollection - Rebuild index dmftdoc with non-blocking mode.
2024-10-14 07:21:18,256 INFO [default task-1] c.e.d.core.fulltext.indexserver.core.ESSNode - Starting xPlore
2024-10-14 07:21:18,256 INFO [default task-1] c.e.d.core.fulltext.indexserver.core.ESSContext - Initializing xPlore instance
2024-10-14 07:21:18,276 INFO [default task-1] c.e.d.c.f.indexserver.core.ESSActiveNodesRegistry - Register instance PrimaryDsearch
2024-10-14 07:21:18,276 INFO [default task-1] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Starting xDB for instance primary
2024-10-14 07:21:18,278 ERROR [default task-1] c.e.d.core.fulltext.webapp.IndexServerServlet - Failed to start
com.emc.documentum.core.fulltext.common.exception.IndexServerRuntimeException: com.emc.documentum.core.fulltext.common.exception.EngineException: Failed to start xDB socket listener
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.ESSContext.initialize(ESSContext.java:261)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.ESSNode.startUp(ESSNode.java:67)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.webapp.IndexServerServlet.init(IndexServerServlet.java:48)
at [email protected]//io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:117)
...
Caused by: com.emc.documentum.core.fulltext.common.exception.EngineException: Failed to start xDB socket listener
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.engine.xhive.impl.XhiveManager.startDatabase(XhiveManager.java:646)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.ESSContext.startDatabase(ESSContext.java:187)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.ESSContext.initialize(ESSContext.java:237)
... 49 common frames omitted
Caused by: java.net.BindException: Address already in use (Bind failed)
at java.base/java.net.PlainSocketImpl.socketBind(Native Method)
...
2024-10-14 07:21:45,963 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
java.lang.NullPointerException: null
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.ESSIndexHelper.getObjectCategoryConfig(ESSIndexHelper.java:97)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.ESSIndexHelper.shouldCheckObjectVersion(ESSIndexHelper.java:115)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.FtIndexObject.<init>(FtIndexObject.java:71)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.callCPSPlugin(ESSXMLNodeHandler.java:379)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.enter(ESSXMLNodeHandler.java:123)
at deployment.dsearch.war//com.xhive.xDB_10_7_r4558357.ch$b.c(xdb:617)
...
2024-10-14 07:21:45,966 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
java.lang.NullPointerException: null
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.prepareDomDocument(ESSXMLNodeHandler.java:304)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.callCPSPlugin(ESSXMLNodeHandler.java:350)
...
2024-10-14 07:21:45,968 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
java.lang.NullPointerException: null
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.prepareDomDocument(ESSXMLNodeHandler.java:304)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.callCPSPlugin(ESSXMLNodeHandler.java:350)
...
2024-10-14 07:21:45,968 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
java.lang.NullPointerException: null
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.prepareDomDocument(ESSXMLNodeHandler.java:304)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.callCPSPlugin(ESSXMLNodeHandler.java:350)
...
2024-10-14 07:21:45,970 ERROR [Index-Rebuilder-col1-0-Worker-0] c.e.d.c.f.i.core.index.xhive.ESSXMLNodeHandler - Failed to handle 090f123480096bde
java.lang.NullPointerException: null
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.ESSIndexHelper.getObjectCategoryConfig(ESSIndexHelper.java:97)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.ESSIndexHelper.shouldCheckObjectVersion(ESSIndexHelper.java:115)
...
2024-10-14 07:21:46,427 WARN [Index-Rebuilder-col1-0] c.e.d.c.f.i.core.collection.FtReindexTask - Reindex for index col1.dmftdoc failed
com.emc.documentum.core.fulltext.common.exception.IndexServerException: java.lang.NullPointerException
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.recreatePathIndexNB(ESSCollection.java:3391)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.reindexNB(ESSCollection.java:1360)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.reindex(ESSCollection.java:1249)
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.collection.FtReindexTask.run(FtReindexTask.java:204)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NullPointerException: null
at deployment.dsearch.war//com.emc.documentum.core.fulltext.indexserver.core.index.xhive.ESSXMLNodeHandler.getTokens(ESSXMLNodeHandler.java:163)
at deployment.dsearch.war//com.xhive.xDB_10_7_r4558357.ch$b.hasNext(xdb:473)
at deployment.dsearch.war//com.xhive.core.index.PathValueIndexModifier.a(xdb:328)
...
[xplore@ds1-0 logs]$
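With a log this noisy, it can help to see which component produces most of the errors before reading the stack traces. The following is only a minimal sketch and not something from the original investigation: the function name `errors_by_logger` is made up for illustration, and it assumes the line layout seen above (date, time, level, `[thread]`, logger, ` - `, message).

```shell
#!/bin/bash
# Sketch: count ERROR lines per logger in an xPlore log file.
# Assumes the layout seen above: "<date> <time> <LEVEL> [<thread>] <logger> - <message>".
# The function name is hypothetical, it is not part of xPlore.
errors_by_logger() {
  local log_file="$1"
  awk '
    $3 == "ERROR" {
      line = $0
      # Drop everything up to the end of the [thread] part
      line = substr(line, index(line, "] ") + 2)
      # Keep only the logger name, which ends before " - "
      line = substr(line, 1, index(line, " - ") - 1)
      count[line]++
    }
    END { for (logger in count) print count[logger], logger }
  ' "$log_file" | sort -rn
}
```

Called as `errors_by_logger dsearch.log`, it prints one line per logger with the highest count first. For the excerpt above, that would put ESSXMLNodeHandler at the top (five errors, all for the same document 090f123480096bde), which points to the index rebuild rather than to xDB or CPS connectivity.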
In terms of processes, it looked “OK”, as all expected Dsearch processes were present, including the (local) CPS Indexing & Querying ones. However, the K8s liveness probe (a custom one I created long ago) was reporting that “/dsearch” was not responding, and the pod was restarting continuously:
[xplore@ds1-0 logs]$ ps uxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
xplore 3015817 0.0 0.0 13920 3220 pts/1 S 07:20 0:00 /bin/sh ./startPrimaryDsearch.sh
xplore 3015819 0.0 0.0 13940 3276 pts/1 S 07:20 0:00 \_ /bin/sh /app/xPlore/wildfly17.0.1/bin/standalone.sh
xplore 3015920 52.9 1.5 10265428 2030744 pts/1 Sl 07:20 1:14 \_ /app/xPlore/java64/JAVA_LINK/bin/java -D[Standalone] -server -Xms8g -Xmx8g -XX:MaxMetaspaceSize=512m -XX:+UseG1GC -XX:+UseStringDeduplicati
xplore 3016359 0.1 0.0 670524 59008 pts/1 Sl 07:21 0:00 \_ /app/xPlore/dsearch/cps/cps_daemon/bin/CPSDaemon /app/xPlore/dsearch/cps/cps_daemon/PrimaryDsearch_local_configuration.xml Daemon0 9322
xplore 3016412 0.1 0.0 670460 59868 pts/1 Sl 07:21 0:00 \_ /app/xPlore/dsearch/cps/cps_daemon/bin/CPSDaemon /app/xPlore/dsearch/cps/cps_daemon/PrimaryDsearch_local_configuration.xml Daemon1 9323
xplore 3016486 0.1 0.0 670652 61400 pts/1 Sl 07:21 0:00 \_ /app/xPlore/dsearch/cps/cps_daemon/bin/CPSDaemon /app/xPlore/dsearch/cps/cps_daemon/PrimaryDsearch_local_configuration.xml Daemon2 9324
xplore 3016539 0.1 0.0 670396 58836 pts/1 Sl 07:21 0:00 \_ /app/xPlore/dsearch/cps/cps_daemon/bin/CPSDaemon /app/xPlore/dsearch/cps/cps_daemon/PrimaryDsearch_local_configuration.xml Daemon3 9325
xplore 3016660 0.1 0.0 670588 63384 pts/1 Sl 07:21 0:00 \_ /app/xPlore/dsearch/cps/cps_daemon/bin/CPSDaemon /app/xPlore/dsearch/cps/cps_daemon/PrimaryDsearch_local_configuration.xml Daemon4 9326
xplore 3016714 0.1 0.0 670524 61196 pts/1 Sl 07:21 0:00 \_ /app/xPlore/dsearch/cps/cps_daemon/bin/CPSDaemon /app/xPlore/dsearch/cps/cps_daemon/PrimaryDsearch_local_configuration.xml QDaemon0 932
xplore 3012770 0.0 0.0 14064 3668 pts/1 Ss 07:14 0:00 bash -l
xplore 3017548 0.0 0.0 53832 3960 pts/1 R+ 07:22 0:00 \_ ps uxf
xplore 3011206 0.0 0.0 14064 3656 pts/2 Ss+ 07:10 0:00 bash -l
xplore 2988030 0.0 0.0 14064 3720 pts/0 Ss+ 06:21 0:00 bash -l
xplore 2981707 0.0 0.0 13932 3464 ? Ss 06:15 0:00 /bin/bash /scripts/dbi_entrypoint.sh
[xplore@ds1-0 logs]$
[xplore@ds1-0 logs]$ /scripts/dbi_ft_liveness.sh
INFO - FT liveness exit code is '1' -- Unexpected http response from component 'PrimaryDsearch' for url 'https://xxx:9302/dsearch'...
[xplore@ds1-0 logs]$
The same happened when trying to stop xPlore to begin the investigation: the stop command reported a problem connecting to the Dsearch:
[xplore@ds1-0 logs]$ $STARTSTOP stop
**
** The PrimaryDsearch is running with PID: 3015920
**
INFO - Stopping the PrimaryDsearch...
Instance {PrimaryDsearch} is about to shut down, wait for shutdown complete message.
Exception in thread "main" java.lang.IllegalArgumentException: Fail to connect remote server (https://ds1-0.ds1.dctm-ns1-name.svc.cluster.local:9302)
at com.emc.documentum.core.fulltext.client.admin.cli.DSearchAdminScript.getAdminService(DSearchAdminScript.java:86)
at com.emc.documentum.core.fulltext.client.admin.cli.DSearchAdminScript.stopNode(DSearchAdminScript.java:187)
...
Caused by: com.emc.documentum.core.fulltext.common.admin.DSearchAdminException: [ERROR] ServiceException thrown out:
com.emc.documentum.core.fulltext.indexserver.admin.controller.ServiceException: Exception happened while connect to instance: PrimaryDsearch
at com.emc.documentum.core.fulltext.client.admin.api.impl.ESSAdminSOAPClient.<init>(ESSAdminSOAPClient.java:110)
at com.emc.documentum.core.fulltext.client.admin.api.impl.ESSAdminServiceImpl.<init>(ESSAdminServiceImpl.java:62)
...
Caused by: com.emc.documentum.core.fulltext.indexserver.admin.controller.ServiceException: Exception happened while connect to instance: PrimaryDsearch
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
...
{
"outcome" => "success",
"result" => undefined
}
[xplore@ds1-0 logs]$
I had no idea what had been happening in this environment in the few days prior to the DataCenter issue, as the day-to-day support was handled by another team. However, based on the logs, it looked like it could be linked to an online rebuild. Therefore, I checked the recent changes made to the indexserverconfig.xml file, and here is what I found:
[xplore@ds1-0 logs]$ cd $CONFIG_HOME
[xplore@ds1-0 config]$
[xplore@ds1-0 config]$ ll -tr indexserverconfig.xml*
-rw-r----- 1 xplore xplore 33388 Jul 3 2021 indexserverconfig.xml.bakHttp
-rw-r----- 1 xplore xplore 33389 Jul 3 2021 indexserverconfig.xml.patch.bak.custom-v1
-rw-r----- 1 xplore xplore 34628 Oct 8 07:34 indexserverconfig.xml.patch.bak.custom-v2
-rw-r----- 1 xplore xplore 36696 Oct 10 01:11 indexserverconfig.xml
[xplore@ds1-0 config]$
[xplore@ds1-0 config]$ diff indexserverconfig.xml indexserverconfig.xml.patch.bak.custom-v2
2c2
< <index-server-configuration enable-lemmatization="true" config-check-interval="60000" revision="1.29">
---
> <index-server-configuration enable-lemmatization="true" config-check-interval="60000" revision="1.22">
159,163d158
< <sub-path path="dmftmetadata//a_status" type="string" enumerate-repeating-elements="false" full-text-search="true" value-comparison="true" returning-contents="true" include-descendants="false" description="Used by REPO_NAME to compute facets." boost-value="1.0" compress="true" leading-wildcard="false" sortable="false" include-start-end-token-flags="true"/>
< <sub-path path="dmftmetadata//doc_business_unit" type="string" enumerate-repeating-elements="false" full-text-search="true" value-comparison="true" returning-contents="true" include-descendants="false" description="Used by REPO_NAME to compute facets." boost-value="1.0" compress="true" leading-wildcard="false" sortable="false" include-start-end-token-flags="true"/>
< <sub-path path="dmftmetadata//site_unit" type="string" enumerate-repeating-elements="false" full-text-search="true" value-comparison="true" returning-contents="true" include-descendants="false" description="Used by REPO_NAME to compute facets." boost-value="1.0" compress="true" leading-wildcard="false" sortable="false" include-start-end-token-flags="true"/>
< <sub-path path="dmftmetadata//doc_responsible_author" type="string" enumerate-repeating-elements="false" full-text-search="true" value-comparison="true" returning-contents="true" include-descendants="false" description="Used by REPO_NAME to compute facets." boost-value="1.0" compress="true" leading-wildcard="false" sortable="false" include-start-end-token-flags="true"/>
< <sub-path path="dmftmetadata//material_number" type="string" enumerate-repeating-elements="false" full-text-search="true" value-comparison="true" returning-contents="true" include-descendants="false" description="Used by REPO_NAME to compute facets." boost-value="1.0" compress="true" leading-wildcard="false" sortable="false" include-start-end-token-flags="true"/>
373,377c368
< <collection usage="Data" document-category="dftxml" name="col1">
< <properties>
< <property value="dmftdoc_9640f" name="Build_dmftdoc"/>
< </properties>
< </collection>
---
> <collection usage="Data" document-category="dftxml" name="col1"/>
[xplore@ds1-0 config]$
[xplore@ds1-0 config]$ date
Mon Oct 14 07:23:34 UTC 2024
[xplore@ds1-0 config]$
Based on the above, it seems clear that someone added five new facets on 8-Oct. For these facets to appear in Documentum Search, an online rebuild is then needed. The last difference is related to that online rebuild: whenever you start one, the collection definition is changed to include a new property named “Build_dmftdoc“, which gets removed once the rebuild completes. However, this is a fairly small index, so an online rebuild should be done in half a day at most. Above, it appears as if the online rebuild started on 10-Oct was still in progress on 14-Oct, which is definitely not normal.
I have no evidence of it, but I can only assume that someone started an online rebuild on 10-Oct for the “col1” collection and, before it could complete, the DataCenter issue started, which left the Dsearch pod (or one of the CPS-Only pods) frozen and in an inconsistent/corrupted state. Because of that, the document 090f123480096bde would have a different status in the index / xDB, and that would prevent the Dsearch from starting properly, as seen in the logs.
To fix this issue, I tried to cancel/remove the online rebuild by modifying the indexserverconfig.xml file manually. For that purpose, I simply incremented the revision number and restored the definition of the collection “col1” to what it should be without a running online rebuild (i.e. removing the three properties lines):
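Since the “Build_dmftdoc” property is the visible trace of a rebuild in progress, a quick check for leftover markers can be scripted. This is a minimal sketch under assumptions: the function name `pending_rebuilds` is hypothetical, and it relies on the file layout shown in the diff above (one tag per line, with the property nested under its collection).

```shell
#!/bin/bash
# Sketch: list collections that still carry a "Build_<index>" property in
# indexserverconfig.xml, i.e. a rebuild that was started but never finalized.
# Assumes one tag per line, as in the diff above. Hypothetical helper name.
pending_rebuilds() {
  local config_file="$1"
  awk '
    /<collection / {
      # Remember the name of the collection currently being read
      name = $0
      sub(/.*name="/, "", name)
      sub(/".*/, "", name)
    }
    /<property / && /name="Build_/ {
      # Extract the temporary index name stored in the value attribute
      value = $0
      sub(/.*value="/, "", value)
      sub(/".*/, "", value)
      print name ": " value
    }
  ' "$config_file"
}
```

Running `pending_rebuilds "$CONFIG_HOME/indexserverconfig.xml"` against the file shown above would print `col1: dmftdoc_9640f`. Empty output means no collection carries such a marker.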
[xplore@ds1-0 config]$ cp -p indexserverconfig.xml indexserverconfig.xml.with_issue
[xplore@ds1-0 config]$
[xplore@ds1-0 config]$ vi indexserverconfig.xml ### Corrected the file here
[xplore@ds1-0 config]$
[xplore@ds1-0 config]$ cp -p indexserverconfig.xml indexserverconfig.xml.with_modification
[xplore@ds1-0 config]$
[xplore@ds1-0 config]$ diff indexserverconfig.xml.with_issue indexserverconfig.xml.with_modification
2c2
< <index-server-configuration enable-lemmatization="true" config-check-interval="60000" revision="1.29">
---
> <index-server-configuration enable-lemmatization="true" config-check-interval="60000" revision="1.30">
373,377c373
< <collection usage="Data" document-category="dftxml" name="col1">
< <properties>
< <property value="dmftdoc_9640f" name="Build_dmftdoc"/>
< </properties>
< </collection>
---
> <collection usage="Data" document-category="dftxml" name="col1"/>
[xplore@ds1-0 config]$
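The change above was made by hand in vi. If the same correction had to be repeated on several environments, it could be scripted along the following lines. This is only a sketch under stated assumptions, not a supported procedure: the function name `cancel_rebuild` is hypothetical, the revision is assumed to be in “major.minor” form, and the file is assumed to have one tag per line with nothing but properties inside the collection. It writes to standard output, so the original file is never modified in place.

```shell
#!/bin/bash
# Sketch only: print a copy of indexserverconfig.xml in which the revision is
# bumped and the "Build_*" rebuild marker of one collection is removed.
# Assumptions: one tag per line (as in the diff above), a "major.minor"
# revision, and a collection body holding nothing but <properties>.
cancel_rebuild() {
  local collection="$1" config_file="$2"
  awk -v col="$collection" '
    # Bump the minor part of the revision attribute (e.g. 1.29 -> 1.30)
    /<index-server-configuration / && match($0, /revision="[0-9]+\.[0-9]+"/) {
      rev = substr($0, RSTART + 10, RLENGTH - 11)
      split(rev, parts, ".")
      $0 = substr($0, 1, RSTART - 1) "revision=\"" parts[1] "." (parts[2] + 1) "\"" substr($0, RSTART + RLENGTH)
    }
    # Start buffering when the target collection opens with a body
    !buffering && /<collection / && index($0, "name=\"" col "\"") && !/\/>[ \t\r]*$/ {
      buffering = 1; n = 0; others = 0
    }
    buffering {
      # Keep every line except the rebuild marker itself
      if (!($0 ~ /<property / && $0 ~ /name="Build_/)) {
        buf[++n] = $0
        if ($0 ~ /<property /) others++
      }
      if ($0 ~ /<\/collection>/) {
        if (others == 0) {
          # No other property left: collapse to the self-closing form
          line = buf[1]
          sub(/>[ \t\r]*$/, "/>", line)
          print line
        } else {
          for (i = 1; i <= n; i++) print buf[i]
        }
        buffering = 0
      }
      next
    }
    { print }
  ' "$config_file"
}
```

A cautious way to use it, with the Dsearch stopped, would be `cancel_rebuild col1 "$CONFIG_HOME/indexserverconfig.xml" > /tmp/indexserverconfig.xml.new`, followed by a `diff` against the original to confirm that only the revision and the “col1” definition changed, before copying the result in place.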
With this new revision “1.30“, I started the Dsearch again and this time it was able to start successfully:
[xplore@ds1-0 config]$
[xplore@ds1-0 config]$ $STARTSTOP start
**
** The PrimaryDsearch is shutdown
**
INFO - Starting the PrimaryDsearch...
**
** The PrimaryDsearch is running with PID: 53180
**
[xplore@ds1-0 config]$
[xplore@ds1-0 config]$ cdlogPrimaryDsearch
[xplore@ds1-0 logs]$ cat dsearch.log
2024-10-14 07:30:24,118 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSNode - Starting xPlore
2024-10-14 07:30:24,126 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSContext - Initializing xPlore instance
2024-10-14 07:30:24,143 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Initializing xDB federation and database
2024-10-14 07:30:25,376 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.engine.xhive.impl.XhiveSchemaHandler - File system configuration (revision 1.30) is higher than xDB revision (1.29). Updating indexserverconfig.xml in xDB.
2024-10-14 07:30:25,931 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.indexserver.core.ESSActiveNodesRegistry - Register instance PrimaryDsearch
2024-10-14 07:30:25,932 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Starting xDB for instance primary
2024-10-14 07:30:26,123 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Initialize xDB driver to local driver with cache pages: 524288 and Socket timeout :1000 ms
2024-10-14 07:30:26,395 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - XDB is listening at 9330 successfully
2024-10-14 07:30:26,504 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - Initialize xDB driver to remote driver with cache pages: 524288 and Socket timeout :1000 ms
2024-10-14 07:30:26,504 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.engine.xhive.impl.XhiveManager - The XML database server is started successfully. {xDB version=xDB 10_7@4558357}
2024-10-14 07:30:26,514 DEBUG [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Begin to connect to CPS at (local) with connection 3
2024-10-14 07:30:33,925 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Connected to a local Content Processing Service with version [20.2.0000.0015].
2024-10-14 07:30:33,986 DEBUG [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Begin to connect to CPS at (https://cps1-0.cps1.dctm-ns1-name.svc.cluster.local:9302/cps/ContentProcessingService?wsdl) with connection 3
2024-10-14 07:30:34,447 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Connected to a remote Content Processing Service [https://cps1-0.cps1.dctm-ns1-name.svc.cluster.local:9302/cps/ContentProcessingService?wsdl] with version [20.2.0000.0015].
2024-10-14 07:30:34,636 DEBUG [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Begin to connect to CPS at (https://cps1-1.cps1.dctm-ns1-name.svc.cluster.local:9302/cps/ContentProcessingService?wsdl) with connection 3
2024-10-14 07:30:34,838 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.cps.CPSSubmitter - Connected to a remote Content Processing Service [https://cps1-1.cps1.dctm-ns1-name.svc.cluster.local:9302/cps/ContentProcessingService?wsdl] with version [20.2.0000.0015].
2024-10-14 07:30:35,030 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSContext - Initialize domain SystemData
2024-10-14 07:30:35,128 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSContext - Initialize domain REPO_NAME
2024-10-14 07:30:35,387 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSContext - Starting Audit Service
2024-10-14 07:30:35,391 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.services.audit.impl.FtAuditService - Audit records purge task will launch at: 2024-10-15T00:00:00+00:00
2024-10-14 07:30:35,392 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.indexserver.services.FtBaseService - Auditing service started
2024-10-14 07:30:35,392 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSContext - Starting System Metrics Service
2024-10-14 07:30:35,396 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.indexserver.services.FtBaseService - SystemMetrics service started
2024-10-14 07:30:35,396 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSContext - Starting Group Cache Service
2024-10-14 07:30:35,397 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.s.groupcache.impl.FtGroupCacheService - Enable incremental group cache update
2024-10-14 07:30:35,397 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.indexserver.services.FtBaseService - GroupCache service started
2024-10-14 07:30:35,399 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.services.security.GlobalACLCache - Use global ACL cache, user count=10, clear delay time=0
2024-10-14 07:30:35,400 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.services.security.GlobalACECache - Use global ACE cache, ACE cache size=1000000
2024-10-14 07:30:35,400 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSContext - Starting Auto Warmup Service
2024-10-14 07:30:35,410 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.indexserver.services.FtBaseService - Warmup service started
2024-10-14 07:30:35,417 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSContext - The xPlore instance PrimaryDsearch initialized
2024-10-14 07:30:35,425 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.core.index.thread.WorkerThreadPool - Spawn a new thread CPSWorkerThread-1
2024-10-14 07:30:35,426 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.core.index.thread.WorkerThreadPool - Spawn a new thread CPSWorkerThread-2
2024-10-14 07:30:35,427 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.core.index.thread.WorkerThreadPool - Spawn a new thread IndexWorkerThread-1
2024-10-14 07:30:35,427 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.core.index.thread.WorkerThreadPool - Spawn a new thread IndexWorkerThread-2
2024-10-14 07:30:35,438 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSNode - The xPlore instance PrimaryDsearch started the indexing threads.
2024-10-14 07:30:35,451 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.search.threads.SearchThreadPool - Spawn 4 search threads
2024-10-14 07:30:36,445 INFO [ServerService Thread Pool -- 80] c.e.d.c.f.i.admin.controller.jmx.ESSAdminJMXPlugin - Admin agent started.
2024-10-14 07:30:36,445 INFO [ServerService Thread Pool -- 80] c.e.d.core.fulltext.indexserver.core.ESSNode - The xPlore instance PrimaryDsearch started. {version=20.2.0000.0015}
[xplore@ds1-0 logs]$
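Rather than reading the whole log after each restart, the outcome can be reduced to one word. The sketch below is an illustration only: the function name `check_startup` is hypothetical, and it simply looks for the two patterns visible in the logs of this post (an ERROR level line, or the final “started. {version=” message).

```shell
#!/bin/bash
# Sketch: summarize a Dsearch start from its dsearch.log.
# Prints "failed" if any ERROR line is present, "started" if the final
# start-up message seen above is found, and "pending" otherwise.
# Hypothetical helper, based only on the messages shown in this post.
check_startup() {
  local log_file="$1"
  if grep -q ' ERROR ' "$log_file"; then
    echo "failed"
    return 1
  fi
  if grep -q 'The xPlore instance .* started\. {version=' "$log_file"; then
    echo "started"
    return 0
  fi
  echo "pending"
  return 2
}
```

For example, `check_startup dsearch.log` would print `started` for the log above and `failed` for the earlier one. Note that it scans the whole file, so it is only meaningful when the log contains a single start attempt, as is the case here.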
To close this topic, a new online rebuild was started to make sure the facets show up properly, since I wasn’t sure about the exact status of the other collections. Once it was done, the Dsearch pod could restart without problems, so the issue was fully fixed.