In relation to my previous blog about custom facets not showing up after a full reindex, a customer had just completed a migration. After the full reindex, there were no facets, for the reason explained in that blog. Since an online rebuild is normally faster than a full reindex, I helped start this operation, but after a little more than a day of processing, it failed on a document. The online rebuild is a really useful operation in xPlore and one that I have always found pretty robust, as it usually completes without issues.

The online rebuild stopped with the following error in the dsearch.log:

2020-01-21 17:53:44,853 WARN [Index-Rebuilder-default-0-Worker-0] c.e.d.c.f.indexserver.core.index.plugin.CPSPlugin - Content Processing Service failed for [090f1234800d647e] with error code [7] and message [Communication error while processing req 090f1234800d647e]
2020-01-21 17:53:45,758 WARN [Index-Rebuilder-default-0] c.e.d.c.f.i.core.collection.FtReindexTask - Reindex for index default.dmftdoc failed
com.emc.documentum.core.fulltext.common.exception.IndexServerException: java.lang.IllegalArgumentException: Document contains at least one immense term in field="<>/dmftcontents<0>/dmftcontent<0>/ tkn" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[109, 97, 115, 116, 101, 114, 102, 105, 108, 101, 32, 112, 115, 117, 114, 32, 99, 97, 115, 101, 32, 114, 101, 118, 105, 101, 119, 32, 32, 32]...', original message: bytes can be at most 32766 in length; got 39938386
	at com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.recreatePathIndexNB(ESSCollection.java:3391)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.reindexNB(ESSCollection.java:1360)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.reindex(ESSCollection.java:1249)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.FtReindexTask.run(FtReindexTask.java:204)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="<>/dmftcontents<0>/dmftcontent<0>/ tkn" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[109, 97, 115, 116, 101, 114, 102, 105, 108, 101, 32, 112, 115, 117, 114, 32, 99, 97, 115, 101, 32, 114, 101, 118, 105, 101, 119, 32, 32, 32]...', original message: bytes can be at most 32766 in length; got 39938386
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1234)
	at com.xhive.xDB_10_7_r4498571.xo.addEntry(xdb:156)
	at com.xhive.xDB_10_7_r4498571.qo.a(xdb:194)
	at com.xhive.xDB_10_7_r4498571.qo.a(xdb:187)
	at com.xhive.core.index.ExternalIndex.add(xdb:368)
	at com.xhive.core.index.XhiveIndex.a(xdb:321)
	at com.xhive.core.index.XhiveIndex.a(xdb:330)
	at com.xhive.xDB_10_7_r4498571.eq$b$1.a(xdb:142)
	at com.xhive.xDB_10_7_r4498571.bo$a.a(xdb:58)
	at com.xhive.xDB_10_7_r4498571.bo$f.a(xdb:86)
	at com.xhive.xDB_10_7_r4498571.eq$b.a(xdb:126)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:335)
	at com.xhive.core.index.PathValueIndexModifier.b(xdb:291)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:279)
	at com.xhive.core.index.PathValueIndexModifier.d(xdb:514)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:456)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:435)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:414)
	at com.xhive.core.index.PathValueIndexModifier.b(xdb:403)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:397)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:666)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:504)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:494)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:362)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:213)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:179)
	at com.xhive.core.index.XhiveIndexInConstruction.indexNext(xdb:199)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.reindexByWorker(ESSCollection.java:3538)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.FtReindexTask$ReindexWorker.run(FtReindexTask.java:91)
	... 1 common frames omitted
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 39938386
	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:663)
	... 36 common frames omitted
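
A side note on that error message: the "prefix of the first immense term" is just a list of UTF-8 byte values, so it can be decoded to see what the offending token starts with. In this case, xPlore/Lucene tried to index a single ~38 MB token (39,938,386 bytes), far above Lucene's hard limit of 32,766 bytes per term. A quick sketch to decode the prefix, using the byte values copied from the error above:

# Byte values copied from the "prefix of the first immense term" in the error above
prefix = [109, 97, 115, 116, 101, 114, 102, 105, 108, 101, 32, 112, 115, 117, 114,
          32, 99, 97, 115, 101, 32, 114, 101, 118, 105, 101, 119, 32, 32, 32]

# Decode the UTF-8 bytes to reveal the beginning of the offending token
print(bytes(prefix).decode("utf-8"))
# Output: masterfile psur case review (plus trailing spaces)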


I don’t remember seeing this error before in relation to Documentum, but I did see something similar on another Lucene-based engine, and as you can see in the exception stack, it is indeed linked to Lucene… Anyway, I tried to start the online rebuild again, but it failed on the exact same document. I wasn’t sure whether this was a document issue or some kind of bug in xPlore, so I opened the SR#4481792 and did some checks in the meantime. On the current index, I could display the dmftxml content of any random document in less than a second, except for this specific document, which just kept loading forever. Since the availability of the facets was rather time sensitive, I removed this specific document from the index using the “deleteDocs.sh” script and started the online rebuild again… However, it failed again on a second document.

The error above was happening for at least two documents, but there might have been many more. Trial and error, deleting the impacted documents and restarting the online rebuild each time, could potentially have taken ages. On the other hand, I was certain that a full reindex of the millions of documents would complete within a couple of days, because it had just done so. Therefore, instead of continuing with the online rebuild, which could have failed dozens of times on problematic documents, I chose another approach:

  • Delete the Data collections containing the indexed documents
    • Navigate to: Home >> Data Management >> <DOMAIN_NAME> (usually Repo name)
    • Delete the collection(s) with Category=dftxml and Usage=Data using the red cross icon on the right side of the table
  • Re-create the needed collections with the same parameters
    • Still under: Home >> Data Management >> <DOMAIN_NAME> (usually Repo name)
    • Click on: New Collection
    • Set the Name to: <COLLECTION_NAME> (e.g.: default or Node1_CPS1 or Node4_CPS2 …)
    • Set the Usage to: Data
    • Set the Document Category to: dftxml
    • Set the Binding Instance to the Dsearch which should be used, probably PrimaryDsearch
    • Select the correct location to use. If you keep “Same location as domain”, the new collection will be placed under your domain data folder as usual. If you want to use another location, select the checkbox and pick the correct one; in that case, the needed storage location must have been created beforehand (“Home >> System Overview >> Global Configuration >> Storage Location“)
  • Perform the online rebuild (as mentioned above) on the now empty collections, which is instantaneous
  • Perform the full reindex

Doing the above removes all indexed documents, meaning that searches will not return anything anymore, which is worse than just not having facets from a user’s perspective. However, it was just before the weekend, so it was acceptable for the end-users in this case, and at least it completely solved the issue: the facets were available on the next Monday morning. With the full reindex logs and some smart processing (I tried to give some examples on this blog), I could find the list of all documents that had the above issue… In the end, it was really a document content issue and nothing related to xPlore. As mentioned in the previous blog, I had some exchanges with OpenText on this topic and they created the KB15765485 based on these exchanges. It’s not exactly the procedure that I applied, since I did it through the Dsearch Admin UI, but the result should be the same to clean up the index. As one would say, all roads lead to Rome… 😉
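
For reference, here is a minimal sketch of the kind of log processing I mean, assuming the failing documents show up with the same “Content Processing Service failed for [...]” warning as in the log above (the script name and log path are hypothetical; adapt the pattern to whatever your dsearch/reindex logs actually contain):

import re
import sys

# Pattern based on the WARN line shown above, e.g.:
#   CPSPlugin - Content Processing Service failed for [090f1234800d647e] with error code [7] ...
# The r_object_id is the 16-character hex string inside the square brackets.
FAILED_DOC = re.compile(r"Content Processing Service failed for \[([0-9a-f]{16})\]")

def extract_failed_ids(log_path):
    """Return the sorted, unique r_object_ids that triggered a CPS failure in the given log."""
    ids = set()
    with open(log_path, "r", errors="replace") as log_file:
        for line in log_file:
            match = FAILED_DOC.search(line)
            if match:
                ids.add(match.group(1))
    return sorted(ids)

if __name__ == "__main__":
    # Example usage (hypothetical path): python failed_docs.py /path/to/dsearch.log
    for object_id in extract_failed_ids(sys.argv[1]):
        print(object_id)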