Everybody familiar with Documentum knows that just starting the JBoss/WildFly hosting an IndexAgent isn’t really enough to have the indexing working: the IndexAgent must be started from the UI (or via DA or via the job or via iapi or automatically via the Repository startup or …). Starting the IA in “normal mode” is usually something that takes a few seconds. I faced a few times an IA that apparently didn’t want to start: whenever the request was submitted, it would just try but never succeed. In this blog, I will try to explain why it happens and what can be done to restore it.
When an IndexAgent start, it will do a few things like setup the filters/exclusions, it will check all the parameters configured and finally it will communicate with the Repository to do cleanup. The step that is most probably causing this “issue” is the last one. What happen is that whenever the IndexAgent is running, it will consumes documents for indexing. During this process, it will mark some of the items in the dmi_queue_item table as taken into account. However, if the xPlore Server is stopped during the processing of these items, it might not be fully completed and therefore, there are still tasks in progress that were cancelled. To avoid non-indexed documents, the very first task of the IndexAgent, even before it is marked as started in normal mode, is therefore to reinitialize the status of these items by putting them back into the queue to process. The IndexAgent will never be marked as running if this doesn’t complete and this is what happen whenever you are facing this issue about an IndexAgent just stuck in the start process.
To see the details of the start process of an IndexAgent, you can just look into its log file whenever you submit the request. This is an example of a “working” startup:
2020-11-13 14:29:29,765 INFO FtIndexAgent [http--0.0.0.0-9202-3]DM_INDEX_AGENT_START 2020-11-13 14:29:29,808 INFO Context [http--0.0.0.0-9202-3]Filter cabinets_to_exclude value: Temp, System, Resources, 2020-11-13 14:29:29,808 INFO Context [http--0.0.0.0-9202-3]Filter types_to_exclude value: dmi_expr_code, dmc_jar, dm_method, dm_activity, dmc_module, dmc_aspect_type, dm_registered, dm_validation_descriptor, dm_location, dmc_java_library, dm_public_key_certificate, dm_client_registration, dm_procedure, dmc_dar, dm_process, dmc_tcf_activity_template, dm_ftwatermark, dmc_wfsd_type_info, dm_menu_system, dm_plugin, dm_script, dmc_preset_package, dm_acs_config, dm_business_pro, dm_client_rights, dm_cont_transfer_config, dm_cryptographic_key, dm_docbase_config, dm_esign_template, dm_format_preferences, dm_ftengine_config, dm_ftfilter_config, dm_ftindex_agent_config, dm_jms_config, dm_job, dm_mount_point, dm_outputdevice, dm_server_config, dm_xml_application, dm_xml_config, dm_ftquery_subscription, dm_smart_list, 2020-11-13 14:29:29,808 INFO Context [http--0.0.0.0-9202-3]Filter folders_to_exclude value: /Temp/Jobs, /System/Sysadmin/Reports, /System/Sysadmin/Jobs, 2020-11-13 14:29:29,811 INFO AgentInfo [http--0.0.0.0-9202-3]Start Documentum Index Agent 1.5.0170.0173 Java Version 1.7.0_72 DFC Version 7.2.0170.0165 DMCL Version 7.2.0170.0165 Docbase (Repo01) 7.2.0160.0297 Linux64.Oracle Start Configuration Information Instance indexagent_instance_name(AgentInstanceName)=xplore_server01_9200_IndexAgent docbase_name(DocbaseName)=Repo01 docbase_user(DocbaseUser)= docbase_domain(DocbaseDomain)= runaway_item_timeout(RunawayItemTimeout)=600000 runaway_thread_timeout(RunawayThreadTimeout)=600000 parameter_list(InstanceOptionalParams) Status frequency(StatusFrequency)=5000 history_size(StatusHistorySize)=20 Connectors class_name(ClassName)=com.documentum.server.impl.fulltext.indexagent.connector.DocbaseNormalModeConnector parameter_list(Options) parameter=save_queue_items, value=false parameter=queue_user, value=dm_fulltext_index_user parameter=wait_time, value=60000 parameter=batch_size, value=1000 class_name(ClassName)=com.documentum.server.impl.fulltext.indexagent.connector.FileConnector parameter_list(Options) parameter=wait_time, value=2000 parameter=batch_size, value=100 parameter=file_name, value=ids.txt Exporter queue_size(PrepQSize)=250 queue_low_percent(PrepQLowPercentage)=90 wait_time(PrepWaitTime)=100 thread_count(PrepWorkers)=2 shutdown_timeout(PrepShutdownTimeout)=60000 runaway_timeout(RunawayItemTimeout)=600000 all_filestores_local(areAll_filestores_local)=false local_content_area(LocalContentArea)=/data/primary/Indexagent_Repo01/export local_filestore_map(LocalFileStoreMap) local_content_remote_mount(LocalContentRemoteMount)=null content_clean_interval(ContentCleanInterval)=2000000 keep_dftxml(KeepDftxml)=false parameter_list(PrepOptionalParameters)= parameter=contentSizeLimit, value=367001600 Indexer queue_size(IndexQSize)=500 queue_low_percent(IndexQLowPercentage)=90 queue_size(CallbackQSize)=200 queue_low_percent(CallbackQLowPercentage)=90 wait_time(IndexWaitTime)=100 thread_count(IndexWorkers)=1 shutdown_timeout(IndexShutdownTimeout)=60000 runaway_timeout(IndexRunawayTimeout)60000 partition_config default_partition collection_name(DefaultCollection)=null partitions(PartitionMap) Indexer Plugin Config class_name(IndexerClassName)=com.documentum.server.impl.fulltext.indexagent.plugins.enterprisesearch.DSearchFTPlugin parameter_list(IndexerParams) parameter=dsearch_qrserver_host, value=lb_xplore_server.domain.com parameter=query_plugin_mapping_file, value=/app/dctm/server/fulltext/dsearch/dm_AttributeMapping.xml parameter=max_tries, value=2 parameter=max_pending_requests, value=10000 parameter=load_balancer_enabled, value=true parameter=dsearch_qrserver_protocol, value=HTTPS parameter=dsearch_qrygen_mode, value=both parameter=security_mode, value=BROWSE parameter=max_requests_in_batch, value=10 parameter=dsearch_qrserver_port, value=9302 parameter=dsearch_config_port, value=9302 parameter=dsearch_config_host, value=xplore_server01.domain.com parameter=max_batch_wait_msec, value=1000 parameter=dsearch_qrserver_target, value=/dsearch/IndexServerServlet parameter=dsearch_domain, value=Repo01 parameter=group_attributes_exclude_list, value=i_all_users_names End Configuration Information 2020-11-13 14:29:29,828 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] running DQL query: select primary_class from dmc_module where any a_interfaces = 'com.documentum.fc.indexagent.IDfCustomIndexFilter' 2020-11-13 14:29:29,833 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.services.message.impl.type.MailMessageChildFilter 2020-11-13 14:29:29,834 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.services.message.impl.type.MailMessageChildFilter 2020-11-13 14:29:29,834 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.server.impl.fulltext.indexagent.filter.defaultCabinetFilterAction 2020-11-13 14:29:29,834 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.server.impl.fulltext.indexagent.filter.defaultFolderFilterAction 2020-11-13 14:29:29,834 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.server.impl.fulltext.indexagent.filter.defaultTypeFilterAction 2020-11-13 14:29:29,869 INFO defaultFilters [http--0.0.0.0-9202-3]Populated cabinet cache for filter CabinetsToExclude with count 3 2020-11-13 14:29:30,462 INFO defaultFilters [http--0.0.0.0-9202-3]Populated folder id cache for filter FoldersToExclude with count 140 2020-11-13 14:29:30,488 INFO DocbaseNormalModeConnector [http--0.0.0.0-9202-3][DM_INDEX_AGENT_QUERY_BEGIN] update dmi_queue_item objects set task_state = ' ', set sign_off_user = ' ', set dequeued_by = ' ', set message = ' ' where name = 'dm_fulltext_index_user' and task_state = 'acquired' and sign_off_user = 'xplore_server01_9200_IndexAgent' 2020-11-13 14:29:30,488 INFO DocbaseNormalModeConnector [http--0.0.0.0-9202-3][DM_INDEX_AGENT_QUERY_UPDATE_COUNT] 0 2020-11-13 14:29:30,489 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] DSS Server host: xplore_server01.domain.com 2020-11-13 14:29:30,489 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] DSS Server protocol: HTTPS 2020-11-13 14:29:30,489 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] DSS Server port: 9302 2020-11-13 14:29:30,489 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] DSS Server domain: Repo01 2020-11-13 14:29:30,502 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] Index Server Status: normal
When this issue occurs, the lines 92 and above will not appear. As you can see, the DQL query executed is actually recorded in the log as well as the number of items updated. The “issue” is that if there are too many items that would match the WHERE clause (acquired items), this query could take hours to complete (if at all) and therefore, it would appear as if the start isn’t working. Because of how DQL works, this kind of query on thousands of objects or more will be very DB intensive and that would introduce a big performance hit.
How is it possible to end-up with hundreds of thousand or even millions of acquired items you may think? Well each time it happened to me, it was in relation to some huge batches or jobs running that would update millions of items or during big migrations/imports of objects. As you know, the events that have been registered in the dmi_registry table will trigger the creation of a new entry in the dmi_queue_item table. Therefore, whenever you are importing a lot of documents for example, it is highly recommended to carefully manage the index table because it can cause huge performance issues since it is used a lot inside Documentum for various purposes. This is especially true whenever Lifecycles are in the picture because then processes (like ApplyD2Config) will generate a lot of dm_save events per documents and therefore duplicates in the table. I won’t go into these details in this blog but in short, you can chose to remove the events from the dmi_registry during the import and put them back afterwards, manually indexing the imported documents at the end or do manual cleanups of the dmi_queue_item table during the process. Unfortunately, if you aren’t aware that a huge migration takes places for example, then the situation can quickly become complicated with millions and millions of items. Last time I saw something similar happening, it was an import started “in secret” before the weekend and filling the dmi_queue_item table. The IndexAgent was initially started and therefore it processed them but it wasn’t fast enough. On the Monday morning, we had the pleasant surprise to see around 6 million of acquired items and 9 more million of awaiting….
I think (to be confirmed) the behavior changed in more recent versions but this environment was using xPlore 1.5 and here, the IndexAgent might pull batches of documents for processing, even if there are still already a lot in process. The xPlore Servers (a Federation) weren’t sleeping at all since they actually processed millions of items already but there were just too many to handle and unfortunately, the IA kind of entered a dead end where updating the dmi_queue_item table would just be too long for the processing to be effective again. I didn’t try to restart the IndexAgent because I knew it would never complete but I thought this might make an interesting blog post. There is probably a KB on the OpenText site describing that since it is rather well known.
As you might expect, triggering a DQL query supposed to update 6 million rows on a table that contains at the very least three times that isn’t gonna happen. So what can be done then to restore the system performance and to allow the IndexAgent to restart properly? DQL isn’t very good for processing of huge batches and therefore, your best bet would be to go to the Database directly to avoid the Documentum overhead. Instead of executing one single SQL command to update the 6 million of items, you should also split it in smaller batches by adding a WHERE clause on the date for example. That would help tremendously and that’s not something that the IndexAgent can do by itself because it has no idea of when things started to go south… So then, which kind of command should be executed? In this case, I wouldn’t recommend to do what the IndexAgent is doing. If you are simply resetting the status from acquired to awaiting, sure the IndexAgent will be able to start but it will still have 6+9 million items awaiting for processing and therefore, you still have bad performance and you have a pretty high probability that the number of acquired will rise again… Therefore, the only reasonable choice is to export all distinct items from the dmi_queue_item table and then clean/remove all FT items. With some luck, you might have 5 or 10 duplicates for each document so instead of indexing 15 million, it would just be 1 or 2 million (distinct).
An example of SQL command to cleanup all the items on a 1 hour timeframe would be for Oracle (I would suggest to make sure the IA isn’t running when messing with the table):
DELETE dmi_queue_item_s WHERE name='dm_fulltext_index_user' AND delete_flag=0 AND date_sent>=to_date('2020-06-28 22:00:00','YYYY-MM-DD HH24:MI:SS') AND date_sent<to_date('2020-06-28 23:00:00','YYYY-MM-DD HH24:MI:SS'); commit;
This cleanup can be done online without issue, just make sure you take an export of all distinct item_id to re-index afterwards, otherwise you will have to execute the FT Integrity utility to find the missing documents in the index. With parallel execution on several DB sessions, the cleanup can actually be done rather quickly and then it’s just background processing for the index via the ids.txt for example.