The Documentum job dm_FTIndexAgentBoot no longer starts, and its job status reports: "The job object indicated the job was in progress". In this blog post, I will describe how to analyze and solve this issue.
Analysis
The Index Agent does not start automatically when the repository is started, although start_index_agents = T is set in server.ini.
When the server starts, the dm_FTIndexAgentBoot job normally runs, but not in our case. All other jobs start normally.
The job status indicates:
The job object indicated the job was in progress, but the job was not actually running. It is likely that the dm_agent_exec utility was stopped while the job was in progress
To get more information, we have to run dm_agent_exec with the trace option set to true:
- Use DA to change the Verb parameter of the agent_exec_method method to:
./dm_agent_exec -trace_level 1
- Kill the dm_agent_exec process of the affected repository:
> ps -ef | grep dm_agent | grep "docbase_name test"
dmadmin 16922 16117 0 14:14 ? 00:00:02 ./dm_agent_exec -docbase_name test.test -docbase_owner dmadmin -sleep_duration 0
> kill 16922
- The dm_agent_exec process is restarted automatically with the new trace level:
> ps -ef | grep dm_agent | grep "docbase_name test"
dmadmin 16486 16117 0 14:07 ? 00:00:02 ./dm_agent_exec -trace_level 1 -docbase_name test.test -docbase_owner dmadmin -sleep_duration 0
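Picking the right PID out of the ps listing by eye is error-prone when several repositories run on the same host. The sketch below is one way to script it, assuming the "ps -ef" column layout shown above (PID in the second column) and that the repository appears as the word after -docbase_name; the function name agent_exec_pids is hypothetical.

```shell
# Hypothetical helper: print the PID(s) of the dm_agent_exec process that
# serves the given repository. It reads "ps -ef" style lines on stdin, so it
# can be checked offline; on a live host, pipe "ps -ef" into it.
agent_exec_pids() {
    docbase="$1"
    # Assumed ps -ef layout: UID PID PPID C STIME TTY TIME CMD...
    # Keep only dm_agent_exec lines whose -docbase_name argument matches,
    # then print column 2 (the PID).
    awk -v db="$docbase" '
        /dm_agent_exec/ {
            for (i = 1; i <= NF; i++)
                if ($i == "-docbase_name" && $(i + 1) == db) { print $2; break }
        }'
}
```

With the first listing above, `ps -ef | agent_exec_pids test.test` prints 16922, which can then be passed to `kill`. Matching the exact argument rather than grepping for "docbase_name test" avoids also hitting a repository called, say, test2.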
Once trace_level is set to 1 (true), more information is written to the agent_exec.log file.
By analyzing the log file, we can see that the dm_agent_exec process checks which jobs have to be started by using the following queries:
execquery,s0,F,SELECT ALL r_object_id, a_last_invocation, a_last_completion, a_special_app FROM dm_job WHERE ( ((a_last_invocation IS NOT NULLDATE) AND (a_last_completion IS NULLDATE)) OR ((a_special_app = 'agentexec') AND (r_lock_machine = 'testcs')) ) AND (i_is_reference = 0 OR i_is_reference is NULL) AND (i_is_replica = 0 OR i_is_replica is NULL)
execquery,s0,F,SELECT ALL r_object_id, a_next_invocation FROM dm_job WHERE ( (run_now = 1) OR ((is_inactive = 0) AND ( ( a_next_invocation date > DATE('now')) OR (expiration_date IS NULLDATE)) AND ((max_iterations = 0) OR (a_iterations < max_iterations))) ) AND (i_is_reference = 0 OR i_is_reference is NULL) AND (i_is_replica = 0 OR i_is_replica is NULL) ORDER BY a_next_invocation, r_object_id
As these queries return no result (0 rows), the job should have been started. So what is wrong?
In the log file, the following line is written:
Output File Name: /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec/job_08010b9a800003f5
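When the trace log is long, the output file names can be pulled out in one go. This is a minimal sketch assuming each such entry contains the text "Output File Name:" followed by the path, as in the line quoted above; the function name list_output_files is hypothetical.

```shell
# Hypothetical helper: print every distinct path that follows
# "Output File Name:" in a dm_agent_exec trace log (agent_exec.log).
# Assumes the log line format quoted in this post.
list_output_files() {
    logfile="$1"
    # -n with the p flag prints only lines where the substitution matched,
    # i.e. only the path part of "Output File Name:" lines.
    sed -n 's/.*Output File Name:[[:space:]]*//p' "$logfile" | sort -u
}
```

Each printed path can then be checked with `ls -l` to see whether the file exists and is writable.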
"ls -ltr /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec" does not list any job_08010b9a800003f5 log file,
BUT "ls -l job_08010b9a800003f5" returns:
---------- 1 dmadmin dmadmind 245 Aug 27 14:30 /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec/job_08010b9a800003f5
The file exists, but it has no permissions at all (----------), so the agent_exec process cannot write to it!
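Other job output files may have been left in the same state. A sketch for spotting them, assuming the job_* naming seen above; it tests the owner write bit (octal 200) in the mode rather than effective access, so it gives the same answer whether it is run as dmadmin or as root. The function name list_unwritable_job_logs is hypothetical.

```shell
# Hypothetical helper: list job_* files in an agentexec log directory whose
# owner write bit is NOT set (for example mode ---------- as shown above).
# This inspects permission bits only, not the caller's effective access.
list_unwritable_job_logs() {
    logdir="$1"
    # "-perm -200" matches files that have at least the owner write bit;
    # "!" negates it, leaving files the owner cannot write.
    find "$logdir" -type f -name 'job_*' ! -perm -200
}
```

For this repository the call would be `list_unwritable_job_logs /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec`.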
Solution
The solution is to set the correct write permissions:
chmod 640 /pkgs/dms/opt/documentum/dba/log/00010b9a/agentexec/job_08010b9a800003f5
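The command above fixes the one file found in the trace. If several job output files had lost their permissions, the same chmod 640 could be applied to all of them at once. This is a sketch under the assumption that it is run as the file owner (dmadmin here) and that only job_* files lacking the owner write bit should be touched; the function name fix_job_log_permissions is hypothetical.

```shell
# Hypothetical helper: restore mode 640 (rw-r-----) on every job_* file in
# the given agentexec log directory that is missing the owner write bit.
# Run as the file owner (e.g. dmadmin) or root so chmod is permitted.
fix_job_log_permissions() {
    logdir="$1"
    # "! -perm -200" selects files without the owner write bit;
    # "-exec ... {} +" passes them to chmod in batches.
    find "$logdir" -type f -name 'job_*' ! -perm -200 -exec chmod 640 {} +
}
```

Restricting the change to files that lack the owner write bit leaves correctly configured files untouched.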
Do not forget to disable the trace flag on the agent_exec_method!
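After removing -trace_level 1 from the Verb again (and, presumably, restarting dm_agent_exec the same way as above), it is worth confirming that the running process no longer carries the flag. A minimal sketch, reading "ps -ef" style lines on stdin; the function name agent_exec_traced is hypothetical.

```shell
# Hypothetical helper: exit with status 0 if any dm_agent_exec line on stdin
# still has the -trace_level argument, and status 1 otherwise.
agent_exec_traced() {
    awk '/dm_agent_exec/ {
             for (i = 1; i <= NF; i++)
                 if ($i == "-trace_level") found = 1
         }
         END { exit !found }'
}
```

For example, `ps -ef | agent_exec_traced && echo "trace still enabled"` prints the warning only while a traced dm_agent_exec is running.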
After that, the dm_FTIndexAgentBoot job starts correctly, because the output file can be written.
Of course, this fix applies to this specific issue; the most important part of the procedure is the use of the trace flag to see what happens on the agent_exec side.