Most Application Servers allow for long and/or stuck threads detection, to let you know that something is currently very slow, waiting on something or just simply stuck. On tomcat, this isn’t enabled by default, and you need to manually configure it as needed. But once done, it will start displaying information into the logs that can be used to find some patterns, some issues on your environments/application. In this blog, I will use a Documentum D2 application hosted on Tomcat. As you probably know, Documentum and D2 are quite the complex software, an out-of-the-box WAR file of D2 will be around 1.1Gb once exploded, so you can imagine there quite a bit of things happening inside.

First of all, the configuration of Tomcat. To be able to display information on the long running threads, you will need to add the corresponding Tomcat Valve into the “server.xml” file. As far as I know, the configuration for this didn’t change recently and it’s pretty well documented as everything else with Apache. There are two optional parameters for this valve:

  • threshold: Duration in seconds after which a thread will be considered “stuck” and will be tracked in the logs, default to 600s (10min)
  • interruptThreadThreshold: Duration in seconds after which Tomcat will try to stop the thread, default to -1 (disabled)

Therefore, to enable it, you simply need to add the valve as done below and then restart Tomcat:

[tomcat@d2-0 ~]$ cd $CATALINA_HOME/conf/
[tomcat@d2-0 conf]$
[tomcat@d2-0 conf]$ cat server.xml
<?xml version="1.0" encoding="UTF-8"?>
...
<Server port="8005" shutdown="b018b3d58bdbb490a1b2baed3c2efaee">
  ...
  <Service name="Catalina">
    ...
    <Engine name="Catalina" defaultHost="localhost">
      ...
      <Host name="localhost"  appBase="webapps" unpackWARs="true" autoDeploy="true">
        ...
        <Valve className="org.apache.catalina.valves.StuckThreadDetectionValve"
               threshold="120" />
      </Host>
    </Engine>
  </Service>
</Server>
[tomcat@d2-0 conf]$
[tomcat@d2-0 conf]$ stopTomcat
  **
  **  Apache Tomcat is running with PID: 4210
  **
Using CATALINA_BASE:   $CATALINA_HOME
Using CATALINA_HOME:   $CATALINA_HOME
Using CATALINA_TMPDIR: $DATA_HOME/temp
Using JRE_HOME:        $JAVA_HOME
Using CLASSPATH:       $CATALINA_HOME/bin/bootstrap.jar:$CATALINA_HOME/bin/tomcat-juli.jar
Using CATALINA_OPTS:
Using CATALINA_PID:    $DATA_HOME/tomcat.pid
NOTE: Picked up JDK_JAVA_OPTIONS:  --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
Tomcat stopped.
  **
  **  Apache Tomcat is shutdown
  **
[tomcat@d2-0 conf]$
[tomcat@d2-0 conf]$ startTomcat
  **
  **  Apache Tomcat is shutdown
  **
Cleanup of app_data for "D2"

Using CATALINA_BASE:   $CATALINA_HOME
Using CATALINA_HOME:   $CATALINA_HOME
Using CATALINA_TMPDIR: $DATA_HOME/temp
Using JRE_HOME:        $JAVA_HOME
Using CLASSPATH:       $CATALINA_HOME/bin/bootstrap.jar:$CATALINA_HOME/bin/tomcat-juli.jar
Using CATALINA_OPTS:
Using CATALINA_PID:    $DATA_HOME/tomcat.pid
Tomcat started.
  **
  **  Apache Tomcat is running with PID: 8426
  **
[tomcat@d2-0 conf]$

Now that messages are being printed on the logs, it’s a first step but it might take some time to analyze these. For that purpose, you can write a parser to find all messages and extract the needed information. Of course, such tool would be sensitive to the log format (length of date/timestamp, number of fields you integrated in the logs, etc.). Here is an example of a thread being detected as stuck (“threshold” of 120s) and its successful completion:

[tomcat@d2-0 conf]$ cd $CATALINA_HOME/logs/
[tomcat@d2-0 logs]$
[tomcat@d2-0 logs]$ grep notifyStuckThread catalina.out
...
2023-12-13 10:03:21,079 UTC WARNING [Catalina-utility-2] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [https-jsse-nio-8080-exec-417] (id=[27927]) has been active for [122,196] milliseconds (since [12/13/23 10:01 AM]) to serve the same request for [https://dctm-env1.domain.com/D2/x3_portal/searchManagerService] and may be stuck (configured threshold for this StuckThreadDetectionValve is [120] seconds). There is/are [2] thread(s) in total that are monitored by this Valve and may be stuck.
...
2023-12-13 10:03:31,090 UTC WARNING [Catalina-utility-1] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadCompleted Thread [https-jsse-nio-8080-exec-417] (id=[27927]) was previously reported to be stuck but has completed. It was active for approximately [127,144] milliseconds. There is/are still [1] thread(s) that are monitored by this Valve and may be stuck.
...
[tomcat@d2-0 logs]$

On the above example, the thread started around 10:01:19, was detected as stuck 122s later (10:03:21) and completed after 127s (10:03:26). It was related to a search request in D2 and at the time of the detection, there was already another thread stuck (2 threads stuck in the “notifyStuckThreadDetected“). When it completed, there was still another thread stuck (1 thread stuck in the “notifyStuckThreadCompleted“). That’s the kind of information you can get from the Tomcat logs.

Here is an example of parser that will work with this logging format, it’s using “awk” to read all the lines, find the “notifyStuckThreadDetected” occurrences, find the associated “notifyStuckThreadCompleted” occurrences and then display all the needed information in a summary table. In case the thread is still running (didn’t complete yet), it will also mark it as such:

[tomcat@d2-0 logs]$ cd
[tomcat@d2-0 ~]$
[tomcat@d2-0 ~]$ cat parse_stuck.awk
BEGIN {
  print "============================================";
  print "========== COMPLETED LONG THREADS ==========";
  print "============================================";
  printf("%-25s\t%20s\t%12s\t%10s\t%12s\t%3s\t%-s\n","THREAD","STIME","DTIME","DURATION","CTIME","NUM","REQUEST");
}

/./ {
  sub("UTC","");
}

/org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected/ {
  thread=substr($7,2,length($7)-2);
  dtime[thread]=$2;
  tmp=substr($0,index($0,"since")+6);
  stime[thread]=substr(tmp,1,index(tmp,")")-1);
  tmp=substr($0,index($0,"active for ")+11);
  duration[thread]=substr(tmp,1,index(tmp," ")-1);
  tmp=substr($0,index($0,"request for ")+12);
  request[thread]=substr(tmp,1,index(tmp," ")-1);
  tmp=substr($0,index($0,"is/are ")+7);
  count_stuck[thread]=substr(tmp,1,index(tmp," ")-1);
  ended[thread]=0;
}

/org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadCompleted/ {
  thread=substr($7,2,length($7)-2);
  tmp=substr($0,index($0,"approximately ")+14);
  duration[thread]=substr(tmp,1,index(tmp," ")-1);
  ctime[thread]=$2;
  ended[thread]=1;
  tmp=substr($0,index($0,"is/are still [")+14);
  nb_stuck[thread]=substr(tmp,1,index(tmp,"]")-1);

  printf("%-25s\t%20s\t%s\t%10s\t%s\t%3d\t%-s\n",thread,stime[thread],dtime[thread],duration[thread],ctime[thread],nb_stuck[thread],request[thread]);

  dtime[thread]="";
  ctime[thread]="";
  stime[thread]=="";
  duration[thread]=="";
  request[thread]=="";
  count_stuck[thread]=="";
  nb_stuck[thread]=="";
  ended[thread]=="";
}

END {
  print "";
  print "============================================";
  print "======= CURRENTLY STUCK LONG THREADS =======";
  print "============================================";
  printf("%-25s\t%20s\t%12s\t%10s\t%-s\n","THREAD","STIME","DTIME","DURATION","REQUEST");
  for ( i in ended ) {
    if (ended[i]==0) {
      printf("%-25s\t%20s\t%s\t%10s\t%-s\n",i,stime[i],dtime[i],duration[i],request[i]);
    }
  }
}
[tomcat@d2-0 ~]$

The execution of this small parser is quite simple, here are two examples:

[tomcat@d2-0 ~]$ cat $CATALINA_HOME/logs/catalina.out | awk -f parse_stuck.awk
============================================
========== COMPLETED LONG THREADS ==========
============================================
THREAD                                         STIME           DTIME      DURATION             CTIME    NUM     REQUEST
https-jsse-nio-8080-exec-420     [12/14/23, 4:53 AM]    04:55:44,045     [228,883]      04:57:24,983      0     [https://dctm-env1.domain.com/D2/servlet/ExportContent?uid=...]
https-jsse-nio-8080-exec-463     [12/14/23, 5:03 AM]    05:05:36,427     [240,259]      05:07:31,980      0     [https://dctm-env1.domain.com/D2/x3_portal/searchManagerService]
https-jsse-nio-8080-exec-502     [12/14/23, 5:34 AM]    05:36:34,645     [185,688]      05:37:38,489      0     [https://dctm-env1.domain.com/D2/servlet/ExportContent?uid=...]
https-jsse-nio-8080-exec-355     [12/14/23, 7:41 AM]    07:43:48,877     [165,236]      07:44:38,900      0     [https://dctm-env1.domain.com/D2/x3_portal/doclistService]

============================================
======= CURRENTLY STUCK LONG THREADS =======
============================================
THREAD                                         STIME           DTIME      DURATION      REQUEST
https-jsse-nio-8080-exec-355    [12/14/23, 11:49 AM]    11:51:22,124     [121,961]      [https://dctm-env1.domain.com/D2/x3_portal/doclistService]
[tomcat@d2-0 ~]$
[tomcat@d2-0 ~]$ cat $CATALINA_HOME/logs/catalina.out | awk -f parse_stuck.awk
============================================
========== COMPLETED LONG THREADS ==========
============================================
THREAD                                         STIME           DTIME      DURATION             CTIME    NUM     REQUEST
https-jsse-nio-8080-exec-33       [12/6/23, 2:37 PM]    14:39:47,521     [280,031]      14:42:27,965      0     [https://dctm-env2.domain.com/D2/x3_portal/checkoutDocumentService]
https-jsse-nio-8080-exec-21       [12/6/23, 4:36 PM]    16:38:45,420     [268,730]      16:41:15,626      0     [https://dctm-env2.domain.com/D2/x3_portal/searchManagerService]
https-jsse-nio-8080-exec-29      [12/12/23, 9:26 PM]    21:28:57,339     [154,656]      21:29:27,356      1     [https://dctm-env2.domain.com/D2/x3_portal/propertiesService]
https-jsse-nio-8080-exec-33      [12/12/23, 9:26 PM]    21:28:47,332     [168,044]      21:29:27,357      0     [https://dctm-env2.domain.com/D2/x3_portal/propertiesService]

============================================
======= CURRENTLY STUCK LONG THREADS =======
============================================
THREAD                                         STIME           DTIME      DURATION      REQUEST
[tomcat@d2-0 ~]$

The different columns are the following ones:

  • THREAD: Name of the thread, can be used to find the start/end of the request
  • STIME: Start date/time of the thread
  • DTIME: Detection time of the thread (~120s after STIME)
  • DURATION: Total duration of the thread
  • CTIME: Completion time of the thread (~DURATIONs after STIME)
  • NUM: Number of threads still stuck
  • REQUEST: URL used that triggered the thread

That concludes this blog, hopefully you find this information useful to debug/improve your applications. Don’t hesitate to change the threshold of 120 seconds in the Tomcat configuration to something relevant for your use-case.