During some monitoring work on a WebLogic 12.1.3 environment, I noticed that top shows the process 12013 using 800% CPU (We have a 12 CPU machine) and had to find out what’s happening !!
I used top to check the process ID and then the threads taking the CPU
ps -ef |grep 12013 weblogic 12013 11836 23 Apr03 ? 03:38:31 /app/weblogic/Java/jdk/bin/java -server -Xms2048m -Xmx2048m -XX:MaxMetaspaceSize=512m -Dweblogic.Name=msD2-02 -Djava.security.policy=/app/weblogic/Middleware/wlserver/server/lib/weblogic.policy -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/weblogic/domains/myDomain/servers/msD2-02/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms2048m -Xmx2048m -XX:MaxMetaspaceSize=512m -Dcom.sun.xml.ws.api.streaming.XMLStreamReaderFactory.woodstox=true -Dcom.sun.xml.ws.api.streaming.XMLStreamWriterFactory.woodstox=true -Djava.io.tmpdir=/app/weblogic/tmp/myDomain/msD2-02 -Ddomain.home=/app/weblogic/domains/myDomain -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.protocolVersion=TLS1 -Dweblogic.security.disableNullCipher=true -Djava.security.egd=file:///dev/./urandom -Dweblogic.security.allowCryptoJDefaultJCEVerification=true -Dweblogic.nodemanager.ServiceEnabled=true -Djava.endorsed.dirs=/app/weblogic/Java/jdk1.8.0_45/jre/lib/endorsed:/app/weblogic/Middleware/wlserver/../oracle_common/modules/endorsed -da -Dwls.home=/app/weblogic/Middleware/wlserver/server -Dweblogic.home=/app/weblogic/Middleware/wlserver/server -Dweblogic.management.server=https://10.10.1.1:7001 -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true weblogic.Server
top -H -b -p 12013 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12624 weblogic 20 0 10.5g 3.6g 30m S 83.1 11.5 18053:38 java 2540 weblogic 20 0 10.5g 3.6g 30m S 79.4 11.5 3:11.62 java 2591 weblogic 20 0 10.5g 3.6g 30m S 79.4 11.5 3:08.02 java 12436 weblogic 20 0 10.5g 3.6g 30m S 9.2 11.5 3:14.59 java
Now that I have the Unix threads PID, I will get the thread dump of the WebLogic Server process and try to figure out the culprits. To generate the thread dump I used the jstack java tool.
jstack -l 12013 > $HOME/jstack_100_CPU.txt
The Thread dump provides the threads with the ID in Hexadecimal, thus those provided from the top needs to be converted:
[myDomain]$ printf '%xn' 12624 3150 [myDomain]$ printf '%xn' 2540 9ec [myDomain]$ printf '%xn' 2591 a1f [myDomain]$
"LDAPConnThread-1172 ldaps://ldapserver.dbi-lab.com:3636" #167119 daemon prio=5 os_prio=0 tid=0x00007f5444087800 nid=0xa1f runnable [0x00007f5409e88000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked (a sun.nio.ch.Util$2) - locked (a java.util.Collections$UnmodifiableSet) - locked (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at weblogic.socket.NIOSocketMuxer$NIOInputStream.readInternal(NIOSocketMuxer.java:802) at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:746) at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:729) at weblogic.socket.JSSEFilterImpl.readFromNetwork(JSSEFilterImpl.java:462) at weblogic.socket.JSSEFilterImpl.read(JSSEFilterImpl.java:424) at weblogic.socket.JSSESocket$JSSEInputStream.read(JSSESocket.java:98) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) - locked (a java.io.BufferedInputStream) at netscape.ldap.ber.stream.BERElement.getElement(Unknown Source) at netscape.ldap.LDAPConnThread.run(Unknown Source) at java.lang.Thread.run(Thread.java:745) "[STUCK] ExecuteThread: '302' for queue: 'weblogic.kernel.Default (self-tuning)'" #326 daemon prio=1 os_prio=0 tid=0x00007f5560376000 nid=0x3150 runnable [0x00007f543636e000] java.lang.Thread.State: RUNNABLE at java.io.InputStream.skip(InputStream.java:224) at weblogic.utils.http.HttpChunkInputStream.skip(HttpChunkInputStream.java:287) at weblogic.utils.http.HttpChunkInputStream.skipAllChunk(HttpChunkInputStream.java:497) at weblogic.servlet.internal.ServletInputStreamImpl.ensureChunkedConsumed(ServletInputStreamImpl.java:51) at weblogic.servlet.internal.ServletRequestImpl.skipUnreadBody(ServletRequestImpl.java:217) at weblogic.servlet.internal.ServletRequestImpl.reset(ServletRequestImpl.java:169) at weblogic.servlet.internal.HttpConnectionHandler.prepareRequestForReuse(HttpConnectionHandler.java:258) at weblogic.servlet.internal.HttpConnectionHandler.requeue(HttpConnectionHandler.java:665) at weblogic.servlet.internal.VirtualConnection.requeue(VirtualConnection.java:332) at weblogic.servlet.internal.ServletResponseImpl.send(ServletResponseImpl.java:1657) at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1582) at weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:255) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:311) at weblogic.work.ExecuteThread.run(ExecuteThread.java:263) "LDAPConnThread-1171 ldaps://ldapserver.dbi-lab.com:3636"" #167118 daemon prio=5 os_prio=0 tid=0x00007f5544048000 nid=0x9ec runnable [0x00007f540a28c000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method) at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317) at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:193) - locked (a java.lang.Object) at java.nio.channels.spi.AbstractSelector$1.interrupt(AbstractSelector.java:213) at java.nio.channels.spi.AbstractSelector.begin(AbstractSelector.java:219) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:78) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked (a sun.nio.ch.Util$2) - locked (a java.util.Collections$UnmodifiableSet) - locked (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at weblogic.socket.NIOSocketMuxer$NIOInputStream.readInternal(NIOSocketMuxer.java:802) at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:746) at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:729) at weblogic.socket.JSSEFilterImpl.readFromNetwork(JSSEFilterImpl.java:462) at weblogic.socket.JSSEFilterImpl.read(JSSEFilterImpl.java:424) at weblogic.socket.JSSESocket$JSSEInputStream.read(JSSESocket.java:98) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) - locked (a java.io.BufferedInputStream) at netscape.ldap.ber.stream.BERElement.getElement(Unknown Source) at netscape.ldap.LDAPConnThread.run(Unknown Source) at java.lang.Thread.run(Thread.java:745)
This looks to be an issue with the LDAP server or the network.
The WebLogic java threads loops on the LDAP connection to get data. This loop should fail in a timeout depending on the Authentication provider LDAP connection timeout settings. But it looks there is an issue with the JDK that re-initiates the connection in the sun.nio.ch.EPollArrayWrapper class.
Once the LDAP connection was back the WebLogic Domain could be restarted properly and the service was back.
A possible workaround could be to not use the weblogic.socket.NIOSocketMuxer but the weblogic.socket.PosixSocketMuxer. This workaround is described in Oracle Support note:2128032.1
Or a better solution is to upgrade to the latest WebLogic Software version and apply the latest PSU where this issue is fixed.