Blog - comments

Hi Greg,

Thanks

great job Dave ! thanks

greg

Nice writeup

yup
using System; using System.Diagnostics; namespace ConsoleApplication1 { class Program { ...
praveen rathore
sql server is one of the best for utilize the data recovery. and its very informative for all the us...
Server instalattion
Blog Franck Pachot Oracle EM agent 12c thread leak on RAC

dbi services Blog

Welcome to the dbi services Blog! This IT blog focuses on database, middleware, and OS technologies such as Oracle, Microsoft SQL Server & SharePoint, EMC Documentum, MySQL, PostgreSQL, Sybase, Unix/Linux, etc. The dbi services blog represents the view of our consultants, not necessarily that of dbi services. Feel free to comment on our blog postings.

  • Home
    Home This is where you can find all the blog posts throughout the site.
  • Categories
    Categories Displays a list of categories from this blog.
  • Tags
    Tags Displays a list of tags that have been used in the blog.
  • Bloggers
    Bloggers Search for your favorite blogger from this site.

Oracle EM agent 12c thread leak on RAC

In a previous post about nproc limit, I wrote that I had to investigate the nproc limit with the number of threads because my Oracle 12c EM agent was having thousands of threads. This post is a short feedback about this issue and the way I have found the root cause. It concerns the enterprise manager agent 12c on Grid Infrasctructure >= 11.2.0.2

 

NLWP

The issue was:

 

ps -o nlwp,pid,lwp,args -u oracle | sort -n
NLWP   PID   LWP COMMAND
   1  8444  8444 oracleOPRODP3 (LOCAL=NO)
   1  9397  9397 oracleOPRODP3 (LOCAL=NO)
   1  9542  9542 oracleOPRODP3 (LOCAL=NO)
   1  9803  9803 /u00/app/oracle/product/agent12c/core/12.1.0.3.0/perl/bin/perl /u00/app/oracle/product/agent12c/core/12.1.0.3.0/bin/emwd.pl agent /u00/app/oracle/product/agent12c/agent_inst/sysman/log/emagent.nohup
  19 11966 11966 /u00/app/11.2.0/grid/bin/oraagent.bin
1114  9963  9963 /u00/app/oracle/product/agent12c/core/12.1.0.3.0/jdk/bin/java ... emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain

 

By default ps has only one entry per process, but each processes can have several threads - implemented on linux as light-weight process (LWP). Here, the NLWP column shows that I have 1114 threads for my EM 12c agent - and it was increasing every day until it reached the limit and the node failed ('Resource temporarily unavailable').

The first thing to do is to know what those threads are. The ps entries do not have a lot of information, but I discovered jstack which every java developer should know, I presume. You probably know that java has very verbose (lengthy) stack traces. Jstack was able to show me thousands of them in only one command:

 

Jstack

$ jstack 9963
2014-06-03 13:29:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.14-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f3368002000 nid=0x4c9b waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CRSeOns" prio=10 tid=0x00007f32c80b6800 nid=0x3863 in Object.wait() [0x00007f31fe11f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at oracle.eons.impl.NotificationQueue.internalDequeue(NotificationQueue.java:278)
	- locked  (a java.lang.Object)
	at oracle.eons.impl.NotificationQueue.dequeue(NotificationQueue.java:255)
	at oracle.eons.proxy.impl.client.base.SubscriberImpl.receive(SubscriberImpl.java:98)
	at oracle.eons.proxy.impl.client.base.SubscriberImpl.receive(SubscriberImpl.java:79)
	at oracle.eons.proxy.impl.client.ProxySubscriber.receive(ProxySubscriber.java:29)
	at oracle.sysman.db.receivelet.eons.EonsMetric.beginSubscription(EonsMetric.java:872)
	at oracle.sysman.db.receivelet.eons.EonsMetricWlm.run(EonsMetricWlm.java:139)
	at oracle.sysman.gcagent.target.interaction.execution.ReceiveletInteractionMgr$3$1.run(ReceiveletInteractionMgr.java:1401)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at oracle.sysman.gcagent.util.system.GCAThread$RunnableWrapper.run(GCAThread.java:184)
	at java.lang.Thread.run(Thread.java:662)
...
 

CRSeOns

I don't paste all of them here. We have the 'main', we have a few GCs and 'Gang workers' which are present in all JVMs and we have a few enterprise manager threads. And what was interesting was that I had thousands of "CRSeOns" that seemed to be increasing.

Some guesses: I'm on RAC, and I have a 'ons' resource and the EM agent tries to subscribe to it. Goggle search returned nothing, and that's the reason I put that in a blog post now. Then I searched MOS, and bingo, there is a note: Doc ID 1486626.1. It has nothing to do with my issue, but has an interesting comment in it:

In cluster version 11.2.0.2 and higher, the ora.eons resource functionality has been moved to EVM. Because of this the ora.eons resource no longer exists or is controlled by crsctl.

It also explains how to disable EM agent subscription:

emctl setproperty agent -name disableEonsRcvlet -value true

I'm in 11.2.0.3 and I have thousands of threads related to a functionality that doesn't exist anymore. And that leads to some failures in my 4 nodes cluster.

The solution was simple: disable it.

For a long time I have seen a lot of memory leaks or CPU usage leaks related to the enterprise manager agent. With this new issue, I discovered a thread leak and I also faced a SR leak when trying to get support for the 'Resource temporarily unavailable' error, going back and forth between OS, Database, Cluster and EM support teams...

Rate this blog entry:
3

Franck Pachot is Consultant at dbi services. He has 20 years of experience in Oracle databases. Through his expertise as a DBA, Oracle expert, data architect, and performance specialist, he is able to cover all database areas: architecture, data modeling, database design, tuning, operation, and training. Franck Pachot knows how to enable an efficient collaboration between the developers and the operational team when it comes to troubleshooting issues or performance tuning. He has passed the OCP certifications from 8i to 12c, is also Certified Expert for Oracle Database 11g Performance Tuning, and now achived the highest level of certification: Oracle Master Certified OCM 11g. Prior to joining dbi services, Franck Pachot was Oracle Consultant at Trivadis in Lausanne. Previously, he worked in several countries and environements, always as a consultant. Franck Pachot holds a Master of Business Informatics from the University of Paris-Sud. His branch-related experience covers Financial Services / Banking, Public Sector, Food, Transport and Logistics, Pharma, etc.


    O_Database12c_Admin_Professional_clrOCE_ODb11gPerfTun_clr11gocm_logo

Comments

  • Guest
    Martin Decker Saturday, 26 July 2014

    Did you succeed in getting support to open a bug for this? Which agent versions are affected?

    Best regards,
    Martin

  • Franck Pachot
    Franck Pachot Sunday, 27 July 2014

    Hi Martin,
    No unfortunately I've no bug opened. SR were closed as resolved by customer.
    Increasing nproc and deactivating subscription to the ora.eons is probably considered as the right fix. My agent was 12.1.0.1.0
    Regards,
    Franck.

Leave your comment

Guest Friday, 19 September 2014
AddThis Social Bookmark Button
Deutsch (DE-CH-AT)   French (Fr)

Contact

Contact us now!

Send us your request!

Our workshops

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

Expert insight from insiders!

Fixed Price Services

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

A safe investment: our IT services at fixed prices!

Your flexible SLA

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

ISO 20000 certified & freely customizable!

dbi services Newsletter