Blog - comments

Hi goog article can we install avdf firewall with flat network if it's possible please let me know ?

Thilina
First, thank you for your interrest in this blog.Yes, the byte code will be interpreted each time bu...
BIEHLER Stephane
Pretty sure this is wrong:> already said that JVMs interprets the generated byte code - that's true...
Gs
Michael, great article, however, I would disagree on DRS/Host Affinity. You are legally only requir...
David Bradshaw
Hi lauri, db_file_multiblock_read_count is still used in exadata smartscan because it defines the si...
Blog Franck Pachot Oracle EM agent 12c thread leak on RAC

dbi services Blog

Welcome to the dbi services Blog! This IT blog focuses on database, middleware, and OS technologies such as Oracle, Microsoft SQL Server & SharePoint, EMC Documentum, MySQL, PostgreSQL, Sybase, Unix/Linux, etc. The dbi services blog represents the view of our consultants, not necessarily that of dbi services. Feel free to comment on our blog postings.

  • Home
    Home This is where you can find all the blog posts throughout the site.
  • Categories
    Categories Displays a list of categories from this blog.
  • Tags
    Tags Displays a list of tags that have been used in the blog.
  • Bloggers
    Bloggers Search for your favorite blogger from this site.

Oracle EM agent 12c thread leak on RAC

In a previous post about nproc limit, I wrote that I had to investigate the nproc limit with the number of threads because my Oracle 12c EM agent was having thousands of threads. This post is a short feedback about this issue and the way I have found the root cause. It concerns the enterprise manager agent 12c on Grid Infrasctructure >= 11.2.0.2

 

NLWP

The issue was:

 

ps -o nlwp,pid,lwp,args -u oracle | sort -n
NLWP   PID   LWP COMMAND
   1  8444  8444 oracleOPRODP3 (LOCAL=NO)
   1  9397  9397 oracleOPRODP3 (LOCAL=NO)
   1  9542  9542 oracleOPRODP3 (LOCAL=NO)
   1  9803  9803 /u00/app/oracle/product/agent12c/core/12.1.0.3.0/perl/bin/perl /u00/app/oracle/product/agent12c/core/12.1.0.3.0/bin/emwd.pl agent /u00/app/oracle/product/agent12c/agent_inst/sysman/log/emagent.nohup
  19 11966 11966 /u00/app/11.2.0/grid/bin/oraagent.bin
1114  9963  9963 /u00/app/oracle/product/agent12c/core/12.1.0.3.0/jdk/bin/java ... emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain

 

By default ps has only one entry per process, but each processes can have several threads - implemented on linux as light-weight process (LWP). Here, the NLWP column shows that I have 1114 threads for my EM 12c agent - and it was increasing every day until it reached the limit and the node failed ('Resource temporarily unavailable').

The first thing to do is to know what those threads are. The ps entries do not have a lot of information, but I discovered jstack which every java developer should know, I presume. You probably know that java has very verbose (lengthy) stack traces. Jstack was able to show me thousands of them in only one command:

 

Jstack

$ jstack 9963
2014-06-03 13:29:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.14-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f3368002000 nid=0x4c9b waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CRSeOns" prio=10 tid=0x00007f32c80b6800 nid=0x3863 in Object.wait() [0x00007f31fe11f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at oracle.eons.impl.NotificationQueue.internalDequeue(NotificationQueue.java:278)
	- locked  (a java.lang.Object)
	at oracle.eons.impl.NotificationQueue.dequeue(NotificationQueue.java:255)
	at oracle.eons.proxy.impl.client.base.SubscriberImpl.receive(SubscriberImpl.java:98)
	at oracle.eons.proxy.impl.client.base.SubscriberImpl.receive(SubscriberImpl.java:79)
	at oracle.eons.proxy.impl.client.ProxySubscriber.receive(ProxySubscriber.java:29)
	at oracle.sysman.db.receivelet.eons.EonsMetric.beginSubscription(EonsMetric.java:872)
	at oracle.sysman.db.receivelet.eons.EonsMetricWlm.run(EonsMetricWlm.java:139)
	at oracle.sysman.gcagent.target.interaction.execution.ReceiveletInteractionMgr$3$1.run(ReceiveletInteractionMgr.java:1401)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at oracle.sysman.gcagent.util.system.GCAThread$RunnableWrapper.run(GCAThread.java:184)
	at java.lang.Thread.run(Thread.java:662)
...
 

CRSeOns

I don't paste all of them here. We have the 'main', we have a few GCs and 'Gang workers' which are present in all JVMs and we have a few enterprise manager threads. And what was interesting was that I had thousands of "CRSeOns" that seemed to be increasing.

Some guesses: I'm on RAC, and I have a 'ons' resource and the EM agent tries to subscribe to it. Goggle search returned nothing, and that's the reason I put that in a blog post now. Then I searched MOS, and bingo, there is a note: Doc ID 1486626.1. It has nothing to do with my issue, but has an interesting comment in it:

In cluster version 11.2.0.2 and higher, the ora.eons resource functionality has been moved to EVM. Because of this the ora.eons resource no longer exists or is controlled by crsctl.

It also explains how to disable EM agent subscription:

emctl setproperty agent -name disableEonsRcvlet -value true

I'm in 11.2.0.3 and I have thousands of threads related to a functionality that doesn't exist anymore. And that leads to some failures in my 4 nodes cluster.

The solution was simple: disable it.

For a long time I have seen a lot of memory leaks or CPU usage leaks related to the enterprise manager agent. With this new issue, I discovered a thread leak and I also faced a SR leak when trying to get support for the 'Resource temporarily unavailable' error, going back and forth between OS, Database, Cluster and EM support teams...

Rate this blog entry:
3

I'm a Senior Consultant, and Oracle Technology Leader at dbi services (Switzerland).


Certified DBA (OCM 11g, OCP 12c, Performance Tuning Expert, Exadata Implementation) I cover all database areas: architecture, data modeling, database design, tuning, operation, and training.


My preferred area is troubleshooting oracle and performance tuning, especially when I acheive to enable an efficient collaboration between the developers and the operational team.
Besides this blog, I participate in the Oracle Community in forums, blogs, articles and presentation.

All that is referenced from my twitter account:
 


    O_Database12c_Admin_Professional_clrOCE_ODb11gPerfTun_clr11gocm_logoalt

Comments

  • Guest
    Martin Decker Saturday, 26 July 2014

    Did you succeed in getting support to open a bug for this? Which agent versions are affected?

    Best regards,
    Martin

  • Franck Pachot
    Franck Pachot Sunday, 27 July 2014

    Hi Martin,
    No unfortunately I've no bug opened. SR were closed as resolved by customer.
    Increasing nproc and deactivating subscription to the ora.eons is probably considered as the right fix. My agent was 12.1.0.1.0
    Regards,
    Franck.

Leave your comment

Guest Thursday, 23 October 2014
AddThis Social Bookmark Button
Deutsch (DE-CH-AT)   French (Fr)

Contact

Contact us now!

Send us your request!

Our workshops

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

Expert insight from insiders!

Fixed Price Services

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

A safe investment: our IT services at fixed prices!

Your flexible SLA

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

ISO 20000 certified & freely customizable!

dbi services Newsletter