Blog - comments

Hi Helmy, If you use the GUI to create the availability group, it requires you have the same path / ...

could you please share the needed class files (or the D2.jar) it seems, this class is missing

me
but when i am gonna create an AG the following error shown >> the following folder locations do not ...
Helmy Mohamed
Nice one! I like the outfit of the characters. Wish i could do the same thing too but im not that te...
utah

Can someone please forward me all the netbackup commands with detail information I want CLI.

Shekhar D
Blog Franck Pachot Oracle is hanging? Don't forget hanganalyze and systemstate!

dbi services Blog

Welcome to the dbi services Blog! This IT blog focuses on database, middleware, and OS technologies such as Oracle, Microsoft SQL Server & SharePoint, EMC Documentum, MySQL, PostgreSQL, Sybase, Unix/Linux, etc. The dbi services blog represents the view of our consultants, not necessarily that of dbi services. Feel free to comment on our blog postings.

  • Home
    Home This is where you can find all the blog posts throughout the site.
  • Categories
    Categories Displays a list of categories from this blog.
  • Tags
    Tags Displays a list of tags that have been used in the blog.
  • Bloggers
    Bloggers Search for your favorite blogger from this site.

Oracle is hanging? Don't forget hanganalyze and systemstate!

sqlplus / as sysdba
 oradebug setmypid
 oradebug unlimit
 oradebug hanganalyze 3
 oradebug dump ashdumpseconds 30
 oradebug systemstate 266
 oradebug tracefile_name

Your Oracle database - production DB, of course - is hanging. All users are blocked. You quickly check the obvious suspects (archivelog destination full, system swapping, etc.) but it's something else. Even you, the Oracle DBA, cannot do anything: any select is hanging. And maybe you're even not able to connect with a simple 'sqlplus / as sysdba'.

What do you do ? There may be several ways to investigate deeper (strace or truss for example) but it will take time. And your boss is clear: the only important thing is to get the production running again as soon as possible. No time to investigate. SHUTDOWN ABORT and restart.

Ok, but now that everything is back to normal, your boss rules has changed: the system was down for 15 minutes. We have to provide an explanation. Root Cause Analysis.

But how will you investigate now ? You have restarted everything, so all V$ information is gone. You have Diagnostic Pack ? But the system was hanged: no ASH information went to disk. You can open an SR but what information will you give?

Hang Analyze

The next time it happens, you need to have a way to get some information that can be analyzed post mortem. But you need to be able to do that very quickly just before your boss shouts 'shutdown abort now'. And this is why I've put it at the begining of the post, so that you can find it quickly if you need it...

That takes only a few seconds to generate all post-mortem necessary information. If you can take 1 more minute, you will even be able to read the first lines of hanganalyze output, and you will be able to identify a true hanging situation and maybe just kill the root of the blocking sessions instead of a merciless restart.

 

In order to show you the kind of output you get, I've run a few jobs locking the same resources (TM locks) - which is not a true hanging situation because the blocking session can resolve the situation.

Here is the first lines from the oradebug hanganalyze:

Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'PL/SQL lock timer'Systemstate has all information about System Objects (sessions, processes, ...) but you have to navigate into it in order to understand the wait chain. In my example:
SO: 0x914ada70, type: 4, owner: 0x91990478, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x91990478, name=session, file=ksu.h LINE:13580, pg=0 conuid=0
(session) sid: 23 ser: 7 trans: 0x8ea8e3e8, creator: 0x91990478
...
service name: SYS$USERS
client details:
O/S info: user: oracle, term: UNKNOWN, ospid: 7929
machine: vboxora12c program: oracle@vboxora12c (J002)
Current Wait Stack:
 0: waiting for 'enq: TM - contention'
    name|mode=0x544d0003, object #=0x1737c, table/partition=0x0
    wait_id=10 seq_num=11 snap_id=1
    wait times: snap=15.991474 sec, exc=15.991474 sec, total=15.991474 sec
    wait times: max=40.000000 sec, heur=15.991474 sec
    wait counts: calls=6 os=6
    in_wait=1 iflags=0x15a0
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
  inst: 1, sid: 254, ser: 5
Dumping final blocker: inst: 1, sid: 256, ser: 5
This is a session that is waiting, and we have the final blocker: inst: 1, sid: 256, ser: 5

Then we get to the final blocker by searching the sid: 256:

SO: 0x9168a408, type: 4, owner: 0x9198d058, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x9198d058, name=session, file=ksu.h LINE:13580, pg=0 conuid=0
(session) sid: 256 ser: 5 trans: 0x8ea6b618, creator: 0x9198d058
...
service name: SYS$USERS
client details:
O/S info: user: oracle, term: UNKNOWN, ospid: 7925
machine: vboxora12c program: oracle@vboxora12c (J000)
Current Wait Stack:
 0: waiting for 'PL/SQL lock timer'
    duration=0x0, =0x0, =0x0
    wait_id=0 seq_num=1 snap_id=1
    wait times: snap=25.936165 sec, exc=25.936165 sec, total=25.936165 sec
    wait times: max=50.000000 sec, heur=25.936165 sec
    wait counts: calls=1 os=9
    in_wait=1 iflags=0x5a0
There are 5 sessions blocked by this session.
Dumping one waiter:
  inst: 1, sid: 254, ser: 5
  wait event: 'enq: TM - contention'
    p1: 'name|mode'=0x544d0004
    p2: 'object #'=0x1737c
    p3: 'table/partition'=0x0
  row_wait_obj#: 95100, block#: 0, row#: 0, file# 0
  min_blocked_time: 19 secs, waiter_cache_ver: 44

Analysing the System State takes much longer than the hanganalyze, but it has more information.

V$WAIT_CHAINS

When the blocking situation is not so desesperate, but you just want to see what is blocking, the hanganalyze information is also available online in V$WAIT_CHAINS. The advantage over ASH is that you see all processes (not only foreground, not only active ones).

Here is an example:

CHAIN_ID CHAIN CHAIN_SIGNATURE INSTANCE OSID PID SID BLOCK
1 FALSE 'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention' 1 7929 42 23 TRUE
1 FALSE 'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention' 1 7927 41 254 TRUE
1 FALSE 'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention' 1 7925 39 256 FALSE
2 FALSE 'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention' 1 7933 46 25 TRUE
3 FALSE 'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention' 1 7931 45 260 TRUE
4 FALSE 'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention' 1 7935 47 262 TRUE

ASH Dump

There is something else that you can get if you have Diagnostic Pack. The ASH information can be dumped to trace file even if it cannot be collected in the database.

oradebug dump ashdumpseconds 30

that will gather ASH from latest 30 seconds, and the trace file will even have the sqlldr ctl file to load it in an ASH like table.

sqlplus -prelim

But what can you do if you can't even connect / as sysdba ? There is the 'preliminary connection' that does not create a session: sqlplus -prelim / as sysdba
With that you will be able to get a systemstate. You will be able to get a ashdump. But unfortunately, since 11.2.0.2 you cannot get a hanganalyze:

ERROR: Can not perform hang analysis dump without a process state object and a session state object.

But there is a workaround for that (from Tanel Poders's blog): try to use a session that is already connected.
For exemple I use the DIAG background process (it's better not to use vital processes for that)

SQL> oradebug setorapname diag
Oracle pid: 8, Unix process pid: 7805, image: oracle@vboxora12c (DIAG)

Core message

Even in hurry,
  • Always check an hanganalyze to understand the problem.
  • Always get a systemstate before a shutdown abort.
and you will have information to investigate later, or to provide to Oracle Support.
Rate this blog entry:
2

Franck Pachot is Consultant at dbi services. He has 20 years of experience in Oracle databases. Through his expertise as a DBA, Oracle expert, data architect, and performance specialist, he is able to cover all database areas: architecture, data modeling, database design, tuning, operation, and training. Franck Pachot knows how to enable an efficient collaboration between the developers and the operational team when it comes to troubleshooting issues or performance tuning. He has passed the OCP certifications from 8i to 12c, is also Certified Expert for Oracle Database 11g Performance Tuning, and now achived the highest level of certification: Oracle Master Certified OCM 11g. Prior to joining dbi services, Franck Pachot was Oracle Consultant at Trivadis in Lausanne. Previously, he worked in several countries and environements, always as a consultant. Franck Pachot holds a Master of Business Informatics from the University of Paris-Sud. His branch-related experience covers Financial Services / Banking, Public Sector, Food, Transport and Logistics, Pharma, etc.


    O_Database12c_Admin_Professional_clrOCE_ODb11gPerfTun_clr11gocm_logo

Comments

  • No comments made yet. Be the first to submit a comment

Leave your comment

Guest Monday, 01 September 2014
AddThis Social Bookmark Button
Deutsch (DE-CH-AT)   French (Fr)

Contact

Contact us now!

Send us your request!

Our workshops

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

Expert insight from insiders!

Fixed Price Services

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

A safe investment: our IT services at fixed prices!

Your flexible SLA

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

ISO 20000 certified & freely customizable!

dbi services Newsletter