Blog - comments

Hi Perry, The spare CPU power is of course available for other processes ( i.e non-caged database)! ...
When I studied Oracle New Feature Guide "Media Failure: PDB SYSTEM Data File" , I was surprised tha...
Hayat Khan

Really a nice article to study.

Thanks,

Amol Bhoite

Bonjour,

Tout d'abord merci pour cet article. J'aimerai savoir si ACFS est gratuit ?

Chris

Chris
Hi Jerome.. I have a question. Suppose if we have 8 CPUs on server and we assign 2 CPUs to a databas...
Perry Kahlon
Blog Michael Schwalm Oracle RAC 11.2.0.2: A disturbing loop in a "ohasd" startup script

dbi services Blog

Welcome to the dbi services Blog! This blog focuses on IT infrastructure - featuring news, troubleshooting, and tips & tricks. It covers database, middleware, and OS technologies such as Oracle, Microsoft SQL Server, Documentum, MySQL, PostgreSQL, Sybase, Unix/Linux, etc. The dbi services blog represents the view of our consultants, not necessarily that of dbi services. Feel free to comment on the postings!

  • Home
    Home This is where you can find all the blog posts throughout the site.
  • Categories
    Categories Displays a list of categories from this blog.
  • Tags
    Tags Displays a list of tags that have been used in the blog.
  • Bloggers
    Bloggers Search for your favorite blogger from this site.

Oracle RAC 11.2.0.2: A disturbing loop in a "ohasd" startup script

Last February, I performed an operating system rolling upgrade on a four-nodes RAC cluster (11.2.0.2). I then faced a strange problem when restarting the operating system...

The first step of the procedure was to stop all Grid Infrastructure and Database services running on the first node as well as to disable Cluster and ASM autostart. The following command is supposed to prevent Oracle High Availability Service (OHAS) to be run at operating system startup:
 

# crsctl disable crs

 
Then, I powered off the server and asked the storage administrator to disable all LUNs attached to this server, including the one containing Oracle binaries (/u00).

This step was necessary because the storage bay requires specific drivers which are not shipped with the Linux installation media. Without drivers, all mountpoints attached to LUNs and detected during the upgrade process would have displayed errors.

However, when starting the server again to check that all mountpoints were disabled, I saw that the startup procedure was blocked to the OHASD service, preventing the server to finish the startup.

In fact, even if crs autostart is disabled, a deamon called "ohasd" is still running at server startup. Among other things, it checks and indefinitively waits for the presence of CRS binaries. No luck, the LUN attached to the mountpoint containg CRS binaries was disabled...

We can see this check in /etc/init.d/ohasd file:

 

# Wait until it is safe to start CRS daemons
  while [ ! -r $CRSCTL ]
  do
    $LOGMSG "Waiting for filesystem containing $CRSCTL."
    $SLEEP $DEP_CHECK_WAIT
  done

Where $CRSCTL corresponds to /u00/app/11.2.0/grid/bin/crsctl

 

What is crazy is that the loop is performed no matter if autostart is enabled or not. Just after the loop is done, the script checks if autostart is enabled - thanks to the file /etc/oracle/scls_scr/$MY_HOST/root/ohasdstr, which contains "enable" or "disable" depending of the autostart configuration.

Why do not check if autostart is enabled before looking for CRS binaries? A question that Oracle does not seem to have answered in 11.2.0.3, because we can see that, even if the ohasd startup mecanism was updated, ohasd is still waiting for CRS binaries:

 

# Wait until it is safe to start CRS daemons.
  # Wait for 10 minutes for filesystem to mount
  # Print message to syslog and console
  works=true
  for minutes in 10 9 8 7 6 5 4 3 2 1
  do
    if [ ! -r $CRSCTL ]
    then
      works=false
      log_console "Waiting $minutes minutes for filesystem containing $CRSCTL."
      $SLEEP $DEP_CHECK_WAIT
    else
      works=true
      break
    fi
  done

 

As you can see, in 11.2.0.3, the server will now finish to start, but will be waiting 10 minutes. A message is displayed in the log startup:

 

Apr 30 15:31:29 rac1 logger: Waiting 10 minutes for filesystem containing /u00/app/11.2.0/grid/bin/crsctl.
Apr 30 15:32:29 rac1 logger: Waiting 9 minutes for filesystem containing /u00/app/11.2.0/grid/bin/crsctl.
Apr 30 15:33:29 rac1 logger: Waiting 8 minutes for filesystem containing /u00/app/11.2.0/grid/bin/crsctl.
Apr 30 15:34:29 rac1 logger: Waiting 7 minutes for filesystem containing /u00/app/11.2.0/grid/bin/crsctl.
[...]

 

At this time, the workaround I found to allow the server to start immediatly is to prevent "ohasd" service to run by renaming or moving the /etc/init.d/ohasd file, or to comment the loop section in order to skip the infinite loop.

Hopefully, SSH deamon (if enabled) runs before OHAS deamon. It is possible, if the startup procedure is blocked, to access the machine through an SSH session in order to apply this workaround and to restart the server.

I finally added this pre-requisite to the upgrade procedure for the remaining nodes of the cluster.

Rate this blog entry:
1

Michael Schwalm is Consultant at dbi Services and has more than two years of experience in Oracle database administration. He has a broad knowledge in the realization of virtualization infrastructures such as vMware vSphere. He took his first steps in database administration as an integrator of a web applications on Unix, Oracle, and Websphere environments. Michael Schwalm is Oracle Certified Professional 11g. Prior to joining dbi services, Michael Schwalm was application administrator at SOGETI Est (F) on behalf of PSA Peugeot Citroen and responsible for the realization and managing of Unix environments and Oracle databases in the context of migration projects. Michael Schwalm holds a BTS diploma in Information System Management from Belfort (F) and a TSAR diploma in advanced network administration from Strasbourg (F). His branch-related experience covers Automotive, Software industry, Financial Services / Banking, etc.

Comments

  • No comments made yet. Be the first to submit a comment

Leave your comment

Guest Friday, 25 April 2014
AddThis Social Bookmark Button
Deutsch (DE-CH-AT)   French (Fr)
NewsOfficesContact

Contact

Contact us now!

Send us your request!

Our workshops

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

Expert insight from insiders!

Fixed Price Services

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

A safe investment: our IT services at fixed prices!

Your flexible SLA

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

ISO 20000 certified & freely customizable!

dbi services Newsletter