nmon (short for Nigel’s performance Monitor) is a nice tool for monitoring a Linux system; it runs on POWER, x86, x86_64, mainframe and now ARM (Raspberry Pi). It originally came from the AIX platform, where it is very popular. I asked myself whether I could add nmon to the OSWatcher framework to gather nmon data automatically. It turns out that this is possible. Here’s what I did:

First I installed nmon:

In my case (Oracle Enterprise Linux 7.9, downloaded as part of an Oracle 19c DB Vagrant box) the EPEL repository was already available:


[root@oracle-19c6-vagrant ~]# yum search nmon
============================================================ N/S matched: nmon ============================================================
conmon.x86_64 : OCI container runtime monitor
nmon.x86_64 : Nigel's performance Monitor for Linux
xfce4-genmon-plugin.x86_64 : Generic monitor plugin for the Xfce panel

  Name and summary matches only, use "search all" for everything.
[root@oracle-19c6-vagrant ~]# yum install nmon.x86_64
Resolving Dependencies
--> Running transaction check
---> Package nmon.x86_64 0:16g-3.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===========================================================================================================================================
 Package                    Arch                         Version                            Repository                                Size
===========================================================================================================================================
Installing:
 nmon                       x86_64                       16g-3.el7                          ol7_developer_EPEL                        69 k

Transaction Summary
===========================================================================================================================================
Install  1 Package

Total download size: 69 k
Installed size: 156 k
Is this ok [y/d/N]: y
Downloading packages:
nmon-16g-3.el7.x86_64.rpm                                                                                           |  69 kB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : nmon-16g-3.el7.x86_64                                                                                                   1/1 
  Verifying  : nmon-16g-3.el7.x86_64                                                                                                   1/1 

Installed:
  nmon.x86_64 0:16g-3.el7                                                                                                                  

Complete!
[root@oracle-19c6-vagrant ~]# which nmon
/bin/nmon
[root@oracle-19c6-vagrant ~]#

REMARK 1: You may also download nmon from its project website.
REMARK 2: I recommend downloading nmonchart as well, so that nice HTML files can be generated from the collected data.

The difficulty with nmon is that, unlike vmstat, it does not print an additional line of output for each sample. You run it interactively, like top. Alternatively, nmon can write its data to a file, which can then be used as input for nmonchart. Here’s an example of an nmon command that produces such a file:


oracle@oracle-19c6-vagrant:/home/oracle/ [rdbms19] nmon -t -f -s 2 -c 10
oracle@oracle-19c6-vagrant:/home/oracle/ [rdbms19] ps -ef | grep nmon
oracle    5929     1  0 13:57 pts/1    00:00:00 nmon -t -f -s 2 -c 10
oracle@oracle-19c6-vagrant:/home/oracle/ [rdbms19] ls -l *.nmon
-rw-r--r--. 1 oracle oinstall 37793 Mar 23 13:57 oracle-19c6-vagrant_210323_1357.nmon

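The command prompt returns immediately and nmon continues in the background. In the example above nmon takes a snapshot every 2 seconds (-s 2) and stops after 10 snapshots (-c 10), i.e. it runs in the background for about 20 seconds. The option -f writes the data to a file instead of the screen, and -t includes the top processes in the output.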

The produced file-name follows a naming convention: <host-name>_<YYMMDD>_<HH24MI>.nmon
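
To illustrate the convention, the default name can be rebuilt in the shell. This is only a sketch: the function name nmon_default_name is hypothetical, and it assumes that nmon uses the short host name (everything before the first dot) and the local start time.

```shell
#!/bin/sh
# Hypothetical helper: build a name of the form <host-name>_<YYMMDD>_<HH24MI>.nmon
# $1 = host name (optional, defaults to the name of this machine)
nmon_default_name() {
    ndn_host=${1:-$(uname -n)}
    ndn_host=${ndn_host%%.*}    # assumption: only the short host name is used
    printf '%s_%s.nmon\n' "$ndn_host" "$(date '+%y%m%d_%H%M')"
}

nmon_default_name
```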

Fortunately nmon also provides the option “-F” to use a file-name I choose:


oracle@oracle-19c6-vagrant:/home/oracle/ [rdbms19] nmon -t -f -s 2 -c 10 -F ./my_own_nmon_file.dat
oracle@oracle-19c6-vagrant:/home/oracle/ [rdbms19] ls -l my_own_nmon_file.dat 
-rw-r--r--. 1 oracle oinstall 34752 Mar 23 13:58 my_own_nmon_file.dat

To add custom data collections to OSWatcher you have to provide a shell-script and add an entry to the file extras.txt in the OSWatcher home directory (in my case /home/oracle/tools/OSWatcher/oswbb).

An example for an additional data collector has been provided in My Oracle Support Note
How to extend OSW to monitor PeopleSoft domains (Doc ID 1531211.1)

First of all I created my shell-script nmonoswbb.sh :


#!/bin/bash

# Get the time between snapshots in seconds from the running OSWatcher script.
# Field 10 of the "ps -ef" line is the first argument of OSWatcher.sh;
# head -1 guards against more than one matching process.
TIME_BETWEEN_SNAPSHOTS=$(ps -ef | grep OSWatcher.sh | grep -v grep | head -1 | tr -s " " | cut -d " " -f10)

# In case OSWatcher.sh was started without parameters, use the default of 30 seconds.
if [ -z "$TIME_BETWEEN_SNAPSHOTS" ]
then
   TIME_BETWEEN_SNAPSHOTS=30
fi

# Number of snapshots until the end of the hour:
# seconds_left_this_hour / TIME_BETWEEN_SNAPSHOTS
# 10# forces base 10, so that 08 and 09 are not rejected as octal and 00 stays a valid number.
read -r MINUTE SECOND <<< "$(date '+%M %S')"
SNAPSHOTS=$(( (3600 - (10#$MINUTE * 60 + 10#$SECOND)) / TIME_BETWEEN_SNAPSHOTS ))

# OSWatcher passes the archive file as the first parameter.
OSWBB_ARCHIVE_FILE=$1

if [ ! -f "$OSWBB_ARCHIVE_FILE" ]
then
   echo "zzz ***$(date '+%a %b %e %T %Z %Y')" >> "$OSWBB_ARCHIVE_FILE"
fi

LINES_IN_ARCHIVE=$(wc -l < "$OSWBB_ARCHIVE_FILE")

if [ "$LINES_IN_ARCHIVE" -lt 3 ]
then
   # start a new nmon file
   rm -f "$OSWBB_ARCHIVE_FILE"
   /bin/nmon -t -f -s "$TIME_BETWEEN_SNAPSHOTS" -c "$SNAPSHOTS" -F "$OSWBB_ARCHIVE_FILE"
fi

What I’m doing here is calculating how many snapshots I have to take to gather nmon data for the remainder of the current hour. Finally, I start nmon if the archive file (passed by OSWatcher as parameter $1) has fewer than 3 lines. At the top of the hour OSWatcher creates an archive file which initially contains 1 line:


Linux OSWbb v8.4.0

If the file is new (has 1 line) or does not exist, I start nmon for the rest of the hour. nmon then takes a snapshot every TIME_BETWEEN_SNAPSHOTS seconds (e.g. 60), so the number of snapshots is calculated as

(3600 - seconds_elapsed_this_hour) / TIME_BETWEEN_SNAPSHOTS
E.g. if the script was initially started at 14:01:20 with an interval of 60 seconds, then we have

(3600 - 80)/60 = 58 (integer division, rounded down)

I.e. nmon will take 58 snapshots (1 every minute) in the remaining 58 minutes and 40 seconds until 15:00.
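
The same calculation can be checked in isolation. The following is a minimal sketch; the function name snapshots_left is hypothetical, and it takes minute and second as arguments instead of calling date, so that the example is reproducible.

```shell
#!/bin/sh
# Hypothetical helper: number of nmon snapshots that fit into the rest of the hour.
# $1 = minute (00-59), $2 = second (00-59), $3 = seconds between snapshots
snapshots_left() {
    # Strip one leading zero so that 08 and 09 are not read as invalid octal numbers
    sl_min=${1#0}
    sl_sec=${2#0}
    echo $(( (3600 - (${sl_min:-0} * 60 + ${sl_sec:-0})) / $3 ))
}

snapshots_left 01 20 60   # started at 14:01:20, one snapshot per minute: prints 58
```

Because the division is an integer division, the last partial interval before the top of the hour is simply not sampled.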

Finally we just need the file extras.txt which looks as follows:


# File format is as follows...
# shell_script name directory_name
# where shell_script = name of shell to execute
# name = name of this program
# directory_name is directory name under the archive
nmonoswbb.sh nmon cus_nmon

That means OSWatcher will start nmonoswbb.sh in every cycle and put the archive files in a folder named cus_nmon.
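
To double-check which entries of extras.txt are active, the three fields can be printed with a few lines of shell. This is a sketch that only mirrors the file format described in the comments above; the function name list_extras is hypothetical and this is not how OSWatcher itself reads the file.

```shell
#!/bin/sh
# Hypothetical helper: print the three fields of every active entry
# in an extras.txt-style file ($1 = path to the file).
list_extras() {
    # Skip comment lines and blank lines; fields are separated by whitespace
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
    while read -r script name dir; do
        echo "script=$script name=$name dir=$dir"
    done
}

# Usage: list_extras /home/oracle/tools/OSWatcher/oswbb/extras.txt
```
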
Let’s start OSWatcher so that it takes a snapshot every 20 seconds and keeps the data for 5 hours:


oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] ./startOSWbb.sh 20 5 gzip
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] Info...Zip option IS specified. 
Info...OSW will use gzip to compress files.
Setting the archive log directory to/home/oracle/tools/OSWatcher/oswbb/archive

Testing for discovery of OS Utilities...
VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
IP found on your system.
TOP found on your system.
Warning... /proc/slabinfo not found on your system. Check to see if this user has permission to access this file.
PIDSTAT found on your system.
NFSIOSTAT found on your system.
Warning... TRACEROUTE not found on your system. No TRACEROUTE data will be collected.

Discovery of CPU CORE COUNT
CPU CORE COUNT will be used by oswbba to automatically look for cpu problems

CPU CORE COUNT = 4
VCPUS/THREADS = 4

Discovery completed.

Starting OSWatcher v8.4.0  on Tue Mar 23 16:47:56 +01 2021
With SnapshotInterval = 20
With ArchiveInterval = 5

OSWatcher - Written by Carl Davis, Center of Expertise,
Oracle Corporation
For questions on install/usage please go to MOS (Note:301137.1)

Data is stored in directory: /home/oracle/tools/OSWatcher/oswbb/archive

Starting Data Collection...

oswbb heartbeat:Tue Mar 23 16:48:02 +01 2021

nmon then started writing to the archive file:


oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] ls -l archive/cus_nmon
-rw-r--r--. 1 oracle oinstall 31778 Mar 23 16:48 oracle-19c6-vagrant_nmon_21.03.23.1600.dat
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)]

Here is the running nmon process:


oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] ps -ef | grep nmon | grep -v grep
oracle   13905     1  0 16:48 pts/0    00:00:00 /bin/nmon -t -f -s 20 -c 35 -F /home/oracle/tools/OSWatcher/oswbb/archive/cus_nmon/oracle-19c6-vagrant_nmon_21.03.23.1600.dat
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] 

At the top of the hour OSWatcher compresses the file from the previous hour and creates a new one. Here are the files from an earlier run:


oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] ls -ltr archive/cus_nmon_old/
-rw-r--r--. 1 oracle oinstall 10920 Mar 23 12:58 oracle-19c6-vagrant_nmon_21.03.23.1200.dat.gz
-rw-r--r--. 1 oracle oinstall 16597 Mar 23 13:58 oracle-19c6-vagrant_nmon_21.03.23.1300.dat.gz
-rw-r--r--. 1 oracle oinstall 53453 Mar 23 14:32 oracle-19c6-vagrant_nmon_21.03.23.1400.dat
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] 

The only thing remaining is to stop nmon when stopping OSWatcher. I copied stopOSWbb.sh to stopOSWbb_nmon.sh and adjusted the file:


oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] cp -p stopOSWbb.sh stopOSWbb_nmon.sh 
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] vi stopOSWbb_nmon.sh 
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] cat stopOSWbb_nmon.sh 
#!/bin/sh
######################################################################
# stopOSW.sh
# This is the script which terminates all processes associated with
# the OSWatcher program.
######################################################################
# Kill the OSWatcher processes
######################################################################
PLATFORM=`/bin/uname`

case $PLATFORM in
  AIX)
    kill -15 `ps -ef | grep OSWatch | grep -v grep | awk '{print $2}'` 
    ;;
  *)
    kill -15 `ps -e | grep OSWatch | awk '{print $1}'`
    kill -12 `ps -e | grep nmon | awk '{print $1}'`
    ;;
esac
######################################################################
# Clean up heartbeat file from /tmp
######################################################################
rm /tmp/osw.hb
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] 

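I.e. I added the line with “kill -12” (signal 12 is SIGUSR2 on Linux x86_64), because the nmon developer mentions on his website that kill -USR2 is the correct way to stop nmon. Be aware that “ps -e | grep nmon” matches every process whose name contains “nmon”, which most likely includes stopOSWbb_nmon.sh itself; that would explain the message “User defined signal 2” in the output below and means that the final rm of the heartbeat file may not be reached. With this script we can stop OSWatcher including nmon: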


oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] ./stopOSWbb_nmon.sh 
User defined signal 2
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] ps -e | grep nmon
oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)]

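REMARK: Keep in mind that after a restart of OSWatcher nmon won’t be active until the next top of the hour, because the archive file of the current hour already contains more than 2 lines and the script therefore does not start nmon again.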

After gathering data for a while you may produce nice html-files using nmonchart:


oracle@oracle-19c6-vagrant:/home/oracle/tools/OSWatcher/oswbb/ [orclcdb1 (CDB$ROOT)] nmonchart archive/cus_nmon_old/oracle-19c6-vagrant_nmon_21.03.23.1400.dat /tmp/oracle-19c6-vagrant_nmon_21.03.23.1400.html
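
To chart a whole directory of archives in one go, a small loop around the command above can help. This is a sketch under assumptions: the function name chart_all and the NMONCHART variable are hypothetical, and it assumes that nmonchart expects an uncompressed input file, so .gz archives are unpacked to a temporary copy first.

```shell
#!/bin/sh
# Hypothetical variable: command used to call nmonchart
NMONCHART=${NMONCHART:-nmonchart}

# Convert every *.dat and *.dat.gz file in a directory to an HTML chart.
# $1 = directory with nmon archives, $2 = output directory
chart_all() {
    src_dir=$1
    out_dir=$2
    for f in "$src_dir"/*.dat "$src_dir"/*.dat.gz; do
        [ -f "$f" ] || continue              # skip patterns that matched nothing
        base=$(basename "$f" .gz)            # drop .gz if present
        base=${base%.dat}
        case $f in
            *.gz)
                # Assumption: nmonchart needs the plain file, so unpack a copy
                tmp=$(mktemp) || return 1
                gzip -dc "$f" > "$tmp" && "$NMONCHART" "$tmp" "$out_dir/$base.html"
                rm -f "$tmp"
                ;;
            *)
                "$NMONCHART" "$f" "$out_dir/$base.html"
                ;;
        esac
    done
}

# Usage: chart_all archive/cus_nmon_old /tmp
```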

Then enjoy the useful info the html-file provides.