When you do not know what the Oracle Trace File Analyzer (tfa) is or you have heard about it but you don’t know what it is for then you probably should read this, especially when you are working in clustered Oracle environments. You know, Oracle loves to create plenty of trace files in various places (yes, this got much better with the Automatic Diagnostics Repository(ADR)). Now imagine you have a cluster and something goes wrong, where do you start? We created a very nice picture for our Grid Infrastructure workshop which gives an idea of all the components and where it can go wrong. Lets start with the components.

This is approximately what you get when you install a three node Oracle RAC infrastructure (I know there are even more when you go for a flex cluster but this is not important for the scope of this post):

cluster_overview

These are quite a few components and all need to work together properly for the cluster being healthy and doing what is expected. And when you have so many components many things can go wrong:

cluster_overview_issues

Depending on how good your monitoring is you might know where to start looking for the issue quite fast or you can rely on your experience for starting to troubleshoot. Always a good staring point are the alert logs of the cluster, asm and the database. They all are usually located under $ORACLE_BASE and there are many, many other trace files and directories. For a 12.2 database it looks like this:

oracle@oelrac1:/u01/app/oracle/diag/rdbms/db1/DB1_1/ [DB1_1] ls –la
drwxr-x---.  2 oracle asmadmin    20 Mar 21 14:55 alert
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 cdump
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 hm
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 incident
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 incpkg
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 ir
drwxr-x---.  2 oracle asmadmin  4096 Mar 21 15:00 lck
drwxr-x---.  7 oracle asmadmin    60 Mar 21 14:55 log
drwxr-x---.  2 oracle asmadmin  4096 Mar 21 15:00 metadata
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 metadata_dgif
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 metadata_pv
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 stage
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 sweep
drwxr-x---.  2 oracle asmadmin 36864 May 23 08:26 trace

For a 12.2 Grid Infrastrucure it looks like this:

oracle@oelrac1:/u01/app/oracle/diag/crs/oelrac1/crs/ [DB1_1] ls -la
drwxrwxr-x.  2 oracle oinstall    20 Mar 21 12:59 alert
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 cdump
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 incident
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 incpkg
drwxrwxr-x.  2 oracle oinstall  4096 Mar 21 12:59 lck
drwxrwxr-x.  4 oracle oinstall    29 Mar 21 12:59 log
drwxrwxr-x.  2 oracle oinstall  4096 Mar 21 12:59 metadata
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 metadata_dgif
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 metadata_pv
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 stage
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 sweep
drwxrwxr-x.  2 oracle oinstall 20480 May 23 08:28 trace

But there are even more when you go one level up:

oracle@oelrac1:/u01/app/oracle/diag/ [DB1_1] ls -la
drwxrwxr-x.  3 oracle oinstall   22 Mar 21 13:02 afdboot
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 apx
drwxrwxr-x.  5 oracle oinstall   51 Mar 21 13:04 asm
drwxrwxr-x.  4 oracle oinstall   40 Mar 21 13:04 asmtool
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 bdsql
drwxrwxr-x.  4 oracle oinstall   40 Mar 21 13:05 clients
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 diagtool
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 dps
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 em
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 gsm
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 ios
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 lsnrctl
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 netcman
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 ofm
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 plsql
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 plsqlapp
drwxrwxr-x.  3 oracle oinstall   20 Mar 21 13:08 tnslsnr

It can quite take some time to locate the trace file which contains the information you need for troubleshooting or for uploading to Oracle support. And this is where the Oracle Trace File Analyzer is a great help. There is a great support note to get started and to download the latest bundle: TFA Collector – TFA with Database Support Tools Bundle (Doc ID 1513912.1).
mos_tfa

As the note explains TFA is a bundle of tools. What you get when you download the bundle is this:

tfa_overview

So in words TFA consists of: A collecter, an Analyzer, the tfactl command line utility and a bunch of tools. Probably you already know one or more of these tools and maybe you already installed and used some of them. Stop doing that immediately: Use TFA which brings them all. To come back to the components picture from above: Once you installed TFA the picture looks like this:

tfa_overview_2

You use tfactl (the initiator) to talk to a local TFA daemon that can talk to the TFA daemons on all other nodes in the cluster. When you install an Oracle 12.2 Grid Infrastructure tfactl is already there:

oracle@oelrac1:/var/tmp/ [+ASM1] which tfactl
/u01/app/12.2.0.1/grid/bin/tfactl

The issue with that is that you do not have all of the support tools available, e.g. you won’t have SQLT, DA/RDA, Procwatcher and OSWatcher. For getting those you’ll need to download the complete bundle from the support note referenced above and then remove the current TFA installation and re-install it:

[root@oelrac1 tmp]$ /u01/app/12.2.0.1/grid/bin/tfactl uninstall
[root@oelrac1 tmp]$ ./installTFALite 
Enter a location for installing TFA (/tfa will be appended if not supplied) [/var/tmp/tfa]:
/u01/app/12.2.0.1/grid/tfa
Enter a Java Home that contains Java 1.5 or later : /u01/app/12.2.0.1/grid/jdk/

Once this is through you should see all the tools:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl toolstatus
.--------------------------------------.
|        External Support Tools        |
+---------+--------------+-------------+
| Host    | Tool         | Status      |
+---------+--------------+-------------+
| oelrac1 | oswbb        | RUNNING     |
| oelrac1 | darda        | DEPLOYED    |
| oelrac1 | prw          | NOT RUNNING |
| oelrac1 | orachk       | DEPLOYED    |
| oelrac1 | vi           | DEPLOYED    |
| oelrac1 | changes      | DEPLOYED    |
| oelrac1 | ps           | DEPLOYED    |
| oelrac1 | param        | DEPLOYED    |
| oelrac1 | events       | DEPLOYED    |
| oelrac1 | alertsummary | DEPLOYED    |
| oelrac1 | ls           | DEPLOYED    |
| oelrac1 | dbperf       | DEPLOYED    |
| oelrac1 | sqlt         | DEPLOYED    |
| oelrac1 | summary      | DEPLOYED    |
| oelrac1 | pstack       | DEPLOYED    |
| oelrac1 | tail         | DEPLOYED    |
| oelrac1 | oratop       | DEPLOYED    |
| oelrac1 | dbglevel     | DEPLOYED    |
| oelrac1 | exachk       | DEPLOYED    |
| oelrac1 | grep         | DEPLOYED    |
| oelrac1 | srdc         | DEPLOYED    |
| oelrac1 | history      | DEPLOYED    |
'---------+--------------+-------------'

I will not go into all the tools but highlight some of them. Lets start with oratop. When you want to know which statements are currently executing in your instance:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl 

tfactl> oratop -database DB1

This bring up something very similar to the “top” command which comes with the operating system but displays information of what is going on in the database:
oratop

Can be quite handy when you only have access via ssh. Another great tool is OSWatcher which gathers operating statistics in the background and is able to create graphs:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl run oswbb

Starting OSW Analyzer V7.3.3
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c)  2014 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...


Parsing file oelrac1_iostat_17.05.23.1500.dat ...
Parsing file oelrac1_iostat_17.05.23.1600.dat ...
Parsing file oelrac1_iostat_17.05.24.0800.dat ...
Parsing file oelrac1_iostat_17.05.24.0900.dat ...
Parsing file oelrac1_iostat_17.05.24.1000.dat ...
Parsing file oelrac1_iostat_17.05.24.1100.dat ...


Parsing file oelrac1_vmstat_17.05.23.1500.dat ...
Parsing file oelrac1_vmstat_17.05.23.1600.dat ...
Parsing file oelrac1_vmstat_17.05.24.0800.dat ...
Parsing file oelrac1_vmstat_17.05.24.0900.dat ...
Parsing file oelrac1_vmstat_17.05.24.1000.dat ...
Parsing file oelrac1_vmstat_17.05.24.1100.dat ...


Parsing file oelrac1_netstat_17.05.23.1500.dat ...
Parsing file oelrac1_netstat_17.05.23.1600.dat ...
Parsing file oelrac1_netstat_17.05.24.0800.dat ...
Parsing file oelrac1_netstat_17.05.24.0900.dat ...
Parsing file oelrac1_netstat_17.05.24.1000.dat ...
Parsing file oelrac1_netstat_17.05.24.1100.dat ...


Parsing file oelrac1_top_17.05.23.1500.dat ...
Parsing file oelrac1_top_17.05.23.1600.dat ...
Parsing file oelrac1_top_17.05.24.0800.dat ...
Parsing file oelrac1_top_17.05.24.0900.dat ...
Parsing file oelrac1_top_17.05.24.1000.dat ...
Parsing file oelrac1_top_17.05.24.1100.dat ...


Parsing file oelrac1_ps_17.05.23.1500.dat ...
Parsing file oelrac1_ps_17.05.23.1600.dat ...
Parsing file oelrac1_ps_17.05.24.0800.dat ...
Parsing file oelrac1_ps_17.05.24.0900.dat ...
Parsing file oelrac1_ps_17.05.24.1000.dat ...
Parsing file oelrac1_ps_17.05.24.1100.dat ...


Parsing Completed.


Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter T to Alter Graph Time Scale Only (Does not change analysis dataset)
Enter D to Return to Default Graph Time Scale
Enter R to Remove Currently Displayed Graphs

Enter A to Analyze Data
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)

Enter P to Generate A Profile
Enter X to Export Parsed Data to File
Enter Q to Quit Program

Please Select an Option:4

When you for example chose “4 Display Memory Graphs” this will bring up nice graphs about the memory consumption:
oswatcher_memory

Very handy as well. But what do you do when you have a seriou issues and need to collect all the trace/log files? Quite easy as well, lets do a short demo and force an ORA-07445:

oracle@oelrac1:/home/oracle/ [+ASM1] kill -l | grep SIGSEGV
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
oracle@oelrac1:/home/oracle/ [+ASM1] ps -ef | grep dbw  | grep DB1
oracle   14821     1  0 08:20 ?        00:00:00 ora_dbw0_DB1_1
oracle@oelrac1:/home/oracle/ [+ASM1] kill -11 14821

This will cause the database to restart, so please don’t do this on other systems than your lab systems. What we can do now is to ask TFA to collect all the files from all nodes which are required for troubleshooting the issue:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl diagcollect -srdc ora7445
Enter the time of the ORA-07445 [YYYY-MM-DD HH24:MI:SS,=ALL] :            
Enter the Database Name [=ALL] : DB1

1. May/24/2017 09:39:52 : [db1] ORA-07445: exception encountered: core dump [semtimedop()+10] [SIGSEGV] [ADDR:0xD43100006015] [PC:0x7F746C15DFCA] [unknown code] []

Please choose the event : 1-1 [1] 1
…
Logs are being collected to: /u01/app/oracle/tfa/repository/srdc_ora7445_collection_Wed_May_24_09_43_07_CEST_2017_node_local
/u01/app/oracle/tfa/repository/srdc_ora7445_collection_Wed_May_24_09_43_07_CEST_2017_node_local/oelrac1.tfa_srdc_ora7445_Wed_May_24_09_43_07_CEST_2017.zip

Get the zip file and either start to analyze yourself or upload to Oracle Support. Usually they should not ask you for additional log or trace files anymore 🙂 Saves a lot of time.

If you prefer a menu driven interface to all the tools you can have that as well:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl run darda

This will bring up a menu similar to a Linux text based installation program:
darda1

Chose what you want to do, e.g. “1” for “Oracle Database: Collectors”:
darda2

Go to “2 Optimize Performance”:
darda3

You have a “1 Slow Running Database”, so:
darda4

… and you have the possibility to run an AWR report from here. Of course you should monitor that TFA is running so that you get notified when things like this happen:

oracle@oelrac1:/home/oracle/ [+ASM1] sudo $ORACLE_HOME/bin/tfactl status

.----------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
+---------+---------------+------+------+------------+----------------------+------------------+
| oelrac1 | RUNNING       | 2532 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
| oelrac2 | NOT RUNNING   | -    |      |            |                      |                  |
'---------+---------------+------+------+------------+----------------------+------------------'

There is much more you can do with tfa (check the documentation) and tfa is useful for single instances as well. Hope this helps ….