Everybody has already faced performance problem with oracle CLOB columns.

The aim of this blog is to show you (always from a real user case) how to use one of Oracle Text Indexes (CONTEXT index) to solve performance problem with CLOB column.

The oracle text complete documentation is here : Text Application Developer’s Guide

Let’s start with the following SQL query which take more than 6.18 minutes to execute :

SQL> set timing on
SQL> set autotrace traceonly
SQL> select * from v_sc_case_pat_hist pat_hist where upper(pat_hist.note) LIKE '%FC IV%';

168 rows selected.

Elapsed: 00:06:18.09

Execution Plan
----------------------------------------------------------
Plan hash value: 1557300260

--------------------------------------------------------------------------------
---------------------

| Id  | Operation                           | Name          | Rows  | Bytes | Co
st (%CPU)| Time     |

--------------------------------------------------------------------------------
---------------------

|   0 | SELECT STATEMENT                    |               |   521 |  9455K| 24
285   (1)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DWH_CODE      |     1 |    28 |
  2   (0)| 00:00:01 |

|*  2 |   INDEX SKIP SCAN                   | PK_DWH_CODE   |     1 |       |
  1   (0)| 00:00:01 |

|*  3 |  VIEW                               |               |   521 |  9455K| 24
285   (1)| 00:00:01 |

|*  4 |   TABLE ACCESS FULL                 | CASE_PAT_HIST |   521 |   241K| 24
283   (1)| 00:00:01 |

--------------------------------------------------------------------------------
---------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DWH_CODE_NAME"='DWH_PIT_DATE')
       filter("DWH_CODE_NAME"='DWH_PIT_DATE')
   3 - filter("DWH_VALID_FROM"<=TO_NUMBER("PT"."DWH_PIT_DATE") AND
              "DWH_VALID_TO">TO_NUMBER("PT"."DWH_PIT_DATE"))
   4 - filter(UPPER("NOTE") LIKE '%FC IV%')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     134922  consistent gets
     106327  physical reads
        124  redo size
      94346  bytes sent via SQL*Net to client
      37657  bytes received via SQL*Net from client
        338  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        168  rows processed

SQL>

 

Checking the execution plan, oracle optimizer does a Full Scan to access the table CASE_PAT_HIST.

As the the sql function upper(pat_hist.note) is used, let’s try to create a function based index :

CREATE INDEX IDX_FBI_I1 ON SCORE.CASE_PAT_HIST (UPPER(note))
                                                *
ERROR at line 1:
ORA-02327: cannot create index on expression with datatype LOB

 

The message is clear, we cannot create an index on a column with dataype LOB.

Moreover :

  1. Even if my column datatype would not be a CLOB, the search criteria LIKE ‘%FC IV%’  prevent the use of any index since the oracle optimizer has no idea from which letter the string get started, so it will scan the whole table.
  2. Indeed, only the following search criteria will use the index :
    1. LIKE ‘%FC IV’
    2. LIKE ‘FC IV%’
    3. LIKE ‘FC%IV’

So to improve the performance of my SQL query and to index my CLOB column, the solution is to create an Oracle Text Index :

In Oracle 12.1 release, three different Oracle Text Index exists :

  • CONTEXT: Suited for indexing collections or large coherent documents.
  • CTXCAT: Suited for small documents or text fragments.
  • CTXRULE: Used to build a document classification or routing application.

So, let’s try to create an oracle text index of type CONTEXT :

SQL> CREATE INDEX IDX_CPH_I3 ON SCORE.CASE_PAT_HIST LOB(note) INDEXTYPE IS CTXSYS.CONTEXT;

Index created.

Elapsed: 00:00:51.76
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SCORE','CASE_PAT_HIST');

PL/SQL procedure successfully completed.

Elapsed: 00:00:25.20

 

Now we have to change the queries Where Clause in order to query it with the CONTAINS operator :

SQL> SELECT * from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 0;

170 rows selected.

Elapsed: 00:00:00.82

Execution Plan
----------------------------------------------------------
Plan hash value: 768870586

--------------------------------------------------------------------------------
---------------------

| Id  | Operation                           | Name          | Rows  | Bytes | Co
st (%CPU)| Time     |

--------------------------------------------------------------------------------
---------------------

|   0 | SELECT STATEMENT                    |               |  2770 |    49M|  3
355   (1)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DWH_CODE      |     1 |    28 |
  2   (0)| 00:00:01 |

|*  2 |   INDEX SKIP SCAN                   | PK_DWH_CODE   |     1 |       |
  1   (0)| 00:00:01 |

|*  3 |  VIEW                               |               |  2770 |    49M|  3
355   (1)| 00:00:01 |

|   4 |   TABLE ACCESS BY INDEX ROWID       | CASE_PAT_HIST |  2770 |  1284K|  3
353   (1)| 00:00:01 |

|*  5 |    DOMAIN INDEX                     | IDX_CPH_I3    |       |       |
483   (0)| 00:00:01 |

--------------------------------------------------------------------------------
---------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DWH_CODE_NAME"='DWH_PIT_DATE')
       filter("DWH_CODE_NAME"='DWH_PIT_DATE')
   3 - filter("DWH_VALID_FROM"<=TO_NUMBER("PT"."DWH_PIT_DATE") AND
              "DWH_VALID_TO">TO_NUMBER("PT"."DWH_PIT_DATE"))
   5 - access("CTXSYS"."CONTAINS"("NOTE",'%FC IV%',1)>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


Statistics
----------------------------------------------------------
         59  recursive calls
          0  db block gets
       3175  consistent gets
        417  physical reads
        176  redo size
      95406  bytes sent via SQL*Net to client
      38098  bytes received via SQL*Net from client
        342  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        170  rows processed

 

Now the query is executed in just few millisecond.

By checking the execution plan, we note an access to the table SCORE.CASE_PAT_HIST within a DOMAIN INDEX represented by our Context Index (IDX_CPH_I3).

Now let’s compare the results given by the LIKE and the CONTAINS operators:

SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE upper(note) LIKE '%FC IV%'
  2  UNION ALL
  3  SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1;

  COUNT(*)
----------
       168
       170

 

The CONTAINS clause return 2 rows in more, let’s checking :

SQL> select note from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 and case_id in (1,2);

NOTE
--------------------------------------------------------------------------------
Text Before , functional class (FC) IV
Text Before , WHO FC-IV

 

For the LIKE clause, the wildcard %FC IV% returns :

  • Text Before FC IV
  • FC IV Text After
  • Text Before FC IV Text After

For the CONTAINS clause, the wildcard %FC IV% returns :

  • Text Before FC IV
  • FC IV Text After
  • Text Before FC IV Text After
  • Text Before FC%IV
  • FC%IV Text After
  • Text Before FC%IV Text After

So, in term of functionality LIKE and CONTAINS clause are not exactly the same since the former returns less data than the last one.

If we translate the CONTAINS clause in the LIKE clause, we should write : “LIKE ‘%FC%IV%'”

For my customer case, the CONTAINS clause is correct, the business confirmed me this datas must be returned.

The CONTEXT oracle text index has some limitations :

  • You cannot using several CONTAINS through the operand OR / AND, you will face oracle error “ORA-29907: found duplicate labels in primary invocations”
  • SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 or CONTAINS(note, '%FC TZ%', 1) > 1;
    SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 or CONTAINS(note, '%BE TZ%', 1) > 1
                                                                                               *
    ERROR at line 1:
    ORA-29907: found duplicate labels in primary invocations

 

To solve this issue, let’s rewrite the SQL through an UNION clause :

SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1
  2  UNION
  3  SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%BE TZ%', 1) > 1;

  COUNT(*)
----------
       170
       112

 

Conclusion :

  • The benefits of a creating Oracle Text Index (CONTEXT index in our case) include fast response time for text queries with the CONTAINS, CATSEARCH and MATCHES Oracle Text operators.We decrease the response tie from 6.18 mins to few milliseconds.
  • CATSEARCH and MATCHES are respectively the operators used for CTXCAT index and CTXRULE index I will present you in a next blog.
  • Transparent Data Encryption-enabled column does not support Oracle Text Indexes.
  • Always check if the data returned by the CONTAINS clause correspond to your business needs.