Everybody has already faced performance problem with oracle CLOB columns.
The aim of this blog is to show you (always from a real user case) how to use one of Oracle Text Indexes (CONTEXT index) to solve performance problem with CLOB column.
The oracle text complete documentation is here : Text Application Developer’s Guide
Let’s start with the following SQL query which take more than 6.18 minutes to execute :
SQL> set timing on SQL> set autotrace traceonly SQL> select * from v_sc_case_pat_hist pat_hist where upper(pat_hist.note) LIKE '%FC IV%'; 168 rows selected. Elapsed: 00:06:18.09 Execution Plan ---------------------------------------------------------- Plan hash value: 1557300260 -------------------------------------------------------------------------------- --------------------- | Id | Operation | Name | Rows | Bytes | Co st (%CPU)| Time | -------------------------------------------------------------------------------- --------------------- | 0 | SELECT STATEMENT | | 521 | 9455K| 24 285 (1)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DWH_CODE | 1 | 28 | 2 (0)| 00:00:01 | |* 2 | INDEX SKIP SCAN | PK_DWH_CODE | 1 | | 1 (0)| 00:00:01 | |* 3 | VIEW | | 521 | 9455K| 24 285 (1)| 00:00:01 | |* 4 | TABLE ACCESS FULL | CASE_PAT_HIST | 521 | 241K| 24 283 (1)| 00:00:01 | -------------------------------------------------------------------------------- --------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("DWH_CODE_NAME"='DWH_PIT_DATE') filter("DWH_CODE_NAME"='DWH_PIT_DATE') 3 - filter("DWH_VALID_FROM"<=TO_NUMBER("PT"."DWH_PIT_DATE") AND "DWH_VALID_TO">TO_NUMBER("PT"."DWH_PIT_DATE")) 4 - filter(UPPER("NOTE") LIKE '%FC IV%') Note ----- - dynamic statistics used: dynamic sampling (level=2) - 1 Sql Plan Directive used for this statement Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 134922 consistent gets 106327 physical reads 124 redo size 94346 bytes sent via SQL*Net to client 37657 bytes received via SQL*Net from client 338 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 168 rows processed SQL>
Checking the execution plan, oracle optimizer does a Full Scan to access the table CASE_PAT_HIST.
As the the sql function upper(pat_hist.note) is used, let’s try to create a function based index :
CREATE INDEX IDX_FBI_I1 ON SCORE.CASE_PAT_HIST (UPPER(note)) * ERROR at line 1: ORA-02327: cannot create index on expression with datatype LOB
The message is clear, we cannot create an index on a column with dataype LOB.
Moreover :
- Even if my column datatype would not be a CLOB, the search criteria LIKE ‘%FC IV%’ prevent the use of any index since the oracle optimizer has no idea from which letter the string get started, so it will scan the whole table.
- Indeed, only the following search criteria will use the index :
- LIKE ‘%FC IV’
- LIKE ‘FC IV%’
- LIKE ‘FC%IV’
So to improve the performance of my SQL query and to index my CLOB column, the solution is to create an Oracle Text Index :
In Oracle 12.1 release, three different Oracle Text Index exists :
- CONTEXT: Suited for indexing collections or large coherent documents.
- CTXCAT: Suited for small documents or text fragments.
- CTXRULE: Used to build a document classification or routing application.
So, let’s try to create an oracle text index of type CONTEXT :
SQL> CREATE INDEX IDX_CPH_I3 ON SCORE.CASE_PAT_HIST LOB(note) INDEXTYPE IS CTXSYS.CONTEXT; Index created. Elapsed: 00:00:51.76 SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SCORE','CASE_PAT_HIST'); PL/SQL procedure successfully completed. Elapsed: 00:00:25.20
Now we have to change the queries Where Clause in order to query it with the CONTAINS operator :
SQL> SELECT * from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 0; 170 rows selected. Elapsed: 00:00:00.82 Execution Plan ---------------------------------------------------------- Plan hash value: 768870586 -------------------------------------------------------------------------------- --------------------- | Id | Operation | Name | Rows | Bytes | Co st (%CPU)| Time | -------------------------------------------------------------------------------- --------------------- | 0 | SELECT STATEMENT | | 2770 | 49M| 3 355 (1)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DWH_CODE | 1 | 28 | 2 (0)| 00:00:01 | |* 2 | INDEX SKIP SCAN | PK_DWH_CODE | 1 | | 1 (0)| 00:00:01 | |* 3 | VIEW | | 2770 | 49M| 3 355 (1)| 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID | CASE_PAT_HIST | 2770 | 1284K| 3 353 (1)| 00:00:01 | |* 5 | DOMAIN INDEX | IDX_CPH_I3 | | | 483 (0)| 00:00:01 | -------------------------------------------------------------------------------- --------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("DWH_CODE_NAME"='DWH_PIT_DATE') filter("DWH_CODE_NAME"='DWH_PIT_DATE') 3 - filter("DWH_VALID_FROM"<=TO_NUMBER("PT"."DWH_PIT_DATE") AND "DWH_VALID_TO">TO_NUMBER("PT"."DWH_PIT_DATE")) 5 - access("CTXSYS"."CONTAINS"("NOTE",'%FC IV%',1)>0) Note ----- - dynamic statistics used: dynamic sampling (level=2) - 1 Sql Plan Directive used for this statement Statistics ---------------------------------------------------------- 59 recursive calls 0 db block gets 3175 consistent gets 417 physical reads 176 redo size 95406 bytes sent via SQL*Net to client 38098 bytes received via SQL*Net from client 342 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 170 rows processed
Now the query is executed in just few millisecond.
By checking the execution plan, we note an access to the table SCORE.CASE_PAT_HIST within a DOMAIN INDEX represented by our Context Index (IDX_CPH_I3).
Now let’s compare the results given by the LIKE and the CONTAINS operators:
SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE upper(note) LIKE '%FC IV%' 2 UNION ALL 3 SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1; COUNT(*) ---------- 168 170
The CONTAINS clause return 2 rows in more, let’s checking :
SQL> select note from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 and case_id in (1,2); NOTE -------------------------------------------------------------------------------- Text Before , functional class (FC) IV Text Before , WHO FC-IV
For the LIKE clause, the wildcard %FC IV% returns :
- Text Before FC IV
- FC IV Text After
- Text Before FC IV Text After
For the CONTAINS clause, the wildcard %FC IV% returns :
- Text Before FC IV
- FC IV Text After
- Text Before FC IV Text After
- Text Before FC%IV
- FC%IV Text After
- Text Before FC%IV Text After
So, in term of functionality LIKE and CONTAINS clause are not exactly the same since the former returns less data than the last one.
If we translate the CONTAINS clause in the LIKE clause, we should write : “LIKE ‘%FC%IV%'”
For my customer case, the CONTAINS clause is correct, the business confirmed me this datas must be returned.
The CONTEXT oracle text index has some limitations :
- You cannot using several CONTAINS through the operand OR / AND, you will face oracle error “ORA-29907: found duplicate labels in primary invocations”
-
SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 or CONTAINS(note, '%FC TZ%', 1) > 1; SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 or CONTAINS(note, '%BE TZ%', 1) > 1 * ERROR at line 1: ORA-29907: found duplicate labels in primary invocations
To solve this issue, let’s rewrite the SQL through an UNION clause :
SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 2 UNION 3 SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%BE TZ%', 1) > 1; COUNT(*) ---------- 170 112
Conclusion :
- The benefits of a creating Oracle Text Index (CONTEXT index in our case) include fast response time for text queries with the CONTAINS, CATSEARCH and MATCHES Oracle Text operators.We decrease the response tie from 6.18 mins to few milliseconds.
- CATSEARCH and MATCHES are respectively the operators used for CTXCAT index and CTXRULE index I will present you in a next blog.
- Transparent Data Encryption-enabled column does not support Oracle Text Indexes.
- Always check if the data returned by the CONTAINS clause correspond to your business needs.