By Franck Pachot

.
In the previous post we have seen how the OracleText index tables are maintained when new document arrives: At sync the new documents are read up to the available memory and words are inserted in the $I table with their mapping information. Now we will see how removed documents are processed. We will not cover updates as their are just delete + insert.

Previous state

Here is the state from the previous post where I had those 3 documents:

  • ‘Hello World’

which was synced alone, and then the two following ones were synced together:

  • ‘Hello Moon, hello, hello’
  • ‘Hello Mars’

The $K is a IOT which maps the OracleText table ROWID to the DOCID (the fact that the primary key TEXTKEY is not at start is a bit misleading):

SQL> select * from DR$DEMO_CTX_INDEX$K;

     DOCID TEXTKEY
---------- ------------------
         1 AAAXUtAAKAAABWlAAA
         2 AAAXUtAAKAAABWlAAB
         3 AAAXUtAAKAAABWlAAC

The $R is a table mapping the opposite navigation (docid to rowid) storing a fixed-length array of ROWIDs indexed by the docid, and split into several lines:

SQL> select * from DR$DEMO_CTX_INDEX$R;

    ROW_NO DATA
---------- ------------------------------------------------------------
         0 00001752D000280000056940414100001752D00028000005694041420000

The $I table stores the tokens, the first 5 columns being indexed ($X) and the TOKEN_INFO blob stores detailed location of the token:

SQL> select * from DR$DEMO_CTX_INDEX$I;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802

We have seen that the $I table can be fragmented for 3 reasons:

  • Each sync insert his tokens (instead of merging with other ones)
  • TOKEN_INFO size is limited to fit in-row (we will see 12c new features later)
  • Only tokens that fit in the allocated memory can be merged

And the $N is empty for the moment:

SQL> select * from DR$DEMO_CTX_INDEX$N;

no rows selected

Delete

Do you remember how inserts are going to the CTXSYS.DR$PENDING table? Deletes are going to CTXSYS.DR$DELETE table:

SQL> delete from DEMO_CTX_FRAG where num=0002;

1 row deleted.

SQL> select * from CTXSYS.DR$DELETE;

DEL_IDX_ID DEL_IXP_ID  DEL_DOCID
---------- ---------- ----------
      1400          0          2

I’ve deleted docid=2 but the tokens are still there:

SQL> select * from DR$DEMO_CTX_INDEX$I;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802

as well as their mapping to the ROWID:

SQL> -- $R is for rowid - docid mapping (IOT)
SQL> select * from DR$DEMO_CTX_INDEX$R;

    ROW_NO DATA
---------- ------------------------------------------------------------
         0 00001752D000280000056940414100001752D00028000005694041420000

However, the $N has been maintained to know that docid=2 has been removed:

SQL> select * from DR$DEMO_CTX_INDEX$N;

 NLT_DOCID N
---------- -
         2 U

This is the goal of $N (Negative) table which records the docid that should not be there and that must be deleted at next optimization (garbage collection).

From there, a search by words (‘normal lookup’) will give docid’s and rowid’s and the CTXSYS.DR$DELETE must be read in order to know that the document is not there anymore. It’s an IOT and the docid can be found with an index unique scan.

However for the opposite way, having a ROWID and checking if it contains some words (‘functional lookup’) we need to know that there is no document. In my case I deleted the row, but you may update the document, so the ROWID is still there. There is no pending table for that. It is maintained immediately in the $K table:

SQL> select * from DR$DEMO_CTX_INDEX$K;

     DOCID TEXTKEY
---------- ------------------
         1 AAAXUtAAKAAABWlAAA
         3 AAAXUtAAKAAABWlAAC

the entry that addressed docid=2 has been deleted.

Commit

All those changes were done within the same transaction. Other sessions still see the old values. No need to read CTXSYS.DR$DELETE for them. What I described above is only for my session: the normal lookup reading the queuing table, and the functional lookup stopping at $K. We don’t have to wait a sync to process CTXSYS.DR$DELETE. It’s done at commit:

SQL> commit;

Commit complete.

SQL> select * from CTXSYS.DR$DELETE;

no rows selected

SQL> select * from DR$DEMO_CTX_INDEX$R;

    ROW_NO DATA
---------- ------------------------------------------------------------
         0 00001752D000280000056940414100000000000000000000000000000000

Of course we can’t read it but we see that part of it has been zeroed. That $R table is definitely special: it’s not stored in a relational way, and its maintenance is deferred at commit time.

But nothing has changed in $I which contains garbage (and sync is not changing anything to that):

SQL> select * from DR$DEMO_CTX_INDEX$I;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------- ---------- ----------- ---------- ----------- ----------
HELLO               0           1          1           1 008801
WORLD               0           1          1           1 008802
HELLO               0           2          3           2 0098010201
MARS                0           3          3           1 008802
MOON                0           2          2           1 008802

And of course $N row is still there to record the deleted docid:

SQL> select * from DR$DEMO_CTX_INDEX$N;

 NLT_DOCID N
---------- -
         2 U

Sync

I’ve not reproduced it here, but sync is not changing anything. Sync is for new documents – not for deleted ones.

Conclusion

What you need to remember here is:

  • New documents are made visible through OracleText index at sync
  • Removed document are immediately made invisible at commit

Of course, you can sync at commit, but the second thing to remember is that

  • New documents brings fragmentation
  • Removed document brings garbage

and both of them increase the size of the $I table and its $X index, making range scans less efficient. We will see more about that but the next post will be about queries. I’ve talked about normal and functional lookups and we will see how they are done. Let’s detail that.