{"id":4362,"date":"2015-02-15T18:37:29","date_gmt":"2015-02-15T17:37:29","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/"},"modified":"2015-02-15T18:37:29","modified_gmt":"2015-02-15T17:37:29","slug":"oracletext-inserts-and-fragmentation","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/","title":{"rendered":"OracleText: inserts and fragmentation"},"content":{"rendered":"<h2>By Franck Pachot<\/h2>\n<p>.<\/p>\n<p>\nI plan to write several posts about OracleText indexes, which is a feature that is not used enough in my opinion. It&#8217;s available in all editions and can index small text or large documents to search by words. When you create an OracleText index, a few tables are created to store the words and the association between those words and the table row that contains the document. I&#8217;ll start to show how document inserts are processed.\n<\/p>\n<h3>Create the table and index<\/h3>\n<p>\nI&#8217;m creating a simple table with a CLOB\n<\/p>\n<pre><code>SQL&gt; create table DEMO_CTX_FRAG\n     (num number constraint DEMO_CTX_FRAG_PK primary key,txt clob);\n\nTable created.\n<\/code><\/pre>\n<p>and a simple OracleText on that column<\/p>\n<pre><code>SQL&gt; create index DEMO_CTX_INDEX on DEMO_CTX_FRAG(txt)\n     indextype is ctxsys.context;\n\nIndex created.\n<\/code><\/pre>\n<p>That creates the following tables:<\/p>\n<ul>\n<li>DR$DEMO_CTX_INDEX$I which stores the tokens (e.g words)<\/li>\n<li>DR$DEMO_CTX_INDEX$K which index the documents (docid) and links them to the table ROWID<\/li>\n<li>DR$DEMO_CTX_INDEX$R which stores the opposite way navigation (get ROWID from a docid)<\/li>\n<li>DR$DEMO_CTX_INDEX$N which stores docid for deferred maintenance cleanup.<\/li>\n<\/ul>\n<h3>Inserts<\/h3>\n<p>\nI&#8217;m inserting a row with some text in the clob column\n<\/p>\n<pre><code>SQL&gt; insert into DEMO_CTX_FRAG values (0001,'Hello World');\n\n1 row created.\n<\/code><\/pre>\n<p>I commit<\/p>\n<pre><code>SQL&gt; commit;\n\nCommit complete.\n<\/code><\/pre>\n<p>And here is what we have in the OracleText tables:<\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$K;\nno rows selected\n\nSQL&gt; select * from DR$DEMO_CTX_INDEX$R;\nno rows selected\n\nSQL&gt; select * from DR$DEMO_CTX_INDEX$I;\nno rows selected\n\nSQL&gt; select * from DR$DEMO_CTX_INDEX$N;\nno rows selected\n<\/code><\/pre>\n<p>Nothing is stored here yet. Which means that we cannot find our newly inserted row from an OracleText search.<\/p>\n<p>\nBy default, all inserts maintain the OracleText tables asynchronously.<br \/>\nThe inserted row is referenced in a CTXSYS queuing table that stores the pending inserts:\n<\/p>\n<pre><code>SQL&gt; select * from CTXSYS.DR$PENDING;\n\n   PND_CID    PND_PID PND_ROWID          PND_TIMES P\n---------- ---------- ------------------ --------- -\n      1400          0 AAAXUtAAKAAABWlAAA 13-FEB-15 N\n<\/code><\/pre>\n<p>and we have a view over it:<\/p>\n<pre><code>SQL&gt; select pnd_index_name,pnd_rowid,pnd_timestamp from ctx_user_pending;\n\nPND_INDEX_NAME                 PND_ROWID          PND_TIMES\n------------------------------ ------------------ ---------\nDEMO_CTX_INDEX                 AAAXUtAAKAAABWlAAA 13-FEB-15\n<\/code><\/pre>\n<\/p>\n<h3>Synchronization<\/h3>\n<p>\nlet&#8217;s synchronize:\n<\/p>\n<pre><code>SQL&gt; exec ctx_ddl.sync_index('DEMO_CTX_INDEX');\n\nPL\/SQL procedure successfully completed.\n<\/code><\/pre>\n<p>The queuing table has been processed:<\/p>\n<pre><code>SQL&gt; select pnd_index_name,pnd_rowid,pnd_timestamp from ctx_user_pending;\n\nno rows selected\n<\/code><\/pre>\n<p>and here is how that document is sotred in our OracleText tables.<\/p>\n<p>\n$K records one document (docid=1) and the table rowid that contains it:\n<\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$K;\n\n     DOCID TEXTKEY\n---------- ------------------\n         1 AAAXUtAAKAAABWlAAA\n<\/code><\/pre>\n<p>$R table stores the docid -&gt; rowid is a non-relational way:<\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$R;\n\n    ROW_NO DATA\n---------- ------------------------------------------------------------\n         0 00001752D0002800000569404141\n<\/code><\/pre>\n<p>How is it stored? It&#8217;s an array of ROWIDs which are fixed length. Then from the docid we can directly go to the offset and get the rowid. Because DATA is limited to 4000 bytes, there are several rows. But a docid determines the ROW_NO as well as the offset in DATA.<\/p>\n<p>\n$I stores the tokens (which are the words here as we have TEXT token &#8211; which is the type 0) as well as their location information:\n<\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$I;\n\nTOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO\n---------- ---------- ----------- ---------- ----------- ----------\nHELLO               0           1          1           1 008801\nWORLD               0           1          1           1 008802\n<\/code><\/pre>\n<p>For each word it stores the range of docid that contains the work (token_first and token_last are those docid) and token_info stores in an binary way the occurrences of the word within the documents (it stores pairs of docid and offest within the document). It&#8217;s a BLOB but is limited to 4000 bytes so that it is stored inline. Which means that if a token is present in a lot of document, several lines in $I will be needed, each covering a different range of docid. This has changed in 12c and we will see that in future blog posts.<\/p>\n<p>\nThus, we can have several rows for one token. This is the first cause of fragmentation. Searching for documents that contain such a word will have to read several lines of the $I table.<\/p>\n<p>The $N has nothing here because we synchronized only inserts and there is nothing to cleanup.\n<\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$N;\n\nno rows selected\n<\/code><\/pre>\n<\/p>\n<h3>Several inserts<\/h3>\n<p>\nI will insert two lines, which also contain the &#8216;hello&#8217; word.\n<\/p>\n<pre><code>SQL&gt; insert into DEMO_CTX_FRAG values (0002,'Hello Moon, hello, hello');\n\n1 row created.\n\nSQL&gt; insert into DEMO_CTX_FRAG values (0003,'Hello Mars');\n\n1 row created.\n\nSQL&gt; commit;\n\nCommit complete.\n<\/code><\/pre>\n<p>And I synchronize:<\/p>\n<pre><code>SQL&gt; exec ctx_ddl.sync_index('DEMO_CTX_INDEX');\n\nPL\/SQL procedure successfully completed.\n<\/code><\/pre>\n<p>So, I&#8217;ve now 3 documents: <\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$K;\n\n     DOCID TEXTKEY\n---------- ------------------\n         1 AAAXUtAAKAAABWlAAA\n         2 AAAXUtAAKAAABWlAAB\n         3 AAAXUtAAKAAABWlAAC\n<\/code><\/pre>\n<p>The reverse mapping array has increased:<\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$R;\n\n    ROW_NO DATA\n---------- ------------------------------------------------------------\n         0 00001752D000280000056940414100001752D00028000005694041420000\n<\/code><\/pre>\n<p>And now the tokens:<\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$I;\n\nTOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO\n---------- ---------- ----------- ---------- ----------- ----------\nHELLO               0           1          1           1 008801\nWORLD               0           1          1           1 008802\nHELLO               0           2          3           2 0098010201\nMARS                0           3          3           1 008802\nMOON                0           2          2           1 008802\n<\/code><\/pre>\n<p>What is interesting here is that the previous lines (docid 1) have not been updated and new lines have been inserted for docid 2 and 3.<\/p>\n<ul>\n<li>&#8216;moon&#8217; is only in docid 2<\/li>\n<li>&#8216;mars&#8217; is only in docid 3<\/li>\n<li>&#8216;hello&#8217; is in 2 (token_count) documents, from docid 2 to docid 3 (token_first and token_last)<\/li>\n<\/ul>\n<p>\nThis is the other cause of fragmentation coming from frequent sync. Each sync will add new rows. However, when multiple documents are processed in the same sync, then only one $I entry per token is needed.\n<\/p>\n<p>\nThere is a third cause of fragmentation. We see here that the token_info is larger for that HELLO covering docid 2 to 3 because there are several occurrences of the token. All that must fit in memory when we synchronize. So it&#8217;s good to synchronize when we have several documents (so that the common tokens are not too fragmented) but we need also to have enough memory. The default is 12M and is usually too small. It can be increased with the &#8216;index memory&#8217; parameter of the index. And there is also a maximum set by ctx_adm.set_parameter for which the default (50M) is also probably too low.\n<\/p>\n<p>\nNothing yet in the $N table that we will see in the next post:\n<\/p>\n<pre><code>SQL&gt; select * from DR$DEMO_CTX_INDEX$N;\n\nno rows selected\n<\/code><\/pre>\n<\/p>\n<h3>summary<\/h3>\n<p>\nThe important points here is that inserted document are visible only after synchronization, and synchronizing too frequently will cause fragmentation. If you need to synchronize in real time (on commit) and you commit for each document inserted, then you will probably have to plan frequent index optimization. If on the other hand we are able to synchronize only when we have inserted a lot of documents, then fragmentation is reduced according that we had enough memory to process all documents in one pass.\n<\/p>\n<p>The <a href=\"https:\/\/www.dbi-services.com\/index.php\/blog\/entry\/oracletext-deletes-and-garbage\">next<\/a> post will be about deletes and updates.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Franck Pachot . I plan to write several posts about OracleText indexes, which is a feature that is not used enough in my opinion. It&#8217;s available in all editions and can index small text or large documents to search by words. When you create an OracleText index, a few tables are created to store [&hellip;]<\/p>\n","protected":false},"author":27,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[198,59],"tags":[96],"type_dbi":[],"class_list":["post-4362","post","type-post","status-publish","format-standard","hentry","category-database-management","category-oracle","tag-oracle"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>OracleText: inserts and fragmentation - dbi Blog<\/title>\n<meta name=\"description\" content=\"How OracleText processes inserts.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"OracleText: inserts and fragmentation\" \/>\n<meta property=\"og:description\" content=\"How OracleText processes inserts.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2015-02-15T17:37:29+00:00\" \/>\n<meta name=\"author\" content=\"Oracle Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Oracle Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/\"},\"author\":{\"name\":\"Oracle Team\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/66ab87129f2d357f09971bc7936a77ee\"},\"headline\":\"OracleText: inserts and fragmentation\",\"datePublished\":\"2015-02-15T17:37:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/\"},\"wordCount\":852,\"commentCount\":0,\"keywords\":[\"Oracle\"],\"articleSection\":[\"Database management\",\"Oracle\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/\",\"name\":\"OracleText: inserts and fragmentation - dbi Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\"},\"datePublished\":\"2015-02-15T17:37:29+00:00\",\"author\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/66ab87129f2d357f09971bc7936a77ee\"},\"description\":\"How OracleText processes inserts.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.dbi-services.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"OracleText: inserts and fragmentation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#website\",\"url\":\"https:\/\/www.dbi-services.com\/blog\/\",\"name\":\"dbi Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/66ab87129f2d357f09971bc7936a77ee\",\"name\":\"Oracle Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/f711f7cd2c9b09bf2627133755b569fb5be0694810cfd33033bdd095fedba86d?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f711f7cd2c9b09bf2627133755b569fb5be0694810cfd33033bdd095fedba86d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f711f7cd2c9b09bf2627133755b569fb5be0694810cfd33033bdd095fedba86d?s=96&d=mm&r=g\",\"caption\":\"Oracle Team\"},\"url\":\"https:\/\/www.dbi-services.com\/blog\/author\/oracle-team\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"OracleText: inserts and fragmentation - dbi Blog","description":"How OracleText processes inserts.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/","og_locale":"en_US","og_type":"article","og_title":"OracleText: inserts and fragmentation","og_description":"How OracleText processes inserts.","og_url":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/","og_site_name":"dbi Blog","article_published_time":"2015-02-15T17:37:29+00:00","author":"Oracle Team","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Oracle Team","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/"},"author":{"name":"Oracle Team","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/66ab87129f2d357f09971bc7936a77ee"},"headline":"OracleText: inserts and fragmentation","datePublished":"2015-02-15T17:37:29+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/"},"wordCount":852,"commentCount":0,"keywords":["Oracle"],"articleSection":["Database management","Oracle"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/","url":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/","name":"OracleText: inserts and fragmentation - dbi Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"datePublished":"2015-02-15T17:37:29+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/66ab87129f2d357f09971bc7936a77ee"},"description":"How OracleText processes inserts.","breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/oracletext-inserts-and-fragmentation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"OracleText: inserts and fragmentation"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/66ab87129f2d357f09971bc7936a77ee","name":"Oracle Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/f711f7cd2c9b09bf2627133755b569fb5be0694810cfd33033bdd095fedba86d?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/f711f7cd2c9b09bf2627133755b569fb5be0694810cfd33033bdd095fedba86d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f711f7cd2c9b09bf2627133755b569fb5be0694810cfd33033bdd095fedba86d?s=96&d=mm&r=g","caption":"Oracle Team"},"url":"https:\/\/www.dbi-services.com\/blog\/author\/oracle-team\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/4362","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/27"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=4362"}],"version-history":[{"count":0,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/4362\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=4362"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?post=4362"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=4362"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=4362"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}