This is Part IV of the article; see Part I here, Part II here, and Part III here.

Instantiating a new docbase

See dbi services for the script’s source.

Roughly speaking, the instantiation process uncompresses a seed, or model, docbase from its compressed tarball and runs the Migration Utility against it to adapt its name and id. These actions are performed by the script instantiate_docbase.sh. Its usage is:

$ instantiate_docbase.sh seed_repo new_repo

where seed_repo and new_repo are stems pointing to the settings of the respective docbase defined in the global_parameters file. The easy-to-remember syntax is the same as that of the cp command, for a similar concept.

Instantiation is always performed locally, so if it needs to be done on a machine different from the one where the model docbase was created, that machine first needs to be enrolled (see the script pre-requisites.sh) and the script invoked from there.

The procedure looks in the ${DOCUMENTUM} tree and changes all the references to the seed docbase so they match the target docbase, including settings such as port numbers, docbroker names, host names, http server memory sizing, symlinks, etc., and then tackles the biggest part, the metadata. For this to succeed, make sure the seed docbase's definition in the global_parameters file has not changed and still matches its installation in the compressed tarball.
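
Conceptually, the text-level part of this renaming pass can be sketched as follows (a simplified illustration run on a scratch directory, not the actual script's code; the real script also adapts ports, docbroker names, symlinks, directory names, etc.):

```shell
#!/bin/bash
# rename_refs: replace every occurrence of the seed docbase's name with the
# new docbase's name in all the files below a given directory;
rename_refs() {
   local root="$1" seed_name="$2" new_name="$3"
   # only touch the files that actually contain the seed's name;
   grep -rlZ "${seed_name}" "${root}" | while IFS= read -r -d '' f; do
      sed -i "s/${seed_name}/${new_name}/g" "${f}"
   done
}

# tiny demo on a throw-away tree;
demo_dir=$(mktemp -d)
mkdir -p ${demo_dir}/dba/config/repo02
echo "docbase_name = repo02" > ${demo_dir}/dba/config/repo02/server.ini
rename_refs ${demo_dir} repo02 bookstore
result=$(cat ${demo_dir}/dba/config/repo02/server.ini)
echo "${result}"
rm -rf ${demo_dir}
```

Note that the demo leaves the directory repo02 itself untouched; the real script renames those too.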

For an Oracle RDBMS, a new schema is created with the required tablespaces and grants, the dump file that was produced by the exp utility is copied onto the database's host, slightly edited (references to the seed's schema are changed to the new instance's schema) and imported into the new schema. Next, as the creation of some views fails with "IMP-00008: unrecognized statement in the export file:" errors, the view creation statements (almost 1000 of them) are extracted from the dump file (although this is a binary file, the SQL statements it contains can be extracted as text and edited), corrected, and re-executed separately, as shown here:

# extract the view creation statements from export file and create the views;
strings ${seed_dmp_file} | sed -r 's/^[ \t$]*(\(*)(SELECT)(.+)/\1\2\3/I' | gawk '{if (sub(/^CREATE VIEW /, "CREATE or REPLACE VIEW ", $0)) {print; getline; pos = 2000; if (length($0) > pos) {while (substr($0, --pos, 1) != ","); print substr($0, 1, pos); print substr($0, pos + 1)} else print; print "/"}}END{print "quit"}' >> create_views.sql

# count the views to be created;
grep "CREATE or REPLACE VIEW" create_views.sql | wc -l
# 977 in CS 22.2;
# 983 in CS 22.4;

# create the views;
sqlplus ${ACTIVE_DATABASE_OWNER}/${ACTIVE_DATABASE_PASSWORD}@${db_remote_connect_string} @create_views.sql

The view text needs to be reformatted to work around the weird legacy format of the dump file.

For good measure, all the new schema’s compilable objects are recompiled and the schema’s statistics computed. A verification of the schema’s object validity is performed:

set echo on
set pagesize 10000
set linesize 100
col object_name format a30
col object_type format a30
select object_name, object_type from user_objects where status = 'INVALID';
select object_name, object_type, status from user_objects;
quit
# output:
no rows selected
OBJECT_NAME                    OBJECT_TYPE                    STATUS
------------------------------ ------------------------------ -------
D_1F00271080000195             INDEX                          VALID
D_1F00271080000241             INDEX                          VALID
DMI_CHANGE_RECORD_S            TABLE                          VALID
D_1F002710800001A8             INDEX                          VALID
DMI_DD_ATTR_INFO_R             TABLE                          VALID
...
DM_INDEXES                     VIEW                           VALID
DM_RESYNC_DD_ATTR_INFO         VIEW                           VALID
DM_RESYNC_DD_TYPE_INFO         VIEW                           VALID
DMI_DD_ATTR_INFO_DDEN          VIEW                           VALID

1931 rows selected.
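
The recompilation and statistics computation mentioned above could be sketched like this (an assumption about the script's internals, using the standard Oracle-supplied packages dbms_utility and dbms_stats; the real SQL may differ):

```shell
#!/bin/bash
# build the recompilation/statistics script for the new schema;
cat - <<'eoq' > recompile.sql
-- recompile all the invalid objects of the current schema;
exec dbms_utility.compile_schema(schema => user, compile_all => false)
-- refresh the optimizer statistics for the whole schema;
exec dbms_stats.gather_schema_stats(ownname => user)
quit
eoq
# then run it as the new schema's owner, e.g.:
# sqlplus ${ACTIVE_DATABASE_OWNER}/${ACTIVE_DATABASE_PASSWORD}@${db_remote_connect_string} @recompile.sql
cat recompile.sql
```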

For a PostgreSQL RDBMS, things are much less complicated. The server port is adapted and the server started. Next, the names of the schema, database and user are adapted and the ownership of each of the schema’s objects is displayed as a verification. Example of output:

  datname  
-----------
 postgres
 template1
 template0
 repo02
(4 rows)

                                   List of roles
 Role name |                         Attributes                         | Member of 
-----------+------------------------------------------------------------+-----------
 dmadmin   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 repo02    | No inheritance                                             | {}

You are now connected to database "repo02" as user "dmadmin".
ALTER SCHEMA
You are now connected to database "postgres" as user "dmadmin".
ALTER DATABASE
NOTICE:  MD5 password cleared because of role rename
ALTER ROLE
ALTER ROLE
ALTER ROLE

You are now connected to database "repo03" as user "repo03".
                                   List of roles
 Role name |                         Attributes                         | Member of 
-----------+------------------------------------------------------------+-----------
 dmadmin   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 repo03    | No inheritance                                             | {}

 current_user 
--------------
 repo03
(1 row)

                                                version                                                
-------------------------------------------------------------------------------------------------------
 PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

  datname  
-----------
 postgres
 template1
 template0
 repo03
(4 rows)

 schemaname |              objectname               | len | objectowner | objecttype 
------------+---------------------------------------+-----+-------------+------------
 repo03     | adm_turbo_size                        |  14 | repo03      | TABLE
 repo03     | creation_date_message_id_ui           |  27 | repo03      | INDEX
 repo03     | d_1f00271280000001                    |  18 | repo03      | INDEX
 repo03     | d_1f00271280000002                    |  18 | repo03      | INDEX
...
 repo03     | dm_federation_log                     |  17 | repo03      | TABLE
 repo03     | dm_federation_r                       |  15 | repo03      | TABLE
 repo03     | dm_federation_rp                      |  16 | repo03      | VIEW
 repo03     | dm_federation_rv                      |  16 | repo03      | VIEW
 repo03     | dm_federation_s                       |  15 | repo03      | TABLE
...
 repo03     | status_no_idx                         |  13 | repo03      | INDEX
(1932 rows)

The above output shows the repo03 account and its roles, a database named repo03 and docbase’s objects owned by user repo03. Also, repo03 was instantiated from repo02 as the seed.
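
Based on the psql output above, the renaming pass presumably boils down to statements like the following (a sketch; the real script's exact commands and password handling may differ):

```shell
#!/bin/bash
# generate the renaming statements for seed repo02 -> new docbase repo03;
# the NOTICE about the cleared MD5 password in the output above explains why
# the role's password must be reset right after the rename;
seed=repo02; new=repo03
cat - <<eoq > rename_db.sql
-- the schema is renamed from within the seed's database;
\connect ${seed}
ALTER SCHEMA ${seed} RENAME TO ${new};
-- a database cannot be renamed while a session is connected to it;
\connect postgres
ALTER DATABASE ${seed} RENAME TO ${new};
ALTER ROLE ${seed} RENAME TO ${new};
ALTER ROLE ${new} PASSWORD '${new}';
eoq
# e.g.: psql -p 6000 -U dmadmin -f rename_db.sql postgres
cat rename_db.sql
```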

As with Oracle, the connectivity is adapted here too, but instead of SQL*Net's tnsnames.ora, the installation owner's ~/.odbc.ini file is edited to match the target docbase's settings, and the reference to it in the server.ini file is adapted as well. To support concurrent instantiations, the file ~/.odbc.ini must be protected against simultaneous edits by a mutex. We apply the simple trick based on the atomicity of the mkdir command:

   bodbc_done=0
   mutex=/tmp/odbc_mutex_in_instantiate_docbase
   mutex_iterations=10
   while [[ ${mutex_iterations} -gt 0 ]]; do
      if mkdir ${mutex} 2> /dev/null; then
         # edit ~/.odbc.ini
...
         rmdir ${mutex}
         bodbc_done=1
         break
      else
         sleep 1
         ((mutex_iterations--))
      fi
   done
   if [[ ${bodbc_done} -ne 1 ]]; then
      echo "could not get a mutex on ~/.odbc.ini, aborting current instance of $0 ..."
      exit 1
   fi

So, the script attempts to get the mutex for up to about 10 seconds (10 one-second attempts), which should be plenty as the edits are very fast. That concludes the RDBMS part.
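
The edit elided at the `# edit ~/.odbc.ini` comment above could look like this (a hypothetical sketch; the section and key names below are assumptions, demonstrated on a scratch copy rather than the real file):

```shell
#!/bin/bash
# edit_odbc_ini: rename the DSN section and repoint its database and port;
edit_odbc_ini() {
   local ini_file="$1" seed="$2" new="$3" new_port="$4"
   sed -i -e "s/^\[${seed}\]/[${new}]/" \
          -e "/^Database *=/s/${seed}/${new}/" \
          -e "s/^Port *=.*/Port = ${new_port}/" "${ini_file}"
}

# demo on a scratch copy instead of the real ~/.odbc.ini;
tmp_ini=$(mktemp)
printf '[repo02]\nDatabase = repo02\nPort = 5432\n' > "${tmp_ini}"
edit_odbc_ini "${tmp_ini}" repo02 bookstore 6000
cat "${tmp_ini}"
```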

The new instance's database password is also saved using Documentum's dm_encrypt_password utility and tested with dmdbtest. At this point, the Migration Utility step can start.

The Migration Utility

The Migration Utility is located in ${DOCUMENTUM}/product/22.2/install/external_apps/MigrationUtil. It takes its instructions from the XML configuration file ${DM_HOME}/install/external_apps/MigrationUtil/config.xml, in which the following relevant entries are edited to match the target docbase's settings:

  • the type of RDBMS, the listener’s (for Oracle) or the server’s port (for PostgreSQL)
  • the database host name
  • the install owner’s password
  • the docbase’s name and database’s password
  • the old and new docbase ids
  • the new docbase server name (this is not a machine name but the server object’s one in dm_server_config)
  • the docbase’s machine name
  • the installation owner’s name
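
Those edits lend themselves to a simple sed substitution per entry (a hypothetical sketch demonstrated on a scratch fragment of the file; the entry keys are the real ones, the values here are examples and the real script may proceed differently):

```shell
#!/bin/bash
# set_entry: replace the value of one <entry key="..."> element in place;
set_entry() {
   local file="$1" key="$2" value="$3"
   sed -i "s|\(<entry key=\"${key}\">\)[^<]*\(</entry>\)|\1${value}\2|" "${file}"
}

# demo on a scratch fragment of config.xml;
tmp_xml=$(mktemp)
echo '<entry key="NewDocbaseID">10002</entry> <!-- New docbase ID -->' > "${tmp_xml}"
set_entry "${tmp_xml}" NewDocbaseID 20000
cat "${tmp_xml}"
```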

Here is an example of the config.xml file generated while instantiating repo02 into bookstore:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
   <comment>Database connection details</comment>
   <entry key="dbms">postgres</entry> <!-- This would be either sqlserver, oracle, db2 or postgres -->
   <entry key="tgt_database_server">localhost</entry> <!-- Database Server host or IP -->
   <entry key="port_number">6000</entry> <!-- Database port number -->
   <entry key="InstallOwnerPassword">dmadmin</entry>
   <entry key="isRCS">no</entry> <!-- set it to yes, when running the utility on secondary CS -->
   <entry key="TomcatPath"></entry> <!-- Optional. set it to appropriate tomcat path if DM_JMS_HOME environment variable is not set -->
   <!-- <comment>List of docbases in the machine</comment> -->
   <entry key="DocbaseName.1">repo02</entry>
   <entry key="DocbaseName.2">Docbase2</entry>
   <entry key="DocbaseName.3"></entry>
   <!-- <comment>docbase owner password</comment> -->
   <entry key="DocbasePassword.1">bookstore</entry>
   <entry key="DocbasePassword.2">Docbase owner password2</entry>
   <entry key="DocbasePassword.3"></entry>
   <entry key="ChangeDocbaseID">yes</entry> <!-- To change docbase ID or not -->
   <entry key="Docbase_name">repo02</entry> <!-- has to match with DocbaseName.1 -->
   <entry key="NewDocbaseID">20000</entry> <!-- New docbase ID -->
   <entry key="ChangeServerName">yes</entry>
   <entry key="NewServerName.1">bookstore</entry>
   <entry key="NewServerName.2"> </entry>
   <entry key="ChangeDocbaseName">yes</entry>
   <entry key="NewDocbaseName.1">bookstore</entry>
   <entry key="NewDocbaseName.2"> </entry>
   <entry key="ChangeHostName">no</entry>
   <entry key="HostName">Old Host name </entry>
   <entry key="NewHostName">New Host name </entry>
   <entry key="ChangeInstallOwner">no</entry>
   <entry key="InstallOwner">Old Install Owner </entry>
   <entry key="NewInstallOwner"> New Install Owner </entry>
   <entry key="NewInstallOwnerPassword">New Install Owner password </entry>
   <entry key="DockerSeamlessUpgrade">no</entry>
   <entry key="PrimaryHost">Primary Host name </entry>
   <entry key="SecondaryHost">Secondary Host name </entry>
   <entry key="PrimaryServerConfig">Primary Server name </entry>
   <entry key="SecondaryServerConfig">Secondary Server name </entry>
   <entry key="DocbaseService">Docbase Service name </entry>
</properties>

Since the Migration Utility is a Java application, it uses JDBC to connect to the database; thus, before launching it, a JDBC connection to the database is attempted in order to check the connectivity parameters. To this effect, a simple but generic utility, jdbc_tester_generic.java, was written specially for this project (see the article A Generic JDBC Tester); it is located in ${scripts_dir} and its usage is:

$ java -cp jdbc_driver_jar:path_to_the_tester jdbc_tester_generic jdbc_driver user_name password jdbc_url sql_statement

# Example for a PostgreSQL database:
$ export CLASSPATH=${dctm_software}/${postgresql_jdbc_package}:$CLASSPATH
$ java -cp $CLASSPATH:${scripts_dir} jdbc_tester_generic org.postgresql.Driver dmadmin xxx jdbc:postgresql://localhost:6000/bookstore "select 'now in postgresql db: ' || to_char(current_timestamp(0), 'DD/MM/YYYY HH24:MI:SS') as now;"
The command-line parameters are:
arg 0 = org.postgresql.Driver
arg 1 = dmadmin
arg 2 = xxx
arg 3 = jdbc:postgresql://localhost:6000/bookstore
arg 4 = select 'now in postgresql db: ' || to_char(current_timestamp(0), 'DD/MM/YYYY HH24:MI:SS') as now;
Loading driver [org.postgresql.Driver]:
JDBC Driver org.postgresql.Driver successfully loaded
Connecting to url [jdbc:postgresql://localhost:6000/bookstore] as user [dmadmin]:
Successful connection to database
Executing SQL query [select 'now in postgresql db: ' || to_char(current_timestamp(0), 'DD/MM/YYYY HH24:MI:SS') as now;]:
Retrieving the result set:
now in postgresql db: 08/01/2023 19:00:53
JDBC connection successfully tested, quitting ...

# Example for an Oracle database;
$ ln -s ${ORACLE_HOME}/ojdbc8.jar ${ORACLE_HOME}/ojdbc.jar
$ export CLASSPATH=${ORACLE_HOME}/ojdbc.jar:$CLASSPATH
# using Oracle thin:
$ java -cp $CLASSPATH:${scripts_dir} jdbc_tester_generic oracle.jdbc.driver.OracleDriver seed seed jdbc:oracle:thin:@//db:1521/pdb1 "select 'now in Oracle db: ' || to_char(sysdate, 'DD/MM/YYYY HH24:MI:SS') as now from dual"
The command-line parameters are:
arg 0 = oracle.jdbc.driver.OracleDriver
arg 1 = seed
arg 2 = seed
arg 3 = jdbc:oracle:thin:@//db:1521/pdb1
arg 4 = select 'now in Oracle db: ' || to_char(sysdate, 'DD/MM/YYYY HH24:MI:SS') as now from dual
Loading driver [oracle.jdbc.driver.OracleDriver]:
JDBC Driver oracle.jdbc.driver.OracleDriver successfully loaded
Connecting to url [jdbc:oracle:thin:@//db:1521/pdb1] as user [seed]:
Successful connection to database
Executing SQL query [select 'now in Oracle db: ' || to_char(sysdate, 'DD/MM/YYYY HH24:MI:SS') as now from dual]:
Retrieving the result set:
now in Oracle db: 08/01/2023 18:51:51
JDBC connection successfully tested, quitting ...

# using Oracle thick, OCI-based drivers and the SQL*Net alias:
$ java -cp $CLASSPATH:${scripts_dir} jdbc_tester_generic oracle.jdbc.driver.OracleDriver seed seed jdbc:oracle:oci:@pdb1 "select 'now in Oracle db: ' || to_char(sysdate, 'DD/MM/YYYY HH24:MI:SS') as now from dual"

The command-line parameters are:
arg 0 = oracle.jdbc.driver.OracleDriver
arg 1 = seed
arg 2 = seed
arg 3 = jdbc:oracle:oci:@pdb1
arg 4 = select 'now in Oracle db: ' || to_char(sysdate, 'DD/MM/YYYY HH24:MI:SS') as now from dual
Loading driver [oracle.jdbc.driver.OracleDriver]:
JDBC Driver oracle.jdbc.driver.OracleDriver successfully loaded
Connecting to url [jdbc:oracle:oci:@pdb1] as user [seed]:
Successful connection to database
Executing SQL query [select 'now in Oracle db: ' || to_char(sysdate, 'DD/MM/YYYY HH24:MI:SS') as now from dual]:
Retrieving the result set:
now in Oracle db: 08/01/2023 18:52:37
JDBC connection successfully tested, quitting ...

This utility works with any RDBMS as long as its JDBC driver is provided. Note how much simpler the syntax for the Oracle thick driver is compared to Oracle thin's or PostgreSQL's: its required parameters are read from the tnsnames.ora using the service name as a key, which allows reusing an existing configuration instead of extracting each required piece of information from it. Unfortunately, but understandably since it is Oracle-specific, the Migration Utility does not use it and favors the more universal, pure-Java thin driver that works with any RDBMS.

Finally, the Migration Utility is launched and the requested changes are performed. Each one has its own log file so it is easy to check them for any error. As an example, here is the log for the change docbase id step:

Reading config.xml from path: config.xml
Reading server.ini parameters
Retrieving server.ini path for docbase: repo02
Found path: /u01/dctm/bookstore/documentum/dba/config/repo02/server.ini
Set the following properties:
Docbase Name:repo02
Docbase ID:10002
New Docbase ID:20000
DBMS: postgres
DatabaseName: bookstore
SchemaOwner: bookstore
ServerName: localhost
PortNumber: 6000
DatabaseOwner: bookstore
-------- PostgreSQL JDBC Connection Testing ------
jdbc:postgresql://localhost:6000/bookstore
Connected to database
Utility is going to modify Objects with new docbase ID
Fri Jan 06 13:52:47 UTC 2023
-----------------------------------------------------------
Processing tables containing r_object_id column
-----------------------------------------------------------
-------- PostgreSQL JDBC Connection Testing ------
jdbc:postgresql://localhost:6000/bookstore
Connected to database
-----------------------------------------------------------
Update the object IDs of the Table: dm_replication_events with new docbase ID:4e20
-----------------------------------------------------------
Processing objectID columns
-----------------------------------------------------------
Getting all ID columns from database
-----------------------------------------------------------
Processing ID columns in each documentum table
Column Name: job_id
Update the ObjectId columns of the Table: with new docbase ID
-----------------------------------------------------------
Update the object IDs of the Table: dm_state_extension_s with new docbase ID:4e20
-----------------------------------------------------------
Processing objectID columns
-----------------------------------------------------------
Getting all ID columns from database
-----------------------------------------------------------
Processing ID columns in each documentum table
Column Name: r_object_id
Update the ObjectId columns of the Table: with new docbase ID
...

where ID 4e20 is the hexadecimal representation of bookstore's docbase id 20000. We can deduce that, in all the tables of the bookstore schema, every column containing ids is scanned for the old value as a substring and updated with the new value. Understandably, these changes can put quite a heavy load on the database and are quite delicate, since any failure can corrupt its integrity; fortunately, they are done in an efficient manner. In our case, however, with an out-of-the-box seed docbase, there are very few objects to change, so this is not a concern and the utility completes very quickly.
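
As a quick check of that deduction: a 16-character object id is made of a 2-digit type tag, the docbase id as 6 hexadecimal digits, and an 8-digit object number, so the rewrite amounts to swapping characters 3 to 8 (an illustration, not the utility's actual code):

```shell
#!/bin/bash
# repo02's docbase id is 10002, bookstore's is 20000 (0x4e20, as in the log);
old_hex=$(printf "%06x" 10002)   # 002712
new_hex=$(printf "%06x" 20000)   # 004e20
old_id=1f00271280000001
new_id=${old_id:0:2}${new_hex}${old_id:8}
echo "${old_id} -> ${new_id}"
# per id column and table, the corresponding SQL would resemble:
# update some_table set r_object_id = substr(r_object_id, 1, 2) || '004e20' || substr(r_object_id, 9);
```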

Although the resulting docbase is perfectly usable, the Migration Utility leaves lots of things unchanged, e.g. the dm_location's root directories on disk (they still contain the seed docbase's hexadecimal id), the name of the index tablespace (if Oracle is used), the ACLs' owner (i.e. the ACLs' domain is still the seed's database owner; this does not jeopardize the database but is not justified in an empty docbase; actually, by keeping them unchanged, potential applications that refer to acl_domain/object_name pairs are protected), etc. Thus, the Documentum services are started and a final corrective pass is executed to normalize all those remaining traces of the prior incarnation.

A few jobs, such as dm_ContentWarning, dm_StateOfDocbase and dm_ConsistencyChecker, are forced to execute as soon as possible so that some more post-instantiation checks can be done. The db-crawler script presented in the article db-crawler, a database search utility for Documentum is also launched in the background at the end against the Oracle schema, to check whether any remains of the former seed docbase are still present. Likewise, for PostgreSQL schemas, the pg_dump utility is run and the resulting text file is grepped for any occurrence of the former seed docbase, which is basically what db-crawler does, except much quicker.
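
For PostgreSQL, that leftover check is essentially the following (a sketch with placeholder connection parameters, demonstrated here on a fake dump file instead of a real pg_dump output):

```shell
#!/bin/bash
# grep the plain-text dump of the new schema for any trace of the seed;
seed=repo02
dump_file=bookstore.dmp
# e.g.: pg_dump --port=6000 --username=bookstore --file=${dump_file} bookstore
# demo with a fake dump instead of a real pg_dump output:
printf 'CREATE TABLE dm_sysobject_s (...);\nALTER SCHEMA bookstore OWNER TO bookstore;\n' > "${dump_file}"
leftovers=$(grep -ci "${seed}" "${dump_file}" || true)
echo "occurrences of ${seed}: ${leftovers}"   # 0 means a clean instantiation
rm "${dump_file}"
```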

As with create_docbase.sh, and for the same reason, multiple concurrent instantiations may be launched simultaneously, as follows:

# Example:
for i in {1..10}; do
   time ./instantiate_docbase.sh SEED "DCTM${i}" &
done
where SEED and DCTM1_* to DCTM10_* are the respective docbases' stems "pointing" to the settings defined in the global_parameters file.

# Another example:
./instantiate_docbase.sh DCTM2 bookstore &
./instantiate_docbase.sh SEED sales &
./instantiate_docbase.sh PUBLICATIONS research_papers &

See Part V here