dbi Blog

Some things never change: gravity, human foolishness, and the length of content servers’ installation time. Indeed, have you ever noticed how long it takes to create a new Documentum repository? I’m not talking about the time spent interactively at a keyboard typing commands or answering prompts from the installation and configuration tools, which is just tedious, but about the time actually spent by those tools to deliver the new repository once all the settings have been provided. Whether it happens in a VM or a container, interactively or automated, the underlying approach is the same slow one. Most of that time is spent inside the database creating and populating tables (414 tables), views (974 views) and indexes (536 indexes, including the implicit ones for non-nullity constraints), for a total of 1928 user objects in ContentServer v22.2, plus initializing the new repository with system objects, which ultimately also affects the database. Comparatively, not much time is spent copying and configuring files, even though there are about 13’000 of them in CS v22.2. Clearly, if the database part could be sped up, the whole process would be much less time-consuming. A lot of time, in the order of several minutes in total, is also spent waiting for processes to start or stop, e.g. the method server; removing these inefficient delays could also substantially reduce the overall installation time.

So, here is a challenge: is there a way to speed up the creation of a repository by optimizing some of its steps? As you can guess, there is, partly thanks to a tool Documentum has made available since Content Server v7.3, the Migration Utility.

In this article, I propose to use an out-of-the-box docbase as a seed repository, copy it along with the Documentum binaries wherever it is needed, on the same host or on a different one, and launch Documentum’s Migration Utility to change its name, docbase id, host name and/or server name as needed. The procedure, let’s call it henceforth docbase instantiation, can be repeated as many times as needed, even in parallel (which is normally not recommended by the vendor), whenever new, fresh, distinct repositories are necessary, either temporarily or permanently.

Besides the technical challenge, the use cases are aplenty: quickly instantiate a docbase to rehearse a migration, apply an update or patch procedure, deploy a development docbase, test a Documentum feature inside a fresh repository, test an application’s deployment, roll back some heavy customizations, run unit or application tests, etc., basically anything that pertains to development and testing. To be fair, nothing precludes applying this procedure in production once one has gained sufficient confidence in the docbases churned out this way.

To whet your appetite, let’s just say that on my 8-year-old, puffing and panting but sturdy laptop, instantiating a new repository out of the seed one with a new name and id takes less than 6 minutes, considerably less than the 25 minutes of a standard, even silent, creation on the same machine. On a much more recent 4-year-old machine, with a lighter virtualization configuration (linux containers vs. VMs), it took 2.5 minutes, down from 15 minutes. On the same infrastructure but using a private and local PostgreSQL database instead of a remote Oracle one, this time drops to less than 1 minute. Did I get your attention yet? If so, read on!

Of course, the absolute timings don’t mean much as they depend on the infrastructure in use, but the ratios do: at least a remarkable 4-fold gain on old hardware, and a more than 15-fold speed increase with a local PostgreSQL database on more recent hardware/infrastructure. Even on faster hardware, where the speed gain is less appealing, the automated instantiation procedure is still convenient thanks to its simplicity.
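As a quick sanity check, the ratios can be recomputed from the timings quoted above. The sketch below treats "less than 6 minutes" and "less than 1 minute" as the upper bounds 6 and 1, so the computed ratios are lower bounds, and it assumes the 15-minute baseline also applies to the PostgreSQL case.

```python
# Timings in minutes, taken from the figures quoted above.
# "less than N minutes" is rounded up to N, so each ratio is a lower bound.
timings = {
    "old laptop, VM, remote Oracle": (25.0, 6.0),
    "recent machine, container, remote Oracle": (15.0, 2.5),
    "recent machine, container, local PostgreSQL": (15.0, 1.0),
}


def speedup(standard_minutes, instantiation_minutes):
    """Return how many times faster an instantiation is than a standard creation."""
    return standard_minutes / instantiation_minutes


ratios = {name: speedup(*pair) for name, pair in timings.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.1f}x faster")
```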

The seed docbase approach

So far, we used to clone repositories using long and daunting procedures (see e.g. Knowledge Base articles KB8966391, KB8715468, KB0528105, and KB0499567 at OpenText). While renaming a docbase is explained in OTX note KB7713200, before the Migration Utility there was no official in-place way to change a docbase id. Such a change must be applied to the metadata of all the docbase’s objects and to all the references to object ids in configuration files on disk (e.g. in server.ini). The risk is to forget or misinterpret some docbase id embedded in strings in the database or in disk files; thus, only the vendor could really propose an exhaustive and reliable way to do it. Without the Migration Utility, the only safe way at our disposal was to first create an empty repository with the new name and id, and then import the original repository’s documents into it, quite a complicated and boring process. Consequently, it was easier to just copy a whole docbase somewhere else and use it from there. Such clones are nonetheless quite practical as long as they don’t attempt to project to the same docbroker(s) (i.e. they don’t have any docbroker in common) and their docbrokers are not listed in the same client’s dfc.properties file. A DFC client wishing to connect to any of the clones would either use a tailored dfc.properties file with the clone docbroker’s host as the sole occurrence of the clone docbase (plus, optionally, any other docbroker hosts of non-conflicting repositories), as discussed in the article Connecting to a Repository via a Dynamically Edited dfc.properties File (part I), or use the enhancement presented in the article Connecting to Repositories with the Same Name and/or ID, which requires some code editing and is therefore not possible in closed-source applications.
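To make the tailored dfc.properties idea concrete, here is a minimal sketch that renders only the docbroker entries of such a file. The host name is hypothetical, and 1489 is merely the usual default docbroker port; a real file would contain other settings as well.

```python
def render_dfc_properties(docbrokers):
    """Build the docbroker section of a dfc.properties file.

    docbrokers is a list of (host, port) tuples; to reach a clone safely,
    its own docbroker should be the only listed one that knows that docbase.
    """
    lines = []
    for index, (host, port) in enumerate(docbrokers):
        lines.append(f"dfc.docbroker.host[{index}]={host}")
        lines.append(f"dfc.docbroker.port[{index}]={port}")
    return "\n".join(lines) + "\n"


# Hypothetical host name for the clone's docbroker.
clone_only = render_dfc_properties([("clone-host.example.com", 1489)])
print(clone_only)
```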

Experience shows that while stand-alone clones are acceptable at first, sooner or later they must be accessed together by the same client (e.g. some third-party service such as a pdf rendition service), and that’s where the name and id restrictions kick in. Failure to remember their common origin yields strange, hard to diagnose error messages, e.g. non-existing objects with ids seemingly coming out of thin air.

To remove all these limitations, the repository’s name and id have to be changed, which is the Migration Utility’s main purpose. Once done, as many clone repositories as needed can run on any machine, the same machine or different ones, and project to common docbrokers, or dedicated ones listed in the same client’s dfc.properties file. This enhancement was missed for so long that it is a real relief to finally have it, so kudos to OpenText.

Besides the docbase’s name and id, the Migration Utility also allows changing the host name, the owner name, the server name and/or the password, all from one single place. The most time-consuming part is the id change because each and every repository object has an id (or more) containing the 6 hexadecimal digits of the docbase id, which must be corrected. However, in a freshly created docbase containing fewer than 1’800 objects, this step is completed quickly at the database level. The Migration Utility itself can be used against any repository, not necessarily an empty one, but of course the more objects it contains (e.g. in a production repository), the longer it can potentially take to complete, with considerable stress on the database if the update operation is transactional. Here, as we are starting with an empty docbase, the migration cannot go any faster and, in case of a failure, the instantiation procedure using the seed can be restarted confidently as many times as needed after performing some clean-up to roll back the changes done so far.
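To illustrate why the id change touches every object, the following sketch rewrites the docbase id embedded in an object id. It assumes the commonly documented layout of a 16-hex-digit object id (a 2-digit type tag, the docbase id in hexadecimal padded to 6 positions, then an 8-digit sequence part); the function names are made up for the example and have nothing to do with how the Migration Utility works internally.

```python
def docbase_id_to_hex(docbase_id):
    """Return the zero-padded hexadecimal form of a docbase id as embedded in object ids."""
    if not 1 <= docbase_id <= 0xFFFFFF:
        raise ValueError("docbase id must fit in 6 hexadecimal digits")
    return f"{docbase_id:06x}"


def rebase_object_id(object_id, old_docbase_id, new_docbase_id):
    """Replace the embedded docbase id if the object belongs to the old docbase."""
    if len(object_id) != 16:
        raise ValueError("an object id is expected to have 16 hexadecimal digits")
    type_tag, embedded, sequence = object_id[:2], object_id[2:8], object_id[8:]
    if embedded.lower() != docbase_id_to_hex(old_docbase_id):
        return object_id  # id from another docbase: leave it untouched
    return type_tag + docbase_id_to_hex(new_docbase_id) + sequence


# Hypothetical ids: docbase 50000 (hex 00c350) becomes docbase 50001 (hex 00c351).
print(rebase_object_id("0900c35080000123", 50000, 50001))
```

The difficulty mentioned earlier is that such ids do not only live in dedicated id columns; they can also be embedded in strings and configuration files, which a naive rewrite like this one would miss.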

In order to prevent distracting issues, the copied repository must run under the same O/S version as the original one, which won’t be a limitation if the host is the same as the seed’s (e.g. an identical machine or a precisely configured O/S such as in containers). The RDBMS’s version can be more relaxed as Documentum only uses very basic functionalities from it (tables, views and indexes, and foreign keys, not much more, although Oracle’s index-organized tables were also used on a few occasions), but it must be the same RDBMS software as the seed docbase’s (i.e. either Oracle or PostgreSQL in both the seed and the copy; the other RDBMS supported by Documentum, DB2 and SQL Server, were not tested). In this project, we worked in the following environments:

O/S

Oracle Linux 8.4 in a VirtualBox VM; also tested with Ubuntu Linux 22.04 (Jammy Jellyfish) in a linux container managed by Proxmox.

RDBMS

Oracle RDBMS v12.1.0.1.0 (12c) on a remote VM, the same database for both the seed and clone docbases but distinct schemas of course.

Also tested with Oracle RDBMS v21.3.0.0.0 in a linux container running Oracle Linux 8.6, and with PostgreSQL v15.1, the most recent as of this writing.

Database connectivity

Oracle InstantClient 21.7.0.0 and ODBC drivers v42.5.1 for PostgreSQL.

For the Migration Utility: Oracle JDBC drivers included with the InstantClient and JDBC v42.5.1 for PostgreSQL.

JDK

AWS Corretto JDK v11.0.14.

Content Server

Content Server v22.2 and v22.4.

Some of the above products’ versions are not explicitly mentioned in the system requirement guides, which does not mean they don’t work, only that they were not tested, which is understandable given the large number of possible combinations of products and versions. In this project, we seized the opportunity to test the latest available version of each component, downgrading only when an issue was encountered, a way to stay on the bleeding edge of the platform.

Since a docbase consists of the Documentum binaries, configuration files, content files (collectively grouped under the “Documentum files”) and a database schema, each of these parts of a seed repository must be addressed. Once the seed docbase along with its docbroker and method server have been created and shut down, its logs and caches will be deleted, and the ${DOCUMENTUM} tree and the content files will be moved into a tarball. When using an Oracle database, the docbase’s schema will be exported and the dump file added to the archive. When using a PostgreSQL database, the whole tree, including binaries and data, will be added to the archive. The compressed archive can then be stored in a convenient location (e.g. a network drive, a git or nexus repository, etc.), ready to be used by anyone, mostly developers or administrators. After that, the seed docbase can be removed from the host and from the database (if using Oracle) as it is no longer needed. Should it ever be, it can be restored from the tarball or instantiated out of itself since, unsurprisingly, the instantiation process is idempotent.

On the receiving end, the instantiation procedure will explode the tarball to another root directory, possibly on another machine, after having created an O/S account for the content server files’ owner. Next, if an Oracle database is used, it will create a db account for the new docbase and import the dumped data into the db schema. If a PostgreSQL database is used, no data import is necessary since the whole database, binaries included, has been extracted from the tarball; only some quick renaming of the database, schema and user is performed. Next, a few adjustments will be applied in the restored configuration files to match their new location and schema, and finally the Migration Utility will be run. A few final corrections will be applied and, optionally, the db schema and the files under ${DOCUMENTUM} will be checked too.
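The pack-then-explode idea can be sketched on a throw-away directory tree. This is only an illustration in Python under assumed paths; the function names are hypothetical and the project's actual scripts are shell scripts that may proceed differently.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_seed(documentum_root, archive_path):
    """Archive the whole tree under documentum_root into a compressed tarball."""
    with tarfile.open(str(archive_path), "w:gz") as archive:
        # arcname="." stores relative paths so that the tree can later be
        # extracted under a different root directory.
        archive.add(str(documentum_root), arcname=".")


def unpack_seed(archive_path, new_root):
    """Extract the seed archive under a new root directory."""
    new_root = Path(new_root)
    new_root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(new_root))


# Demonstration on a temporary tree standing in for ${DOCUMENTUM}.
with tempfile.TemporaryDirectory() as work:
    seed_root = Path(work) / "seed" / "documentum"
    config_dir = seed_root / "dba" / "config" / "seed"
    config_dir.mkdir(parents=True)
    (config_dir / "server.ini").write_text("docbase_name = seed\n")

    archive_path = Path(work) / "seed.tgz"
    pack_seed(seed_root, archive_path)

    clone_root = Path(work) / "clone" / "documentum"
    unpack_seed(archive_path, clone_root)
    restored = (clone_root / "dba" / "config" / "seed" / "server.ini").read_text()
    print(restored)
```

Note that the extracted server.ini still carries the seed's settings, which is exactly why the configuration adjustments and the Migration Utility run are needed afterwards.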

This approach implies that the Documentum binaries are considered part of the repository, just like its documents, which is a common situation in dedicated VMs and containers. One advantage of this approach is that it takes care of the installation of the binaries as well: they are simply extracted from the tarball, a relief knowing how long the serverSetup.bin program can take to execute. When using PostgreSQL as the RDBMS, the database becomes part of the seed package, binaries and data, and is treated as if it were a component of the repository. Maybe one day OpenText will propose an optional installation of a PostgreSQL database alongside the content server, just like it currently does with Tomcat and JBoss as the method servers; PostgreSQL being open source, there are no legal deterrents to doing so.

A single parameter file, global_parameters, defines all the parameters for the creation of the seed docbase and its future instantiations. Actually, there is nothing special about a seed docbase; it is just a role, and any docbase created by the create_docbase.sh script can later be used as a seed. If only the creation of a docbase is wanted, setting suitable parameters in that file is enough to get an automated installation of the binaries and a repository creation the traditional (but slow) way. Even without the benefits of a full instantiation, this is already quite a useful by-product of the project.

The complete procedure consists of the following general steps:

1. Edit the global_parameters file and provide all the required settings for the machine that will host the repositories;

2. Enroll a machine for the seed docbase’s installation, see pre-requisites.sh;

3. Edit the global_parameters file and provide all the required settings for the repositories to be created, the seed and any instantiation of it to come;

4. Create a repository, archive its ${DOCUMENTUM} directory along with its extracted database schema or full database if using PostgreSQL, see script create_docbase.sh; this is done only once and the docbase will be the model – or seed – for all future instantiations;

5. For each docbase to instantiate, execute the script instantiate_docbase.sh, passing as parameters the docbase to use as the seed and the docbase to instantiate from it. The detailed parameters will be taken from global_parameters. If the docbases must be created on different hosts, enroll those hosts first using pre-requisites.sh.
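Step 5 lends itself to a simple loop. The sketch below only builds the command lines, one per docbase to instantiate; the argument order (seed first, new docbase second) follows the description in step 5, but the exact command-line interface of instantiate_docbase.sh is an assumption here, as are the docbase names.

```python
def instantiation_commands(seed_docbase, new_docbases):
    """Return one command line per docbase to instantiate from the seed."""
    return [["./instantiate_docbase.sh", seed_docbase, name] for name in new_docbases]


# Hypothetical docbase names.
commands = instantiation_commands("seed", ["repo01", "repo02"])
for command in commands:
    print(" ".join(command))
```

Each list could then be handed to subprocess.run(command, check=True), sequentially or from several workers if parallel instantiations are wanted.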

Let’s now see all the above scripts. In order to save space, only code snippets are included in the current article. See dbi services github for the complete scripts.

The prerequisites

The script is accessible here: dbi services’ github.

Prior to running the scripts, the following conditions must be met:

1. An existing directory location to hold the global_parameters file and the scripts; at the start of the scripts’ execution, the current working directory will be changed there. This directory can be a mounted volume or it can be copied onto the machine where the seed docbase and/or the instantiations will be created; it will be referred to as global_parameters.${scripts_dir}. The notation global_parameters.${parameter_name} means the value of parameter_name as defined in the global_parameters file;

2. The global_parameters file edited to suit the needs (see next paragraph); that’s where all the parameters below are taken from;

3. An installation volume global_parameters.${binaries_root}. If it does not exist, it is created under /. If a mounted volume must be used, be sure to mount it first so the installation directory gets created there;

4. Download into global_parameters.${dctm_software} the Content Server from OpenText, e.g. documentum_server_22.4_linux64_oracle.tar and documentum_server_22.4_linux64_postgres.tar. The other packages get downloaded anonymously by create_docbase.sh as needed but the content server needs a logged in session at OpenText (and a contractual account).
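The global_parameters.${parameter_name} notation used in this article can be mimicked with a few lines of code. The sketch assumes a simple name=value layout and made-up values; the real global_parameters file may well be organized differently.

```python
import re


def parse_parameters(text):
    """Parse name=value lines, skipping blank lines and # comments."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip().strip('"')
    return params


# Matches the article's notation, e.g. global_parameters.${scripts_dir}
REFERENCE = re.compile(r"global_parameters\.\$\{(\w+)\}")


def resolve(text, params):
    """Replace each global_parameters.${name} reference by its value."""
    return REFERENCE.sub(lambda match: params[match.group(1)], text)


# Hypothetical values, for illustration only.
sample = """
# where the scripts and the software packages live
scripts_dir=/mnt/shared/scripts
binaries_root=/u01/dctm
dctm_software=/mnt/shared/software
"""
params = parse_parameters(sample)
message = resolve("Scripts are read from global_parameters.${scripts_dir}", params)
print(message)
```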

If the PostgreSQL RDBMS is used, nothing is required at the database level since the scripts have a complete control over this locally installed software.

However, if the selected RDBMS is Oracle, the requirements are clearly more complicated. An existing database is required that is accessible through global_parameters.${db_server_host_ip_address} plus its service name (global_parameters.${db_service_name}) and listener port (global_parameters.${db_listener_port}); the InstantClient’s local tnsnames.ora will be created and filled in with that information. Also, an account is required on the database host, typically oracle (cf. global_parameters.${db_server_host_account}/global_parameters.${db_server_host_password}), to remotely launch the dump and load utilities, and to transfer the dump file from/to the database host.

Moreover, the sys account’s credentials global_parameters.${db_sys_account}/global_parameters.${db_sys_password} are needed to manipulate the repositories’ schemas while connected using global_parameters.${db_remote_connect_string}.
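For illustration, a tnsnames.ora entry can be derived from the host address, listener port and service name as follows. The alias and the values are hypothetical, and the entry generated by the actual scripts may differ in its details.

```python
def tnsnames_entry(alias, host, port, service_name):
    """Return a minimal tnsnames.ora entry for a TCP connection to a service."""
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))\n"
        f"    (CONNECT_DATA = (SERVICE_NAME = {service_name}))\n"
        f"  )\n"
    )


# Hypothetical values standing in for db_remote_connect_string,
# db_server_host_ip_address, db_listener_port and db_service_name.
entry = tnsnames_entry(alias="orcl_remote", host="192.168.56.12", port=1521, service_name="orcl")
print(entry)
```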

Note that the server host account and the sys account are unnecessary if the steps to perform under those accounts are delegated to the DBAs, which is a must in some large organizations where the personnel’s roles are very specialized. In that case, some coordination with them is needed, which may make the whole process not as straightforward and fast as intended. The schema dumps after the seed docbases are created, and their subsequent loads when instantiating new docbases, can also be done by the DBAs. The same applies to the schemas if they are created beforehand. Adapt the scripts as needed to the actual situation.

After the above conditions are satisfied where applicable, the script pre-requisites.sh can be invoked as root to enroll the machine with the settings from global_parameters. It will perform the following actions:

1. Set the FQDN of the machine (global_parameters.${dctm_machine} and global_parameters.${dctm_domain}), and append it to /etc/hosts;

2. Set IP aliases for the content server machine (cs) and the database (global_parameters.${db_server_host_alias} for global_parameters.${db_server_host_ip_address});

3. Install sshpass; this program allows a non-interactive ssh connection without being prompted, by taking the password from an environment variable or the command line and passing it to ssh. It is used to fully automate the scripts’ execution, and only when the RDBMS is a supposedly remote Oracle database. If PostgreSQL is used, sshpass is not necessary since those databases are always local;

4. Install a few other utilities such as unzip, gawk and curl used by the scripts. Additionally, tcl and expect are also installed although they may not be necessary since the Documentum programs serverSetup.bin and dm_launch_server_config_program.sh are invoked non-interactively, i.e. as silent installations;

5. Create an installation owner account on the current machine (defined as global_parameters.${dctm_machine}), usually dmadmin, defined as global_parameters.${dctm_owner}. In order to simplify the scripting and to execute the Documentum root tasks, dmadmin is made a sudoer with no password required. This is not an issue as most of the time dmadmin runs on a fully dedicated VM or container and cannot hurt anybody except itself, which is already possible with no additional privilege. Anyway, the privilege can be revoked later once the procedure is completed;

6. Create the global_parameters.${binaries_root} sub-directory where the repositories will be created on the enrolled machine. Make sure the parent volume is large enough to contain the docbases to create locally.

Again, all but the last of these points depend on the organization; sometimes, the machines (e.g. VMs) are delivered already configured; sometimes the root account is a no-go or only available under strict conditions. Therefore, those steps are only applicable on private machines under the complete control of an individual. Adapt the procedure as needed.
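The /etc/hosts handling of the first two enrollment actions can be sketched as an idempotent append, so that enrolling the same machine twice does not duplicate entries. The addresses and names below are hypothetical, and the function works on a string rather than on the real file.

```python
def ensure_hosts_entry(hosts_text, ip_address, *names):
    """Return hosts_text with a line mapping ip_address to names, added only if missing."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore trailing comments
        if fields and fields[0] == ip_address and set(names) <= set(fields[1:]):
            return hosts_text  # already present: nothing to do
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + ip_address + "\t" + " ".join(names) + "\n"


# Hypothetical addresses and names for the content server and the database.
hosts = "127.0.0.1\tlocalhost\n"
hosts = ensure_hosts_entry(hosts, "192.168.56.10", "cs.example.com", "cs")
hosts = ensure_hosts_entry(hosts, "192.168.56.12", "db")
once_more = ensure_hosts_entry(hosts, "192.168.56.12", "db")
print(hosts)
```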

There are also a few commented-out kernel settings related to limits, as recommended by Documentum in the OpenText™ Documentum™ System Upgrade and Migration Guide for CS22.2. They may apply in VMs or non-virtualized environments. In our Proxmox linux container environment, some of them may only be set at the host’s level. Activate them as needed, although in a light developer’s environment those limits will probably never be hit.

See Part II here


A quick repository creation utility (part I)

Middleware Team
