As the saying goes, “A Picture Is Worth A Thousands Words”. I’d add “And A Simple Graph Is Worth A Long, Abstruse List of Numbers”. And of words too, so let’s show off a little bit:
Screenshot from 2018-11-03 19-39-57
Interested ? Then, please read on.
It happens not so infrequently that we wish we could quickly plot a few numbers to have a look at their overall trend. The usual procedure is to output the numbers into a csv file, transfer it to a desktop machine, subsequently import it into a spreadsheet program and interactively set up a chart, a tedious manual procedure at best, especially if it has to be repeated several times.
What if we could query a few relevant values from a system and plot them in one go to visualize the produced graphs from within a browser ? What if we could generalize that procedure to any extracted tabular data ? I’m not talking of a sophisticated interface to some enterprise-class graphing tool but just of a simple way to get a quick visual overall feeling of some variables with as little installed software as possible.
I’ll show here how to do that for a Documentum repository but the target system can be anything, a database, an ldap, an O/S, just adapt the queries and the script as needed.
To simplify, I assume that we generally work on a server machine, likely a Linux headless VM, and that the browser runs remotely on any GUI-based desktop, quite a common configuration for Documentum.

The Data

As an administrator, I often have to connect to docbases I never visited before, and I find it useful to run the following queries to help me make acquaintance with those new beasts. In the examples below, the target docbase is an out of the box one with no activity in it, which explains the low numbers and lack of custom doctypes:

— 1. what distinct document types are there ?

select r_object_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type order by 1
r_object_type count(*) tot_size
-------------------------------- ---------------------- ----------------------
dm_document 1295 6384130
dm_esign_template 1 46255
dm_format_preferences 1 109
dm_menu_system 2 352034
dm_plugin 2 212586
dm_xml_config 1 534
dmc_jar 236 50451386
dmc_preset_package 2 11951
dmc_tcf_activity_template 10 12162
(9 rows affected)

A reminder of some definitions: The result table above is a dataset. The r_object_type is the category, typically shown in the X-axis. The count(*) and tot_size columns are the “variables” to be plot, typically as bars, lines or pie slices. They are also named “traces” in some graphing tools. Some of the datasets here have 1, 2 or 3 variables to be plot. Datasets with 1 variable can be plot as bars, lines or pies charts. Datasets with 2 variables can be plot as grouped or stacked bars, or as 2 distinct graphs of one variable. There are many possibilities and combinations, and the choice depends on which representation offers the best visual clarity. Sometimes, the plotting library even lets one edit the programmatically generated graph and interactively choose the best type of graph with no coding needed !

— 2. how does their population vary over time ?

select r_object_type, datefloor(month, r_creation_date) as "creation_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type, datefloor(month, r_creation_date) order by 1, 2;
r_object_type creation_month count(*) tot_size
-------------------------------- ------------------------- ------------ ------------
dm_document 11/1/2017 01:00:00 448 3055062
dm_document 12/1/2017 01:00:00 120 323552
dm_document 3/1/2018 01:00:00 38 66288
dm_document 4/1/2018 02:00:00 469 1427865
dm_document 5/1/2018 02:00:00 86 584453
dm_document 6/1/2018 02:00:00 20 150464
dm_document 7/1/2018 02:00:00 40 301341
dm_document 8/1/2018 02:00:00 32 151333
dm_document 9/1/2018 02:00:00 46 356386
dm_esign_template 11/1/2017 01:00:00 1 46255
dm_format_preferences 11/1/2017 01:00:00 1 109
dm_menu_system 11/1/2017 01:00:00 2 352034
dm_plugin 11/1/2017 01:00:00 2 212586
dm_xml_config 11/1/2017 01:00:00 1 534
dmc_jar 11/1/2017 01:00:00 236 50451386
dmc_preset_package 11/1/2017 01:00:00 2 11951
dmc_tcf_activity_template 11/1/2017 01:00:00 10 12162
(17 rows affected)

This query tells how heavily used the repository is. Also, a broad spectre of custom document types tends to indicate that the repository is used in the context of applications.

— 3. how changing are those documents ?

select r_object_type, datefloor(month, r_modify_date) as "modification_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) where r_creation_date < r_modify_date group by r_object_type, datefloor(month, r_modify_date) order by 1, 2;
r_object_type modification_month count(*) tot_size
-------------------------------- ------------------------- ------------ ------------
dm_document 11/1/2017 01:00:00 127 485791
dm_document 12/1/2017 01:00:00 33 122863
dm_document 3/1/2018 01:00:00 16 34310
dm_document 4/1/2018 02:00:00 209 749370
dm_document 5/1/2018 02:00:00 42 311211
dm_document 6/1/2018 02:00:00 10 79803
dm_document 7/1/2018 02:00:00 20 160100
dm_document 8/1/2018 02:00:00 12 81982
dm_document 9/1/2018 02:00:00 23 172299
dm_esign_template 8/1/2018 02:00:00 1 46255
dmc_jar 11/1/2017 01:00:00 14 1616218
dmc_preset_package 11/1/2017 01:00:00 2 11951
dmc_tcf_activity_template 11/1/2017 01:00:00 10 12162
(13 rows affected)

This query shows if a repository is used interactively rather than for archiving. If there are lots of editions, the docbase is a lively one; on the contrary, if documents are rarely or never edited, the docbase is mostly used for archiving. The document ownership can tell too, technical accounts vs. real people.

— samething but without distinction of document type;
— 4. new documents;

select datefloor(month, r_creation_date) as "creation_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by datefloor(month, r_creation_date) order by 1;
creation_month count(*) tot_size
------------------------- ---------------------- ----------------------
11/1/2017 01:00:00 703 54142079
12/1/2017 01:00:00 120 323552
3/1/2018 01:00:00 38 66288
4/1/2018 02:00:00 469 1427865
5/1/2018 02:00:00 86 584453
6/1/2018 02:00:00 20 150464
7/1/2018 02:00:00 40 301341
8/1/2018 02:00:00 32 151333
9/1/2018 02:00:00 42 323772
(9 rows affected)

— 5. modified documents;

select datefloor(month, r_modify_date) as "modification_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) where r_creation_date < r_modify_date group by datefloor(month, r_modify_date) order by 1;
modification_month count(*) tot_size
------------------------- ---------------------- ----------------------
11/1/2017 01:00:00 153 2126122
12/1/2017 01:00:00 33 122863
3/1/2018 01:00:00 16 34310
4/1/2018 02:00:00 209 749370
5/1/2018 02:00:00 42 311211
6/1/2018 02:00:00 10 79803
7/1/2018 02:00:00 20 160100
8/1/2018 02:00:00 13 128237
9/1/2018 02:00:00 20 140354
(9 rows affected)

— 6. what content types are used in the repository ?

select a_content_type, count(*) as "count_content_type", sum(r_full_content_size) as "tot_content_size" from dm_document(all) group by a_content_type order by 1
col a_content_type 20
a_content_type count_content_type tot_content_size
-------------------- ------------------ ----------------
11 0
amipro 1 4558
crtext 43 389681
dtd 5 163392
excel12bbook 1 8546
excel12book 1 7848
excel12mebook 1 7867
excel12metemplate 1 7871
excel12template 1 7853
excel5book 1 15360
excel8book 1 13824
excel8template 1 13824
ibmshrlib 2 212586
jar 233 48856783
java 3 1594603
maker55 5 117760
mdoc55 9 780288
ms_access7 1 83968
ms_access8 1 59392
ms_access8_mde 1 61440
msw12 1 11280
msw12me 1 10009
msw12metemplate 1 10004
msw12template 1 9993
msw6 1 11776
msw8 1 19456
msw8template 1 27136
pdf 2 50214
powerpoint 1 14848
ppt12 1 29956
ppt12me 1 29944
ppt12meslideshow 1 29943
ppt12metemplate 1 29941
ppt12slideshow 1 29897
ppt12template 1 29914
ppt8 1 7680
ppt8_template 1 9728
text 1175 4236008
ustn 1 10240
vrf 1 158993
wp6 1 941
wp7 1 1362
wp8 1 1362
xml 28 37953
zip 1 255125
(45 rows affected)

The content type may indicates the kind of activity the repository is used for. While pdf are generally final documents suitable for archiving, a predominance of illustrator/pagemaker/QuarkXPress vs photoshop vs multimedia vs Office documents can give a hint at the docbase’s general function.
Since most content types are unused (who uses Word Perfect or Ami Pro any more ?), a constraint on the count can be introduced, e.g. “having count(*) > 10” to filter out those obsolete formats.

— 7. and how do they evolve over time ?

select datefloor(month, r_creation_date) as "creation_month", a_content_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by datefloor(month, r_creation_date), a_content_type order by 1, 2
creation_month a_content_type count(*) tot_size
------------------------- -------------------- ------------ ------------
11/1/2017 01:00:00 1 0
11/1/2017 01:00:00 amipro 1 4558
11/1/2017 01:00:00 crtext 43 389681
11/1/2017 01:00:00 dtd 5 163392
11/1/2017 01:00:00 excel12bbook 1 8546
11/1/2017 01:00:00 excel12book 1 7848
11/1/2017 01:00:00 excel12mebook 1 7867
11/1/2017 01:00:00 excel12metemplate 1 7871
11/1/2017 01:00:00 excel12template 1 7853
11/1/2017 01:00:00 excel5book 1 15360
11/1/2017 01:00:00 excel8book 1 13824
11/1/2017 01:00:00 excel8template 1 13824
11/1/2017 01:00:00 ibmshrlib 2 212586
11/1/2017 01:00:00 jar 233 48856783
11/1/2017 01:00:00 java 3 1594603
11/1/2017 01:00:00 maker55 5 117760
11/1/2017 01:00:00 mdoc55 9 780288
11/1/2017 01:00:00 ms_access7 1 83968
11/1/2017 01:00:00 ms_access8 1 59392
11/1/2017 01:00:00 ms_access8_mde 1 61440
11/1/2017 01:00:00 msw12 1 11280
11/1/2017 01:00:00 msw12me 1 10009
11/1/2017 01:00:00 msw12metemplate 1 10004
11/1/2017 01:00:00 msw12template 1 9993
11/1/2017 01:00:00 msw6 1 11776
11/1/2017 01:00:00 msw8 1 19456
11/1/2017 01:00:00 msw8template 1 27136
11/1/2017 01:00:00 pdf 2 50214
11/1/2017 01:00:00 powerpoint 1 14848
11/1/2017 01:00:00 ppt12 1 29956
11/1/2017 01:00:00 ppt12me 1 29944
11/1/2017 01:00:00 ppt12meslideshow 1 29943
11/1/2017 01:00:00 ppt12metemplate 1 29941
11/1/2017 01:00:00 ppt12slideshow 1 29897
11/1/2017 01:00:00 ppt12template 1 29914
11/1/2017 01:00:00 ppt8 1 7680
11/1/2017 01:00:00 ppt8_template 1 9728
11/1/2017 01:00:00 text 338 906940
11/1/2017 01:00:00 ustn 1 10240
11/1/2017 01:00:00 vrf 1 158993
11/1/2017 01:00:00 wp6 1 941
11/1/2017 01:00:00 wp7 1 1362
11/1/2017 01:00:00 wp8 1 1362
11/1/2017 01:00:00 xml 28 37953
11/1/2017 01:00:00 zip 1 255125
12/1/2017 01:00:00 text 120 323552
3/1/2018 01:00:00 text 38 66288
4/1/2018 02:00:00 text 469 1427865
5/1/2018 02:00:00 text 86 584453
6/1/2018 02:00:00 text 20 150464
7/1/2018 02:00:00 text 40 301341
8/1/2018 02:00:00 10 0
8/1/2018 02:00:00 text 22 151333
9/1/2018 02:00:00 text 42 323772
(54 rows affected)

— 8. where are those contents stored ?

select a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_storage_type order by a_storage_type
col a_storage_type 20
a_storage_type count(*) tot_size
-------------------- ------------ ------------
11 0
filestore_01 1539 57471147
(2 rows affected)

Filestores are conceptually quite similar to Oracle RDBMS tablespaces. It is a good practice to separate filestores by applications/type of documents and/or even by time if the volume of documents in important. This query tells if this is in place and, if so, what the criteria were.

— 9. dig by document type;

select r_object_type, a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type, a_storage_type order by r_object_type, a_storage_type
r_object_type a_storage_type count(*) tot_size
-------------------------------- -------------------- ------------ ------------
dm_document 11 0
dm_document filestore_01 1284 6384130
dm_esign_template filestore_01 1 46255
dm_format_preferences filestore_01 1 109
dm_menu_system filestore_01 2 352034
dm_plugin filestore_01 2 212586
dm_xml_config filestore_01 1 534
dmc_jar filestore_01 236 50451386
dmc_preset_package filestore_01 2 11951
dmc_tcf_activity_template filestore_01 10 12162
(10 rows affected)

— 10. same but by content format;

select a_content_type, a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_content_type, a_storage_type having count(*) > 10 order by a_content_type, a_storage_type
a_content_type a_storage_type count(*) tot_size
-------------------------------- --------------- ---------------------- ----------------------
crtext filestore_01 43 389681
jar filestore_01 233 48856783
text filestore_01 1204 4432200
xml filestore_01 28 37953
(4 rows affected)

— 11. what ACLs do exist and who created them ?

select owner_name, count(*) from dm_acl group by owner_name order by 2 desc
col owner_name 30
owner_name count(*)
------------------------------ ------------
dmadmin 179
dmtest 44
dm_bof_registry 6
dmc_wdk_preferences_owner 2
dmc_wdk_presets_owner 2
dm_mediaserver 1
dm_audit_user 1
dm_autorender_mac 1
dm_fulltext_index_user 1
dm_autorender_win31 1
dm_report_user 1
(11 rows affected)

Spoiler alert: the attentive readers have probably noticed the command “col” preceding several queries, like in “col name 20”; this is not a standard idql command but part of an extended idql which will be presented in a future blog.
ACLs are part of the security model in place, if any. Usually, the owner is a technical account, sometimes a different one for each application or application’s functionality or business line, so this query can show if the repository is under control of applications or rather simply used as a replacement for a shared drive. Globally, it can tell how the repository’s security is managed, if at all.

— 12. ACLs most in use;

select acl_name, count(*) from dm_document(all) where acl_name not like 'dm_%' group by acl_name having count(*) >= 10 order by 1
acl_name count(*)
-------------------------------- ----------------------
BOF_acl 233
(1 row affected)

Here too, a filter by usage can be introduced so that rarely used acls are not reported, e.g. “having count(*) > 10”.

— 13. queue items;

select name, count(*) from dmi_queue_item group by name order by 2 desc
name count(*)
-------------------- ------------
dm_autorender_win31 115
dmadmin 101
dmtest 54
(3 rows affected)

This query is mainly useful to show if external systems are used for generating renditions, thumbnails or any other document transformation.

— non quantifiable queries;

select object_name from dm_acl where object_name not like 'dm_%' order by 1
object_name
--------------------------------
BOF_acl
BOF_acl2
BPM Process Variable ACL
BatchPromoteAcl
Default Preset Permission Set
Global User Default ACL
WebPublishingAcl
Work Queue User Default ACL
dce_world_write
desktop_client_acl1
replica_acl_default
(11 rows affected)
 
col name 20
col root 30
col file_system_path 70
select f.name, f.root, l.file_system_path from dm_filestore f, dm_location l where f.root = l.object_name
name root file_system_path
-------------------- ------------------------------ ----------------------------------------------------------------------
filestore_01 storage_01 /home/dmadmin/documentum/data/dmtest/content_storage_01
thumbnail_store_01 thumbnail_storage_01 /home/dmadmin/documentum/data/dmtest/thumbnail_storage_01
streaming_store_01 streaming_storage_01 /home/dmadmin/documentum/data/dmtest/streaming_storage_01
replicate_temp_store replicate_location /home/dmadmin/documentum/data/dmtest/replicate_temp_store
replica_filestore_01 replica_storage_01 /home/dmadmin/documentum/data/dmtest/replica_content_storage_01
(5 rows affected)

No numbers here, just plain text information.

There is of course much more information to query (e.g. what about lifecycles and workflows, users and groups ?) depending on what one needs to look at in the repositories, but those are enough examples for our demonstration’s purpose. The readers can always add queries or refine the existing ones to suit their needs.
At this point, we have a lot of hard-to-ingest numbers. Plotting them gives a pleasant 2-dimensional view of the data and will allow to easily compare and analyze them, especially if the generated plot offers some interactivity.

The Graphing Library

To do this, we need a graphing library and since we want to be able to look at the graphs from within a browser for platform-independence and zero-installation on the desktop, we need one that lets us generate an HTML page containing the charts. This is possible in 2 ways:
o   the library is a javascript one; we will programmatically build up json literals, pass them to the library’s plotting function and wrap everything into an html page which is saved on disk for later viewing;
o   the library is usable from several scripting languages and includes a function to generate an HTML page containing the graph; we will chose one for python since this language is ubiquitous and we have now a binding for Documentum (see my blog here);
Now, there are tons of such javascript libraries available on-line (see for example here for a sample) but our choice criteria will be simple:
o   the library must be rich enough but at the same time easy to use;
o   it must do its work entirely locally as servers are generally not allowed to access the Internet;
o   it must be free in order to get rid of the licensing complexity as it can often be overkill in such a simple and confined usage;
As we must pick one, let’s choose Plotly since it is very capable and largely covers our needs. Besides, it works either as a javascript or as a python library but for more generality let’s just use it as a javascript library and generate ourselves the html pages. As an additional bonus, Plotly also allows to interactively edit a plotted graph so it can be tweaked at will. This is very convenient because it lessens the effort to optimize the graphs’ readability as this can be done later by the users themselves while viewing the graphs. For example, the users can zoom, unselect variables, move legends around and much more.
To install it, simply right-click here and save the link to the default graph location in the project’s directory, e.g. /home/dmadmin/graph-stats/graphs.

pwd
/home/dmadmin/graph-stats/graphs
ls -l
...
-rw-rw-r-- 1 dmadmin dmadmin 2814564 Oct 11 11:40 plotly-latest.min.js
...

That’s less than 2.7 MiB of powerful compact js source code. Since it is a text file, there shouldn’t be any security concern in copying that file on a server. Also, as it is executed inside the sandbox of a browser, security is normally not an issue.
The library needs to be in the same directory as the generated html files. If those files are copied to a desktop for direct viewing from a browser with File/Open, don’t forget to copy over the Plotly.js library too, open the html file in an editor, go to the line 6 below and set src to the path of the html file:

1
2
3
4
5
6
7
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
   <head>
      <title>Graphical Overview of Repository SubWay</title>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <script src="...../plotly-latest.min.js"></script>
   </head>

Viewing the html pages

Since those pages must be accessed from any machine on the network (typically, a desktop with a browser), despite they may be created directly on the server that hosts the repositories of interest, we need a web server. It may happen that a disk volume is shared between desktops and servers but this is far from being the general case. Fortunately, python is very helpful here and saves us from the tedious task of setting up a full-fledged web server such as apache. The one-liner below will start a server for visualizing the files in the current directory and its sub-directories:

python3 -m http.server [port] # for python >= v3
python -m SimpleHTTPServer [port] # for python 2.x

It can also be moved in the background to free the command-line. To stop it, bring it in the foreground and type ctrl-C.
Its default port is 8000.
This mini-web server is now waiting for requests on the given port and its IP address. In order to determine that IP address, use the ifconfig command and try one address that is on the same network as the desktop with the browser. Then, load the URL, e.g. http://192.168.56.10:8000/, and you’ll be greeted by a familiar point-and-click interface.
The server presents a directory listing from which to select the html files containing the generated graphs.

An Example

Here is a complete example, from the query to the graph.
The 4th DQL query and its output:

select datefloor(month, r_creation_date) as "creation_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by datefloor(month, r_creation_date) order by 1;
creation_month count(*) tot_size
------------------------- ---------------------- ----------------------
11/1/2017 01:00:00 703 54142079
12/1/2017 01:00:00 120 323552
3/1/2018 01:00:00 38 66288
4/1/2018 02:00:00 469 1427865
5/1/2018 02:00:00 86 584453
6/1/2018 02:00:00 20 150464
7/1/2018 02:00:00 40 301341
8/1/2018 02:00:00 32 151333
9/1/2018 02:00:00 42 323772
(9 rows affected)

The command to generate the graph:

pwd
/home/dmadmin/graph-stats
./graph-stats.py --docbase dmtest

A simplified generated html page showing only the count(*) variable:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
      <head>
         <title>Statistical Graphs for Repository dmtest</title>
         <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
         <script src="http://192.168.56.10:8000/plotly-latest.min.js"></script>
      </head>
      <body>
      <div id="TheChart" style="width:400px;height:400px;"></div>
      <script>
         var ctx = document.getElementById("TheChart");
         Plotly.newPlot(ctx, [{
                             type: "bar",
                             name: "new docs",
                             x: ['2017/11', '2017/12', '2018/03', '2018/04', '2018/05', '2018/06', '2018/07', '2018/08', '2018/09'],
                             y: ['703', '120', '38', '469', '86', '20', '40', '32', '42'],
                             marker: {
                                        color: '#4e6bed',
                                        line: {
                                                 width: 2.5
                                              }
                                     }
                            }],
                            {
                               title: "New Documents",
                               font: {size: 18}
                            },
                        {responsive: true});
      </script>
      </body>
</html>

Its resulting file:

ll graphs
total 2756
-rw-rw-r-- 1 dmadmin dmadmin 2814564 Oct 12 14:18 plotly-latest.min.js
-rw-rw-r-- 1 dmadmin dmadmin 1713 Oct 12 14:18 dmtest-20181012-141811.html

Same but exposed by the mini-web server and viewed from a browser:
Screenshot from 2018-11-03 15-02-56
When clicking on one of the html file, a graph such as the one below is displayed:
Screenshot from 2018-11-03 15-08-52

The python script

The following python script runs the 13 queries above and produce one large HTML page containing the graphs of all the datasets. It can also be used to query a list of local docbases and generate one single, huge html report or one report per docbase. It can even directly output the produced html code to stdout for further in-line processing, e.g. to somewhat compact the javascript code:

./graph-stats.py --docbase dmtest -o stdout | gawk '{printf $0}' > graphs/dmtest-gibberish.html
 
wc -l graphs/-gibberish.html
0 graphs/-gibberish.html
 
less graphs/dmtest-gibberish.html
Screenshot from 2018-11-03 15-26-11

So, here is the script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
#!/usr/bin/env python
 
# 10/2018, C. Cervini, dbi-services;
  
import sys
import getopt
from datetime import datetime
import json
import DctmAPI
 
def Usage():
   print("""
Usage:
Connects as dmadmin/xxxx to a local repository and generates an HTML page containing the plots of several DQL queries result;
Usage:
       ./graph-stats.py -h|--help | -d|--docbase <docbase>{,<docbase>} \[-o|--output_file <output_file>\]
<docbase> can be one repository or a comma-separated list of repositories;
if <output_file> is omitted, the html page is output to ./graphs/<docbase>-$(date +"%Y%m%d-%H%M%S").html;
<output_file> can be "stdout", which is useful for CGI programming;
Example:
       ./graph-stats.py -d dmtest,mail_archive,doc_engineering
will query the docbases dmtest, mail_archive and doc_engineering and output the graphs to the files <docbase>-$(date +"%Y%m%d-%H%M%S").html, one file per docbase;
       ./graph-stats.py -d mail_archive,doc_engineering -o all_docbase_current_status
will query the docbases mail_archive and doc_engineering and output all the graphs to the unique file all_docbase_current_status;
       ./graph-stats.py -d mail-archive
will query docbase mail-archive and output the graphs to the file mail-archive-$(date +"%Y%m%d-%H%M%S").html;
       ./graph-stats.py --docbase dmtest --output dmtest.html
will query docbase dmtest and output the graphs to the file dmtest.html;
       ./graph-stats.py -d dmtest --output stdout
will query docbase dmtest and output the graphs to stdout;
""")
 
def Plot2HTML(div, graph_title, data, data_labels, bLineGraph = False, mode = None):
   global html_output
   if None != html_output:
      sys.stdout = open(html_output, "a")
 
   # start the html page;
   if "b" == mode:
      global server, page_title
      print('''
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
   <head>
      <title>''' + page_title + '''</title>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <script src="http://''' + server + '''/plotly-latest.min.js"></script>
   </head>
   <body>
   <center><h3>''' + page_title + '''</h3></center>
''')
 
   # append to the body of the html page;
   if len(data_labels) == 4:
      stack_labels = {}
      for point in data:
         stack_labels[point[data_labels[0]]] = 0
      print('''
      <div style="width:1500px;height:600px;">
          <div id="''' + div + '-' + data_labels[2] + '''" style="width:50%; float:left;"></div>
          <div id="''' + div + '-' + data_labels[3] + '''" style="width:50%; float:left;"></div>
      </div>
      <script>
         var ctx = document.getElementById("''' + div + '-' + data_labels[2] + '''");
      ''')
      variables = ""
      for stack in stack_labels:
         x = [point[data_labels[1]] for i, point in enumerate(data) if point[data_labels[0]] == stack]
         y1 = [point[data_labels[2]] for i, point in enumerate(data) if point[data_labels[0]] == stack]
         variables +=  ("" if not variables else ", ") + "data_" + stack
         if bLineGraph:
            vars()['data_' + stack] = {
                                        'name': stack,
                                        'type': "scatter",
                                        'mode': "lines",
                                        'x': x,
                                        'y': y1,
                                      };
         else:
            vars()['data_' + stack] = {
                                        'name': stack,
                                        'type': "bar",
                                        'x': x,
                                        'y': y1,
                                        'width': 0.25,
                                        'marker': {
                                                     #'color': '#009933',
                                                     'line': {
                                                                'width': 1.0
                                                             }
                                                  }
                                      };
         print('data_' + stack + ' = ' + json.dumps(vars()['data_' + stack]) + ';')
         layout = {
                     'title': '<b>' + graph_title + '<br>' + data_labels[2] + '</b>',
                     'legend': {'x': -.1, 'y': 1.2, 'font': {'size': 8}},
                     'font': {'size': 12},
                     'width': 750,
                     'height': 600,
                     'xaxis': {
                                 'title': '<b>' + data_labels[1] + '</b>',
                                 'titlefont': {'size': 10},
                                 'tickangle': -45,
                                 'tickfont': {'size': 8},
                                 'zeroline': True,
                                 'showline': True,
                                 'categoryorder': "category ascending",
                                 'type': "category"
                              },
                     'yaxis': {
                                 'title': '<b>' + data_labels[2] + '</b>',
                                 'titlefont': {'size': 10},
                                 'zeroline': True,
                                 'showline': True,
                                 'tickfont': {'size': 8},
                                 'showgrid': False
                              }
                  }
         if not bLineGraph:
            layout.update({'barmode': "stack", 'bargap': 0.15, 'bargroupgap': 0.5})
         interaction = {'responsive': True, 'scrollZoom': True, 'editable': True}
      print('''Plotly.newPlot(ctx,
                              [''' + variables + '], ' +
                              json.dumps(layout) + ',' +
                              json.dumps(interaction) + ''');''')
      for stack in stack_labels:
         x = [point[data_labels[1]] for i, point in enumerate(data) if point[data_labels[0]] == stack]
         y2 = [point[data_labels[3]] for i, point in enumerate(data) if point[data_labels[0]] == stack]
         if bLineGraph:
            vars()['data_' + stack] = {
                                         'name': stack,
                                         'type': "scatter",
                                         'mode': "lines",
                                         'x': x,
                                         'y': y2
                                      };
         else:
            vars()['data_' + stack] = {
                                         'name': stack,
                                         'type': "bar",
                                         'x': x,
                                         'y': y2,
                                         'width': 0.25,
                                         'marker': {
                                                      #'color': '#009933',*/
                                                      'line': {
                                                                 'width': 1.0
                                                              }
                                                   }
                                      };
         print('data_' + stack + ' = ' + json.dumps(vars()['data_' + stack]) + ';')
      layout = {
                  'title': '<b>' + graph_title + '<br>' + data_labels[3] + '</b>',
                  'legend': {'x': -.1, 'y': 1.2, 'font': {'size': 8}},
                  'font': {
                             'size': 12
                          },
                  'width': 750,
                  'height': 600,
                  'xaxis': {
                            'title': '<b>' + data_labels[1] + '</b>',
                            'titlefont': {'size': 10},
                            'tickangle': -45,
                            'tickfont': {'size': 8},
                            'zeroline': True,
                            'showline': True,
                            'categoryorder': "category ascending",
                            'type': "category"
                         },
                  'yaxis': {
                    'title': '<b>' + data_labels[3] + '</b>',
                    'titlefont': {'size': 10},
                    'zeroline': True,
                    'showline': True,
                    'tickfont': {'size': 8},
                    'showgrid': False
                  }
               }
      if not bLineGraph:
         layout.update({'barmode': "stack", 'bargap': 0.15, 'bargroupgap': 0.5})
      interaction = {'responsive': True, 'scrollZoom': True, 'editable': True}
      print('''
      var ctx = document.getElementById("''' + div + '-' + data_labels[3] + '''");
      Plotly.newPlot(ctx,
                     [''' + variables + '],' +
                     json.dumps(layout) + ''',
                     ''' + json.dumps(interaction) + ''');
      </script>
''')
   elif len(data_labels) == 3:
      print('''
      <div style="width:1200px;height:600px;">
          <div id="''' + div + '''" style="width:75%; float:left;"></div>
          <div id="''' + div + '''-pie" style="width:25%; float:left;"></div>
      </div>
      <script>
         var ctx = document.getElementById("''' + div + '''");
''')
      traces = []
      if not bLineGraph:
         traces = [
                     {
                        'name': data_labels[1],
                        'type': "bar",
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [point[data_labels[1]] for i, point in enumerate(data)],
                        'width': 0.25,
                        'marker': {
                                     'color': '#009933',
                                     'line': {
                                              'width': 1.0
                                             }
                                  },
                     },
                     # work around for bug "Grouped bar charts do not work with multiple Y axes";
                     # see https://github.com/plotly/plotly.js/issues/78;
                     # must be inserted here;
                     # invisible second trace in the first group
                     {
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'barmode': "overlay",
                        'y': [0], 'type': 'bar', 'hoverinfo': 'none', 'showlegend': False
                     },
                     # invisible first trace in the second group
                     {
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [0], 'type': 'bar', 'yaxis': 'y2', 'hoverinfo': 'none', 'showlegend': False
                     },
                     {
                        'name': data_labels[2],
                        'type': "bar",
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [point[data_labels[2]] for i, point in enumerate(data)],
                        'width': 0.25,
                        'yaxis': "y2",
                        'marker': {
                                     'color': '#4e6bed',
                                     'line': {
                                                'width': 1.0
                                             }
                                  }
                     }
                   ]
      else:
         traces = [
                     {
                        'name': data_labels[1],
                        'type': "scatter",
                        'mode': "lines",
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [point[data_labels[1]] for i, point in enumerate(data)],
                        'width': 0.25,
                     },
                     {
                        'name': data_labels[2],
                        'type': "scatter",
                        'mode': "lines",
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [point[data_labels[2]] for i, point in enumerate(data)],
                        'width': 0.25,
                        'yaxis': "y2",
                     }
                  ]
      layout = {
                  'title': '<b>' + graph_title + '</b>',
                  'legend': {'x': -.1, 'y': 1.2, 'font': {'size': 8}},
                  'font': {'size': 12},
                  'width': 800,
                  'height': 600,
                  'xaxis': {
                              'title': '<b>' + data_labels[0] + '</b>',
                              'titlefont': {'size': 10},
                              'tickangle': -45,
                              'tickfont': {'size': 8},
                              'zeroline': True,
                              'showline': True
                           },
                  'yaxis': {
                              'title': '<b>' + data_labels[1] + '</b>',
                              'titlefont': {'size': 10},
                              'zeroline': True,
                              'showline': True,
                              'tickfont': {'size': 8},
                              'showgrid': False,
                           },
                  'yaxis2': {
                               'title': '<b>' + data_labels[2] + '</b>',
                               'titlefont': {'size': 10},
                               'tickfont': {'size': 8},
                               'zeroline': True,
                               'showline': True,
                               'overlaying': "y",
                               'side': "right",
                               'showgrid': False,
                            }
               }
      if bLineGraph:
         layout['yaxis'].update({'type': "linear"})
         layout['yaxis2'].update({'type': "linear"})
         pass
      else:
         layout.update({'barmode': "group", 'bargap': 0.15, 'bargroupgap': 0.5})
      interaction = {'responsive': True, 'scrollZoom': True, 'editable': True}
      print('''
         Plotly.newPlot(ctx,
                        ''' + json.dumps(traces) + ''',
                        ''' + json.dumps(layout) + ''',
                        ''' + json.dumps(interaction) + ''');
      </script>
 
''')
      if not bLineGraph:
         traces =  [
                      {
                         'name': data_labels[1],
                         'hole': .4,
                         'type': "pie",
                         'labels': [point[data_labels[0]] for i, point in enumerate(data)],
                         'values': [point[data_labels[1]] for i, point in enumerate(data)],
                         'domain': {
                            'row': 0,
                            'column': 0
                         },
                         'outsidetextfont': {'size': 8},
                         'insidetextfont': {'size': 8},
                         'legend': {'font': {'size': 8}},
                         'textfont': {'size': 8},
                         'font': {'size': 8},
                         'hoverinfo': 'label+percent+name',
                         'hoverlabel': {'font': {'size': 8}},
                         'textinfo': 'none'
                      },
                      {
                         'name': data_labels[2],
                         'hole': .4,
                         'type': "pie",
                         'labels': [point[data_labels[0]] for i, point in enumerate(data)],
                         'values': [point[data_labels[2]] for i, point in enumerate(data)],
                         'domain': {
                            'row': 1,
                            'column': 0
                         },
                         'outsidetextfont': {'size': 8},
                         'insidetextfont': {'size': 8},
                         'legend': {'font': {'size': 8}},
                         'textfont': {'size': 8},
                         'font': {'size': 8},
                         'hoverinfo': 'label+percent+name',
                         'hoverlabel': {'font': {'size': 8}},
                         'textinfo': 'none'
                      },
                   ]
         layout = {
                     'title': '<b>' + graph_title + '</b>',
                     'annotations': [
                                       {
                                          'font': {
                                                    'size': 8
                                                  },
                                          'showarrow': False,
                                          'text': '<b>' + data_labels[1] + '</b>',
                                          'x': 0.5,
                                          'y': 0.8
                                       },
                                       {
                                          'font': {
                                                     'size': 8
                                                  },
                                          'showarrow': False,
                                          'text': '<b>' + data_labels[2] + '</b>',
                                          'x': 0.5,
                                          'y': 0.25
                                       }
                                    ],
                     'height': 600,
                     'width': 600,
                     'grid': {'rows': 2, 'columns': 1},
                     'legend': {'font': {'size': 10}}
                  }
 
         print('''
      <script>
         var ctx = document.getElementById("''' + div + '''-pie");
         Plotly.newPlot(ctx,
                        ''' + json.dumps(traces) + ''',
                        ''' + json.dumps(layout) + '''
);
      </script>
''')
   elif len(data_labels) == 2:
      trace = [
                 {
                    'name': data_labels[1],
                    'type': "bar",
                    'x': [point[data_labels[0]] for i, point in enumerate(data)],
                    'y': [point[data_labels[1]] for i, point in enumerate(data)],
                    'width': 0.25,
                    'marker': {
                                 'color': '#009933',
                                 'line': {
                                            'width': 1.0
                                         }
                              }
                 }
              ]
      layout = {
                  'title': '<b>' + graph_title + '</b>',
                  'legend': {'x': -.1, 'y': 1.2, 'font': {'size': 8}},
                  'font': {'size': 12},
                  'height': 600,
                  'width': 800,
                  'xaxis': {
                            'title': '<b>' + data_labels[0] + '</b>',
                            'titlefont': {'size': 10},
                            'tickangle': -45,
                            'tickfont': {'size': 8},
                            'zeroline': True,
                          },
                  'yaxis': {
                            'title': '<b>' + data_labels[1] + '</b>',
                            'titlefont': {'size': 10},
                            'zeroline': True,
                            'showline': True,
                            'tickfont': {'size': 8},
                            'showgrid': False
                          },
                  'barmode': "group",
                  'bargap': 0.15,
                  'bargroupgap': 0.5
               }
      interaction = {'responsive': True, 'scrollZoom': True, 'editable': True};
      print('''       
      <div style="width:1200px;height:600px;">
          <div id="''' + div + '''" style="width:75%; float:left;"></div>
          <div id="''' + div + '''-pie" style="width:25%; float:left;"></div>
      </div>
      <script>
         var ctx = document.getElementById("''' + div + '''");
         Plotly.newPlot(ctx,
                        ''' + json.dumps(trace) + ''',
                        ''' + json.dumps(layout) + ''',
                        ''' + json.dumps(interaction) + ''');
      </script>
''')
      trace =  [
                  {
                     'name': data_labels[1],
                     'hole': .4,
                     'type': "pie",
                     'labels': [point[data_labels[0]] for i, point in enumerate(data)],
                     'values': [point[data_labels[1]] for i, point in enumerate(data)],
                     'domain': {
                                  'row': 0,
                                  'column': 0
                               },
                     'outsidetextfont': {'size': 8},
                     'insidetextfont': {'size': 8},
                     'legend': {'font': {'size': 8}},
                     'textfont': {'size': 8},
                     'font': {'size': 8},
                     'hoverinfo': 'label+percent+name',
                     'hoverlabel': {'font': {'size': 8}},
                     'textinfo': 'none'
                  },
               ]
      layout = {
                  'title': '<b>' + graph_title + '</b>',
                  'annotations': [
                                    {
                                       'font': {
                                                  'size': 8
                                               },
                                       'showarrow': False,
                                       'text': '<b>' + data_labels[1] + '</b>',
                                       'x': 0.5,
                                       'y': 0.5
                                    },
                                 ],
                  'height': 600,
                  'width': 600,
                  'grid': {'rows': 1, 'columns': 1},
                  'legend': {'font': {'size': 10}}
               }
      print('''       
      <script>
         var ctx = document.getElementById("''' + div + '''-pie");
         Plotly.newPlot(ctx,
                        ''' + json.dumps(trace) + ''',
                        ''' + json.dumps(layout) + ''');
      </script>
''')
   else:
      print("illegal data_label value: " + repr(data_label))
 
   # closes the html page;
   if "e" == mode:
      print('''
   </body>
</html>
''')
 
   # restores default output stream;
   if None != html_output:
      sys.stdout = sys.__stdout__
 
def cumulAndExtend2():
   """
   for 2 variables in resultset;
   """
   global rawdata
   sum_count = 0
   sum_size = 0
   new_month = str(datetime.now().year) + "/" + str(datetime.now().month)
   for ind, point in enumerate(rawdata):
      sum_count += int(point['count(*)'])
      point['count(*)'] = sum_count
      sum_size += int(point['tot_size'])
      point['tot_size'] = sum_size
   if rawdata[ind]['month'] < new_month:
      new_point = dict(rawdata[ind])
      new_point['month'] = new_month
      rawdata.append(new_point)
   DctmAPI.show(rawdata)
 
def cumulAndExtend3(key):
   """
   for 3 variables in resultset;
   """
   global rawdata
   prec_doc = ""
   sum_count = 0
   sum_size = 0
   new_month = str(datetime.now().year) + "/" + str(datetime.now().month)
   for ind, point in enumerate(rawdata):
      if "" == prec_doc:
         prec_doc = point[key]
      if point[key] != prec_doc:
         # duplicate the last point so the line graphe shows a flat line and not a simple dot when there is a unique point for the document type;
         if rawdata[ind - 1]['month'] < new_month:
            new_point = dict(rawdata[ind - 1])
            new_point['month'] = new_month
            rawdata.insert(ind, new_point)
            prec_doc = point[key]
            sum_count = 0
            sum_size = 0
      else:
         sum_count += int(point['count(*)'])
         point['count(*)'] = sum_count
         sum_size += int(point['tot_size'])
         point['tot_size'] = sum_size
   if rawdata[ind]['month'] < new_month:
      new_point = dict(rawdata[ind])
      new_point['month'] = new_month
      rawdata.append(new_point)
   DctmAPI.show(rawdata)
 
# -----------------
# main;
if __name__ == "__main__":
   DctmAPI.logLevel = 0
  
   # parse the command-line parameters;
   # old-style for I don't need more flexibility here;
   repository = None
   output_file = None
   try:
      (opts, args) = getopt.getopt(sys.argv[1:], "hd:o:", ["help", "docbase=", "output="])
   except getopt.GetoptError:
      print("Illegal option")
      print("./graph-stats.py -h|--help | -d|--docbase <docbase>{,<docbase>} [-o|--output_file <output_file>]")
      sys.exit(1)
   for opt, arg in opts:
      if opt in ("-h", "--help"):
         Usage()
         sys.exit()
      elif opt in ("-d", "--docbase"):
         repository = arg
      elif opt in ("-o", "--output"):
         output_file = arg
   if None == repository:
      print("at least one repository must be specified")
      Usage()
      sys.exit()
   DctmAPI.show("Will connect to docbase(s): " + repository + " and output to " + ("stdout" if "stdout" == output_file else "one single file " + output_file if output_file is not None else "one file per docbase"))
  
   # needed to locally import the js library Plotly;
   server = "192.168.56.10:8000"
 
   docbase_done = set()
   status = DctmAPI.dmInit()
   for pointer, docbase in enumerate(repository.split(",")):
      # graphe_1;
      if docbase in docbase_done:
         print("Warning: docbase {:s} was already processed and won't be again, skipping ...".format(docbase))
         continue
      docbase_done.add(docbase)
      session = DctmAPI.connect(docbase = docbase, user_name = "dmadmin", password = "dmadmin")
      if session is None:
         print("no session opened, exiting ...")
         exit(1)
 
      page_title = "Graphical Overview of Repository " + docbase
      if None == output_file:
         html_output = "./graphs/" + docbase + "-" + datetime.today().strftime("%Y%m%d-%H%M%S") + ".html"
      elif "stdout" == output_file:
         html_output = None
      else:
         html_output = output_file
 
      graph_name = "graphe_1"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select r_object_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type order by 1"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q1. Present Count & Size Per Document Type",
                data = rawdata,
                data_labels = attr_name,
                mode = ("b" if (("stdout" == output_file or output_file is not None) and 0 == pointer) or output_file is None else None))
 
      # graphe_2;
      graph_name = "graphe_2"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select r_object_type, datetostring(r_creation_date, 'yyyy/mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type, datetostring(r_creation_date, 'yyyy/mm') order by 1, 2"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q2. Monthly New Documents per Type",
                data = rawdata,
                data_labels = attr_name)
      cumulAndExtend3('r_object_type')
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q2l. Cumulated Monthly Documents per Type",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True)
    
      # graphe_3;
      graph_name = "graphe_3"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select r_object_type, datetostring(r_modify_date, 'yyyy/mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) where r_creation_date < r_modify_date group by r_object_type, datetostring(r_modify_date, 'yyyy/mm') order by 1, 2"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q3. Monthly Modified Documents per Type",
                data = rawdata,
                data_labels = attr_name)
      cumulAndExtend3('r_object_type')
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q3l. Cumulated Monthly Modified Documents per Type",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True)
 
      # graphe_4;
      graph_name = "graphe_4"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select datetostring(r_creation_date, 'yyyy/mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by datetostring(r_creation_date, 'yyyy/mm') order by 1"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q4. Monthly New Documents",
                data = rawdata,
                data_labels = attr_name )
      cumulAndExtend2()
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q4l. Cumulated Monthly Documents",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True)
 
      # graphe_5;
      graph_name = "graphe_5"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select datetostring(r_modify_date, 'yyyy/mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) where r_creation_date != r_modify_date group by datetostring(r_modify_date, 'yyyy/mm') order by 1;
        """
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q5. Monthly Modified Documents",
                data = rawdata,
                data_labels = attr_name)
      cumulAndExtend2()
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q5l. Cumulated Monthly Modified Documents",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True,
                mode = 'e')
 
      # graphe_6;
      graph_name = "graphe_6"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select a_content_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_content_type order by 1"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q6. Count & Size Per Content Format",
                data = rawdata,
                data_labels = attr_name)
 
      # graphe_7;
      graph_name = "graphe_7"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select a_content_type, datetostring(r_creation_date, 'yyyy-mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_content_type, datetostring(r_creation_date, 'yyyy-mm') having count(*) > 10 order by 1, 2"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q7. Monthly Created Documents Per Content Format with count(*) > 10",
                data = rawdata,
                data_labels = attr_name)
      cumulAndExtend3('a_content_type')
      DctmAPI.show(rawdata)
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q7l. Cumulated Monthly Created Documents Per Content Format with count(*) > 10",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True)
 
      # graphe_8;
      graph_name = "graphe_8"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_storage_type order by a_storage_type"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q8. Count & Size Per Filestore",
                data = rawdata,
                data_labels = attr_name)
 
      # graphe_9;
      graph_name = "graphe_9"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select r_object_type, a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type, a_storage_type order by r_object_type, a_storage_type"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q9. Count & Size Per Document Type & Filestore",
                data = rawdata,
                data_labels = attr_name)
 
      # graphe_10;
      graph_name = "graphe_10"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select a_content_type, a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_content_type, a_storage_type having count(*) > 10 order by a_content_type, a_storage_type"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q10. Count & Size Per Content Format & Filestore with count(*) > 10",
                data = rawdata,
                data_labels = attr_name)
 
      # graphe_11;
      graph_name = "graphe_11"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select owner_name, count(*) from dm_acl group by owner_name order by 2 desc"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q11. ACLs per owner",
                data = rawdata,
                data_labels = attr_name)
 
      # graphe_12;
      graph_name = "graphe_12"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select acl_name, count(*) from dm_document(all) where acl_name not like 'dm_%' group by acl_name having count(*) >= 10 order by 1"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q12. External ACLs in Use by >= 10 documents",
                data = rawdata,
                data_labels = attr_name)
 
      # graphe_13;
      graph_name = "graphe_13"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select name, count(*) from dmi_queue_item group by name order by 2 desc"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q13. Queue Items by Name",
                data = rawdata,
                data_labels = attr_name,
                mode = ('e' if (("stdout" == output_file or output_file is not None) and len(repository.split(",")) - 1 == pointer) or output_file is None else None))
 
   status = DctmAPI.disconnect(session)
   if not status:
      print("error while  disconnecting")

A pdf (sorry, uploading python scripts is not allowed on this site) of the script is available here too graph-stats.
Here is the DctmAPI.py as pdf too: DctmAPI.
The script invokes select2dict() from the module DctmAPI (as said before, it is also presented and accessible here) with the query and receives back the result into an array of dictionaries, i.e. each line of the array is a line from the query’s resultset. It then goes on creating the HTML page by invoking the function Plot2HTML(). On line 47, the js library is imported. Each graph is plotted in its own DIV section by Plotly’s js function newPlot(). All the required parameters are set in python dictionaries and passed as json literals to newPlot(). The conversion from python to javascript is conveniently done by the function dumps() from the json module. The HTML and js parts are produced by multi-line python’s print() statements.

Invoking the script

The script is invoked from the command-line through the following syntax:

./graph-stats.py -h|--help | -d|--docbase <docbase>{,<docbase>} [-o|--output_file <output_file>]

A Help option is available, which outputs the text below:

./graph-stats.py --help
Usage:
Connects as dmadmin/xxxx to a repository and generates an HTML page containing the plots of several DQL queries result;
Usage:
./graph-stats.py -h|--help | -d|--docbase {,<docbase>} [-o|--output_file <output_file>]
output_file can be one repository or a comma-separated list of repositories;
if output_file is omitted, the html page is output to ./graphs/-$(date +"%Y%m%d-%H%M%S").html;
output_file can be "stdout", which is useful for CGI programming;
Example:
./graph-stats.py -d dmtest,mail_archive,doc_engineering
will query the docbases dmtest, mail_archive and doc_engineering and output the graphs to the files -$(date +"%Y%m%d-%H%M%S").html, one file per docbase;
./graph-stats.py -d mail_archive,doc_engineering -o all_docbase_current_status
will query the docbases mail_archive and doc_engineering and output all the graphs to the unique file all_docbase_current_status;
./graph-stats.py -d mail-archive
will query docbase mail-archive and output the graphs to the file mail-archive-$(date +"%Y%m%d-%H%M%S").html;
./graph-stats.py --docbase dmtest --output dmtest.html
will query docbase dmtest and output the graphs to the file dmtest.html;
./graph-stats.py -d dmtest --output stdout
will query docbase dmtest and output the graphs to stdout;

As the account used is the trusted, password-less dmadmin one, the script must run locally on the repository server machine but it is relatively easy to set it up for remote docbases, with a password prompt or without if using public key authentication (and still password-less in the remote docbases if using dmadmin).

Examples of charts

Now that we’ve seen the bits and pieces, let’s generate an HTML page with the graphs of all the above queries.
Here is an example when the queries where run against an out of the box repository named dmtest (all the graphs in one pdf file): dmtest.pdf
For more complex examples, just run the script against one of your docbases and see the results by yourself, as I can’t upload neither html files nor compressed files here.
As shown, some queries are charted using 2 to 4 graphs, the objective being to offer the clearest view. E.g. query 1 computes counts and total sizes. They will be plotted on the same graph as grouped bars of 2 independent variables and on two distinct pie charts, one for each variable. Query 2 shows the count(*) and the total sizes by document type as 2 stacked bar graphs plus 2 line graphs of the cumulated monthly created documents by document type. The line graphs show the total number of created documents at the end of a month, not the total number of existing documents as of the end of each month; to have those numbers (an historical view of query 1), one needs to query the docbase monthly and store the numbers somewhere. Query 1 returns this status but only at the time it was run.
Discussing Plotly is out of the scope of this article but suffice it to say that it allows one to work the chart interactively, e.g. removing variables (aka traces in Plotly’s parlance) so the rest of them are re-scaled for better readability, zoom parts of the charts, and scroll the graphs horizontally or vertically. A nice tip on hover is also available to show the exact values of a point, e.g.:
Screenshot from 2018-11-03 19-07-28
Plotly even lets one completely edit the chart from within a js application hosted at Plotly, so this may not suit anybody if confidentiality is required; moreover, the graph can only be saved in a public area on their site, and subsequently downloaded from there, unless a per annum paying upgrade is subscribed. Nonetheless, the free features are well enough for most needs.

Leveraging Plot2HTML()

The script’s Plot2HTML() function can be leveraged so it is used to plot data from any source, even from a text file so the query is separated from its result’s graphical representation, which is useful if a direct access to the system to query is not possible. To do this, an easy to parse file format is necessary, e.g.:

# this is a commented line;
# line_no   variable1    variable2 .....
0          type of graph                                   # use B for bars, S for stacked bars, L for lines, P for pie or any combination of them (BL, LB, BP, PB, LP, PL, BPL);
                                                           # one graph for each combination will be put on the html page in a nx2 cell grid, i.e. graphs  1 and 2 horizontally and graph 3 below,
                                                           # in the same order as the graph type letters;
1          graph title                                     # can be on several lines separated by html <br> for sub-titles;
2          Xaxis_name     Yaxis1_name    Yaxis2_name       # names of axis, one X and up to 2 Y;
3          Xaxis_values1  value1         value2            # first line of (x,y) pairs or (x,y1,y2) triplets if a second axis was specified;
4          Xaxis_values2  value1         value2            # second line of data;
...                                                        # the other lines;
n          Xaxis_valuesn  valuen         valuen            # last line of data;

And that script could be invoked like this:

Usage:
       ./graph-it.py -h|--help | -f|--input_file <filename>{,<filename>} [-o|--output_file <output_file>]

The script itself just needs to parse and load the data from the input file(s) into an array of dictionaries and pass it to the plotting function Plot2HTML(). Of course, if the data is already in json format, the parsing step will be greatly simplified.
Time and mood permitting, I’ll do it in a future article.

Useful tools

The color of the many chart’s option (e.g. on line 87) can be specified through an rgb triplet or an HTML code. There are many color pickers on the web, e.g. this one. Here, I let Plotly chose the colors as it sees fit, hence the not so attractive results.
If you need to paste HTML code into an HTML page without it being interpreted by the browser, e.g. the code generated by graph-stats.py, some transformations are required, e.g. all the angle brackets need be replaced by their HTML entities representations. If that code does not contain confidential information, an on-line service can be used for that, such as this one, but it shouldn’t be too difficult to write a list of sed expressions that do the substitutions (e.g. this gawk one-liner takes care of the characters < and >:

gawk -c '{gsub(/>/, "\\>"); gsub(/</, "\\<"); print}' myfile.html).

Other replacements can be added as the need arises.

Conclusion

All the above confirms yet again that it is possible to create cheap but useful, custom tools with almost no installation, except here a textual javascript library. All the rest is already available on any administrator’s decent system, command-line and python interpreters, and a browser which, by the way, is taking more and more importance as a tiers because of the wide spreading of javascript libraries that execute locally and the ever increasing performance of javascript engines. By the way, chromium is highly recommended here for visualizing those complex charts from real production repositories.
As the source code is provided, it is possible to modify it in order to produce better looking graphs. This is the real arduous and time-consuming part in my humble opinion as it requires some artistic inclination and many experimentations. If you find a nicer color scheme, please feel free to contribute in the comments.