This article is part of a series that includes SQLite, Postgresql, Firebird, Oracle RDBMS, Microsoft SQL Server, HSQLDB, Excel, and MariaDB. The goal is to set up a self-standing environment for testing an ODBC extension for gawk presented here to be completed. Refer to SQLite for installing the required ODBC Driver Manager.
The test system is a debian v11 (bullseye).
MongoDB is a noSQL database but the ODBC API will hide this fact and allow to access its data using SQL statements. This is to show that with the appropriate ODBC drivers doing the translation, any data source can be accessed using SQL. Not all native operations are rendered correctly of course (notably SELECT’s JOIN clauses, see below) but the main ones, the CRUD, such as INSERT, SELECT, UPDATE and DELETE are all available.
As root, install MongoDB Community Edition and its command-line tool from the official package repositories:

# apt install mongodb
# apt install mongodb-mongosh

Follow the instructions here to configure and start mongodb.
A systemd service has been created. Still as root, start the service as shown below:

# ulimit -n 64000
# chown -R mongodb:mongodb /var/lib/mongodb
# systemctl stop mongod
# systemctl start mongod
# systemctl status mongod
● mongod.service - MongoDB Database Server
     Loaded: loaded (/lib/systemd/system/mongod.service; disabled; vendor preset: enabled)
     Active: active (running) since Tue 2021-09-21 09:27:47 CEST; 3s ago
       Docs: https://docs.mongodb.org/manual
   Main PID: 38200 (mongod)
     Memory: 9.8M
        CPU: 26ms
     CGroup: /system.slice/mongod.service
             └─38200 /usr/bin/mongod --config /etc/mongod.conf

Check the MongoDB server availability by connecting to the server through the mongosh tool documented here:

# mongosh
Current Mongosh Log ID:	6149ba4e6d2a951c4e869dac
Connecting to:		mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000
Using MongoDB:		5.0.3
Using Mongosh:		1.0.6
test> quit

The data sample site does not provide scripts for creating and populating MongoDB collections. There are 2 ways to work around this:
1. use isql and ODBC to feed SQL statements to MongoDB;
2. translate SQL statements to javascripts ones suitable for MongoDB;
At this stage, ODBC drivers are not installed yet and therefore isql cannot work so we will select the 2nd alternative. The script below will do just that. As a Mongo database is schema-less, there is no need to create collections beforehand, they will be implicitely created when populated with documents. Any sample populating script will do so we’ll take the one for mysql here and save it to the file data4mongodb.sql. The script pop-mongodb.awk will convert SQL insert statements into MongoDB statements. Here it is:

# format of the input lines, e.g:
#   INSERT INTO regions(region_id,region_name) VALUES (1,'Europe');
# or:
#   INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (2500,'Magdalen Centre, The Oxford Science Park','OX9 9ZB','Oxford','Oxford','UK');
# Output:
#    db.regions.insert({
#    region_id: 1,
#    region_name: "Europe"
#    })
# or:
#    db.locations.insert({
#    location_id: 2500,
#    street_address: "Magdalen Centre, The Oxford Science Park",
#    postal_code: "OX9 9ZB",
#    city: "Oxford",
#    state_province: "Oxford",
#    country_id: "UK"
#    })
# Usage:
#    gawk -v Apostrophe="'" -f pop-mongodb.awk data4mongodb.sql > data4mongodb.json
BEGIN {
   ApoRE="(([^,]+)|(" Apostrophe "([^" Apostrophe "]+)" Apostrophe ")),?" # i.e. (([^,]+)|('([^']+)')),?
   print "use sampledb"
}
{
   if (!$0 || match($0, /^/*/)) next

   match($0, /INSERT INTO ([^(]+)(([^)]+))/, stmt)
   table_name = stmt[1]
   if (!bemptied[table_name]) {
      print "db." table_name ".deleteMany({})"
      bemptied[table_name] = 1
   }

   nb_columns = split(stmt[2], columns, ",")

   nb = match($0, /VALUES (([^)]+))/, stmt)
   S = stmt[1]
   nb_values = 0
   while (match(S, ApoRE, res)) {
      values[++nb_values] = res[1]
      S = substr(S, RLENGTH + 1)
   }

   print "db." table_name ".insert({"
   for (i = 1; i <= nb_columns; i++) {
      if ("NULL" == values[i])
         values[i] = "null"
      gsub(Apostrophe, """, values[i])
      print columns[i] ": " values[i] (i < nb_columns ? "," : "")
   }
   print "})"
   printf "n"
}

Invoke it:

gawk -v Apostrophe="'" -f pop-mongodb.awk data4mongodb.sql > data4mongodb.json

Example of input:

INSERT INTO regions(region_id,region_name) VALUES (1,'Europe');
...
INSERT INTO countries(country_id,country_name,region_id) VALUES ('AR','Argentina',2);
...
INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1400,'2014 Jabberwocky Rd','26192','Southlake','Texas','US');
...
INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (1,'Public Accountant',4200.00,9000.00);
...
INSERT INTO departments(department_id,department_name,location_id) VALUES (1,'Administration',1700);
...
INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (100,'Steven','King','[email protected]','515.123.4567','1987-06-17',4,24000.00,NULL,9);
...
INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (1,'Penelope','Gietz','Child',206);
...

and their corresponding generated json statements:

use sampledb
db.regions.deleteMany({})
db.regions.insert({
region_id: 1,
region_name: "Europe",
})
...
db.countries.deleteMany({})
db.countries.insert({
country_id: "AR",
country_name: "Argentina",
region_id: 2,
})
...
db.locations.deleteMany({})
db.locations.insert({
location_id: 1400,
street_address: "2014 Jabberwocky Rd",
postal_code: "26192",
city: "Southlake",
state_province: "Texas",
country_id: "US",
})
...
db.jobs.deleteMany({})
db.jobs.insert({
job_id: 1,
job_title: "Public Accountant",
min_salary: 4200.00,
max_salary: 9000.00,
})
...
db.departments.deleteMany({})
db.departments.insert({
department_id: 1,
department_name: "Administration",
location_id: 1700,
})
...
db.employees.deleteMany({})
db.employees.insert({
employee_id: 100,
first_name: "Steven",
last_name: "King",
email: "[email protected]",
phone_number: "515.123.4567",
hire_date: "1987-06-17",
job_id: 4,
salary: 24000.00,
manager_id: null,
department_id: 9,
})
...
db.dependents.deleteMany({})
db.dependents.insert({
dependent_id: 1,
first_name: "Penelope",
last_name: "Gietz",
relationship: "Child",
employee_id: 206,
})

To be sure the populating step is idempotent, the collections are emptied each time the script is run.
From mongosh, populate the tables as shown:

mongosh < data4mongodb.isql

Back in mongosh, check the data:

mongosh  show databases
admin       41 kB
config     111 kB
local     81.9 kB
sampledb   516 kB

test> use sampledb
switched to db sampledb

sampledb> show collections
countries
departments
dependents
employees
jobs
locations
regions

sampledb> db.regions.find()
[
  {
    _id: ObjectId("616eef8e230e4e4893edd45f"),
    region_id: 1,
    region_name: 'Europe'
  },
...
sampledb> db.countries.find()
[
  {
    _id: ObjectId("616eef8f230e4e4893edd463"),
    country_id: 'AR',
    country_name: 'Argentina',
    region_id: 2
  },
...
sampledb> db.locations.find()
[
  {
    _id: ObjectId("616eef91230e4e4893edd47c"),
    location_id: 1400,
    street_address: '2014 Jabberwocky Rd',
    postal_code: '26192',
    city: 'Southlake',
    state_province: 'Texas',
    country_id: 'US'
  },
...
sampledb> db.jobs.find()
[
  {
    _id: ObjectId("616eef92230e4e4893edd483"),
    job_id: 1,
    job_title: 'Public Accountant',
    min_salary: 4200,
    max_salary: 9000
  },
...
sampledb> db.departments.find()
[
  {
    _id: ObjectId("616eef94230e4e4893edd496"),
    department_id: 1,
    department_name: 'Administration',
    location_id: 1700
  },
...
sampledb> db.employees.find()
[
  {
    _id: ObjectId("616eef95230e4e4893edd4a1"),
    employee_id: 100,
    first_name: 'Steven',
    last_name: 'King',
    email: '[email protected]',
    phone_number: '515.123.4567',
    hire_date: '1987-06-17',
    job_id: 4,
    salary: 24000,
    manager_id: null,
    department_id: 9
  },
...
sampledb> db.dependents.find()
[
  {
    _id: ObjectId("616eef9c230e4e4893edd4c9"),
    dependent_id: 1,
    first_name: 'Penelope',
    last_name: 'Gietz',
    relationship: 'Child',
    employee_id: 206
  },
...
sampledb> quit

The MongoDB ODBC drivers are available here. They are a modified version of MySQL ODBC driver. However, they don’t work on my test machine running Debian Linux v11 because of a missing openssl v1.0.2 library, which was predictable because those drivers are several years old and my test machine runs the latest Debian linux. For this reason, a commercial replacement from Devart has been installed; it comes with a one-month evaluation period. Once registered, it can be downloaded and installed as root as follows:

# wget https://www.devart.com/odbc/mongodb/devartodbcmongo_amd64.deb
# apt install /home/debian/Downloads/devartodbcmongo_amd64.deb

Check the system-wide installed drivers:

# odbcinst -q -d
...
[MongoDB Unicode]
[MongoDB ANSI]
...
[Devart ODBC Driver for MongoDB]
# odbcinst -q -d -n "Devart ODBC Driver for MongoDB"
[Devart ODBC Driver for MongoDB]
Driver=/usr/local/lib/libdevartodbcmongo.so

Check the system-wide DSN in /etc/odbc.ini:

# odbcinst -q -s
...
[DEVART_MONGODB]
#  odbcinst -q -s -n "DEVART_MONGODB"
[DEVART_MONGODB]
Description=My MongoDB sample database
Driver=Devart ODBC Driver for MongoDB
Data Source=
Port=27017
Database=sampledb
User ID=
Password=
Client Library=
BSON Library=
Additional Servers=
Connection Options=

Copy the newly inserted DSN into debian’s ~/.odbc.ini file:

$ vi ~/.odbc.ini
...
[DEVART_MONGODB]
Description=My MongoDB sample database
Driver=Devart ODBC Driver for MongoDB
Data Source=
Port=27017
Database=sampledb
User ID=
Password=
Client Library=
BSON Library=
Additional Servers=
Connection Options=

As debian, check its DSN so far:

$ odbcinst -q -s
...
[DEVART_MONGODB]
...

$ odbcinst -q -s -n DEVART_MONGODB
[DEVART_MONGODB]
Description=My MongoDB sample database
Driver=Devart ODBC Driver for MongoDB
Data Source=
Port=27017
Database=sampledb
User ID=
Password=
Client Library=
BSON Library=
Additional Servers=
Connection Options=

Try a connection to the MongoDB database via ODBC:

isql -v DEVART_MONGODB
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> help
+-----------+-------------+-------------+------------+---------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME  | TABLE_TYPE | REMARKS |
+-----------+-------------+-------------+------------+---------+
| sampledb  |             | countries   | TABLE      |         |
| sampledb  |             | dependents  | TABLE      |         |
| sampledb  |             | regions     | TABLE      |         |
| sampledb  |             | locations   | TABLE      |         |
| sampledb  |             | departments | TABLE      |         |
| sampledb  |             | jobs        | TABLE      |         |
| sampledb  |             | employees   | TABLE      |         |
+-----------+-------------+-------------+------------+---------+ 
SQLRowCount returns -1
7 rows fetched

It looks good. Run the test query now:

SQL> SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM countries c LEFT JOIN locations l ON l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN');
Output:
+--------------+-----------------+-----------------+----------------+------+
| country_name | country_id      | country_id_1    | street_address | city |
+--------------+-----------------+-----------------+----------------+------+
| China        | CN              |                 |                |      |
+--------------+-----------------+-----------------+----------------+------+ 
SQLRowCount returns -1
1 rows fetched

Only one row is returned because the left join is not implemented in the combo MongoDB/ODBC driver. In effect, joins in relational database are a necessity due to the data normalization; as MongoDB does not care about data duplication and works with complete, self-standing documents, no joins are necessary although inner queries can be simulated with aggregate() and lookup() as illustrated by the following equivalent query:

db.countries.aggregate([
   {
      $match:{$or:[{"country_id" : "US"}, {"country_id" : "UK"}, {"country_id" : "CN"}]}
   },
   {
     $lookup:
       {
         from: "locations",
         localField: "country_id",
         foreignField: "country_id",
         as: "r"
       }
  }
])

with its result:

[
  {
    _id: ObjectId("616eef90230e4e4893edd469"),
    country_id: 'CN',
    country_name: 'China',
    region_id: 3,
    r: []
  },
  {
    _id: ObjectId("616eef91230e4e4893edd478"),
    country_id: 'UK',
    country_name: 'United Kingdom',
    region_id: 1,
    r: [
      {
        _id: ObjectId("616eef92230e4e4893edd480"),
        location_id: 2400,
        street_address: '8204 Arthur St',
        postal_code: null,
        city: 'London',
        state_province: null,
        country_id: 'UK'
      },
      {
        _id: ObjectId("616eef92230e4e4893edd481"),
        location_id: 2500,
        street_address: 'Magdalen Centre, The Oxford Science Park',
        postal_code: 'OX9 9ZB',
        city: 'Oxford',
        state_province: 'Oxford',
        country_id: 'UK'
      }
    ]
  },
  {
    _id: ObjectId("616eef91230e4e4893edd479"),
    country_id: 'US',
    country_name: 'United States of America',
    region_id: 2,
    r: [
      {
        _id: ObjectId("616eef91230e4e4893edd47c"),
        location_id: 1400,
        street_address: '2014 Jabberwocky Rd',
        postal_code: '26192',
        city: 'Southlake',
        state_province: 'Texas',
        country_id: 'US'
      },
      {
        _id: ObjectId("616eef91230e4e4893edd47d"),
        location_id: 1500,
        street_address: '2011 Interiors Blvd',
        postal_code: '99236',
        city: 'South San Francisco',
        state_province: 'California',
        country_id: 'US'
      },
      {
        _id: ObjectId("616eef91230e4e4893edd47e"),
        location_id: 1700,
        street_address: '2004 Charade Rd',
        postal_code: '98199',
        city: 'Seattle',
        state_province: 'Washington',
        country_id: 'US'
      }
    ]
  }
]

To easy up the comparaison, we save that output to the file mongodb.out and run the following commands to reformat it into a tabular presentation.
First, convert mongodb.out to correct json syntax:

$ gawk -v Apo="'" '{
   gsub(Apo, """, $0)
   if (match($1, /^_id/)) next
   print gensub(/^( +)([^:]+)(:.+$)/, "\1"\2"\3", "g", $0)
}' mongodb.out > mongodb.json

Now, save the following python script into the python script mongodb2tab.py:

$ cat - <<eop mongodb2tab.py
import os
import json
with open("./mongodb.json") as json_file:
   mongodb_out = json.load(json_file)
widths = [25, 10, 10, 42, 10]
print(f"{'country_name': <{widths[0]}}  {'country_id': <{widths[1]}}  {'country_id': <{widths[2]}}  {'street_address': <{widths[3]}}  {'city': <{widths[4]}}")
for row in mongodb_out:
   if row['r']:
      for l in row['r']:
         print(f"{row['country_name']: <{widths[0]}}  {row['country_id']: <{widths[1]}}  {l['country_id']: <{widths[2]}}  {l['street_address']: <{widths[3]}}  {l['city']: <{widths[4]}}")
   else:
      print(f"{row['country_name']: <{widths[0]}}  {row['country_id']: <{widths[1]}}")
eop

Finally, execute it:

$ python3 mongodb2tab.py
Output:
country_name               country_id  country_id  street_address                              city      
China                      CN        
United Kingdom             UK          UK          8204 Arthur St                              London    
United Kingdom             UK          UK          Magdalen Centre, The Oxford Science Park    Oxford    
United States of America   US          US          2014 Jabberwocky Rd                         Southlake 
United States of America   US          US          2011 Interiors Blvd                         South San Francisco
United States of America   US          US          2004 Charade Rd                             Seattle   

Which shows that the output of the MongoDb query to simulate the left outer join was correct.
Let’s now test the DSN with pyodbc:

$ python3
import pyodbc 
cnxn = pyodbc.connect(DSN='mymongodb_Devart')
cursor = cnxn.cursor()	
cursor.execute("""SELECT
        c.country_name,
        c.country_id,
        l.country_id,
        l.street_address,
        l.city
FROM
        countries c
LEFT JOIN locations l ON l.country_id = c.country_id
WHERE
        c.country_id IN ('US', 'UK', 'CN')""")
row = cursor.fetchone() 
while row:
    print (row) 
    row = cursor.fetchone()
Output:
('China', 'CN', None, None, None)

Here too, there are missing values as expected.
MongoDB is now accessible from ODBC under the debian account, albeit not all SQL statements are fully supported, which is understandable with a NoSQL database.
Instructions for the other data sources can be accessed through the following links:
SQLite
Firebird
HSQLDB
MariaDB
PostgreSQL
Oracle
Microsoft SQLServer for Linux
Excel


Thumbnail [60x60]
by
Cesare Cervini