Blog - comments

Hi Olivier,DELAYED_DURABILITY doesn't change the SQL Server behavior of opening the data and log fil...
Hi David,Thanks for this demo with write functions. As much I know, SQL Server always access files (...
Olivier Berthommé
Thank you for your feedback. I will build a test environment and realize a guide to deploy Grid Infr...

Hi Greg,

Thanks

great job Dave ! thanks

greg
Blog Michael Schwalm Oracle LOBs: Infer the file type with dbms_lob package

dbi services Blog

Welcome to the dbi services Blog! This IT blog focuses on database, middleware, and OS technologies such as Oracle, Microsoft SQL Server & SharePoint, EMC Documentum, MySQL, PostgreSQL, Sybase, Unix/Linux, etc. The dbi services blog represents the view of our consultants, not necessarily that of dbi services. Feel free to comment on our blog postings.

  • Home
    Home This is where you can find all the blog posts throughout the site.
  • Categories
    Categories Displays a list of categories from this blog.
  • Tags
    Tags Displays a list of tags that have been used in the blog.
  • Bloggers
    Bloggers Search for your favorite blogger from this site.

Oracle LOBs: Infer the file type with dbms_lob package

LOBs (Large OBjects) first appeared in Oracle 7 and were created to store large amount of data such as document files, video, pictures, etc. Today, we can store up to 128 TB of data in a LOB, depending on the DB_BLOCK_SIZE parameter settings. In this posting, I will show you how to load LOB files into the Oracle database and will present a way to identify the file type of the LOB stored in the database, based on the LOB value column only.

Each file type is associated to a "file signature". This signature is composed of several bytes (i. e. 00 01), and allows to uniquely identify a file type (i.e. zip file, png file, etc). The signature corresponds to the first bytes of the file. To identify the file type of a LOB stored in a database, we just have to extract the first bytes of the LOB value and then compare them to a list of known bytes. I have created a PL/SQL script for that purpose.

The first step is to create a table containing LOB files. Here, I will create a simple table to store internal Binary objects (BLOB). The statement used is:

 

CREATE TABLE app_doc (  doc_id NUMBER,  doc_value BLOB);

 

SQL> desc app_doc;
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
DOC_ID                                             NUMBER
DOC_VALUE                                          BLOB

 

Now, I want to load a file from the operating file system to the table APP_DOC. The file to load is a picture saved as PNG format. The path is "/oracle/documents/my_pic.png".

First, we must create an Oracle directory in order to access the file:

 

SQL> CREATE OR REPLACE DIRECTORY DIR_DOCUMENTS AS '/oracle/documents';

 

We then use the DBMS_LOB package through a PL/SQL block to load a file in the database:

 

DECLARE
  src_lob  BFILE := BFILENAME('DIR_DOCUMENTS', 'my_pic.png');
  dest_lob BLOB;
BEGIN
  INSERT INTO app_doc VALUES(1, EMPTY_BLOB())
     RETURNING doc_value INTO dest_lob;
  DBMS_LOB.OPEN(src_lob, DBMS_LOB.LOB_READONLY);
  DBMS_LOB.LoadFromFile( DEST_LOB => dest_lob,
                         SRC_LOB  => src_lob,
                         AMOUNT   => DBMS_LOB.GETLENGTH(src_lob) );
  DBMS_LOB.CLOSE(src_lob);
  COMMIT;
END;
/

 

The table has now a new record:

 

SQL> select count (*) from app_doc;
  COUNT(*)
----------
         1

 

I have populated the table with 28 files of several types (png, pdf, bmp, etc.) and the table has 28 records. Here is the list of files loaded into the table:

 

oracle@vmtest:/oracle/documents/ [rdbms11203] ls -1
001.png
002.png
003.png
004.png
005.png
006.png
007.png
008.png
009.png
010.png
011.png
012.png
013.png
014.png
015.png
016.png
017.png
018.png
100021667-logo-dbi-services-sa.jpg
dbi_twitter_bigger.jpg
e18294-SecureFiles_and_Large_Objects_Developers_Guide.pdf
e25523-VLDB_and_Partitioning_Guide.pdf
file.zip
Mybmp2.bmp
Mybmp.bmp
Mydoc1.doc
mypdf.pdf
text3.txt

 

Here is an extract of the PL/SQL written to identify the LOB file type:

 

-- Create a temporary table to store known document types
CREATE GLOBAL TEMPORARY TABLE temp_doc_types (id NUMBER, type VARCHAR(5), sign VARCHAR(4)) ON COMMIT DELETE ROWS;

 

-- Populate the temporary table
INSERT INTO temp_doc_types VALUES (1,'jpeg','FFD8');
INSERT INTO temp_doc_types VALUES (2,'gif','4749');
INSERT INTO temp_doc_types VALUES (3,'png','8950');
INSERT INTO temp_doc_types VALUES (4,'bmp','424D');
INSERT INTO temp_doc_types VALUES (5,'pdf','2550');
INSERT INTO temp_doc_types VALUES (6,'doc','D0CF');
INSERT INTO temp_doc_types VALUES (7,'zip','504B');
INSERT INTO temp_doc_types VALUES (8,'rar','5261');

 

In this example, I chose to compare only the four first bytes of the LOB value. The number of bytes composing a file signature can be larger, but to simplify the example, I only used file signatures that have four bytes.

The number of files of each extension is returned by this statement:

 

SQL> select decode(type,NULL,'Unknown document type',type) as "TYPE OF DOCUMENT",
  2  count(*) as "NUMBER OF DOCUMENTS"
  3  from temp_doc_types a,
  4  (select (dbms_lob.substr(doc_value,2,1)) as FILE_TYPE from app_doc) b
  5  where a.sign(+)=b.FILE_TYPE group by a.type;

 

TYPE OF DOCUMENT      NUMBER OF DOCUMENTS
--------------------- -------------------
Unknown document type                   1
doc                                     1
bmp                                     2
png                                    18
pdf                                     3
jpeg                                    2
zip                                     1

 

The statement gets the first four bytes of each LOB and compares them to the temporary table, matching the LOB to a type.

  • The decode function is used to display 'Unknown document type' when NULL is returned and no extension matches between the LOB and the temporary table. This is the case for text files which have no file signature.
  • The DBMS_LOB.SUBSTR function is used to retrieve only the first four bytes from the LOB value. Parameters are the source LOB value, the amount of bytes to read (4), and the starting position (1).
  • An outer join is used to count LOBs not matching the temporary table. This will correspond to unidentified files types.

 

To finish, I drop the temporary table before exiting the job:

 

-- The temporary table is dropped before exit
DROP TABLE temp_doc_types;
exit;

 

And just for the eyes, this is the result in a HTML fashion, generated directly from sqlplus using the MARKUP option and an css style sheet. I let you read the Oracle documentation in order to know how to use the MARKUP option ;-)

 

sql_report

Rate this blog entry:
4

Michael Schwalm is Consultant at dbi Services and has more than two years of experience in Oracle database administration. He has a broad knowledge in the realization of virtualization infrastructures such as vMware vSphere. He took his first steps in database administration as an integrator of a web applications on Unix, Oracle, and Websphere environments. Michael Schwalm is Oracle Certified Professional 11g and RAC Implementation Specialist 11g. Prior to joining dbi services, Michael Schwalm was application administrator at SOGETI Est (F) on behalf of PSA Peugeot Citroen and responsible for the realization and managing of Unix environments and Oracle databases in the context of migration projects. Michael Schwalm holds a BTS diploma in Information System Management from Belfort (F) and a TSAR diploma in advanced network administration from Strasbourg (F). His branch-related experience covers Automotive, Software industry, Financial Services / Banking, etc.

Comments

  • No comments made yet. Be the first to submit a comment

Leave your comment

Guest Tuesday, 30 September 2014
AddThis Social Bookmark Button
Deutsch (DE-CH-AT)   French (Fr)

Contact

Contact us now!

Send us your request!

Our workshops

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

Expert insight from insiders!

Fixed Price Services

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

A safe investment: our IT services at fixed prices!

Your flexible SLA

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

ISO 20000 certified & freely customizable!

dbi services Newsletter