Blog - comments

Hi Christopher, It's there. I't not an option that you check at install. Just use it by setting inme...
Hello, Thanks for the nice blog. I tried the latest 12c download available in Oracle's website and I...
Christopher Bernard
-- Here is a quick script to display which objects are locked in Share. Parameters: owner tablename....
Hey...I think you forgot that Hotspot have a JIT compiler too. The difference is in the time wherer ...
Anderson

Thanks for the content..

vani
Blog Franck Pachot Oracle 12c In-Memory option: waiting for 12.1.0.2

dbi services Blog

Welcome to the dbi services Blog! This blog focuses on IT infrastructure - featuring news, troubleshooting, and tips & tricks. It covers database, middleware, and OS technologies such as Oracle, Microsoft SQL Server & SharePoint, Documentum, MySQL, PostgreSQL, Sybase, Unix/Linux, etc. The dbi services blog represents the view of our consultants, not necessarily that of dbi services. Feel free to comment on the postings!

  • Home
    Home This is where you can find all the blog posts throughout the site.
  • Categories
    Categories Displays a list of categories from this blog.
  • Tags
    Tags Displays a list of tags that have been used in the blog.
  • Bloggers
    Bloggers Search for your favorite blogger from this site.

Oracle 12c In-Memory option: waiting for 12.1.0.2

Oracle has announced a big feature that should come in the next 12c patch set: the In-Memory option. It will soon be in beta version, so nothing will be published about it until it comes to production. Before that phase, I'm going to explain what we can expect from that feature, besides the 'In-Memory','columnar storage', and 'hybrid' buzzwords.

First, it is a redundant storage that is aimed to improve the query performance.
But this is not new: indexes and materialized views have the same goal.

Second: Redundant data will be stored In-Memory and will be optimized for it.
That is not new either if we consider TimesTen. TimesTen is an In-Memory relational database that can also be used in front of Oracle Database when used as an In-Memory Database Cache.

What is new is that it is a columnar storage. Columnar storage is already used by Oracle for compression (Hybrid Columnar Compression) but here, the goal is different. Very different. HCC uses CPU to save I/O and disk, and is mainly for read-only data. In-Memory option is for actively updated data residing in memory and the goal is to use less CPU. Let's explain this.

I'll use a well known relational table: SCOTT.EMP where I highligh the columns and rows that I'll need for my query:

 

EMPNOENAMEJOBMGRHIREDATESALCOMMDEPTNO
7369 SMITH CLERK 7902 17-DEC-80 800   20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975   20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850   30
7782 CLARK MANAGER 7839 09-JUN-81 2450   10
7788 SCOTT ANALYST 7566 19-APR-87 3000   20
7839 KING PRESIDENT   17-NOV-81 5000   10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-87 1100   20
7900 JAMES CLERK 7698 03-DEC-81 950   30
7902 FORD ANALYST 7566 03-DEC-81 3000   20
7934 MILLER CLERK 7782 23-JAN-82 1300   10

 

This is a table. It has two dimensions: rows and columns. How do you read it ? There are two ways:

  • Read it by column: I have employee numbers 7369, 7499, 7521, ... Their names are respectively SMITH, ALLEN, WARD, ... The first one is a CLERK, the second and third ones are SALESMEN, ...
  • Read it by row: I have an employee that I identify as number 7369, his name is SMITH, he is a CLERK, his manager is... I have another employee which is number 7499, his name is ALLEN...

Which way do you choose ? You probably read it by row. Because it is easier to think about it when we group the entities we are talking about. When you have a new employee, you fill a form with his name, job, hiredate, etc. When you update some information, you lock the row, saying 'I'm working on employee 7369', and update the attributes you want to change. You do that because you prefer to cluster together the information that are related, it's easier to manage for our mind. Note that when you program, you often have the same logic. You manage objects, you implement CRUD operation to change their attributes. And your computer knows that, so it tries to be more efficient on clustered data, with prefetching or cache.

Relational Database did the same choice when physically manipulating data.
The SQL language let you have both approaches. You update one row, such as:
   UPDATE EMP SET ... WHERE EMPNO=7369;
or you can update one column, such as
   UPDATE EMP SET SAL=SAL*1.1;


But physically, data is stored by row and is manipulated by row, as when you read the table row by row. And it is a very efficient approach when you manipulate several attributes from one row: you do only one I/O to get the block from disk, you store only one block in cache, and the CPU is able to process everything in one block. Great.

But now let's see what happens when you don't need all the columns. Let's say you want to find who has a salary higher than 3000.

SELECT ENAME from EMP where SAL>=3000;

You have to read the whole table (full table scan) and read each block, and in each block read each row, and for each row find the salary, compare it to 3000 and if it is higher, find the name.

Note that I'm not talking about I/O. Even when my table fit in memory I have a lot of information to read, I have to transfert each value from memory to CPU, and use CPU cycles to process each value.

The first problem is that you have read information about 14 employees where you're interrested only in 3 of them. So you build an index on SAL. From the index you get the rowid for the employees that have SAL>=3000. And then for each of them you go to the row to get its name. Ok, that's not bad, but when you have a lot of queries like that, you will have to create a lot of indexes. You can create an index for each column but then there is the cost to combine to result. We are on OLTP actively updated data, so bitmap index is not a solution here. Or you can create an index for each kind of filtering you can do, but that's a lot of indexes to maintain.

But then there is the second problem. Even if you have to process only the 3 employees you are interrested in, you will have to read the whole row, which is stored in variable length, from begining to end, in order to find the only columns that you need. This is: read row directory, follow pointer to row, read EMPNO length, bypass it, read ENAME, get it, ... This is in memory, and has to be transferred to CPU to be processed, word by word.

To address that problem, you can add to your index all the columns you're interrested in. But then you will have too many indexes to maintain. And you know how hard it is to determine which indexes can be useful or not.

Now let's see how that works with the columnar approach we have seen above. Without any indexes, we will read the SAL column and the result is that item 8,9 and 13 are above 3000. Then we read the ENAME column and just have to get the 8,9 and 13 items. And that's all!

Without any additional structure we had to read only what we are interrested in. And there is more. On current processors when we have the same operation to do on a several values we can do in in a whole with a vector operation. Just pass a vector of salaries to the CPU and ask it to compute the '>=3000' on all of them. This is known as 'Single Instruction Multiple Data'. This is less roundtrips between memory and CPU.

The first CPU operation is a 'find >=3000' on the following vector:

800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300

The second CPU operation is a 'find values from previous result index' on the following vector:

SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER

Compare that with the row approach, navigating within the block, each pointer or each column value being a CPU operation... It's hundreds of CPU cycles that will be saved. This is something that was possible before only on supercomputers. Today our processors can do Single Instruction Multiple Data vector processing, and this is why in-memory columnar storage is coming everywhere.

And the volume to transfer can be decreased when we compress those vectors. Remember when I've read the ENAME in column: the second and third employees are SALESMAN. This is RLE compression where you don't repeat previous values. You can do that with columns, probably not within a row (or your data model is not that relational I guess).

This is the goal of columnar storage: increase efficiency of memory-cpu processing for queries that don't need to read the whole row, and without having to create indexes. Oracle has always been able to handle mixed workload quite well (OLTP and reporting in the same database), thanks to the read consistency implementation. So it is a natural evolution to go to the 'hybrid' approach that adds analytic capabilities to pure OLTP data models. Of course, this is not the kind of thing that is easy to implement. The columnar storage must be maintained, and must be optimized for an efficient CPU processing. The values must be easy to compare byte per byte in a single CPU operation, even when compressed. And CPU vector operation often requires that the values have the same size, so this may require fixed size storage instead of the usual variable length storage that reduces space.

So behind the feature that is supposed to accelerate our application just by switching it on, we can expect some complex technology that is able to bring In-Memory capabilities while staying in our Oracle OLTP database. In relational databases, the first goal for years was to optimize the access to the storage. And this is done by regrouping all information related to the same object, in order to get them in one I/O. Now, the goal is to improve the transfer between memory and CPU by regrouping similar information from several objects, in order to process them in vectors by the CPU. This is an orthogonal approach: row storage vs. columnar storage. Oracle will implement both. And we, the DBAs, have to understand how it works in order to use it in the right place.

Tagged in: In-memory Oracle 12c
Rate this blog entry:
4

Franck Pachot is Consultant at dbi services. He has 20 years of experience in Oracle databases. Through his expertise as a DBA, Oracle expert, data architect, and performance specialist, he is able to cover all database areas: architecture, data modeling, database design, tuning, operation, and training. Franck Pachot knows how to enable an efficient collaboration between the developers and the operational team when it comes to troubleshooting issues or performance tuning. He has passed the OCP certifications from 8i to 12c, is also Certified Expert for Oracle Database 11g Performance Tuning, and now achived the highest level of certification: Oracle Master Certified OCM 11g. Prior to joining dbi services, Franck Pachot was Oracle Consultant at Trivadis in Lausanne. Previously, he worked in several countries and environements, always as a consultant. Franck Pachot holds a Master of Business Informatics from the University of Paris-Sud. His branch-related experience covers Financial Services / Banking, Public Sector, Food, Transport and Logistics, Pharma, etc.


    O_Database12c_Admin_Professional_clrOCE_ODb11gPerfTun_clr11gocm_logo

Comments

  • Guest
    Christopher Bernard Tuesday, 22 July 2014

    Hello, Thanks for the nice blog. I tried the latest 12c download available in Oracle's website and I didn't see the In-Memory option. Do you know which version of the software is going to have this option? Can you please direct us to the link where we can download the right software which has In-Memory option.

  • Franck Pachot
    Franck Pachot Tuesday, 22 July 2014

    Hi Christopher,
    It's there. I't not an option that you check at install. Just use it by setting inmemory_size and alter table INMEMORY
    Regards,
    Franck.

Leave your comment

Guest Saturday, 26 July 2014
AddThis Social Bookmark Button
Deutsch (DE-CH-AT)   French (Fr)

Contact

Contact us now!

Send us your request!

Our workshops

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

Expert insight from insiders!

Fixed Price Services

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

A safe investment: our IT services at fixed prices!

Your flexible SLA

dbi FlexService SLA - ISO 20000 certified.

dbi FlexService SLA ISO 20000

ISO 20000 certified & freely customizable!

dbi services Newsletter