Blog Franck Pachot Sockets, Cores, Virtual CPU, Logical CPU, Hyper-Threading: What is a CPU nowadays?

dbi services Blog

Welcome to the dbi services Blog! This IT blog focuses on database, middleware, and OS technologies such as Oracle, Microsoft SQL Server & SharePoint, EMC Documentum, MySQL, PostgreSQL, Sybase, Unix/Linux, etc. The dbi services blog represents the view of our consultants, not necessarily that of dbi services. Feel free to comment on our blog postings.


Sockets, Cores, Virtual CPU, Logical CPU, Hyper-Threading: What is a CPU nowadays?

Because people know that I really like Oracle AWR reports, they send them to me from time to time. Here is one I received for Christmas with the following question: 'Since our AIX server is virtualized, the response times are 4x longer. I feel that we are CPU bound, but cpu usage is mostly idle. Is there something hidden?' This question comes up frequently in virtualized environments. Here, the system reports the CPU as only about 30% busy. However, despite appearances, the report comes from a very busy system suffering from CPU starvation. So yes, something is hidden, and we will see how to reveal it.

A CPU nowadays is not the physical processor we used to have. The example I have here comes from an IBM LPAR running AIX and I will explain Virtual CPU, Logical CPU and CPU multi-threading. Even if it's not Oracle specific, I'll use the figures from the AWR report which show the statistics gathered from the OS.

The AWR report covers 5 hours of activity where users are experiencing long response times:

             Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap:  27460    20-Dec-13 15:00:07  142       2.8
End Snap:    27465    20-Dec-13 20:00:49  101       2.2
Elapsed:              300.72 (mins)
DB Time:              855.93 (mins)

Here is the average CPU load gathered from the OS:

Host CPU (CPUs: 20 Cores: 5 Sockets: )

Load Average Begin  Load Average End  %User  %System  %WIO  %Idle
9.82                9.26              18.7   12.0     4.4   69.3

And because the report covers 5 hours, we have an hourly detail in 'Operating System Statistics - Detail':

Snap Time        Load   %busy  %user  %sys   %idle  %iowait
20-Dec 15:00:07   9.82
20-Dec 16:00:15  12.50  26.67  14.09  12.58  73.33  6.21
20-Dec 17:00:26  15.53  41.47  24.44  17.03  58.53  5.84
20-Dec 18:00:34   6.44  36.91  25.20  11.70  63.09  4.55
20-Dec 19:00:40   6.52  24.39  15.00   9.40  75.61  3.35
20-Dec 20:00:49   9.26  24.34  14.99   9.35  75.66  1.88

Looking at these figures, you would think that the system is more than 50% idle. But that's wrong.

Look at the second line (16:00). Being only 26.67% busy with a load of 12.5 running processes would mean that the system is able to run 12.50/0.2667=47 processes concurrently. But that's wrong again: we don't have 47 CPUs on the whole system.
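The sanity check above can be written as a tiny sketch. The function name `implied_concurrency` is only illustrative; it is not something AWR or the OS provides.

```python
def implied_concurrency(load_average: float, busy_fraction: float) -> float:
    """Number of processes the host would have to run at once
    if the reported utilization were true: load / busy fraction."""
    return load_average / busy_fraction


# 16:00 snapshot: load 12.50, reported 26.67% busy
slots = implied_concurrency(12.50, 0.2667)
print(round(slots))  # prints 47
```

If the result is far above the number of physical cores, the reported percentage cannot be taken at face value.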

Ratios are evil, because they hide the real numbers they come from. Let's see how the %cpu utilization is calculated.

Statistic      Value       End Value
BUSY_TIME      11,162,236
IDLE_TIME      25,169,672
IOWAIT_TIME    1,584,726
SYS_TIME       4,360,698
USER_TIME      6,801,538
LOAD           10          9
NUM_CPUS       20
NUM_CPU_CORES  5
NUM_LCPUS      20
NUM_VCPUS      5

During the 5 hours when the statistics were collected, running processes used 11,162,236 centiseconds of CPU, and 25,169,672+1,584,726 centiseconds of CPU resources were not used. This implies that we have 11,162,236+25,169,672+1,584,726 centiseconds in total, which is about 100 hours of CPU time available. 100 hours of CPU during 5 hours of elapsed time implies that we have 100/5=20 CPUs in the system.
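As a rough cross-check of this arithmetic, the sketch below converts the V$OSSTAT counters to CPU hours. The helper `cpu_hours` and the variable names are illustrative only. It computes the total twice: once as the paragraph does (busy + idle + I/O wait), and once from busy + idle alone, on the assumption (not stated in the report) that this platform may already count I/O wait inside IDLE_TIME.

```python
CS_PER_HOUR = 100 * 3600  # centiseconds in one hour


def cpu_hours(centiseconds: int) -> float:
    """Convert a V$OSSTAT time counter (centiseconds) to hours."""
    return centiseconds / CS_PER_HOUR


busy, idle, iowait = 11_162_236, 25_169_672, 1_584_726
elapsed_hours = 300.72 / 60  # "Elapsed" from the AWR header

total_as_in_text = cpu_hours(busy + idle + iowait)
total_busy_idle = cpu_hours(busy + idle)  # assumption: iowait is part of idle

for label, total in [("busy+idle+iowait", total_as_in_text),
                     ("busy+idle", total_busy_idle)]:
    print(f"{label}: {total:.1f} CPU hours, "
          f"{total / elapsed_hours:.1f} implied CPUs")

print(f"busy share: {busy / (busy + idle):.1%}")
```

Both totals are about 100 hours, so either way the reported percentages assume roughly 20 CPUs. With the exact 300.72-minute interval, the busy + idle total gives about 20.1 CPUs and a busy share near the 30% shown in the Host CPU line.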

But that's wrong. We cannot run 20 processes concurrently. And the %cpu that is calculated from that is a lie because we cannot reach 100%.

This is virtualization: it lies about the available CPU power. So should we calculate the %cpu from the number of cores, which is 5 here? Let's do it. We used 11,162,236 centiseconds of CPU, which is 31 hours. If we have 5 CPUs for 5 hours, then the CPU utilization is 31/(5x5)=124%.
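To make the comparison concrete, here is a minimal sketch of the same utilization computed against two different CPU counts. The function `utilization` is a hypothetical helper, and the elapsed time is rounded to 5 hours as in the text.

```python
def utilization(busy_centiseconds: int, cpus: int, elapsed_hours: float) -> float:
    """Busy CPU time divided by the CPU time supposedly available."""
    busy_hours = busy_centiseconds / (100 * 3600)
    return busy_hours / (cpus * elapsed_hours)


BUSY_TIME = 11_162_236  # centiseconds, from V$OSSTAT

per_core = utilization(BUSY_TIME, cpus=5, elapsed_hours=5)
per_logical = utilization(BUSY_TIME, cpus=20, elapsed_hours=5)

print(f"against 5 cores:        {per_core:.0%}")     # 124%
print(f"against 20 logical CPUs: {per_logical:.0%}")  # 31%
```

Neither denominator gives a percentage that can be read directly: one exceeds 100%, the other can never reach it.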

This is hyper-threading: we have more CPUs than the number of cores.

Now it's time to define what a CPU is, if you want to understand the meaning of the numbers.

The CPU is the hardware that executes the instructions of a running process. One CPU can cope with one process which is always working on CPU, or with two processes that spend half of their time outside of CPU (waiting for I/O for example), etc.

With multi-core processors, we have a socket that has several CPU cores in it. Cores in a socket can share some resources, but it is the core that processes the instructions. So the number of CPUs we're interested in is the number of cores. The OS will calculate the %cpu from the number of cores, and that's the right thing to do. And Oracle uses the number of cores to calculate the license that you have to buy, and that's fair: it's the number of instances of their software that can run concurrently.

Now comes virtualization. In our example, we are on an IBM LPAR with 5 virtual CPUs (NUM_VCPUS from V$OSSTAT). What does that mean? Can we have 5 processes running on CPU efficiently? Yes, if the other LPARs on the same hardware are not too busy. Then our 5 virtual CPUs will actually run on 5 physical cores. But if all the LPARs are demanding CPU resources, and there are not enough physical CPUs, then the physical CPU resources will be shared. We will run on 5 virtual CPUs, but those VCPUs will be slower than physical ones, because they are shared.

Besides that, when the CPU has to access RAM, there is a latency during which the CPU waits before executing the next instruction. So the processor manufacturers introduced the ability to run another thread during that time (without any context switch). This is called Hyper-Threading by Intel, or Simultaneous Multithreading (SMT) by IBM. In our example we are on POWER7 processors, which can do 4-way SMT. So theoretically each of our 5 VCPUs can run 4 threads: these are the 20 logical CPUs that are reported. But we will never be able to run 4x more concurrent processes, so calculating %cpu utilization from that is misleading.

So how do we get the real picture? My favorite numbers come from load average and runqueue.

Look at the line where load average was 15.53 and cpu utilization shows 41.47%. Because the 41.47% was calculated from the 20 CPU hypothesis, we know that we had on average 0.4147*20=8.294 threads running on CPU. Then, among the 15.53 processes willing to run, 15.53-8.294=7.236 were waiting in the runqueue. When you wait nearly as much as you work, you know that you are lacking resources: this is CPU starvation.
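The same split between running and waiting processes, as a short sketch. The function `split_load` and its parameter names are illustrative; the inputs are the 17:00 figures from the hourly detail.

```python
def split_load(load_average: float, busy_fraction: float, reported_cpus: int):
    """Split the load average into threads on CPU and processes queued.

    running = busy fraction * number of CPUs the OS used for the percentage
    waiting = load average - running
    """
    running = busy_fraction * reported_cpus
    waiting = load_average - running
    return running, waiting


running, waiting = split_load(load_average=15.53, busy_fraction=0.4147,
                              reported_cpus=20)
print(f"running on CPU: {running:.3f}")   # 8.294
print(f"in the runqueue: {waiting:.3f}")  # 7.236
```

The useful signal is the ratio of waiting to running: when it approaches 1, processes spend about as long queued as they do executing.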

So hyper-threading is a nice feature: here it lets us run about 8 threads on 5 cores. But don't be fooled by virtualization or hyper-threading. You should always try to find out which physical resources are behind it, and that usually requires the help of your system admin.

Tagged in: AIX AWR CPU Oracle

I'm a Senior Consultant, and Oracle Technology Leader at dbi services (Switzerland).

Certified DBA (OCM 11g, OCP 12c, Performance Tuning Expert, Exadata Implementation). I cover all database areas: architecture, data modeling, database design, tuning, operation, and training.
My preferred area is Oracle troubleshooting and performance tuning, especially when I manage to enable efficient collaboration between the developers and the operational team.

Besides this blog, I participate in the Oracle Community through forums, blogs, articles and presentations.

Comments

  • Guest
    Johnny Wednesday, 12 March 2014

    Very interesting
    I would like to know your thoughts about running Oracle Database on hyper-threaded CPUs. Do you think it is correct to count hyper-threading as more available "horse power"?
    Thanks in advance

  • Franck Pachot
    Franck Pachot Thursday, 13 March 2014

    Johnny,
    It's very difficult to answer that, because hyper-threading efficiency depends a lot on what is running on the threads of the core. Any 'benchmark' stating that the performance improves by xx% is only related to one specific workload. Note that xx% is not above 20-30% in any case.
    First, in order to see hyper-threading in action, you need to have high CPU utilization: the number of running processes reaching the number of cores.
    Then, you may see an improvement because another thread can run while you're accessing memory. And that happens a lot when reading buffers. But then there is the problem that the L1 CPU cache is shared (L2 is often shared by cores anyway). When you have only one thread, the process benefits from that cache, where the Oracle block should fit. But with several threads, when the L1 cache is shared, one thread will have its data evicted from the cache by the other thread.
    So personally I don't consider hyper-threading as more "horse power" and I try to keep the number of processes running on CPU lower than the number of available cores. Hyper-threading will come into play during a CPU usage peak that goes higher than the number of cores: it will slightly reduce the bad effect of CPU starvation by avoiding too frequent context switches.
    But you can test it on your machine if you have a stress test workload. Set instance caging (activate a resource manager plan and set cpu_count) to the number of cores and see if the performance improves when increasing cpu_count.
    Regards,
    Franck.

  • Guest
    Johnny Thursday, 13 March 2014

    Great
    I understood your point, very reasonable. What drives me crazy is how CPU utilization numbers are reported by tools like top or sar.
    For example, using an Intel CPU with 2 cores (and HT), without considering queueing, if we have the following utilizations per logical CPU:

    vCPU0 utilization 100% (real core)
    vCPU1 utilization 30%
    vCPU3 utilization 100% (real core)
    vCPU4 utilization 30%

    Top could show us an average utilization of about 65%. Could we say that we are near the real total capacity of the CPU? Of course, roughly speaking.

    I don't know if I am being clear enough. It's more or less about what Adrian wrote here:
    http://www.hpts.ws/papers/2007/Cockcroft_CMG06-utilization.pdf

    Thanks
