By Franck Pachot

.
In previous post I explained that I’ve observed a 2x CPU usage when running my insert test case in 11.2.0.4 vs. 11.2.0.2 I discussed that with @martinberx @OracleSK @TanelPoder @xtner @Ycolin @fritshoogland @lleturgez @LucaCanaliDB on twitter and it became clear that:

  • ‘perf’ is the tool to use when investigating CPU usage.
  • we must be use hardware event ‘cpu-cycles’ and not ‘cpu-clock’ and they are not available in my VirtualBox VM

So I installed a VMware VM with OEL7 and the following two versions of Oracle: 11.2.0.3 and 11.2.0.4

Elapsed Time

On my VirtualBox VMs my test case elapsed time was 77 seconds on 12.1.0.2 instead of 35 seconds on 11.2.0.2. I’ve tested it in different situations: 32-bit and 64-bit linux, 11.2.0.1, 11.2.0.2, 11.2.0.4 and it was clear that the CPU time used to execute the test case is consistently about 35 seconds in version 11.2.0.2 and before. And the double on versions 11.2.0.4 and 12c.

I tried perf but it wasn’t possible to get cpu-cycles samples even when setting SSE4 passthrough. So I installed a VMware environment (and I’ve enabled ‘Virtualize CPU performance counters in VM sessions’) and tried the buk insert version.

Here are a few runs on 11.2.0.4

Elapsed: 00:00:18.01
Elapsed: 00:00:16.40
Elapsed: 00:00:15.61
Elapsed: 00:00:15.54

Here is the elapsed time on 11.2.0.3

Elapsed: 00:00:08.34
Elapsed: 00:00:08.35
Elapsed: 00:00:09.87
Elapsed: 00:00:09.25

So I’ve two conclusions for the moment:

  • It’s not the same time in VMWare and VirtualBox
  • I’ve reproduced my issue where elapsed time is x2 11.2.0.4

cpuinfo

The performance difference probably comes from the CPU features that are enabled in the VMware VM:

[oracle@vmware home]$ grep ^flags /proc/cpuinfo
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm fsgsbase smep

and that are not in the VBox VM:

[oracle@vbox home]$ grep ^flags /proc/cpuinfo
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl pni monitor ssse3 lahf_lm

perf

In order to use perf, I’ve read Stefan Koehler blog about it.

First, I get the process id for my session:

SQL> select spid from v$process 
where addr=(select paddr from v$session where sid=sys_context('userenv','sid'));
SPID
------------------------
55597

Then I record the sampling of cpu-cycles:

perf record -e cpu-cycles -o /tmp/perf.out -g -p 55597

Then I run my PL/SQL block inserting 1M rows, bulk insert, commit only at the end. And I stop the sampling with ^c.

I show the report with:

perf report -i /tmp/perf.out -g none -n --stdio

Perf report for run on 11.2.0.3

# ========
# captured on: Sat May 16 21:59:53 2015
# hostname : localhost.localdomain
# os release : 3.8.13-35.3.1.el7uek.x86_64
# perf version : 3.8.13-35.3.1.el7uek.x86_64
# arch : x86_64
# nrcpus online : 1
# nrcpus avail : 1
# cpudesc : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
# cpuid : GenuineIntel,6,58,9
# total memory : 1780608 kB
# cmdline : /usr/libexec/perf.3.8.13-35.3.1.el7uek.x86_64 record -e cpu-cycles -o /tmp/perf.out -g -p 55256
# event : name = cpu-cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, id = { 10 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, tracepoint = 2, breakpoint = 5
# ========
#
# Samples: 28K of event 'cpu-cycles'
# Event count (approx.): 18991352213
#
# Overhead      Samples  Command       Shared Object                                       Symbol
# ........  ...........  .......  ..................  ...........................................
#
    13.11%         3763   oracle  oracle              [.] kdiins0
     3.43%          958   oracle  oracle              [.] lnxadd
     2.73%          761   oracle  oracle              [.] pfrfd1_init_locals
     2.47%          688   oracle  oracle              [.] lnxmul
     2.23%          635   oracle  oracle              [.] __intel_new_memcpy
     2.18%          608   oracle  oracle              [.] pevm_ENTER
     1.90%          529   oracle  oracle              [.] lnxsub
     1.84%          519   oracle  oracle              [.] pmusgel_Get_Element
     1.75%          500   oracle  oracle              [.] kdkcmp1
     1.68%          467   oracle  oracle              [.] pfrrun_no_tool
     1.58%          440   oracle  oracle              [.] pfrust
     1.54%          429   oracle  oracle              [.] lnxmin
     1.22%          349   oracle  oracle              [.] kauxsin
     1.07%          297   oracle  oracle              [.] pfrinstr_INMDH
     0.94%          269   oracle  [kernel.kallsyms]   [k] native_apic_mem_write
     0.88%          244   oracle  oracle              [.] pevm_RET
     0.87%          248   oracle  oracle              [.] pfsabrc
     0.85%          237   oracle  oracle              [.] pfrinstr_ADDN
     0.83%          231   oracle  oracle              [.] pfrxca
     0.78%          217   oracle  oracle              [.] pfrinstr_INHFA1
     0.75%          209   oracle  oracle              [.] pfrinstr_SUBN
     0.73%          204   oracle  oracle              [.] kgiPinObject
     0.70%          200   oracle  oracle              [.] pfrb2_convert_var2aad
     0.69%          191   oracle  oracle              [.] pmusise_Insert_Element
     0.67%          187   oracle  oracle              [.] pfrinstr_INFR
     0.64%          179   oracle  libc-2.17.so        [.] __memmove_ssse3_back

Perf report for run on 11.2.0.4

# ========
# captured on: Sat May 16 19:46:51 2015
# hostname : localhost.localdomain
# os release : 3.8.13-35.3.1.el7uek.x86_64
# perf version : 3.8.13-35.3.1.el7uek.x86_64
# arch : x86_64
# nrcpus online : 1
# nrcpus avail : 1
# cpudesc : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
# cpuid : GenuineIntel,6,58,9
# total memory : 1780608 kB
# cmdline : /usr/libexec/perf.3.8.13-35.3.1.el7uek.x86_64 record -e cpu-cycles -o /tmp/perf.out -g -p 35344
# event : name = cpu-cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, id = { 5 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, tracepoint = 2, breakpoint = 5
# ========
#
# Samples: 25K of event 'cpu-cycles'
# Event count (approx.): 17296175886
#
# Overhead      Samples  Command      Shared Object                                       Symbol
# ........  ...........  .......  .................  ...........................................
#
    14.20%         3646   oracle  oracle             [.] kdiins0
     3.61%          918   oracle  oracle             [.] lnxadd
     3.20%          814   oracle  oracle             [.] pfrfd1_init_locals
     2.49%          637   oracle  oracle             [.] __intel_new_memcpy
     2.43%          618   oracle  oracle             [.] lnxmul
     2.34%          596   oracle  oracle             [.] pevm_ENTER
     2.05%          527   oracle  oracle             [.] kdkcmp1
     2.01%          513   oracle  oracle             [.] pmusgel_Get_Element
     2.00%          507   oracle  oracle             [.] lnxsub
     1.99%          505   oracle  oracle             [.] pfrrun_no_tool
     1.74%          443   oracle  oracle             [.] lnxmin
     1.74%          441   oracle  oracle             [.] pfrust
     1.72%          434   oracle  libc-2.17.so       [.] __memmove_ssse3_back
     1.25%          321   oracle  oracle             [.] kauxsin
     1.22%          311   oracle  oracle             [.] pfrfd_init_frame
     1.17%          299   oracle  oracle             [.] pfrinstr_INMDH
     1.09%          276   oracle  oracle             [.] kglpnp
     0.96%          243   oracle  oracle             [.] pfrinstr_ADDN
     0.94%          239   oracle  oracle             [.] pfrxca
     0.93%          239   oracle  oracle             [.] pfsabrc
     0.91%          230   oracle  oracle             [.] pmusise_Insert_Element
     0.90%          228   oracle  oracle             [.] __PGOSF347_pfrinstr_INHFA1
     0.81%          206   oracle  oracle             [.] kss_get_owner
     0.80%          204   oracle  oracle             [.] pfrinstr_XCAL
     0.72%          182   oracle  oracle             [.] pevm_RET
     0.70%          177   oracle  oracle             [.] pevm_ARGEIBBI
     0.70%          178   oracle  oracle             [.] pfrb2_convert_var2aad

I don’t see many differences here, so it does not seem that there is an additional code path that is run.

So the question is still opened… But ‘perf’ is really a great tool to get the detail about the database time with is not accounted in wait events. There are a lot of references about it at the end of Stefan Koehler’s blog.

Update May 17th: It was not clear when I posted that yesterday, but all tests on that VMware environment are done using the bulk version with forall but no append_values hint.