By Franck Pachot
In a previous post I tried to compare a single-thread workload between Exadata X5 Bare Metal and Virtualized. The conclusions were that there are no huge differences, and that this kind of comparison is not easy.
Regarding why the comparison is not easy, some of the reasons have been nicely detailed by Lonny Niederstadt in this Twitter thread.
Besides the single-thread tests, I did a test with 50 sessions doing updates on a small data set: a 50-session SLOB run with SCALE=100M, WORK_UNIT=64, UPDATE_PCT=100, and a small buffer cache.
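For those who want to run something similar, here is a minimal slob.conf sketch. Only UPDATE_PCT, SCALE and WORK_UNIT come from this test; the other values are placeholders to adapt to your SLOB version and environment:

UPDATE_PCT=100
SCAN_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=100M
WORK_UNIT=64
REDO_STRESS=LITE
THREADS_PER_SCHEMA=1

and the run is started with 50 sessions (./runit.sh 50 in the SLOB version I'm assuming here).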
Here are the load profiles, side by side:
Bare Metal Virtualized
Load Profile Per Second Per Transaction Per Second Per Transaction
~~~~~~~~~~~~~~~ --------------- --------------- --------------- --------------- -
DB Time(s): 42.9 0.0 48.6 0.1
DB CPU(s): 5.6 0.0 6.8 0.0
Background CPU(s): 3.4 0.0 3.8 0.0
Redo size (bytes): 58,364,739.1 52,693.1 52,350,760.2 52,555.7
Logical read (blocks): 83,306.4 75.2 73,826.5 74.1
Block changes: 145,360.7 131.2 130,416.6 130.9
Physical read (blocks): 66,038.6 59.6 60,512.7 60.8
Physical write (blocks): 69,121.8 62.4 62,962.0 63.2
Read IO requests: 65,944.8 59.5 60,448.0 60.7
Write IO requests: 59,618.4 53.8 55,883.5 56.1
Read IO (MB): 515.9 0.5 472.8 0.5
Write IO (MB): 540.0 0.5 491.9 0.5
Executes (SQL): 1,170.7 1.1 1,045.6 1.1
Rollbacks: 0.0 0.0 0.0 0.0
Transactions: 1,107.6 996.1
and the I/O profiles:
Bare Metal Virtualized
IO Profile Read+Write/Second Read/Second Write/Second Read+Write/Second Read/Second Write/Second
~~~~~~~~~~ ----------------- --------------- --------------- ----------------- --------------- ---------------
Total Requests: 126,471.0 65,950.0 60,521.0 117,014.4 60,452.8 56,561.6
Database Requests: 125,563.2 65,944.8 59,618.4 116,331.5 60,448.0 55,883.5
Optimized Requests: 125,543.0 65,941.1 59,601.9 116,130.7 60,439.9 55,690.7
Redo Requests: 902.2 0.1 902.1 677.1 0.1 677.0
Total (MB): 1,114.0 516.0 598.0 1,016.5 472.8 543.6
Database (MB): 1,055.9 515.9 540.0 964.7 472.8 491.9
Optimized Total (MB): 1,043.1 515.9 527.2 942.6 472.7 469.8
Redo (MB): 57.7 0.0 57.7 51.7 0.0 51.7
Database (blocks): 135,160.4 66,038.6 69,121.8 123,474.7 60,512.7 62,962.0
Via Buffer Cache (blocks): 135,159.8 66,038.5 69,121.3 123,474.0 60,512.6 62,961.4
Direct (blocks): 0.6 0.2 0.5 0.7 0.1 0.5
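Nearly all database requests show up as 'Optimized Requests', i.e. served from the cell flash cache. As a side note, a quick way to cross-check this from the database side is to look at the optimized request statistics in V$SYSSTAT (a sketch; statistic names may vary slightly with the version):

select name, value
  from v$sysstat
 where name in ('cell flash cache read hits',
                'physical read requests optimized',
                'physical write requests optimized',
                'physical read total IO requests',
                'physical write total IO requests')
 order by name;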
This is roughly what you can expect from an OLTP workload: a small data set that fits in the flash cache, and a high redo rate. Of course, in real OLTP you would have a larger buffer cache, but that is not what I wanted to measure here. It seems that the I/O performance is slightly better on bare metal. This is also what we see from the averages:
Bare Metal
Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Wait % DB Wait
Event Waits Time (sec) Avg(ms) time Class
------------------------------ ----------- ---------- ---------- ------ --------
cell single block physical rea 9,272,548 4182.4 0.45 69.3 User I/O
free buffer waits 152,043 1511.5 9.94 25.1 Configur
DB CPU 791.4 13.1
Virtualized
Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Wait % DB Wait
Event Waits Time (sec) Avg(ms) time Class
------------------------------ ----------- ---------- ---------- ------ --------
cell single block physical rea 7,486,727 3836.2 0.51 63.7 User I/O
free buffer waits 208,658 1840.9 8.82 30.6 Configur
DB CPU 845.9 14.1
It’s interesting to see that even on an I/O-bound system there are no significant waits on ‘log file sync’.
I’ll focus on ‘log file parallel write’.
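The microsecond buckets below are the ones exposed by V$EVENT_HISTOGRAM_MICRO (12c), so a query of this shape returns them (a sketch; the event-name filter is the only part specific to this test):

select event, wait_time_micro, wait_count, wait_time_format
  from v$event_histogram_micro
 where event = 'log file parallel write'
 order by wait_time_micro;

The same query with event = 'cell single block physical read' gives the read histograms shown further down. Here is the output from both environments: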
Bare Metal
EVENT WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
---------------------------------------- --------------- ---------- ------------------------------
log file parallel write 1 0 1 microsecond
log file parallel write 2 0 2 microseconds
log file parallel write 4 0 4 microseconds
log file parallel write 8 0 8 microseconds
log file parallel write 16 0 16 microseconds
log file parallel write 32 0 32 microseconds
log file parallel write 64 0 64 microseconds
log file parallel write 128 0 128 microseconds
log file parallel write 256 8244 256 microseconds
log file parallel write 512 102771 512 microseconds
log file parallel write 1024 14812 1 millisecond
log file parallel write 2048 444 2 milliseconds
log file parallel write 4096 42 4 milliseconds
log file parallel write 8192 11 8 milliseconds
log file parallel write 16384 3 16 milliseconds
log file parallel write 32768 1 32 milliseconds
Virtualized
EVENT WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
---------------------------------------- --------------- ---------- ------------------------------
log file parallel write 1 0 1 microsecond
log file parallel write 2 0 2 microseconds
log file parallel write 4 0 4 microseconds
log file parallel write 8 0 8 microseconds
log file parallel write 16 0 16 microseconds
log file parallel write 32 0 32 microseconds
log file parallel write 64 0 64 microseconds
log file parallel write 128 0 128 microseconds
log file parallel write 256 723 256 microseconds
log file parallel write 512 33847 512 microseconds
log file parallel write 1024 41262 1 millisecond
log file parallel write 2048 6483 2 milliseconds
log file parallel write 4096 805 4 milliseconds
log file parallel write 8192 341 8 milliseconds
log file parallel write 16384 70 16 milliseconds
log file parallel write 32768 10 32 milliseconds
As I’ve seen in previous tests, most of the writes were below 512 microseconds on bare metal, and above that when virtualized.
And here are the histograms for the single block reads:
Bare Metal
EVENT WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
---------------------------------------- --------------- ---------- ------------------------------
cell single block physical read 1 0 1 microsecond
cell single block physical read 2 0 2 microseconds
cell single block physical read 4 0 4 microseconds
cell single block physical read 8 0 8 microseconds
cell single block physical read 16 0 16 microseconds
cell single block physical read 32 0 32 microseconds
cell single block physical read 64 0 64 microseconds
cell single block physical read 128 432 128 microseconds
cell single block physical read 256 2569835 256 microseconds
cell single block physical read 512 5275814 512 microseconds
cell single block physical read 1024 837402 1 millisecond
cell single block physical read 2048 275112 2 milliseconds
cell single block physical read 4096 297320 4 milliseconds
cell single block physical read 8192 4550 8 milliseconds
cell single block physical read 16384 1485 16 milliseconds
cell single block physical read 32768 99 32 milliseconds
cell single block physical read 65536 24 65 milliseconds
cell single block physical read 131072 11 131 milliseconds
cell single block physical read 262144 14 262 milliseconds
cell single block physical read 524288 7 524 milliseconds
cell single block physical read 1048576 4 1 second
cell single block physical read 2097152 1 2 seconds
Virtualized
EVENT WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
---------------------------------------- --------------- ---------- ------------------------------
cell single block physical read 1 0 1 microsecond
cell single block physical read 2 0 2 microseconds
cell single block physical read 4 0 4 microseconds
cell single block physical read 8 0 8 microseconds
cell single block physical read 16 0 16 microseconds
cell single block physical read 32 0 32 microseconds
cell single block physical read 64 0 64 microseconds
cell single block physical read 128 0 128 microseconds
cell single block physical read 256 518447 256 microseconds
cell single block physical read 512 5371496 512 microseconds
cell single block physical read 1024 1063689 1 millisecond
cell single block physical read 2048 284640 2 milliseconds
cell single block physical read 4096 226581 4 milliseconds
cell single block physical read 8192 16292 8 milliseconds
cell single block physical read 16384 3191 16 milliseconds
cell single block physical read 32768 474 32 milliseconds
cell single block physical read 65536 62 65 milliseconds
cell single block physical read 131072 2 131 milliseconds
Same conclusion here: the ‘less than 256 microseconds’ bucket is hit much more frequently on bare metal than when virtualized.
For reference, those tests were done on a similar configuration (except for virtualization): X5-2L High Capacity with 3 storage cells, cell version cell-12.1.2.3.1_LINUX.X64_160411-1.x86_64, flash cache in write-back mode. The whole system was started before the test, and this test was the only thing running on the whole database machine.
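If you want to verify the flash cache mode on your own cells, it can be listed with CellCLI, or across all cells with dcli (assuming the usual cell_group file):

dcli -g cell_group cellcli -e "list cell attributes name,flashCacheMode"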