I would like to share some personal thoughts. On September 28th, 2021, Oracle released the eleventh generation of the Exadata Database Machine, called “X9M-2” (2 CPU sockets) and “X9M-8” (8 sockets), together with ZDLRA X9M. Exadata is a computing platform to run Oracle RDBMS; the Zero Data Loss Recovery Appliance (ZDLRA) is a platform to back up Oracle RDBMS and is based on Exadata hardware.
Combining hard- and software, and the importance of documentation
Oracle Exadata is undoubtedly an interesting platform to run Oracle RDBMS on. It’s the combination of hard- and software that sets it apart from other platforms. Maybe you remember the famous release of the Apple iPhone, which redefined the smartphone market by hitting the sweet spot between hard- and software?
For technically interested readers, I recommend having a look at the documentation. See here, for example, for hardware details that allow you to compare old and new. A comparison also follows in the next section.
Regarding ZDLRA, the documentation covers site planning, network configuration, hardware and software installation, and maintenance. To date, there is no documentation available on new features, but you may want to check all books again at a later time.
What changed? A comparison
The documentation lists the hardware details on a single web page. In its whitepapers, Oracle briefly introduces the features and lists some benchmark results:
Compute nodes hardware
|Metric||# CPU sockets||X8M||X9M||increase %|
|8K database read I/Os per second||2 (X?M-2)||1’500’000||2’800’000||87 %|
|8 (X?M-8)||5’000’000||5’000’000||0 %|
|8K flash write I/Os per second||2 (X?M-2)||980’000||2’000’000||104 %|
|8 (X?M-8)||3’000’000||3’000’000||0 %|
|Max Memory in GB||2 (X?M-2)||1536||2048||33 %|
|8 (X?M-8)||6144||6144||0 %|
|Total CPU cores||2 (X?M-2)||48||64||33 %|
|8 (X?M-8)||192||192||0 %|
|CPU model||2 (X?M-2)||Intel® Xeon® 8260, 2.4 GHz||Intel® Xeon® 8358, 2.6 GHz||n/a|
|8 (X?M-8)||Intel® Xeon® 8268, 2.9 GHz||Intel® Xeon® 8268, 2.9 GHz||n/a|
|Single-core Geekbench 5 score||2 (X?M-2)||ca. 960||ca. 1’200||25 %|
|8 (X?M-8)||ca. 960||ca. 960||0 %|
|Multi-core Geekbench 5 score||2 (X?M-2)||ca. 24’000||ca. 45’000||87.5 %|
|8 (X?M-8)||ca. 26’000||ca. 26’000||0 %|
CPU scores are taken from the Geekbench 5 results database; treat them with caution, as they may not be reliable. There is no increase on X?M-8 compute nodes, as both versions use the same hardware. The main reason to choose X?M-8 is the larger maximum memory per compute node (think of the Database In-Memory option). No average latency is given for the IOPS figures, so the results are not completely documented and need to be interpreted with caution.
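The increase % column follows the usual relative-change formula, (X9M - X8M) / X8M × 100. As a quick plausibility check, the minimal Python sketch below recomputes the X?M-2 rows from the values in the table; `pct_increase` is just an illustrative helper name. Note that 2’000’000 / 980’000 ≈ 2.04 is a ratio, which corresponds to an increase of about 104 %.

```python
def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


# (metric, X8M-2 value, X9M-2 value) as listed in the compute node table
ROWS = [
    ("8K database read IOPS", 1_500_000, 2_800_000),
    ("8K flash write IOPS", 980_000, 2_000_000),
    ("Max memory in GB", 1536, 2048),
    ("Total CPU cores", 48, 64),
    ("Geekbench 5 single-core", 960, 1200),
    ("Geekbench 5 multi-core", 24_000, 45_000),
]

for name, x8m, x9m in ROWS:
    print(f"{name}: {pct_increase(x8m, x9m):.1f} %")
```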
Please note that the benefits of using persistent memory on compute nodes (Memory Mode, which increases the total amount of RAM but offers no persistence) are not as big as those of using PMEM on storage nodes (App Direct Mode, with persistence). See here for an explanation of the modes.
Storage nodes hardware
|Metric||Storage type||X8M||X9M||increase %|
|8K database read I/Os per second||Extreme Flash (EF)||1’500’000||2’300’000||53 %|
|High capacity (HC)||1’500’000||2’300’000||53 %|
|8K flash write I/Os per second||Extreme Flash (EF)||470’000||614’000||31 %|
|High capacity (HC)||470’000||614’000||31 %|
|Flash raw capacity in TB||Extreme Flash (EF)||51.2, PCIe 3.0||51.2, PCIe 4.0||0 %|
|High capacity (HC)||25.6, PCIe 3.0||25.6, PCIe 4.0||0 %|
|Disk raw capacity in TB||Extreme Flash (EF)||0||0||0 %|
|High capacity (HC)||168||216||28.6 %|
|Persistent memory raw capacity in TB||Extreme Flash (EF)||1.5||1.5||0 % (capacity), 32 % (bandwidth)|
|High capacity (HC)||1.5||1.5||0 % (capacity), 32 % (bandwidth)|
|Total CPU cores||Extreme Flash (EF)||32 (2x Intel® Xeon® 5218, 2.3 GHz)||32 (2x Intel® Xeon® 8352Y, 2.2 GHz)||0 % (1)|
|High capacity (HC)||32 (2x Intel® Xeon® 5218, 2.3 GHz)||32 (2x Intel® Xeon® 8352Y, 2.2 GHz) (2)||0 % (1)|
|Extended (XT)||1x Intel® Xeon® 5218, 2.3 GHz||1x Intel® Xeon® 8352Y, 2.2 GHz||n/a|
There is no increase in flash storage or persistent memory capacity. The PMEM bandwidth increase is according to Intel. No average latency is given for the IOPS figures, so the results are not completely documented and need to be interpreted with caution.
(1) Comparisons based on base frequency must be taken with caution, as many more CPU factors influence overall system performance. Other factors are CPU quantity, max turbo frequency, cache sizes, number of Ultra Path Interconnects (UPI), thermal design power (TDP), PCIe version and root complex, number of memory channels, etc. (list not complete).
(2) The Xeon 8352Y is set to use only 16 of its 32 available cores by means of Intel Speed Select Technology – Performance Profile. This technology may not be used to lower RDBMS license costs. On Exadata, use the Capacity on Demand feature instead: scaling up is possible, scaling down is not, similar to the Oracle Database Appliance. One may also use hard partitioning.
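Because Oracle RDBMS licences are CPU based, enabling fewer cores via Capacity on Demand directly reduces the number of processor licences. The sketch below assumes the commonly published rule processor licences = enabled cores × core factor, rounded up, with a core factor of 0.5 for Intel Xeon as listed in Oracle’s Processor Core Factor Table; check this against your own contract. The `CapacityOnDemand` class is a hypothetical model, not an Oracle tool: it only mirrors the “scale up possible, scale down not” rule described above.

```python
import math

# Assumption: Intel Xeon core factor from Oracle's Processor Core Factor Table.
INTEL_XEON_CORE_FACTOR = 0.5


def processor_licences(enabled_cores: int, core_factor: float = INTEL_XEON_CORE_FACTOR) -> int:
    """Processor licences = enabled cores x core factor, rounded up to a whole number."""
    return math.ceil(enabled_cores * core_factor)


class CapacityOnDemand:
    """Hypothetical model of a compute node whose enabled core count may only grow."""

    def __init__(self, physical_cores: int, enabled_cores: int) -> None:
        if not 0 < enabled_cores <= physical_cores:
            raise ValueError("enabled cores must be between 1 and the physical core count")
        self.physical_cores = physical_cores
        self.enabled_cores = enabled_cores

    def set_enabled_cores(self, new_count: int) -> None:
        if new_count < self.enabled_cores:
            raise ValueError("scaling down is not possible")
        if new_count > self.physical_cores:
            raise ValueError("cannot enable more cores than physically present")
        self.enabled_cores = new_count


node = CapacityOnDemand(physical_cores=64, enabled_cores=32)  # hypothetical X9M-2 node
print(processor_licences(node.enabled_cores))  # 16
node.set_enabled_cores(48)
print(processor_licences(node.enabled_cores))  # 24
```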
Full rack configuration
According to Oracle’s tests (we don’t know how Oracle tests or whether the tests are consistent), the performance increase between the X8M and X9M platforms is quite impressive. Interestingly, an X9M full rack with high capacity storage uses 14 instead of 12 storage nodes. Besides that, there is more CPU power, as more modern hardware is available on both compute and storage nodes. No average latency is given for the IOPS figures, so the results are not completely documented and need to be interpreted with caution.
|Metric||X8M full rack||X9M full rack||increase %|
|8K database read I/Os per second||12’000’000||22’400’000||86.7 %|
|8K flash write I/Os per second||5’640’000||8’596’000||52.4 %|
|Compute nodes||8 x X8M-2||8 x X9M-2||0 %|
|Storage nodes||12 x High capacity||14 x High capacity||16.7 %|
|Disk raw capacity in TB||2016||3024||50 %|
|Flash raw capacity in TB||307.2||358.4||16.7 %|
|Persistent memory raw capacity in TB||18||21||16.7 %|
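The full rack capacities are simply the per-storage-server raw values multiplied by the number of storage servers. The short Python sketch below reproduces the rack figures from the high capacity (HC) storage node values; it deliberately ignores ASM mirroring, so usable capacity will be considerably lower than these raw numbers.

```python
# Raw capacity per high capacity storage server in TB, from the storage node table.
X8M_HC = {"disk": 168.0, "flash": 25.6, "pmem": 1.5}
X9M_HC = {"disk": 216.0, "flash": 25.6, "pmem": 1.5}


def rack_capacity(per_cell: dict, cells: int) -> dict:
    """Raw rack capacity = raw capacity per storage server x number of storage servers."""
    return {kind: round(tb * cells, 1) for kind, tb in per_cell.items()}


x8m_rack = rack_capacity(X8M_HC, 12)  # X8M full rack: 12 storage servers
x9m_rack = rack_capacity(X9M_HC, 14)  # X9M full rack: 14 storage servers

for kind in x8m_rack:
    gain = (x9m_rack[kind] - x8m_rack[kind]) / x8m_rack[kind] * 100
    print(f"{kind}: {x8m_rack[kind]} TB -> {x9m_rack[kind]} TB ({gain:.1f} %)")
```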
Please note that there are also other configurations available:
|Configuration||Compute nodes X8M||Compute nodes X9M||Storage nodes X8M||Storage nodes X9M|
|Full rack||8 x X8M-2||8 x X9M-2||12 x High capacity||14 x High capacity|
|Elastic configuration, single rack||up to 19, 912 cores max||up to 19, 1216 cores max||up to 19, 608 cores max||up to 18, 576 cores max|
|Elastic configuration, multi rack||up to 32||?, but possibly higher than X8M||up to 64||?, but possibly higher than X8M|
Oracle Exadata Configuration Assistant (OECA) simplifies the elastic configuration process. OECA helps you to investigate and plan a variety of elastic configuration scenarios. It has not yet been updated with X9M platform data, but it certainly will be in the future. Download it here; an Oracle account is needed.
Some words on performance
Please be advised: the performance increase is due not only to hardware changes (latest Intel Xeon Ice Lake CPUs, PCIe 4.0 instead of 3.0, persistent memory 200 series instead of 100 series, … you name it) but also to software, which may be used on both X8 and X9M platforms. It’s interesting to see what implications there are when choosing a processor platform. Will we see AMD and/or even a port to ARM CPUs on Exadata in the future?
Furthermore, I believe hardware tuning is the easiest of all methods to tune Oracle RDBMS, but not the one with the biggest effect. Tuning at the SQL level, or simply keeping active data sets in databases small (e.g. by using partitioning and offloading data on a regular basis), can have a bigger impact. Both SQL tuning and data archiving require a tuning specialist to understand how end users use a service, and how the service is meant to be used. Unfortunately, there is a difference between the two… Therefore, tuning is individual in every company, even when using exactly the same software. Writing this reminded me of the 2009 German movie “Same Same But Different“…
To sum it up: tuning is hard work, a true craft, sometimes even an art… It’s advisable to tune with a scientific approach (knowing the effects before implementing them) and to end tuning efforts when the impact on performance, from an end user perspective, becomes too low. So get end users into the tuning boat.
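One way to make “end tuning efforts when the impact is too low” measurable is to compare end user response times before and after each change and to stop once the gain falls below an agreed threshold. A minimal sketch, assuming response times in milliseconds collected from the end user’s perspective and a hypothetical threshold of 5 %; the median is used because it is robust against single outliers.

```python
from statistics import median


def relative_gain(before_ms: list[float], after_ms: list[float]) -> float:
    """Relative reduction of the median end user response time, in percent."""
    before, after = median(before_ms), median(after_ms)
    return (before - after) / before * 100


def keep_tuning(before_ms: list[float], after_ms: list[float], min_gain_pct: float = 5.0) -> bool:
    """Continue only if the last change improved the median response time by at least min_gain_pct."""
    return relative_gain(before_ms, after_ms) >= min_gain_pct


# Hypothetical measurements before and after two consecutive tuning steps.
print(keep_tuning([200, 220, 210], [150, 160, 155]))
print(keep_tuning([150, 160, 155], [149, 152, 151]))
```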
What can be learned from cloud
Speaking of performance tuning, one can think of services that cannot be run on public clouds because of network latency. That can be a reason to have an Exadata on the customer side (Exadata Cloud@Customer offering). But one can learn from cloud services to engineer environments on premises. I see at least two interesting aspects:
- Automation: saves time and ensures quality. DBAs end up becoming developers. Running services autonomously by mining the data dictionary (or simply using “statistics”, but this term is not as modern as “data mining”) is also part of automation.
- Standardisation: many database workloads fit into a few standards. Maybe you could start defining them like T-shirt sizes S, M, L, XL for your Oracle RDBMS offerings? There is no rule without exception; there are and always will be services demanding more or less. But don’t underestimate the power of standardisation.
T-shirt sizes are useful to define limits when consolidating a large number of databases on the same hardware. Exadata offers Resource Manager to limit CPU, memory and disk usage.
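A T-shirt catalogue can be as simple as a lookup table plus a rule that picks the smallest size covering a request. The Python sketch below is hypothetical: the sizes and numbers are placeholders, not recommendations. On Oracle RDBMS, the fields could map to `CPU_COUNT` (which, together with an active Resource Manager plan, enables instance caging), `SGA_TARGET` and `PGA_AGGREGATE_LIMIT`; the consolidation check assumes no CPU overcommit, which is a conservative design choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    name: str
    cpu_count: int     # could map to CPU_COUNT (instance caging)
    sga_gb: int        # could map to SGA_TARGET
    pga_limit_gb: int  # could map to PGA_AGGREGATE_LIMIT


# Hypothetical catalogue, ordered from smallest to largest; numbers are placeholders.
CATALOGUE = [
    Size("S", 2, 8, 4),
    Size("M", 4, 16, 8),
    Size("L", 8, 32, 16),
    Size("XL", 16, 64, 32),
]


def pick_size(cpus_needed: int, sga_needed_gb: int) -> Size:
    """Return the smallest standard size that covers the request."""
    for size in CATALOGUE:
        if size.cpu_count >= cpus_needed and size.sga_gb >= sga_needed_gb:
            return size
    raise ValueError("request exceeds the largest standard size; handle as an exception")


def fits_on_node(sizes: list[Size], node_cores: int) -> bool:
    """Check whether the caged CPU counts of consolidated databases fit on one node without overcommit."""
    return sum(size.cpu_count for size in sizes) <= node_cores


print(pick_size(3, 10).name)
```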
To me, it’s important to understand that both automation and standardisation can be achieved when running services on premises as well, but probably at higher cost and effort. Data can be the second most valuable asset in a company after its employees and must be protected (see section “On trust”), so money is not the main concern. Time and agility are among the main reasons to move workloads to the cloud. But a supplier should not make decisions for the customer; it’s always up to the customer to decide.
On trust
In the end, trust between customer and supplier is an important factor when choosing a solution. Trust is a cultural thing; not everybody acts the same. This makes life quite interesting, don’t you think? With the Exadata Cloud@Customer service, customers trust Oracle Corporation to run their workloads and deal with their data, even though the hardware stays in the customer’s data center. Not every supplier is capable of protecting data, as many incidents have shown in the past, and risks will increase with standardisation (fewer variants to choose from) and monopolisation (fewer vendors). An interesting experiment with Cloud@Customer services would be: what happens if one cuts off internet access to the Exadata hardware? Will databases continue to run? Yes, according to Thomas Teske (see comments below), but it is not recommended.
A vendor may know the capabilities and limits of its solution best, so it’s a logical step to also offer appliances. The more complex a solution, the truer this becomes. Oracle RDBMS is a complex piece of software. Appliances and services have to fit into an existing environment. My experience is: the bigger the company, the bigger the environment, the more vendors/silos there are, and the fewer standards are used. Quite an impressive number of people are busy migrating data and services from one platform to another. This is by far not always successful. Furthermore, legacy platforms stay longer than expected. It’s easy to end up as a zookeeper.
Time to refactor?
On the other hand: if only the vendor is able to build meaningful appliances for its software, has the vendor reached, or even missed, the point in time to refactor its software products? Oracle RDBMS was first commercially available in 1979. Honestly, I have no idea how much code and how many ideas from older times still exist in today’s versions. Personally, I think Oracle’s strategy focuses mainly on introducing new features (cloud drives development) rather than on robustness and bug fixing. It’s also quite hard for developers to keep up with what today’s hardware and software offer; modern programming languages, remote direct memory access (RDMA) and persistent memory are just some examples. One goal is for hardware to interact with hardware as directly as possible, using the least amount of software (and main CPU cycles), so that overall performance improves. For Oracle RDBMS, this may also lower CPU-based licence costs. Maybe one day you will pay for payload and not for CPU power?
But maybe, in the end, the most fundamental steps are to
- release software under an appropriate open source licence,
- work constantly on simplicity and closely with end customers,
- identify and reduce technical debt before implementing new features, and
- monetise by selling services rather than licences.
Back to reality
I think most developers, C-level managers and end users don’t care about databases at all. When it comes to Oracle RDBMS, few people understand the complex engine. Most workloads fit into a standard configuration, but there is a huge risk of misconfiguration. If in doubt, leave default values and avoid using hidden parameters. PL/SQL is far too complex; as a result, most payload is simply processed at the application server or client level. Transporting data has negative effects on cost, energy consumption and performance. On the other hand, processing data outside the database may avoid database vendor lock-in. Applications come and go, database engines stay.
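The cost of transporting data can be illustrated by counting the rows that leave the database. The sketch below uses Python’s built-in `sqlite3` module purely as a stand-in for a client/server database such as Oracle RDBMS (SQLite runs in-process, so no network is involved here); the row count is a proxy for the data that would otherwise cross the network. The table `orders` and its contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a client/server RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(float(i % 100),) for i in range(10_000)],
)

# Variant A: ship every row to the client and aggregate there.
rows = conn.execute("SELECT amount FROM orders").fetchall()
client_total = sum(row[0] for row in rows)

# Variant B: let the database aggregate and ship a single row.
(db_total,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()

print(f"{len(rows)} rows transferred vs. 1 row; totals equal: {client_total == db_total}")
```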
An optimal service
- may be available promptly,
- may be self-explanatory to use (focus on usability),
- may keep payload under end user control (with the basic feature to erase it), and
- may scale its performance automatically up and down according to the current workload while maintaining low end user response times.
Easy requirements, hard to materialise.
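Automatic scaling “up and down according to the current workload” is often implemented as a threshold rule. The following Python sketch is a hypothetical illustration, not how any Oracle service scales: it doubles the CPU count above 75 % utilisation, halves it below 25 %, and keeps it within fixed bounds. The gap between the two thresholds acts as hysteresis, so the allocation does not flap on small load changes.

```python
def next_cpu_count(
    current: int,
    utilisation: float,
    minimum: int = 2,
    maximum: int = 16,
    high: float = 0.75,
    low: float = 0.25,
) -> int:
    """Threshold-based scaling: double CPUs above `high` utilisation, halve them below `low`."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    if utilisation > high:
        return min(current * 2, maximum)
    if utilisation < low:
        return max(current // 2, minimum)
    return current  # inside the hysteresis band: keep the current allocation


# Hypothetical sequence of utilisation samples.
cpus = 4
for load in (0.9, 0.8, 0.5, 0.1):
    cpus = next_cpu_count(cpus, load)
    print(load, cpus)
```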
Thinking out loud. What is your opinion?
- Intel® Xeon® 8352Y has 32 cores, not 16
It’s a typo in Oracle’s X9M-x whitepapers and documentation; I informed some people working for Oracle.
Thank you, nirwander
- Cloud@Customer experiment: services will continue if you cut off internet access, according to Thomas Teske.
Thank you, Thomas
- Cloud learnings: Cloud@Customer mentioned as a possible way to minimise network latency
- Total cores comparison: added storage cell XT
- Added a hint on why persistent memory is not used on compute nodes
- Mentioned other CPU factors that influence overall system performance
- X9M storage nodes: the Xeon 8352Y uses 16 instead of 32 cores, as an Intel Speed Select Technology – Performance Profile is in place. This may not be used to lower RDBMS license costs; mentioned other possibilities.
Thanks for the explanation, Gavin Parish and Thomas Teske