Every DBA has at one point or another been under the gun to deliver faster database performance. It reminds me of the light speed vs. ludicrous speed dialog from the 1987 Mel Brooks movie “Spaceballs”:
Colonel Sandurz: Prepare ship for light speed.
Dark Helmet: No, no, no, light speed is too slow.
Colonel Sandurz: Light speed, too slow?
Dark Helmet: Yes, we're gonna have to go right to ludicrous speed.
And that’s exactly what most database users want – ludicrous speed. DBAs cannot assuage those ever-growing performance demands. Even though databases keep growing larger thanks to cheap disk space, the users simply expect everything to run faster. As such, some DBAs turn to the high availability (HA) solution: Oracle Real Application Clusters (RAC) – which can scale database performance by adding more nodes to better load balance workloads and even perform operations in parallel across nodes.
Let’s review the basic architecture for a RAC database as shown below in Figure 1. There are two key observations: there is just one logical database spread across the nodes, and all the nodes require access to shared storage [I am purposefully ignoring Oracle 12c’s new RAC features Flex ASM and Flex Clusters – i.e. hub & leaf nodes]. It’s this second aspect, shared storage, that I want to focus on today. Most DBAs tend to choose from one of three options: Storage Area Networks (SAN), Network Attached Storage (NAS) or iSCSI. There are others, but these three are very representative of the various options.
Figure 1: Traditional RAC Database Architecture
SAN, NAS and iSCSI servers can house traditional Hard Disk Drives (HDDs = spinning magnetic media), Solid State Disks (SSDs), or even flash cards. In fact it’s possible to have all HDDs, a combination of HDDs and SSDs, or the special, expensive class known as All Flash Arrays (AFAs). While they all have various pros and cons, they share one significant shortcoming: they generally require that all actual reads and writes traverse the network and thus experience network latency. Wikipedia.org defines network latency as follows:
Network latency in a packet-switched network is measured either one-way (the time from the source sending a packet to the destination receiving it), or round-trip delay time (the one-way latency from source to destination plus the one-way latency from the destination back to the source). Round-trip latency is more often quoted, because it can be measured from a single point. Note that round trip latency excludes the amount of time that a destination system spends processing the packet.
A simple example would be how we would communicate if you were standing on one side of the Grand Canyon and I was on the other side. We would each have to first wait for the sound of our voice to carry across the distance between us (i.e. one-way latency), be heard by the other person, comprehended, replied to, and then carried back across again (i.e. round-trip latency). For those of us with more than a few gray hairs it would also be much like how old transatlantic long-distance calls used to work – with delays between speaking each way. How effective or efficient would communicating via these two examples really be? By today’s technology standards we might not even consider the old transatlantic long-distance calls to work at all given the delays. So latency does matter!
Now refer back to Figure 1 and imagine a very heavily utilized web-server (e.g. eBay.com, Amazon.com, Google.com, etc.) where many tens of thousands or even hundreds of thousands of concurrent users are sending requests to the RAC database nodes – which then send network IO requests across the shared storage network. There could be many millions or even billions of network packets flying back and forth between the database and network shared storage. Even with 10 Gigabit Ethernet (10GbE) or InfiniBand, that many packets all suffering one-way and/or round-trip latencies could well add up and translate into poor database performance. So not only does latency matter, its cumulative effects also matter!
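To put rough numbers on that cumulative effect, here is a minimal Python sketch. The request count and the per-request round-trip times are purely illustrative assumptions of mine, not measurements from my test setup, and the function name is hypothetical. It also assumes the requests are issued one after another with no overlap, which overstates the real wall-clock cost but makes the accumulation easy to see.

```python
def cumulative_latency_seconds(io_requests: int, round_trip_us: float) -> float:
    """Total time spent waiting on the network if every storage IO pays
    one round trip and no round trips overlap (a worst-case simplification)."""
    return io_requests * round_trip_us / 1_000_000


if __name__ == "__main__":
    # Illustrative assumptions only: one million IO requests at three
    # hypothetical round-trip latencies (in microseconds).
    requests = 1_000_000
    for rtt_us in (50.0, 200.0, 1000.0):
        total = cumulative_latency_seconds(requests, rtt_us)
        print(f"{requests:,} IOs x {rtt_us:g} us round trip = {total:g} s of waiting")
```

Real systems overlap many requests in flight, so treat the totals as an upper bound on waiting time rather than a prediction.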
So I decided to compare some key iSCSI based shared storage options vs. a radically new and disruptive technology from HGST – Oracle RAC server node based flash cards visible across the network via custom software (i.e. shared storage). In fact this novel solution qualifies as a (hyper-) converged system, which Wikipedia.org defines as:
Converged infrastructure operates by grouping multiple information technology (IT) components into a single, optimized computing package. Components of a converged infrastructure may include servers, data storage devices, networking equipment and software for IT infrastructure management, automation and orchestration.
Let’s review the basic architecture for the HGST solution as shown below in Figure 2. In short this solution has just two components: the flash cards and a Linux driver that makes the flash available and optimized for Oracle Automatic Storage Management (ASM). The HGST Share driver makes all the flash visible and accessible across all the RAC nodes. Furthermore it utilizes ASM features to mirror the data across all the storage while also providing significant performance gains via “preferred reads”. Under this mode all database read IOs are local; only the dirty writes have to cross the network. Since many systems perform far more reads than writes, that skew in the IO mix magnifies the gain.
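A toy model helps show why the read/write skew matters so much here. The Python sketch below is my own simplification, not anything from HGST or Oracle: the function name is hypothetical, and it assumes that without preferred reads every IO goes to remote shared storage, while with preferred reads only the writes do.

```python
def network_io_fraction(read_fraction: float, preferred_reads: bool) -> float:
    """Fraction of database IOs that must cross the storage network.

    Simplified model: without preferred reads every IO is remote;
    with preferred reads only the writes are remote.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if preferred_reads:
        return 1.0 - read_fraction
    return 1.0


if __name__ == "__main__":
    # Hypothetical workload mixes, from write-heavy to read-only.
    for reads in (0.5, 0.9, 1.0):
        remote = network_io_fraction(reads, preferred_reads=True)
        print(f"{reads:.0%} reads -> {remote:.0%} of IOs cross the network")
```

Under this model a read-only workload sends no IO across the network at all once preferred reads are enabled.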
Figure 2: Novel New HGST Solution for Oracle RAC
So how do various iSCSI architectures compare to the HGST FlashMAX + Share solution? Look at Figure 3 below. I did a very simple benchmark – I ran a midsized TPC-H data warehousing benchmark (i.e. 22 very complex queries) and recorded the average run time across multiple executions. I also cleared all Oracle SGA and iSCSI memory caches at the start of each test scenario so that the average includes both non-cached and cached run times. The iSCSI storage server I used had both HDDs and SSDs. Moreover it allowed SSDs to be used as an extended iSCSI server side cache. So I compared the following scenarios:
Figure 3: Relative TPC-H Run Times
We can probably agree that this was neither the best nor the most representative benchmark to run in order to demonstrate these different shared storage options for Oracle RAC. The On-Line Transaction Processing (OLTP) benchmarks such as the TPC-C or TPC-E might better approximate the more common production workloads, but those tests are harder to score. The TPC-H keeps things simple – how long does it take to execute 22 very complex queries? Plus, as a read-only test, it helps to clearly show the benefit of Oracle ASM preferred reads.
These test results surprised me – and possibly even made my latency point. My storage network was an isolated quad 1GbE network segment using IEEE 802.3ad Link Aggregation Control Protocol (i.e. LACP) to combine (aggregate) multiple network connections in parallel in order to increase throughput. The setup was sufficiently fast for the database size and workload.
The first surprise was that the second column in the chart showed that adding a very fast SSD cache, large enough to hold most (i.e. 83%) of the TPC-H table data, yielded very little performance improvement (i.e. roughly 3%). So clearly network latency came into play.
The second surprise was that the third column in the chart showed that switching to typical SSDs, which are rated as 10X faster than the HDDs, once again yielded an improvement below expectation (i.e. 39%). Note too that adding a very fast SSD cache once again yielded substandard results (i.e. 8%). I guess that there’s really not much difference between SSDs no matter their marketing quoted IO rates. Once more network latency helps to explain much of these observations.
The third and final surprise was a very pleasant one – the converged system solution was between 7.77 times faster than SSDs with a large, fast cache and 13.88 times faster than just HDDs with no cache. Now how much of this difference was due to network latency vs. Oracle ASM preferred reads I cannot say. But it really doesn’t matter. The HGST solution clearly provides ludicrous speed!
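As a quick sanity check on those two multipliers, dividing one by the other gives the speedup that the best iSCSI scenario (SSDs with a large, fast cache) achieved over the worst (HDDs with no cache): roughly 1.79 times. The Python sketch below does that arithmetic; the function name is hypothetical, and the only inputs are the 13.88 and 7.77 figures quoted above.

```python
def implied_speedup(faster_vs_a: float, faster_vs_b: float) -> float:
    """If system X is `faster_vs_a` times faster than A and `faster_vs_b`
    times faster than B, then B is faster_vs_a / faster_vs_b times faster than A."""
    return faster_vs_a / faster_vs_b


if __name__ == "__main__":
    # 13.88x vs. HDDs with no cache, 7.77x vs. SSDs with a large, fast cache.
    ratio = implied_speedup(13.88, 7.77)
    print(f"SSDs with cache vs. HDDs without cache: {ratio:.2f}x faster")
    print(f"Run time relative to the HDD baseline: {1 / ratio:.0%}")
```

In other words, every iSCSI variation I tried stayed within a factor of two of the slowest one, while the converged solution sat in a different league.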
I got an excellent question from Roye Avidor, a seasoned systems & database expert at HGST, asking which caching mode I used: write-back, write-through, or write-around? I should have specified this - so apologies. I used write-back hoping to see the largest gain, and yet it was consistently under 10%.
Another question came up - what is FOS flash in Figure 2?
Good question - FOS = Flash On Server vs. FOSC = Flash on Storage Controller.
From following web site: