Monday, December 19, 2011

Flash with Databases-Part 2

In my previous post on Flash with databases, I talked about the upcoming FusionIO tests.

Here's the hardware configuration overview:
Server1  : Dell 610, 2x4 (8 cores)
Server2  : Cisco UC460, 4x8 (32 threads, 16 cores)
SSD      : 2 mirrored SSD disks
HBAs     : 2 x 4Gb/s
FusionIO : ioDrive

First, let's look at the SSD to establish a baseline.  The configuration is 2 SSD disks, mirrored, across two 4Gb HBAs.  The tests were run multiple times, so these results are averages.  For all the Swingbench tests after the first, I reduced the db_cache to just 256MB so that the IO would be nearly all physical.  I want to stress the storage, not the cache...a good storage test must make the storage the bottleneck.  It might be interesting to compare these results to the Hitachi USP-V tests from last year.  The testing was very similar, but the results were opposites of each other, due to the extreme response time differences.

ORION (SSD baseline):
IOPS: 8303   MBPS: 100.98   Latency: 0.52 ms

Swingbench on 11gR2 (topas/atop):
Max TPM   Max TPS   Avg TPS   Avg Resp (ms)   Disk Busy %   CPU (cores)   Notes
164393    3489      2197      62              100           1.61          Normal db_cache
54674     1100      872       156             100           1.1           256MB db_cache

IMHO, these are very nice results...this storage would support a very fast database.  Obviously, more SSDs would mean more throughput (if we have enough HBAs.)

The SSD tests above were done on a Dell 610, dual-socket, 8-core server.  As you can see from the Disk Busy and CPU columns, atop was reporting 100% disk utilization and 1.1 cores utilized (110% CPU used.)

When I first started the tests using FusionIO, I could see a huge speed difference right off.  Here are the Orion results:

IOPS: 33439   MBPS: 750.5   Latency: 0.08 ms

Compared to the SSD, that's 4X the IOPS, about 7.4X the throughput and 6.5X faster response time!  Orion is reporting the response time as .08 milliseconds...which is 80 MICROseconds.  Keep in mind my baseline is very fast SSD; it wouldn't be unusual to see normal FC storage on a SAN show 10-15 ms latency.
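Those ratios are easy to sanity-check from the two Orion runs (a quick check in Python; the numbers are taken straight from the Orion results above):

```python
# Orion results: SSD baseline vs. FusionIO (numbers from the runs above)
ssd = {"iops": 8303, "mbps": 100.98, "latency_ms": 0.52}
fio = {"iops": 33439, "mbps": 750.5, "latency_ms": 0.08}

iops_ratio = fio["iops"] / ssd["iops"]                  # ~4.0X the IOPS
mbps_ratio = fio["mbps"] / ssd["mbps"]                  # ~7.4X the throughput
latency_ratio = ssd["latency_ms"] / fio["latency_ms"]   # 6.5X faster response

print(f"{iops_ratio:.1f}X IOPS, {mbps_ratio:.1f}X MB/s, {latency_ratio:.1f}X latency")
# -> 4.0X IOPS, 7.4X MB/s, 6.5X latency
```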

Above, I mention that in order to test storage, the storage has to be the bottleneck in the test...but I had a problem when I did the Swingbench test.  No matter what I did, I couldn't cause the storage to be the bottleneck after I moved the indexes, data and redo to flash.  At first I thought there was something wrong and the FusionIO driver was eating up the CPU...but I didn't see the problem in the Orion tests on FusionIO, so that didn't make sense. 

Swingbench on 11gR2 (topas/atop):
Max TPM   Max TPS   Avg TPS   Avg Resp (ms)   Disk Busy %   CPU (cores)   Notes
20726     478       261       597             100/6         6.64          Indexes on flash; first run with FusionIO - CPU is very high
189132    3522      2897      47              100/63        4.92          Indexes and data on flash
162165    2868      2638      42              97/100        8             Indexes, data and redo on flash; all CPU cores completely pegged

The problem wasn't the driver...the problem was that I was able to generate more load than I ever had before, because the bottleneck had moved from storage to CPU...even with my 256MB db_cache! 

Notice too that atop was reporting the FusionIO as 100% utilized...strange things happen to monitoring when the CPU is pegged.  For the Disk Busy % field, the first number is the SAN storage; the second number is the % busy reported by atop for the FusionIO card.  That 100% busy is accurate from a certain perspective.  The FusionIO driver uses the server's CPU to access the storage, and the driver couldn't get any CPU cycles.  To atop it appears the system is waiting on storage, so the storage was reported as 100% busy.  That isn't to say the FusionIO card was maxed out...just that you couldn't access the data any faster...which isn't exactly the same thing in this case.  The CPU bottleneck created what appeared to be a storage bottleneck, because FusionIO uses server CPU cycles and there weren't any available.  I didn't investigate further, but I suspect tweaking the nice CPU priority settings for the driver's process would allow the FusionIO to perform better and report more accurately.  At any rate, the Dell 610 with 2X4 (2 sockets, 4 cores each) couldn't push the FusionIO card to its limits with Swingbench; it didn't have the CPU cycles needed to make storage the bottleneck.

To deal with this CPU limitation, Cisco was kind enough to let us borrow a UC460, which has the awesome UCS technology and 4 sockets with 8-core processors each.  I only had 2 sockets populated, which more than doubled my compute power, giving me 16 Nehalem-EX cores and 32 threads of power (a huge upgrade from 8 Nehalem-EP cores).  I installed the FusionIO card and retested.  With everything in the database on FusionIO, and with my extremely low db_cache to force physical IO instead of logical IO, it took 12.37 Nehalem-EX threads to push the FusionIO card to 100% utilization.  When I first did the test with SSD on the Dell, using a normal db_cache, I was only able to get 167k TPM.  Here, I did 170k.  This means I was able to get more speed with purely physical IO's than I was able to get from physical+logical IO's on the SSD's.

Swingbench on 11gR2 (topas/atop):
Max TPM   Max TPS   Avg TPS   Avg Resp (ms)   Disk Busy %   CPU (cores)   Notes
170036    123438    2749      48              0/100         12.37         Everything (including undo) on flash

This made me wonder...if I didn't hold back the db cache, what could the Cisco UC460 do with FusionIO?  In a normal db configuration, how would it perform?  The answer:

Swingbench on 11gR2 (topas/atop):
Max TPM   Max TPS   Avg TPS   Avg Resp (ms)   Disk Busy %   CPU (cores)   Notes
440489    7599      6888      8               0/60          31.41         Everything on flash, with normal memory settings

It took almost 32 threads, and now I'm once again almost 100% CPU utilized.  At this point, with 32 threads on a high-performance Cisco UC460, the FusionIO is only 60% utilized.  This is the fastest server I have to test with...there's never an 8x10 (8 sockets, 10 cores each) lying around when you need one.... :)  That's ok, I have enough information to extrapolate.  If 32 threads can drive this thing to 60% utilization...I can calculate this:

Swingbench on 11gR2 (topas/atop):
Max TPM   Max TPS   Avg TPS   Avg Resp (ms)   Disk Busy %   CPU (cores)   Notes
734148    12665     11480     -               0/100         53            Extrapolated (if we had the CPU to make FusionIO the bottleneck)
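The extrapolation is just linear scaling from 60% card utilization up to 100%, assuming the workload keeps scaling and no new bottleneck appears (a sketch of the arithmetic):

```python
# Measured on the UC460 (from the run above): FusionIO 60% busy at ~31.41 CPU threads
measured = {"max_tpm": 440489, "max_tps": 7599, "avg_tps": 6888, "cpu_threads": 31.41}
busy = 0.60  # FusionIO card utilization during the measured run

# Scale each metric linearly up to 100% card utilization
extrapolated = {k: v / busy for k, v in measured.items()}

print(round(extrapolated["max_tpm"]))       # 734148 TPM
print(round(extrapolated["avg_tps"]))       # 11480 avg TPS
print(round(extrapolated["cpu_threads"]))   # ~52 threads (call it 53)
```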

Unless some new bottleneck is introduced, a database on FusionIO (with the 53 Nehalem-EX threads to drive it) would be the fastest database I've ever seen, according to Swingbench testing.  734k TPM....I'm not talking IOPS...I'm saying achieved transactions, with each one having many IO's.  To put things in perspective, that's more than 13X faster than the SSD.  There are 6 PCI-E slots on this server, so with multiple cards we could easily still see the bottleneck move to CPU, even with 4 sockets and 64 threads.

To be fair though, the SSD has a lot of things outside of its control...it has to go through SAN hardware and HBAs.  The FusionIO has direct access via the PCI-E bus...which isn't really a bus, if you get into the details of PCI-E.  That's one of the reasons why, as the FusionIO card gets more and more busy, the response time continues to stay low.  In my testing, the worst response time I saw was .13 milliseconds (there's a . before the 13...that's 130 microseconds...not quite RAM speed, but closer to RAM response time than spindle response time), and that was when the UC460 was completely pegged (which, alone, is a difficult thing to accomplish.)

FusionIO is ridiculously fast and, from an IOPS/$ perspective, extremely cheap.  They captured a huge percentage of the flash market in the first year of their public offering, and it's easy to see why.  In the techy world we scratch and claw for any technology that can bring a few percentage points of performance improvement to our databases.  It's rare we see something that so completely transforms the environment...where we can shift bottlenecks and get 10X+ performance.  Think about what this means: you likely have CPU monitoring on your servers to make sure a process doesn't spin and take up CPU needlessly...that's going to have to go out the window or become more intelligent, because the CPU will be expected to be the bottleneck.  We'll need to use Oracle Resource Manager more than ever. 

Downside: FusionIO is local storage (for now.)  Although their cost per IOP is low, their cost relative to cheaper storage on a SAN is very high from a $/GB standpoint, which is how storage administrators traditionally approach storage.  It takes a strong database engineer to convince management to approach the issue from an IOPS requirement rather than a GB requirement.  It's a paradigm shift that needs to be made to reach ultimate performance goals.  Kaminario has addressed the local storage requirement issue well, from what I've heard.  Hopefully I'll take a look at that within a few months.
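To see why the $/GB vs. $/IOPS framing matters, here's a sketch with entirely hypothetical prices and capacities (none of these dollar figures come from this post or any vendor; they're placeholders to show how the two metrics can rank the same hardware in opposite orders):

```python
# Hypothetical, illustrative numbers only -- NOT vendor pricing
options = {
    "SAN FC storage": {"cost_usd": 50000, "gb": 10000, "iops": 5000},
    "FusionIO card":  {"cost_usd": 15000, "gb": 640,   "iops": 100000},
}

for name, o in options.items():
    per_gb = o["cost_usd"] / o["gb"]        # the traditional storage-admin metric
    per_iops = o["cost_usd"] / o["iops"]    # the metric that matters for a hot database
    print(f"{name}: ${per_gb:.2f}/GB, ${per_iops:.4f}/IOPS")

# Ranked by $/GB the SAN looks far cheaper; ranked by $/IOPS the flash card wins easily.
```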

So...flash technology and especially FusionIO can be a game changer...but how can you configure it to be useful while most efficiently using your storage budget for databases?  I'll look at that next...

Flash with Databases - Part 1
Flash with Databases - Part 2
Flash with Databases - Part 3
Flash with Databases Inexpensively 
