Thursday, December 22, 2011

Databases on Flash done Inexpensively

If you're looking at putting your database (SAP, HANA, SQL Server, MySQL, Oracle...whatever) on flash, you should *really* take a look at FusionIO.  FusionIO is how Facebook is able to drive its MySQL databases so fast.  The ioDrive card is a single point of failure for your storage, though, so you need some way to protect it.  Oracle ASM's normal redundancy is effectively software mirroring...and it's very simple to use and set up.  If you have the budget, you can get truly extreme performance by using ASM's normal redundancy to mirror two ioDrives, but this method of protection would effectively double the cost of your FusionIO purchase.

If you need more performance than you currently have, and you can't afford the cost of the storage it would take to put your entire database on flash, there's a different way to get it done, while still protecting the storage from a single point of hardware failure.

In my previous post about databases on flash, I talked about "option #4", which is to set up ASM as if it's an extended cluster (when it's not) and set it to use the fast, relatively expensive (per GB) FusionIO ioDrive storage as the preferred read failgroup.  The other failgroup is your existing SAN storage.  There are advantages to doing this instead of just getting faster storage on your SAN.  You're more HA than before, because you're protected from a storage failure on either the SAN or the ioDrive.  If the SAN storage is unavailable, the database will stay up and running, tracking all the changes that are made after the SAN failed.  Depending on how long the outage lasts, and how you configured the diskgroup, when the SAN comes back, it'll apply those changes and after a while the SAN storage will again be in sync with the ioDrive.  If the ioDrive goes down, the database will continue to run completely on SAN storage until the ioDrive is back, at which point things can be synced up again.
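How long ASM keeps tracking changes for an offline failgroup before it gives up and drops the disks is controlled by the diskgroup's disk_repair_time attribute.  Here is a minimal sketch of what that might look like, run from the ASM instance.  The diskgroup name DATA, the failgroup name SAN, and the 8 hour window are made up for illustration and aren't from the test setup:

```sql
-- Assumed names: diskgroup DATA, failgroup SAN (hypothetical).
-- Needs compatible.asm and compatible.rdbms set to 11.1 or higher.
-- Give a failed failgroup 8 hours to come back before ASM drops its disks.
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '8h';

-- Once the SAN is reachable again, bring its disks back online;
-- ASM then resyncs only the extents that changed during the outage.
ALTER DISKGROUP data ONLINE DISKS IN FAILGROUP san;

-- Check disk state and the remaining repair timer (in seconds).
SELECT name, failgroup, mode_status, repair_timer
  FROM v$asm_disk;
```

If the outage lasts longer than disk_repair_time, the disks are dropped and have to be added back with a full rebalance, so pick a window that covers your realistic SAN outages.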

I wanted to quantify how this performs in a little more detail with a few tests.  Using the uc460 from the previous DB on Flash tests, I added an HBA, set up some mirrored SSDs on a SAN with 2 partitions, configured an ioDrive with 2 partitions, and then created 3 databases: one purely on the SSDs, one purely on FusionIO, and one that uses normal redundancy with the preferred reads coming from FusionIO.  All the databases are set up exactly the same, with only 256M of db_cache each.

Here is atop reporting IO while Swingbench is running on a database that's completely on the SSDs (sdb is the SSD presented from the SAN).

Test 1:

This is the same idea, but this time no SSD...with purely the ioDrive (fioa).

Test 2:

To set the preferred read fail group, you simply need to do this:

alter system set asm_preferred_read_failure_groups = 'diskgroup_name.failgroup_name';
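For context, here is a rough sketch of how a diskgroup like this could be created and checked.  The device names fioa and sdb are from my tests, but the partition paths, the diskgroup name DATA, and the failgroup names FIO and SAN are assumptions for illustration:

```sql
-- Assumed names and paths: diskgroup DATA, failgroups FIO and SAN,
-- partitions /dev/fioa1 and /dev/sdb1 (all hypothetical).
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP fio DISK '/dev/fioa1'
  FAILGROUP san DISK '/dev/sdb1'
  ATTRIBUTE 'compatible.asm' = '11.2', 'compatible.rdbms' = '11.2';

-- On the ASM instance: serve reads from the ioDrive failgroup.
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.FIO';

-- Verify: PREFERRED_READ should show Y for the disks in FIO.
SELECT name, failgroup, preferred_read
  FROM v$asm_disk;
```

The two failgroups need to be about the same size, since every extent is mirrored across both of them.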

This is the combination of the two, letting ASM distribute the write load in normal redundancy, with reads coming from the FusionIO card.

Test 3:

You can see:

1. All reads (yes, except for those 4 stray IO's) are coming from fioa, and the writes are distributed pretty much equally (both in IO's and MB/s) between fioa and sdb.

2. Atop is showing that even though all reads are coming from fioa and it's doing the same amount of writes as the SSDs on the SAN, it's still easily able to keep up with the workload...it's being throttled by the slower sdb storage.  One more time I have to point out...the ioDrive is sooo fast.  Incidentally, this speed is from the slowest, cheapest ioDrive made...the other models are much faster.

3. The Swingbench workload test forces a fixed ratio of reads to writes, so reads can only go as fast as the writes allow.  The potential for more than the 200% read performance shown above exists.  What you would see if you logged in to the database while the test is running is that reads are lightning fast, and writes are 30% faster than they've ever been before on the legacy storage.  In this configuration the legacy storage is only required to do writes and no reads, so it's able to do the writes faster than before (60,596 vs 46,466 IO's and 57.19 vs 43.92 MB/s).  All this performance boost (200%+ reads, 130% writes), and we now have HA storage at half the cost of moving the database completely to flash.
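The 130% write figure is just the ratio of the sdb write numbers from the two atop runs quoted above.  As a quick sanity check, it can be worked out from any SQL*Plus session:

```sql
-- Write numbers for sdb, taken from the figures quoted above:
-- mirrored with the ioDrive (Test 3) vs. SSD only (Test 1).
SELECT ROUND(60596 / 46466, 2) AS write_io_ratio,
       ROUND(57.19 / 43.92, 2) AS write_mb_ratio
  FROM dual;
-- Both ratios come out to 1.3, i.e. about 30% more write throughput.
```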

In the real world, your legacy storage wouldn't be a mirrored SSD on a SAN, it would likely be much faster FC storage in a RAID array.  This ioDrive could be a failgroup to SAN storage 3 times faster than the SSD before you'd get diminishing returns, at which point you could just add a 2nd ioDrive.  Still, I think the approach and the results will be the same.  Unless you have a legacy SAN solution that's faster than a few of these can do together, there are definite performance gains to be had.
