Friday, December 16, 2011

Flash with Databases - Part 1

For those (like me) that just want to see the results, go <* HERE *>

I've been asked to review some flash storage from different vendors from a database perspective.  Storage administrators have run their battery of tests and at this point narrowed down the field to a famous enterprise storage manufacturer's SSDs on SAN storage and FusionIO (320GB first generation).  The difference between these technologies is pretty vast...starting with their interface.  Placing SSD drives in a SAN solution is transparent to the server...it could be any other storage (FC, SATA) connecting to the server from the SAN...it just gets presented as a LUN.  There are a lot of advantages to that approach in ease of administration and scalability.  FusionIO uses PCI-E cards connected locally to the server.  There are obvious pros and cons to this approach too.  If a card dies, you have to go to that server and pull the card...also, there are a limited number of PCI-E slots in servers...some of the most popular, dense servers, like Cisco's B230M2, have no PCI-E slots...so they aren't compatible with FusionIO.  The sales guys from FusionIO tell me they're working closely with Cisco to begin selling storage with a mezzanine interface, to eliminate the PCI-E slot requirement.


For FusionIO, the speed of a direct PCI-E interface is tremendous, and it really shows up in the response time...which, to a database, is hugely important...but being limited to a single server for this storage is, IMHO, a limitation that's difficult for a storage admin to accept.  The SAN-based approach taken by the SSD manufacturer allows much simpler, traditional central management of the storage through an HBA to your server.  Likely, if you went with this solution it would be the same as your current FC storage, only faster.  No major changes needed for monitoring, administration, etc.  These advantages of the SSD solution can't be quantified, but they should at least be mentioned.  FusionIO's response is that their cards are expected to last through ~7 years of typical use (depending on how much storage you reserve, which is configurable), so although administration and monitoring can be handled with SNMP traps and hot-swappable interfaces, it's likely unnecessary because your server will be life cycled out long before you need to replace your FusionIO card.  That being said, even the most reliable hardware has failures, so your configuration needs to compensate with redundancy.


The plan is to test the storage with Orion and Swingbench, moving different items in the database (data, indexes, redo logs, temp, undo) onto flash to find the best way to use flash and to see the performance benefit.  Much of this testing has already been done by others, but there were some things I saw missing in the flash tests on databases I've seen:


  1. The effect of undo on flash


  2. The resource consumption of FusionIO, which uses a driver to simulate disk storage.  This must introduce overhead, but I've never seen it measured. 




  3. As a flash device fills, it slows down.  To be precise, its write performance slows down.  Performance can also degrade after extended periods of high IO.  How will this affect a database?


The tests will be similar to the tests done in previous posts to compare storage on/behind the Hitachi USP-V. Part 1, Part 2, Part 3, Part 4.




FusionIO uses a driver on the server to access their storage and emulate a disk drive...similar to a RAM disk.  This strategy must introduce some overhead.  Texas Memory Systems considers that a selling point of their flash technology: it offloads the CPU overhead to a chip directly on their flash cards.  Since Oracle licenses by core on the system (yes, that's overly simplified), that means you have to pay Oracle for the expensive CPU cycles used by FusionIO.  This may not turn out to be significant...we'll see in testing.  I may get a look at one of the TMS systems in a few weeks...it'll be an interesting comparison to the FusionIO card, since they're both PCI-E based.  I'm also thinking about throwing Kaminario in the mix if that's possible.  Kaminario doesn't have the name recognition of EMC, but they have an interesting product, designed for the ultra low latency of flash.  They use DRAM for their Tier 1 storage and FusionIO for their Tier 2 storage.  A rep from FusionIO told me their cards in Kaminario are nearly as fast as their cards plugged directly into a local PCI-E slot...pretty interesting.


Which is cheaper?


FusionIO and all single SSD drives I've seen are a single point of failure.  For a database that's important enough to your company to justify the cost of ultra-fast storage, it's likely important enough to require elimination of a hardware single point of failure.  For an SSD on a SAN or NAS you already own, that might mean you just increased the cost per GB by 25% (because you'll do 4+1 RAID5).  For FusionIO, since there's a limited number of PCI-E slots, your best option for a database is to do what Oracle did on Exadata...use ASM's redundancy features...in this case, that means you probably mirror it with normal redundancy...which means you doubled the cost per GB.  When you consider the cost of SSD vs FusionIO...you can't just look at your quoted price per GB, you have to consider configuration, which makes FusionIO more expensive at first glance.  Another option is to use 2 servers and license a Data Guard standby...but that increases your Oracle licensing and, again, doubles the cost per GB of FusionIO storage.  If you don't already have the infrastructure in place with extra capacity, you also have to consider the "other" costs on the SSD side...HBAs and SAN costs.
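To make that arithmetic concrete, here's a rough Python sketch of how redundancy inflates the price per usable GB.  The function names and the $10/GB quote are placeholders I made up for illustration...plug in your own quotes.  The only numbers taken from the discussion above are the layouts themselves: 4+1 RAID5 and a 2-way mirror (ASM normal redundancy).

```python
def raw_to_usable_multiplier(data_units: int, protection_units: int) -> float:
    """How much raw capacity you buy per unit of usable capacity."""
    if data_units <= 0 or protection_units < 0:
        raise ValueError("data_units must be > 0 and protection_units >= 0")
    return (data_units + protection_units) / data_units


def cost_per_usable_gb(quoted_price_per_gb: float, multiplier: float) -> float:
    """Quoted (raw) price per GB scaled up by the redundancy overhead."""
    return quoted_price_per_gb * multiplier


# 4 data drives + 1 parity drive: 5 drives bought for 4 drives of capacity.
RAID5_4_PLUS_1 = raw_to_usable_multiplier(4, 1)

# 2-way mirror (ASM normal redundancy): 2 cards bought for 1 card of capacity.
ASM_NORMAL_REDUNDANCY = raw_to_usable_multiplier(1, 1)

if __name__ == "__main__":
    quoted = 10.0  # hypothetical quoted $/GB, NOT a real vendor price
    print("RAID5 4+1 :", cost_per_usable_gb(quoted, RAID5_4_PLUS_1))
    print("ASM mirror:", cost_per_usable_gb(quoted, ASM_NORMAL_REDUNDANCY))
```

The multipliers come out to 1.25 for 4+1 RAID5 and 2.0 for the mirror, which is where the "25% more" and "double" figures come from.  This ignores the HBA, SAN and licensing costs mentioned above, so treat it as a starting point only.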


That's the traditional kind of price comparison usually done with storage.  For database storage, what I highly recommend is that you base your requirements on IOPS per $ if it's an OLTP database and throughput per $ if it's a DSS database, not GB per $.  When you're talking about a network share...$/GB might be the only consideration...but a database needs IOPS to perform well.  You COULD run your enterprise database on cheap, large SATA disks you bought from Fry's or Microcenter, but when people see how it performs you'll be fired and replaced by somebody who isn't afraid to have the difficult conversations that justify SSDs for databases. :)  You could buy a room full of these drives and maybe achieve the IOPS or throughput required for your database, but at that point, FusionIO or a SAN with SSDs would be cheaper...especially when you consider administration costs of maintaining and cooling that room.
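Here's a small Python sketch of that IOPS-per-$ way of thinking.  Big caveats: the 80 IOPS per SATA disk is a rule-of-thumb assumption of mine (not something I measured), the function names are made up for illustration, and any prices you feed in are your own quotes.  The 119,000 mixed IOPS figure is the vendor number for the V1 card from the spec table later in this post.

```python
import math


def iops_per_dollar(iops: float, price: float) -> float:
    """IOPS bought per dollar spent; higher is better for OLTP."""
    if price <= 0:
        raise ValueError("price must be positive")
    return iops / price


def devices_needed(target_iops: float, iops_per_device: float) -> int:
    """How many devices it takes to reach a target IOPS (ignores RAID penalties)."""
    if iops_per_device <= 0:
        raise ValueError("iops_per_device must be positive")
    return math.ceil(target_iops / iops_per_device)


# Vendor-quoted mixed (75/25 r/w) IOPS for the 320GB V1 card.
FUSIONIO_V1_MIXED_IOPS = 119_000

# ASSUMPTION: rough random IOPS for one large, cheap SATA disk.
ASSUMED_SATA_DISK_IOPS = 80

if __name__ == "__main__":
    disks = devices_needed(FUSIONIO_V1_MIXED_IOPS, ASSUMED_SATA_DISK_IOPS)
    print("SATA disks needed to match one card on paper:", disks)
```

Under that assumption you'd need well over a thousand SATA disks to match one card's quoted mixed IOPS on paper...that's the "room full of drives" problem.  Pass your real quotes to iops_per_dollar() to compare the options on the metric that actually matters for an OLTP database.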


I'm testing with the 320GB FusionIO Version 1 card. Here are the specs:



ioDrive Capacity             320GB
NAND Type                    Multi Level Cell (MLC)
Read Bandwidth (64kB)        770 MB/s
Write Bandwidth (64kB)       790 MB/s
Read IOPS (512 Byte)         140,000
Write IOPS (512 Byte)        135,000
Mixed IOPS (75/25 r/w)       119,000
Access Latency (512 Byte)    26 µs


In contrast, here are the specs for Version 2 of this card (SLC and MLC models):

ioDrive2 Capacity         400GB       365GB
NAND Type                 SLC         MLC
Read Bandwidth (1 MB)     1.4 GB/s    710 MB/s
Write Bandwidth (1 MB)    1.3 GB/s    560 MB/s
Read IOPS (512 B)         351,000     84,000
Write IOPS (512 B)        511,000     502,000
Read Access Latency       47 µs       68 µs
Write Access Latency      15 µs       15 µs


You might notice the read throughput and read IOPS on the V2 MLC card dropped from the V1.  This is because they changed the design of the card to be more dense, moving from 30nm to ~22nm.  MLC is noisy, so to ensure error-free data, they've implemented ECC, which serializes reads.  Writes are still parallel.  Write latency has been lowered with the new design from 29 µs on the V1 to 15 µs on the V2...but read latency is much higher (though still insanely fast) at 68 µs.
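To put numbers on that drop, here's a quick Python sketch that computes the percent change from the V1 card to the V2 MLC card, using only the vendor figures from the two spec tables above.  The dictionary keys and function name are just mine.  Keep in mind the bandwidth numbers were quoted at different block sizes (64kB on V1, 1 MB on V2), so that comparison is rough at best.

```python
# Vendor-quoted figures copied from the spec tables above (V1 320GB vs V2 365GB MLC).
V1_MLC = {"read_iops": 140_000, "write_iops": 135_000, "read_mb_s": 770, "write_mb_s": 790}
V2_MLC = {"read_iops": 84_000, "write_iops": 502_000, "read_mb_s": 710, "write_mb_s": 560}


def pct_change(old: float, new: float) -> float:
    """Percent change going from old to new; negative means it got worse."""
    return (new - old) / old * 100.0


# Percent change for every metric the two tables have in common.
changes = {metric: pct_change(V1_MLC[metric], V2_MLC[metric]) for metric in V1_MLC}

if __name__ == "__main__":
    for metric, change in changes.items():
        print(f"{metric:12s} {change:+8.1f}%")
```

Read IOPS comes out 40% lower on the V2 MLC card, while write IOPS is several times higher...which lines up with the explanation that reads got serialized and writes stayed parallel.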


FIO expects the read speeds to get closer to their write speeds through driver updates in the near future.  This is really cutting edge tech at this chip nm density, and it's only now starting to be available for non-Facebook customers of FusionIO.  Facebook is a huge customer, and they get the good stuff first...


Anyway, since the V2 card isn't performing as well as it eventually will, and since the V1 card is faster, I opted for testing the V1 card.


I look forward to these tests...they should be interesting.  I'll let you know what I find out.


Flash with Databases - Part 1
Flash with Databases - Part 2
Flash with Databases - Part 3
Flash with Databases Inexpensively 
