Monday, July 1, 2013

Oracle RAC on vBlock

My recent project migrating many large, very active databases from single instance AIX to RAC running Redhat 6.2 had a lot of challenges that changed the design as time went on.  Originally the plan was to deploy (according to VMWare's best practices) using vmdk's on datastores, but the overall storage requirements exceeded 60TB, so this was no longer an option and we were forced (to my delight) to use raw devices instead.  All of these databases logically migrated to multiple VCE vBlocks (http://www.vce.com/products/vblock/overview).

Per SAP's ASM best practices (Variant 1), we placed the storage in 3 diskgroups: DATA, RECO and ARCH.


Oracle ASM Disk Group Name Stores
+DATA     - All data files
                  - All temp files
                  - Control file (first copy)
                  - Online redo logs (first copy)

+ARCH     - Control file (second cop

                  - Archived redo logs


+RECO      - Control file (third copy)
                   - Online redo logs (second copy)

Per Oracle's best practices, all the storage in a diskgroup should be the same size and performance...and the SAP layout suggests different IO requirements of these pools, so we went with a combination of SSD's and fast 15k SAS spindles in the DATA diskgroup (FAST on), many smaller 15k SAS spindles in REDO and slower 7200rpm 2TB NL-SAS spindles in +ARCH...after all, its ok if the background processes take longer to archive your logs.  Redo will remain active a little longer, but as long as its cleared long before we wrap around all the redo groups, its sufficient, doesn't affect performance and its much less expensive per GB.  We also created VMWare datastores for the OS out of the arch pool, since it, too, has low iops requirements.

There are some issues with this design but overall its performing extremely well.  The SAP database  is serving about 4 million db calls per minute, generating 1TB of archivelogs/day.  For a mixed load or DSS database, that archivelog generation wouldn't be a big deal...but for a pure OLTP db that's pretty respectable.  The DB Cache is undersized at 512GB...more than the old system had, which has really helped take the load off the storage and reduced our IOPS requirements.  The "DB Time" tracked by SAP is showing over a 2X performance boost.

For the larger non-SAP databases, their performance increase has been much more dramatic.  SAP ties your hands a bit, to make things consistent between all their customers their implementation is very specific...you have to be SAP Migration Certified to move a database to a new platform.  Michael Wang (from Oracle's SAP Migration group), who also teaches some Exadata administration classes, is an excellent resource for SAP migrations, and he's great to work with.   Many features that have been common in Oracle for years aren't supported.  For the non-SAP databases, we're free to take advantage of all the performance features Oracle has...and there are many.  We compressed tables with advanced compression, compressed indexes, tweaked stats and caches, moved to merged incremental backups on a different set of spindles than our data, create profiles suggested during RAT testing...basically everything we could think of.  For some databases, we implemented result cache...for others we found (in RAT testing) that it wasn't beneficial overall...it depends on your workload.  Some of our biggest performance gains (in some cases, 1000X+) didn't come from the new hardware, new software or the new design...but came from the migration itself.  For years, database upgrades were done in place, and since performance was tracked relative to "what it usually is" rather than what it should be...lots of problems, such as chained rows, were hidden.  After we did a logical migration, these problems were fixed and performance reached its potential.  I got lots of emails that went something like, "Wow, this is fast!!"

Its extremely good, but not perfect.  There's still an issue left due to going to multiple VNX's instead of a single vMax.  I'll talk about that one later.