Wednesday, December 8, 2010

Virtualized Storage (USP-V) for Oracle Databases-3. backup/clone

Now that the performance tests are out of the way (see previous post), we can take a look at the virtualization features found in the USP-V as they relate to databases. Incidentally, I'm told the USP-V and the new VSP are virtually (pun) identical in their usage. There are a few differences: page-level auto-tiering is only found in the upcoming microcode update of the VSP, and the VSP uses more energy-efficient, denser 2.5" drives.

Two features we need from the USP-V are faster backups and faster database refreshes.

Today, clones are accomplished through an RMAN duplicate procedure I put together. It's still limited by the 10GB network (which I used to think was a lot). The requirement is to refresh Prod to Test: take 22 databases (about 30TB) and refresh them all, keeping them in sync with each other because they have interdependencies. Although you can do an RMAN duplicate to a point in time, there's no RMAN ACTIVE duplicate to a point in time...and restoring (or RMAN duplicating) 22 databases at once would be a huge strain on backup resources. What I came up with is an RMAN active duplicate to 22 standby databases that are each using flashback database. After the RMAN "active duplicate for standby", eventually all 22 standby databases are current and shipping logs. You stop them, flash them back to the same moment in time, and start them up. Voilà...RMAN active duplicate of an entire prod environment to a point in time. :) I should blog about the details around that someday....
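To make the "same moment in time" step concrete, here's a minimal sketch of how you might pick that moment. It isn't the actual procedure I use...the database names and timestamps are made up, and it assumes you've already collected, for each standby, the oldest time flashback can reach and the time it has applied logs up to.

```python
from datetime import datetime

def common_flashback_target(standbys):
    """Pick one timestamp that every standby can be flashed back to.

    standbys maps a database name to (oldest_flashback_time, last_applied_time).
    """
    # Latest of the "oldest reachable" times: nobody can go back further than this.
    earliest_allowed = max(oldest for oldest, _ in standbys.values())
    # Earliest of the "applied up to" times: some standby has nothing beyond this.
    latest_allowed = min(applied for _, applied in standbys.values())
    if earliest_allowed > latest_allowed:
        raise ValueError("no single point in time is reachable by all standbys")
    return latest_allowed

# Hypothetical values for three of the standbys.
standbys = {
    "db01": (datetime(2010, 12, 7, 20, 0), datetime(2010, 12, 8, 2, 15)),
    "db02": (datetime(2010, 12, 7, 22, 30), datetime(2010, 12, 8, 2, 5)),
    "db03": (datetime(2010, 12, 7, 21, 0), datetime(2010, 12, 8, 2, 10)),
}
target = common_flashback_target(standbys)
print(target)
```

The sketch returns the most recent moment all of them share. Any time between the two bounds would work just as well, as long as every database is flashed back to the same one.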

These databases are growing quickly (growing by factors over the next few years)...we won't be able to get this done inside our maintenance window next year. Storage virtualization to the rescue....

I ran ~30 tests, so I'll just give you the highlights and lessons learned here. There's a feature in NetBackup that will interface between the storage array and RMAN. When RMAN runs a "backup proxy" command, NetBackup passes the details to the storage array, and the storage array does the work. The net effect of this feature is that multi-petabyte databases can have a backup taken in a few seconds by creating a set of pointers to the original storage and tracking deltas after that. An Oracle DBA, who doesn't have access to the Hitachi command interface, can initiate that backup from the familiar settings of Oracle's RMAN. When it comes to restores, the process is basically reversed and your MTTR is reduced to the time it takes to apply archivelogs since your last backup. There's also a feature in the USP-V that provides the storage-level equivalent of the block change tracking feature DBAs commonly use in Oracle...only in the USP-V, the smallest unit tracked is a 42MB page instead of an 8k db block. Since the deltas are tracked after the first backup, only a small percentage of your data (the data that's changed since the last backup) needs to be backed up. The 3rd option is to keep 2 lun sets in sync.
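The difference in tracking granularity is easier to see with numbers. This is only a toy model, not how the array or Oracle actually implement it: it assumes a change anywhere inside a tracking unit marks the whole unit for the next incremental. The 42MB and 8k sizes are from above, and the changed offsets are invented.

```python
PAGE_BYTES = 42 * 1024 * 1024   # USP-V tracking unit mentioned in the text
BLOCK_BYTES = 8 * 1024          # Oracle db block size mentioned in the text

def bytes_to_copy(changed_offsets, unit_bytes):
    """Bytes an incremental must copy if changes are tracked per unit."""
    # Every changed offset dirties the whole unit that contains it.
    dirty_units = {offset // unit_bytes for offset in changed_offsets}
    return len(dirty_units) * unit_bytes

# Hypothetical workload: three single-block changes, two of them adjacent.
changed = [0, 8 * 1024, 100 * 1024 * 1024]
by_block = bytes_to_copy(changed, BLOCK_BYTES)
by_page = bytes_to_copy(changed, PAGE_BYTES)
print(by_block, by_page)
```

In this toy case, block-level tracking copies three 8k blocks while page-level tracking copies two full 42MB pages. Scattered small writes cost more at the storage layer, but it's still a small fraction of a multi-TB database.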

As multiple databases at this client's site are growing to multiple petabytes, this feature holds great promise. I wanted to compare the current tape backups with two alternatives...Oracle's recommended backup strategy, which they call merged incremental backups, and Hitachi's methods described above.

The database I tested with is around 12TB. It's backed up to 4 LTO-4 drives. I established a baseline by reviewing backup times and performing a baseline restore/recovery. This is a very busy OLTP database, and the backup times for it vary widely around 14hrs...I suspect that at times the drives are attempting to write faster than the database can feed them. Eventually the cache on the tape drives runs dry and they have to stop, rewind and begin writing again where they left off. It's ironic that backing up a busy database that's barely able to keep its head above water is sometimes faster with slower tape drives than it is with fast ones. Anyway, the restore baseline finished in 8.5 hrs, and the recovery took 33.25 hrs. The full backup was taken about a week prior to my restore point, so many, many GB of archivelogs needed to be applied. After speaking with the application manager, I learned that 41.75 hrs of downtime would cost the business more than the purchase price of this feature, so that alone could justify its purchase, all other things being equal.

The first lesson learned came when I tried to add 16, 1TB luns to my ASM diskgroup. Although the published limitation is 2TB/lun for ASM, the error, ORA-15099, reported that I was adding a lun larger than ASM could handle. Doing a:

./kfod di=all

...from the grid home, I was able to see that Oracle was reporting the luns to be 4TB in size, not 1TB. I verified the correct size with AIX's bootinfo, then I created an SR. Oracle identified it as bug 10072750 and they're creating a patch for it. Hopefully it'll be ready before you encounter this issue.

The work-around is to specify the disk sizes (less than 1TB) when you create the diskgroup and add the disks. Now I have 16, 1023GB disks in 2 diskgroups, data and fra.

Due to a prior issue, there were complications that prevented the Backup team from applying the NetBackup feature that allows proxy copies to the media server. It would have been easier for me, but for my purposes, I just need to be able to back up a database using virtualization features. So with a little coordination with the storage team, we were able to manually interface with the storage array and get this to work. Essentially the procedure is:

1. Establish a consistency group (this will mirror 2 sets of luns)
2. Place the database in backup mode
3. Take a snap (this takes a few seconds)
4. Take the database out of backup mode
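Those four steps can be sketched as a small wrapper. Treat this as a dry-run illustration only: the two storage command names are hypothetical placeholders (the storage team ran the real array commands for us), and run_sql/run_storage are stand-ins for whatever actually executes the work. The BEGIN BACKUP / END BACKUP statements are the standard Oracle ones.

```python
from contextlib import contextmanager

@contextmanager
def backup_mode(run_sql):
    """Hold the database in backup mode only for the duration of the block."""
    run_sql("ALTER DATABASE BEGIN BACKUP")
    try:
        yield
    finally:
        # Always leave backup mode, even if the snap fails.
        run_sql("ALTER DATABASE END BACKUP")

def snap_backup(run_sql, run_storage):
    run_storage("establish-consistency-group")  # step 1 (hypothetical placeholder)
    with backup_mode(run_sql):                   # steps 2 and 4
        run_storage("take-snap")                 # step 3 (hypothetical placeholder)

# Dry run: record the calls instead of touching a database or an array.
log = []
snap_backup(lambda s: log.append(("sql", s)),
            lambda c: log.append(("storage", c)))
for entry in log:
    print(entry)
```

Wrapping steps 2 and 4 around the snap keeps the window in backup mode as short as possible, which matters when backup mode is inflating archivelog generation.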

Putting the db in backup mode would increase this database's archivelog generation...currently reaching 60GB/hr. Since we manually did this we were admittedly a bit clumsier and less efficient than we could have been. I'm told by HDS experts this process, when done in RMAN, places the database in backup mode for no more than a few seconds no matter how big your database is.

Since we were manually doing this we had some options. The USP-V can do "Shadow Images" (which is a storage-layer mirror), or snaps (which are copy on write, tracking deltas). We also needed to do a cloning test, so we went the Shadow Image route. To have a point-in-time capability, we would have to do snaps instead.

For the restore, we had the mirror lun set presented to our 2nd LPAR on the frame. The storage team split the consistency set in a few seconds and they were done. Once I set up the init.ora/oratab settings, the database was restored, and since it's on a different LPAR, I'm calling it a refresh/clone too. The MTTR is now zero, because there's no recovery to speak of (the database was in a current, crash-consistent state).

Lastly, I did a test of the RMAN merged incremental backup. The basic procedure of a MIB is:
1. You turn on block change tracking
2. Do an image backup of your database (probably to cheap sata storage)
3. Schedule a "merge"...a recovery of the copies of the datafiles you just made.
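For reference, here's a minimal sketch of what one merge cycle can look like, written as a small helper that renders the RMAN command file. The tag name and the tracking file path are placeholders I made up. The RMAN statements themselves are the usual incrementally updated backup pattern, but check them against your version's documentation before scheduling anything.

```python
# Step 1, run once in SQL*Plus. The file path is a placeholder.
ENABLE_BCT_SQL = (
    "ALTER DATABASE ENABLE BLOCK CHANGE TRACKING "
    "USING FILE '/path/to/bct.f';"
)

def merged_incremental_script(tag="incr_merge"):
    """Render an RMAN command file for one merge cycle (steps 2 and 3)."""
    return "\n".join([
        "RUN {",
        # Roll the previous incremental into the image copies.
        f"  RECOVER COPY OF DATABASE WITH TAG '{tag}';",
        # Take the next incremental, to be merged on the following run.
        f"  BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG '{tag}' DATABASE;",
        "}",
    ])

print(ENABLE_BCT_SQL)
print(merged_incremental_script())
```

On the very first run there is no copy to recover yet, so the BACKUP command creates the image copies. Later runs only move the changed blocks.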

With block change tracking...this is a very inexpensive process...normally taking a few minutes on TB-sized, busy databases (depending on activity patterns, of course). In the case of a corrupted datafile, you just have to "switch" (an RMAN command) to the copy, and RMAN will recover that file to be in sync with the rest of the database and start using it. While that's happening, you can restore and recover the original, then switch back. In the event that the entire database is lost, you can "switch database to copy" and the same process takes place.

I've always been a fan of this backup method, and if it weren't for Hitachi's features, this would still be the way to go. The restore time was eliminated...but they don't say Mean Time To Restore...it's Mean Time To Recovery. With block change tracking turned on, the merge took only a few minutes. When it was time for the database switch, since I recommended the merges be scheduled hourly, only 1/2 hr (avg) of archivelogs would have to be applied...compared to the 7 days of archivelogs from the baseline, that's fast to apply. From simulated crash to open, the restore/recovery of the 12TB database took a little under 5 minutes. Like the USP-V features, this option for backup scales...but it also has the advantage of being free. :)
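The arithmetic behind those numbers, as a quick sketch. It assumes a failure is equally likely at any point between merges, which is where the 1/2 hr average comes from. The baseline figures are the ones measured earlier in this post.

```python
def expected_log_window_hours(merge_interval_hours):
    """Average age of the last merge if a failure can happen at any time."""
    return merge_interval_hours / 2

baseline_restore_hrs = 8.5      # tape restore from the baseline test
baseline_recovery_hrs = 33.25   # applying about a week of archivelogs
baseline_total = baseline_restore_hrs + baseline_recovery_hrs

baseline_log_window_hrs = 7 * 24            # about 7 days of archivelogs
hourly_window = expected_log_window_hours(1.0)
window_ratio = baseline_log_window_hrs / hourly_window

print(baseline_total, hourly_window, window_ratio)
```

The ratio only compares how much log time has to be applied. It says nothing about the restore step, which the merged incremental approach skips entirely.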

The USP-V was able to complete a restore and a clone instantly...this solution is scalable up to many, many...it's up to the task of keeping us in the maintenance window indefinitely, and it can do it with the simple RMAN interface a DBA likes to use, without giving them storage array access. A close 2nd option is a merged incremental backup strategy...but this wouldn't help us with clones/refreshes.

During step 1, the mirror set is in read-only mode. This gave me an idea on how to cost avoid some licensing for the Golden Gate implementation...I'll have to test that out a different day.

In this series:
HDS USP-V Overview for Oracle DB
HDS USP-V Performance for Oracle DB
HDS USP-V Features for Oracle DB
HDS USP-V Thin Provisioning for Oracle DB
