Thursday, December 9, 2010

Virtualized Storage (USP-V) for Oracle Databases-4. thin provisioning

Let's say you need to build a database that's expected to be 7 TB within 2 years. When you create the database server, you tell the storage team or administrator you need 20 luns of 500GB each, for example (10TB). You know you need less than that, but it's better to be safe than to come back for more space later. In a nutshell, thin provisioning is when you're presented with the luns you asked for, 10TB in our example...but physically, storage is only allocated as it's needed. If you only use 2TB the first year...although your server thinks it has 10TB...you're physically only allocating 2TB, leaving 8TB for allocation by other servers.
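To make the accounting concrete, here's a minimal sketch of the example above as a toy model. The `ThinPool` class and its method names are made up for illustration; this isn't how the USP-V tracks anything internally, it just shows the difference between what the server sees and what's physically consumed.

```python
from dataclasses import dataclass, field


@dataclass
class ThinPool:
    """Toy model of a thin-provisioned storage pool (all sizes in TB)."""
    physical_tb: float
    presented: dict = field(default_factory=dict)  # server -> size the server sees
    allocated: dict = field(default_factory=dict)  # server -> space physically consumed

    def present(self, server, size_tb):
        # Presenting luns costs nothing physically; the server just sees the size.
        self.presented[server] = size_tb
        self.allocated.setdefault(server, 0.0)

    def write(self, server, size_tb):
        # Physical space is consumed only when data is actually written.
        if self.allocated[server] + size_tb > self.presented[server]:
            raise ValueError("server wrote past the size it was presented")
        if self.free_tb() < size_tb:
            raise RuntimeError("pool exhausted")
        self.allocated[server] += size_tb

    def free_tb(self):
        return self.physical_tb - sum(self.allocated.values())


pool = ThinPool(physical_tb=10.0)
pool.present("dbserver", 10.0)  # 20 luns x 500GB, as in the example
pool.write("dbserver", 2.0)     # the first year's data

# The server sees 10TB, 2TB is physically allocated, 8TB is left for others.
print(pool.presented["dbserver"], pool.allocated["dbserver"], pool.free_tb())
# prints: 10.0 2.0 8.0
```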

The concept of thin provisioning is motivated by the fact that every server in an enterprise has unused, but allocated space every time a lun is carved out. On local storage, this is just something you live with, but on expensive san storage, this adds up to a huge amount of space...an independent company recently did an audit and claimed upwards of 40% of our storage is allocated, but unused. This is millions of dollars of underutilized storage.

As a database guy, my instinct is to say there's no such thing as wasted storage...the empty space means I'm occupying more spindles, giving me the potential for more database speed...but the fact is that empty space is unmanaged. We always need free space for growth, but by the time you sum up empty space in blocks, tablespaces and storage...you're talking about a huge amount of space. It would be better to occupy that space with a managed ILM approach...sharing rarely accessed data with highly accessed data, or packing it in on storage with enough speed to handle it. I don't love the concept of sharing storage with other systems, but if there's enough cache that it's transparent to me...no harm, no foul.


There are a lot of details (as it relates to Oracle databases) around storage virtualization that haven't been documented, so I wanted to dive a little deeper into it...specifically around thin-provisioned ASM. In the previous post we looked at thin-provisioned overhead on performance...so that aside, how, and when, is space allocated at the storage layer? How do you become un-thin? What can we do to prevent this and when it happens, what do you do to correct this situation?

We know storage is allocated on the USP-V in 42MB pages as it's requested. Once it's allocated, it stays allocated...so if you create a 10GB file and then remove it...you're still allocating 10GB. From my tests on NTFS...an OS block is created, then deleted, and the next write goes to the next free space instead of reusing the space that was just freed. This is great for a hard drive because it spreads out the wear. Most SSDs do this internally anyway...but for thin provisioning this is terrible. NTFS with any activity won't remain thin. Oracle ASM is the opposite...its space allocation algorithm dictates that as storage is written (and allocated), then deleted (but still allocated on storage), then written again, you'll be writing to the exact space that was already allocated, in whatever increments you set for your AU size. The problem can still exist when you allocate a lot of space, then remove it (like for archivelogs)...in which case you're no longer thin.

The cure for this is the ASRU utility developed by Oracle and 3PAR. It's a Perl script that will shrink all the ASM disks to their minimum size, write (extremely efficiently) 0's to the remaining free space in the diskgroup, then return the ASM disks to their previous size. When this is complete, the storage administrator can click the Zero Page Reclaim button (why isn't this scriptable, HDS?!) and return all of that free space to the storage pool.

I had a meeting at Oracle OpenWorld with reps from NetApp about why this process is necessary...from the ASM instance we can recognize all the AU's that are occupied and their physical addresses...so we should then know all the space that's not being used. In addition, there's an algorithm for AU allocation in Oracle...not only do we know from the ASM instance what AU's are currently being used, from the algorithm we can determine where the next 100 AU's will be placed. The ASRU utility and the administration associated with it shouldn't be necessary. 
The first storage vendor to recognize that and implement it wins the database storage market...it'll save companies millions of dollars on new storage.
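The difference between the two allocation behaviors is easier to see with a small simulation. This is only a sketch under simplifying assumptions: the 42MB page size comes from the USP-V, but the `simulate` function, the 420MB file size and the five write/delete cycles are made up for illustration, and the "NTFS-like" and "ASM-like" labels just encode the behavior I observed, not either product's real allocator.

```python
PAGE_MB = 42  # USP-V page size


def pages_touched(writes):
    """Return the set of 42MB pages backing a list of (offset_mb, length_mb) writes."""
    pages = set()
    for offset_mb, length_mb in writes:
        first = offset_mb // PAGE_MB
        last = (offset_mb + length_mb - 1) // PAGE_MB
        pages.update(range(first, last + 1))
    return pages


def simulate(reuse_freed_space, cycles=5, file_mb=420):
    """Write and delete a file `cycles` times; return MB allocated on storage."""
    allocated = set()  # pages stay allocated even after the file is deleted
    next_free = 0
    for _ in range(cycles):
        # ASM-like: write back into the space that was freed.
        # NTFS-like: write into the next free space instead.
        offset = 0 if reuse_freed_space else next_free
        allocated |= pages_touched([(offset, file_mb)])
        next_free = offset + file_mb  # file deleted; the storage never hears about it
    return len(allocated) * PAGE_MB


print(simulate(reuse_freed_space=True))   # ASM-like:  prints 420
print(simulate(reuse_freed_space=False))  # NTFS-like: prints 2100
```

Same amount of live data at the end in both cases (none), but the second pattern leaves five times as many pages allocated on storage.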

That being said, it's all we have today, so we tested it. My test diskgroup had 350GB...I had 100% allocated on storage but only 20% utilized by the database. The ASRU utility finished in 24 minutes, ZPR took 67 minutes and reclaimed 80% of the storage. Locating the ASRU utility was painful...all the links to it that I found on the internet were dead. I opened an SR and was given a link to an internal Oracle website (which obviously didn't work for me.) Eventually Oracle Support was able to supply it to me. If you need it, let me know.

We created a 10GB datafile to see how much space was allocated. Physically on storage, there are occasional very small writes, even in empty non-sparse tablespaces. The USP-V allocates at a minimum 42MB...so in the end our 10GB empty tablespace physically allocated 7770MB. Lesson learned here...create smaller datafiles with autoextend on.
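The page math behind that number is worth a quick look. This is just arithmetic on the figures above (42MB pages, a 10GB datafile, 7770MB allocated); the 1GB initial datafile size in the last step is a hypothetical value I picked to show why a smaller file with autoextend caps the damage.

```python
import math

PAGE_MB = 42              # USP-V page size
datafile_mb = 10 * 1024   # the 10GB empty datafile
allocated_mb = 7770       # what the storage reported as allocated

total_pages = math.ceil(datafile_mb / PAGE_MB)  # pages the datafile spans
touched_pages = allocated_mb // PAGE_MB         # pages that received at least one write
assert allocated_mb % PAGE_MB == 0              # 7770MB is a whole number of pages

print(total_pages, touched_pages, round(100 * touched_pages / total_pages))
# prints: 244 185 76

# Hypothetical: a 1GB initial datafile can never touch more than its own pages,
# so the worst case is bounded no matter how the small writes are scattered.
small_file_mb = 1024
worst_case_mb = math.ceil(small_file_mb / PAGE_MB) * PAGE_MB
print(worst_case_mb)
# prints: 1050
```

So a handful of tiny writes scattered across an "empty" file was enough to pin roughly three quarters of its pages.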

Most people recognize that a rebalance happens when you add a lun to an ASM diskgroup...to me the exact process of how it moves the AU's to the new lun is unclear...it isn't a direct read and copy...there's a lot more activity than that. My test diskgroup was about 50% full of data and I added an equal amount of space to it. What you end up with is a diskgroup that's now 25% full (according to ASM), but 100% allocated at the storage level. You can correct this with ASRU, but that's a manual process (due to the GUI-only management interface HDS provides.) The workaround is to grow the luns in the USP-V and then resize the ASM disks to their new larger size. This process is officially supported by Hitachi on NTFS only. We tried it with AIX, but after the luns grew (from the USP-V's perspective), the lun sizes never changed at the OS level, so there was nothing ASM could do. Our P5 is using VIO servers, which act as a middle man between the luns presented by the USP-V and our LPAR...we suspect that might have been our issue, but due to time constraints we were unable to verify this. This is a problem for thin provisioning on AIX...it makes thin provisioning very difficult for databases using ASM...at least, keeping them thin after a rebalance. We wanted to test this on SUSE Linux 11 running on a Cisco UCS, but we discovered SUSE isn't bootable from the USP-V...although a patch is expected within the month.
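Here's the rebalance result as plain arithmetic. The `after_rebalance` function and the 100GB sizes are hypothetical (picked for round numbers), and the "everything ends up allocated" line simply encodes what I observed in this one test...it isn't a model of what the rebalance does internally.

```python
def after_rebalance(original_gb, used_fraction, added_gb):
    """Return (fraction full per ASM, GB allocated on storage but not used)."""
    used_gb = original_gb * used_fraction
    total_gb = original_gb + added_gb
    asm_full = used_gb / total_gb
    # Observed in the test: after the rebalance every page was allocated.
    storage_allocated_gb = total_gb
    reclaimable_gb = storage_allocated_gb - used_gb
    return asm_full, reclaimable_gb


# A diskgroup that was 50% full, with an equal amount of space added.
print(after_rebalance(original_gb=100, used_fraction=0.5, added_gb=100))
# prints: (0.25, 150.0)
```

In other words, three quarters of what's allocated on storage after the rebalance is space ASM considers free.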

I remember the first time I explained the concept of thin provisioning to the database tech lead at a client site. Her reaction was a mix of confusion and horror...followed by, "What happens if we try to allocate space and it's not really there?" ...that was our first test.

What happens when you call for space that the OS thinks is available, but it isn't due to thin-provisioning? I set up an 11.2.0.2 GI ASM diskgroup using a 10GB lun in a 5GB storage pool. I started adding data until eventually I was trying to allocate storage that didn't exist.

1. I was told by the storage team that at least 4 emails were sent to them to warn them of the situation.
2. The database froze, and I called the storage team and had them add some space to my storage pool...which they did, and the USP-V seemed content without an issue...unfortunately, the db didn't get the memo that the crisis had passed...it was still frozen.
3. I logged in to unix and I could do things like cd and ls...but anything else would give me "IO ERROR". I was working in an OS mount that wasn't in the same storage pool I had tested on...so this surprised me. Eventually the db gave the error "ORA-01114: IO error writing block to file 4..."
4. I decided to bounce the database...but I couldn't log in and create a new session to issue a shutdown command.
5. I killed PMON at the OS level...and it died...but all the other processes were still up!
6. We then bounced the LPAR...first a "soft" boot (which didn't work), then a "hard" boot. When the server came back...several luns were missing...and we later determined, completely corrupted.

Luckily I had scheduled this test right after the "merged incremental backup" test in the previous post...because I was forced to do a complete restore/recovery of the database. After more analysis, I was told the SCSI driver on the frame had locked up, which affected all the storage on our testing frame. Lesson learned...never, ever let your storage pool run out of storage.
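Since the storage team's warning emails weren't enough to stop this, a headroom check on the database side is cheap insurance. Below is a minimal sketch, assuming you can get the pool size, allocated space and presented space from your storage team somehow...the `pool_status` function and the 70%/85% thresholds are made up for illustration, not anything HDS ships or recommends.

```python
def pool_status(pool_gb, allocated_gb, presented_gb, warn=0.70, critical=0.85):
    """Classify a thin pool by how full it is physically.

    Returns (level, fraction of the pool allocated, oversubscription ratio).
    """
    used = allocated_gb / pool_gb
    oversubscription = presented_gb / pool_gb
    if used >= critical:
        level = "CRITICAL"
    elif used >= warn:
        level = "WARNING"
    else:
        level = "OK"
    return level, round(used, 2), round(oversubscription, 2)


# The test setup: a 10GB lun presented out of a 5GB pool, filling up over time.
for allocated_gb in (2.0, 4.0, 4.5):
    print(pool_status(pool_gb=5.0, allocated_gb=allocated_gb, presented_gb=10.0))
# prints:
# ('OK', 0.4, 2.0)
# ('WARNING', 0.8, 2.0)
# ('CRITICAL', 0.9, 2.0)
```

The oversubscription ratio is the number to watch...anything over 1.0 means the pool can run dry while every server still thinks it has free space.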

Conclusion: Thin provisioning is in its infancy...only recently has the Linux Foundation implemented SCSI trim extensions to allow the OS to notify the storage server of deletes...to my knowledge no distros have implemented this yet, although I discussed adding it to Oracle's new kernel in a meeting with Oracle VP Wim Coekaerts. Potentially ASM and databases could work very well with it, but there's a trade-off between the additional maintenance to keep things thin and the storage savings. If your database isn't growing, this might be an option for you...keeping in mind the overhead seen in the performance tests in the previous posts. It's possible that a different OS, or even AIX not using a VIO server, would have been more successful in these tests. This isn't the USP-V's fault since it works with NTFS...but our big databases are on *nix...so for practical purposes, it didn't work. The Storage Guy has some interesting points on this topic...read about them here.

In this series:
HDS USP-V Overview for Oracle DB
HDS USP-V Performance for Oracle DB
HDS USP-V Features for Oracle DB
HDS USP-V Thin Provisioning for Oracle DB
