Tuesday, October 18, 2011

Index Compression Part 1 (dispelling myths)

I have a client that's having difficulty with a rapidly growing database that has a fixed amount of storage available in Exadata.  Oracle has announced the ability to buy storage cells separately...which is great, but they aren't free, so its a matter of good stewardship to be as efficient with space consumption as possible.  In an effort to reduce space consumption on this Exadata platform, I'm looking at different options that may help us.  We're doing HCC/C4OLTP everywhere possible, but there's still a little more that we can do with index compression.  When complete, it'll make your storage utilization shrink more than bernie madoff's bank account and it could make your database run faster than a guy married to kim kardashian.

Index compression has been around since 9i...years and years, but there are still a lot of misconceptions around it.  I met some consultants from Oracle that are very knowledgeable about RAC, Exadata...you name it...but for whatever reason they were confused about index compression.  Each of them reinforced the others incorrect ideas about it until they were all in a fortified group think on the topic.  They thought index compression was like OLTP/HCC compression, where compression logic is applied.  They argued that indexes are for quick lookups...why would you want the delay of decompression applied to them and all the additional CPU usage?  First of all, index compression IS NOT like binary compression...like bzip2 or zlib compression.  Its misleading to call it "index compression"...it should be called "index de-duplication."  There is almost no CPU overhead in index decompression, because there is no decompression...its not compressed...its de-duplicated.

The other misconception I heard from those guys is that, a unique index wouldn't benefit from index compression because every row is unique, which means rows aren't duplicated, so there's no way to reduce the storage...you'll use more CPU for no gain.

If you have a single column (ie: a sequence number) as your PK, that's true.  Often, unique indexes and primary keys are concatenated...so you have multiple columns that together, make something unique.  For example, for a cars table, you have a PK for make, model, year, trim.  3 of the 4 columns can be compressed.

MAKE         TRIM           MODEL          YEAR
FORD          LIMITED     EXPLORER    1990
FORD          LIMITED     EXPLORER    2005
FORD          LIMITED     TEMPO          1991
FORD          SPORT         FUSION         2010

You can see all the first column is repeated, much of the 2nd column could be de-duplicated and much of the 3rd column could be de-duplicated.  Together, all 4 columns are unique, but it could still benefit from index compression.

This has been gone over "on the internets", so I won't rehash (very much).  Richard Foote has a nice post about it HERE.

(again, from Richard Foote's post)
Let’s say we currently have a nocompress Non-Unique index on a NAME column with the following entries:
0: Lady Gaga
  : ROWID
1:
Lady Gaga
  : ROWID
2:
Lady Gaga
  : ROWID
3:
Lady Gaga
  : ROWID
4:
Lady Gaga
  : ROWID
5: David Jones
  : ROWID
6: David Jones
  : ROWID
7: David Jones
  : ROWID
After the index is compressed, the leaf block entries would look logically something like this:
Prefix
0:
Lady Gaga
1: David Jones

 
0:0
  : ROWID
1: 0
  : ROWID
2: 0
  : ROWID
3: 0
  : ROWID
4: 0
  : ROWID
5: 1
  : ROWID
6: 1
  : ROWID
7: 1
  : ROWID

Importantly, each distinct combination of compressed column values is now stored just the once within an individual index leaf block. In the example above, we previously stored “David Bowie” 5 times. In the compressed index leaf block, we now only store “David Bowie” once, with the prefix value now being referenced by each index entry instead.
The net result being we can now (hopefully) store many more index entries per leaf block meaning we don’t need as many leaf blocks in our index.


...and so, we need less blocks to store the index.  Since there's nothing to decompress on read, there's not a huge amount of additional CPU usage to use the index to lookup the rowid.  

For reads, there may be an extremely small amount of additional CPU to use the prefix, but from my testing that's more than made up for in a reduction in IO.  For writes/updates, there may be more CPU used...index compression isn't always the thing to do, it depends on your access patterns.  You must test.  From my experience on Exadata, the CPU loads on databases are less than what they are on non-Exadata platforms due to cell offloading.  This means you may have more free CPU than you anticipated.  Since storage is expensive and you get a reduction in IO from compressed indexes (they're smaller...a full index scan and an index range scan will likely have fewer blocks to read in, because there are fewer blocks in the index, so less IO) the trade-off between a slight CPU increase and less IO is a positive one, or at least a wash.
 

As you can see from this OLTP workload simulating Swingbench test, there's almost no difference in performance between compressed and uncompressed indexes on the overall system...possibly a performance increase, but so small its in the noise...definitely not noticeable on this workload.  These tests were run on a database in VMWare using dNFS to NetApp, with very low memory settings (~256MB db_cache) to emphasize physical IO performance.


Before Compressing the indexes
After compressing indexes
Soo...how do you know which indexes to compress, and how many columns are redundant?  I'll save that for the next post.

Compressed Indexes (Part 1)
Compressed Indexes (Part 2)
Compressed Indexes (Part 3)


No comments:

Post a Comment