Monday, August 15, 2011

Partitoning while migrating to Exadata (ILM and Compression)-P1

In our frenzy to move data from around 22 databases into a single database in Exadata, at the same time we're doing some modifications to the larger tables- hundreds of them are 2GB+ in size.  As you've read in the previous posts, there's concerns that the growth rate of these tables may exceed the storage capacity before the original 3 yr projections.  To postpone this, we're trying to partition these tables for ILM (information lifecycle management) and performance during the migration.

The idea is that usually, the newest data is the most active and the older data is kept for archival purposes.  The new data is usually manipulated with DML and the old data is usually used only for queries.

With Exadata's 8 table compression options, we have a lot of ways to shrink the data and make it fit in our storage capacity as long as possible.  These all have trade-offs between compression density and CPU overhead.  HCC compress for archive high is extremely good at compressing data...but your access is very slow and it uses lots of CPU...compress for OLTP is relatively bad at compression, but its extremely fast.  Since the CPU for compression/decompression is often offloaded to the cores on the storage cells...the only real concern is how well does it compress, and how will it affect query and DML performance?

Experimentation and testing is in progress now, but our initial results show compress for OLTP actually improves performance on Exadata with our workload!  C4OLTP usually gives between 2X and 4X compression.

My thought is, since DML on an HCC compressed table without direct IO moves that row into a C4OLTP block, and since there's no way to delineate the "old, Read-Only" data from the "new, often-modified" data in the tables, we compress some of the largest multi-TB tables with "HCC compress for query-high".  The idea is that we take an initial performance hit, but after a while, internally all the old stable data will be stored with HCC QH, and the new volatile data will be C4OLTP.  Its the best of both worlds...the volatile data will be faster than no compression due to C4OLTP and the older data will be extremely densely compressed.  This should work because our access patterns show we rarely do bulk insert/update/deletes.

Eventually, (I'm thinking annually) we'll have to move the partitions to re-compress the data to HCC QH again and let the activity sort out the data between the 2 compression methods.  As testing continues, I'll let you know how its going.


2 comments:

  1. Please keep us posted on how this HCC->evolves to OLTP compression strategy works out. Thanks.

    ReplyDelete
  2. I definitely will. When I test things like this, I like to isolate the environment, run the exact same scripts on each test object, with and without compression, with and without sorting on key columns....

    In this situation, the testers are an application testing team. There are a lot of layers between the tables and the user's keyboard that will introduce variables (network, application servers, etc), so I'm concerned that the improvement or penalty of the design won't be quantifiable. I'm going to verify we have RAT licensing, then I'll have better numbers...otherwise I'll have to be a little more creative.

    ReplyDelete