Wednesday, October 19, 2011

Index Compression Part 3 (Compressing the indexes online)

My first post on index compression talked about what it is (and isn't).  My 2nd post talked about alter index...validate structure, and offered a very fast way to get the number of columns to compress and estimated compression information into a persistent table.  So now we know what we want to do, and which indexes we want to do it to...now what?

If you ran the procedure from Part 2, you now have a populated table with index owner, index name, the number of columns in the index you should compress (if any) and the expected compression ratio you'll achieve.  The options before you are to:

1. Rebuild the indexes that will benefit from compression, compressing by the number of columns that were indicated from the table populated in Part 2...doing this offline, causing DML locking on the table you're building the index on.

2. Same as above, only doing it online, where you don't block dml.

3. Forget it...leave them the way they are because you don't think the space saved is worth the trouble of compressing them.  This is a completely valid result of this analysis.  If you have abundant storage or if the compression yields were low, don't bother.

Assuming you don't go with 3, and you're in a live database (not your testing database where you're nearly the only user) I would go with #2 above.  Online index rebuilds are similar in some ways to online table redefs.  What happens is that, instead of referencing your table for data to build the index, Oracle references the original index as the source, as of an SCN.  No DDL operations are allowed.  If there's DML on the table, that DML is entered into a journal table.  When the first pass is complete, the journal table is accessed and the changes made during the online create are applied to the new index.  If changes are made during this pass, they're again added to the journal table.  Its possible that high DML activity will cause many passes, which is why Oracle recommends doing this during periods of low DML activity.  Eventually, you'll have a very short pass where all the changes are made to the new index.  At this point, the original index, the new index and the table are all in sync with each other.  The next thing to happen is a nearly instant metadata update that makes the new index the active index, and the metadata for the original index is gone (for all practical purposes).  This means DML is allowed on the table with the online index rebuild for nearly the entire time...there's just one extremely brief lock during the metadata switch-over.  In most environments, this can be done without an outage...only extremely sensitive applications would have a problem with the lock during the metadata update.  Again...test it, so when your management needs to be convinced, you'll have data to prove the outage time is insignificant.

Using the table from part 2 of this series, you can use something like this sql statement to dynamically generate your index rebuilds.  It'll build them parallel 24, and then set them back to whatever parallel setting they had before.  I'm filtering out indexes where compression will save less than 25MB...feel free to tweak it to fit your needs.

set timing off
set heading off
set echo off
set linesize 300
spool comp_idx_usmg.sql
select 'alter index '||owner||'.'||index_name||' rebuild parallel 24 compress '||cmpr_count
||' online; '||chr(10)||'alter index '||owner||'.'||index_name||' parallel (DEGREE '||
(select degree from dba_indexes d2 where d2.owner=d1.owner and d2.index_name=d1.index_name)||');'
from
(
select owner, index_name, cmpr_count,
(select /*+ PARALLEL 24 */ sum(bytes/1024/1024)*(ici.cmpr_pctsave/100)
from dba_segments where owner=ici.owner and segment_name=ici.index_name) saved
from idx_comp_info ici where cmpr_pctsave!=0
) d1 where saved>=25
order by owner, index_name;
spool off
@comp_idx_usmg.sql
exit

I just met with some reps from FusionIO...its clear that as ultra-fast storage options come up, little space-saving techniques like this are even more valuable.  NAND flash technology is becoming cheaper all the time...but it still comes at a premium, and putting indexes on FusionIO is a HUGE performance benefit to your database.  The smaller you can make your indexes, the more you can store on flash, and the faster the FusionIO card performs.  (Yes, as the SSD/Flash card fills, it slows.)  That's an idea for a whole new series, for another day.

I hope all this information helps you when you find that storage is at a premium on your database.  Index compression doesn't have the "Wow!" effect that HCC compression has, but with very little, if any, downside with modern processors, the technology is worth at least a consideration.

Just an addendum:

The stored proc doesn't work for local indexes...its a different syntax to analyze them, and the index partition info is stored in a different view (dba_ind_partitions).  This is unfortunate because that's likely where your biggest indexes (and thus, biggest compression gains) will come from.  Until I get around to adding that functionality, you can use this sql to generate a script to find the compression gains from those indexes:

select 'analyze index '||index_owner||'.'||index_name||' partition ('||partition_name||') validate structure;'||CHR(13)||
  'delete from idx_comp_info where owner='''||index_owner||''' and index_name='''||index_name||''';'||chr(13)||
  'insert into idx_comp_info select '''||index_owner||''','''||index_name||'.'||partition_name||''', opt_cmpr_count, opt_cmpr_pctsave, -1  from index_stats;'||chr(13)||
  'commit;'
  from dba_ind_partitions di where subpartition_count=0
  and exists
  (select 'X' from IDX_COMP_INFO_LOG2 cil
     where di.index_owner=cil.owner
       and di.index_name=cil.name)          
union all    
select 'analyze index '||index_owner||'.'||index_name||' subpartition ('||subpartition_name||') validate structure;'||CHR(13)||
  'delete from idx_comp_info where owner='''||index_owner||''' and index_name='''||index_name||''';'||chr(13)||
  'delete from idx_comp_info where owner='''||index_owner||''' and index_name='''||index_name||'.'||partition_name||''';'||chr(13)||
  'insert into idx_comp_info select '''||index_owner||''','''||index_name||'.'||partition_name||''', opt_cmpr_count, opt_cmpr_pctsave, -1  from index_stats;'||chr(13)||
  'commit;'
  from dba_ind_subpartitions di
  where exists
  (select 'X' from IDX_COMP_INFO_LOG2 cil
     where di.index_owner=cil.owner
       and di.index_name=cil.name)          
     order by 1,

Compressed Indexes (Part 1)
Compressed Indexes (Part 2)
Compressed Indexes (Part 3)


Index Compression Part 2 (Fast compression column count gathering)

In my previous post, I talked about what Oracle index compression is, and what it isn't.  It IS de-duplication, it isn't binary compression.  It is something that may slightly increase CPU, depending on your workload, it isn't something that's likely to hurt your performance...it might even help it.  It definitely saves space, and its worth considering.

Obviously, before you do anything like modifying indexes, that can affect performance, you need to test it in an environment where it won't matter if something negative or unexpected happens.  The analyze index command will tell you the estimated compression ratio and a suggestion on the number of columns to compress.  Unless you're on a very old DB version, don't use the analyze index...compute|estimate statistics...its deprecated, and isn't nearly as good as the more modern options available today.  What we're doing is analyze index...validate structure command, which isn't deprecated.

So...you have many databases with thousands of indexes.  How do you figure out which ones are candidates for compression?  Earlier this week I spoke with a consultant who suggested my client take all the indexes that are over 100MB, run an analyze statement on them in non-prod, and any that have over 25% compression ratio...consider those your candidates.  I know from looking at their storage requirements that many of their indexes are hundreds of GB, totalling 10's of TB.  If they gave even a very slight positive compression ratio, the space savings could be substantial...so what I did was run the analyze on all the indexes.  There are scripts all over "the internets" (that never gets old...thanks Mr Bush) to do this task one index at a time.  It takes a really long time...we're talking about thousands of indexes and TB's of index storage, multiplied by many databases.  From what I see in the syntax documentation,  the parallel clause for the analyze statement is only available for the analyze table statement, not the analyze index/VS statement...I could be wrong about that, but I couldn't get it to work..

This was going to take an eternity to run "analyze index...validate structure" for all indexes, serialized. In addition, I needed to do this on many databases.  After the AI...VS statement, the results are stored in a session-specific table, and they're erased when you analyze the next index...there's no persistent place where the results are stored after the analyze.  Sooooo...I wrote a stored proc to do it...each analyze command is serial, but I can run multiple statements at a time (1 per session)...I chose 20.  The results are then stored persistently in a table.

Something to be aware of...this procedure will do multiple "analyze index...validate storage" commands that will each cause a global, sharing DML lock, which will prevent dml on the table that has the index you're analyzing.  Keep that in mind when you're doing this...don't do this in prod, unless you have an application outage.  What I do is run my stored proc in a copy of the database (hopefully you have one for performance testing you can use, ideally a physical duplicate of the prod database.  If you don't have one do a restore/recover or rman duplicate to a new instance.)  For the analyze, it doesn't have to be performant storage...if you have an NFS server with storage, its easy to use that with dNFS for most databases.  The thing is...you need to test after you compress the indexes.  Really, you always need to test anything you do...so a physical copy with similar hardware to prod is preferred.

What you'll need:
  • (as sys) grant execute on sys.dbms_lock to [username];
  • grant dba to [username];
  • grant analyze any to [username];
  • You may need to modify the exclude list in Query1 to avoid analyzing indexes in your database that are out of project scope for whatever reason.
  • You may want to modify what v_Parallel is initialized to (currently 20) to match your system's CPU_Count (or something less)

What it does:

This stored procedure will create a list of all indexes you want to analyze and put the list in IDX_COMP_INFO.  It will then recursively call itself, loading itself into  the job_queue, v_Parallel (default 20) number of times, so that multiple indexes can be analyzed at once.  As one analyze finishes it puts the results (how many columns you should compress, and the expected compression ratio) in IDX_COMPE_INFO.  It'll load the next index and repeat until all the indexes have been analyzed.  If there's an error while analyzing an index (or for any other reason) it'll put the error in IDX_COMP_INFO_LOG.  You should expect errors like "no data found" in that table for indexes that are built on temp tables or empty tables...also indexes (or indexes with partitions) that are in an unusable state will give errors and put an entry into the log table.

IDX_COMP_INFO's last column will be -1 after this is executed for every row.  The idea is that you can populate that column later with the new size or amount of space savings you'll have...its just nice to keep that info in the same table, IMHO.  If there's an error for some reason while analyzing an index,  IDX_COMP_INFO's  CMPR_COUNT will be -1.  This makes it easy to write sql statements later against this table, because you can say "where cmpr_count > 0" and only look at the compression candidates.

Lastly, there's dynamic sql being used without bind variables, so its vulnerable to sql injection to an evil person that has an execute grant on it...so after you're finished running this procedure on your cloned database, to be ultra safe, I would drop the procedure. 

My next post will be about what to do with the information gained from the procedure below....


CREATE TABLE IDX_COMP_INFO
(
  OWNER         VARCHAR2(30 BYTE)               NOT NULL,
  INDEX_NAME    VARCHAR2(30 BYTE)          NOT NULL,
  CMPR_COUNT    NUMBER,
  CMPR_PCTSAVE  NUMBER,
  MB_SAVED      NUMBER
);


CREATE UNIQUE INDEX IDX_COMP_INFO_UNQ ON IDX_COMP_INFO
(OWNER, INDEX_NAME)
COMPRESS 1;

CREATE TABLE IDX_COMP_INFO_LOG
(
  OWNER  VARCHAR2(80 BYTE),
  NAME   VARCHAR2(255 BYTE),
  ERRM   VARCHAR2(2000 BYTE),
  TRACK  NUMBER
);

CREATE OR REPLACE PROCEDURE get_idx_comp_info
(
 Ind_Owner_in  varchar2 default 'X',
 Ind_Name_in   varchar2 default 'X'
)
authid current_user
AS
 
  type r_script is record (
    owner        varchar2(30),
    idx_name     varchar2(300));
   
  type t_Idx is table of r_script index by binary_integer;
  c_Idx t_Idx;

  v_Query1       varchar2(32000);
  v_Cmpr_Count   number;
  v_Cmpr_Pctsave number;
  v_Job          number;
  v_Count1       number;
  v_Count2       number;
  v_Owner        varchar2(30);
  v_Index_Name   varchar2(30);
  v_Errm         varchar2(255);
  v_Errtrack     number := 0;
  v_MB           number;
  v_MB_Saved     number;

-- *************************************
-- The line below may be adjusted for your platform.  I would make it something less than CPU_Count on your db.
-- *************************************

  v_Parallel number := 20;
 
BEGIN

/*
Author: Andy Black
For more info see: http://otipstricks.blogspot.com/2011/10/index-compression-part-2-how-to-do-it.html

USAGE: create the tables (source available from the link above) and grants needed, then just exec get_idx_comp_info;

Req: You'll need the init.ora parameter job_queue_processes!=0, and it needs to be at least the value of v_Parameter above.

This procedure will submit to the job queue a seperate job for each index, but only v_Parameter (default 20) at a time.  (That way, if job_queue_processes is set to 100, you won't have 100 index rebuilds happening simultaneously.)

This was created to overcome the limitation of not being able to parallelize analyze index...validate structure commands, and no persistent way to trace index compression information found in index_stats (which is cleared out at the start of the next analyze in that session.)  This is many X faster than a serialized script to do the same task.

(as sys) grant execute on sys.dbms_lock to [username];
grant dba to [username];
grant analyze any to [username];
*/

  if Ind_Owner_in='X' then
    execute immediate ('truncate table idx_comp_info_log');
    delete from idx_comp_info where cmpr_count=-1;

--  May need to modify the exclude list below   
    v_Query1 := 'select owner, index_name from dba_indexes
                   where index_type=''NORMAL''
                     and owner not in (''SYS'',''SYSTEM'',''OUTLN'',''DBSNMP'',''WMSYS'',''XDB'',''CTXSYS'')
                     and compression=''DISABLED''
                     and temporary!=''Y''';

    execute immediate v_Query1 bulk collect into c_Idx;
   
    for i in 1..c_Idx.last loop
      begin
        insert into idx_comp_info values (c_Idx(i).owner, c_Idx(i).idx_name, v_Cmpr_Count, v_Cmpr_Pctsave,0);
      exception
        when DUP_VAL_ON_INDEX then
          --  This will allow you to skip the indexes that you've already done, if you want to re-run the procedure
          null;
      end;
      if mod(i,1000)=0 then
        commit;
      end if;
    end loop;      
    commit;
   
    select count(0)
      into v_Count2
      from idx_comp_info
      where cmpr_count is null;
     
    while v_Count2!=0 loop
 
      execute immediate 'select count(0) from dba_jobs where what like ''%get_idx_comp%''' into v_Count1;
               
      if v_Count1<= v_Parallel then     
      
        select owner, index_name
          into v_Owner, v_Index_Name
          from idx_comp_info
          where cmpr_count is null
            and rownum=1
          for update;
           
        update idx_comp_info
          set cmpr_count=-1
          where owner=v_Owner
            and index_name=v_Index_Name;
           
        commit;
        dbms_job.submit( job => v_Job,
               what => 'begin get_idx_comp_info('''||v_Owner||''','''||v_Index_Name||'''); end;',
               next_date =>sysdate-1);
        commit;

      end if;
     
      dbms_lock.sleep(.1);
     
      select count(0) into v_Count2
        from idx_comp_info
        where cmpr_count is null;
       
    end loop;
   
  else
    v_Errtrack := 1;
    execute immediate ('analyze index '||Ind_Owner_in||'.'||Ind_Name_in||' validate structure');
    v_Errtrack := 2;
    select opt_cmpr_count, opt_cmpr_pctsave into v_Cmpr_Count, v_Cmpr_PctSave from index_stats;
    v_Errtrack := 3;
    update idx_comp_info set cmpr_count=v_Cmpr_Count, cmpr_pctsave=v_Cmpr_PctSave, mb_saved=-1 where owner=Ind_Owner_in and index_name=Ind_Name_in;
    v_Errtrack := 4;
    commit; 
  end if; 
 
exception
  when others then
    v_Errm := sqlerrm;
    insert into idx_comp_info_log values (Ind_Owner_in, Ind_Name_in, v_Errm, v_Errtrack);
END;
/

Compressed Indexes (Part 1)
Compressed Indexes (Part 2)
Compressed Indexes (Part 3)

Tuesday, October 18, 2011

Index Compression Part 1 (dispelling myths)

I have a client that's having difficulty with a rapidly growing database that has a fixed amount of storage available in Exadata.  Oracle has announced the ability to buy storage cells separately...which is great, but they aren't free, so its a matter of good stewardship to be as efficient with space consumption as possible.  In an effort to reduce space consumption on this Exadata platform, I'm looking at different options that may help us.  We're doing HCC/C4OLTP everywhere possible, but there's still a little more that we can do with index compression.  When complete, it'll make your storage utilization shrink more than bernie madoff's bank account and it could make your database run faster than a guy married to kim kardashian.

Index compression has been around since 9i...years and years, but there are still a lot of misconceptions around it.  I met some consultants from Oracle that are very knowledgeable about RAC, Exadata...you name it...but for whatever reason they were confused about index compression.  Each of them reinforced the others incorrect ideas about it until they were all in a fortified group think on the topic.  They thought index compression was like OLTP/HCC compression, where compression logic is applied.  They argued that indexes are for quick lookups...why would you want the delay of decompression applied to them and all the additional CPU usage?  First of all, index compression IS NOT like binary compression...like bzip2 or zlib compression.  Its misleading to call it "index compression"...it should be called "index de-duplication."  There is almost no CPU overhead in index decompression, because there is no decompression...its not compressed...its de-duplicated.

The other misconception I heard from those guys is that, a unique index wouldn't benefit from index compression because every row is unique, which means rows aren't duplicated, so there's no way to reduce the storage...you'll use more CPU for no gain.

If you have a single column (ie: a sequence number) as your PK, that's true.  Often, unique indexes and primary keys are concatenated...so you have multiple columns that together, make something unique.  For example, for a cars table, you have a PK for make, model, year, trim.  3 of the 4 columns can be compressed.

MAKE         TRIM           MODEL          YEAR
FORD          LIMITED     EXPLORER    1990
FORD          LIMITED     EXPLORER    2005
FORD          LIMITED     TEMPO          1991
FORD          SPORT         FUSION         2010

You can see all the first column is repeated, much of the 2nd column could be de-duplicated and much of the 3rd column could be de-duplicated.  Together, all 4 columns are unique, but it could still benefit from index compression.

This has been gone over "on the internets", so I won't rehash (very much).  Richard Foote has a nice post about it HERE.

(again, from Richard Foote's post)
Let’s say we currently have a nocompress Non-Unique index on a NAME column with the following entries:
0: Lady Gaga
  : ROWID
1:
Lady Gaga
  : ROWID
2:
Lady Gaga
  : ROWID
3:
Lady Gaga
  : ROWID
4:
Lady Gaga
  : ROWID
5: David Jones
  : ROWID
6: David Jones
  : ROWID
7: David Jones
  : ROWID
After the index is compressed, the leaf block entries would look logically something like this:
Prefix
0:
Lady Gaga
1: David Jones

 
0:0
  : ROWID
1: 0
  : ROWID
2: 0
  : ROWID
3: 0
  : ROWID
4: 0
  : ROWID
5: 1
  : ROWID
6: 1
  : ROWID
7: 1
  : ROWID

Importantly, each distinct combination of compressed column values is now stored just the once within an individual index leaf block. In the example above, we previously stored “David Bowie” 5 times. In the compressed index leaf block, we now only store “David Bowie” once, with the prefix value now being referenced by each index entry instead.
The net result being we can now (hopefully) store many more index entries per leaf block meaning we don’t need as many leaf blocks in our index.


...and so, we need less blocks to store the index.  Since there's nothing to decompress on read, there's not a huge amount of additional CPU usage to use the index to lookup the rowid.  

For reads, there may be an extremely small amount of additional CPU to use the prefix, but from my testing that's more than made up for in a reduction in IO.  For writes/updates, there may be more CPU used...index compression isn't always the thing to do, it depends on your access patterns.  You must test.  From my experience on Exadata, the CPU loads on databases are less than what they are on non-Exadata platforms due to cell offloading.  This means you may have more free CPU than you anticipated.  Since storage is expensive and you get a reduction in IO from compressed indexes (they're smaller...a full index scan and an index range scan will likely have fewer blocks to read in, because there are fewer blocks in the index, so less IO) the trade-off between a slight CPU increase and less IO is a positive one, or at least a wash.
 

As you can see from this OLTP workload simulating Swingbench test, there's almost no difference in performance between compressed and uncompressed indexes on the overall system...possibly a performance increase, but so small its in the noise...definitely not noticeable on this workload.  These tests were run on a database in VMWare using dNFS to NetApp, with very low memory settings (~256MB db_cache) to emphasize physical IO performance.


Before Compressing the indexes
After compressing indexes
Soo...how do you know which indexes to compress, and how many columns are redundant?  I'll save that for the next post.

Compressed Indexes (Part 1)
Compressed Indexes (Part 2)
Compressed Indexes (Part 3)


Wednesday, October 5, 2011

Bloggers meet and great at Julian's

Oracleworld has been a blast...but one of the best times has been hanging with the Pythian guys at Jillian's tonight.  In this pic, I'm laughing so hard I can't breath, next to Alex Gorbachev, Marc Fielding and Karen Angerer.  The guy in front asked that I not reveal his name...he's in Pythian's private security business in anti-terrorism, specializing in micro-brewery anti-terrorism.  He's very good at his job...no micro brews have been hit on his vigilant watch.

It was an honor spending time with these guys.   The highlight was when Alex toasted me...saying, "Respect."



Oracle public cloud

Wow...Larry sure threw Salesforce.com...which bases their whole company on Oracle under the bus!  I wonder how Tom Kyte feels about the slams on Apex?

Oracle Social Network seems like an oxymoron...most techies I know aren't very social.  This isn't Exafacebook, its a BI interface that looks very interesting and different.  This session is a pretty hard slap at SAP.   While showing the SAP UI, he said "It's like looking at the fins of an old Cadillac."

In contrast, the Oracle Social Network looks like a streamlined way for businesses to communicate by issue, rather than by group.  The system recommends contacts for your project rather than the other way around.

Monday, October 3, 2011

New product announcements!

1. Oracle Big Data Appliance
Hadoop map reduce
Oracle data loader for hadoop
Exalytics (see pic), Oracle's new nosql platform.

2. Oem release 12...I can see you can set up IORM in the new OEM UI...this is the new, more intuitive gui.  Human factors have always been a problem for new users of OEM.

3. Exalogic isn't new, but integration with OVM 3.0 is new.  This is a big step forward to the "Big Honkin' Cloud" Larry talked about last year.  Oracle had a product called ASR that monitors Exadata environments and submits SR's to MOS today...it looks like that function has been added to OEM 12 now.  If a hardware problem is detected, an SR is sent to Oracle, and a person is dispatched to replace the part immediately, automaticly.

4. Exalytics BI & EPM appliance.  20x faster relational OLAP performance, 80X faster multidimentional performance.  Times Ten is used in the Exalytics server...think of it as an extension to your db cache, with the ability to offload plsql under some circumstances. 

The demo is using an an analysis of all the cars sold in the world over the last 10 years...a huge amount of data, in memory.

X-2 4

At Openworld, I'll likely have many short posts like this as I see interesting things, but don't have the time to go into depth on them. 

Here's a pic of the front of the new X2-4 Exadata machine:


Openworld 10/3 Oracle Keynote

I think its interesting that Oracle has allowed so much time to their partner/competitor EMC and VMWare in the Monday morning keynote.  The value of EMC's big data strategies, Greenplum, VMWare and EMC Storage w/vfast are all being shown to Oracle's best customers.

They're talking about "Cloud meets big data"...and showing they can get 1 million iops with vfast storage in VMWare..like Exadata, while making available all the virtualization features we've become used to, like snap/clone.

Pared with Greenplum and hadoop, peyabytes are analyzed in seconds.  It'll be interesting to hear Exahadoop's and Exalytics response. :)

Here's a pic of an Exalytics server: