Wednesday, October 19, 2011

Index Compression Part 2 (Fast compression column count gathering)

In my previous post, I talked about what Oracle index compression is, and what it isn't.  It IS de-duplication, it isn't binary compression.  It is something that may slightly increase CPU, depending on your workload, it isn't something that's likely to hurt your might even help it.  It definitely saves space, and its worth considering.

Obviously, before you do anything like modifying indexes, that can affect performance, you need to test it in an environment where it won't matter if something negative or unexpected happens.  The analyze index command will tell you the estimated compression ratio and a suggestion on the number of columns to compress.  Unless you're on a very old DB version, don't use the analyze index...compute|estimate statistics...its deprecated, and isn't nearly as good as the more modern options available today.  What we're doing is analyze index...validate structure command, which isn't deprecated. have many databases with thousands of indexes.  How do you figure out which ones are candidates for compression?  Earlier this week I spoke with a consultant who suggested my client take all the indexes that are over 100MB, run an analyze statement on them in non-prod, and any that have over 25% compression ratio...consider those your candidates.  I know from looking at their storage requirements that many of their indexes are hundreds of GB, totalling 10's of TB.  If they gave even a very slight positive compression ratio, the space savings could be what I did was run the analyze on all the indexes.  There are scripts all over "the internets" (that never gets old...thanks Mr Bush) to do this task one index at a time.  It takes a really long time...we're talking about thousands of indexes and TB's of index storage, multiplied by many databases.  From what I see in the syntax documentation,  the parallel clause for the analyze statement is only available for the analyze table statement, not the analyze index/VS statement...I could be wrong about that, but I couldn't get it to work..

This was going to take an eternity to run "analyze index...validate structure" for all indexes, serialized. In addition, I needed to do this on many databases.  After the AI...VS statement, the results are stored in a session-specific table, and they're erased when you analyze the next index...there's no persistent place where the results are stored after the analyze.  Sooooo...I wrote a stored proc to do it...each analyze command is serial, but I can run multiple statements at a time (1 per session)...I chose 20.  The results are then stored persistently in a table.

Something to be aware of...this procedure will do multiple "analyze index...validate storage" commands that will each cause a global, sharing DML lock, which will prevent dml on the table that has the index you're analyzing.  Keep that in mind when you're doing this...don't do this in prod, unless you have an application outage.  What I do is run my stored proc in a copy of the database (hopefully you have one for performance testing you can use, ideally a physical duplicate of the prod database.  If you don't have one do a restore/recover or rman duplicate to a new instance.)  For the analyze, it doesn't have to be performant storage...if you have an NFS server with storage, its easy to use that with dNFS for most databases.  The thing need to test after you compress the indexes.  Really, you always need to test anything you a physical copy with similar hardware to prod is preferred.

What you'll need:
  • (as sys) grant execute on sys.dbms_lock to [username];
  • grant dba to [username];
  • grant analyze any to [username];
  • You may need to modify the exclude list in Query1 to avoid analyzing indexes in your database that are out of project scope for whatever reason.
  • You may want to modify what v_Parallel is initialized to (currently 20) to match your system's CPU_Count (or something less)

What it does:

This stored procedure will create a list of all indexes you want to analyze and put the list in IDX_COMP_INFO.  It will then recursively call itself, loading itself into  the job_queue, v_Parallel (default 20) number of times, so that multiple indexes can be analyzed at once.  As one analyze finishes it puts the results (how many columns you should compress, and the expected compression ratio) in IDX_COMPE_INFO.  It'll load the next index and repeat until all the indexes have been analyzed.  If there's an error while analyzing an index (or for any other reason) it'll put the error in IDX_COMP_INFO_LOG.  You should expect errors like "no data found" in that table for indexes that are built on temp tables or empty tables...also indexes (or indexes with partitions) that are in an unusable state will give errors and put an entry into the log table.

IDX_COMP_INFO's last column will be -1 after this is executed for every row.  The idea is that you can populate that column later with the new size or amount of space savings you'll have...its just nice to keep that info in the same table, IMHO.  If there's an error for some reason while analyzing an index,  IDX_COMP_INFO's  CMPR_COUNT will be -1.  This makes it easy to write sql statements later against this table, because you can say "where cmpr_count > 0" and only look at the compression candidates.

Lastly, there's dynamic sql being used without bind variables, so its vulnerable to sql injection to an evil person that has an execute grant on after you're finished running this procedure on your cloned database, to be ultra safe, I would drop the procedure. 

My next post will be about what to do with the information gained from the procedure below....

  OWNER         VARCHAR2(30 BYTE)               NOT NULL,



 Ind_Owner_in  varchar2 default 'X',
 Ind_Name_in   varchar2 default 'X'
authid current_user
  type r_script is record (
    owner        varchar2(30),
    idx_name     varchar2(300));
  type t_Idx is table of r_script index by binary_integer;
  c_Idx t_Idx;

  v_Query1       varchar2(32000);
  v_Cmpr_Count   number;
  v_Cmpr_Pctsave number;
  v_Job          number;
  v_Count1       number;
  v_Count2       number;
  v_Owner        varchar2(30);
  v_Index_Name   varchar2(30);
  v_Errm         varchar2(255);
  v_Errtrack     number := 0;
  v_MB           number;
  v_MB_Saved     number;

-- *************************************
-- The line below may be adjusted for your platform.  I would make it something less than CPU_Count on your db.
-- *************************************

  v_Parallel number := 20;

Author: Andy Black
For more info see:

USAGE: create the tables (source available from the link above) and grants needed, then just exec get_idx_comp_info;

Req: You'll need the init.ora parameter job_queue_processes!=0, and it needs to be at least the value of v_Parameter above.

This procedure will submit to the job queue a seperate job for each index, but only v_Parameter (default 20) at a time.  (That way, if job_queue_processes is set to 100, you won't have 100 index rebuilds happening simultaneously.)

This was created to overcome the limitation of not being able to parallelize analyze index...validate structure commands, and no persistent way to trace index compression information found in index_stats (which is cleared out at the start of the next analyze in that session.)  This is many X faster than a serialized script to do the same task.

(as sys) grant execute on sys.dbms_lock to [username];
grant dba to [username];
grant analyze any to [username];

  if Ind_Owner_in='X' then
    execute immediate ('truncate table idx_comp_info_log');
    delete from idx_comp_info where cmpr_count=-1;

--  May need to modify the exclude list below   
    v_Query1 := 'select owner, index_name from dba_indexes
                   where index_type=''NORMAL''
                     and owner not in (''SYS'',''SYSTEM'',''OUTLN'',''DBSNMP'',''WMSYS'',''XDB'',''CTXSYS'')
                     and compression=''DISABLED''
                     and temporary!=''Y''';

    execute immediate v_Query1 bulk collect into c_Idx;
    for i in 1..c_Idx.last loop
        insert into idx_comp_info values (c_Idx(i).owner, c_Idx(i).idx_name, v_Cmpr_Count, v_Cmpr_Pctsave,0);
        when DUP_VAL_ON_INDEX then
          --  This will allow you to skip the indexes that you've already done, if you want to re-run the procedure
      if mod(i,1000)=0 then
      end if;
    end loop;      
    select count(0)
      into v_Count2
      from idx_comp_info
      where cmpr_count is null;
    while v_Count2!=0 loop
      execute immediate 'select count(0) from dba_jobs where what like ''%get_idx_comp%''' into v_Count1;
      if v_Count1<= v_Parallel then     
        select owner, index_name
          into v_Owner, v_Index_Name
          from idx_comp_info
          where cmpr_count is null
            and rownum=1
          for update;
        update idx_comp_info
          set cmpr_count=-1
          where owner=v_Owner
            and index_name=v_Index_Name;
        dbms_job.submit( job => v_Job,
               what => 'begin get_idx_comp_info('''||v_Owner||''','''||v_Index_Name||'''); end;',
               next_date =>sysdate-1);

      end if;
      select count(0) into v_Count2
        from idx_comp_info
        where cmpr_count is null;
    end loop;
    v_Errtrack := 1;
    execute immediate ('analyze index '||Ind_Owner_in||'.'||Ind_Name_in||' validate structure');
    v_Errtrack := 2;
    select opt_cmpr_count, opt_cmpr_pctsave into v_Cmpr_Count, v_Cmpr_PctSave from index_stats;
    v_Errtrack := 3;
    update idx_comp_info set cmpr_count=v_Cmpr_Count, cmpr_pctsave=v_Cmpr_PctSave, mb_saved=-1 where owner=Ind_Owner_in and index_name=Ind_Name_in;
    v_Errtrack := 4;
  end if; 
  when others then
    v_Errm := sqlerrm;
    insert into idx_comp_info_log values (Ind_Owner_in, Ind_Name_in, v_Errm, v_Errtrack);

Compressed Indexes (Part 1)
Compressed Indexes (Part 2)
Compressed Indexes (Part 3)

No comments:

Post a Comment