Monday, July 11, 2011

*FAST* Domain (and normal unq, non-unq) index rebuilds on Exadata

With any migration, one of the key objectives is to minimize downtime.  Even in an MAA migration where you're expecting to move the data and keep it in sync via some form of CDC (change data capture) and finish with a nearly instant "flip-over", you still want to minimize the time between the start of the data move and the start of the CDC.  The longer it takes, the more storage is needed to keep track of the DML changes.  So, as we prepare for another round of database migrations to Exadata, one of my objectives has been to minimize the time it takes to get the data into a usable state on the Exadata target.

As part of the migration, to take advantage of the compression and performance features in Exadata, we've also made a lot of table partitioning changes.  We have several 2TB+ tables that use context indexes (aka domain indexes).  These are a little obscure, so if you don't know what they are: in essence, they bring many advanced search features to large text objects...key among those features is that they make CLOBs searchable.  They're completely unlike any other index...when you create a domain index, that single statement creates many new objects...tables and indexes to support that index.  It's very messy, so it's common to want to put those "supporting" objects in a different schema.

As you can imagine, rebuilding these multi-terabyte indexes can take an extremely long time, so it's avoided whenever possible.  Due to the migration's new partitioning schemes, it can't be avoided, and these indexes will need to be rebuilt.  The last time it was attempted, one of the indexes took over a week to rebuild.  The reason it took so long is a limitation in how Oracle parallelizes the process.  If you have a domain index with 8 partitions and you issue a rebuild command specifying parallel 80, in v$process for the first few seconds you'll see 80 processes spawn off...then you'll see all but 8 of them die.  Oracle will only allow 1 process per partition during the domain index rebuild.  This makes the rebuild process on large domain indexes extremely slow.
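You can watch this happen yourself.  Here's an illustrative sketch (the index and owner names are made up):

alter index srch_owner.big_ctx_idx rebuild parallel 80;

-- from another session, count the parallel slaves actually working
-- (on RAC, gv$px_process shows slaves across all instances)
select count(*) from gv$px_process where status = 'IN USE';

You'd expect a count somewhere near 80; for a partitioned domain index it settles at roughly one slave per partition.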

To get around this issue, Oracle created a built-in package that will submit multiple jobs (one job per partition), and allow each of those to be parallelized.  It's called dbms_pclxutil.build_part_index.  The first time I used this I was extremely impressed...on my experimentation instance I was able to recreate the index that had previously taken over a week in about 4 hours.
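For reference, a call looks something like this (the table and index names are hypothetical; jobs_per_batch controls how many partition-level jobs run at once, and procs_per_job is the parallel degree inside each job):

begin
  dbms_pclxutil.build_part_index(
    jobs_per_batch => 8,    -- partition-level jobs running concurrently
    procs_per_job  => 10,   -- parallel degree within each partition's rebuild
    tab_name       => 'BIG_TABLE',
    idx_name       => 'BIG_CTX_IDX',
    force_opt      => FALSE);
end;
/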

The problem was, in my sandbox I had created the table and the index in my schema, and I was the one issuing the rebuild, so everything just worked.  During a later test migration, the same procedure call that had worked before gave me errors.  There are multiple limitations with Oracle's procedure...the big one is that the table owner, index owner and index re-builder all have to be the same user.

That makes no sense, especially for domain indexes, which, like I said, are commonly created in a separate schema.  Is this a technical limitation?  No.  I can still do an "alter index rebuild partition" statement on an index not owned by me, and I can build a domain index on a table owned by somebody else.  I checked around the internet and found a possible reason << HERE >>.  Which is to say...there is no reason.

In the link I referenced, the author asks: "For starters, the said index has to be partitioned and unusable. Not a biggie, but why?"  I think the answer is...it's not that it's necessary, but if your index isn't partitioned, why not just do a normal alter index rebuild statement?  It wouldn't hurt to not have the limitation...I'm just saying the procedure wouldn't really help you.  In all the situations where I've needed to use this, it was a new index, so creating it unusable was the fastest way...other than that, I don't know what advantage the unusable requirement brings.
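For a brand-new index, the pattern looks roughly like this (all names are made up, and I'm showing the simplest form of the syntax):

-- create the local, partitioned domain index without building it
create index srch_owner.big_ctx_idx on app_owner.big_table (doc_clob)
  indextype is ctxsys.context
  local
  unusable;

-- each partition now shows status UNUSABLE and can be rebuilt independently
select partition_name, status
  from dba_ind_partitions
 where index_owner = 'SRCH_OWNER' and index_name = 'BIG_CTX_IDX';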

I changed the procedure below to work for both partitioned and non-partitioned indexes.  That doesn't bring any performance benefit for the non-partitioned case, but it means a script using this doesn't have to first check whether an index is partitioned.

Like Charles, I find the limitations pointless and oppressive, so...since this is something that's going to need to happen over and over, for multiple domain indexes and multiple migrations...I created my own package that does the same thing without the limitations.  It loops over the partitions, submitting a job to the job queue for each one to perform that partition's rebuild.

I also added a few features...the big one is that you specify how many RAC nodes you want it to run on, and the jobs are submitted round-robin to each node.  For one of the indexes, without this package I was only able to go parallel 9 (because the domain index had 9 partitions).  With it, I went parallel 10 times 9 partitions...90 processes across 4 Exadata compute nodes rebuilding the index.  On a 4-node high capacity Exadata machine, an index that had previously taken over a week on an IBM P595 built in only 4 hours using all 4 nodes.

There's nothing in it that's technically specific to domain indexes, so I guess it could be used for other large indexes too, although initially I had only tested it in the scenario I described.

I've now tested it with many different types of indexes, both domain and local, without any issues.

See the usage info in the comment section in the package body under the "ind" procedure.

Disclaimer: I hope this helps you, but as always, the thoughts described here are my opinions...use at your own risk.  Like most personal relationships, the best thing about this package is potentially the worst thing about it.  It parallelizes aggressively and can consume all the resources on your RAC if you haven't already capped parallelism and job_queue_processes in your init.ora parameters.
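If you want a safety net in place first, something like this caps what the package can consume (the values shown are examples only, not recommendations):

-- cap the number of concurrent job-queue jobs per instance
alter system set job_queue_processes = 20 scope=both sid='*';

-- cap the total number of parallel slaves per instance
alter system set parallel_max_servers = 64 scope=both sid='*';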

CREATE OR REPLACE package quickly_rebuild
authid current_user
is
  PROCEDURE ind_part (
    Ind_Owner_in     varchar2,
    Ind_Name_in      varchar2,
    Ind_Part_Name_in varchar2,
    Parallelism_in   number
  );

  PROCEDURE ind (
    Ind_Owner_in      varchar2,
    Ind_Name_in       varchar2,
    Parallelism_in    number,
    Instance_Count_in number default 1
  );

  v_Last_Node number; --persistent variable to track the last node that was assigned a job, so many jobs can be spread across nodes equally.
end quickly_rebuild;
/

CREATE OR REPLACE package body quickly_rebuild
is
  PROCEDURE ind_part (
    ind_owner_in     IN varchar2,
    ind_name_in      IN varchar2,
    ind_part_name_in IN varchar2,
    parallelism_in   IN number
  ) IS
    cursor_name INTEGER;
    ret         INTEGER;
    -- strip semicolons to avoid trivial injection through the name parameters
    v_Ind_Owner     varchar2(30) := replace(ind_owner_in,';',':');
    v_Ind_Name      varchar2(30) := replace(ind_name_in,';',':');
    v_Ind_Part_Name varchar2(30) := replace(ind_part_name_in,';',':');
    v_Parallelism   number       := parallelism_in;
  BEGIN
    cursor_name := DBMS_SQL.OPEN_CURSOR;
    if ind_part_name_in='NOT_PARTITIONED' then
      DBMS_SQL.PARSE(cursor_name, 'alter index '||v_Ind_Owner||'.'||v_Ind_Name||' rebuild parallel '||v_Parallelism, DBMS_SQL.NATIVE);
    else
      DBMS_SQL.PARSE(cursor_name, 'alter index '||v_Ind_Owner||'.'||v_Ind_Name||' rebuild partition '||v_Ind_Part_Name||' parallel '||v_Parallelism, DBMS_SQL.NATIVE);
    end if;
    ret := DBMS_SQL.EXECUTE(cursor_name);
    DBMS_SQL.CLOSE_CURSOR(cursor_name);
  END ind_part;

  PROCEDURE ind (
    Ind_Owner_in      varchar2,
    Ind_Name_in       varchar2,
    Parallelism_in    number,
    Instance_Count_in number default 1
  ) IS
    type r_script is record (
      line number,
      text varchar2(1024));

    type r_IP is record (
      index_owner    varchar2(30),
      index_name     varchar2(30),
      partition_name varchar2(30));

    type t_IP is table of r_IP index by binary_integer;
    c_IP t_IP;
    v_Job   number;
    v_Count number;

    Ind_Not_Exist exception;
    pragma exception_init(Ind_Not_Exist,-20001);

    v_Statement1 varchar2(32767) := 'select count(0) from dba_ind_partitions where index_owner=:a and index_name=:b and status=''UNUSABLE''';
    v_Statement2 varchar2(32767) := 'select index_owner, index_name, partition_name from dba_ind_partitions where index_owner=:a and index_name=:b and status=''UNUSABLE''';
    v_Statement3 varchar2(32767) := 'select count(0) from dba_indexes where owner=:a and index_name=:b and status=''UNUSABLE''';

  /*
  Author: Andy Black
  For more info see:

  USAGE: Create the index with the UNUSABLE clause and then issue:
  exec quickly_rebuild.ind(index_owner, index_name, parallelism_per_partition, instance_count);

  Req: The user submitting this work will need "alter any index" granted directly.
  You'll need the init.ora parameter job_queue_processes!=0.
  The index you're rebuilding must exist and must be marked unusable.
  Don't forget to commit after executing this procedure.

  This procedure will submit to the job queue a separate job for each partition of your index, and rebuild each one with the parallelism you specified.
  In RAC, when you specify instance_count_in, it will try to evenly distribute the job submissions across instances. So on a 5 node RAC, an index with 10 partitions
  and parallel 10 passed in will have 100 processes rebuilding it: 20 processes, 2 partitions and 2 jobs running on each node.

  This was created to overcome the limitation of a similar Oracle-supplied package that requires the index owner, the table owner and the index rebuilder to all be the same user.
  This procedure doesn't have that limitation. I had a multi-TB domain index that took over a week to rebuild the normal way,
  because the index rebuild parallelism is limited to the number of partitions of the domain index. I found this limitation unnecessary and impractical,
  so I made this. Using this procedure it took about 4 hrs. There's nothing "domain index" specific about the rebuild statement,
  so it will likely work for other indexes, although that hasn't been tested.

  Any errors, such as ORA-29952, will be found in the alert log of the node that had the error.
  */
  BEGIN
    execute immediate v_Statement1 into v_Count using Ind_Owner_in, Ind_Name_in;
    if v_Count=0 then
      dbms_output.put_line('Not partitioned or error...');
      execute immediate v_Statement3 into v_Count using Ind_Owner_in, Ind_Name_in;
      if v_Count=0 then
        raise_application_error(-20001, 'That index does not exist or is not marked UNUSABLE.',FALSE);
      else -- not a partitioned index
        quickly_rebuild.v_Last_Node := nvl(quickly_rebuild.v_Last_Node,0)+1;
        dbms_job.submit( job       => v_Job,
                         what      => 'begin quickly_rebuild.ind_part('''||upper(Ind_Owner_in)||''','''||upper(Ind_Name_in)||''',''NOT_PARTITIONED'','||parallelism_in||'); end;',
                         next_date => sysdate-1,
                         interval  => null,
                         no_parse  => false,
                         instance  => mod(quickly_rebuild.v_Last_Node,Instance_Count_in)+1,
                         force     => true);
      end if;
    else -- partitioned index: one job per unusable partition, round-robin across instances
      execute immediate v_Statement2 bulk collect into c_IP using Ind_Owner_in, Ind_Name_in;
      for i in 1..c_IP.last loop
        quickly_rebuild.v_Last_Node := nvl(quickly_rebuild.v_Last_Node,0)+1;
        dbms_job.submit( job       => v_Job,
                         what      => 'begin quickly_rebuild.ind_part('''||c_IP(i).index_owner||''','''||c_IP(i).index_name||''','''||c_IP(i).partition_name||''','||parallelism_in||'); end;',
                         next_date => sysdate-1,
                         interval  => null,
                         no_parse  => false,
                         instance  => mod(quickly_rebuild.v_Last_Node,Instance_Count_in)+1,
                         force     => true);
      end loop;
    end if;
  END ind;
end quickly_rebuild;
/
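Putting it together, a run looks like this (the names and numbers are illustrative):

-- 9-partition domain index, parallel 10 per partition, spread across 4 nodes
exec quickly_rebuild.ind('SRCH_OWNER', 'BIG_CTX_IDX', 10, 4);
commit;  -- dbms_job submissions don't start until you commit

-- watch the partitions flip from UNUSABLE to USABLE as the jobs finish
select partition_name, status
  from dba_ind_partitions
 where index_owner = 'SRCH_OWNER' and index_name = 'BIG_CTX_IDX';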


  1. I like! You know... seems like it would be easy to extend this to other objects as well. I have long been looking for an enhancement to datapump's messed up way of single-threading dml, and this looks perfect (with a couple tweaks).

    Very well done. My only form of constructive criticism would be "Change the font!!" :) But that is merely a subjective, personal point.

  2. Thanks Charles. Sorry about the font...I reapplied the template, hopefully that'll help. Apparently, depending on how you got to the page, you'd get a different font! (Which is why I never knew there was a problem.)

    It isn't out yet, but Tanel Poder wrote a great chapter in his book, "Expert Oracle Exadata" (link on right). I couldn't wait, so I got an "alpha" e-look at his book from:

    He describes datapump's issues in detail, and suggests you instead use database links with append and parallel hints. You CAN do a db_link copy with datapump...but for a large transfer you really don't want to. I talked to Tanel about this and he said:

    "Yep it allows to avoid using intermediate dump files - but the problem is that impdp uses just a regular INSERT SELECT over dblink in this case (not APPEND). So, basically it's slow and can explode your undo tablespace (and DBWR has to work like hell when doing a lot of random block writes like this - massive amount of random writes doesn't go so well with Exadata...)"
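    The approach Tanel describes looks roughly like this (the link and table names here are made up):

    -- direct-path, parallel copy over a database link, avoiding dump files
    alter session enable parallel dml;

    insert /*+ append parallel(t, 8) */ into big_table t
    select /*+ parallel(s, 8) */ * from big_table@src_db_link s;

    commit;

    The append hint gets you direct-path inserts with minimal undo, which is exactly what a plain impdp over a link doesn't do.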