Monday, August 15, 2011

Equi-sized Partition Cuts

There are a lot of resources to show you how to make a partitioned table...syntax search results are plentiful...but its unusual to find a resource that tells you about the process to partition a table.

Some rows are wider than others, and a lot of times your partition key doesn't have gaps...so this may not apply to you.  When partitioning my approach has been to:

1. Identify the access patterns
    You can do this by talking to the application development teams and selecting from hist_sql_plan (see below).  That will give you a good idea about some of the sql that's hitting the table in question.  Is it direct-io reads or conventional dml?  Are they simple queries, where response time is likely important, or are they more complex, taking several minutes to complete?  Look at the operation column.

select * from dba_hist_sql_plan where sql_id in (select sql_id from DBA_HIST_SQL_PLAN where object_name='TABLE_NAME') order by sql_id, id;

2. From #1, what columns do you often see in the where clauses?  This/these columns may be good partition key column candidates.

3. From #1, what's the best compression type you can use without negatively affecting application performance?  (long runtime queries could likely have more aggressive compression on their tables, short queries will have to be C4OLTP or no compression)

4.  If possible, use interval partitioning instead of range partitioning.  There are a lot of things that will keep you from using interval partitioning  (like domain indexes), but it'll save you time in management if you can use it.


5.  There's a balance between adding lots of partitioning for performance reasons and having too many partitions to be easily managed.  As a rule of thumb, I shoot for ~20, depending on table size, but never let a partition be smaller than 2% of the db_cache size (see earlier post re:smallness logic).  This is hard with ASMM/AMM, since the db_cache size is dynamic...but even if you use ASMM and AMM you should set minimum sizes in the memory pools.  Its a good practice to make the partitions as equally sized as possible...if that means equal number of rows or equally size segments, that's for you to decide...it depends on your situation.  Also consider the partitioned tables commonly joined with this table...if your queries pull back many rows, partition-wise joins will make you want to use the same partitioning column if possible.


We're partitioning many, many tables and we need a process to do this as quickly as possible.  Some of the tables are multi-terrabye, and it was taking me a really long time to come up with the values for the ranges.  So, I changed my method and just created a query to do it.  There might be a better way...but this is performing very well, even on multi-billion row tables this returns in around a minute.  If you have a better method, I'd love to hear about it.

The idea here is that there are repeated values and/or gaps in the sequence column you want to make your partitioning key...so you can't just take the sequence number.  The first partition has to be the row that's 5% in...regardless of what the value of the partitioned key is.  This query will give you the value of the partitoning column key that's about 1/20th of the rows in the table...the 2nd will give you the one that's around 2/20th's...etc.  Change the table_name to be your table, and seq_id to be your partition key.


select part, max(seq_id), max(rownumber) from (
select trunc(rownumber,2) part, tm2.* from (
select /*+ PARALLEL 96 */ seq_id, CUME_DIST() OVER (PARTITION BY 'X' ORDER BY seq_id) AS RowNumber
from table_name tn
) tm2
where trunc(rownumber,2) in (0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,
0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,1))
group by part
order by 1;

...when its done, it'll give you 3 columns.  The first one is your "goal" (.6 is the one ~60% into the row count), the second is the partition key value at that point, the 3rd is how close to your goal you came (which depends on the data cardinality...how unique the column values are.)  Use the 2nd column as the value for your partitioned table range. IE:

0.05    20      0.06
0.1      36      0.11
0.15    52      0.16
0.2      67      0.21
0.25    82      0.26
0.3      95      0.31
0.35    124    0.36
0.4      151    0.41
0.45    169    0.46
0.5      186    0.51
0.55    200    0.56
0.6      217    0.61
0.65    235    0.66
0.7      253    0.71
0.75    296    0.76
0.8      311    0.81
0.85    347    0.86
0.9      381    0.91
0.95    415    0.96
1         445    1

PARTITION BY RANGE (SEQ_ID)

  PARTITION P_1 VALUES LESS THAN (20),
  PARTITION P_2 VALUES LESS THAN (36),
  PARTITION P_3 VALUES LESS THAN (52),
  PARTITION P_4 VALUES LESS THAN (67),
  PARTITION P_5 VALUES LESS THAN (82),
  PARTITION P_6 VALUES LESS THAN (95),
  PARTITION P_7 VALUES LESS THAN (124),
  PARTITION P_8 VALUES LESS THAN (151),
  PARTITION P_9 VALUES LESS THAN (169),
  PARTITION P_10 VALUES LESS THAN (186),
  PARTITION P_11 VALUES LESS THAN (200),
  PARTITION P_12 VALUES LESS THAN (217),
  PARTITION P_13 VALUES LESS THAN (235),
  PARTITION P_14 VALUES LESS THAN (253),
  PARTITION P_15 VALUES LESS THAN (296),
  PARTITION P_16 VALUES LESS THAN (311),
  PARTITION P_17 VALUES LESS THAN (347),
  PARTITION P_18 VALUES LESS THAN (381),
  PARTITION P_19 VALUES LESS THAN (415),
  PARTITION P_20 VALUES LESS THAN (445),
  PARTITION P_MAX VALUES LESS THAN (MAXVALUE)
)

I hope this helps you, or at least saves you some time determining your table partition key and the best range of values for that key.

No comments:

Post a Comment