As I said in the previous posts...there are a lot of variables in play as we move to Exadata...the two I'm most concerned about is the effect of overhead due to OLTP compression and the effect of dropping many of our indexes.
I've never read anything from Oracle that says, "If you have Exadata you don't need indexes." I've heard it implied though, in the context of storage reduction. Data Cells are licensed by the spindle, so if you need more storage, it gets expensive. When you bring this up, the sales guys will talk about how you'll get great compression rates with HCC (which is true) and you'll reduce space because you can drop indexes. To no fault of Oracle's, some people extrapolate that idea and think, "I can drop all indexes...or most indexes."
Before we look at dropping indexes, let's start out with a baseline. I've installed the SOE schema for Swingbench...I'm hitting the scan listener of a High Capacity half rack of Exadata using 500 sessions. Some swingbech settings I've modified:
Number of Users: 500
Min Delay between transactions: 0
Max Delay between transactions: 1
Logon Delay: 10ms
Almost all defaults are being used for the Exadata machine. There would be huge gains in performance if this was tweaked...but for our purposes, this setup is fine.
When I first ran this test, the best I could do was about 310K, which is blistering fast...but then I learned we had a hardware failure that was preventing the use of one of our storage cells. The failed equipment was replaced and I re-ran the test.
I was able to get just over 400K transactions per minute (in red above)...that's very fast. For a frame of reference, our legacy system test was on an LPAR (IBM's VM) with 16 CPU's on fast 15k disks using an EMC Clarion CX4...definetaly no slouch, but it was 5 years old, and it was only able to produce ~63K TPM.
OCS' plan is to compress all tables (that aren't being replicated by Golden Gate) with either OLTP or some form of HCC compression and to drop all indexes that aren't either PK/UK indexes or indexes supporting FK's. The patch for Golden Gate that allows it to mine compressed tables still hasn't come out, so that's why those tables are being skipped. We may use Stream on some OLTP-compressed tables, since the 11.2.0.2 version does have the ability to mine archivelogs of compressed tables. Let's try some OLTP compression on Exadata. I altered the tables to be "compress for OLTP."
select 'alter table SOE.'||table_name||' compress for OLTP;'||chr(13)||'alter table SOE.'||table_name||' move;' from dba_tables where owner='SOE' and partitioned='NO' union all select 'alter table SOE.'||table_name||' modify partition '||partition_name||' compress for OLTP;'||chr(13)||'alter table SOE.'||table_name||' move partition '||partition_name||';' from dba_tab_partitions where table_owner='SOE';
...and then I rebuilt all the indexes.
The new baseline test results are nearly 450K trans/min! This is the fastest Swingbench score I've ever seen, and this is only on a half rack of relatively slow 7,200 RPM high capacity SAS storage! The other 3 high performance machines just arrived today...I can't wait to see what they can do! Granted, I expect there was serious caching in the db_cache...but this still says a lot about the compute nodes, since decompression can be done on either the compute or storage nodes and all compression is done on the compute nodes.
Next, (as is planned by our Oracle architects), I mark "unusable" all the indexes that aren't PK/UK's or supporting FK's. I'm leaving the function-based index usable though.
I restart Swingbench and begin the "index reduced" run...this is what I see:
All 96 threads on all servers are pegged at 100% user (in green above)...almost no CPU is used by system or IO waits. My response time is through the roof reaching 1266ms (red line above), and Transactions/Min has plummeted to less than 20K.
I ran ADDM/AWR reports and checked out the high waits during the "reduced index" test. What's happening is that, without indexes on the columns used in where clauses, we're forcing full table scans, but only 2% of the time is being spent actually doing the FTS.
This is exactly why I wanted to run these test. All over the internet you can find results of tests where somebody executes a single query to encounter a storage index with timing...things act differently when there's contention in an OLTP environment.
The high wait was "enq: KO - fast object checkpoint". One of the features of 11g is that the optimizer can choose to use direct path reads even if you don't specify the append hint or make the transaction parallel...this is called a serial direct read. Direct path reads are necessary to use storage indexes and HCC compression...and basicly to take advantage of all the special sauce Exadata offers. To begin the direct path read, a segment checkpoint is needed. The idea is that in a direct path read, we're going to pull the data straight from disk (or flash card) into memory in the PGA, which is very fast...you avoid all the SGA overhead. If there are dirty blocks in the SGA, we'd get incorrect results...so the dirty blocks are flushed to disk before we start our direct read.
As Tanel Poder put it, "You see this event because of object level checkpoint which happens when you run direct path reads on a segment (like with serial direct read full table scan or parallel execution full segment scans).
Now the question is-how much of your session's response time is spent waiting for that event - and whether the winning in query speed due to direct read/parallel execution outweighs the time spent waiting for this checkpoint."
In a DW/DSS environment, this feature will accelerate performance. With fewer process to contend with, the chance of creating a long "fast object checkpoint" wait is reduced, and your IO transfer is faster, but we're implementing Exadata in an OLTP environment.
This isn't an Exadata-specific issue...its just the way Oracle works on a hot segment with direct path reads in 11g+. By dropping the indexes we're forcing the optimizer to do DW-style IO in an OLTP environment. We're handcuffing the Oracle optimizer and taking away its options. The answer for this as we migrate will be to replace the indexes that were removed...and basicly only drop indexes that aren't used very often.
The take-aways from this are:
1. Don't believe you can just "drop all indexes" or that Exadata doesn't need indexes...your workload, not the hardware dictates this. Exadata will work better than anything else out there if you don't have indexes and you're forced to do a FTS, but for an OLTP environment, indexes aren't optional.
2. OLTP compression on Exadata is extremely fast. OLTP compression performance has always been a tradeoff between the time saved in IO vs the extra time used consuming CPU. With the fast Nahalem EP's in the X2-2, compression actually improves performance.
I've never read anything from Oracle that says, "If you have Exadata you don't need indexes." I've heard it implied though, in the context of storage reduction. Data Cells are licensed by the spindle, so if you need more storage, it gets expensive. When you bring this up, the sales guys will talk about how you'll get great compression rates with HCC (which is true) and you'll reduce space because you can drop indexes. To no fault of Oracle's, some people extrapolate that idea and think, "I can drop all indexes...or most indexes."
Before we look at dropping indexes, let's start out with a baseline. I've installed the SOE schema for Swingbench...I'm hitting the scan listener of a High Capacity half rack of Exadata using 500 sessions. Some swingbech settings I've modified:
Number of Users: 500
Min Delay between transactions: 0
Max Delay between transactions: 1
Logon Delay: 10ms
Almost all defaults are being used for the Exadata machine. There would be huge gains in performance if this was tweaked...but for our purposes, this setup is fine.
When I first ran this test, the best I could do was about 310K, which is blistering fast...but then I learned we had a hardware failure that was preventing the use of one of our storage cells. The failed equipment was replaced and I re-ran the test.
I was able to get just over 400K transactions per minute (in red above)...that's very fast. For a frame of reference, our legacy system test was on an LPAR (IBM's VM) with 16 CPU's on fast 15k disks using an EMC Clarion CX4...definetaly no slouch, but it was 5 years old, and it was only able to produce ~63K TPM.
OCS' plan is to compress all tables (that aren't being replicated by Golden Gate) with either OLTP or some form of HCC compression and to drop all indexes that aren't either PK/UK indexes or indexes supporting FK's. The patch for Golden Gate that allows it to mine compressed tables still hasn't come out, so that's why those tables are being skipped. We may use Stream on some OLTP-compressed tables, since the 11.2.0.2 version does have the ability to mine archivelogs of compressed tables. Let's try some OLTP compression on Exadata. I altered the tables to be "compress for OLTP."
select 'alter table SOE.'||table_name||' compress for OLTP;'||chr(13)||'alter table SOE.'||table_name||' move;' from dba_tables where owner='SOE' and partitioned='NO' union all select 'alter table SOE.'||table_name||' modify partition '||partition_name||' compress for OLTP;'||chr(13)||'alter table SOE.'||table_name||' move partition '||partition_name||';' from dba_tab_partitions where table_owner='SOE';
...and then I rebuilt all the indexes.
The new baseline test results are nearly 450K trans/min! This is the fastest Swingbench score I've ever seen, and this is only on a half rack of relatively slow 7,200 RPM high capacity SAS storage! The other 3 high performance machines just arrived today...I can't wait to see what they can do! Granted, I expect there was serious caching in the db_cache...but this still says a lot about the compute nodes, since decompression can be done on either the compute or storage nodes and all compression is done on the compute nodes.
Next, (as is planned by our Oracle architects), I mark "unusable" all the indexes that aren't PK/UK's or supporting FK's. I'm leaving the function-based index usable though.
I restart Swingbench and begin the "index reduced" run...this is what I see:
All 96 threads on all servers are pegged at 100% user (in green above)...almost no CPU is used by system or IO waits. My response time is through the roof reaching 1266ms (red line above), and Transactions/Min has plummeted to less than 20K.
I ran ADDM/AWR reports and checked out the high waits during the "reduced index" test. What's happening is that, without indexes on the columns used in where clauses, we're forcing full table scans, but only 2% of the time is being spent actually doing the FTS.
This is exactly why I wanted to run these test. All over the internet you can find results of tests where somebody executes a single query to encounter a storage index with timing...things act differently when there's contention in an OLTP environment.
The high wait was "enq: KO - fast object checkpoint". One of the features of 11g is that the optimizer can choose to use direct path reads even if you don't specify the append hint or make the transaction parallel...this is called a serial direct read. Direct path reads are necessary to use storage indexes and HCC compression...and basicly to take advantage of all the special sauce Exadata offers. To begin the direct path read, a segment checkpoint is needed. The idea is that in a direct path read, we're going to pull the data straight from disk (or flash card) into memory in the PGA, which is very fast...you avoid all the SGA overhead. If there are dirty blocks in the SGA, we'd get incorrect results...so the dirty blocks are flushed to disk before we start our direct read.
As Tanel Poder put it, "You see this event because of object level checkpoint which happens when you run direct path reads on a segment (like with serial direct read full table scan or parallel execution full segment scans).
Now the question is-how much of your session's response time is spent waiting for that event - and whether the winning in query speed due to direct read/parallel execution outweighs the time spent waiting for this checkpoint."
In a DW/DSS environment, this feature will accelerate performance. With fewer process to contend with, the chance of creating a long "fast object checkpoint" wait is reduced, and your IO transfer is faster, but we're implementing Exadata in an OLTP environment.
This isn't an Exadata-specific issue...its just the way Oracle works on a hot segment with direct path reads in 11g+. By dropping the indexes we're forcing the optimizer to do DW-style IO in an OLTP environment. We're handcuffing the Oracle optimizer and taking away its options. The answer for this as we migrate will be to replace the indexes that were removed...and basicly only drop indexes that aren't used very often.
The take-aways from this are:
1. Don't believe you can just "drop all indexes" or that Exadata doesn't need indexes...your workload, not the hardware dictates this. Exadata will work better than anything else out there if you don't have indexes and you're forced to do a FTS, but for an OLTP environment, indexes aren't optional.
2. OLTP compression on Exadata is extremely fast. OLTP compression performance has always been a tradeoff between the time saved in IO vs the extra time used consuming CPU. With the fast Nahalem EP's in the X2-2, compression actually improves performance.
Hi Andy,
ReplyDeleteI liked your posting about Indexes on Exadata OLTP Databases - it is indeed not necessarily so that you will be able to remove a lot of indexes after migration of an OLTP Database to Exadata. Some will not be needed anymore, though.
In order to determine whether you need an index still or not, a better way than making them unusable is making them INVISIBLE (new 11g feature).
Instead of having to rebuild them if you find out they are still needed, you can simply make them visible again.
Thank you, sir...I'm glad you liked it. I completely agree with what you said, and usually that's what I do...it depends on the situation. If you want to see the effect on your sql explain plans and performance of not having the index in question...and you're just concerned about select statements, then what you said is exactly correct. The index is invisible to the CBO, so it won't be used in any plans. If its a large index, rebuilding it can take a while and can create locking issues...so the back-out plan of making an index visible again is much better than rebuilding the index.
ReplyDeleteIn some situations, there are DML benefits to not having the index or marking them unusable...if you mark the index invisible, Oracle still has all the overhead of maintaining the index. It really depends on the workload and usage patterns if this is a big deal or not to you.
The situation I was writing about in this post was a dry run of a migration to Exadata from last year. Looking back, it wasn't a very productive exercise because nearly all the original indexes were needed. Since my client had sized their Exadata environment with the expectation of dropping a large percentage (over 50%) of their indexes, they're having issues with storage today.
Oracle announced around Openworld that its selling high capacity storage cells to allow you to add storage to Exadata. I've heard through the exagrapevine that they're going to start selling high performance storage cells soon too, in response to their customers requests. I think this is great...without the ability to buy storage cells independently, if forces you to buy a quarter rack of Exadata, with compute nodes you may not need.
I've heard that conversation before...it went something like this:
"Yeah, sooooo, although we're only using about 15% of the overall CPU on the nodes...we need about a half mil to buy more cpu and more nodes. As soon as we buy them we'll turn them off because we really just need the storage they come with."
*awkward silence*
"No."
Good test. Good job. Good results.
ReplyDeleteYou saved my time because I'm convinced as you that "indexes aren't optional" !!
Thanks a lot