As I said in the previous posts...there are a lot of variables in play as we move to Exadata...the two I'm most concerned about are the effect of overhead due to OLTP compression and the effect of dropping many of our indexes.
I've never read anything from Oracle that says, "If you have Exadata you don't need indexes." I've heard it implied, though, in the context of storage reduction. Data Cells are licensed by the spindle, so if you need more storage, it gets expensive. When you bring this up, the sales guys will talk about how you'll get great compression rates with HCC (which is true) and how you'll reduce space because you can drop indexes. Through no fault of Oracle's, some people extrapolate that idea and think, "I can drop all indexes...or most indexes."
Before we look at dropping indexes, let's start out with a baseline. I've installed the SOE schema for Swingbench...I'm hitting the SCAN listener of a High Capacity half rack of Exadata using 500 sessions. Some Swingbench settings I've modified:
Number of Users: 500
Min Delay between transactions: 0
Max Delay between transactions: 1
Logon Delay: 10ms
Almost all defaults are being used for the Exadata machine. There would be huge gains in performance if this was tweaked...but for our purposes, this setup is fine.
When I first ran this test, the best I could do was about 310K transactions per minute, which is blistering fast...but then I learned we had a hardware failure that was preventing the use of one of our storage cells. The failed equipment was replaced and I re-ran the test.
I was able to get just over 400K transactions per minute (in red above)...that's very fast. For a frame of reference, our legacy system test was on an LPAR (IBM's VM) with 16 CPUs on fast 15K RPM disks backed by an EMC CLARiiON CX4...definitely no slouch, but it was 5 years old, and it was only able to produce ~63K TPM.
OCS' plan is to compress all tables (that aren't being replicated by GoldenGate) with either OLTP or some form of HCC compression and to drop all indexes that aren't either PK/UK indexes or indexes supporting FKs. The patch for GoldenGate that allows it to mine compressed tables still hasn't come out, so that's why those tables are being skipped. We may use Streams on some OLTP-compressed tables, since the 11.2.0.2 version does have the ability to mine archivelogs of compressed tables. Let's try some OLTP compression on Exadata. I altered the tables to be "compress for OLTP."
select 'alter table SOE.'||table_name||' compress for OLTP;'||chr(13)||
       'alter table SOE.'||table_name||' move;'
  from dba_tables
 where owner = 'SOE'
   and partitioned = 'NO'
union all
select 'alter table SOE.'||table_name||' modify partition '||partition_name||' compress for OLTP;'||chr(13)||
       'alter table SOE.'||table_name||' move partition '||partition_name||';'
  from dba_tab_partitions
 where table_owner = 'SOE';
...and then I rebuilt all the indexes.
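The moves leave the indexes on those tables unusable, so the rebuild statements can be generated the same way. This is just a minimal sketch of that step...non-partitioned indexes show their status in dba_indexes, while local index partitions have to be picked up from dba_ind_partitions instead:

-- generate rebuilds for whatever the moves invalidated
select 'alter index SOE.'||index_name||' rebuild;'
  from dba_indexes
 where owner = 'SOE'
   and status = 'UNUSABLE';

-- local index partitions are tracked per-partition
select 'alter index SOE.'||index_name||' rebuild partition '||partition_name||';'
  from dba_ind_partitions
 where index_owner = 'SOE'
   and status = 'UNUSABLE';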
The new baseline test results are nearly 450K trans/min! This is the fastest Swingbench score I've ever seen, and this is only on a half rack of relatively slow 7,200 RPM high capacity SAS storage! The other 3 high performance machines just arrived today...I can't wait to see what they can do! Granted, I expect there was serious caching in the db_cache...but this still says a lot about the compute nodes, since decompression can be done on either the compute or storage nodes and all compression is done on the compute nodes.
Next (as planned by our Oracle architects), I mark "unusable" all the indexes that aren't PK/UK indexes or indexes supporting FKs. I'm leaving the function-based index usable, though.
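Generating those statements looks something like the sketch below. It's assumption-laden: it only excludes the PK/UK indexes (which dba_constraints identifies directly), finding the indexes that merely support FKs takes an extra join of dba_ind_columns against dba_cons_columns, and the function-based indexes are skipped via index_type:

select 'alter index SOE.'||index_name||' unusable;'
  from dba_indexes
 where owner = 'SOE'
   and index_type = 'NORMAL'              -- leaves the function-based indexes alone
   and index_name not in (select index_name
                            from dba_constraints
                           where owner = 'SOE'
                             and constraint_type in ('P','U')
                             and index_name is not null);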
I restart Swingbench and begin the "index reduced" run...this is what I see:
All 96 threads on all servers are pegged at 100% user (in green above)...almost no CPU is used by system or IO waits. My response time is through the roof reaching 1266ms (red line above), and Transactions/Min has plummeted to less than 20K.
I ran ADDM/AWR reports and checked out the high waits during the "reduced index" test. What's happening is that, without indexes on the columns used in where clauses, we're forcing full table scans, but only 2% of the time is being spent actually doing the FTS.
This is exactly why I wanted to run these tests. All over the internet you can find results of tests where somebody times a single query that hits a storage index...things act differently when there's contention in an OLTP environment.
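If you don't want to pull a full AWR report, the instance-wide top waits can be eyeballed with something like this (a rough sketch...v$system_event is cumulative since instance startup, so for a clean comparison you'd snapshot it before and after the run and diff the numbers):

select *
  from (select event, total_waits,
               round(time_waited_micro/1e6) as seconds_waited
          from v$system_event
         where wait_class <> 'Idle'
         order by time_waited_micro desc)
 where rownum <= 10;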
The high wait was "enq: KO - fast object checkpoint". One of the features of 11g is that the optimizer can choose to use direct path reads even if you don't specify the append hint or make the transaction parallel...this is called a serial direct read. Direct path reads are necessary to use storage indexes and HCC compression...and basicly to take advantage of all the special sauce Exadata offers. To begin the direct path read, a segment checkpoint is needed. The idea is that in a direct path read, we're going to pull the data straight from disk (or flash card) into memory in the PGA, which is very fast...you avoid all the SGA overhead. If there are dirty blocks in the SGA, we'd get incorrect results...so the dirty blocks are flushed to disk before we start our direct read.
As Tanel Poder put it, "You see this event because of object level checkpoint which happens when you run direct path reads on a segment (like with serial direct read full table scan or parallel execution full segment scans).
Now the question is-how much of your session's response time is spent waiting for that event - and whether the winning in query speed due to direct read/parallel execution outweighs the time spent waiting for this checkpoint."
In a DW/DSS environment, this feature will accelerate performance. With fewer processes to contend with, the chance of creating a long "fast object checkpoint" wait is reduced, and your IO transfer is faster, but we're implementing Exadata in an OLTP environment.
This isn't an Exadata-specific issue...it's just the way Oracle works on a hot segment with direct path reads in 11g+. By dropping the indexes we're forcing the optimizer to do DW-style IO in an OLTP environment. We're handcuffing the Oracle optimizer and taking away its options. The answer for this as we migrate will be to replace the indexes that were removed...and basically to only drop indexes that aren't used very often.
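To figure out which indexes "aren't used very often", basic index usage monitoring is one (coarse) option. A sketch, with the caveats that v$object_usage only reports on indexes owned by the connected schema, only answers yes/no rather than how often, and an index kept purely to avoid FK locking issues can show as unused even though you still want it:

-- generate the monitoring statements
select 'alter index SOE.'||index_name||' monitoring usage;'
  from dba_indexes
 where owner = 'SOE';

-- ...run a representative workload window, then check as SOE:
select index_name, monitoring, used, start_monitoring, end_monitoring
  from v$object_usage;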
The take-aways from this are:
1. Don't believe you can just "drop all indexes" or that Exadata doesn't need indexes...your workload, not the hardware, dictates this. Exadata will handle a forced FTS better than anything else out there, but for an OLTP environment, indexes aren't optional.
2. OLTP compression on Exadata is extremely fast. OLTP compression performance has always been a tradeoff between the time saved on IO and the extra CPU time spent compressing and decompressing. With the fast Nehalem EP CPUs in the X2-2, compression actually improves performance.