Monday, February 21, 2011

Exadata feature performance overhead

We're getting ready to migrate 88+ large non-RAC OLTP databases from an IBM P5 595 into a single RAC database running on 3 High Performance Exadata machines and 1 High Capacity machine. We're having Oracle Consulting Services do the actual architectural design and migration work, and I'm here to keep them on their toes and make sure the recommendations work from a holistic perspective on the overall system. There are a lot of changes that will be made during this consolidation migration...even a single variable can have a negative effect on the entire system. Before I get into the variables I see as having the largest impact (and before I test them), let's talk about the hardware a bit.

Exadata is much more than a "database-in-a-box." It's a set of compute nodes (think RAC node servers) combined with ultra-fast InfiniBand (...plus 10Gb Ethernet and multiple 1Gb Ethernet ports) and storage cells. Storage cells are basically "helper" servers with local storage. The database sends requests to a storage cell (which it sees as an ASM disk), and the cell can either handle the request itself and send back just the results (offloading the workload), or behave like a traditional ASM disk...returning the blocks to the compute node. New concepts in the storage cells include storage indexes (dynamically created, in-memory structures maintained on the cells that may or may not be used to retrieve your data) and compression/encryption offloading (CPUs in the storage cells decrypt and decompress the data when that's in your best interest; sometimes Oracle will choose to send the compressed blocks to the compute node instead).

When doing incremental backups, the storage cells send only the changed blocks to the compute nodes to accelerate performance. There's a misconception out there that RMAN backups can be completely offloaded to your storage cells...this (mostly) isn't true. What Oracle means by that is that the CPU interrupts from accessing the spindles are handled by the storage cell...all the rest of the work is handled by your compute node. Coming from a database on a server with local storage, that's a big improvement...for a database that uses a SAN, it's not really an improvement.
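To make the offloading idea concrete, here's a minimal Python sketch of the two paths described above: a "traditional" path where the cell ships whole blocks and the compute node filters, versus an offloaded path where the cell filters and projects first. This is purely illustrative pseudocode of the concept, not Oracle's implementation; all names and data here are made up.

```python
# Hypothetical sketch of smart-scan-style offloading vs. traditional block I/O.
# Not Oracle code -- it only illustrates the division of labor described above.

def traditional_read(blocks):
    """Block-serving path: the cell ships every block; the compute node filters."""
    return [row for block in blocks for row in block]  # everything crosses the wire

def smart_scan(blocks, predicate, project):
    """Offloaded path: the cell filters and projects, shipping only results."""
    return [project(row) for block in blocks for row in block if predicate(row)]

# Two "blocks" of (order_id, status, amount) rows sitting on a storage cell:
blocks = [
    [(1, "OPEN", 10.0), (2, "CLOSED", 25.0)],
    [(3, "OPEN", 7.5), (4, "CLOSED", 3.0)],
]

shipped_blocks = traditional_read(blocks)               # all 4 rows cross the wire
shipped_results = smart_scan(blocks,
                             lambda r: r[1] == "OPEN",  # WHERE status = 'OPEN'
                             lambda r: r[0])            # SELECT order_id
print(len(shipped_blocks))   # 4
print(shipped_results)       # [1, 3]
```

The point of the comparison: the query result is identical either way, but the offloaded path moves far less data between storage and compute...which is exactly the win (and the cell-side CPU cost) I want to measure later.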

Storage indexes are a unique feature that lets us drop indexes and still perform very fast full table scans. Exadata creates storage indexes dynamically, based on the first queries to hit the tables in question. The first time you access a table, you perform a plain full table scan; the storage cells then build a storage index for it. The next time you query that table, the storage cells have the option of using the storage index to skip regions of storage that can't contain your data. There are a lot of great blogs on storage indexes out there, so I won't rehash the topic again, except to say: in theory, you can do a full table scan and have your results returned almost as fast as if you had done an index range scan...and depending on the amount of data you're returning and how it's laid out (clustering factor), maybe even faster.
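Oracle doesn't publish the internals, but conceptually a storage index behaves like a zone map: for each storage region the cell tracks min/max values per column, and a scan can skip any region whose range can't match the predicate. A minimal Python sketch of that idea (everything here is hypothetical, including the region size and data):

```python
# Hypothetical zone-map sketch of how a storage index lets a full scan skip I/O.
# Per storage region, track (min, max) for a column; skip regions out of range.

def build_storage_index(regions, col):
    """Built as a side effect of the first full scan: (min, max) per region."""
    return [(min(r[col] for r in region), max(r[col] for r in region))
            for region in regions]

def scan_with_index(regions, index, col, value):
    """Full scan that consults the storage index before reading each region."""
    rows, regions_read = [], 0
    for region, (lo, hi) in zip(regions, index):
        if not (lo <= value <= hi):
            continue  # I/O avoided: this region can't contain the value
        regions_read += 1
        rows.extend(r for r in region if r[col] == value)
    return rows, regions_read

# Data that happens to be well clustered on "id" (good clustering factor):
regions = [
    [{"id": 1}, {"id": 2}],
    [{"id": 3}, {"id": 4}],
    [{"id": 5}, {"id": 6}],
]
idx = build_storage_index(regions, "id")
rows, regions_read = scan_with_index(regions, idx, "id", 4)
print(rows, regions_read)  # [{'id': 4}] 1  -- only 1 of 3 regions read
```

Note how the benefit depends on clustering: if the ids were scattered randomly across the regions, every region's (min, max) range would cover the whole domain and nothing could be skipped...which is why clustering factor matters so much here.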

Other great Exadata features include multiple types and levels of compression (HCC/OLTP) that are, in some situations, handled on the storage cells themselves. The marketing guys at Oracle will say you can take your database "as-is" and drop it into Exadata. Some people take that to the extreme and claim you don't need indexes at all...which is a misunderstanding of the technology. Oracle's Advanced Consulting Services is planning to drop all indexes except those supporting FKs and PKs, and to compress almost everything else via OLTP or HCC compression. They'll then compare performance with the legacy system (via Real Application Testing) and put indexes back where they're needed. What will be the effect on CPU/IO...and ultimately on performance in an OLTP environment?

Again, lots of people on lots of blogs have discussed these features. What I haven't seen blogged about is the overhead of these activities. Yes, dropping indexes and having quick restores may be possible, but what happens on a macro level to the system? Poder and Claussen have talked about storage indexes and their effect on a query...and for a DSS system that's all that matters...but in an OLTP environment you'll have many statements executing simultaneously from hundreds of users. What's the overhead of storage indexes then? What's the overhead of OLTP compression on Exadata in an OLTP scenario?

Let's see if we can get a baseline of what the planned implementation of these features will do. I'll install Swingbench with a 10GB schema and run a series of tests on a half-rack (4 compute nodes, 7 storage cells) High Capacity Exadata machine. Swingbench is an excellent graphical tool that provides performance information and simulates an OLTP-type load. Please don't treat these tests as something you can use to determine absolute Exadata performance...there are a lot of limiting factors outside of Exadata in play here (like the network speed between Swingbench and Exadata, etc.). Measure these tests against each other and the variables I introduce.
