There are 10,000 blog posts and Oracle docs on the internets (thank you, Mr. Bush) about improving Data Pump performance. It's one of the most heavily used features in Oracle shops around the world, and DBAs are under huge stress to meet impossible downtime SLAs for their exports and imports. I think they all miss the best tweak (ADOP)...but they basically boil down to one simple fact: outside of parallelism, if you have a well-tuned database on fast storage, there's not much more you can do to improve performance beyond a few percent. The obvious improvements are:
1. Parallelize! To quote Inigo Montoya, "I do not think it means what you think it means."
This creates multiple worker processes, and each one takes a single object and works on it, usually serially. That's great, and if all your objects are equally sized, it's perfect...but that's not typically reality. Usually you have a few objects that are much bigger than the rest, and by default each of them gets only a single process. This limit really hurts during imports, when you need to rebuild large indexes...more on this later.
Check that it's working as expected by hitting Ctrl-C and typing STATUS at the interactive prompt. Ideally, you should see all the parallel workers busy. Check not just that they exist, but that they're actually working (if they aren't, verify you used %U in your dump file name), e.g.: DUMPFILE=exaexp%U.dmp PARALLEL=100
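For example, here's a minimal full-export sketch (the directory object, file names, and parallel degree are illustrative; size PARALLEL to your CPU and I/O capacity rather than just picking a big number):

    expdp system FULL=Y DIRECTORY=dpump_dir DUMPFILE=exaexp%U.dmp LOGFILE=exaexp.log PARALLEL=16

The %U wildcard lets each worker write to its own dump file; without it, the workers serialize on a single file and PARALLEL buys you very little.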
2. If you're importing into a new database, you have the flexibility to make some temporary changes to tweak things. Verify disk asynchronous I/O is enabled (DISK_ASYNCH_IO=TRUE) and disable block verification (DB_BLOCK_CHECKING=FALSE, DB_BLOCK_CHECKSUM=FALSE). These aren't game changers, but they'll give you a 10-20% improvement, depending on how you had them set previously. Just remember to turn the verification back on once the import is done.
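A minimal sketch of those changes (the checking/checksum parameters are dynamic; DISK_ASYNCH_IO is static, so it's staged in the spfile and takes effect at the next restart):

    -- dynamic: take effect immediately
    ALTER SYSTEM SET db_block_checking = FALSE;
    ALTER SYSTEM SET db_block_checksum = FALSE;
    -- static: applied at the next restart
    ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE=SPFILE;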
3. Memory Settings - Data Pump parallelization uses some of the Streams APIs, so the streams pool comes into play. Make sure you have enough memory for the shared_pool_size, streams_pool_size and db_cache_size parameters. Consult your gv$streams_pool_advice, gv$shared_pool_advice, gv$db_cache_advice and gv$sga_target_advice views. I like to tune it so that, as the delta in ESTD_DB_TIME_FACTOR from one row to the next approaches zero, the corresponding size factor is close to 1. (Any more than that is wasted memory; any less is lost performance.)
Sometimes you'll see a huge dropoff and it's even clearer than in this example...but you get the idea. If you're importing into a new database, you'll need to do a dry run of the import and then check these views to make sure this is tuned well.
SGA_SIZE (MB) | SGA_SIZE_FACTOR | ESTD_DB_TIME | ESTD_DB_TIME_FACTOR | Time Factor Diff
------------- | --------------- | ------------ | ------------------- | ----------------
       158720 |          0.5000 |     12162648 |              1.2045 |
       178560 |          0.5625 |     11421479 |              1.1311 |           0.0734
       198400 |          0.6250 |     10937800 |              1.0832 |           0.0479
       218240 |          0.6875 |     10607608 |              1.0505 |           0.0327
       238080 |          0.7500 |     10604578 |              1.0502 |           0.0003
       257920 |          0.8125 |     10375361 |              1.0275 |           0.0227
       277760 |          0.8750 |     10222886 |              1.0124 |           0.0151
       297600 |          0.9375 |     10101716 |              1.0004 |           0.0120
       317440 |          1.0000 |     10097674 |              1.0000 |           0.0004
       337280 |          1.0625 |     10017905 |              0.9921 |           0.0079
       357120 |          1.1250 |      9955300 |              0.9859 |           0.0062
       376960 |          1.1875 |      9908850 |              0.9813 |           0.0046
       396800 |          1.2500 |      9905821 |              0.9810 |           0.0003
       416640 |          1.3125 |      9870479 |              0.9775 |           0.0035
       436480 |          1.3750 |      9844225 |              0.9749 |           0.0026
       456320 |          1.4375 |      9826049 |              0.9731 |           0.0018
       476160 |          1.5000 |      9821001 |              0.9726 |           0.0005
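If you'd rather have the database compute that diff column for you, a sketch against gv$sga_target_advice (the analytic LAG subtraction is my addition; filter inst_id for your node if you're on RAC):

    SELECT sga_size, sga_size_factor, estd_db_time, estd_db_time_factor,
           LAG(estd_db_time_factor) OVER (ORDER BY sga_size)
             - estd_db_time_factor AS time_factor_diff
      FROM gv$sga_target_advice
     WHERE inst_id = 1
     ORDER BY sga_size;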
4. Something often missed when importing into a new database: size your redo logs to be relatively huge. The redo logs work like a circular cache. If you're adding data extremely fast, eventually the current log fills and can't switch until the next log has been checkpointed and archived. "Huge" is relative to the size and speed of your database and hardware. While your import is running, select * from v$log and make sure you always see at least one INACTIVE log ahead of the CURRENT log.
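A quick sketch for keeping an eye on that and adding bigger groups (the group numbers and 8G size are illustrative, and the ADD LOGFILE syntax below assumes Oracle-managed files so the member names can be omitted):

    -- watch the log cycle during the import
    SELECT group#, bytes/1024/1024 AS size_mb, status
      FROM v$log
     ORDER BY group#;

    -- add larger groups; drop the small ones once they go INACTIVE
    ALTER DATABASE ADD LOGFILE GROUP 5 SIZE 8G;
    ALTER DATABASE ADD LOGFILE GROUP 6 SIZE 8G;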
The best, virtually unused tweak is in the next post....
This post: The last secret tweak for improving datapump performance (Part 1)
Next post: The last secret tweak for improving datapump performance (Part 2)