Wednesday, July 13, 2011

Be careful when you stop using a v11 Goldengate extract

Just wanted to give a heads up on an issue I caught that could have taken down our Exadata environment.  In older versions of GoldenGate, you create extracts, and each has a .prm file holding its parameter information.  When you no longer want to use an extract, you can be lazy and just leave it there, stopped, or you can delete the .prm file and drop the extract.

When you do this, verify that the number of extracts you have running in GGSCI equals the number of entries you see in dba_capture, and make sure the old entries are gone.  There's an enhancement in GG 11 that prevents RMAN from deleting archivelogs that haven't been mined yet.  What actually happens is that RMAN checks min_required_capture_change# in v$database, which is populated every 6 hours based on dba_capture.  So, if you stop using an extract but its entry is still in dba_capture, RMAN won't delete the archivelogs with the usual command:

backup archivelog all delete input;
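To see whether a stale capture entry is what's pinning your archivelogs, you can compare what GGSCI shows against the data dictionary.  A minimal check, run as a DBA (column names are from the 11g dictionary views):

```sql
-- Capture processes registered in the database; an extract you no longer
-- run in GGSCI should NOT still appear here.
SELECT capture_name, status, required_checkpoint_scn
  FROM dba_capture;

-- The SCN that RMAN compares against before deleting archivelogs; it is
-- refreshed from dba_capture roughly every 6 hours.
SELECT min_required_capture_change#
  FROM v$database;
```

If an extract you dropped still shows up in the first query, its required_checkpoint_scn is what's holding min_required_capture_change# back.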


Instead you'll get RMAN-08137 and you may see this in your backup log:


RMAN-08120: WARNING: archived log not deleted, not yet applied by standby

From MOS note 1079953.1, you can force it to delete them:
delete noprompt force archivelog all completed before 'sysdate-10/1440';

...but in our situation we're using a standby database.  The enhancement to not delete logs you still need is great...I don't want them deleted before GG can mine them and Data Guard can apply them.  So forcing the delete seems rash to me.

I've been told by a consultant from Oracle (formerly a consultant from GG) that if you connect to the db and then delete the extract, it will also remove the Streams/dba_capture entry from the database...if you don't connect to the db when you drop the extract, the entry will stay in the db until you manually unregister it.  Also, after you create the extract, when you start it, it verifies there's an entry in dba_capture for that process...if there isn't, it creates one.  He said there's very little documentation on this...just a mention in the release notes.  This change is a pretty big deal: if you aren't aware of it and don't notice the errors in your backup logs, it could stop processing in your database.  What would be *much* better is if, instead of just verifying your extract exists, it did something like a sync and made the list in dba_capture match the list in GGSCI.  It would also be nice if they changed the "drop extract" command to require a db login to minimize the issue.
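Based on that description, the safe drop sequence in GGSCI would look something like this (the extract name and credentials here are placeholders, not from the original post):

```
GGSCI> DBLOGIN USERID ggsuser, PASSWORD ********
GGSCI> STOP EXTRACT myext
GGSCI> DELETE EXTRACT myext
```

With the DBLOGIN in place, the DELETE EXTRACT should also unregister the corresponding capture entry from dba_capture, rather than leaving it behind.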

Regardless of whether you choose the MOS workaround or unregister the extra Streams entries in dba_capture, be aware that if you don't do something, your FRA or archivelog destination will eventually fill, which will halt your database.
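If you want to keep an eye on how close you are to that wall, one way (using the 11g view name) is to watch FRA consumption by file type:

```sql
-- Percentage of the flash recovery area consumed by each file type;
-- watch ARCHIVED LOG growth if RMAN's deletes are being skipped.
SELECT file_type, percent_space_used, percent_space_reclaimable
  FROM v$flash_recovery_area_usage;
```

A steadily climbing, non-reclaimable ARCHIVED LOG percentage is the early warning sign that something is blocking deletion.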

6 comments:

  1. Andy, this article saved our warehouse db during a crisis! We had goldengate extracts that were mysteriously slowing down and hanging.

    To remove the entries we:

    info detail
    -- pull all of the relevant thread/rba/seq info
    dblogin
    delete
    add ...
    add exttrail...
    alter , extseqno, extrba, thread 1
    ...
    alter , extseqno, extrba, thread N

    start

    BAM! Problem resolved.
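    Spelled out with placeholder names (the extract name, trail, credentials, and thread positions below are hypothetical, not the poster's), that sequence might look like:

    ```
    GGSCI> INFO EXTRACT myext, DETAIL    -- note each thread's seqno/rba
    GGSCI> DBLOGIN USERID ggsuser, PASSWORD ********
    GGSCI> DELETE EXTRACT myext
    GGSCI> ADD EXTRACT myext, TRANLOG, THREADS 2, BEGIN NOW
    GGSCI> ADD EXTTRAIL ./dirdat/aa, EXTRACT myext
    GGSCI> ALTER EXTRACT myext, THREAD 1, EXTSEQNO 1234, EXTRBA 567890
    GGSCI> ALTER EXTRACT myext, THREAD 2, EXTSEQNO 2345, EXTRBA 678901
    GGSCI> START EXTRACT myext
    ```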

    Thanks again!

    ReplyDelete
  2. That's great...I'm glad it helped you. I figured that if I almost hit the issue, other people would run into it too.

    ReplyDelete
  3. Andy, thanks so much for this, you just saved our bacon. :) We were convinced this must be a problem with our logical standbys until we found your article and realized it was GoldenGate holding up the archive log deletes, due to a stray extract that wasn't in use.

    I wanted to verify - yes, if you don't do the dblogin first, it does leave the extract's entry in dba_capture. I just had this happen. The good news is you can easily fix it by recreating the extract with the same name. Do the dblogin first to be sure. You will get this warning during the recreate:

    2011-12-08 11:50:25 WARNING OGG-01758 This EXTRACT EOHCMQ01 is already registered with the database.
    EXTRACT added.

    Once that's done, you can do the delete again, and this time it will actually remove the entry from dba_capture (make sure you're still in the session with dblogin, or do it again).
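    In GGSCI, that recreate-then-delete fix sketched out (extract name and credentials are placeholders):

    ```
    GGSCI> DBLOGIN USERID ggsuser, PASSWORD ********
    GGSCI> ADD EXTRACT myext, TRANLOG, BEGIN NOW  -- warns OGG-01758: already registered
    GGSCI> DELETE EXTRACT myext                   -- now removes the dba_capture entry too
    ```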

    ReplyDelete
  4. fyi - we had the same problem, but in a restored non-prod copy of the database. Even after installing GoldenGate and trying to use GGSCI to remove the extract, it wouldn't budge from dba_capture. So I put in an Oracle SR. They came back with Oracle note 1351352.1 (https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=HOWTO&id=1351352.1) and this command to remove entries manually from dba_capture - DBMS_CAPTURE_ADM.DROP_CAPTURE:

    select 'exec DBMS_CAPTURE_ADM.DROP_CAPTURE ('''||capture_name||''');' from dba_capture;

    then run whichever bits of the output need to be removed.

    ReplyDelete
  5. Hmmm...that's an interesting scenario that could happen often, too. This is a great feature, but it seems like it wasn't very well thought out. In a Streams or standby environment (what this feature was originally intended for), you'd already be logged in if you went to drop the capture process. In GoldenGate, you may or may not be logged in to the database when you drop the capture process...you likely wouldn't be, since it isn't required. Since Oracle can't know whether your capture process is due to Streams, Active Data Guard, or a standby database - all of which could put entries in this table - the only answer would be to force people to do a dblogin before they drop the capture process in GGSCI in a future version. It's probably a good habit for GG users to adopt now. Anyway, thanks for letting us know about your experience!

    ReplyDelete
  6. Good Post. I have written some details on the exact topic on my blog.


    http://gauthamc.blogspot.com/2011/12/golden-gate-rman-interaction-causing.html

    Let me know if that is useful?
    Gautham

    ReplyDelete