ORA-02050 Remote DB in Doubt? How About Gone for Years??

An interesting issue: I was working on an interrupted distributed (remote) transaction that I'd come across in one of our smaller reporting systems, one that had been attempting to rerun since 2008 against a decommissioned database (yes, I said 2008…). This is a very subtle issue, as Oracle thinks it's just taking care of a problem behind the scenes, but, like I said, the remote database no longer existed; it was taken down while this remote process was running. The problem involved three databases: the reporting database, which we'll call REPRT_1, the current production warehouse, PROD_1, and the older production database, PROD_X.

Oracle kept submitting a new transaction in an attempt to complete this circle. When it couldn't find the PROD_X remote database (the last follow-through was on 2/28/2010), it rolled back, and it had been doing this in REPRT_1 for quite some time.
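Distributed-transaction recovery is driven by the RECO background process. As a quick sanity check (my own generic sketch, not from the original post), you can confirm RECO is alive on the local database:

```sql
-- Confirm the RECO background process is running on REPRT_1
SELECT program
  FROM v$process
 WHERE program LIKE '%RECO%';
```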

Here’s the pending transaction:
LOCAL_TRAN_ID GLOBAL_ORACLE_ID               STATE        RECO_TIME
——————————————————————————————
6.114.203                 PROD_X.41a46747.6.114.203     collecting     2/28/2010
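The output above can be produced with something along these lines. Note this is a sketch: GLOBAL_ORACLE_ID, STATE, and RECO_TIME are exposed by SYS.PENDING_TRANS$, the base table behind the DBA_2PC_PENDING view, rather than by the view itself:

```sql
-- Pull the stuck distributed transaction from the base dictionary table
SELECT local_tran_id,
       global_oracle_id,
       state,
       reco_time
  FROM sys.pending_trans$;
```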

When did it originally fail?
FAIL_TIME
8/26/2008
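The failure time comes straight from DBA_2PC_PENDING; a minimal query, assuming the transaction ID shown earlier:

```sql
-- When did the distributed transaction originally fail?
SELECT fail_time
  FROM dba_2pc_pending
 WHERE local_tran_id = '6.114.203';
```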

How do we trace this back?

Here’s an example of a failure from the log:

Error 3113 trapped in 2PC on transaction 1.120.13876. Cleaning up.
Error stack returned to user:
ORA-02050: transaction 1.120.13876 rolled back, some remote DBs may be in-doubt
ORA-03113: end-of-file on communication channel
ORA-02063: preceding line from PROD_1

Fri Mar 05 09:21:48 2010
DISTRIB TRAN REPRT_1.764783ad.1.120.13876
is local tran 1.120.13876 (hex=01.78.3634)
insert pending collecting tran, scn=72519859225 (hex=10.e2853c19)

If you look at the entry below, it matches the RECO_TIME in the select for the pending transaction above.

Sun Feb 28 17:42:39 2010
Errors in file /u01/app/oracle……/reprt_1_reco_20549.trc:

ORA-02019: connection description for remote database not found

From the reprt_1_reco_20549.trc trace file:

ERROR, tran=6.114.203, session#=1, ose=0:
ORA-02019: connection description for remote database not found
*** 2010-02-09 19:36:07.388
ERROR, tran=6.114.203, session#=1, ose=0:
ORA-02019: connection description for remote database not found

In this log, you will see an entry for every transaction that has tried to complete the original one, including 1.120.13876 shown in the example from the alert log. These carried through to the committed one that you see at the end.

Since the original transaction, and the database it used to connect to, were no longer available, this was a challenge. All the traditional means of dealing with the issue (a standard COMMIT FORCE, a ROLLBACK FORCE, exec dbms_transaction.purge_lost_db_entry('6.114.203'), manually inserting entries into the pending transaction tables, etc.) failed because the transaction's status was stuck at "collecting."
Example:
COMMIT FORCE '1.108.13875';
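For reference, the full set of traditional cleanup attempts would look something like this in SQL*Plus (a sketch using the pending transaction's ID; each of these failed in this case):

```sql
COMMIT FORCE '6.114.203';
ROLLBACK FORCE '6.114.203';
EXEC DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('6.114.203');
```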

ORIGINAL TRANSACTION FROM DBA_2PC_PENDING:

LOCAL_TRAN_ID  GLOBAL_ORACLE_ID              FAIL_TIME  RECO_TIME
-----------------------------------------------------------------
6.114.203      REPRT_1.41a46747.6.114.203    26-AUG-08  28-FEB-10   <-- yes, that says 2008!!

At this point, since Metalink and googling hadn't turned up any resolutions, I concluded it was time to take the information I'd gained from these documents and start thinking up my own. DBAs do enter data into these tables, so is it possible to address the "collecting" status in the existing entry?

I'm quite close, but can't seem to move the transaction beyond "collecting" due to this odd set of circumstances, and I seem to be caught in a loop… hmmm…

Step 1:
Is the actual transaction "recognized" in the KT* tables at all?

SELECT ktuxeusn, ktuxeslt, ktuxesqn, /* transaction ID */
       ktuxesta status,
       ktuxecfl flags
  FROM x$ktuxe
 WHERE ktuxeusn = 6;

Nope, confirmed: 6.114.203 doesn't seem to exist there. So after retaining the current information from dba_2pc_pending, I hold my breath and go on the attack. (Note: this is a last-resort step for this type of situation, where the remote database has been decommissioned for a long time!)

Step 2:
First, make it stop trying to recover the transaction-
ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;

Second, update the status from collecting to committed:
update sys.dba_2pc_pending
   set status = 'committed'
 where status = 'collecting'
   and local_tran_id = '6.114.203';

commit;

exec dbms_transaction.purge_lost_db_entry('6.114.203');

PL/SQL procedure successfully completed.
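Before re-enabling recovery, it's worth a quick check (my own addition, not from the original run) on whether the purge removed the entry immediately; if not, RECO will clean it up once re-enabled:

```sql
-- Expect 0 once the purge (or RECO, after re-enabling) has done its work
SELECT COUNT(*) AS stuck_entries
  FROM dba_2pc_pending
 WHERE local_tran_id = '6.114.203';
```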
Step 3:
Now that it’s committed, let it clean it up!
ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;

Step 4:
Review that ever growing trace file again!
Now I see at the end of the trace file, /u01/app/oracle……/reprt_1_reco_20549.trc:

*** 2010-03-05 11:59:12.699
is local tran 8.101.13783 (hex=08.65.35d7)
delete pending committed tran, scn=72630478954 (hex=10.e91d286a)

Finally!

Now we just need to verify that the rows disappear from each of these tables as distributed recovery recognizes that the process is no longer collecting and does its job!

SELECT * from dba_2pc_pending;
SELECT * from dba_2pc_neighbors;
select * from sys.pending_trans$;
select * from sys.pending_sessions$;

The above should all be empty in a short amount of time…
Monitor your alert log, verify that you have resolved the issue successfully and in my case, the problem was resolved.

March 7th, 2010

Comments:

  • oraclenerd

    OCD/ADHD indeed! 🙂

    Cool story…most of us mere mortals would have let it die on the vine.

  • Martin Berger

    wow, great approach!
    Now I have to learn x$kt* 😉

    As you have not listed all the errors you got, have you tried/thought about _smu_debug_mode?


    alter session set "_smu_debug_mode" = 4;
    commit; -- MUST BE ADDED TO PREVENT ORA-1453
    execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

  • Kellyn Pedersen

    Chet!! 🙂 Only one former boss accused me of OCD, the rest just say I'm "persistent"..:D

    And Martin, I did try the debug with _smu_debug_mode but couldn't get any expected responses once I performed the alter session command. As this process appeared to only "partially" exist in the database, causing the failure to recover the distributed transaction or purge it successfully, I think this is why I was not successful with the debug option. I considered inserting what I thought was the missing data into the tables, but just updating the existing entry to bypass the "collecting" status seemed easier!

  • Kellyn Pedersen

    And Chet- I figured if it hadn't died since 2008, it probably wasn't going to go away anytime soon without intervention. This is that same APEX system I was talking to you about, too! (I sooooo still love wrapped code- NOT!!)

  • JK_Bangalore

    Thank you.

  • Amar

    I am pretty sure that even without updating the dba_2pc_pending table, dbms_transaction.purge_lost_db_entry would have run successfully to delete the old orphan entry.

    Many times I have noticed this scenario when doing an MView refresh from a remote database: the remote connection gets reset and the MView database gets this error. I have to follow a two-step approach, deleting the transaction using dbms_transaction and then rerunning the MView refresh job.

  • It didn't in this situation, and this is a pretty old post from a 10g environment from a couple of years ago.
    Thanks!
    Kellyn
