Fast Parallel Rollback and Large Transactions

SMON spit out the error below after a bounce of the database, during the rollback of a huge insert into the largest table in one of our warehouse environments. I already had serious concerns about a poor execution plan and wondered what we were in for on the rollback, with good reason. The source of the performance hit was a combination of missing column statistics on the main table used for the joins from the staging table, plus fragmentation caused by a poor design choice (yes, get exchange partition in there ASAP!)

Here is the issue, along with the fix: parallel rollback waits that actually complete faster when the rollback is run serially.

/u01/app/oracle/admin/xxxx/bdump/xxxx_smon_8262.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Node name:      xxxxxxx

Machine:        x86_64

Instance name: xxxx

Redo thread mounted by this instance: 1

Oracle process number: 15

Unix process pid: 8262, image: oracle@xxxx (SMON)

*** 2011-06-08 11:18:38.941

*** SERVICE NAME:(SYS$BACKGROUND) 2011-06-08 11:18:38.941

*** SESSION ID:(872.1) 2011-06-08 11:18:38.941

Serial Transaction recovery caught exception 30319

*** 2011-06-08 11:22:46.742

Serial Transaction recovery caught exception 30319

*** 2011-06-08 11:26:47.117

Serial Transaction recovery caught exception 30319

*** 2011-06-08 11:30:47.490

Serial Transaction recovery caught exception 30319

Instead of leaving this at "LOW" (the default) or raising it to "HIGH" for parallel rollback, setting it to "FALSE" is the best choice for a large, complicated transaction (as there are other transactional processes updating the tables involved in this rollback!)

SQL> ALTER SYSTEM SET fast_start_parallel_rollback='FALSE';  -- temporarily set for this rollback of a large transaction!

System altered.
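To confirm the change took effect (and to see what it was set to beforehand), a quick check of the parameter never hurts; after the ALTER above you would expect to see:

SQL> show parameter fast_start_parallel_rollback

NAME                                  TYPE        VALUE
------------------------------------- ----------- ------
fast_start_parallel_rollback          string      FALSE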

SQL> select * from x$ktprxrt;

no rows selected
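x$ktprxrt lists the transactions queued up for parallel rollback, so no rows here means nothing is being recovered in parallel anymore. As an extra cross-check (my addition, not part of the original session), V$FAST_START_SERVERS should likewise come back empty once recovery is serial:

SQL> select state, undoblocksdone, pid, xid from v$fast_start_servers;

no rows selected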

SQL> select * from x$ktuxe where KTUXECFL='DEAD' and KTUXESTA='ACTIVE';

00002B5A86A2AA00  10344  1  50  191  693218  144  73959  4246983573  41  ACTIVE  DEAD  2  0  0  0  0  0  840668
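That wall of numbers is easier to read if you name the columns you care about. Here is a version of the same query I find more useful for tracking progress; KTUXESIZ is the count of undo blocks still left to apply (the 840668 at the end of the row above), so you want to watch it count down:

SQL> select ktuxeusn usn, ktuxeslt slt, ktuxesqn seq,
            ktuxesta, ktuxecfl, ktuxesiz undo_blks_left
       from x$ktuxe
      where ktuxecfl = 'DEAD' and ktuxesta = 'ACTIVE';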

SQL> select * from GV$FAST_START_TRANSACTIONS ;

   INST_ID  USN  SLT     SEQ  STATE       UNDOBLOCKSDONE  UNDOBLOCKSTOTAL  PID  CPUTIME  PARENTUSN  PARENTSLT  PARENTSEQ  XID               PXID  RCVSERVERS

         1   50  191  693218  RECOVERING            1875           842490            865                                  3200BF00E2930A00                 0

Watch the UNDOBLOCKSDONE column to see how many blocks per minute are being worked through. This should go up considerably now that the rollback is serial. If you still see waits on it, do check for fragmentation, which happened to be my secondary issue here: at some point someone had slipped code in that carved up such a large dissection of data before the loads and updates instead of using exchange partition.
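If you would rather see a percentage than eyeball raw block counts, a convenience query along these lines (my own, against the same view) does the math for you:

SQL> select inst_id, usn, slt, seq, state,
            undoblocksdone, undoblockstotal,
            round(undoblocksdone/undoblockstotal*100, 2) pct_done
       from gv$fast_start_transactions;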

After the change, I started monitoring the process and noted exceptionally improved recovery performance in the block counts.

***The one thing to note: if you update this parameter during a rollback, the rollback will START OVER!! I watched it do this twice, as I didn't believe my eyes the first time! Please take into consideration how far into the process you may be before updating the parameter, as with this little gotcha it may be quicker to just let the slower run complete!
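To put numbers on that decision, sample UNDOBLOCKSDONE twice and work out the rate before you touch the parameter. A rough back-of-the-envelope sketch (assumes a single recovering transaction and execute privilege on DBMS_LOCK; adjust the predicate to your XID):

SQL> set serveroutput on
SQL> declare
       b1 number; b2 number; tot number;
     begin
       select undoblocksdone, undoblockstotal into b1, tot
         from v$fast_start_transactions where rownum = 1;
       dbms_lock.sleep(60);  -- one-minute sample window
       select undoblocksdone into b2
         from v$fast_start_transactions where rownum = 1;
       dbms_output.put_line('blocks/min: '||(b2 - b1)||
         ', est. minutes left: '||round((tot - b2)/greatest(b2 - b1, 1)));
     end;
     /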

Noting that there were significant waits again on the second run of the inserts into this partitioned table, I happened to refer to a segment fragmentation report I'd just submitted to my manager, and it turned out that the ID# for this load was the same as the # on the partitions of the same object we were having trouble loading into.
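I can't reproduce the fragmentation report here, but a rough first pass is just to look at the space allocated per partition in DBA_SEGMENTS and compare it against what the row counts say should be there (the table name below is a placeholder, not our actual object):

SQL> select segment_name, partition_name, blocks, round(bytes/1024/1024) mb
       from dba_segments
      where segment_name = 'BIG_FACT_TABLE'        -- placeholder name
        and segment_type = 'TABLE PARTITION'
      order by partition_name;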

After rebuilding the partition and its indexes, then updating the code to collect statistics correctly, the load completed successfully. (Now, if we can just get the time to change this code over to exchange partition before the load process comes back around next month! :))
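For anyone who wants the shape of the fix, here is a sketch of the steps involved; the table, partition, index, and staging-table names are placeholders, not our actual objects:

SQL> alter table big_fact_table move partition p_201106;          -- defragment the partition
SQL> alter index big_fact_table_ix rebuild partition p_201106;    -- local indexes go UNUSABLE after the move
SQL> exec dbms_stats.gather_table_stats(user, 'BIG_FACT_TABLE', partname => 'P_201106', cascade => true)

And the exchange partition approach we want in place before next month's run, which swaps the staged data in as a dictionary-only operation instead of inserting it row by row:

SQL> alter table big_fact_table exchange partition p_201107 with table stage_table
       including indexes without validation;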
