Fast Parallel Rollback and Large Transactions
SMON threw this error after the database was bounced during the rollback of a huge insert into the largest table in one of our warehouse environments. I already had serious concerns about a poor execution plan and wondered what we were in for on the rollback, and with good reason. The source of the performance hit was a combination of missing column statistics on the main table used for the joins from the staging table and fragmentation caused by a poor design choice (yes, get exchange partition in there ASAP!)
Here is the issue, along with the fix, for parallel rollback waits that actually complete faster when the rollback is run serially:
/u01/app/oracle/admin/xxxx/bdump/xxxx_smon_8262.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Node name: xxxxxxx
Machine: x86_64
Instance name: xxxx
Redo thread mounted by this instance: 1
Oracle process number: 15
Unix process pid: 8262, image: oracle@xxxx (SMON)
*** 2011-06-08 11:18:38.941
*** SERVICE NAME:(SYS$BACKGROUND) 2011-06-08 11:18:38.941
*** SESSION ID:(872.1) 2011-06-08 11:18:38.941
Serial Transaction recovery caught exception 30319
*** 2011-06-08 11:22:46.742
Serial Transaction recovery caught exception 30319
*** 2011-06-08 11:26:47.117
Serial Transaction recovery caught exception 30319
*** 2011-06-08 11:30:47.490
Serial Transaction recovery caught exception 30319
Instead of leaving fast_start_parallel_rollback at "LOW" (the default) or raising it to "HIGH", setting it to "FALSE" is the best choice for a large, complicated transaction like this one, since other transactional processes are updating the tables involved in the rollback!
SQL> ALTER SYSTEM SET fast_start_parallel_rollback='FALSE'; -- temporarily set for this rollback of a large transaction!
System altered.
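To confirm the change took effect and see whether any parallel recovery slaves are still allocated, something like the following should work (a quick sketch; V$FAST_START_SERVERS shows the state of the recovery slave processes, if any remain):

SQL> show parameter fast_start_parallel_rollback

-- Recovery slave processes, if any are still allocated to transaction recovery
select state, undoblocksdone, pid, xid
  from v$fast_start_servers;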
SQL> select * from x$ktprxrt;
no rows selected
SQL> select * from x$ktuxe where KTUXECFL='DEAD' and KTUXESTA='ACTIVE';
00002B5A86A2AA00 10344 1 50 191 693218 144 73959 4246983573 41 ACTIVE DEAD 2 0 0 0 0 0 840668
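The raw x$ktuxe output above is hard to read; a narrower column list makes the dead transaction and its remaining undo much clearer. This is only a sketch, using the usual KTUXE* column names for this fixed table (run as SYS):

-- Dead transactions awaiting recovery and how much undo they still hold
select ktuxeusn usn, ktuxeslt slt, ktuxesqn seq,
       ktuxesta status, ktuxecfl flags,
       ktuxesiz undo_blocks_remaining
  from x$ktuxe
 where ktuxecfl = 'DEAD'
   and ktuxesta = 'ACTIVE';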
SQL> select * from GV$FAST_START_TRANSACTIONS ;
INST_ID USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL PID CPUTIME PARENTUSN PARENTSLT PARENTSEQ XID PXID RCVSERVERS
1 50 191 693218 RECOVERING 1875 842490 865 3200BF00E2930A00 0
Watch the UNDOBLOCKSDONE column to see how many undo blocks per minute are being worked through. This rate should go up considerably now that the rollback is running serially. If you still see waits on it, do check for segment fragmentation, which turned out to be my secondary issue: someone had slipped a change into the code at some point instead of using exchange partition for such a large dissection of data before the loads and updates.
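A simple way to watch the rate is to sample GV$FAST_START_TRANSACTIONS every few minutes and compare UNDOBLOCKSDONE between samples; a rough progress sketch:

-- Rough progress check; run repeatedly and note the change in UNDOBLOCKSDONE
select inst_id, usn, slt, seq, state,
       undoblocksdone, undoblockstotal,
       round(100 * undoblocksdone / nullif(undoblockstotal, 0), 2) pct_done
  from gv$fast_start_transactions
 where state = 'RECOVERING';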
After the change, I started monitoring the process and noted dramatically improved recovery performance in the block counts.
***The one thing to note: if you update this parameter during a rollback, the rollback will START OVER!! I watched it do this twice because I didn't believe my eyes the first time! Consider how far into the process you already are before changing the parameter, as it may be quicker to just let the slower rollback complete!
Noting that there were again significant waits on the second run of the inserts into this partitioned table, I happened to refer to a segment fragmentation report I had just submitted to my manager, and the ID for this load matched the ID on the partitions of the same object we were having trouble loading into.
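For the fragmentation side, a quick sanity check is to look at the space allocated per partition for the object; the segment name below is only a placeholder for the warehouse table in question:

-- Allocated space per partition (segment name is a placeholder)
select segment_name, partition_name,
       round(bytes / 1024 / 1024) mb, blocks, extents
  from dba_segments
 where segment_name = 'BIG_FACT_TABLE'
 order by partition_name;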
After rebuilding the partition and its indexes, then updating the code to collect statistics correctly, the load completed correctly. (Now, if we can just get the time to change the code to use exchange partition before this load process comes back around next month! :))
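For reference, the cleanup and the eventual exchange-partition approach look roughly like this; all of the object names are placeholders, and the exact syntax should be checked against the real table definition:

-- Rebuild the bloated partition; local index partitions go UNUSABLE after the move
alter table big_fact_table move partition p_201106;
alter index big_fact_table_ix1 rebuild partition p_201106;

-- Refresh statistics so the next load gets a sane plan
begin
  dbms_stats.gather_table_stats(
    ownname  => 'DW',
    tabname  => 'BIG_FACT_TABLE',
    partname => 'P_201106',
    cascade  => TRUE);
end;
/

-- Preferred pattern: load into a standalone staging table, then swap it in
alter table big_fact_table
  exchange partition p_201106 with table stg_fact_load
  including indexes without validation;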