RMOUG 2010

I attended, (and presented) at RMOUG this year and although it wasn’t as much of a “class reunion” as I expect, (seeing everyone that I’ve worked with at previous companies) I still did meet up with some previous coworkers and met some really great new DBA’s and Developers. It may just have been me, but I decided with the recession, attendance was down and I just need to help and promote it more along with everyone else! :)

payday loans lenders online

My presentation was on parallel execution and no one seemed to notice it was my first presentation- which is a good sign. It wasn’t as solid as I would have liked- I’m just one of those people that either says something absolutely brilliant or ridiculously IQ’less, so my main goal was too keep my IQ deadspots as minimal as possible during my presentation, (keeping them to conversations in the main conference area where my brain is overwhelmed with too many distractions- yay me, now I only sound like an idiot in front of Oak Table folks! :) )

As for the presentations that I enjoyed, there were a number of ones that made a solid impression on me, (and a list of new features to try out in my new 11G environments!):

  • Robert Freeman’s Advanced RMAN Backup and Recovery
  • Jean-Pierre Dijcks, 11G In Memory Parallel Execution
  • Cary Milsap, Method R-  (My personal choice for tuning!)
  • Mike Messina, 11g Partitioning and Advanced Compression

Muphy’s law, I missed a few presentations I would have liked to have attended due to duty calling, (darn Blackberry, we can’t get away these days, can we?) but I left with a few new tricks to try that I think we end up my sleeve.  I really wanted to see Jean-Pierre’s other presentation on 11g dataloading for data warehouses and have a feeling that I will be finding every bug in the new 11g parallel auto settings, (and he’s going to hate what I’m going to have to do to ensure my environment performs optimally with 11g with parallel….sigh)  His presentation was solid and he had folks asking him great questions that I felt were given too much of a “sales pitch” answer, but at least he gave the best answers he could to the questions at hand.

Cary Milsap’s presentation on method R was high level and enjoyable.  I was first introduced to method R by Dan Fink years ago, but we only worked together for a short period of time and due to his great skill in tuning, it inspired me to research what he referred to as “method R” tuning on my own.  It’s a very logical approach to tuning vs. many other techniques.  The aspects I’ve been able to utilize, have served me well, so anyone who gets a chance to see this presentation should make an effort to attend.

Robert Freeman is easy going and easy to listen to.  He had some great new features for RMAN in 11g to share with us that makes me want to upgrade to 11g faster just so I can start utilizing them.  The idea that I can take an incremental backup, apply it to my level 0, so that I would have the same requirements to restore as I would if I were restoring from a standard full backup is a feature that would sell my manager on incrementals.

Mike Messina, Mr. Freebie of TUSC.  You are never bored in his presentations.  They are always incredibly interactive-  tons of questions, input from the audience and discussions on the features.  He rewards his audience’s contributions with prizes of mugs, t-shirts, mousepads and even luggage tags-  and they love it! 

I really wanted to speak with Alex Gorbachev and Cary Milsap, along with talking with Robert Freeman longer than I did, but there always seems to be a shortage on time, too much to do, too many people that you want to visit with, that the new folks you need to make an effort to speak with can drop off the list, (plus I chicken out, knowing that if my brain is going to make the choice between being brilliant or being IQ’less, it’s going for the latter… :) )

I want to recognize all the folks that were so kind to me as a new presenter this year- 

  • Patty Charlebois, a long time presenter who gave me all kinds of advice and was so patient with my nervous chatter at the presenter’s reception. 
  • Tim Gorman, for always being so personable and talking to me-  folks actually told me this week, because he talks to me, that made them decide that I must be ”alright”… (OK, so I’m too honest some times… :) )
  • Randy Cunningham, for sharing funny stories about my “partner in crime”, fellow DBA at my last two companies.  Randy’s one of the few people that knows my quiet, reclusive, fellow DBA and also is aware what a great DBA he is!
  • Carol Dacko, another long-time RMOUG DBA who put up with my nervous chatter and gave me words of support.
  • The DBA/Developers that I’ve worked with, work with that came and supported me- Rajesh, Benny, Jim, Rhonda, Steve and Charles.
  • And Nevin Hahn who gave me great feedback after my presentation, was a DBA that had a lot of similar topics to discuss and was a friendly face in many of the sessions I attended!

I’ll post my slides and white paper for my presentation soon.  There were some valuable queries in the presentation, (mine was definitely not high level in the slides with all the technical details spoken!)

Dealing with Bind Variable Peeking

Tanel Poder had a great post about bind variable peeking, http://blog.tanelpoder.com/2010/02/02/bind-variable-peeking-execution-plan-inefficiency/ 
I run into this exact issue more often than I would like to and have found a few tricks to refrain it from happening in code that is “vulnerable” to the situation.

Truthfully, there isn’t much you can do when Oracle has gone in and performed a bind peek and chosen a poor example to peek on.  As a DBA, you simply know there is an issue when the explain plan states that the CBO has determined the cost is exceptionally low to perform a nested loop on a few rows, although a quick check for the actual value or of the actual rows being processed will show otherwise.

When you are lucky enough that you have internal code that is the victim of this malicious and unprovoked ”bind variable peek attack” :) there are some techniques you can utilize to stop it from becoming a bad habit for the database.

First, check for the following:
1.  Tables in hash joins that are missing or have stale statistics. 

  • If they are stale, ensure that statistics are kept up to date.
  • Create histograms on the columns that use bind variables.

2.  If a table is updated with dynamic sampling, (search in the buffer for ‘%OPT_DYN%’ and the table name   (s).  View the ACTUAL statement in the buffer for what columns in the hash have been collected, (this might include the column with the bind variable from the other table, quite a surprise the first time I came across this!)

  • Ensure, if this is a datawarehouse environment, you are not using the default setting of 2, (I prefer 4-6 for datamarts and warehouse environments.)
  • If at all possible-  COLLECT STATISTICS!  Dynamic sampling should be an enhancement to, not a replacement for statistics!
  • If in a warehouse or mart environment, once the dimension and fact tables are built-  LOCK THE STATS.  This will allow the dynamic sampling to affect the smallest percentage of tables, (See Post, The Good, the Bad and The Ugly.)

3.  If tables are partitioned, inspect the partitioning key and partition sizes.

  • Are there any “place holder” or empty/truncated partitions that can be removed?  These can cause serious havok with explain plans in a warehouse.
  • Are the partitions of similar size or are the partitioning keys leaving the data askew across the partitions?  This will leave you more vulnerable to this situation.
  • If the above is in play-  consider placing the USE_HASH  and FULL hint in vulnerable code to hinder the nested loops issue often seen with bind variable peeking.  **Also note that the FULL hint does not mean full table in a paritioned table, but will result in full partition or just the partition iterator scan across the necessary partitions to fulfill the criteria.

A good example is the following query from our main production database.  The Customer Accounts table has

select

ca.rowid as ca_row_id,
case when (nvl(t.do_not_call_flag,ca.do_not_call_flag) <> ca.do_not_call_flag) then t.do_not_call_flag else ca.do_not_call_flag end as upd_do_not_call_flag,

nvl (t.flex_5, ca.flex_5) as upd_flex_5,
nvl (t.flex_6, ca.flex_6) as upd_flex_6,
nvl (t.flex_7, ca.flex_7) as upd_flex_7,
nvl (t.flex_8, ca.flex_8) as upd_flex_8,
nvl (t.flex_9, ca.flex_9) as upd_flex_9
from OWNR.DI_MTCH_389431_620 t,  <–CTAS Performed just before this step
OWNR.MBR_ACCT ca
where t.i_id = ca.i_id
and t.idnbr = ca.idnbr

and ca.id = :b1 and t.id > 0;  <– ID in questiion has 3,212,685 rows in ca

What does the Explain Plan Look like?
Execution Steps:

Step # Step Name
6 SELECT STATEMENT
5 NESTED LOOPS
1 OWNR.DI_MTCH_389431_620 TABLE ACCESS [FULL]
4 PARTITION LIST [SINGLE]
3 OWNR.MBR_ACCT TABLE ACCESS [BY LOCAL INDEX ROWID]
2 OWNR.MBR_PK INDEX [UNIQUE SCAN]

Step
#
Description
Est.
Cost
Est.
Rows Returned
Est. KBytes Returned
  1   
This plan step retrieves all rows from
table DI_MTCH_389431_6200.
48
60,695
20,626.816
  2   
This plan step retrieves a single
ROWID from the B*-tree index MBR_PK.
1
1
  3   
This plan step retrieves rows from
table MBR_ACCT through ROWID(s) returned by a local partitioned
index.
1
1
0.086
  4   
This plan step has no supplementary
description information.
     
  5   
This plan step joins two sets of rows
by iterating over the driving, or outer, row set (the first child of the
join) and, for each row, carrying out the steps of the inner row set (the
second child). Corresponding pairs of rows are tested against the join
condition specified in the query’s WHERE clause.
6,118
246
104.742
  6   
This plan step designates this
statement as a SELECT statement.
6,118
246
104.742
                                       

Nice explain plan, right?  Unfortunately, the nested loops and the choice to utilize the PK on MBR_ACCT was a poor choice.  The ID has over 3 million rows in MBR_ACCT, yet in this explain plan, the bind variable peek chose one with an incredibly small number, so the index looks very attractive to the CBO.

The resolution for this one was to place a USE_HASH(ca,t) FULL(ca) into the code, which bypasses the global index and proceeds to the partition directly, (ask me another time why I’m not a real fan of global indexes… :) )

When Parallel is Wrong

The Lead Developer and I have been working very, very hard on tuning the main process that produces much of the revenue for our company. The tuning exercise has been very successful but we’ve run into some challenging on the pre-processing that occurs for each week to build the mart that the newly tuned process runs on afterwards. Due to this, I’ve been granted some time to monitor the build process a little closer than previously and although I have a lot of tuning to do, some of the largest challenges comes in the form of queries that the analysts have to execute as part of the final checks before we toggle over to the newly created mart.

These statements all have the same where clause, two different tbl_keys, (12 or 48), pulling the entire table, (each table between 90-380GB) and then ordering the data by the i_id. They all have the same parallel hints, just different tables substituted into the statements. The reason for the sort is that the data is then pulled into an application to finish the final step. The application is incapable of dealing with this type of large sorts, so they decided to do it internal to the database. The final data is used to verify that the correct data is present in the mart of the members that were fed into the primary database that the mart is based off of for the timeline.

select /*+ full(t) parallel (t 8) */ t.*
from tbl_summary t WHERE (tbl_key=12 and usr_id=0)
 

order by i_id ;

The above statement is going to result in 16 parallel slaves, pulling the entire data set for user class=0 vs. a specific id, for the last 12 months of data, then sort it by the usr_id. Note the first hint- FULL. Realize, as this data is across multiple partitions, we will not see a “partition list single” in the explain plan, but “px iterator” for the database to correctly identify the individuals that meet the criteria for each partition. Once it completes this step, then the challenge comes in sorting all that data and the problem with the second hint- PARALLEL.

I love parallel. I am thrilled with the performance gains I can attain when hashing large tables when accessing multiple partitions. I do not love parallel when someone decides to put a group by or order by into the statement. If any sorting has to occur on the statement, I’ve been seeing some odd behavior in about 80% of the executions. It appears that when sorting, it’s not Oracle sorting the data all at once, but sorting one parallel slave’s data set at a time. This creates an escalation in temp space usage that can become insanely difficult for Oracle to process efficiently and performance degrades immensely.

This is what occurred with the statements for the analysts.
With Parallel:
select /*+ full(t) parallel (t 8) */ t.*
 
from tbl_summary t WHERE (tbl_key=12 and usr_id=0)
order by i_id ;

Elapsed Time Breakdown
SQL Time (sec) 47,951.17
PL/SQL Time (sec) 0.00
Java Time (sec) 0.00

Activity By Time
Elapsed Time (sec) 47,951.17
CPU Time (sec) 9,734.81
Wait Time (sec) 38,216.35

Activity By Waits
Remaining Waits(20.9%)
User I/O Waits(58.8%)
CPU(20.3%)

I decided to query the temp usage, as I knew the above query would need to sort the ib_id and would impact the performance of the parallel execution.  The first and last sql_id in the table are the queries in question with parallel, which the one owned by kpedersen is my run at it without parallel:

SID
OSUSER
PROCESS
SQL_ID
SEGTYPE
MB
TABLESPACE
 
571
usr1
26584
029y9k9qvgtp2
SORT
61.5
TEMP
 
772
usr1
26580
029y9k9qvgtp2
SORT
90
TEMP
     
673
usr1
26582
029y9k9qvgtp2
SORT
131.75
TEMP
     
626
usr1
26590
029y9k9qvgtp2
SORT
4370.25
TEMP
   
837
usr1
26592
029y9k9qvgtp2
SORT
41248.75
TEMP
Part 48
with parallel
767
usr1
6434
94bu54yp2ay2v
SORT
107.25
TEMP
     
703
usr1
6436
94bu54yp2ay2v
SORT
73.5
TEMP
     
568
usr1
6438
94bu54yp2ay2v
SORT
278.5
TEMP
     
635
usr1
6442
94bu54yp2ay2v
SORT
1248.25
TEMP
     
811
usr1
6440
94bu54yp2ay2v
SORT
3679.75
TEMP
Create table for Cognitive
data
655
Mine
5976:4628
arz8vhqzt80y3
SORT
2584.25
TEMP
Mine Without Parallel
   
600
usr1
26656
bazjcx55wky5u
SORT
2517.25
TEMP
     
827
usr1
26658
bazjcx55wky5u
SORT
29089.5
TEMP
Part 12
with parallel
 

Now after seeing the waits for temp, the long operations, (441 minutes left on the sort in temp…) I decide to say enough it enough and attempt the execution without the hints- just plain ol’ query of this huge table, nothing more, (see above in the table, showing my temp usage for the single process, as “Mine”.) Fifteen minutes, yes, that’s right, fifteen minutes and I had the first rows back on my screen:

SQL Text-
select t.* from tbl_summary t
WHERE (tbl_key=12 and usr_id=0) order by i_id;

DETAILS on Without Parallel:
Elapsed Time Breakdown
SQL Time (sec) 945.25
PL/SQL Time (sec) 0.00
Java Time (sec) 0.00

Activity By Time
Elapsed Time (sec) 945.25
CPU Time (sec) 370.67
Wait Time (sec) 574.58

Activity By Waits
Remaining Waits(1.1%)
User I/O Waits(59.7%)
CPU(39.2%)

Moral:
Carefully test any parallel processing when sorting is involved. You may not receive performance benefits as expected.