EM12c’s Configuration Topology

This is another little known feature, but one you have to work with to get a solid feel for and find value in. When I came upon it in EM12c, I was not that impressed, but quickly became more impressed with the “just the facts” and “all in one” interface that it is, especially if you are working in a RAC environment.

payday loans lenders online

Its another one of those features the DBA can stumble upon as they are working through issues, but to purposefully utilize it, the quickest way is after logging into the Enterprise Manager, click on Targets –> Hosts and then click on the host you wish to inspect thru the topology interface.

Once on this host, click just below the host name on the Host drop down button, then click on Configuration –> Topology.
The first view is quite simple- just the host:

If you click on the drop down next to “View” where it has defaulted to “Uses” and click on “Used By”, the view will change, drastically if the chosen host happens to be in a RAC environment.

The default for the “Used by” view is a vertical layout, which I don’t prefer, having a widescreen on my laptop and this can be easily adjusted by clicking on Options –> layout –> left right.

This gives you a display that can be zoomed and easily readable on a wide screen as is common these days.
1. You can drop and drag the topology layout on the screen to narrow in on anything you wish to concentrate on.
2. The boxes have notification symbols on any parent or child that has an event, incident or information that can be drilled down when you “hover” over the object.
3. By clicking on the object, a pane will appear with high level info on the parent or child.
4. That small pane can be expanded to offer more info, such as version, status and incident type.
5. There is a “more” link that can be clicked to open a larger info box with tabs containing even more detailed info.
6. Each box contains links to drill down to EM pages to address or read more details on the specified subject.
7. You have the ability to adjust the layout as complex as you would like to make it, including labeling the links between the node/db /listener/etc. relationships
8. You can print the topology map for reference.

The topology map, in its graphical interface, can be used in the following example:
Let’s say our goal is to become familiar with a new RAC environment’s physical layout, hardware, nodes and supporting structures.

Using the configuration topology, this becomes an easily accomplished feat:

We can see the toplogy, (already changed the view left to right…):

We can then Look at the relationships between the objects by changing the Annotations in the menu above the graph, choosing to show the status, incidents and/or link labels, also choosing how much metric detail we want to show in the graph.

The following close up shows you the information showed in the metrics data for each object:

Hovering over an object will show you more detail, which will open other panes with extended detail information and tabs about incident summaries,

The picture becomes clearer as you continue to browse the topology of the enironment.
You can filter your view by what objects have critical incidents that need to be addressed, simply by clicking on the critical incidents, (the red circle with an ‘X’ in the middle of it..) above the topology graph:

You can also sort by the targets that are up by clicking on the green up arrow next to the critical incidents symbol.

Now after all of that, if you aren’t a graph person, you can even switch it to table mode and drill down through the same features, just through a table format instead of the graph:

From this interface, either the graph or table view, a large amount of the EM12c features are available to the user. I highly recommend utilizing the UI to get a great view of the Oracle environment that the Enterprise Manager is privy to, allowing the DBA to have one more way to interact with monitored targets.

Oracle for the SQL Server DBA Presentation Slides

Chris Shaw from Colorado Springs SQL Server SQLPass group invited me to come speak at this wonderful group last week and I promised a few folks that I would upload my slides to my site, (they are also available via Chris, too…)

The group is a small, close knit group that I was very pleased to have the opportunity to speak to and look forward to having more interaction with in the future!

Thank you to Chris and the rest of Springs SQLPass!

Oracle for the SQL Server DBA Slides

Enkitec and SQLPass, Colorado Springs Style

Two things to be happy about today!

1st-  Enkitec did a lovely announcement on Enkitec regarding my coming on board.  I just arrived last night after three days at the main office in Irving, Tx, (Dallas area is lovely with all the fields of Blue Bonnet flowers this time of year…) and am psyched about working remotely for them from my home, northwest of Denver.

2nd-  I have a wonderful opportunity to speak this evening at Colorado Springs SQL Pass group on “Oracle for the SQL Server DBA”.  SpringsSQL

It’s only going to get crazier the next two months from here, so stay tuned! :)

My Review of REI Monorail Sling Bag

Originally submitted at REI

The REI Monorail Sling bag has the volume of a daypack and the features to meet all the challenges of the daily urban grind. Outdoor-inspired details make hauling the load comfortable and convenient.


Light, Sturdy, Roomy and Comfortable!!

By dbakevlar from Westminster, CO on 3/17/2012

 

5out of 5

Pros: Lightweight, Easy To Access Items, High Quality, Comfortable, Good Capacity, Easy To Carry, Stylish

Cons: No Zipper pocket in main

Best Uses: Weekend Trips

Describe Yourself: Modern, Stylish, Practical, Comfort-Oriented, Career

Travel Frequency: 6-10 Annual Trips

Primary use: Business

Was this a gift?: No

Going on both regional and international trips this year and needed a bag to carry both my ultrabook, tablet and accessories. This bag sits close to the body and is extremely comfortable. I have arthritis in my neck and shoulders, so for anyone needing extra comfort for long trips through airports, cities and towns- this is a great bag!!

(legalese)

KSCOPE 2012 Events!

I’m forwarding out the latest KSCOPE 2012 Email. I love their stuff!!

————————————————————————————
The stars at night are big and bright, deep in the heart of Texas… And some of the brightest stars of the Oracle world will light up the scene at ODTUG’s Kscope12, San Antonio, June 24-28.

Celebrated around the world for its top-notch content, outstanding networking opportunities, and memorable special events, ODTUG Kscope is unquestionably THE premier user conference for Oracle training. Kscope12 promises to deliver the goods again this year!

At last year’s special event, ODTUGgers were putting on the Ritz aboard the Queen Mary. At this year’s special event, Wednesday, June 27, we’re ready to saddle up and giddy up for a real live rodeo at the Knibbe (pronounced ka-NIB-bee) Ranch, a working ranch in the beautiful Texas Hill Country.

All you wranglers get ready for some good old-fashioned fun at the Knibbe including:

Mechanical bull rides
Racing armadillos
An official rodeo (not a showdeo)
Texas two-step and line dancing
A hay ride to tour this certified Texas Century Ranch and its archeological digs
Authentic Texas BBQ

Visiting the Knibbe Ranch during Kscope12 will be an experience you will never forget. Don’t just take our word for it—check out what Harold Hecuba has to say about it by clicking on this video. This once-in-a-lifetime event is included with your registration fee.

If you are bringing your junior wranglers to the conference, ODTUG has a special Wednesday night planned for them too at the Kscope Kids Club at the JW Marriott. Trained childcare professionals will supervise a Glow in the Dark Kid’s Party including pizza, a fun movie, glow in the dark games, fabulous face painting, interactive crafts, and energy-filled dance. Click here for other Fun Factor activities offered at the resort.

So mosey on over to the registration page ‘cause Early Bird Registration ends in less than two weeks. Already saddled up for Kscope12? Great! But hotel rooms are going fast, so make your reservation today.
—————————————————————————————
OK Pardners! Go sign up already!!! :)
~DBAKevlar

The Technical Bully Scenario

I’ve been again, unfortunately, reminded what makes a poor work environment for a DBA or Developer.  Eddie Awad had sent this out and the second one really resonated with me:

http://www.washingtonpost.com/national/on-leadership/how-to-completely-utterly-destroy-an-employees-work-life/2012/03/05/gIQAxU3iuR_story.html?tid=sm_twitter_washingtonpost

I refer to this in the IT world as the “technical bully syndrome” and it’s quite common in the arena.  It’s not because it’s a male dominated group- I’ve seen bullies come in all shapes and sizes, along with being of both genders.  In fact, there are a couple studies out there that say women managers and leads are more likely to bully a subordinate than a male is.

Now there isn’t any law against having a bad boss/manager/lead.  I lucked out in that I actually have quite a good manager where I currently work, but even if you are a good manager, you also have to note the cost to the business when you have people report to someone who does bully.  This individual pinpointed as the bully, will most likely get along very well with 90% of the folks around them.  They have a consistent need though, to pick out an “outlet” for their aggressions/frustrations and may have a small set of subordinates or peers that they switch from, bullying in intervals.    Often the switch occurs as the bully’s target starts to or simply threatens to escalate to management or HR if the bullying does not cease, the bully will then move their bulls-eye onto another target.

The cost?  Time, resources, revenue and productivity to the business.

It takes time to bully someone- time out of the bully’s day, but also time out of the target’s day when they could be productive.  The target can also be highly impacted post the incidents, feeling frustrated, disrespected and devalued.  The time lost then results in productivity loss to both parties, which in turn then costs the company revenue.  If left to continue for very long, resources are lost and high-turnover becomes the norm.

The target, who may have been highly productive in the past will start to show less results and seem to have conflicting requirements.  They may seem frustrated and angry, feeling unable to understand how to address the problem successfully or no matter what changes they make to their work style, without management intervention, they are unable to find a way to avoid the bully’s wrath.

The bully often chooses targets that are not weak, but strong.  Peers or subordinates that make them feel threatened, no matter if the threat is real or imagined.  They will continually find ways to undermine the target and make them seem insignificant.  If the target does decide to choose a defensive pose, becoming angry or responding in kind, this can escalate the behavior from the bully.  Often the bully does not see themselves as doing anything wrong!  They may be completely unaware of their behavior, having grown up with it and/or may live with similar disrespect in their personal relationships currently.

When I have been a target in the past, I have often either threatened the bully because of my technical skills, which they seemed to be completely focused on trying to convince everyone that were not sound and/or give conflicting requests to make it impossible for me to meet requirements or I reminded them of someone, (often a wife or mother..) that they had long-time issues with and felt it was acceptable to use me as an outlet when they became frustrated.

There are those that were equal-opportunity bullies.  It didn’t matter if you were male, female, what race you belonged to or your position- as long as you were below them, you were fair game and you simply did your best to avoid their bulls-eye.

I have stayed employed for a company where I was a target on and off for a couple years, until someone figured out what was going on and rid the company of the bully that was causing the high-turnover and tumultuous work environment.  The DBA’s group at that time found quite peaceful, productive years post the bully’s departure.

I have worked for another company for a very short time, where I saw the bulls-eye aiming for me and left so fast that the bully’s head spun.

No one knows what a DBA does but a DBA.  No one knows what a good DBA is except other DBA’s, (and then some would argue that there are a few that are heavily followed that would not be if most DBA’s only knew… J)  As a DBA who has seen how well IT groups can work together when everyone respects each other and respects what skills each person offers and how poorly the group will perform, how high turnover will be when a bully exists in the group, I hope more managers taken notice.

Yes, it’s not against the law to have a bully in your midst, but it should be a crime to allow it to damage the productivity, growth, resources and revenue for the company.

CBO, Statistics and A Rebel(DBA) With A Cause

This is not an “I’m against Oracle Best Practices” article.

This is not an “I’m preaching against histograms” article.

This is not an article telling folks that they should EVER take the steps or the choices that are presented in this article.

I’m also going to refer to the wonderful post by Greg Rahn as to the reasons why what works in my environment would most likely NOT work in yours:

http://structureddata.org/2008/10/14/dbms_stats-method_opt-and-for-all-indexed-columns/

What this article IS:

  1.  An honest account of how a unique database design was impacted when changing from a previously researched method of stats collection to post an 11g upgrade to the recommended choices.
  2. The reasons, including the high level data, (to protect the innocent, which happens to be the DBA writing this post, due to her confidentiality agreement…) that lead to the findings.
  3. The steps that were utilized, (and some that are still underway) to correct the performance issues introduced.
  4. The why a DBA should know their data/database, research, test, research, test some more and trust their knowledge.

 

Many folks in the Oracle arena were aware that I was undertaking a complex and challenging 11g upgrade.  This upgrade was challenging due to multiple datamarts, built using RMAN’s duplicate command, that would require me to perform a series of upgrades, one after another successfully.  IT was an “all or nothing” upgrade scenario and due to the interval, I had a very limited timeline to make it right or an inability to rollback the upgrade.

The initial source database environment was upgraded to 11.2.0.2.0 with the latest patch on a Tuesday and the first datamart to be built off of the upgrade was initialized, with all the script changes on that Friday in mid-December.  There were a couple challenges, including the office closed down between the Christmas week, (if no one is using the databases, how do you really know there is something wrong from your user base??  There were positives and negatives, I took the path of less users in the database meant I had downtime to work through any issues!)  The first build was completed and outside of a few, minor performance issues, the initial upgrade phase was successful and progressed along successfully through the initial mart build.

The 10g environment had been offered a slot of my time back in late 2009-early 2010 for careful performance tuning recommendations.  The database is a dual-purpose database that is over 12TB of large, wide main tables, over 100 columns wide for most and incorporating only 2-4 columns per table in any where clause.    Much of the data is used during a load process and almost exclusively in that load process.  This resulted in me recommending, due to this unique design,  to move much of the stats collection into the code level.  The DBA staff previously had relied on dynamic sampling for much of the safety net and by moving to statistics collection within the code, much of the performance challenge stabilized.  There was a nightly job that was the “catch-all”, creating stats collection calls from objects  listed as “STALE” in the DBA_TAB_MODIFICATIONS table.  Due to partitioning strategies that were based on reporting and not on data loads, hints were incorporated into the load processes to stabilize many of the erratic plans that were impacting performance and to deter poor choices from bind peeking.   As we moved the developers from performing DML transactions against the large partitions to a more mature choice of partition exchange, the database performance continued to stabilize and the business grew more satisfied.  The remaining performance issues were considered non-issues and no further performance tuning was really addressed in the source database except to realize there would be some required with the upgrade to 11g and a few one-offs that came up as new code was introduced.

I had learned quickly in 9i and 10g to know my database when it came to how to collect statistics and for this database, where large, wide tables, the same 2-4 indexed columns used in all where clauses and execution plans that needed to be stable, the following held particular, satisfying results:

METHOD_OPT=> ‘FOR ALL INDEXED COLUMNS’

The database simply did not care about the other columns and ran well with a streamlined, efficient data dictionary that did not offer many options than the basic information.  The culture among the developers and former DBA’s had grown into one that due to design challenges, had implemented a solid “code with hints” formula that was the only one I had ever seen created successfully.  It was an abrupt change, considering my former two positions I had spent most of my time removing hints to let the optimizer do its jobs after correcting statistics!  With this new environment, the source database only needed to retain information about the tables as a whole and the indexed columns that would be used in the load and reporting processes.

  • The load process would use the indexed columns.
  • The reporting processes used the partitions

For the 10g environment, stats collection on stale objects could run anywhere from 10 minutes to 8hrs.  The DBA_TAB_MODIFICATIONS table was cleared out nightly and plans were stable.

The 11g upgrade had been in place for the main design for one week when I proceeded forward with the *standard* for 11g statistics gathering.   All the DBA’s were in agreement that I would proceed forward with this post the upgrade, but until they attended Maria Colgan’s sessions at RMOUG in mid February, I don’t know if they truly realized what change I was undertaking, (they’ve mostly left performance tuning and statistics to me happily…)

Maria Colgan has been one of my favorites for quite some time, so of course, her blog posts are referenced here as some of the most important ones to me in my decision for what we should do for stats gathering in our forthcoming 11g databases:

https://blogs.oracle.com/datawarehousing/entry/managing_optimizer_statistics

I’d already utilized many of her recommendations, albeit hacked for the unique mart environments with much success when it came to partitioned objects:

http://optimizermagic.blogspot.com/2009/02/maintaining-statistics-on-large.html

I have extended stats on many of number of specific objects in my marts and even one table that I drop stats and lock the table stats due to dynamic sampling is the ONLY way the CBO utilizes this table correctly on a consistent basis, (always  have to have one odd one out!)  Everything is scripted, runs automatically and consistently since December, 2009, providing, (again!) consistent performance in our multiple mart environments, where prior to December 2009, the performance of the mart environments were erratic and beyond anything the former DBA’s could guarantee.

I knew I would not have the issues in my source database with any of the above concerns, including large partitions, as we were collecting stats at the partition level in the code during load/creation time.

Greg Rahn, Randolf Geist and others all had incredibly informative and universally similar messages on what should and should not be done in an 11g database, including improvements in how 11g’s stats collection features functioned.  I really was only interested in one change and that was to METHOD_OPT.

I’ve always been a DBA engrossed by anything Cost Based Optimizer, (CBO) related and considered myself well-versed in the area, specifically in the challenges of warehouse databases.

I then made my first, “should know better” mistake.  I stopped researching my own environment, testing further stats collection changes from 10g to 11g and blindly trusted what I was reading.   Hey, I was only going to give the CBO more information, let 11g make the decisions on the method of how to collect the statistics, everything else in my scripts would remain the same.

CBO performance is dependent on a number of factors for the feature to function optimally in any database and Oracle offers us auto stats gathering jobs and a scheduler to run these jobs-straight out of the box.  One of the largest dependencies to this offering of is:

  1.  Design of the database.
  • Is the database actually designed so that joins can be effective and efficient?
  • Is the partitioning strategy well thought out and evenly distributed?
  • Do the correct indices exist or are their extra indices that should be reviewed and removed?
  • Are their hints, bugs or any other obstacles to the CBO doing its job?

2.   Data

  • How is the data loaded vs. queried.  Is this process very different from the requirements.
  • Is the database multi-use so that transactional and reporting reside in the same database, resulting in concurrency issues?
  • Is the database so large that the auto scheduler window runs out of time each day, building up a queue of objects that never have stats gathered?

I had collected my 10g SQL Baselines and had these to return to at any time.  I was confident in my skills to determine good plan from bad plan, (I work with a DBA currently who first approached me in a conference a couple years ago because he wanted to know how I knew a slide in my presentation was a mistake, having put in a “bad plan” for a specific SQL statement…  This all goes back into the depth some of us obsessed DBA’s get into our data..:))

Now, with this in knowledge in hand, I implemented the following change:

METHOD_OPT=> ‘FOR ALL INDEXED COLUMNS’

Yes, I simply removed the method option, allowing Oracle to make the choice it felt was best.  This is when, slowly but surely, all $#%& broke loose.

 

Now many of you will quickly assume it was in the form of poor performance and you would be correct.  The reason behind the performance and what was most challenging about it, was what was not as clear.

I did have a few incidents of bind peeking and where hints were ignored and such.

select * from table(dbms_xplan.display_awr(’7rjk8mr85s2nq));

Wow…21 different plans for the same statement…

INSERT INTO DW_USR.CT1 (COL1,COL2,COL3,COL4,COL5,COL6,COL7.....
COL18,COL19,COL20)
SELECT * FROM ( SELECT S.COL1, S.COL2, S.COL3, ROW_NUMBER ()
OVER (PARTITION BY S.COL5, S.COL6,S.COL7, S.COL8
ORDER BY S.COL12) COL6,S.COL2, S.COL12, S.COL15,
S.COL19, S.COL5, S.COL15, S.COL22,
S.COL27, S.LAST_FILE_ID, S.COL28,
S.COL29, S.COL31, S.COL33,
S.COL35, S.COL36, S.COL37,
S.COL38, S.COL39, S.COL40,
S.COL43, S.COL44, S.F1, S.F2, S.F3,
S.F4, S.F5, S.F6, S.F7, S.F8, S.F9, S.F10,
S.COL45, S.COL46, S.COL47,
S.COL48, S.COL81
FROM ( SELECT /*+ use_hash(cts ca) */
CTS.COL5, CA.COL3, CTS.COL6,
CTS.COl76 AS COL6, CTS.COL2,
CTS.COL12, CTS.COL15, NULL AS COL19,
CTS.COL5, CTS.COL15, CTS.COL22,
CTS.COL27, CTS.COL7 AS LFID,
CTS.COL28, CTS.COL17, CTS.COL31,
CTS.COL33, CTS.COL35, CTS.COL36,
CTS.COL37, CTS.COL38,
CTS.COL39, CTS.COL40 AS COL40, :B5 AS
COL43, :B5 AS COL44, CTS.F1, CTS.F2,
CTS.F3, CTS.F4, CTS.F5, CTS.F6, CTS.F7, CTS.F8,
CTS.F9, CTS.F10, CTS.COL45, :B4 AS COL46, :B3 AS
COL47, CTS.COL48, CTS.COL49, CTS.COL50,
CTS.COL51, CTS.COL52,
CTS.COL53, CTS.COL54,
CTS.COL55, CTS.COL48,
CTS.COL28
FROM DW_USR.CTS1 CTS,
(SELECT COL2, COL3, COL5
FROM (SELECT COL2, COL3,
COL5, ROW_NUMBER () OVER(PARTITION BY COL2, COL5
ORDER BY COL43 DESC, IND_ID) C_RNK
FROM DW_USR.CA1  WHERE COL2 = :B1 ) C
WHERE C_RNK = 1) CA
WHERE CTS.P_ID = :B2 <--partition ID on stage table.
AND CTS.COL2 = :B1 <--Askew partition indicator for final tables.
AND CTS.COL2 = CA.COL2 (+)
AND CTS.COL5 = CA.COL5 (+)
AND CTS.COL72 = 'N' ) S ) X
WHERE X.COL3 IS NOT NULL
AND NOT EXISTS
( SELECT /*+ use_hash(ct) */ 1  ßHint, yeah, who cares??
FROM DW_USR.CT1 CT
WHERE X.COL9 = CT.COL9
AND X.COL4 = CT.COL4
AND X.COL2 = CT.COL2
AND X.COL2 = CT.COL3
AND CT.COL3 = :B1 );

This is where I remind everyone that yes, a hint can be ignored by the CBO.  A hint is simply a recommendation to the database.  If Oracle believes it has more complete information than your hint, it can continue to make its own decision about the path it will take in the database.

  • The data is askew across partitions.
  • The partitioning key is different on the ca table than the other tables.
  • The plan changes were not just in different order, it also involved different index usage, too.

How is this possible?  For the cts, staging table, the actual P_ID or partitioning key is a random number.  There are 1500 of these partitions, mapped out in a table to mark them as used or unused.  The original concept for the design was this staging table was a “parking lot” and any load process could choose an empty space, (partition) and load into it.  This left us open to considerable challenges for the CBO to make a solid decision when the lot could have a smart car one time in it and a semi-truck with trailer in its spot the next run.  I don’t find this a flaw in the CBO, but one in the design that the developer and the DBA group needed to address.

 

PERFORMANCE TANK

Two weeks since change:

  • Very minor performance hits outside of what I consider- “Why we should not insert/update/delete against large partitions” issues.  Can anyone say “REWRITE!!”
  • Stats job starts to run long…longer each day.  Not wanting to complete.
  • SYSTEM and SYSAUX tablespace, multiple additional space requests, (escalated and performed by after hours support, so not first noticed by main DBA staff, multiple environments and tablespace requests nightly, as it is…

 

Due to a performance challenge on a report, I ran a 10046 trace.  A red flag triggers when I start noticing large counts and long runs against certain data dictionary objects-  those involved with histograms and objects stats are clearly apparant:

 

SQL ID: 96g93hntrzjtr Plan Hash: 2239883476     COST: RULE

 

select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#,
sample_size, minimum, maximum, distcnt, lowval, hival, density, col#,
spare1, spare2, avgcln
from
hist_head$ where obj#=:1 and intcol#=:2
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        7      0.00       0.00          0          0          0           0
Execute    209      0.02       7.01          0          0          0           0
Fetch      209      0.00       3.04          6        836          0         209
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      425      0.02      10.06          6        836          0         209

 

SQL ID: db78fxqxwxt7r Plan Hash: 3312420081  COST:  RULE

 

select /*+ rule */ bucket, endpoint, col#, epvalue
from
histgrm$ where obj#=:1 and intcol#=:2 and row#=:3 order by bucket
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        4      0.00       0.00          0          0          0           0
Execute     56      0.01       7.03          0          0          0           0
Fetch       56      0.00       5.21         26        168          0         805
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      116      0.01      12.25         26        168          0         805

 

I knew I was seeing some large waits, but I was collecting stats on system level objects, fixed stats, plus, this was RULE Based Optimizer.  What was the sudden, consistent high fetch and times on different data dictionary calls for the optimizer?

I then noted the allocation of more space to the SYSTEM and SYSAUX tablespaces.  I queried objects from these two tables:

 

 SEGMENT_NAME                             MB
1              C_OBJ#_INTCOL#                           6723
2              I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST           6113
3              WRI$_OPTSTAT_HISTGRM_HISTORY             4697
4              I_WRI$_OPTSTAT_H_ST                      3038
5              WRH$_ACTIVE_SESSION_HISTORY              1910
6              SYS_LOB0000008982C00004$$                1096
7              WRH$_SQL_PLAN                            1088
8              WRI$_OPTSTAT_HISTHEAD_HISTORY            971
9              I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST            935
10             I_H_OBJ#_COL#                            804

 

I then checked a few partitioned tables, noting HOW many histograms were being collected for all the wide tables vs. what would ever be utilized in the simple queries, (remember, the queries here want ALL the data for anything in 2-4 columns…)  We previously would have only have the indexed columns with column stats data, few histogram data.

 

Of the main five, wide tales, I had the following histogram distribution:

OBJECT                 INDEXED COLUMNS                        HISTOGRAM COLUMNS
CA_TBL                      2%, (3)                                                  80%
CT_TBL                      4%, (5)                                                  84%
I_TBL                       2%, (3)                                                  70%
HH                          2% (3)                                                   74%
ILS                         1% (1)                                                   90%

 

The final counts came up and I had 30GB of new stats info *bloating* my SYSTEM and SYSAUX tablespace that I hadn’t had before and that was causing performance issues that many don’t really think of.

The performance waits added by the additional statistics on the database, which you need to also include to all load processes, as they collect statistics at the TIME we load the partition, increased processing time by 18% overall.

Now, with this said, we need to think about how stats are retained in the data dictionary, both for use by the CBO and if we needed to revert.

Would I gain anything in performance by reverting back?

Upon testing I realized:

  • A simple reversion to 10g plans would not solve this issue.
  • A simple change in back to how we collect statistics would not correct the issue, outside of clearing the data dictionary of all the stats post the retention time.

I chose to perform some lower level tests against the stats changes.  I just happen to have a couple copies of the main schema for our source database lying around!  It is utilized to build our marts, but available for testing afterwards.

I performed the following:

  • Change the retention from 31 days to 10:
exec dbms_stats.alter_stats_history_retention(10);
  • Tried to purge the stats from the data dictionary, but after two days of waiting for the delete statement to clear out some 30GB of stats, finally gave up and let it clear manually.
exec DBMS_STATS.PURGE_STATS(SYSDATE-10);
alter table WRI$_OPTSTAT_IND_HISTORY move;
  • After this, needed to rebuild my indexes…

 

Well, that’s a little better…from 30GB down to 12.6GB

 

Choosing the following statement as a test example I could point at a couple partitions, (there were others…):

SELECT I.I_ID, I.FN_COL, I.ROWID, I.ORA_ROWSCN
FROM CA_TBL C, I_TBL I
WHERE C.M_ID = 41891 <--Partitioning Key
AND C.C_NBR = 2000065698902 <--Unique column, part of index
AND I.I_ID = C.I_ID; <--Unique Column part of index

 

I chose one of the largest partitions in the CA_TBL and noted that it had stats collected on 100% of the columns and 800% of those contained histograms.

The execution time was 12.02 seconds.

Ran a 10053 trace file throughout the steps…

 

Clean out ALL STATS:

EXEC DBMS_STATS.DELETE_TABLE_STATS(ownname=>'<OWNER>', tabname=>'CA_TBL', PARTNAME=>'P_41891');

 

Recollected the table stats:

EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>’OWNER’, TABname=>’CA_TBL’, PARTNAME=>'P_41891', METHOD_OPT=>'FOR ALL INDEXED COLUMNS', CASCADE=FALSE, DEGREE=8);
EXEC DBMS_STATS.GATHER_INDEX_STATS(ownname=>'DW_PROD', indname=>'CA_ACCT_PK', partname=>'P_41891');

Cleared the shared pool and ran again…

Elapsed: 00:00:00.01

Yes, I traced it, yes, I grabbed a 10053 trace, too.  I wanted to know WHY performance was so much better!

 

It used the SAME explain plan both times:

First RUN:

============
Plan Table
============
----------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                            | Name       | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
----------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                     |            |       |       |  1965 |           |       |       |
| 1   |  NESTED LOOPS                        |            |       |       |       |           |       |       |
| 2   |   NESTED LOOPS                       |            |     1 |    52 |  1965 |  60:21:57 |       |       |
| 3   |    PARTITION LIST SINGLE             |            |     1 |    26 |  1964 |  60:19:06 | KEY   | KEY   |
| 4   |     INDEX RANGE SCAN                 | CA_PK |     1 |    26 |  1964 |  60:19:06 | 2636  | 2636  |
| 5   |    INDEX UNIQUE SCAN                 | I_PK     |     1 |       |     1 |  00:02:51 |       |       |
| 6   |   TABLE ACCESS BY GLOBAL INDEX ROWID | I_TBL |     1 |    26 |     1 |  00:02:51 | ROW LOCATION| ROW LOCATION|
----------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
4 - access("C"."M_ID"=41891)
4 - filter(TO_NUMBER("C"."C_NBR")=2000065698902)
5 - access("I"."IND_ID"="C"."I_ID")

 

============
Plan Table
============
----------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                            | Name       | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
----------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                     |            |       |       |   624 |           |       |       |
| 1   |  NESTED LOOPS                        |            |       |       |       |           |       |       |
| 2   |   NESTED LOOPS                       |            |     1 |    52 |   624 |  19:10:52 |       |       |
| 3   |    PARTITION LIST SINGLE             |            |     1 |    26 |   623 |  19:08:01 | KEY   | KEY   |
| 4   |     INDEX RANGE SCAN                 | CA _PK |     1 |    26 |   623 |  19:08:01 | 2636  | 2636  |
| 5   |    INDEX UNIQUE SCAN                 | I_PK     |     1 |       |     1 |  00:02:51 |       |       |
| 6   |   TABLE ACCESS BY GLOBAL INDEX ROWID | I_TBL |     1 |    26 |     1 |  00:02:51 | ROW LOCATION| ROW LOCATION|
----------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
4 - access("C"."M_ID"=41891)
4 - filter(TO_NUMBER("C"."C_NBR")=2000065698902)
5 - access("I"."I_ID"="C"."I_ID")

 

But you’ll notice the cost has gone down significantly for CA_PK on the CA_TBL.  The time has decreased significantly.  Why?

 

In the 10053 Trace File, the Before and AFTER:

FIRST RUN-

Index Stats::
Index: CA_ PK  Col#: 1 4 2  PARTITION [2635]
LVLS: 2  #LB: 19641  #DK: 16052378  LB/K: 1.00  DB/K: 1.00  CLUF: 11032002.00
LVLS: 2  #LB: 19641  #DK: 16052378  LB/K: 1.00  DB/K: 1.00  CLUF: 11032002.00
Index: CA_ IX1  Col#: 4  PARTITION [2635]
LVLS: 2  #LB: 9776  #DK: 15506817  LB/K: 1.00  DB/K: 1.00  CLUF: 11030574.00
LVLS: 2  #LB: 9776  #DK: 15506817  LB/K: 1.00  DB/K: 1.00  CLUF: 11030574.00
Access path analysis for CA_TBL1

SECOND RUN-

Index Stats::
Index: CA_ PK  Col#: 1 4 2
    USING COMPOSITE STATS
LVLS: 2  #LB: 7467054  #DK: 5267937617  LB/K: 1.00  DB/K: 1.00  CLUF: 4906174903.00
Index: CA_IX1  Col#: 4
USING COMPOSITE STATS
LVLS: 2  #LB: 4006915  #DK: 3749763777  LB/K: 1.00  DB/K: 1.00  CLUF: 4911402731.00
Access path analysis for CA_TBL

 

By using “FOR ALL INDEXED COLUMNS” Composite Stats were built for the index in question and this did result in a better plan for the query used for the test than if there were individual column stats that had been created.

 

The second gain was due to data dictionary clean up that removed waits for the internal SQL Statements:

SQL ID: 96g93hntrzjtr Plan Hash: 2239883476

 

select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#,
sample_size, minimum, maximum, distcnt, lowval, hival, density, col#,
spare1, spare2, avgcln
from
hist_head$ where obj#=:1 and intcol#=:2
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        6      0.00       0.00          0          0          0           0
Execute     84      0.00       0.01          0          0          0           0
Fetch       84      0.00       0.04          8        330          0          78
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      174      0.00       0.05          8        330          0          78

 

SQL ID: db78fxqxwxt7r Plan Hash: 3312420081

 

select /*+ rule */ bucket, endpoint, col#, epvalue
from
histgrm$ where obj#=:1 and intcol#=:2 and row#=:3 order by bucket
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute     36      0.00       0.00          0          0          0           0
Fetch       36      0.00       0.05         11        108          0         623
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       75      0.00       0.05         11        108          0         623

 

Ahhhh…much better!!

 

Why did I delete the old statistics before collecting with a new method option?

I’ve always been fascinated when looking at DBA_TAB_COLUMNS vs. DBA_TAB_COL_STATISTICS and DBA_PART_HISTOGRAMS.  I’ve often noted the discrepancy between their views when instead of purging and recollecting stats, a different method option is chosen, leaving old histogram data in place and how this can impact the choices for the CBO:

 

  • Columns with “old histogram/column data” appear more attractive than columns with updated statistics when natural growth occurs.
  • Incorrect execution plans confusing the optimizer.
  • Misleading data for the DBA/Developer when viewing statistical info from the stats data in columns views vs. histogram or historical stats data.

 

My First Recommendations

  1. Decrease the retention policy on statistics from 31 days to 7 to purge out the stats from the data dictionary quicker.
  2. Attempt to test on one of the datamarts DBMS_STATS.PURGE_STATS(sysdate-7) and see how long it would take to purge the old stats, (the mart contained the source data dictionary from the RMAN duplicate..)
  3. Rebuild the supporting “OPSTAT” indexes that would now be quite fragmented from the purging process.
  4. Only collect statistics for indexed columns from this point out, documenting the scenario for historical purposes.

 

Upon taking on this task, step #1 was quick.  Step #2, not so much.

DBMS_STATS.PURGE_STATS utilizes a delete statement.  Have you ever tried to run a single delete on constantly accessed, bloated, data dictionary tables, non-parallel, must complete before you it will commit?

 

Two attempts were made to purge stats from the data dictionary, both lasted over two days.  First one was forced to end due to requirements, the second ended during a sudden hiccup on the network, (why DBA’s should nohup or screen EVERYTHING!:))

 

Final Recommendation

The final recommendation, as I am on my final week with the company was to pull back the retention time and let them age out naturally.

I also recommended, due to the partition level stats being collected in the code during the load stage.  The best choice was to delete the stats before recollecting, but do so granularity=>AUTO so as to not collect stats on an already collected partition level objects.

 

Moral

There is always one exception to the story.  The day we say that it should ALWAYS be done a certain way or blindly do something because that is the way it is supposed to be done without using our intellect and skill to verify it is the right choice for our environment, is the day we can make a mistake.

I still believe in the standard practice for how statistics are best collected in Oracle 11g, but I also am aware, I just happen to be working in an environment that has the combination that did not benefit from the common, best choice.


RMOUG 2012 Revisited

As many of you know, I’ve been a bit busy since RMOUG Training Days finding a number of presentation opportunities and while I was at it, a new gig with Enkitec. I am finally sitting down and putting to paper what a great year the RMOUG Training Days conference had for 2012.

As everyone is still reeling from the great keynote from Cary Millsap, (so glad I was wrong about screwing up my passenger pick-up schedule, thinking I’d left our keynote speaker at the airport!!)  His speech was enlightening, enjoyable and worth every minute of a heavy morning commute to attend the initial morning!

First, kudos directed to TeamYCC and John Juenette for all their hard work for heading up the conference. John was this year’s conference director and it is a challenging endeavor ensuring all goes smoothly. There is a ton of moving parts as the conference comes together and if these are not coordinated effectively, the conference will be left less than enjoyable for the presenters and its attendees. John decided to add the challenge of performing his duties from Vermont, where he moved earlier in 2011 due to his own career opportunity, so he’s a real trooper and will be sorely missed on the board of director’s next year.
My next shout-out goes to all the folks in the social media arena that were wonderful, ensuring to participate in tweets, blog posts and messages in regards to presenting and attending the conference. Last year, I was able to count all the “tweeters” on one hand, (also prompting me further to run for the board…) where is this year, there were literally hundreds of posts, tweets and blog comments.
Third, I want to thank all the ACE and ACE Directors, many of who heard my plea to come to RMOUG and present, (they are so patient with me… :) ) during Oracle Open World and either submitted abstracts, attended or did both and marketed the conference for RMOUG. I appreciate the support they give to this great “grass-roots” conference and all that it has to offer.
Finally, I want to thank the great board of directors that I get to be a part of and the user group as a whole. We have an incredibly impressive, versatile and supportive membership that drives the conference to be more than it was before, yet keep what makes this area a great place to live as a technical specialist in our respective arena. The continued hard work of the board, including our wonderful administrative assistant, Heidi Kuhn, deserves a round of applause.
The conference as a whole was a local Goliath with the lineup, alone! It was the first time since 2004 that Jonathan Lewis, Pete Sharman and a few others have presented or even attended. This year we had Maria Colgan, Graham Wood, Riyaj Shamsudeen, James Morle, Marco Gralike, Guy Harrison and a mantra of impressive presenters- more names than I can list. We had a record year for ACE and ACE directors, along with an incredibly high amount of OAK Table members. The content and quality was amazing, as seen at the second day’s lunch when each ACE/ACED was assigned their own table for questions. My coworkers had narrowed in on Jonathan Lewis’ table early on and I enjoyed watching Jonathan keep up with our resident “Tsunami” Developer’s comments.
My two presentations for first day both went well. My first one, on EM12c was quite packed and did well- well enough that it’s been picked up for a number of conferences, including MOW and KSCOPE this year. I thoroughly enjoyed giving my “Sherlock Holmes for DBA’s” and appreciate my bud, Riyaj Shamsudeen offering to be my Dr. Watson if I ever need one, (considering the man’s technical skills, Holmes would prevail every time!!:))
I really enjoyed a number of presentations, including those given by Dominic Delomino, Don Seiler, Pete Sharman, Alex Gorbachev and Marc Fielding. I seemed to have my own presentations scheduled up against Jeff Smith, Gwen Shapira , Jonathan Lewis and even Tim Gorman’s. I had a chance to sit with Gwen Shapira a number of times, along with Marc Fielding and Karl Arao, (my new coworker!!) at many of the presentations I attended.
I ensured that I “stalked” Jean-Pierre Djicks, as often as he has stalked mine in the past. I keep waiting for an email from him, pointing out my frustrated comment regarding system stats, calibrate IO and the chaos that this poorly designed area of database IO information has become, but so far, I’ve been spared.
I finally got to meet Gaja Krishna Vaidyanatha ,serving as his stand-in room ambassador….for about 10 minutes, then his assigned one showed up and I told him he did not need me any longer and explained that I must show up for Kerry Osborne’s presentation, (which I’m sure he now understands the importance of my attendance… :) ) It was still very flattering that instead of “DBA Goth Princess”, he just referred to me as “DBA Goddess”…Yeah I’ll take that any day!
The actual Training Days Conference, post set up, always starts with the Presenters/Volunteer Appreciation banquet at Wynkoop, a local brewery, which I always enjoy. It’s a chance for many of the presenters to unwind after traveling into town and visit with each other, along with the Training Day’s volunteers. My partner was picking up folks at the airport for the conference, so I walked down to the event with a number of other folks from the hotel, including Don Seiler, Jeff Smith and Kris Rice.
The ACE dinner, Wednesday evening was excellent, even if I was knocked half unconscious by a new RA medication, (yes, I practically fell asleep at my first, “official” ACE dinner!!) OTN chose one of the newly recognized, top restaurants in Denver,  Row 14, so dinner was exceptional. Peggy King finally made a few excuses for me after I kept nodding off and I went back to my hotel to sleep it off, (thank you, Peggy!)
The last day was busy and fun- Got to meet and work for a short time with Jeremiah Wilton on his Oracle vs. SQL Server Recovery competition project, met fellow “Yooper”, Jonathan Gennick of Apress and then packed the SUV to go up to the mountains for “FOAK Table” and skiing, (fun and chaos of over 30 people in a house in Breckenridge for the President’s Day Weekend!  :) )
I’ve already started to plan out a few ideas for RMOUG Training Days 2013, (Feb. 11th-13th, I believe are the dates!) as the plans are being formulated for me to try on the Training Day’s Director hat. There are so many great conferences out there, all with their own unique flavor and I hope Training Days can keep the best of what it has to offer and still have a few more surprises that will keep everyone coming back for more!