The Problem with the Automated Statistics Gathering
Any DBA worth the time of day knows how important having solid statistics is to a database. This is how the cost based optimizer is able to make solid decisions about the best path to take when performing a table scan, choosing an index, the amount of resources required to perform a task and so on and so forth.
In any database that is built, post 10g, release 2 and on, there are maintenance jobs that are created by default, one of them, to collect statistics automatically, with it’s goal to ensure that the database has the newest, most updated statistics for the cost based optimizer. In most OLAP environments, I have found that this job lulls the DBA into believing they are covered for statistics only to find out there are a few issues with the automated jobs that make me want to submit a request for a GSCA, (Gather Statistics Configuration Assistant).
Most DBA’s are familiar with the automated jobs that are created, but how familiar are we all with the pieces that make up the whole of the jobs in the DBA_SCHEDULER_JOBS view? (OK, so I’m only going to cover a few, feel free to go on you own from there!)
From a high level, the statistics job appears pretty clear:
select Owner, job_name, program_name, schedule_name, schedule_type,
failure_count, last_start_date , comments
SYS GATHER_STATS_JOB GATHER_STATS_PROG MAINTENANCE_WINDOW_GROUP WINDOW_GROUP 0 22-NOV-10 10.00.01.549828 PM -05:00 Oracle defined automatic optimizer statistics collection job
The job is owned by SYS, it gather statistics, it calls the gather stats program, started at 5pm last night and a nice comment defining what the job does and its scheduled as part of some windows group?
First of all, this job has multiple steps to it. The first one is a feature that I am actually quite fond of. Oracle goes out, inspects objects in the database and when it finds that they posses stale statistics, it populates this data to the SYS.DBA_TAB_MODIFICATIONS table. If you haven’t taken some time to familiarize yourself with this table, do so- I think this was one of the coolest new features added to 10g, at least for me, a DBA who is often obsessed with statistics, thought it was… There are some drawbacks to it. Upon observing the behavior of this table, it appears to work through objects at the table level, so often it will cascade all the partitions into the table vs. inspecting at the partition level first for a table and as databases become larger, it only inspects a section of the database at a time, so it might not be as current as a DBA would like as databases become larger, (as to why this occurs, we will discuss soon…)
The second part is the statistics gathering, which is based off the objects in the DBA_TAB_MODIFICATIONS table. The job then collects statistics just on these stale objects. There isn’t any guessing, no collecting of stats on objects that may not be stale, which is a great time saver. The drawback, if the job exists in a large, high transaction database, this can mean a lot of rows in the DBA_TAB_MODIFICATIONS table that need to be cleared out during the maintenance window each night.
Now for the “Windows Group” information from the stats job I showed you earlier. This is the area that most people haven’t looked into and its most interesting:
select window_name, repeat_interval,duration, window_priority,
next_start_date, last_start_date, Comments
1. WEEKNIGHT_WINDOW freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=22;byminute=0; bysecond=0 +000 08:00:00 <--Duration, allowed to run for 8 hrs max! LOW 23-NOV-10 10.00.00.200000 PM 22-NOV-10 10.00.00.092053 PM Weeknight window for maintenance task
2. WEEKEND_WINDOW freq=daily;byday=SAT;byhour=0;byminute=0;bysecond=0 +002 00:00:00 <--No limit to the weekend run, in fact, you can restart multiple times. LOW 27-NOV-10 12.00.00.000000 AM 20-NOV-10 06.00.01.202642 AM Weekend window for maintenance task
So, the maintenance tasks can run for a max of 8 hrs from 10pm on week nights, (will be ended at 6am, no matter if work is complete or not) and till the end with restarts on the weekends. I’m sure you are all following me so far, right?
Lets look at the GATHER_STATS_PROG and it’s details and you should be starting to build a picture here…
It’s main job is to collect statistics based on the following:
1. the tables found in the sys.DBA_TAB_MODIFICATIONS table.
2. The sample size set by by the database, (commonly at 25% but I’ve seen it determine 100% in some databases)
3. Deciding the method option for the tables, determining if histograms are required, etc..
4. Determining the date the job is to be processing stats for, the priority of what objects should be gathered and in what order.
5. Exiting the gather stats job at the end of the maintenance window and not “over-running” this window.
If you have a database that is not a small, OLTP, I would like you to now go an inspect your sys.DBA_TAB_MODIFCATIONS table with the following query:
select * from sys.dba_tab_modifications
where table_owner not in (‘SYS’,’SYSTEM’)
order by timestamp;
The timestamp is when the object found in the DBA_TAB_MODIFICATIONS table was marked stale by Oracle. Do you have objects that have been sitting in the table that are older than one week? One month? One year?!?!
First of all, you aren’t the only one- When I first came across this and was troubled, I started to “Google” as many of us do and found out that others are not too happy with this “safety mechanism” from Oracle that has the unformtunate side effect of leaving objects with stale statistics. If you are running these jobs and want to be more concerned, I would take out the where clause from the select and see how many of your data dictionary objects are stale. They are considered low priority and are often why your queries to you dba views are so slow, (IMO…)
There is always a light at the end of the tunnel. Oracle didn’t make it easy to update and change a lot of these features, but I can make recommendations that can fix the problems left to those with databases that might not look like “Uncle Larry’s” test lab.
1. 10g separated the stale collection from the stats job. I prefer this method and do like the stale collection job, which I have kept in all my production databases. Many DBA’s have implemented the 10g, release 2 version of the two separate jobs vs. the “all for one” job that is now in 11g by reverting to the older dbms_stats, (there are a couple ways to actually do this, I’m told….)
2. Segment advisor job- it’s set and scheduled to run right along with the stats job. I see no reason for this to run each night and it can be a huge burden on the system. Reschedule it to run once a week or better yet, let DBA’s run it on demand when they actually need it.
3. The stats job is set to defaults that can not be changed, especially many of the window group aspects. Do I agree with them? Not really. In a large OLAP environment with a large partitioned base, and few indexes, I’m not going to want a lot of histograms and prefer less information that might “bog” down my data dictionary and such. I also might want it to clear out all the data in the DBA_TAB_MODIFICATIONS table on the weekends when I have the time, but it just doesn’t work that way currently, working off the current date.
4. Monitor the DBA_TAB_MODIFICATIONS table. If you are still learning your environment, script it to notify the DBA’s when there are tables that have timestamps older than a certain retention time. If your dba views are starting to return slower than you would like, look into waits on obj$, tab$, col$, these often are a sign that your database might be giving TMI, (too much information) to the database with the automatic statistics gathering job, too!
What has been my solution? I actually wrote a shell script that utilizes the DBA_TAB_MODIFICATIONS table, dynamically creating the stats gathering scripts with lower sample sizes and a method option that gathers for my databases, depending on use, what is needed, but this also requires that I know my databases and I know my data- keep that in mind. As I said up above, I chose to keep the separated jobs, keeping the job to collect stale objects, disabling both the GATHER_STATS_JOB and the AUTO_SPACE_ADVISOR_JOB, then running my script to collect dynamically. I have a threshold set so if I suddenly dump 4000 partitions in the DBA_TAB_MODIFICATIONS, I won’t run into the work day, (My script’s weekend version clears everything out vs. the weekday that has that threshold setting…) This hasn’t been the answer for all databases, but it really took a lot of pressure off of the data dictionary, performance increase and my DBA_TAB_MODIFICATIONS table is empty each morning.
I’ll post my scripts to dynamically gather stats from DBA_TAB_MODIFICATIONS in this next week to my scripts page. I’m off this next week to train for three weeks in Canada and hoping to have some time to post while at the hotel!