The Problem with the Automated Statistics Gathering
Any DBA worth their salt knows how important solid statistics are to a database. They are how the cost based optimizer is able to make solid decisions about the best path to take: when to perform a table scan, which index to choose, how many resources a task will require, and so on.
In any database built on 10g release 2 and on, there are maintenance jobs created by default, one of them to collect statistics automatically, its goal being to ensure that the database has the newest, most up-to-date statistics for the cost based optimizer. In most OLAP environments, I have found that this job lulls the DBA into believing they are covered for statistics, only to find out there are a few issues with the automated jobs that make me want to submit a request for a GSCA (Gather Statistics Configuration Assistant).
Most DBAs are familiar with the automated jobs that are created, but how familiar are we all with the pieces that make up the whole of the jobs in the DBA_SCHEDULER_JOBS view? (OK, so I’m only going to cover a few; feel free to go on your own from there!)
From a high level, the statistics job appears pretty clear:
select owner, job_name, program_name, schedule_name, schedule_type,
       failure_count, last_start_date, comments
from dba_scheduler_jobs;
SYS GATHER_STATS_JOB GATHER_STATS_PROG MAINTENANCE_WINDOW_GROUP WINDOW_GROUP 0 22-NOV-10 10.00.01.549828 PM -05:00 Oracle defined automatic optimizer statistics collection job
The job is owned by SYS, it gathers statistics, it calls the gather stats program, it started at 10pm last night, there’s a nice comment defining what the job does, and it’s scheduled as part of some windows group?
First of all, this job has multiple steps to it. The first one is a feature that I am actually quite fond of. Oracle goes out, inspects objects in the database and, when it finds that they possess stale statistics, populates this data into the SYS.DBA_TAB_MODIFICATIONS view. If you haven’t taken some time to familiarize yourself with this view, do so. I think this was one of the coolest new features added in 10g; at least for me, a DBA who is often obsessed with statistics, it was… 🙂 There are some drawbacks to it, though. Observing the behavior of this view, it appears to work through objects at the table level, so it will often cascade all the partitions in under the table rather than inspecting at the partition level first. And as databases become larger, it only inspects a section of the database at a time, so the contents may not be as current as a DBA would like (as to why this occurs, we will discuss soon…)
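One thing to know when poking at it: the DML monitoring data is held in memory and only flushed down to the dictionary periodically, so the view can lag behind reality. A quick sketch for forcing the flush before you look (SCOTT is just a placeholder schema here):

-- Push the in-memory DML monitoring data down to the dictionary first:
exec dbms_stats.flush_database_monitoring_info;

-- Then see what Oracle has tracked for a schema of interest:
select table_owner, table_name, partition_name,
       inserts, updates, deletes, truncated, timestamp
from sys.dba_tab_modifications
where table_owner = 'SCOTT'
order by timestamp;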
The second part is the statistics gathering, which is based off the objects in the DBA_TAB_MODIFICATIONS view. The job then collects statistics just on these stale objects. There isn’t any guessing and no collecting of stats on objects that may not be stale, which is a great time saver. The drawback: in a large, high-transaction database, this can mean a lot of rows in DBA_TAB_MODIFICATIONS that need to be cleared out during the maintenance window each night.
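For reference, the heart of what the job does can be approximated by hand. This is only a rough sketch of the mechanism, not the actual GATHER_STATS_PROG internals (set serveroutput on first for the preview block):

-- Preview which objects the database currently considers stale,
-- without gathering anything:
declare
  l_objects dbms_stats.objecttab;
begin
  dbms_stats.gather_database_stats(options => 'LIST STALE', objlist => l_objects);
  for i in 1 .. l_objects.count loop
    dbms_output.put_line(l_objects(i).ownname || '.' || l_objects(i).objname);
  end loop;
end;
/

-- Or gather on stale objects only, the same basic behavior as the auto job:
exec dbms_stats.gather_database_stats(options => 'GATHER STALE');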
Now for the “windows group” information from the stats job I showed you earlier. This is the area most people haven’t looked into, and it’s the most interesting:
select window_name, repeat_interval, duration, window_priority,
       next_start_date, last_start_date, comments
from dba_scheduler_windows;
1. WEEKNIGHT_WINDOW  freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=22;byminute=0;bysecond=0
   +000 08:00:00  <-- duration, allowed to run for 8 hrs max!
   LOW  23-NOV-10 10.00.00.200000 PM  22-NOV-10 10.00.00.092053 PM  Weeknight window for maintenance task
2. WEEKEND_WINDOW  freq=daily;byday=SAT;byhour=0;byminute=0;bysecond=0
   +002 00:00:00  <-- two full days for the weekend run; in fact, it can restart multiple times.
   LOW  27-NOV-10 12.00.00.000000 AM  20-NOV-10 06.00.01.202642 AM  Weekend window for maintenance task
So, the maintenance tasks can run for a maximum of 8 hours from 10pm on weeknights (they will be ended at 6am whether the work is complete or not), and for up to two days, with restarts, on the weekends. I’m sure you are all following me so far, right?
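For what it’s worth, the window attributes themselves can be adjusted with DBMS_SCHEDULER if the defaults don’t fit your site. A sketch, assuming the standard 10g window names, with the four-hour value being purely illustrative:

-- Shorten the weeknight maintenance window to four hours:
begin
  dbms_scheduler.set_attribute(
    name      => 'SYS.WEEKNIGHT_WINDOW',
    attribute => 'DURATION',
    value     => numtodsinterval(4, 'hour'));
end;
/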
Let’s look at the GATHER_STATS_PROG and its details, and you should be starting to build a picture here… 🙂
Its main job is to collect statistics based on the following:
1. The tables found in the SYS.DBA_TAB_MODIFICATIONS view.
2. The sample size set by the database (commonly at 25%, but I’ve seen it determine 100% in some databases; the sketch after this list shows how to check your own defaults).
3. Deciding the method option for the tables, determining if histograms are required, etc.
4. Determining the date the job is to be processing stats for, the priority of which objects should be gathered, and in what order.
5. Exiting the gather stats job at the end of the maintenance window and not “over-running” this window.
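To see what defaults your own database will feed into items 2 and 3 above, you can ask DBMS_STATS directly. A sketch using the 10g GET_PARAM call (11g deprecates it in favor of DBMS_STATS.GET_PREFS):

-- What the automated job will use if you haven't overridden anything:
select dbms_stats.get_param('ESTIMATE_PERCENT') as estimate_percent,
       dbms_stats.get_param('METHOD_OPT') as method_opt,
       dbms_stats.get_param('DEGREE') as degree
from dual;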
If your database is anything but a small OLTP one, I would like you now to go and inspect your SYS.DBA_TAB_MODIFICATIONS view with the following query:
select * from sys.dba_tab_modifications
where table_owner not in ('SYS','SYSTEM')
order by timestamp;
The timestamp is when the object found in DBA_TAB_MODIFICATIONS was marked stale by Oracle. Do you have objects that have been sitting in the queue for more than one week? One month? One year?!?!
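To put an actual number on it, here is the same query with a retention threshold bolted on. Seven days is only an example; pick what fits your site:

select table_owner, table_name, partition_name, timestamp
from sys.dba_tab_modifications
where table_owner not in ('SYS','SYSTEM')
and timestamp < sysdate - 7
order by timestamp;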
First of all, you aren’t the only one. When I first came across this and was troubled, I started to “Google”, as many of us do, and found out that others are not too happy with this “safety mechanism” from Oracle, which has the unfortunate side effect of leaving objects with stale statistics. If you are running these jobs and want something more to be concerned about, take the where clause out of the select and see how many of your data dictionary objects are stale. They are considered low priority, and they are often why your queries against the DBA views are so slow (IMO…)
There is always a light at the end of the tunnel. Oracle didn’t make it easy to update and change a lot of these features, but I can make recommendations to fix the problems for those whose databases might not look like “Uncle Larry’s” test lab.
1. 10g separated the stale collection from the stats job. I prefer this method and do like the stale collection job, which I have kept in all my production databases. Many DBAs have implemented the 10g release 2 version of the two separate jobs instead of the “all for one” job that is now in 11g, by reverting to the older dbms_stats (there are a couple of ways to actually do this, I’m told…)
2. The segment advisor job is set and scheduled to run right alongside the stats job. I see no reason for it to run each night, and it can be a huge burden on the system. Reschedule it to run once a week or, better yet, let DBAs run it on demand when they actually need it (a sketch of the disable calls follows this list).
3. The stats job is set to defaults that cannot be changed, especially many of the window group aspects. Do I agree with them? Not really. In a large OLAP environment with heavily partitioned tables and few indexes, I’m not going to want a lot of histograms, and I prefer less information that might “bog down” my data dictionary. I also might want it to clear out all the data in DBA_TAB_MODIFICATIONS on the weekends when I have the time, but it just doesn’t work that way currently; it works off the current date.
4. Monitor the DBA_TAB_MODIFICATIONS view. If you are still learning your environment, script it to notify the DBAs when there are tables with timestamps older than a certain retention time. If your DBA views are starting to return slower than you would like, look into waits on obj$, tab$ and col$; these are often a sign that the automatic statistics gathering job is feeding TMI (too much information) into your data dictionary, too!
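The disable calls mentioned in items 1 and 2 are straightforward. A sketch, to be run as SYS, with the 11g form shown as well since these became autotasks there:

-- 10g: disable the all-in-one jobs:
exec dbms_scheduler.disable('SYS.GATHER_STATS_JOB');
exec dbms_scheduler.disable('SYS.AUTO_SPACE_ADVISOR_JOB');

-- 11g equivalent:
begin
  dbms_auto_task_admin.disable(
    client_name => 'auto optimizer stats collection',
    operation   => NULL,
    window_name => NULL);
  dbms_auto_task_admin.disable(
    client_name => 'auto space advisor',
    operation   => NULL,
    window_name => NULL);
end;
/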
What has been my solution? I actually wrote a shell script that utilizes DBA_TAB_MODIFICATIONS, dynamically creating the stats gathering scripts with lower sample sizes and a method option that gathers, for each of my databases depending on use, exactly what is needed. This also requires that I know my databases and I know my data; keep that in mind. As I said above, I chose to keep the separated jobs, keeping the job that collects stale objects, disabling both the GATHER_STATS_JOB and the AUTO_SPACE_ADVISOR_JOB, and then running my script to collect dynamically. I have a threshold set so that if I suddenly dump 4000 partitions into DBA_TAB_MODIFICATIONS, I won’t run into the work day (my script’s weekend version clears everything out, vs. the weekday version that has that threshold setting…) This hasn’t been the answer for all databases, but it really took a lot of pressure off the data dictionary, performance increased, and my DBA_TAB_MODIFICATIONS view is empty each morning.
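My actual script is Kshell and site-specific, so treat this as a minimal SQL sketch of the idea only: generate one gather call per entry in the queue, then spool and run the output. The 10% sample and SIZE 1 method option are illustrative placeholders, not recommendations; know your data first:

select 'exec dbms_stats.gather_table_stats(ownname => ''' || table_owner ||
       ''', tabname => ''' || table_name ||
       ''', estimate_percent => 10, method_opt => ''FOR ALL COLUMNS SIZE 1''' ||
       ', cascade => TRUE);' as gather_cmd
from sys.dba_tab_modifications
where table_owner not in ('SYS','SYSTEM')
order by timestamp;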
I’ll post my scripts to dynamically gather stats from DBA_TAB_MODIFICATIONS to my scripts page this next week. I’m off next week to train for three weeks in Canada and hoping to have some time to post while at the hotel!
Good luck!
As promised, I’ve added a Kshell page to the website and placed my dynamic stats gathering script there. I am going to have to tweak the page a bit so it doesn’t require so much word wrapping, but the shell script is finally out there, enjoy!
Hi,
Was googling around to find the best method to collect stats for a highly volatile database and came across this website; pretty informative.
I could not find details for something mentioned…
“10g separated the stale collection from the stats job. I prefer this method and do like the stale collection job, which I have kept in all my production databases. Many DBAs have implemented the 10g release 2 version of the two separate jobs instead of the “all for one” job that is now in 11g, by reverting to the older dbms_stats (there are a couple of ways to actually do this, I’m told…)”
How is it separated in 10g, and what changed in 11g?
I thought it was controlled by statistics_level (though it cannot be disabled in 11g).
Thanks
GP Singh
Hi GP,
Look for jobs in 10g owned by OCM (in DBA_SCHEDULER_JOBS). I’ve always wondered what the original goal of the Oracle Configuration Manager was after seeing this… 🙂 You will see the two OCM jobs to collect stale objects as of 10g release 2.
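A quick way to spot them; a sketch, and the job names will vary by version and patch level:

select owner, job_name, schedule_name, enabled
from dba_scheduler_jobs
where owner = 'OCM';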
Well, I don’t know if that’s going to work for me, but it definitely worked for you! 🙂 Excellent post!
Hi
I would like to see your scripts. I have developed some of my own, but I am just starting on this subject, so your code would be great for me. My databases are data warehouses.
Thanks
I would like to know more about statistics on partitions, and whether it is possible for you to share the current stats code that you mentioned above; it would be great.
If we run out of time during a maintenance window, does it gather stats on what it can and then stop?
Yes, exactly. A good way to see if you have a building queue of stale objects is to query a couple of views:
sys.dba_tab_modifications (no public synonym, hence the sys. prefix)
dba_tab_statistics
The first is going to tell you when an object has experienced large changes, what kind of changes (insert, update, delete), and the timestamp the object was added to the dba_tab_modifications list. This is fulfilled by a scheduler job that, by default, runs late in the evening and is owned by ‘OCM’ (Oracle Configuration Manager), one of two such jobs owned by it by default. It tracks changes against the setting that states what it means to have “stale stats”.
select * from sys.dba_tab_modifications
where table_owner not in ('SYS','SYSTEM','DBSNMP');
From the query above, collect stats manually on one of the tables in the results. The table will be removed from dba_tab_modifications immediately after the stats collection completes.
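For example, something like this (SCOTT.EMP is a placeholder; substitute an owner and table from your own results):

exec dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'EMP', cascade => TRUE);

Re-query dba_tab_modifications afterward and the row for that table will be gone.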
Now, to find which objects are currently stale, along with last_analyzed info for those stale objects, use the query below:
select owner, table_name, partition_name, last_analyzed
from dba_tab_statistics
where stale_stats = 'YES';
Thank you very much for the clarification. I know there is a lot of discussion about the stale percentage, but what is an ideal percentage for large tables?
Hi,
I could not find this script on your scripts page:
“I’ll post my scripts to dynamically gather stats from DBA_TAB_MODIFICATIONS to my scripts page this next week.”
Apologies, I’ve now added it. You’ll see it at the top of my scripts page, https://dbakevlar.com/scripts/, and it’s called “Dynamically Collecting Modified Object Stats”.
Thanks for pointing out it was missing!
Kellyn
Hi Kellyn,
Nice post on “The Problem with the Automated Statistics Gathering”.
I am facing the same problem with the automated statistics gathering in 11g. The batch process normally takes 2 hours to complete; sometimes it takes an hour more. Also, I have observed that the auto stats task runs during the batch job on weekdays, i.e. from 10:00 PM to 03:00 AM. But when the auto task is scheduled on weekends, the batch process takes 2 hours more, i.e. on weekends it starts at 06:00 AM.
After Activation Of DEFAULT_MAINTENANCE_PLAN User Sessions Are Stalled On “Resmgr:Cpu Quantum” (Doc ID 1952762.1).
I have checked the AWR for the Resmgr:Cpu Quantum wait event, but the waits are not significant enough to take into consideration.
But the dba_tab_modifications view is populated with tables that are very critical for the batch process, and the auto stats job is not picking these tables up for stats collection. The statistics for these tables are stale.
Initially the batch job used to run on time. Now the batch process is very erratic.
Could you please shed some light on what the main issue could be that is causing the batch process to take such a long time on weekdays and weekends?
Regards,
Shadab