Oracle

Gathering Statistics on Highly Partitioned Tables

method_opt=>’for all columns size 1′

This is the method option for DBMS_STATS.GATHER_TABLE_STATS that is commonly recommended- “Let Oracle look at all the columns and decide what kind of statistics, histograms, etc. need to be gathered on each…” I’ve never been real thrilled about it, especially in a data warehouse or mart environment where we rarely utilize more than a few columns per table in our where clauses. Really, why does Oracle need so much specific info on columns it’s not doing anything other than selecting? We have a set of columns that we consistently use in our where clause- these are the ones that we need info on and the basic table info…

Here’s where this common recommendation becomes a potential disaster in a largely partitioned environment:
This can cause the cluster to grow out of control in the SYSTEM tablespace that manages statistics and histograms on partitioned columns! There is a “feature” in Oracle that causes this to happen if you have a heavily partitioned database, reuse partitions or use exchange partition.

select segment_name, bytes/1024/1024 MB from dba_Segments
where tablespace_name=’SYSTEM’
order by bytes desc;

SEGMENT_NAME MB
C_OBJ#_INTCOL# 2755 <–Cluster created from the tables and indexes that contain column info stats and yes, this is in SYSTEM!
HIST_HEAD$ 280
I_H_OBJ#_COL# 216
I_HH_OBJ#_INTCOL# 144
I_HH_OBJ#_COL# 144

The only way Oracle states you can *safely* correct this is to rebuild you database from scratch. You can’t shrink this, rebuild it, etc. I tested this fully in one of the marts, since these databases are rebuilt from scratch each new mart and if we set up larger tables to use this method of collecting statistics, we quickly ran up a 13GB SYSTEM tablespace, which may not seem like much, but for SYSTEM, that is outrageous and will make your data dictionary SLOW AS MOLASSAS… 

The safest and best choice for method_opt, due to the fact that we pretty much ALWAYS index our partition key is:
method_opt=>’for all indexed columns’ or collect stats on just the indexed columns, those are the columns in our where clauses…

for new tables that are partitioned without indexing, (which will be our goal with a star schema, we will be looking at the partitioning key and grouping by this for dynamic scripts or for in code:
method_opt=>’for columns size <# buckets if desired>’ collect stats/histograms on our partition key/and or any columns we add to the method_opt that is consistently needed in our where clauses.