The NO_INVALIDATE Option in DBMS_STATS with 10g

October 20, 2011 - By Kellyn Gorman

I had worked hard on a report, 47 SQL statements in all to tune it down from 5 hrs to under 30 minutes. The first runs had been quite successful, so when a third run sent an alert on temp usage, I knew something was wrong.

     SID PROCESS      MACHINE SQL_TEXT    SQL_ID     TOTAL MB
-------- ------------ -------------------- ---------------------
     507 1137	      HOST SELECT *** 7t3muww36xhzn  45516
     600 1139	      HOST SELECT *** 7t3muww36xhzn  45516
     525 1132	      HOST SELECT *** 7t3muww36xhzn  45516
     509 1135	      HOST SELECT *** 7t3muww36xhzn  45516

I checked the stats first, as one of the fixes was to ensure the staging tables in this process were collecting stats after the initial feeds came in, but both tables involved showed valid statistics:

SQL> select num_rows, last_analyzed from dba_tab_partitions 2 where table_name='<I_STAGE>' 3 and partition_name='P170';

  NUM_ROWS LAST_ANAL
---------- ---------
 480900000 17-OCT-11

SQL> select last_analyzed from dba_tables 2 where table_name='<SML_TBL>';

LAST_ANAL
---------
17-OCT-11

I ran a quick AWR report for the specific SQL_ID to see what I was dealing with, execution plan wise..

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     46198 18-Oct-11 09:00:09       282       7.9
  End Snap:     46200 18-Oct-11 10:00:13       245       8.4
   Elapsed:               60.07 (mins)
   DB Time:            1,874.62 (mins)

SQL ID: 7t3muww36xhzn           DB/Inst: PRODUCTION/PROD  Snaps: 46198-46200
-> 1st Capture and Last Capture Snap IDs
   refer to Snapshot IDs witin the snapshot range
-> SELECT ***

    Plan Hash           Total Elapsed                 1st Capture   Last Capture
#   Value                    Time(ms)    Executions       Snap ID        Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1   324636810               4,785,428             4         46199          46199
2   4097803110                  1,047             1         46200          46200
          -------------------------------------------------------------

Plan 1(PHV: 324636810)
----------------------

Plan Statistics                 DB/Inst: PRODBASE/prodbase  Snaps: 46198-46200
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100

Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                         4,785,428    1,196,357.0     4.3
CPU Time (ms)                             3,450,070      862,517.6     8.4
Executions                                        4            N/A     N/A
Buffer Gets                                 759,453      189,863.3     0.0
Disk Reads                                  683,619      170,904.8     2.8
Parse Calls                                      35            8.8     0.0
Rows                                              0            0.0     N/A

Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |     9 |       |       |        |      |
|   1 |  COUNT STOPKEY                |             |       |       |       |       |       |        |      |
|   2 |   PX COORDINATOR              |             |       |       |       |       |       |        |      |
|   3 |    PX SEND QC (ORDER)         | :TQ10002    |     1 |   240 |     9 |       |       |  Q1,02 | P->S | QC (ORDER)
|   4 |     VIEW                      |             |     1 |   240 |     9 |       |       |  Q1,02 | PCWP |
|   5 |      SORT ORDER BY STOPKEY    |             |     1 |   120 |     9 |       |       |  Q1,02 | PCWP |
|   6 |       PX RECEIVE              |             |     1 |   240 |       |       |       |  Q1,02 | PCWP |
|   7 |        PX SEND RANGE          | :TQ10001    |     1 |   240 |       |       |       |  Q1,01 | P->P | RANGE
|   8 |         SORT ORDER BY STOPKEY |             |     1 |   240 |       |       |       |  Q1,01 | PCWP |
|   9 |          HASH JOIN            |             |     1 |   120 |     5 |       |       |  Q1,01 | PCWP |
|  10 |           PX RECEIVE          |             |     1 |   109 |     3 |       |       |  Q1,01 | PCWP |
|  11 |            PX SEND BROADCAST  | :TQ10000    |     1 |   109 |     3 |       |       |  Q1,00 | P->P | BROADCAST
|  12 |             PX BLOCK ITERATOR |             |     1 |   109 |     3 |   KEY |   KEY |  Q1,00 | PCWC |
|  13 |              TABLE ACCESS FULL| I_STAGE     |     1 |   109 |     3 |   KEY |   KEY |  Q1,00 | PCWP |
|  14 |           PX BLOCK ITERATOR   |             |  5002 | 55022 |     2 |       |       |  Q1,01 | PCWC |
|  15 |            TABLE ACCESS FULL  | SML_TBL     |  5002 | 55022 |     2 |       |       |  Q1,01 | PCWP |
------------------------------------------------------------------------------------------------------------------------

This is the one that was eating up all the temp! Note that even though I checked stats, stats were correct as of the previous day, no changes to the partition stats, the execution plan only shows one row, which anyone who listens to Maria Colgan knows, that’s just Oracle giving you the benefit of the doubt and saying, “I don’t think there’s any rows in this object, (or sub-object in this case..) but I’ll give you 1 row for the fun of it!”

Second execution plan in the report is the one I desired:

Plan 2(PHV: 4097803110)
-----------------------

Plan Statistics                 DB/Inst: PRODUCTION/PROD  Snaps: 46198-46200
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100

Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                             1,047        1,046.7     0.0
CPU Time (ms)                                   967          967.0     0.0
Executions                                        1            N/A     N/A
Buffer Gets                                   2,007        2,007.0     0.0
Disk Reads                                        3            3.0     0.0
Parse Calls                                       9            9.0     0.0
Rows                                             40           40.0     N/A
User I/O Wait Time (ms)                           1            N/A     N/A

Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |       |  2834 |       |       |        |      |
|   1 |  COUNT STOPKEY                |             |       |       |       |       |       |       |        |      |
|   2 |   PX COORDINATOR              |             |       |       |       |       |       |       |        |      |
|   3 |    PX SEND QC (ORDER)         | :TQ10002    |  1272K|   291M|       |  2834 |       |       |  Q1,02 | P->S | QC
|   4 |     VIEW                      |             |  1272K|   291M|       |  2834 |       |       |  Q1,02 | PCWP |
|   5 |      SORT ORDER BY STOPKEY    |             |  1272K|   152M|   389M|  2834 |       |       |  Q1,02 | PCWP |
|   6 |       PX RECEIVE              |             |    40 |  9600 |       |       |       |       |  Q1,02 | PCWP |
|   7 |        PX SEND RANGE          | :TQ10001    |    40 |  9600 |       |       |       |       |  Q1,01 | P->P | RA
|   8 |         SORT ORDER BY STOPKEY |             |    40 |  9600 |       |       |       |       |  Q1,01 | PCWP |
|   9 |          HASH JOIN            |             |  1272K|   152M|       |     7 |       |       |  Q1,01 | PCWP |
|  10 |           PX RECEIVE          |             |  5005 | 55055 |       |     2 |       |       |  Q1,01 | PCWP |
|  11 |            PX SEND BROADCAST  | :TQ10000    |  5005 | 55055 |       |     2 |       |       |  Q1,00 | P->P | BR
|  12 |             PX BLOCK ITERATOR |             |  5005 | 55055 |       |     2 |       |       |  Q1,00 | PCWC |
|  13 |              TABLE ACCESS FULL| SML_TBL     |  5005 | 55055 |       |     2 |       |       |  Q1,00 | PCWP |
|  14 |           PX BLOCK ITERATOR   |             |  1271K|   139M|       |     4 |   KEY |   KEY |  Q1,01 | PCWC |
|  15 |            TABLE ACCESS FULL  | I_STAGE     |  1271K|   139M|       |     4 |   KEY |   KEY |  Q1,01 | PCWP |
------------------------------------------------------------------------------------------------------------------------

So what changed? What impacted my statistics?

Upon investigation, I came to the conclusion that it is a combination of a “feature” with what I think is a bug in 10g dbms_stats.

A search of stats processing showed that during the one process that was executing against the P170 partition on the I_STAGE, there were a number of other partitions in this same table having stats gathered post loading.

declare v_stage_table_name varchar2(64); begin select min(stage_table_name) into v_stage_table_name from stage_tables

where stage_table_type_cd = '<I_STAGE>'; dbms_stats.gather_table_stats (ownname => 'dw_user', tabname => v_stage_table_name,

partname => 'P450' ,estimate_percent =>.01, granularity=>'PARTITION', method_opt=>'for all columns size 1',

no_invalidate=> false, cascade=>false, degree=>4); end;

Now the key here in the statement above is:

no_invalidate=>false

If you read the description for this from Oracle:

no_invalidate Does not invalidate the dependent cursors if set to TRUE. The procedure invalidates the dependent cursors immediately if set to FALSE. Use DBMS_STATS.AUTO_INVALIDATE. to have Oracle decide when to invalidate dependent cursors. This is the default. The default can be changed using the SET_PARAM Procedure.

The surmised bug is one where even though the dbms_stats being performed by another process is partition level, the invalidation of the cursors is across all partitions in the object, causing them all to be invalidated, requiring them to re-parse the SQL. (There are a number of similar bugs already documented in 10.2.0.4.0 for partition level statistics gathering…)

The feature to allow Oracle to re-parse and take advantage of the newest statistics information in the data dictionary resulted in a poor performance challenge in this instance, as the cursors were invalidated on a process that needed no changes to statistics.

I tested repeatedly against partitions, collecting stats with the no_invalidate set to false or true and even to AUTO to see what would occur and it consistently impacted my cursors against other partitions. I can find not documented bug, but as many know, I’m about to move everything to 11g in short order and expect it would be a waste of time to pursue it to far…

I notified the Java developer who owns this code to please update the no_invalidate=>true to correct the performance impact short term and look forward to 11g bugs to replace my exhaustion on 10g ones! 🙂

5 Comments on “The NO_INVALIDATE Option in DBMS_STATS with 10g”

Wolfgang Breitling says:

October 24, 2011 at 6:52 pm

Is the optimizer using partition statistics or global statistics? The fact that I see the KEY – KEY in PSTART and PSTOP leads me to believe that the optimizer does not know at parse time which partition(s) will be required, it only knows that at execution time it can do partition pruning. Therefore it needs to use (pro-rated) global statistics and therefore the no_invalidate affects sql going after other partition(s) ( unbeknownst to the optimizer ). Would be interesting if the same no_invalidate behaviour is there in cases where the optimizer does do partition pruning at parse time.
Wolfgang Breitling says:

October 25, 2011 at 11:47 am

” they are truncating the partitions and re-using them ”
Since they are truncating partitions they must know the partition, either by name or by partitioning key values – actually, I think for truncating you need to know the name. Why can’t they use that to indicate which partition they are after during subsequent processing. Then the optimizer definitely knows the partition at parse time.
Wolfgang Breitling says:

October 26, 2011 at 7:04 am

Yep, like Cary is saying ” 90% of performance tuning is politics”. I am paraphrasing. Don’t hold me to the exact quote or percentage.
I don’t understand “priorities for the business may not be the same as a priority for the DBA’s”. If it is not a priority for the business why should it be a priority for the DBA? Politics work both ways. You are skating close to CTD. If nobody else cares about performance why should you?

Comments are closed.