DBA Kevlar

Tips, tricks, (and maybe a few rants) so more DBAs become bulletproof!

The NO_INVALIDATE Option in DBMS_STATS with 10g

I had worked hard tuning a report, 47 SQL statements in all, down from 5 hours to under 30 minutes.  The first runs had been quite successful, so when a third run sent an alert on temp usage, I knew something was wrong.

     SID PROCESS  MACHINE  SQL_TEXT    SQL_ID          TOTAL MB
-------- -------- -------- ----------- ------------- ----------
     507 1137     HOST     SELECT ***  7t3muww36xhzn      45516
     600 1139     HOST     SELECT ***  7t3muww36xhzn      45516
     525 1132     HOST     SELECT ***  7t3muww36xhzn      45516
     509 1135     HOST     SELECT ***  7t3muww36xhzn      45516
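
For reference, a query along these lines against V$SESSION and V$TEMPSEG_USAGE will surface the sessions and SQL_IDs behind heavy temp consumption. This is a sketch, not the exact query used for the output above; verify the join and column names against your version's data dictionary:

```sql
-- Sketch only: maps sessions to their temp segment usage per SQL_ID.
SELECT s.sid, s.process, s.machine, s.sql_id,
       ROUND(SUM(u.blocks * ts.block_size) / 1024 / 1024) AS total_mb
  FROM v$session s
  JOIN v$tempseg_usage u ON u.session_addr = s.saddr
  JOIN dba_tablespaces ts ON ts.tablespace_name = u.tablespace
 GROUP BY s.sid, s.process, s.machine, s.sql_id
 ORDER BY total_mb DESC;
```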

I checked the stats first, since one of my fixes had been to ensure stats were gathered on the staging tables in this process after the initial feeds came in, but both tables involved showed valid statistics:

SQL> select num_rows, last_analyzed from dba_tab_partitions
  2  where table_name='<I_STAGE>'
  3  and partition_name='P170';

  NUM_ROWS LAST_ANAL
---------- ---------
 480900000 17-OCT-11
SQL> select last_analyzed from dba_tables
  2  where table_name='<SML_TBL>';

LAST_ANAL
---------
17-OCT-11

I ran a quick AWR report for the specific SQL_ID to see what I was dealing with, execution-plan-wise.
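
On 10g you can pull a per-statement AWR report either with the awrsqrpt.sql script or straight from DBMS_WORKLOAD_REPOSITORY. A sketch, using the snap IDs and SQL_ID from this post; the instance number is an assumption to adjust for your environment:

```sql
-- Interactive: @?/rdbms/admin/awrsqrpt.sql  (prompts for snaps and SQL_ID)
-- Or non-interactively:
SELECT output
  FROM TABLE(dbms_workload_repository.awr_sql_report_text(
         l_dbid     => (SELECT dbid FROM v$database),
         l_inst_num => 1,            -- assumption: single instance
         l_bid      => 46198,
         l_eid      => 46200,
         l_sqlid    => '7t3muww36xhzn'));
```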


              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     46198 18-Oct-11 09:00:09       282       7.9
  End Snap:     46200 18-Oct-11 10:00:13       245       8.4
   Elapsed:               60.07 (mins)
   DB Time:            1,874.62 (mins)

 

SQL ID: 7t3muww36xhzn           DB/Inst: PRODUCTION/PROD  Snaps: 46198-46200
-> 1st Capture and Last Capture Snap IDs
   refer to Snapshot IDs within the snapshot range
-> SELECT ***

    Plan Hash           Total Elapsed                 1st Capture   Last Capture
#   Value                    Time(ms)    Executions       Snap ID        Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1   324636810               4,785,428             4         46199          46199
2   4097803110                  1,047             1         46200          46200
          -------------------------------------------------------------

Plan 1(PHV: 324636810)
----------------------

Plan Statistics                 DB/Inst: PRODUCTION/PROD  Snaps: 46198-46200
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100

Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                         4,785,428    1,196,357.0     4.3
CPU Time (ms)                             3,450,070      862,517.6     8.4
Executions                                        4            N/A     N/A
Buffer Gets                                 759,453      189,863.3     0.0
Disk Reads                                  683,619      170,904.8     2.8
Parse Calls                                      35            8.8     0.0
Rows                                              0            0.0     N/A
Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |     9 |       |       |        |      |
|   1 |  COUNT STOPKEY                |             |       |       |       |       |       |        |      |
|   2 |   PX COORDINATOR              |             |       |       |       |       |       |        |      |
|   3 |    PX SEND QC (ORDER)         | :TQ10002    |     1 |   240 |     9 |       |       |  Q1,02 | P->S | QC (ORDER)
|   4 |     VIEW                      |             |     1 |   240 |     9 |       |       |  Q1,02 | PCWP |
|   5 |      SORT ORDER BY STOPKEY    |             |     1 |   120 |     9 |       |       |  Q1,02 | PCWP |
|   6 |       PX RECEIVE              |             |     1 |   240 |       |       |       |  Q1,02 | PCWP |
|   7 |        PX SEND RANGE          | :TQ10001    |     1 |   240 |       |       |       |  Q1,01 | P->P | RANGE
|   8 |         SORT ORDER BY STOPKEY |             |     1 |   240 |       |       |       |  Q1,01 | PCWP |
|   9 |          HASH JOIN            |             |     1 |   120 |     5 |       |       |  Q1,01 | PCWP |
|  10 |           PX RECEIVE          |             |     1 |   109 |     3 |       |       |  Q1,01 | PCWP |
|  11 |            PX SEND BROADCAST  | :TQ10000    |     1 |   109 |     3 |       |       |  Q1,00 | P->P | BROADCAST
|  12 |             PX BLOCK ITERATOR |             |     1 |   109 |     3 |   KEY |   KEY |  Q1,00 | PCWC |
|  13 |              TABLE ACCESS FULL| I_STAGE     |     1 |   109 |     3 |   KEY |   KEY |  Q1,00 | PCWP |
|  14 |           PX BLOCK ITERATOR   |             |  5002 | 55022 |     2 |       |       |  Q1,01 | PCWC |
|  15 |            TABLE ACCESS FULL  | SML_TBL     |  5002 | 55022 |     2 |       |       |  Q1,01 | PCWP |
------------------------------------------------------------------------------------------------------------------------

This was the plan eating up all the temp! Even though I had checked the stats, and they were correct as of the previous day with no changes to the partition stats, the execution plan shows only one row for I_STAGE. Anyone who listens to Maria Colgan knows that’s just Oracle giving you the benefit of the doubt and saying, “I don’t think there are any rows in this object, (or sub-object in this case..) but I’ll give you 1 row for the fun of it!”

The second execution plan in the report was the one I desired:

Plan 2(PHV: 4097803110)
-----------------------

Plan Statistics                 DB/Inst: PRODUCTION/PROD  Snaps: 46198-46200
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100

Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                             1,047        1,046.7     0.0
CPU Time (ms)                                   967          967.0     0.0
Executions                                        1            N/A     N/A
Buffer Gets                                   2,007        2,007.0     0.0
Disk Reads                                        3            3.0     0.0
Parse Calls                                       9            9.0     0.0
Rows                                             40           40.0     N/A
User I/O Wait Time (ms)                           1            N/A     N/A
Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |       |  2834 |       |       |        |      |
|   1 |  COUNT STOPKEY                |             |       |       |       |       |       |       |        |      |
|   2 |   PX COORDINATOR              |             |       |       |       |       |       |       |        |      |
|   3 |    PX SEND QC (ORDER)         | :TQ10002    |  1272K|   291M|       |  2834 |       |       |  Q1,02 | P->S | QC
|   4 |     VIEW                      |             |  1272K|   291M|       |  2834 |       |       |  Q1,02 | PCWP |
|   5 |      SORT ORDER BY STOPKEY    |             |  1272K|   152M|   389M|  2834 |       |       |  Q1,02 | PCWP |
|   6 |       PX RECEIVE              |             |    40 |  9600 |       |       |       |       |  Q1,02 | PCWP |
|   7 |        PX SEND RANGE          | :TQ10001    |    40 |  9600 |       |       |       |       |  Q1,01 | P->P | RA
|   8 |         SORT ORDER BY STOPKEY |             |    40 |  9600 |       |       |       |       |  Q1,01 | PCWP |
|   9 |          HASH JOIN            |             |  1272K|   152M|       |     7 |       |       |  Q1,01 | PCWP |
|  10 |           PX RECEIVE          |             |  5005 | 55055 |       |     2 |       |       |  Q1,01 | PCWP |
|  11 |            PX SEND BROADCAST  | :TQ10000    |  5005 | 55055 |       |     2 |       |       |  Q1,00 | P->P | BR
|  12 |             PX BLOCK ITERATOR |             |  5005 | 55055 |       |     2 |       |       |  Q1,00 | PCWC |
|  13 |              TABLE ACCESS FULL| SML_TBL     |  5005 | 55055 |       |     2 |       |       |  Q1,00 | PCWP |
|  14 |           PX BLOCK ITERATOR   |             |  1271K|   139M|       |     4 |   KEY |   KEY |  Q1,01 | PCWC |
|  15 |            TABLE ACCESS FULL  | I_STAGE     |  1271K|   139M|       |     4 |   KEY |   KEY |  Q1,01 | PCWP |
------------------------------------------------------------------------------------------------------------------------

So what changed? What impacted my statistics?

Upon investigation, I concluded that it was a combination of a “feature” and what I believe is a bug in 10g dbms_stats.

A search of the stats processing showed that while this process was executing against the P170 partition of I_STAGE, a number of other partitions in the same table were having stats gathered after their loads completed:

declare
  v_stage_table_name varchar2(64);
begin
  select min(stage_table_name)
    into v_stage_table_name
    from stage_tables
   where stage_table_type_cd = '<I_STAGE>';

  dbms_stats.gather_table_stats(
    ownname          => 'dw_user',
    tabname          => v_stage_table_name,
    partname         => 'P450',
    estimate_percent => .01,
    granularity      => 'PARTITION',
    method_opt       => 'for all columns size 1',
    no_invalidate    => false,
    cascade          => false,
    degree           => 4);
end;
/

The key parameter in the statement above is:

no_invalidate=>false

If you read the description for this from Oracle:

no_invalidate: Does not invalidate the dependent cursors if set to TRUE. The procedure invalidates the dependent cursors immediately if set to FALSE. Use DBMS_STATS.AUTO_INVALIDATE to have Oracle decide when to invalidate dependent cursors; this is the default. The default can be changed using the SET_PARAM procedure.
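
In other words, if you can live with delayed invalidation, the safer variant of the gather above is to let Oracle decide. A sketch of the same call using the default (the tabname placeholder is from this post, not a real object name):

```sql
-- Same gather, but with AUTO_INVALIDATE: Oracle spreads cursor
-- invalidations over a rolling window instead of flushing them
-- all immediately.
begin
  dbms_stats.gather_table_stats(
    ownname          => 'dw_user',
    tabname          => '<I_STAGE>',        -- placeholder, as in the post
    partname         => 'P450',
    estimate_percent => .01,
    granularity      => 'PARTITION',
    method_opt       => 'for all columns size 1',
    no_invalidate    => dbms_stats.auto_invalidate,
    cascade          => false,
    degree           => 4);
end;
/
```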

The surmised bug: even though the dbms_stats call being performed by the other process is partition-level, the cursor invalidation hits all partitions in the object, invalidating them all and forcing the SQL to re-parse.  (There are a number of similar bugs already documented in 10.2.0.4.0 for partition-level statistics gathering…)

The feature that allows Oracle to re-parse and take advantage of the newest statistics in the data dictionary caused a performance problem in this instance: the cursors were invalidated for a process that needed no changes to statistics at all.

I tested repeatedly against partitions, collecting stats with no_invalidate set to FALSE, TRUE, and even AUTO_INVALIDATE to see what would occur, and it consistently impacted my cursors against other partitions.  I can find no documented bug, but as many know, I’m about to move everything to 11g in short order and expect it would be a waste of time to pursue it too far…
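
For anyone wanting to reproduce the test, the counters in V$SQL make the effect visible; check them before and after a partition-level gather (a sketch, using the SQL_ID from this post):

```sql
-- Rising LOADS/INVALIDATIONS after a gather against a *different*
-- partition is the symptom described here.
SELECT sql_id, child_number, plan_hash_value,
       parse_calls, loads, invalidations, object_status
  FROM v$sql
 WHERE sql_id = '7t3muww36xhzn';
```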

I notified the Java developer who owns this code to please change no_invalidate=>false to no_invalidate=>true to correct the performance impact short term, and I look forward to 11g bugs to replace my exhaustion with 10g ones! :)


5 thoughts on “The NO_INVALIDATE Option in DBMS_STATS with 10g”
  • Wolfgang Breitling says:

    Is the optimizer using partition statistics or global statistics? The fact that I see the KEY – KEY in PSTART and PSTOP leads me to believe that the optimizer does not know at parse time which partition(s) will be required, it only knows that at execution time it can do partition pruning. Therefore it needs to use (pro-rated) global statistics and therefore the no_invalidate affects SQL going after other partition(s) (unbeknownst to the optimizer). Would be interesting if the same no_invalidate behaviour is there in cases where the optimizer does do partition pruning at parse time.

  • Hi Wolfgang,
    I would be pleased to say that this process has “matured” as others have and partition pruning is occurring, but unfortunately, they are truncating the partitions and re-using them, (never been a fan of truncating, prefer pruning.) Global and partition level statistics are in place, global are utilized by default and I agree, this is the “feature” that is causing me performance degradation in the process I have detailed above.
    I am proud of how educated the developers have become in the CBO over the last two years at the company, but there is always more to learn and more to ponder…
    Thank you for your feedback, you have explained in one sentence what I did not in a number of paragraphs!

  • Wolfgang Breitling says:

    ” they are truncating the partitions and re-using them ”
    Since they are truncating partitions they must know the partition, either by name or by partitioning key values – actually, I think for truncating you need to know the name. Why can’t they use that to indicate which partition they are after during subsequent processing. Then the optimizer definitely knows the partition at parse time.

  • You are correct: they must know the partition name to truncate it, but the truncate happens at the end of the processing, which is then updated in a lookup table for allocating available partitions for the next run. This is very old code and the business views it as “if it’s not broke, why fix it?” Other processes we have matured to indicate the partition name, (table_name partition partition_name), but these have not. As with many companies, priorities for the business may not be the same as priorities for the DBAs… :)
    With the process type, where 95% of the processing involves CTAS, collect statistics, use the data, never use it again, resulting in a truncate or drop, (truncating the partition for the one described in this scenario…) the change to the no_invalidate was an option for a work around that made more sense to me than allocating a valuable development resource that was working on what the business needed him for now vs. a change in the logic and code that would have taken more time.

  • Wolfgang Breitling says:

    Yep, like Cary is saying ” 90% of performance tuning is politics”. I am paraphrasing. Don’t hold me to the exact quote or percentage.
    I don’t understand “priorities for the business may not be the same as a priority for the DBA’s”. If it is not a priority for the business why should it be a priority for the DBA? Politics work both ways. You are skating close to CTD. If nobody else cares about performance why should you?
