Locating UNKNOWN SQL_ID Info in OEM Through AWR

Rarely are reports based off large snapshot variances helpful to a DBA unless you come across an odd situation such as this one…Better yet, we need to know a little bit about our AWR tables behind our reports so we can piece together what the reports leave out…:)

Scenario:  After-hours support has killed a session after high temp usage has occurred.  You, as the primary DBA, are left to look into the issue the next day.
Your first attempt to inspect the issue is through Enterprise Manager, (OEM) and you are surprised that very little activity is actually showing up that
is of concern.  You can see pink areas, something that would flag you, but they aren’t showing up in the active sessions in the left-  only showing in the right.
What’s going on here?

Now, taking the right side into consideration, you do see four sessions, easily assumed to be parallel slaves, processing away.
When drilling down into the left side SQL_ID sessions, you not that none of the top right side sessions are showing up for the SQL_ID’s in question.

When drilling down into the right side four sessions, SID’s, all of them show no SQL_ID’s connected and are listed as “UNKNOWN”.

How as a DBA, do you locate this data?  Your first quest will be to run an ASH report, but it will only show you the problem waits, type of waits, not the SQL that was the
cause of the issue.  How do you find the SQL/SQL_ID for the SID’s?

ASH Report for the snapshot in question:

  DB Id    DB Name      Inst Num Instance
----------- ------------ -------- ------------
 xxxxxxxxx DBMART               1 mart1
Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type: text
Type Specified:  text
Defaults to current database
Using database id: xxxxxxxxxxxxxxx
Defaults to current instance
Using instance number: 1
ASH Samples in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Oldest ASH sample available:  12-Sep-11 12:20:34   [  24306 mins in the past]
Latest ASH sample available:  29-Sep-11 09:26:08   [      0 mins in the past]
Specify the timeframe to generate the ASH report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter begin time for report:
--    Valid input formats:
--      To specify absolute begin time:
--        [MM/DD[/YY]] HH24:MI[:SS]
--        Examples: 02/23/03 14:30:15
--                  02/23 14:30:15
--                  14:30:15
--                  14:30
--      To specify relative begin time: (start with '-' sign)
--        -[HH24:]MI
--        Examples: -1:15  (SYSDATE - 1 Hr 15 Mins)
--                  -25    (SYSDATE - 25 Mins)
Defaults to -15 mins
Enter value for begin_time: 09/27/11 16:00:00
Report begin time specified: 09/27/11 16:00:00
Enter duration in minutes starting from begin time:
Defaults to SYSDATE - begin_time
Press Enter to analyze till current time
Enter value for duration: 240
Enter a name for the report or a default will be used, (I prefer a naming convention of:
<report type, (ash, awr, addm)>_<sid>_<snapshot>.<txt/html>
View the ASH report for the time period, going to the top events first:
Event                          % Event  P1 Value, P2 Value, P3 Value % Activity
------------------------------ ------- ----------------------------- ----------
Parameter 1                Parameter 2                Parameter 3
-------------------------- -------------------------- --------------------------
PX Deq Credit: send blkd         58.37           "268501009","3","0"       1.97
sleeptime/senderid         passes                     qref
                                                 "268501010","3","0"       1.97
                                                 "268501010","4","0"       1.97

There’s our issue!

Now using this info, we know we are looking at a parallel, (PX) issue that we can then take to the next
step to identify the process and see if it is connected to one of our SID’s we noted in the OEM console:

   Sid, Serial# % Activity Event                             % Event
--------------- ---------- ------------------------------ ----------
User                 Program                          # Samples Active     XIDs
-------------------- ------------------------------ ------------------ --------
     2170, 3249      16.27 PX Deq Credit: send blkd            14.75
DM_USER             oracle@host1 (P020)             11K/14K [ 74%]        0
                           CPU + Wait for CPU                   1.46
                                                      1,059/14K [  7%]        0
     2138,  585      16.13 PX Deq Credit: send blkd            14.55
DM_USER             oracle@host1 (P022)             11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.53
                                                      1,107/14K [  8%]        0
     2130,  581      16.11 PX Deq Credit: send blkd            14.53
DM_USER             oracle@host1 (P023)             11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.54
                                                      1,113/14K [  8%]        0
     2129,  533      16.03 PX Deq Credit: send blkd            14.54
DM_USER             oracle@host1 (P021)             11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.41
                                                      1,024/14K [  7%]        0

Note that is also reports back the same four sids, all showing as “UNKNOWN” (2170, 2129,2130 and 2138) as we see in the OEM console..)
It just doesn’t have any SQL or SQL_ID’s to connect with them.

The coordinating SID is caught in the Top sessions running PQ’s, so we will see this as our identifier, even if it wasn’t seen in the OEM console:

Sid,Srl# (Inst) % Activity SQL ID        Event                          % Event
--------------- ---------- ------------- ----------------------------- --------
User                 Program
-------------------- ------------------------------
2113,UNKNOWN(1)      80.89               PX Deq Credit: send blkd         58.37

This is very important, as we need this to track further, outside of the report.

Using the coordinator, then query the DBA_HIST_ACTIVE_SESS_HISTORY from during the snapshots to capture the main SQL_ID:

select * from sys.DBA_HIST_ACTIVE_SESS_HISTORY 
where session_id=2113
and snap_id between 717 and 727;
SAMPLE_TIME SESSION_ID SESSION_SERIAL# USER_ID SQL_ID
27-SEP-11 10.33.39.946 PM 2113 143 255 5dpcst074nk81
27-SEP-11 10.33.49.976 PM 2113 143 255 5dpcst074nk81

You can then pull the data that is connected to the parent that will tell you what session experienced the problem:

select * from sys.WRH$_SQLTEXT 
where sql_id='5dpcst074nk81';
SNAP_ID DBID  SQL_ID  SQL_TEXT COMMAND_TYPE REF_COUNT
760 2778270765 5dpcst074nk81 <CLOB>   1            0

the clob then shows the SQL involved in the executed session, which then tells you what your pain is on:

create table dm_user.tbl_iot
(
  col1 ,
  col2 ,
  col3 ,
  col4....
constraint tbl_pk
primary key (col1))
organization index.... <--Here's our key, it's an IOT, which includes the PK!
tablespace mart_data1 pctfree 0 nologging parallel 4 as....

We can now find the secondary SQL_ID that correponds to most of the combined hit to the primary SQL that will show in the following statement:

select distinct SQL_ID, optimizer_cost from dba_hist_sqlstat 
where snap_id between 717 and 727
and plsexec_time_delta=0
order by optimizer_cost desc;
SQL_ID OPTIMIZER_COST
fjbbxnyc66t7c 256350
5dpcst074nk81 253906 <--Here was the one shown, it will be the one above!
1xg7tkc156a1m 5477
24hc2470c87up 5302
dyd4b36t1ppph 4865
gfjvxb25b773h 2331
3wy90uysqgcfx 1974
a55tay7577psc 1922
00a15fn17bx7p 1912
b92jqqxwt3tfd 1725
cf621qmts91wf 1028
5atpa8vj2gakz 1007
7qskskbx91q7t 992

Now run an awrsqrpt.sql to see the hit for the SQL_ID in question:

If you have never run one of these, it’s a great report for unique info on a SQL Report for a given run.  For this one, we just want to see WHAT was running:
You will need your snapshot ID’s and the SQL_ID to create the report and the report resides in $ORACLE_HOME/rdbms/admin.  Yes, I prefer the text version of the HTML, but to each their own… 🙂

 

WORKLOAD REPOSITORY SQL Report

Snapshot Period Summary
DB Name         DB Id    Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
DBMART         xxxxxxxxx mart1               1 10.2.0.4.0  NO  host1
              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:       709 27-Sep-11 14:30:31        57       6.3
  End Snap:       727 27-Sep-11 23:30:53        37       5.1
   Elapsed:              540.37 (mins)
   DB Time:            2,726.16 (mins)
SQL Summary                               DB/Inst: MARTF/martf  Snaps: 709-727
                Elapsed
   SQL Id      Time (ms)
------------- ----------
fjbbxnyc66t7c ##########
CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("IBID") INDEX
ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NOLOGGING parallel 4 as select
col1 , col2 , col3 , col4 , col5 , col6
          -------------------------------------------------------------
SQL ID: fjbbxnyc66t7c                     DB/Inst: MARTF/martf  Snaps: 709-727
-> 1st Capture and Last Capture Snap IDs
   refer to Snapshot IDs witin the snapshot range
-> CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("COL...
    Plan Hash           Total Elapsed                 1st Capture   Last Capture
#   Value                    Time(ms)    Executions       Snap ID        Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1   3019078213            115,239,821             1           710            726
          -------------------------------------------------------------
Plan 1(PHV: 3019078213)
-----------------------
Plan Statistics                           DB/Inst: DBMART/mart1  Snaps: 709-727
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                        ##########  115,239,820.7    70.5
CPU Time (ms)                            ##########   23,668,620.1    45.7
Executions                                        1            N/A     N/A
Buffer Gets                              ##########  940,627,725.0    70.9
Disk Reads                                7,761,573    7,761,573.0    19.4
Parse Calls                                       9            9.0     0.0
Rows                                              0            0.0     N/A
User I/O Wait Time (ms)                   1,949,762            N/A     N/A
Cluster Wait Time (ms)                            0            N/A     N/A
Application Wait Time (ms)                        0            N/A     N/A
Concurrency Wait Time (ms)                  301,390            N/A     N/A
Invalidations                                     0            N/A     N/A
Version Count                                    17            N/A     N/A
Sharable Mem(KB)                              9,285            N/A     N/A
          -------------------------------------------------------------
          -------------------------------------------------------------
Execution Plan
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT   |          |       |       |   256K|       |       |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |       |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |   637M|   229G|       |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    INDEX BUILD UNIQUE    | PRIM_PK  |       |       |       |       |       |  Q1,01 | PCWP |            |
|   4 |     SORT CREATE INDEX    |          |   637M|   229G|       |       |       |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE          |          |   637M|   229G|   136K|       |       |  Q1,01 | PCWP |            |
|   6 |       PX SEND RANGE      | :TQ10000 |   637M|   229G|   136K|       |       |  Q1,00 | P->P | RANGE      |
|   7 |        PX BLOCK ITERATOR |          |   637M|   229G|   136K|     1 |     4 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| PRIM_TBL |   637M|   229G|   136K|     1 |     4 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------
Note
-----
   - cpu costing is off (consider enabling it)
   - estimated index size: 257G bytes
Full SQL Text
SQL ID       SQL Text
------------ -----------------------------------------------------------------
fjbbxnyc66t7 CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"
             "("COL1") INDEX ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NO
             LOGGING parallel 4 as select ...<want to see the full SQL statement, yes
             this report will show you the output...>

We now can see that our “offender” was the primary key creation on the IOT create statement.

Now you have the Information on the “Unknown” offender that was showing in OEM.  Nothing can hide from a DBA for long! 🙂

Print Friendly
September 29th, 2011 by

facebook comments:

  • Thanks for the post! Very usefull for me 🙂

    Regards,
    Pavel.

  • Facebook
  • Google+
  • LinkedIn
  • Twitter