Locating UNKNOWN SQL_ID Info in OEM Through AWR
Rarely are reports based off large snapshot variances helpful to a DBA unless you come across an odd situation such as this one…Better yet, we need to know a little bit about our AWR tables behind our reports so we can piece together what the reports leave out…:)
Scenario: After-hours support has killed a session after high temp usage has occurred. You, as the primary DBA, are left to look into the issue the next day.
Your first attempt to inspect the issue is through Enterprise Manager, (OEM) and you are surprised that very little activity is actually showing up that
is of concern. You can see pink areas, something that would flag you, but they aren’t showing up in the active sessions in the left- only showing in the right.
What’s going on here?
Now, taking the right side into consideration, you do see four sessions, easily assumed to be parallel slaves, processing away.
When drilling down into the left side SQL_ID sessions, you not that none of the top right side sessions are showing up for the SQL_ID’s in question.
When drilling down into the right side four sessions, SID’s, all of them show no SQL_ID’s connected and are listed as “UNKNOWN”.
How as a DBA, do you locate this data? Your first quest will be to run an ASH report, but it will only show you the problem waits, type of waits, not the SQL that was the
cause of the issue. How do you find the SQL/SQL_ID for the SID’s?
ASH Report for the snapshot in question:
DB Id DB Name Inst Num Instance ----------- ------------ -------- ------------ xxxxxxxxx DBMART 1 mart1
Specify the Report Type ~~~~~~~~~~~~~~~~~~~~~~~ Enter 'html' for an HTML report, or 'text' for plain text Defaults to 'html' Enter value for report_type: text
Type Specified: text
Defaults to current database
Using database id: xxxxxxxxxxxxxxx
Defaults to current instance
Using instance number: 1
ASH Samples in this Workload Repository schema ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Oldest ASH sample available: 12-Sep-11 12:20:34 [ 24306 mins in the past] Latest ASH sample available: 29-Sep-11 09:26:08 [ 0 mins in the past]
Specify the timeframe to generate the ASH report ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Enter begin time for report:
-- Valid input formats: -- To specify absolute begin time: -- [MM/DD[/YY]] HH24:MI[:SS] -- Examples: 02/23/03 14:30:15 -- 02/23 14:30:15 -- 14:30:15 -- 14:30 -- To specify relative begin time: (start with '-' sign) -- -[HH24:]MI -- Examples: -1:15 (SYSDATE - 1 Hr 15 Mins) -- -25 (SYSDATE - 25 Mins)
Defaults to -15 mins Enter value for begin_time: 09/27/11 16:00:00 Report begin time specified: 09/27/11 16:00:00
Enter duration in minutes starting from begin time: Defaults to SYSDATE - begin_time Press Enter to analyze till current time Enter value for duration: 240
Enter a name for the report or a default will be used, (I prefer a naming convention of: <report type, (ash, awr, addm)>_<sid>_<snapshot>.<txt/html>
View the ASH report for the time period, going to the top events first:
Event % Event P1 Value, P2 Value, P3 Value % Activity ------------------------------ ------- ----------------------------- ---------- Parameter 1 Parameter 2 Parameter 3 -------------------------- -------------------------- -------------------------- PX Deq Credit: send blkd 58.37 "268501009","3","0" 1.97 sleeptime/senderid passes qref
"268501010","3","0" 1.97
"268501010","4","0" 1.97
There’s our issue!
Now using this info, we know we are looking at a parallel, (PX) issue that we can then take to the next
step to identify the process and see if it is connected to one of our SID’s we noted in the OEM console:
Sid, Serial# % Activity Event % Event --------------- ---------- ------------------------------ ---------- User Program # Samples Active XIDs -------------------- ------------------------------ ------------------ -------- 2170, 3249 16.27 PX Deq Credit: send blkd 14.75 DM_USER oracle@host1 (P020) 11K/14K [ 74%] 0
CPU + Wait for CPU 1.46 1,059/14K [ 7%] 0
2138, 585 16.13 PX Deq Credit: send blkd 14.55 DM_USER oracle@host1 (P022) 11K/14K [ 73%] 0
CPU + Wait for CPU 1.53 1,107/14K [ 8%] 0
2130, 581 16.11 PX Deq Credit: send blkd 14.53 DM_USER oracle@host1 (P023) 11K/14K [ 73%] 0
CPU + Wait for CPU 1.54 1,113/14K [ 8%] 0
2129, 533 16.03 PX Deq Credit: send blkd 14.54 DM_USER oracle@host1 (P021) 11K/14K [ 73%] 0
CPU + Wait for CPU 1.41 1,024/14K [ 7%] 0
Note that is also reports back the same four sids, all showing as “UNKNOWN” (2170, 2129,2130 and 2138) as we see in the OEM console..)
It just doesn’t have any SQL or SQL_ID’s to connect with them.
The coordinating SID is caught in the Top sessions running PQ’s, so we will see this as our identifier, even if it wasn’t seen in the OEM console:
Sid,Srl# (Inst) % Activity SQL ID Event % Event --------------- ---------- ------------- ----------------------------- -------- User Program -------------------- ------------------------------ 2113,UNKNOWN(1) 80.89 PX Deq Credit: send blkd 58.37
This is very important, as we need this to track further, outside of the report.
Using the coordinator, then query the DBA_HIST_ACTIVE_SESS_HISTORY from during the snapshots to capture the main SQL_ID:
select * from sys.DBA_HIST_ACTIVE_SESS_HISTORY where session_id=2113 and snap_id between 717 and 727;
SAMPLE_TIME SESSION_ID SESSION_SERIAL# USER_ID SQL_ID 27-SEP-11 10.33.39.946 PM 2113 143 255 5dpcst074nk81 27-SEP-11 10.33.49.976 PM 2113 143 255 5dpcst074nk81
You can then pull the data that is connected to the parent that will tell you what session experienced the problem:
select * from sys.WRH$_SQLTEXT where sql_id='5dpcst074nk81';
SNAP_ID DBID SQL_ID SQL_TEXT COMMAND_TYPE REF_COUNT 760 2778270765 5dpcst074nk81 <CLOB> 1 0
the clob then shows the SQL involved in the executed session, which then tells you what your pain is on:
create table dm_user.tbl_iot ( col1 , col2 , col3 , col4.... constraint tbl_pk primary key (col1)) organization index.... <--Here's our key, it's an IOT, which includes the PK!
tablespace mart_data1 pctfree 0 nologging parallel 4 as....
We can now find the secondary SQL_ID that correponds to most of the combined hit to the primary SQL that will show in the following statement:
select distinct SQL_ID, optimizer_cost from dba_hist_sqlstat where snap_id between 717 and 727 and plsexec_time_delta=0 order by optimizer_cost desc;
SQL_ID OPTIMIZER_COST
fjbbxnyc66t7c 256350
5dpcst074nk81 253906 <--Here was the one shown, it will be the one above!
1xg7tkc156a1m 5477 24hc2470c87up 5302 dyd4b36t1ppph 4865 gfjvxb25b773h 2331 3wy90uysqgcfx 1974 a55tay7577psc 1922 00a15fn17bx7p 1912 b92jqqxwt3tfd 1725 cf621qmts91wf 1028 5atpa8vj2gakz 1007 7qskskbx91q7t 992
Now run an awrsqrpt.sql to see the hit for the SQL_ID in question:
If you have never run one of these, it’s a great report for unique info on a SQL Report for a given run. For this one, we just want to see WHAT was running:
You will need your snapshot ID’s and the SQL_ID to create the report and the report resides in $ORACLE_HOME/rdbms/admin. Yes, I prefer the text version of the HTML, but to each their own… 🙂
WORKLOAD REPOSITORY SQL Report
Snapshot Period Summary
DB Name DB Id Instance Inst Num Release RAC Host ------------ ----------- ------------ -------- ----------- --- ------------ DBMART xxxxxxxxx mart1 1 10.2.0.4.0 NO host1
Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- --------- Begin Snap: 709 27-Sep-11 14:30:31 57 6.3 End Snap: 727 27-Sep-11 23:30:53 37 5.1 Elapsed: 540.37 (mins) DB Time: 2,726.16 (mins)
SQL Summary DB/Inst: MARTF/martf Snaps: 709-727
Elapsed SQL Id Time (ms) ------------- ---------- fjbbxnyc66t7c ########## CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("IBID") INDEX ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NOLOGGING parallel 4 as select col1 , col2 , col3 , col4 , col5 , col6
-------------------------------------------------------------
SQL ID: fjbbxnyc66t7c DB/Inst: MARTF/martf Snaps: 709-727 -> 1st Capture and Last Capture Snap IDs refer to Snapshot IDs witin the snapshot range -> CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("COL...
Plan Hash Total Elapsed 1st Capture Last Capture # Value Time(ms) Executions Snap ID Snap ID --- ---------------- ---------------- ------------- ------------- -------------- 1 3019078213 115,239,821 1 710 726 -------------------------------------------------------------
Plan 1(PHV: 3019078213) -----------------------
Plan Statistics DB/Inst: DBMART/mart1 Snaps: 709-727 -> % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
Stat Name Statement Per Execution % Snap ---------------------------------------- ---------- -------------- ------- Elapsed Time (ms) ########## 115,239,820.7 70.5 CPU Time (ms) ########## 23,668,620.1 45.7 Executions 1 N/A N/A Buffer Gets ########## 940,627,725.0 70.9 Disk Reads 7,761,573 7,761,573.0 19.4 Parse Calls 9 9.0 0.0 Rows 0 0.0 N/A User I/O Wait Time (ms) 1,949,762 N/A N/A Cluster Wait Time (ms) 0 N/A N/A Application Wait Time (ms) 0 N/A N/A Concurrency Wait Time (ms) 301,390 N/A N/A Invalidations 0 N/A N/A Version Count 17 N/A N/A Sharable Mem(KB) 9,285 N/A N/A -------------------------------------------------------------
-------------------------------------------------------------
Execution Plan ------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------ | 0 | CREATE INDEX STATEMENT | | | | 256K| | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | 2 | PX SEND QC (ORDER) | :TQ10001 | 637M| 229G| | | | Q1,01 | P->S | QC (ORDER) | | 3 | INDEX BUILD UNIQUE | PRIM_PK | | | | | | Q1,01 | PCWP | | | 4 | SORT CREATE INDEX | | 637M| 229G| | | | Q1,01 | PCWP | | | 5 | PX RECEIVE | | 637M| 229G| 136K| | | Q1,01 | PCWP | | | 6 | PX SEND RANGE | :TQ10000 | 637M| 229G| 136K| | | Q1,00 | P->P | RANGE | | 7 | PX BLOCK ITERATOR | | 637M| 229G| 136K| 1 | 4 | Q1,00 | PCWC | | | 8 | TABLE ACCESS FULL| PRIM_TBL | 637M| 229G| 136K| 1 | 4 | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------
Note ----- - cpu costing is off (consider enabling it) - estimated index size: 257G bytes
Full SQL Text
SQL ID SQL Text ------------ ----------------------------------------------------------------- fjbbxnyc66t7 CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT" "("COL1") INDEX ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NO LOGGING parallel 4 as select ...<want to see the full SQL statement, yes this report will show you the output...>
We now can see that our “offender” was the primary key creation on the IOT create statement.
Now you have the Information on the “Unknown” offender that was showing in OEM. Nothing can hide from a DBA for long! 🙂
Thanks for the post! Very usefull for me 🙂
Regards,
Pavel.