Oct 04 2011

OOW11 Dinners

Category: DBA LifeKellyn Pot'Vin @ 12:27 pm

Yes, typing on my tablet screen again, so patience with my short posts…:-)

Had the pleasure of attending both the ACE and Oak table dinners the last two nights.  Wonderful, impressive and technically gifted people at every table and a fantastic opportunity to meet so many that I’ve only known virtually.  I enjoyed another set of high energy conversations with Gwen Shapira, Debra Lilley, Robyn Sands and Lisa Dobsen- all women who make me proud of the representatives of my gender in the technical world. 

Spent sometime with Mark Bobak, Kent G., Alex G., Craig S., Jeremy Schneider, Mike Swing and Yuri Y.  I was also so thrilled to spend time with a virtual team member, Andy Klock- great guy to work with and happy to meet in person.  Carol Dacko did a phenomenal job planning the Oak Table event and I know Robyn Sands helped with some of the arrangements, too.  Mogens N. Is beyond entertaining and his legend is intact another year.  I threatened to stalk Tanel unless he signed my Exadata book, (thank you, thank you Kerry Osbourne for the copy…) and still am missing mentioning many others that should be named here.  All made an impact and were a pleasure to meet.

Food and spirits pale in comparison to the wonderful opportunity these dinners offer us all to sit and speak with the peers we admire so much…


Oct 04 2011

First Day at Oracle Open World

Category: DBA LifeKellyn Pot'Vin @ 12:05 pm

After a long first day and a 33 hr train ride in, I’m ready for a fun and enjoyable Ace dinner that I’m a lucky guest to. 

First day, being Sunday, is commonly quiet, but Tim had two sessions that I wanted to attend and time flew by with meetups, expert panels, etc. I met a number of great new people today and saw some more old friends, (or as “new blood”, can I refer to them as old?) 

I was still recovering after a fantastic get-together at Graham Woods from lastnight.  I was impressed with the great group of people inattendance.  DBA Gods like Tim Gorman, Alex Gorbachev, Cary Millsap, Kerry Osbourne and even demi-Gods like Frits Hoogland and Greg Rahn were there.  I met,(finally) so many others that have been on my list.


Sep 29 2011

Locating UNKNOWN SQL_ID Info in OEM Through AWR

Category: OracleKellyn Pot'Vin @ 2:18 pm

Rarely are reports based off large snapshot variances helpful to a DBA unless you come across an odd situation such as this one…Better yet, we need to know a little bit about our AWR tables behind our reports so we can piece together what the reports leave out…:)

Scenario:  After-hours support has killed a session after high temp usage has occurred.  You, as the primary DBA, are left to look into the issue the next day.
Your first attempt to inspect the issue is through Enterprise Manager, (OEM) and you are surprised that very little activity is actually showing up that
is of concern.  You can see pink areas, something that would flag you, but they aren’t showing up in the active sessions in the left-  only showing in the right.
What’s going on here?

Now, taking the right side into consideration, you do see four sessions, easily assumed to be parallel slaves, processing away.
When drilling down into the left side SQL_ID sessions, you not that none of the top right side sessions are showing up for the SQL_ID’s in question.

When drilling down into the right side four sessions, SID’s, all of them show no SQL_ID’s connected and are listed as “UNKNOWN”.

How as a DBA, do you locate this data?  Your first quest will be to run an ASH report, but it will only show you the problem waits, type of waits, not the SQL that was the
cause of the issue.  How do you find the SQL/SQL_ID for the SID’s?

ASH Report for the snapshot in question:

  DB Id    DB Name      Inst Num Instance
----------- ------------ -------- ------------
 xxxxxxxxx DBMART               1 mart1
Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type: text
Type Specified:  text
Defaults to current database
Using database id: xxxxxxxxxxxxxxx
Defaults to current instance
Using instance number: 1
ASH Samples in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Oldest ASH sample available:  12-Sep-11 12:20:34   [  24306 mins in the past]
Latest ASH sample available:  29-Sep-11 09:26:08   [      0 mins in the past]
Specify the timeframe to generate the ASH report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter begin time for report:
--    Valid input formats:
--      To specify absolute begin time:
--        [MM/DD[/YY]] HH24:MI[:SS]
--        Examples: 02/23/03 14:30:15
--                  02/23 14:30:15
--                  14:30:15
--                  14:30
--      To specify relative begin time: (start with '-' sign)
--        -[HH24:]MI
--        Examples: -1:15  (SYSDATE - 1 Hr 15 Mins)
--                  -25    (SYSDATE - 25 Mins)
Defaults to -15 mins
Enter value for begin_time: 09/27/11 16:00:00
Report begin time specified: 09/27/11 16:00:00
Enter duration in minutes starting from begin time:
Defaults to SYSDATE - begin_time
Press Enter to analyze till current time
Enter value for duration: 240
Enter a name for the report or a default will be used, (I prefer a naming convention of:
<report type, (ash, awr, addm)>_<sid>_<snapshot>.<txt/html>
View the ASH report for the time period, going to the top events first:
Event                          % Event  P1 Value, P2 Value, P3 Value % Activity
------------------------------ ------- ----------------------------- ----------
Parameter 1                Parameter 2                Parameter 3
-------------------------- -------------------------- --------------------------
PX Deq Credit: send blkd         58.37           "268501009","3","0"       1.97
sleeptime/senderid         passes                     qref
                                                 "268501010","3","0"       1.97
                                                 "268501010","4","0"       1.97

There’s our issue!

Now using this info, we know we are looking at a parallel, (PX) issue that we can then take to the next
step to identify the process and see if it is connected to one of our SID’s we noted in the OEM console:

   Sid, Serial# % Activity Event                             % Event
--------------- ---------- ------------------------------ ----------
User                 Program                          # Samples Active     XIDs
-------------------- ------------------------------ ------------------ --------
     2170, 3249      16.27 PX Deq Credit: send blkd            14.75
DM_USER             oracle@host1 (P020)             11K/14K [ 74%]        0
                           CPU + Wait for CPU                   1.46
                                                      1,059/14K [  7%]        0
     2138,  585      16.13 PX Deq Credit: send blkd            14.55
DM_USER             oracle@host1 (P022)             11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.53
                                                      1,107/14K [  8%]        0
     2130,  581      16.11 PX Deq Credit: send blkd            14.53
DM_USER             oracle@host1 (P023)             11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.54
                                                      1,113/14K [  8%]        0
     2129,  533      16.03 PX Deq Credit: send blkd            14.54
DM_USER             oracle@host1 (P021)             11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.41
                                                      1,024/14K [  7%]        0

Note that is also reports back the same four sids, all showing as “UNKNOWN” (2170, 2129,2130 and 2138) as we see in the OEM console..)
It just doesn’t have any SQL or SQL_ID’s to connect with them.

The coordinating SID is caught in the Top sessions running PQ’s, so we will see this as our identifier, even if it wasn’t seen in the OEM console:

Sid,Srl# (Inst) % Activity SQL ID        Event                          % Event
--------------- ---------- ------------- ----------------------------- --------
User                 Program
-------------------- ------------------------------
2113,UNKNOWN(1)      80.89               PX Deq Credit: send blkd         58.37

This is very important, as we need this to track further, outside of the report.

Using the coordinator, then query the DBA_HIST_ACTIVE_SESS_HISTORY from during the snapshots to capture the main SQL_ID:

select * from sys.DBA_HIST_ACTIVE_SESS_HISTORY 
where session_id=2113
and snap_id between 717 and 727;
SAMPLE_TIME SESSION_ID SESSION_SERIAL# USER_ID SQL_ID
27-SEP-11 10.33.39.946 PM 2113 143 255 5dpcst074nk81
27-SEP-11 10.33.49.976 PM 2113 143 255 5dpcst074nk81

You can then pull the data that is connected to the parent that will tell you what session experienced the problem:

select * from sys.WRH$_SQLTEXT 
where sql_id='5dpcst074nk81';
SNAP_ID DBID  SQL_ID  SQL_TEXT COMMAND_TYPE REF_COUNT
760 2778270765 5dpcst074nk81 <CLOB>   1            0

the clob then shows the SQL involved in the executed session, which then tells you what your pain is on:

create table dm_user.tbl_iot
(
  col1 ,
  col2 ,
  col3 ,
  col4....
constraint tbl_pk
primary key (col1))
organization index.... <--Here's our key, it's an IOT, which includes the PK!
tablespace mart_data1 pctfree 0 nologging parallel 4 as....

We can now find the secondary SQL_ID that correponds to most of the combined hit to the primary SQL that will show in the following statement:

select distinct SQL_ID, optimizer_cost from dba_hist_sqlstat 
where snap_id between 717 and 727
and plsexec_time_delta=0
order by optimizer_cost desc;
SQL_ID OPTIMIZER_COST
fjbbxnyc66t7c 256350
5dpcst074nk81 253906 <--Here was the one shown, it will be the one above!
1xg7tkc156a1m 5477
24hc2470c87up 5302
dyd4b36t1ppph 4865
gfjvxb25b773h 2331
3wy90uysqgcfx 1974
a55tay7577psc 1922
00a15fn17bx7p 1912
b92jqqxwt3tfd 1725
cf621qmts91wf 1028
5atpa8vj2gakz 1007
7qskskbx91q7t 992

Now run an awrsqrpt.sql to see the hit for the SQL_ID in question:

If you have never run one of these, it’s a great report for unique info on a SQL Report for a given run.  For this one, we just want to see WHAT was running:
You will need your snapshot ID’s and the SQL_ID to create the report and the report resides in $ORACLE_HOME/rdbms/admin.  Yes, I prefer the text version of the HTML, but to each their own… :)

 

WORKLOAD REPOSITORY SQL Report

Snapshot Period Summary
DB Name         DB Id    Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
DBMART         xxxxxxxxx mart1               1 10.2.0.4.0  NO  host1
              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:       709 27-Sep-11 14:30:31        57       6.3
  End Snap:       727 27-Sep-11 23:30:53        37       5.1
   Elapsed:              540.37 (mins)
   DB Time:            2,726.16 (mins)
SQL Summary                               DB/Inst: MARTF/martf  Snaps: 709-727
                Elapsed
   SQL Id      Time (ms)
------------- ----------
fjbbxnyc66t7c ##########
CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("IBID") INDEX
ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NOLOGGING parallel 4 as select
col1 , col2 , col3 , col4 , col5 , col6
          -------------------------------------------------------------
SQL ID: fjbbxnyc66t7c                     DB/Inst: MARTF/martf  Snaps: 709-727
-> 1st Capture and Last Capture Snap IDs
   refer to Snapshot IDs witin the snapshot range
-> CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("COL...
    Plan Hash           Total Elapsed                 1st Capture   Last Capture
#   Value                    Time(ms)    Executions       Snap ID        Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1   3019078213            115,239,821             1           710            726
          -------------------------------------------------------------
Plan 1(PHV: 3019078213)
-----------------------
Plan Statistics                           DB/Inst: DBMART/mart1  Snaps: 709-727
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                        ##########  115,239,820.7    70.5
CPU Time (ms)                            ##########   23,668,620.1    45.7
Executions                                        1            N/A     N/A
Buffer Gets                              ##########  940,627,725.0    70.9
Disk Reads                                7,761,573    7,761,573.0    19.4
Parse Calls                                       9            9.0     0.0
Rows                                              0            0.0     N/A
User I/O Wait Time (ms)                   1,949,762            N/A     N/A
Cluster Wait Time (ms)                            0            N/A     N/A
Application Wait Time (ms)                        0            N/A     N/A
Concurrency Wait Time (ms)                  301,390            N/A     N/A
Invalidations                                     0            N/A     N/A
Version Count                                    17            N/A     N/A
Sharable Mem(KB)                              9,285            N/A     N/A
          -------------------------------------------------------------
          -------------------------------------------------------------
Execution Plan
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT   |          |       |       |   256K|       |       |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |       |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |   637M|   229G|       |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    INDEX BUILD UNIQUE    | PRIM_PK  |       |       |       |       |       |  Q1,01 | PCWP |            |
|   4 |     SORT CREATE INDEX    |          |   637M|   229G|       |       |       |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE          |          |   637M|   229G|   136K|       |       |  Q1,01 | PCWP |            |
|   6 |       PX SEND RANGE      | :TQ10000 |   637M|   229G|   136K|       |       |  Q1,00 | P->P | RANGE      |
|   7 |        PX BLOCK ITERATOR |          |   637M|   229G|   136K|     1 |     4 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| PRIM_TBL |   637M|   229G|   136K|     1 |     4 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------
Note
-----
   - cpu costing is off (consider enabling it)
   - estimated index size: 257G bytes
Full SQL Text
SQL ID       SQL Text
------------ -----------------------------------------------------------------
fjbbxnyc66t7 CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"
             "("COL1") INDEX ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NO
             LOGGING parallel 4 as select ...<want to see the full SQL statement, yes
             this report will show you the output...>

We now can see that our “offender” was the primary key creation on the IOT create statement.

Now you have the Information on the “Unknown” offender that was showing in OEM.  Nothing can hide from a DBA for long! :)


Sep 28 2011

Oracle Open World 2011, Prep Time!

Category: DBA Life,OracleKellyn Pot'Vin @ 8:52 pm

I’ve spent the week updating, packing and communicating in preparation for my trip to Oracle Open World 2011.  As with most folks that are attending, there is a lot to prepare for, but I have that added challenge of three kids who find this a fine opportunity to drive their father to distraction, as my ex-husband is responsible with managing their week, which he is unaccustomed to.   Although I’m thankful for Google calendar that will tell my ex where and when each child has to be at all times, what homework, social events, etc., it is not a lot of fun for him, as the schedule is commonly kept from my home and not his during the week so finding a way to keep it from becoming too chaotic for him may not be feasible…

My preparation has included a few last minute tasks for RMOUG, which turned out to be pretty painless, so that wasn’t impacting.  I think one of my biggest challenges was a final rush to try and get what identification I needed sync’d with the correct name, as we all know, the government agencies are far from efficient when someone changes their name, no matter if it’s marriage or in may case, a divorce, (ask me how my identity crisis is going, considering how often I need to use a different last name depending on what company/agency I’m speaking with at any given time…sigh…)

Upon checking my calendar, it appears that I have a dinner to attend every night that we are in San Francisco.  The company for each of these events, of course, is wonderful- some of the folks I know and many I only know virtually and look forward to meeting for the first time.  I’m looking forward to the Pythian Blogger’s meetup on Wednesday night.  It’s a great opportunity to meet up with ex-coworkers and meet fellow bloggers-  always a fun, enterprising group!

I’m looking forward to meeting a long list of folks, including, Dan Norris, Lisa Dobson, Eddie Awad, Yuri Velikanov, many more folks than I can possibly list…) and those that I haven’t seen in quite a while, Alex Gorbachev, Cary Milsap, Deb Lilley, Carol Dacko and again, more than I can list… :) )

There will be many that were unable to make it this year, including a last minute cancel due to family needs by Chet Justice.  He will be missed by many and want to pass a long a thank you for my daughter Cait and I for the picture on oraclenerd.com.  We love our matching “DBA Goth Princess” T-shirts.  When asking my 16yr old son to take our picture, his response was, “You know Mom, there is this great feature called a self-timer found on cameras…”  I had to respond with “You know, there is this great feature called, I keep feeding you…”  He took a great picture of Cait and I for Chet’s site… :)

Packing will continue tonight,  along with more planning in the calendars, an attempt to get my tasks for the latest Agile sprint at work done and a hope that what clothes I pack will work the slightly different weather in San Francisco.  Yes, I’m bringing my cowboy boots… :)

See you there!

 


Sep 22 2011

SQL Server and Distributed Transaction Tuning

Category: SQLServerKellyn Pot'Vin @ 2:17 pm

Users complained that a monthly financial report would no longer run SQLServer Reporting Services, (SSRS.)   Upon investigation, it was found that this was a stored procedure that ran from one Annex database, sourcing from another and outer joins to a SQLServer database on a remote server through a linked server configuration.  In attempts to run the report,  my SQL Server Profile traces on both the source SQL Server and the remote SQLServer resulted in consistent sp_reset_connection results from the source and no activity in the remote.

 

I ran my trusty and favorite script to tell me what processes were taking the most resources and quickly realized with statement was the cause:

SELECT  TOP 1000
[Object_Name] = object_name(st.objectid),
creation_time, last_execution_time, total_cpu_time = total_worker_time / 1000,
avg_cpu_time = (total_worker_time / execution_count) / 1000,
min_cpu_time = min_worker_time / 1000, max_cpu_time = max_worker_time / 1000,
last_cpu_time = last_worker_time / 1000, total_time_elapsed = total_elapsed_time / 1000 ,
avg_time_elapsed = (total_elapsed_time / execution_count) / 1000,
min_time_elapsed = min_elapsed_time / 1000, max_time_elapsed = max_elapsed_time / 1000,
avg_physical_reads = total_physical_reads / execution_count,
avg_logical_reads = total_logical_reads / execution_count,
execution_count,
SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
((CASE statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END
-- qs.statement_start_offset
) /2) + 1) as statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY
sys.dm_exec_sql_text(qs.sql_handle) st
WHERE Object_Name(st.objectid) IS NOT NULL
--AND DB_NAME(st.dbid) = '
ORDER BY   total_cpu_time DESC
db_name(st.dbid), total_worker_time / execution_count  DESC

 

Now the important data from here is that the report came up at the top promptly, as it was an challenge to be sure:

Object_Name execution_time total_cpu_time min_cpu_time max_cpu_time last_cpu_time avg_logical_reads
Rpt_ClientRevenueDownloadKa

33:15.1

1040436

507989

532447

532447

4701439

Note that the min and max CPU times are very consistent and the average logical reads are out of this world!  The other nice thing about this report, is that it will tell you exactly, the code that is causing the issue.  You have the choice then of copying the code out of the final column and running and explain plan to see what is the issue or executing a test to see the problem.  The code in question was a distributed query, so an explain plan often does not show what a test of the query would, so I ran the one statement, which did return to the remote server, but with frustrating results:

What you see in the diagram above is both a choice by the server to parallelize the query, which is flooding the server with both waits on parallel and network IO.   The query did end up returning, but only after approximately 10 minutes.

Something is coming across the network to run against the remote server and after verifying statistics, etc. on the remote, I returned to the source server and looked at the query.

The query required a two step temp table to be created before joining to the tables across the linked server.  The issue here, is that this data was flooding both the network as well as the remote database.  The following, without the temp table, completes very quickly:

SELECT
--ISNULL(atv.NAME, 'UNKWN') AS name,
ISNULL(parentCrmA.NAME, 'UNKWN') AS parent,
month(t.DT1) AS pmonth, YEAR(t.DT2) AS pyear,
RIGHT('0' + CONVERT(VARCHAR(3), MONTH(t.DT2)) + CONVERT(VARCHAR(5), YEAR(t.DT2)), 6) AS fdte,
t.DT2 AS pdate, t.P_AMT*(-1) AS amt,
'rev_amt' AS r_amt, 'adj_amt' AS a_amt,
a.GLA_NO AS GLNo, --atv.crmVert AS vertical,
crmA.OID_NAME AS oidname, s.S_NAME AS spc,
c.C_NAME AS Cname, crmA.NC_NAME AS c_cname,
crmA.I_CD AS i_cd
--INTO      #FINData ß2

nd

 step temp table commented out, along with its columns…
FROM        R_SERVER.FIN_app.dbo.tglTbl t
INNER JOIN  R_SERVER.FIN_app.dbo.tglAtbl a ON t.GLAKey = a.GLAKey
LEFT OUTER JOIN   R_SERVER.FIN_app.dbo.tglItbl i ON t.TNo = i.TNo
LEFT OUTER JOIN   R_SERVER.FIN_app.dbo.tglCtbl c ON i.CKey = c.CKey
--LEFT OUTER JOIN @AccountToVerticalMap atv ON c.CID = atv.cName COLLATE SQL_Latin1_General_CP1_CI_AS
LEFT OUTER JOIN   R_SERVER.FIN_app.dbo.tglStbl s ON i.PSKey = s.SPKey
LEFT OUTER JOIN   R_SERVER.FIN_app.dbo.tglARtbl ar ON t.ARKey = ar.ARKey
LEFT OUTER JOIN   S_SERVER_DB.dbo.Atbl crmA ON c.CID = crmA.Name COLLATE SQL_Latin1_General_CP1_CI_AS AND crmA.DeletionS_CD = 0 AND crmA.S_CD = 0
LEFT OUTER JOIN   S_SERVER_DB.dbo.Atbl parentCrmA ON cA.PA_ID= parentCA.A_IDAND parentCA.DS_CD = 0 AND parentCrmA.S_CD = 0
WHERE a.GLANo between 4000 and 4999

:03 seconds, in fact…

The IO issue is the code creates the two temp objects, then attempts to send this data across the network to the linked server and onto the remote database to process the request.  As this is a monthly report, you have to ask yourself, is a temp table really required??  Not in this instance and it is actually the bottleneck to performance.

 

ORIGINAL:

with cAName(MemName)
AS (SELECT      c.CID
FROM  R_SERVER.FIN_app.dbo.tglCtbl c
WHERE c.CID is not null
UNION ALL
SELECT      ar.ARCode
FROM  R_SERVER.FIN_app.dbo.tglARtbl ar
WHERE ar.ARCode is not null
)
INSERT      INTO @ActVertMap
SELECT      DISTINCT
a.Name AS Name,
a.New_Vertical AS Vert
FROM  S_SERVER_DB.dbo.Atbl a
LEFT OUTER JOIN cAName c ON a.Name = c.Name COLLATE Latin1_General_CI_AS
WHERE a.Name is not null
order by a.Name

 

SIMPLE  REWRITE:

drop table cacct_wrk
go
SELECT      c.CID  as "MemName" into cacct_wrk
FROM  R_SERVER.FIN_app.dbo.tglCtbl c
WHERE c.CID is not null
UNION ALL
SELECT      ar.ARCode
FROM  R_SERVER.FIN_app.dbo.tglARtbl ar
WHERE ar.ARCode is not null
go
drop table ActVertMap_wrk
go
SELECT         DISTINCT
a.Name AS mapName,
a.New_Vertical AS mapVert
into ActVertMap_wrk
FROM  S_SERVER_DB.dbo.Atbl a
LEFT OUTER JOIN cacct_tst c ON a.Name = c.MemName COLLATE Latin1_General_CI_AS
WHERE a.Name is not null
order by a.Name
go

 

FINAL REPORTING QUERY REWRITE:

SELECT
ISNULL(atv.NAME, 'UNKWN') AS name,
ISNULL(parentCrmA.NAME, 'UNKWN') AS parent,
month(t.DT1) AS pmonth, YEAR(t.DT2) AS pyear,
RIGHT('0' + CONVERT(VARCHAR(3), MONTH(t.DT2)) + CONVERT(VARCHAR(5), YEAR(t.PDate)), 6) AS fdte,
t.DT2 AS pdate, t.P_AMT*(-1) AS amt,
'r_amt' AS r_amt, 'a_amt' AS a_amt,
a.GL_NO AS GLANo, atv.M_VERT AS vertical,
cA.O_ID AS oidname, s.S_NAME AS spc,
c.C_NAME AS FIN_cname, cA.NC_NAME AS ccname,
cA.I_CD AS icode --INTO      #FINData
FROM        R_SERVER.FIN_app.dbo.tglTbl t
INNER JOIN  R_SERVER.FIN_app.dbo.tglAtbl a ON t.GLAKey = a.GLAKey
LEFT OUTER JOIN   R_SERVER.FIN_app.dbo.tglItbl i ON t.TNo = i.TNo
LEFT OUTER JOIN   R_SERVER.FIN_app.dbo.tglCtbl c ON i.CKey = c.CKey
LEFT OUTER JOIN   ActVertMap_wrk atv ON c.CID = atv.mName
COLLATE SQL_Latin1_General_CP1_CI_AS
LEFT OUTER JOIN   R_SERVER.FIN_app.dbo.tglStbl s ON i.PSKey = s.SPKey
LEFT OUTER JOIN   R_SERVER.FIN_app.dbo.tglARtbl ar ON t.ARKey = ar.ARKey
LEFT OUTER JOIN   S_SERVER_DB.dbo.Atbl cA ON c.CID = cA.Name COLLATE SQL_Latin1_General_CP1_CI_AS AND cA.DeletionS_CD = 0 AND cA.S_CD = 0
LEFT OUTER JOIN   S_SERVER_DB.dbo.Atbl parentCA ON crmA.P_ID= parentCA.A_ID AND parentCA.DeletionS_CD = 0 AND parentCA.S_CD = 0
WHERE a.GLANo between 4000 and 4999

And this returns in .03 seconds!! :D

 

Using the correct feature for the goal and the use of the code is important.  Where using a temporary object might be a great option for local processing, the limitations of the optimizer in any platform to utilize that temporary object must be taken into consideration.  This was a complex process involving three databases that with growth, resulted in poor performance.


Sep 15 2011

Keep is Simple and Other Superman Challenges

Category: DBA LifeKellyn Pot'Vin @ 3:53 pm

“Oh what tangled web we weave when complexity is added we can’t see…”

How many times has complexity of design in an application, outside of the database, lead to the database blamed for slow performance?   This is where a manager thanks the technical Gods for a DBA with great Sherlock Holmes skills to track down and prove the database not only innocent, but figure out what the real problem is.  The database is guilty until proven innocent, we all know that, so when user’s come to the DBA and demands, “Why can’t I run this report?  I should be able to run a simple report, shouldn’t I?”  We have to take out the Holmes pipe and figure out what is going on.

My latest mystery landed me in a web, which would have been funny if not happening to me.  The exact demand from the users above was rightfully requested of the DBA group.  Being a good, responsible DBA,  we quickly took responsibility when we discovered a bug that was impacting stats collection with parallel degree set, provided a work around while the SR was being worked with MOS and proceeded forward to ensure the user received their report, ensuring business revenue.

As the trace is my first choice in this type of challenge, I attempted to trace the session numerous times, wanting to ensure I’d fixed the issue, along with a future goal of performance tuning, but found I was unable to pin it down and found that I was really seeing numerous sessions involved with this process, not just one to create the report.

The users, running the report to test my fix, experienced another failure with the following  error:  ” Failed to read the Report Stream from Report Server.”  Now this is a vague error and rarely one that *really* has to do with the Oracle database.  I could see the stats failure via initial data load logs but nothing really explained after removing this roadblock as to why the users were still experiencing failures.

As this report was kicked off through our web interface, supported by APEX, I went to the application support specialist for answers.   He appeared just as frustrated when he informed me that the report was actually called from APEX, but was phycially run from SSRS, (SQL Server Reporting Services.)  I promptly asked for the report and was dismayed to find out, depending on the parameters passed, it could contain anywhere from 24 to 50 database calls to complete the report.

What appeared to be one, simple, but slow report to the user, was actually a very large collection of queries, ran from a SQL Server Reporting Repository, submitted through an APEX web interface on a second server and then to the production database on a third.

Ahhh, but I’m not through yet.  I pulled the queries from the report, formatted them for a command line run and then passed in the parameters for the large feed that had been failing.  When I ran this from the command line, no failures occurred and I was able to trace everything for my future performance tuning.  I was surprised to see that there really wasn’t that much that was wrong with the SQL, only a few queries of the many that would require time and offer the biggest bang for the buck.   I then ran the report through the web interface hoping to find a way to trace it and my run through the web interface WAS SUCCESSFUL!  It also ran in 1/2 the time that it took for the version that reported in the logs that failed.   I already had the APEX and SSRS guys looking into anything on their side that might indicate a timeout or failure cause, but I still felt like we were missing something.

I approached the user and asked them where they were kicking the report off from and found out that they were executing it from a terminal server web session, not from their workstation’s web interface.  I quickly emailed the Windows Admin, who also administers the terminal servers and asked her if she was aware of any reporting session or terminal server timeout.  She  informed me that they disconnect any idle sessions after three hours.  As this session would show as idle, only submitting the report, but requiring a connection to be active for the remainder of the run, this was the actual issue.  Terminal server was dropping the session while it was attempting to complete a report that through it’s entire batch process across three servers, took almost five hours… :(

The users were then requested to run the report only from a workstation web interface.  The results were successful and the database was again innocent.

Moral of the story:  Complexity is the enemy of production processing.  As developers, application designers and DBA’s, it is often seen as proof to the world of our God-like creative powers.  This is where we also quickly move from being an asset to a liability to the business.  Do not create that which can not be easily supported and kept in the life-cycle of technology as technology grows and changes .  It is a fool-hardy endeavor that quickly costs the company money, time and nerves…  Want to be a superstar?  Stop being someone that to support his technical choices, requires a Superman effort…

 


Aug 31 2011

Disaster Recovery and Other Sacrifices

Category: RantsKellyn Pot'Vin @ 10:25 am

If you desire a subject that will invoke deep passion and
often combined with disgust from a group of DBA’s, disaster recovery is the
one.  It is the subject that rarely we
feel our butts are not out there hanging, no matter how much we’ve attempted to
secure our environment.

I’ve observed a consistent flow of articles, conversation
and email discussions on the subject and it is apparent that rarely is the
business as aware as the technical specialists, (aka the DBA) of just how
vulnerable their environments are.  Rarely
are the budget dollars allotted to the task of insuring that systems have the
proper disaster recovery hardware/software in place and/or testing performed in
a regular basis.

It’s easy for the business to see the value in the
production systems.  They create revenue
and their value is equal to the dollars they produce.  Development and test are more difficult for
them to understand, but most times, they can be justified the first time
production is undermined due to development or testing being performed in one
of them…:)

We now get into backup and recovery.  How many backups are impacted by 24X7 shops,
where the only thing viewed by the business is impact to revenue by having to
allocate resources to backing up production and placing it on disk/tape that
offers them no value to that revenue.
Yes, the DBA’s and technical management argue, “All it takes is one loss
of production and you will be thankful that we have that backup…”  Until that day comes, many business’ rely on
the robust nature of Oracle, the hardware it resides on and the technical
expertise of the folks they’ve hired to keep it running and never having to
rely on those backups.   DBA’s commonly fight on a regular basis for time
to allocate to testing, hardware to test recoveries to and explaining to the
business why it’s important.  The
business again looks on this as time that could be better allocated to creating
faster systems to create more revenue and again, impact to what the business is
there for- creating revenue.

The next level is then disaster recovery.  All DBA’s know this is the final
gauntlet.  We pray to the DBA Gods hoping
for a technical manager with the gift to motivate, sell and help the business
to understand why having standby’s of production databases to keep revenue
flowing in case of primary production going down is important.  We are willing to sacrifice small animals in
the name of a secondary data center for disaster recovery testing.  We want to know how long the business will be
down in case the unthinkable does happen and if all the documentation on what
it takes to create production will actually work when we do try to recreate it.  We would also like to have that answer
demanded of us when the unthinkable does happen and upper management is sitting
in front of us asking, “So HOW long are we down for???”

This is not rare, this is not uncommon, it is all too often
the norm for most DBA’s in the business world.
IT Managers, Network Administrators, Database Administrators often
battle day in and day out, not just for what they need to provide the growing
demands of the business, but what the business needs to survive in case of
disaster.  Our jobs are not just to
provide you with production, but to provide you with an ability to sustain your
business when the unthinkable happens.


Aug 24 2011

Tempfile Read /Writes and ASM

Category: OracleKellyn Pot'Vin @ 10:32 am

I truly believe that database myths are the scourge of the database world.  With that said, I have to blame myself for not being specific enough in one of my own blog posts and propagating one myself!  As it was brought to my attention by Tanel Poder and Greg Rahn, I hadn’t put a post out here until now, so apologies!

I have flipped back to an almost exclusively 10g environment the last three months, but even though I’m back to my old stomping grounds, now fully staffed with almost double the DBA’s that were present when I left, have been able to implement ASM.  With this change, I also need to update my data on how tempfiles are written to.  When you read my blog on temp usage in parallel, in must be reminded that this post is for a Non-ASM environment.  I also think for those that have high waits in temp read and writes, this should be an extra selling feature of ASM.

ASM, when implemented in any Oracle environment, 10g, included, will change the way that tempfiles are written to.  Instead of single file writes, even when multiple tempfiles per a temp tablespace are present, with ASM, it WILL stripe across the files efficiently, (unless statistics are in issue) with parallel.  There would then only be the benefit of having multiple temp tablespaces if you would like to allocate a different temp tablespace to different processes, but even then, the performance gain was marginal in my tests.

I did receive similar performance gains of temp tablespace groups without ASM as a single temp tablespace with multiple files with ASM.  I was only able to gain extra performance when spreading heavy hitting temp usage across temp tablespace groups during high concurrency situations.   the overall performance gain was only 12%, which was no where near what I experienced in the Non-ASM environment choice to go to temp tablespace groups with parallel processing.  These were processes that utilized over 50GB of temp per parallel process.

I still have processes that limit my tuning options that have me daydreaming about multiple temp tablespace groups on multiple ASM groups, residing on SSD, but again, I’m daydreaming … :)


Aug 08 2011

A Tale of Session Parameter Settings

Category: OracleKellyn Pot'Vin @ 3:47 pm

This is another blog post about how TEMP can kill you in performance.

The Program Global Area, aka PGA, is a memory region that allows Oracle to perform many processes that once fell to static calculations in parameters that had to be managed by a DBA.

As frustrating as it might be to a DBA to not have enough memory to allocate to the performance enhancing feature, I found it even more frustrating to find NO PGA allocation to window sorts.  We were experiencing poor performance in one of our environments, but I noted it was only during heavy workloads, resulting from heavy IO, NO execution plan changes, but outrageous waits on temp reads and writes.

db file sequential read   61,275,579   161,054      44.7   User I/O
direct path read temp     17,262,132    74,398      20.6   User I/O  <--TEMP is on SSD!!
CPU time                                71,971      20.0
direct path write temp    10,572,848    38,611      10.7   User I/O  <--TEMP is on SSD!!

 

Upon querying the work area, I discovered the following:

select vst.sql_text, swa.sql_id, swa.sid, swa.tablespace
, swa.operation_type
, trunc(swa.work_area_size/1024/1024) "PGA MB"
, trunc(swa.max_mem_used/1024/1024)"Mem MB"
, trunc(swa.tempseg_size/1024/1024)"Temp MB"
from v$sql_workarea_active swa, v$session vs, v$sqltext vst
where swa.sid=vs.sid
and swa.sql_id=vs.sql_id
and vs.sql_id=vst.sql_id
and vst.piece=0
order by swa.operation_type;

SQL_ID Temp Tablespace Operation Type PGA SGA TEMP
09am9sr9uca5y TEMP_4 WINDOW(SORT)

0

0

1375

09am9sr9uca5y TEMP_1 WINDOW(SORT)

0

0

1405

09am9sr9uca5y TEMP_2 WINDOW(SORT)

0

0

1415

09am9sr9uca5y TEMP_3 WINDOW(SORT)

0

0

1420

 

The process in question, as most processes in the database environments I work in was parallel, (degree 4) working with temp tablespace groups, (count of 4) on SSD, so quite fast, but to see this amount of processing “swapping” to temp was frustrating and painful to performance.

The IO was killing the performance, even with this running on SSD, we were 100% saturated on the disk!:

Device:       reads/s            writes/s              rsec/s                    wsec/s                  %util

Fio1              1584.00          1460.33                 101397.33            129344.00           73.97

Fio2              1603.33          1534.67                 102720.00            134776.00         100.03

 

FROM AWR:

%Time Total Wait wait Waits

Wait Class                      Waits  -outs         Time (s)    (ms)      /txn
-------------------- ---------------- ------ ---------------- ------- ---------
User I/O                   91,531,087     .0          279,954       3  56,223.0
Other                       4,067,544    2.9           12,718       3   2,498.5

Now where the key came in was the rest of the parallel processes when looking at their allocation:

SQL_ID Operation Type PGA SGA TEMP
09am9sr9uca5y   HASH-JOIN

511

23

09am9sr9uca5y   LOAD Write Buffers

0

0

09am9sr9uca5y   HASH-JOIN

511

23

09am9sr9uca5y   HASH-JOIN

511

23

Now even though the PGA is set to 32G and the allocating to numerous other processes, this one appeared very “odd” and why it’s the key the PGA.  Consider what the overall process would have to be to allocate 511 MB of PGA to the three processes above.   This would mean that 500MB was allocated to the process and then there was the approximate 10MB overhead.  The number was too evenly distributed and too high and no PGA was allocated to two newer features to 10g, such as analytical functions, (window sorts) .

I remembered that at one time, this company had a developer employed to it that had a penchant for the following manual session alteration:

EXECUTE IMMEDIATE 'ALTER SESSION SET WORKAREA_SIZE_POLICY = MANUAL';
EXECUTE IMMEDIATE 'ALTER SESSION SET SORT_AREA_SIZE = 536870912';
EXECUTE IMMEDIATE 'ALTER SESSION SET HASH_AREA_SIZE = 536870912';

 

Upon a search of the code, it was found that this was in the main package, but:

  1. The beginning procedure which executed from the package was actually in the center of the package, so developers never noted it as being an issue and the DBA’s never dug into the code deep enough before to know this manual alteration was present.
  2. Due to a huge process change to how the code produced work, there had been a large insertion of analytical functions that caused the performance degradation upon heavy loads.

The testing process was simple.

  1.  Removed the three lines from the initial procedure in the package.
  2. Run the same heavy load of processing from production in test with the new code.

Change in how the PGA is allocated:

SQL_ID Temp Tablespace Operation Type PGA SGA TEMP
09am9sr9uca5y TEMP_2 WINDOW (SORT)

26

95

598

09am9sr9uca5y TEMP_3 WINDOW (SORT)

26

95

598

09am9sr9uca5y TEMP_1 WINDOW (SORT)

26

95

598

09am9sr9uca5y TEMP_4 WINDOW (SORT)

26

95

598

I also found the PGA savings to the hash joins interesting, as what was statically “forced” in the manual allocation wasn’t what the database really required and was simply wasted with the old settings, where the PGA dynamically set much smaller amounts:

SQL_ID Temp Tablespace Operation Type PGA SGA TEMP
12bhautka6mvp LOAD WRITE BUFFERS

0

0

12bhautka6mvp HASH-JOIN

2

1

12bhautka6mvp HASH-JOIN

2

1

12bhautka6mvp HASH-JOIN

2

1

12bhautka6mvp HASH-JOIN

2

1

Result:

Increase in performance:  7 fold

The fix was moved to production that day, without hesitation and everyone shook their head as to why this was ever put into place by anyone,  but I want to follow up on that question that many folks, I’m sure have.  Although I do not support this, especially in 10g and 11g databases due to what we see here, you do have to understand what the developer’s goal was:

The developer :

  1.  Saw benefits to straight 500MB allocation of PGA to processing that over-rides the percentage allocation maximum per process.
  2. Experienced huge gains in performance on hash joins due to this change and we did suffer some small performance hits when it was removed, but the overall performance gain was still outstanding!
  3. Wasn’t aware of future enhancements and code changes that would have been vulnerable to his choice in coding session parameter changes.

 


Aug 02 2011

Warp Speed with Temp Tablespace Groups

Category: OracleKellyn Pot'Vin @ 7:47 pm

The introduction of the temp tablespace was a crucial step in the RDBMS to separating  the “work area” for sorting and specific join operations from permanent segments.   To eliminate the performance challenges of sorts and hash against the I/O threshold involved with performing this word in the temp tablespace, Oracle 9i introduced the Program Global Area, (PGA) to complete sorts and hashes within memory.

DBA’s spent considerable time tuning the PGA, attempting to continually ensure that as more complex sorting and join features emerged, that these processes would complete their work within the PGA and not require such a large work area to force a, “swap” to the temp tablespace, again returning to an I/O bound process.

Throughout the years of 10g release 2, larger databases, required more sorting and hash operations, which in turn required larger work areas that for particular requirements, no PGA could meet.  This would be seen in PGA cache hit graphs without a size recommendation to meet a 90% or better hit ratio.  These sorting and/or hash “heavy” processes would then become burdensome on the database, often encompassing the primary utilization of temporary “swap” and extended wait time.

 

When a DBA or Developer observes high waits on tempfile reads and writes, the challenge is often not understood.  Unlike a standard tablespace, that with high read and writes can have the I/O intensive demands spread across more area by adding multiple datafiles to write to simultaneously, a temp tablespace, when observed, will appear very different.

A simple ASH report can show how impacting this type of performance challenge can be to a VLDB:

 

Event

% Event

P1 Value, P2 Value, P3
Value

% Activity

Parameter 1

Parameter 2

Parameter 3

direct path read temp

43.27

“1001″,”3502″,”7″

0.01

file number first dba block cnt
PX Deq Credit: send blkd

26.18

“268501001″,”2″,”0″

1.50

sleeptime/senderid passes qref

“268501001″,”1″,”0″

1.18

“268501003″,”2″,”0″

1.06

direct path read

5.10

“50″,”11333″,”11″

0.01

file number first dba block cnt
direct path write temp

1.47

“1003″,”50949″,”7″

0.01

file number first dba block cnt

 

SQL ID

Planhash

% Activity

Event

% Event

SQL Text

dhytpraqah5c5

2880106315

100.00

direct path read temp

43.27

create table report_tbl1…
PX Deq Credit: send blkd

26.18

 

Temp tablespace groups have received little press and less discussion since their introduction in Oracle 10g.  When utilizing temp tablespace groups, a DBA can offer impressive performance increases by spreading the I/O across multiple files vs. one tempfile.   As tempfiles are utilized differently than datafiles, a temp tablespace group enables a process to consume temp space from multiple tablespaces.

Before the feature is implemented though, there are many considerations that need to be examined to ensure that benefits are seen and that complications are not introduced.

 

Myths about temptablespace groups:

Myth 1.  A temp tablespace group must have more than 1 tablespace.

Answer-  Untrue.  The tablespace group is created with the initial temp tablespace defined with the temp tablespace group clause.

 

Myth 2.  There is a limit to how many temp tablespaces can be added to a temp tablespace group.

Answer-  There is no explicit limit on the maximum number of tablespaces that can be allocated to a tablespace group.

 

Myth 3.  A temporary tablespace can belong to more than one temporary tablespace group.

Answer-  It can belong to only one group.

 

Myth 4.  Temporary tablespace groups improve all sorting issues.

Answer-  Windows sort rank often perform worse when utilizing temp tablespace groups and parallel.

The feature can be allocated to one user as its temporary tablespace or to an entire database to increase performance.

 

To create a temporary tablespace group is quite simple:

CREATE TEMPORARY TABLESPACE <tablespace_name>
TEMPFILE <ASM DISKGROUP >
SIZE <SIZE REQ>
TABLESPACE GROUP <group_name>;
CREATE TEMPORARY TABLESPACE TEMP_G1
TEMPFILE '+DATA'
SIZE 20000M
TABLESPACE GROUP TEMP_G;
desc dba_tablespace_groups

 

SELECT *
FROM dba_tablespace_groups;
ALTER TABLESPACE <tablespace_name>
TABLESPACE GROUP <group_name>;
ALTER TABLESPACE DM_DATA1 TABLESPACE GROUP TEMP_G;
ALTER USER DM_USER TEMPORARY TABLESPACE TEMP_G;
SELECT TEMPORARY_TABLESPACE FROM DBA_USERS
WHERE USERNAME='DM_USER';
ALTER SYSTEM TEMPORARY TABLESPACE TEMP_G;

 

Planing to implement temp tablespace group(s)
1.  Plan your temp tablespace group with careful consideration-  the tablespaces should be of the same size, have the same amount of files with the tempfiles each the same size.
2.  Keep in mind- inconsistent sizes across the tablespaces can affect the efficiency of the striping when using a non-ASM managed environment.
3.  The temp work will stripe across the tablespaces quite evenly.  When using a temp tablespace group, it will no longer simply use one temp tablespace at a time, so keep this in mind when monitoring temp tablespace usage.
4.  If in a non-ASM environments, stripe your files in differing orders to avoid having the same # file on the same drive as another temp tablespace from the temp tablespace group-  this will eliminate more I/O “hot spots”.
4.  You should estimate the required size of each tablespace to be approximately 40-60% or your current temp tablespace needs, depending on your database usage, (parallel processing, rowid range scans and types of sorts…)
5.  Temp tablespace groups should not be a replacement for SQL tuning. Larger than acceptable temp tablespace usage in correlation to the data set size the code is working with is a clear indicator that the code should be examined for tuning opportunities.

6.   Match your quantity of temp tablespaces contained in a tablespace group to the amount of parallel you will be hinting in your statements using it.

7.   Poor performance from statistics issues is only aggravated by temp tablespace groups.  Sort and hash operations are two of the most sensitive tasks to poor statistics, so it is easy to understand that extensive performance issues should be expected if statistics are not addressed before temp tablespace group introduction.

 

Temp tablespace groups and parallelism

DBA’s who utilize parallel query efficiently in their environments often follow a strict set of guidelines on what scenarios, objects and design must be in place before parallel will offer a true benefit in performance.

  • Partitioned Objects
  • Where clause that utilizes a partitioning key.
  • Large insert/update with defined variables.
  • Non-stale statistics
  • Setting the degree of parallel, (DOP) on the object.

Parallel can be utilized with temp tablespace groups to enhance performance to an incredible degree as well, but of the above “rules”, only one is preferable and it is most likely not the one most would guess from the above list.

Temp Tablespace groups should have the following for positive performance gains when utilizing parallel:

  • Non-stale statistics
  • Set the degree of parallel, (DOP) as a hint in the query, not at the object level.
  • Combination works best with OLAP and DSS systems, rare occasion where an OLTP would benefit.
  • Unless the insert and update is performing large sort or hash, it would offer little performance gain.
  • Partitioned objects are not a requirement to receive performance benefit.

The last bullet point is often difficult to understand, but the mindset must be re-aligned from the idea that you are using parallel to benefit a “read or write on multiple objects” i.e partitioned objects and that we are using the parallel process to benefit “read and write to multiple tempfiles” which is the temp tablespace group.  Visualizing the temp tablespace group as a “partitioned object” may assist with comprehending the benefits.

 

Performance Example

The process example:

  • utilizes 720GB of temp tablespace.
  • has an execution time of 16 hrs.
  • Source tables are 472GB in a 10TB datamart environment.

The Goal to attain:

  • Cut the execution time in 1/2 to re-allocate the resources to other processes.
  • Minimize temp usage by 60%.
  • Use temp tablespace groups to eliminate temp read/write waits in existing process.

Note**  Second time documenting this test.  Current iteration has enhancements (i.e. Added fourth temp tablespaces and changing amount of parallel used in the statements, etc…)

Create Temp Tablespace Group, three temp tablespaces total, 150GB each-
 CREATE TEMPORARY TABLESPACE TEMP_G1 TEMPFILE ‘/u06/oradata/sid1/temp_g1_01.dbf’ SIZE 51200M,
 ‘/u10/oradata/sid1/temp_g1_02.dbf’ SIZE 51200M,
 ‘/u20/oradata/sid1/temp_g1_03.dbf’ SIZE 51200M
 TABLESPACE GROUP TEMP_G;
CREATE TEMPORARY TABLESPACE TEMP_G2 TEMPFILE ‘/u30/oradata/sid1/temp_g2_01.dbf’ SIZE 51200M,
 ‘/u20/oradata/sid1/temp_g2_02.dbf’ SIZE 51200M,
 ‘/u10/oradata/sid1/temp_g2_03.dbf’ SIZE 51200M;
 ALTER TABLESPACE TEMP_G2 TABLESPACE GROUP TEMP_G;
CREATE TEMPORARY TABLESPACE TEMP_G3 TEMPFILE ‘/u25/oradata/martd/temp_g3_01.dbf’ SIZE 51200M,
 ‘/u10/oradata/martd/temp_g3_02.dbf’ SIZE 51200M,
 ‘/u04/oradata/martd/temp_g3_03.dbf’ SIZE 51200M;
 ALTER TABLESPACE TEMP_G3 TABLESPACE GROUP TEMP_G;
CREATE TEMPORARY TABLESPACE TEMP_G4 TEMPFILE ‘/u20/oradata/martd/temp_g4_01.dbf’ SIZE 51200M,
 ‘/u15/oradata/martd/temp_g4_02.dbf’ SIZE 51200M,
 ‘/u06/oradata/martd/temp_g4_03.dbf’ SIZE 51200M;
 ALTER TABLESPACE TEMP_G4 TABLESPACE GROUP TEMP_G;
Switch over database and user to new temp tablespace group—
 ALTER DATABASE DEFAULT TEMPORARY TABLESPACE TEMP_G;
Break up original process from one process with complex aggregating to two CTAS, less one process sorting:
 SQL> create table orpt_test
 tablespace tbl_data1 compress nologging parallel 4
 as
 SELECT /*+ full(c) parallel (c 4) */ <–Note Parallel here, same requested parallel as # of temp
 c.m_id,

 PERCENT_RANK() OVER (ORDER BY c.orders_by_amt) pct_rnk
 FROM all_transactions c
 WHERE trans_type_cd != ‘F’;
 Table created.
 Elapsed: 02:56:46:22

Estimated time run was 16 hrs old way

SQL>CREATE INDEX orpt_tst_idx1
 on orpt_test(m_id, file_id) PARALLEL 4 nologging
 tablespace idx_data1;

Index created.
Elapsed: 01:42:15:43

Set table to unparallel and gather stats—
 ALTER TABLE orpt_test noparallel;
 exec dbms_stats.gather_table_stats( ownname=>’OWNER’,tabname=>’ORPT_TEST’, ESTIMATE_PERCENT=>.00001, METHOD_OPT=> ‘FOR ALL INDEXED COLUMNS’, GRANULARITY=>’ALL’,CASCADE=>TRUE, DEGREE=>4);

Create the secondary table that utilizes the index to ease into the hash and limit the temp usage on the amount of data it ends up having to perform in the sort if it was performed in one CTAS—

create table NEW_RPT_TBL
tablespace tbl_data2 compress nologging pctfree 0 parallel 4
 as
 SELECT /*+ parallel(o,4) */
 b_id,
 MIN(t_dt) this one,
 MAX(t_dt)that one,
 COUNT(*) ,
 SUM(o_amt) ,
 SUM(o_amt) this one,
 ….so on and so forth, more and more aggregation
 FROM
 (
 SELECT — /*+ parallel(c,4)  index(c,orpt_tst_idx1) use_hash(c cm mm m) */
 c.m_id,

 FROM orpt_test c , < Here’s our new table, already much of the aggregation performed!
 m_tbl1 mm,
 crm_mtbl2 cm,
 files_form3 m
 where c.m_id = mm.m_id
 AND mm.m_cd = cm.m_cd
 AND c.file_id = m.file_id (+)
 AND c.m_id = m.m_id (+)
 ) o
 GROUP BY m_id,
 c_cd,
 m_cd,
 l_name,
 f_id,
 b_id;

Table created.

Elapsed: 02:12:32:56

A quick observations of the work area verifies how the temp sorts are spread across the temp tablespaces that encompass the temp tablespace group to eliminate waiting as we write to one tempfile at a time:

SQL_TEXT SID TABLESPACE OPERATION_TYPE Temp MB
create table orpt_test tbl_data1

803

TEMP_G1 WINDOW (SORT)

10229

create table orpt_test tbl_data1

835

TEMP_G4 WINDOW (SORT)

13872

create table orpt_test tbl_data1

836

LOAD WRITE BUFFERS
create table orpt_test tbl_data1

847

TEMP_G1 WINDOW (SORT)

12974

create table orpt_test tbl_data1

803

LOAD WRITE BUFFERS
create table orpt_test tbl_data1

847

LOAD WRITE BUFFERS
create table orpt_test tbl_data1

835

LOAD WRITE BUFFERS
create table orpt_test tbl_data1

827

TEMP_G3 WINDOW (SORT)

11408

create table orpt_test tbl_data1

837

LOAD WRITE BUFFERS
create table orpt_test tbl_data1

837

TEMP_G4 WINDOW (SORT)

11535

create table orpt_test tbl_data1

827

LOAD WRITE BUFFERS
create table orpt_test tbl_data1

828

LOAD WRITE BUFFERS
create table orpt_test tbl_data1

828

TEMP_G3 WINDOW (SORT)

15940

create table orpt_test tbl_data1

853

LOAD WRITE BUFFERS
create table orpt_test tbl_data1

836

TEMP_G2 WINDOW (SORT)

10807

create table orpt_test tbl_data1

853

TEMP_G2 WINDOW (SORT)

18028

 

Drop “work” Table/Indexes post final table creation.
 DROP TABLE ORPT_TEST PURGE;

NEW PROCESS COMPLETE…

Was the goal met?
Largest amount of temp space used:  182GB,  (Old way: 721GB)
Time to complete process:  5hrs, 23 min., (Old way: 16 hrs)

Temp Tablespace Group Benefits to this entire tuning process:
Temp space usage without temp tablespace group, new process: 512GB, (over twice as much!)
Process completion time of new process without temp tablespace group: 11hrs, (5 hrs longer!)
Temp Tablespace Waits Eliminated with temp tablespace groups from original process:
Reads-  66%
Writes- 29%

Where are the improvements seen?
1.  I/O is distributed evenly across multiple datafiles instead of one datafile when presented with onle a single temp tablespace.

2.  Sorts in parallel performance increased, on average, over 20% for window and buffer sorts, (improvement was not seen for sorts involving rowid range scans…)
3.  Database overall improved as resources were re-allocated to all processes that may have been hindered by I/O bottleneck by heavy swapping to temp tablespace.

4.  For large hash processes, where temp tablespace usage is unavoidable, temp tablespace groups offer significant improvement in I/O performance and more CPU utilization.

 


« Previous PageNext Page »