RMOUG 2012 Presentation Slides!
Graham Wood was kind enough to send me a quick email and let me know that my presentation slides weren’t downloading correctly from RMOUG’s website, so I’ve put them here for anyone that may like to have them.
I’m still recovering from the post RMOUG Training Day festivities, aka Foak Table, which was held up in Breckenridge again this year. I will make sure to blog about Training Days and the post event at a later date, but until then, here is a link to my slides. Please email me with any questions or comments you may have- always glad to chat!
EM12C
Sherlock Holmes for DBA’s
~Kellyn
Getting the Most Out of Enterprise Manager and Notifications
I ran out of time before I was able to provide an adequate white paper this year for my EM12c presentation, but there was some valuable info in what I had started, so thought I’d turn it into a mulit-part blog post…
The Oracle Enterprise Manager, (OEM) is the standard monitoring tool for Enterprise Edition Oracle databases. The interface allows the DBA to manage the entire Oracle stack using a single console. The installation and interface is easy for most DBA’s to implement and utilize. In the newest EM12c version, it encompasses integrated systems management, application management, application-to-disk and cloud management , the following documentation will include some 10g but mostly, the EM12c version of the product.
The goal for any DBA is to be notified of an issue and only notified when there is an actual issue. One of the most common downfalls of a monitored environment is the misconception that receiving emails upon success or checks stating that a process is running, is correctly configured monitoring. This produces an environment that leads to a “white noise” effect, where DBA’s may misinterpret a notification as one of the success notifications when a real issue has actually arisen.
The optimal design is one where redundancy checks of the monitoring system is included to ensure that if there is an issue with the monitoring environment that deters it from monitoring and sending alerts, the system has a redundant check on a secondary server that is notifying the DBA on call of the issue. Multiple Oracle Management Server Repositories , residing on separate servers can address this, but in my opinion, would be overkill when simple additional scripts run from a cron would suffice.
OEM Basics
The Oracle Enterprise Manager, 10g and EM12c comprises of the following, basic components:
- The Oracle Management Server/Service, (OMS).
- The OMS repository database.
- The OMS Home, aka the EM state directory, which contains the bin files, log files, collection files and configuration files.
- The Agent installation, application and configuration on each monitored host server.
EM12c has the additional weblogic components included automatically, along with the Cloud support features which can be installed.
Licensing
As long as the OMS is on its own server and is only used for the OMS repository and/or an RMAN backup catalog repository, individual oracle licensing IS NOT required for the Oracle database utilized for the repositories, (Please see pg. 15 of the following PDF from Oracle.)
http://download.oracle.com/docs/cd/B19306_01/license.102/b40010.pdf
Monitoring the OEM from a secondary server:
This can be performed easily from a shell script and allows the DBA(s) to rest easy, knowing that the interface to their database environment, if impacted, will notify them from a secondary server. This allows for redundant checks without sending an “I’m OK” notification to grant comfort:
#!/usr/bin/ksh#----------------------------------------------------------------------------# Author: Kellyn Pot’Vin# Redundancy Check to OEM Server to ensure EM is up and Running!# Verify that all parameters are set in the remote host env. vars...#----------------------------------------------------------------------------if (( $# != 2 ))thenecho "usage: $0 SID hostname"exit 1fi##----------------------------------------------------------------------------# Set up Oracle environment...#----------------------------------------------------------------------------export ORACLE_SID=$1export who_to_ping=$2echo "Oracle SID: "${ORACLE_SID export AVL_LOG=${LOG_DIR}/oem_avl.logexport AVL_ERR=${LOG_DIR}/oem_avl.errexport AVL_PNG_ERR=${LOG_DIR}/ping_avl.err#Check Repository DB for Access$ORACLE_HOME/bin/sqlplus oem_chk/"${pass}"@${ORACLE_SID} <<EOFspool ${AVL_LOG};select sum(1+1) from dual@grid_chk;spool off;exit;EOFcat ${AVL_LOG} | grep "ORA-" > ${AVL_ERR}if [ -s ${AVL_ERR} ]thenecho|mail -s "No Response from Grid Control from Oracle Management Server!" "<EML_Address>" < ${AVL_LOG}fi#Check to verify that EM12C is up! This requires SSH authentication from remote server. ssh oracle/n0c1u3ata11@ "$OMS_HOME/bin/emctl status oms" | grep Down > ${EM_LOG} if [ -s ${EM_LOG} ] then echo|mail -s "No Response from EM12C Grid Control!" "<EML_ADDRESS>" < ${EM_LOG} exit fi#Check Grid Server, ensure that you can ping it as welldateping -c 3 ${who_to_ping}if [ $? -ne 0 ]thensleep 5ping -c 3 ${who_to_ping}if [ $? -ne 0 ]thenecho "`hostname` CANNOT PING ${who_to_ping} the EM Server!" > /tmp/ping.$$echo|mail -s "`hostname` CANNOT PING ${who_to_ping} from Oracle Managent Server!" "<EML_Address>"rm -f /tmp/ping.$$fifirm -f ${AVL_LOG}rm -f ${AVL_ERR}exit
Pretty simple to schedule in cron:
0,15,30,45 * * * * /home/oracle/scripts/admin/chk_grid.ksh <dbname> <servername> > /dev/null 2>&1
I’ve chosen a 15 minute interval on the checks, but this can be done with any interval as requirements are set.
Escalation
Due to Sarbanes-Oxley and/or outside support contracts, an enhanced escalation process may be required. One that can offer more choices and escalation paths then what is currently offered in the 10g and EM12c console. A simple package/support object implementation can be created to support this type of requirement that works with OEM. The code presented here will allow one to set the on-call DBA, scheduler and escalation outside of the OEM interface, but will all OEM alerts and escalation from the OMS will utilize the data found in the supporting tables.
I will try to upload and post the supporting schema and code soon on dbakevlar.com
Blacking out DB from Agent Side with Shell Scripts:
Blackouts can be performed via a shell script to assist in automated processes that could trigger OEM alerts, sending false notifications when a blackout script is all that is required for Unix Admin or Application support personnel.
#!/usr/local/bin/ksh# ######################################################## start_blackout.ksh# Usage ./start_blackout.ksh <oracle_sid># Rewrite Date: 4/22/2011# Modified by: reckl#########################################################usage="$0 <db_name>"if (($# != 1))thenprint $usageexit 1fiORACLE_SID=$1sudo su - oracle -c "$AGENT_HOME/bin/emctl start blackout ${ORACLE_SID}_blackout ${ORACLE_SID}"exit
Patching
I am a supporter of patch deployments through OEM. If you have not configured this or are working to get this feature approved in your database environments, I highly recommend it. In the “Deployments” tab of the EM console, first ensure that the MOS credentials is configured:
Once this has been set up for your environment, you can then designate a patching strategy to deploy to development, test and then production with a full testing cycle that will make any DBA stop quaking in their boots when they receive the notification that new patches have arrived from Oracle Support.
The Deployment Procedure Manager allows the DBA group to schedule deployments of necessary patching with the most effective schedule and little DBA involvement required.
The DBA can then set up patching resource allocation and requirements from the “Offline Patching” UI and choose what to install for automatically patching:
To be continued in next post….
RMOUG 2012!
As busy as I am with the 11g project, (no real weekends off for 9 weekends and counting…
) I wanted to take some time out to write on the upcoming RMOUG Training Days 2012.
For those of you Oracle techies, DBA or Developer, this is a must attend and for any who do choose to travel and attend- kudos to you, good choice. As one of the directors on the RMOUG board, I can attest to the incredible amount of time and resources that have been invested into what is easily, the best grass-roots, Oracle conference around. John Jeunette, the Training Days Director for the 2012 event has done a bang-up job with planning and none of us could get along without the continued support from those, such as Peggy King and Team YCC.
Oracle folks who do attend are going to be treated to a keynote from one of my favorite DBA Gods, Cary Millsap, along with presentations from some of the greats in the DBA world, including Jonathan Lewis, Debra Lilley, Mark Farnham, John King, Alex Gorbachev, Guy Harrison, Dan Morgan, Marco Gralike, James Morle and Graham Woods. We also can’t forget the local favorites, like Tim Gorman and Randy Cunningham.
Upon quick count, I realized we have 14 Oracle ACE’s and 14 ACE Directors speaking this year. With all these ACE folks, we’ve decided to create a special event with them, something to really find out what it means to be an Oracle ACE. Stay tuned, it’s shaping up to be an awesome session.
We also have the benefit for those interested in Oracle RAC of having “RAC Attack”, a great workshop, first offered at Oracle Open World and UKOUG, now also offered at RMOUG Training Days for 2012! This is a great opportunity to get your “RAC on” and learn from some of the best on how to properly build a RAC environment and when. Pythian and Apress will be sharing in the sponsorship of this great workshop at the RMOUG training days event. Show up with a laptop that meets the requirements for the workshop and build your own, how great is that?
That’s a pretty impressive count for a two day Oracle conference when you think of it! I’m thrilled with the quality on the content of the presentations this year and how the event is coming together. We had a record year for abstract submissions and it was a difficult decision deciding who would be in the schedule, so many great, solid abstracts submitted!
So if you are interested in attending, here’s the link to RMOUG’s Training Days Event. You can view the current schedule, registration and biographies for the speakers. There’s only a short time left to take advantage of the advance registration rate, saving even more if you become a member, (that’s me as the membership director just selling it a bit!
)
http://www.teamycc.com/RMOUG_2012_Conference/Registration.html
Solid Choices for Oracle Tuning on Solid State Disk
As I continue to work on very large databases, (VLDB), I am exposed to more opportunities to speed up IO. This can involve Oracle’s solution of Exadata or stand alone improvements with options such as SSD, (Solid State Disk) which can offer faster IO performance at a fraction of the price. When this option becomes a reality, there will always be non-DBA’s that advise what would best benefit from the hardware, but to take the time to research what would truly benefit is important for the DBA to perform.
The Just the Facts on Solid State Disk:
There are several types of SSD available:
- Flash memory-based
- DRAM-based
- Cache or Buffer
The SSD can have different types of host interfaces, depending on the main hardware you are interfacing with and/or vendor choices:
- PCI
- Fibre Channel
- ATA, (Serial or Parallel)
- SCSI, (Serial or Parallel)
- USB
Rarely do we get a chance to move entire Terabytes of data onto fast disk, but rather are offered limited, faster disk to utilize for crucial objects that can give us the “most bang for the buck.” Commonly this is due to the price of these specialized and impressive IO read/write drives, but it can also be due to limitations on the hardware they are interfacing with.
As I started working on databases that utilized faster disk, with or without ASM, it became apparent that what these speedy disks were allocated to wasn’t always what SHOULD have been placed in the new location. Where indexes, look up tables and temp tablespace experienced impressive gains vs. the standard disk drives they had formerly resided on, I have been quick to dissuade anyone from placing redo logs on SSD.
I’m going to go through what data, reports and queries that I utilize to decide what should be on fast disk, along with my benchmark findings when I did have the opportunity to create an entire database on Fusion Octal fast disk.
Getting the most out of SSD is all about getting what won’t fit in memory, (SGA and PGA) onto a faster disk. All consistently large, [consistent] read tasks that the database must direct to disk for, but doesn’t write as often to disk, (visualizing batch loads vs. heavy transactional) and ONCE TUNING OPPORTUNITIES HAVE BEEN EXHAUSTED, are excellent choices for research when deciding what should be placed on SSD. This information can be achieved multiple ways as a DBA. AWR/ADDM and ASH reports can provide solid, high level data to direct you in the right direction if you are not as familiar with your data or wish to validate some of what you already know. For those of you that do not have the tuning pack license, then Statspack can do the same. Tracing can offer a detailed output that will tell you about objects that you are often going to slower disk for. OEM can provide graphs that will show IO demands on a heavily “weighted” system, as can other GUI tools in the market.
AWR/Statspack and I/O Wait Indicators
Your group has already decided that IO is an issue and should have verified this in the top 5 wait events that can be seen through AWR or statspack. The snapshots utilized for this examination should be times of heavy IO in the database environment as can be seen in the example Table 1.
Table 1
| Top 5 Timed Events Avg %Total | |||||
| ~~~~~~~~~~~~~~~~~~ wait Call | |||||
| Event | Waits | Time (s) | (ms) | Time | Wait Class |
| —————————— | ———— | ———– | —— | —— | ———- |
| db file sequential read |
979,382 |
36,066 |
37 |
45.1 |
User I/O |
| db file scattered read |
5,083,058 |
22,401 |
4 |
28 |
User I/O |
| Direct path write temp |
13,577 |
17 |
User I/O | ||
| db file parallel write |
464,287 |
5,136 |
11 |
6.4 |
System I/O |
| direct path read temp |
366,956 |
2,671 |
7 |
3.3 |
User I/O |
From here, we inspect our AWR or statspack reports, there is a section that should be inspected first and foremost, referred to as Segments by Physical Reads the output from this section can be seen in Table 2.
Table 2
| Tablespace | Obj. | Physical | ||||
| Owner | Name | Object Name | Type | Reads | %Total | |
| ———- | ———- | ——————– | —– | ———— | ——- | |
| SCHM_OWNR | TBLSPC1_DATA | TBL1_FILE_1 | TABLE |
86,788,592 |
47.87 |
|
| SCHM_OWNR | TBLSPC2_DATA | TBL1_FILE_PK | INDEX |
80,544,192 |
46.59 |
|
| SCHM_OWNR | TBLSPC1_IDX | TBL2_MR_PK | INDEX |
74,742,752 |
45.39 |
|
| SCHM_OWNR | TBLSPC1_IDX | TBL3_M_PK | INDEX |
40,924,576 |
28.43 |
|
| SCHM_OWNR | TBLSPC2_DATA | TBL4 | TABLE |
26,790,464 |
15.52 |
Tuning, Always the First Step
The first step in the process is to inspect I/O issues with large objects. Is there a partitioning strategy that can take the physical reads and IO down for the objects in question? If there is not or there is still a requirement for full scans or large index or partition scans, then you need to look and see what tuning options there are for the code involved. If there is already partitioning in place, is it the right partitioning key and/or is sub-partitioning in order.
Once this process has completed, then inspect performance for physical reads again and verify the objects in question are still a bottleneck for IO. If so, then they may be a valid choice to relocate to a new ASM diskgroup residing on SSD.
Creating a specific ASM disk group for the SSD disk is the obvious choice, as the SSD will not be part of the standard disk groups without performance and rebalance challenges. Once complete, you will then have the new SSD diskgroup available for use.
Inspect the sizes of the objects in left in your “top 5 physical IO objects” and decide what you move over for initial testing. I commonly make a copy and test a copy of the objects against the code to test true performance gains, ensuring that there are no required physical storage required changes as well. ***over what you need for capacity growth estimates. What should you bring over next? Now if we are still using the same reports that are showing above, I would look carefully at what I have available and would start to inspect temp usage as a possible next candidate.
It is important that if you consider temp, that it is in a “controlled” state for your environment. It is not uncommon for many DBA’s to set TEMP to autoextend and not pay attention to temp tablespace usage. I fully advocate the opposite and track temp usage, along with monitor alerts with scripts for anytime any user or process consumes a certain threshold per process on any of my production systems.
Considering the amount of waits on temp read and writes, tuning opportunities may be boundless on hash joins and sorting. Low hanging fruit in these categories will involve looking for “order by’s” that have been left in for insert statements, (not sure how often I’ve seen this, but it’s a very common and an unfortunate occurrence…) In regards to hash joins, there can be examples of wide reporting tables only one or two columns are actually required for the results and the join. A choice of CTAS, (create table as select) of only the columns required for the process, dropping post the join to the second table, can drastically trim time and temp usage for a hash of tables that involve only a few columns on a wide table where an index is a less than efficient answer. This choice allows the performance gain of the hash without the performance hit of swapping to temp when wide tables cause PGA to never be enough.
After tuning temp usage due to large hash joins and sorting outside of PGA, inspect the max temp tablespace required. If this will now fit without impacting capacity planning requirements for the SSD, move the temp tablespace onto the SSD ASM disk group.
Scripts to Inspect IO Usage
There are many scripts that can be written or available on the web and in reports to inspect IO usage. The following is a good example of one:
select
io.cnt Count,
io.event Event,
substr(io.obj,1,20) Object_Name,
io.p1 P1_Value,
f.tablespace_name Tablespace_Name
from
(
select
count(*) cnt,
round(count(*)/(60*60),2) aas,
substr(event,0,15) event,
nvl(o.object_name,decode(CURRENT_OBJ#,-1,0,CURRENT_OBJ#)) obj,
ash.p1,
o.object_type otype
from v$active_session_history ash,
all_objects o
where ( event like 'db file s%' or event like 'direct%' )
and o.object_id (+)= ash.CURRENT_OBJ#
and sample_time > sysdate - 7/(60*24)
group by
substr(event,0,15) ,
CURRENT_OBJ#, o.object_name ,
o.object_type ,
ash.p1
) io,
dba_data_files f
where
f.file_id = io.p1
and f.tablespace_name not like '%RAM%' –-exclude SSD objects
Order by io.cnt desc
/
| COUNT | EVENT | OBJECT_NAME | P1_Value | TABLESPACE_NAME |
| 122 | db file sequent | TBL1_CHAIN | 102 | N_DATA |
| 33 | db file sequent | HH_TBL1_FDX01 | 161 | H_INDX1 |
| 28 | db file sequent | CA_TBL2_PK | 270 | C_INDX |
| 25 | db file sequent | I_TBL3_IDX02 | 225 | I_INDX2 |
| 21 | db file sequent | E_TBL4 | 43 | E_DATA |
| 20 | direct path rea | I_MRG_TBL | 75 | M_DATA |
| 23 | db file scatter | C_TBL3 | 50 | C_DATA |
The above script gives you clear examples of what objects you should point your research to, first indexes, (sequential) and in this case, a look up table, (direct path read).
Building a Database Entirely on SSD
We were given this opportunity recently to test performance gains and decide if budget should be set aside for investing in the hardware to build entire databases on SSD vs. strategic objects within a database. We have a process that takes approximately five days to aggregate a snapshot in time, up to 12TB of data. The goal was to see, could we accomplish this in two days if given all SSD for the database vs. a combination of standard disks on a disk array and SSD for high read/write data.
This sounds like a slam dunk, but it is more challenging than one might think. There are small things to that have to be updated in the database, such as system statistics in 10g to ensure the database knows fully the gift you have granted it, but then you may also need to make significant logical changes to take advantage of the hardware due to limitations in CPU and memory per process. The build was on a server that utilized hyper-threading and some of the “performance settings” actually appeared to work against the database vs. the lesser setting that might stripe the CPU usage more efficiently. The graph below show the hits against the first 32 of “hyper-threaded” 64 CPU’s:
Figure 1
This graph in Figure 1 only shows CPU usage over a small snapshot of time, but over long intervals, it showed the same differing data vs. SAR or other reports from the Admin side- the database continued to hit the same CPU’s over and over, leaving other CPU’s untouched for extended periods of time. This did not bode well for the database performance, high read/write capability or not.
The build time improvements were impressive, but the one thing that must be included is that the improvement in performance was not just a hardware improvement step. There was first the additional hardware and then a tuning process at the database level to ensure the processes were able to achieve the best performance the solid state disk offered it, (comparison of columns New Run Time against the Final Run Time in Figure 2.)
| Process Step | Standard Disk/SSD | Total Min. | SSD Without Tuning | New Run Time | Initial Perf. Gain | SSD With Tuning | Final Run Time | Total Perf. Gain |
| DIM Table 1 CTAS | 4 HOURS 20 MINUTES 27 SECS | 260 min | 3 HOURS 38 MINUTES 24 SECS | 218 min |
19% |
2 HOURS 43 MINUTES 45 SECS | 164 min |
58% |
| CTAS Table 2 | 4 HOURS 23 MINUTES 11 SECS | 263 min | 0 HOURS 16 MINUTES 2 SECS | 16 min | 16 Times Perf | 0 HOURS 19 MINUTES 8 SECS | No Tuning | N/A |
| CTAS Table 3 | 1 HOURS 29 MINUTES 21 SECS | 89 min | 0 HOURS 44 MINUTES 27 SECS | 44 min | 2 Times Perf. | 0 HOURS 57 MINUTES 19 SECS | No Tuning | N/A |
| CTAS Table 4 | 2 HOURS 55 MINUTES 58 SECS | 175 min | 0 HOURS 42 MINUTES 16 SECS | 42 min | 4 Times Perf. | 0 HOURS 47 MINUTES 55 SECS | No Tuning | N/A |
| CTAS Table 5 | 10 HOURS 7 MINUTES 41 SECS | 607 min | 1 HOURS 50 MINUTES 7 SECS | 110 min | 6 Times Perf. | 1 HOURS 42 MINUTES 6 SECS | No Tuning | N/A |
| CTAS Table 6 | 11 HOURS 32 MINUTES 40 SECS | 692 min | 4 HOURS 51 MINUTES 17 SECS | 291 min | 2 Times Perf. | 5 HOURS 9 MINUTES 26 SECS | No Tuning | N/A |
| Multiple Table Aggregation | 25 HOURS 15 MINUTES 3 SECS | 1515 min | 9 HOURS 58 MINUTES 1 SECS | 598 min | 3 Times Perf. | 5 HOURS 16 MINUTES 31 SECS | 316 min | 5 Times Perf. |
| Summary Table 1 Agg. | 25 HOURS 24 MINUTES 35 SECS | 1524 min | 10 HOURS 0 MINUTES 20 SECS | 600 min | 3 Times Perf. | 5 HOURS 18 MINUTES 14 SECS | 318 min | 5 Times Perf. |
| Summary Table 2 Agg. | 25 HOURS 23 MINUTES 56 SECS | 1523 min | 10 HOURS 7 MINUTES 22 SECS | 607 min | 3 Times Perf. | 5 HOURS 25 MINUTES 54 SECS | 325 min | 5 Times Perf. |
| Index Creation Table 1 | 1 HOURS 16 MINUTES 33 SECS | 76 min | 0 HOURS 53 MINUTES 42 SECS | 54 min |
39% |
0 HOURS 53 MINUTES 14 SECS | No Tuning | N/A |
| Index Creation Table 2 | 1 HOURS 22 MINUTES 55 SECS | 82 min | 0 HOURS 59 MINUTES 55 SECS | 60 min |
28% |
0 HOURS 59 MINUTES 6 SECS | No Tuning | N/A |
| CTAS Aggr Table 3 | 6 HOURS 36 MINUTES 20 SECS | 396 min | 3 HOURS 21 MINUTES 18 SECS | 201 min |
50% |
3 HOURS 13 MINUTES 38 SECS | No Tuning | N/A |
| Index Creation Table 3 | 0 HOURS 52 MINUTES 2 SECS | 52 min | 0 HOURS 40 MINUTES 3 SECS | 40 min |
24% |
0 HOURS 48 MINUTES 15 SECS | No Tuning | N/A |
| CTAS Aggr. Table 4 | 2 HOURS 41 MINUTES 13 SECS | 161 min | 1 HOURS 32 MINUTES 8 SECS | 92 min |
43% |
1 HOURS 28 MINUTES 25 SECS | No Tuning | N/A |
| CTAS Aggr Table 5 | 3 HOURS 46 MINUTES 59 SECS | 226 min | 2 HOURS 58 MINUTES 29 SECS | 179 min |
21% |
2 HOURS 55 MINUTES 20 SECS | No Tuning | N/A |
| CTAS Aggr. Table 6 | 0 HOURS 51 MINUTES 27 SECS | 51 min | 0 HOURS 36 MINUTES 46 SECS | 37 min |
28% |
0 HOURS 34 MINUTES 33 SECS | No Tuning | N/A |
| Insert to Table 6 | 0 HOURS 5 MINUTES 24 SECS | 5 min | 0 HOURS 5 MINUTES 6 SECS | 5 min | NONE | 0 HOURS 4 MINUTES 52 SECS | 5 min | NONE |
| Update to Table 6 | 26 HOURS 40 MINUTES 41 SECS | 1640 min | 25 HOURS 9 MINUTES 52 SECS | 1510 min |
8% |
17 HOURS 44 MINUTES 2 SECS | 1084 min |
44% |
| CTAS Table 7 | 1 HOURS 1 MINUTES 48 SECS | 61 min | 0 HOURS 7 MINUTES 43 SECS | 8 min | 13 Times Perf. | 0 HOURS 6 MINUTES 37 SECS | No Tuning | N/A |
| CTAS Aggr Table 8 | 0 HOURS 28 MINUTES 31 SECS | 28 min | 0 HOURS 22 MINUTES 12 SECS | 22 min |
22% |
0 HOURS 19 MINUTES 25 SECS | No Tuning | N/A |
| CTAS Mod TBLS 9/10 | 1 HOURS 42 MINUTES 36 SECS | 102 min | 1 HOURS 42 MINUTES 22 SECS | 102 min | NONE | 1 HOURS 39 MINUTES 25 SECS | No Tuning | N/A |
| CTAS Table Aggr. 11 | 2 HOURS 26 MINUTES 58 SECS | 147 min | 1 HOURS 29 MINUTES 53 SECS | 90 min |
49% |
1 HOURS 24 MINUTES 42 SECS | No Tuning | N/A |
| CTAS Aggr. Table 12 | 7 HOURS 24 MINUTES 44 SECS | 445 min | 6 HOURS 7 MINUTES 48 SECS | 368 min |
18% |
6 HOURS 6 MINUTES 40 SECS | No Tuning | N/A |
| CTAS Aggr. Table 13 | 6 HOURS 47 MINUTES 31 SECS | 408 min | 4 HOURS 38 MINUTES 1 SECS | 278 min |
32% |
5 HOURS 5 MINUTES 32 SECS | No Tuning | N/A |
| CTAS Aggr. Table 14 | 25 HOURS 23 MINUTES 32 SECS | 1524 min | 10 HOURS 9 MINUTES 51 SECS | 610 min | 3 Times Perf. | 5 HOURS 27 MINUTES 17 SECS | 327 min | 5 Times Perf. |
| CTAS Aggr. Table 15 | 1 HOURS 21 MINUTES 59 SECS | 82 min | 0 HOURS 22 MINUTES 49 SECS | 23 min |
65% |
0 HOURS 4 MINUTES 33 SECS | 4 min | 20 Times Perf. |
| Update to Table 13 | 0 HOURS 12 MINUTES 45 SECS | 13 min | 0 HOURS 49 MINUTES 58 SECS | 50 min | 3 Times LOSS!! | 0 HOURS 1 MINUTES 22 SECS | 1 min | 9 Times Perf. |
Figure 2
I must note that what challenged us in unresolved issues were waits on CPU due to hyper-threaded CPU issues.
Tuning involved for the third columns time elapsed involved the following:
- Bind variable additions
- Literal additions where bind peeking was an issue.
- A change from ASSM, (Automatic Segment Space Management) to manual segment space management where freelists could be set at the object level, (dynamically allocated freelists were not able to adjust quickly enough for some of the load processes…)
- Changes to initial transactions, percent free and parallel that made sense, (upping it for some, downgrading it for others that did not work with the partitioning or a need for partitioning…)
Inspecting I/O by SQL_ID
This script, (adopted from Tim Gorman’s sqlhistory.sql from, www.evdbt.com) does a wonderful job of pulling a clean, clear picture of what physical and logical I/O is occurring in a single SQL_ID, seen here in Table 3 :
Table 3
| +————————————————————————————————–+ | |||||||||
| Plan HV Min Snap Max Snap Execs LIO PIO CPU Elapsed | |||||||||
| +————————————————————————————————–+ | |||||||||
| 1766271350 659 659 1 593,134,283 12,961,814 14,657.45 15,067.05 | |||||||||
| +————————————————————————————————–+ | |||||||||
| ========== PHV = 1766271350========== | |||||||||
| First seen from “07/15/11 13:00:31″ (snap #659) | |||||||||
| Last seen from “07/15/11 13:00:31″ (snap #659) | |||||||||
| Execs LIO PIO CPU Elapsed | |||||||||
| ===== === === === ======= | |||||||||
| 1 593,134,283 12,961,814 14,657.45 15,067.05 | |||||||||
| Plan hash value: 1766271350 | |||||||||
| TQ | IN-OUT | PQ Distrib | ||||||
|
0 |
CREATE TABLE STATEMENT | 1543M(100) | ||||||
|
1 |
PX COORDINATOR | |||||||
|
2 |
PX SEND QC (RANDOM) | :TQ10001 | 464M | 397G | 4128K (7) | |||
| Q1,01 | P->S | QC (RAND) | ||||||
|
3 |
LOAD AS SELECT | |||||||
| Q1,01 | PCWP | |||||||
|
4 |
PX RECEIVE | 464M | 397G | 4128K (7) | ||||
| Q1,01 | PCWP | |||||||
|
5 |
PX SEND RANDOM LOCAL | :TQ10000 | 464M | 397G | 4128K (7) | |||
| Q1,00 | P->P | RANDOM LOCA | ||||||
|
6 |
PX PARTITION LIST ALL | 464M | 397G | 4128K (7) |
1 |
1000 |
||
| Q1,00 | PCWC | |||||||
|
7 |
HASH JOIN RIGHT OUTER | 464M | 397G | 14G | 4128K (7) | |||
| Q1,00 | PCWP | |||||||
|
8 |
TABLE ACCESS FULL | HDN_TBL | 231M | 112G | 576K (22) |
1 |
1000 |
|
| Q1,00 | PCWP | |||||||
|
9 |
HASH JOIN RIGHT OUTER | 464M | 171G | 6967M | 1551K (7) | |||
| Q1,00 | PCWP | |||||||
|
10 |
TABLE ACCESS FULL | HD_TBL | 310M | 50G | 144K (34) |
1 |
1000 |
|
| Q1,00 | PCWP | |||||||
|
11 |
TABLE ACCESS FULL | H_TBL | 464M | 95G | 339K (13) |
1 |
1000 |
|
| Q1,00 | PCWP |
| Summary Execution Statistics Over Time | |||||||||
| Avg Avg | |||||||||
| Snapshot Avg LIO Avg PIO CPU (secs) Elapsed (secs) | |||||||||
| Time Execs Per Exec Per Exec Per Exec Per Exec | |||||||||
| ———— ——– ——————- ——————- ——————- ——————- | |||||||||
| 15-JUL 13:00 1 593,134,283.00 12,961,814.00 14,657.45 15,067.05 | |||||||||
| ——– ——————- ——————- ——————- ——————- | |||||||||
| avg 593,134,283.00 12,961,814.00 14,657.45 15,067.05 | |||||||||
| sum 1 | |||||||||
| Per-Plan Execution Statistics Over Time | |||||||||
| Avg Avg | |||||||||
| Plan Snapshot Avg LIO Avg PIO CPU (secs) Elapsed (secs) | |||||||||
| Hash Value Time Execs Per Exec Per Exec Per Exec Per Exec | |||||||||
| ———- ———— ——– ——————- ——————- ——————- ——————- | |||||||||
| 1766271350 15-JUL 13:00 1 593,134,283.00 12,961,814.00 14,657.45 15,067.05 | |||||||||
| ********** ——– ——————- ——————- ——————- ——————- | |||||||||
| avg 593,134,283.00 12,961,814.00 14,657.45 15,067.05 | |||||||||
| sum 1 | |||||||||
+—————————————————————————————————————————
This report clearly shows the amount of logical vs. physical I/O coming from the statement in question. This gives the DBA a clear indicator if any object in the poor performing process would benefit a move to SSD or if tuning is in order to eliminate the I/O performance challenge. A combination of both may be chosen, as there are multiple right outer hash-joins which clearly show as the performance hit in the time elapsed and in the temp tablespace usage/significant I/O categories, (note that the process needs to scan ALL the partitions for the objects in question…)
SSD and Forced Hash Joins on Indexes
When a database design is impacted by the front-end tool required to present data in a proper format, such as Business Analytics Software, the price can be high to the DBA who has to manage resource usage. Many times the data must be presented in a very flat, wide format and requires a large amount of data pulled across a network interface. This can be in anywhere from a couple 100GB’s to multiple Terabytes. When you are the DBA looking at ways to increase performance when logical performance tuning is limited, solid state disk can offer you gains not offered anywhere else.
Business Analytics Software often will query a few 100GB to 1TB objects, hash join and then perform an order by. For the DBA, to create an index, then using a hint to force a hash join between an index and the large table can improve performance greatly, but to move the index onto SSD can increase the hash and limit the requirements for SSD at the same time.
create table new_ordertmp_tbl compress pctfree 0 tablespace data_1 as
SELECT /*+ USE_HASH(t,i) INDEX_FFS(i,I_TBL2_IDX) INDEX(t,CT1) */
cast(MOD(t.i_id, 1000) as number(3)) im_key
, LEAST(ROUND(MONTHS_BETWEEN(:b1, t.t_dt) + .4999 ), 48) AS r_key , t.i_id AS ib_id
, t.m_id, t.t_dt, cast(:b5 as varchar2(5)) m_cd, FIRST_VALUE(i.ib_id) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS ibcid, t.t_nbr, cast(TO_NUMBER(TO_CHAR(FIRST_VALUE(t.t_dt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
), ''YYYYMMDD'')) as number(8)) AS d_id,
FIRST_VALUE(DECODE(t.oct_cd, NULL, 'O','W', 'O', 'E', 'O', 'R', 'R', 'F')
) OVER(PARTITION BY t.d_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS tct_cd, SUM(t.ot_amt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt) AS ot_amt
, FIRST_VALUE(NVL(t.pmt_cd, ''U'')) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS pmt_cd, SUM(t.i_cnt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt) AS i_cnt
, FIRST_VALUE(t.cs_cd IGNORE NULLS) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS cs_cd, FIRST_VALUE(t.cc_cd IGNORE NULLS) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS cc_cd, t.oct_cd
FROM CT_TBL1 t, I_TBL2 i
WHERE t.m_id = :b5
AND t.t_dt BETWEEN :b1 AND :b2 AND i.ibid = t.i_id
order by i.i_id;
Object Sizes:
CT_TBL1, partition 7= 800GB
I_TBL2=1.2TB
While the I_TBL2_IDX, the index created on the I_TBL2 and possessing only the columns required for this routinely run query and leading with the I_ID column, is only 200GB.
Execution Plan for Query:
Table 4
| Description | Object | Cost | Cardinality | Bytes | PartitionID |
| SELECT STATEMENT, GOAL = ALL_ROWS |
107587 |
16356015 |
10079496 |
||
| WINDOW SORT |
107587 |
16356015 |
10079496 |
||
| WINDOW BUFFER |
107587 |
16356015 |
10079496 |
||
| WINDOW SORT |
107587 |
16356015 |
10079496 |
||
| WINDOW SORT |
107587 |
16356015 |
10079496 |
||
| FILTER | |||||
| HASH JOIN |
107371 |
16356015 |
10079496 |
||
| PARTITION LIST SINGLE |
330 |
16356015 |
8166868 |
7 |
|
| TABLE ACCESS FULL | CT_TBL1 |
330 |
16356015 |
8166868 |
7 |
| INDEX FAST FULL SCAN | I_TBL2_IDX |
23597 |
6399400008 |
2120000 |
The hash join is thus, decreased to a total size of 1TB, vs. the much larger size it would have been if the hash join would have been run against the table. By running it with the index residing on solid state disks, the actual performance to create the table from the CTAS in question was increased by 12 fold.
What does the IO look like on the solid state disk vs. the old standard disk? The differences are startling when viewed through iostat, (table 5).
Table 5
| Device: | rsec/s | wsec/s | avgqu-sz | %util |
| Raid 5 Disk |
55200 |
30224 |
215.72 |
84.03 |
| SSD |
52394.67 |
41306 |
223.74 |
7.49 |
As you can see, the IO is much less impacting on the SSD than the standard disk.
Via graphs, such as from Cacti, the differences in IO throughput can be seen for standard disk, (figure 3) and solid state disk, (figure 4.)
Figure 3
Figure 4
Summary
Solid state disk is here to stay and often will be seen as a “silver bullet” for production I/O issues. The goal of the DBA is to utilize this technology in a way that does not replace logical tuning and focus instead, in ways that may actually support positive changes enforcing both physical and logical tuning to get the most out of the new hardware available on the market today.
Locating UNKNOWN SQL_ID Info in OEM Through AWR
Rarely are reports based off large snapshot variances helpful to a DBA unless you come across an odd situation such as this one…Better yet, we need to know a little bit about our AWR tables behind our reports so we can piece together what the reports leave out…:)
Scenario: After-hours support has killed a session after high temp usage has occurred. You, as the primary DBA, are left to look into the issue the next day.
Your first attempt to inspect the issue is through Enterprise Manager, (OEM) and you are surprised that very little activity is actually showing up that
is of concern. You can see pink areas, something that would flag you, but they aren’t showing up in the active sessions in the left- only showing in the right.
What’s going on here?

Now, taking the right side into consideration, you do see four sessions, easily assumed to be parallel slaves, processing away.
When drilling down into the left side SQL_ID sessions, you not that none of the top right side sessions are showing up for the SQL_ID’s in question.
When drilling down into the right side four sessions, SID’s, all of them show no SQL_ID’s connected and are listed as “UNKNOWN”.
How as a DBA, do you locate this data? Your first quest will be to run an ASH report, but it will only show you the problem waits, type of waits, not the SQL that was the
cause of the issue. How do you find the SQL/SQL_ID for the SID’s?
ASH Report for the snapshot in question:
DB Id DB Name Inst Num Instance ----------- ------------ -------- ------------ xxxxxxxxx DBMART 1 mart1
Specify the Report Type ~~~~~~~~~~~~~~~~~~~~~~~ Enter 'html' for an HTML report, or 'text' for plain text Defaults to 'html' Enter value for report_type: text
Type Specified: text
Defaults to current database
Using database id: xxxxxxxxxxxxxxx
Defaults to current instance
Using instance number: 1
ASH Samples in this Workload Repository schema ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Oldest ASH sample available: 12-Sep-11 12:20:34 [ 24306 mins in the past] Latest ASH sample available: 29-Sep-11 09:26:08 [ 0 mins in the past]
Specify the timeframe to generate the ASH report ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Enter begin time for report:
-- Valid input formats: -- To specify absolute begin time: -- [MM/DD[/YY]] HH24:MI[:SS] -- Examples: 02/23/03 14:30:15 -- 02/23 14:30:15 -- 14:30:15 -- 14:30 -- To specify relative begin time: (start with '-' sign) -- -[HH24:]MI -- Examples: -1:15 (SYSDATE - 1 Hr 15 Mins) -- -25 (SYSDATE - 25 Mins)
Defaults to -15 mins Enter value for begin_time: 09/27/11 16:00:00 Report begin time specified: 09/27/11 16:00:00
Enter duration in minutes starting from begin time: Defaults to SYSDATE - begin_time Press Enter to analyze till current time Enter value for duration: 240
Enter a name for the report or a default will be used, (I prefer a naming convention of: <report type, (ash, awr, addm)>_<sid>_<snapshot>.<txt/html>
View the ASH report for the time period, going to the top events first:
Event % Event P1 Value, P2 Value, P3 Value % Activity ------------------------------ ------- ----------------------------- ---------- Parameter 1 Parameter 2 Parameter 3 -------------------------- -------------------------- -------------------------- PX Deq Credit: send blkd 58.37 "268501009","3","0" 1.97 sleeptime/senderid passes qref
"268501010","3","0" 1.97
"268501010","4","0" 1.97
There’s our issue!
Now using this info, we know we are looking at a parallel, (PX) issue that we can then take to the next
step to identify the process and see if it is connected to one of our SID’s we noted in the OEM console:
Sid, Serial# % Activity Event % Event --------------- ---------- ------------------------------ ---------- User Program # Samples Active XIDs -------------------- ------------------------------ ------------------ -------- 2170, 3249 16.27 PX Deq Credit: send blkd 14.75 DM_USER oracle@host1 (P020) 11K/14K [ 74%] 0
CPU + Wait for CPU 1.46 1,059/14K [ 7%] 0
2138, 585 16.13 PX Deq Credit: send blkd 14.55 DM_USER oracle@host1 (P022) 11K/14K [ 73%] 0
CPU + Wait for CPU 1.53 1,107/14K [ 8%] 0
2130, 581 16.11 PX Deq Credit: send blkd 14.53 DM_USER oracle@host1 (P023) 11K/14K [ 73%] 0
CPU + Wait for CPU 1.54 1,113/14K [ 8%] 0
2129, 533 16.03 PX Deq Credit: send blkd 14.54 DM_USER oracle@host1 (P021) 11K/14K [ 73%] 0
CPU + Wait for CPU 1.41 1,024/14K [ 7%] 0
Note that is also reports back the same four sids, all showing as “UNKNOWN” (2170, 2129,2130 and 2138) as we see in the OEM console..)
It just doesn’t have any SQL or SQL_ID’s to connect with them.
The coordinating SID is caught in the Top sessions running PQ’s, so we will see this as our identifier, even if it wasn’t seen in the OEM console:
Sid,Srl# (Inst) % Activity SQL ID Event % Event --------------- ---------- ------------- ----------------------------- -------- User Program -------------------- ------------------------------ 2113,UNKNOWN(1) 80.89 PX Deq Credit: send blkd 58.37
This is very important, as we need this to track further, outside of the report.
Using the coordinator, then query the DBA_HIST_ACTIVE_SESS_HISTORY from during the snapshots to capture the main SQL_ID:
select * from sys.DBA_HIST_ACTIVE_SESS_HISTORY where session_id=2113 and snap_id between 717 and 727;
SAMPLE_TIME SESSION_ID SESSION_SERIAL# USER_ID SQL_ID 27-SEP-11 10.33.39.946 PM 2113 143 255 5dpcst074nk81 27-SEP-11 10.33.49.976 PM 2113 143 255 5dpcst074nk81
You can then pull the data that is connected to the parent that will tell you what session experienced the problem:
select * from sys.WRH$_SQLTEXT where sql_id='5dpcst074nk81';
SNAP_ID DBID SQL_ID SQL_TEXT COMMAND_TYPE REF_COUNT 760 2778270765 5dpcst074nk81 <CLOB> 1 0
the clob then shows the SQL involved in the executed session, which then tells you what your pain is on:
create table dm_user.tbl_iot ( col1 , col2 , col3 , col4.... constraint tbl_pk primary key (col1)) organization index.... <--Here's our key, it's an IOT, which includes the PK!
tablespace mart_data1 pctfree 0 nologging parallel 4 as....
We can now find the secondary SQL_ID that correponds to most of the combined hit to the primary SQL that will show in the following statement:
select distinct SQL_ID, optimizer_cost from dba_hist_sqlstat where snap_id between 717 and 727 and plsexec_time_delta=0 order by optimizer_cost desc;
SQL_ID OPTIMIZER_COST
fjbbxnyc66t7c 256350
5dpcst074nk81 253906 <--Here was the one shown, it will be the one above!
1xg7tkc156a1m 5477 24hc2470c87up 5302 dyd4b36t1ppph 4865 gfjvxb25b773h 2331 3wy90uysqgcfx 1974 a55tay7577psc 1922 00a15fn17bx7p 1912 b92jqqxwt3tfd 1725 cf621qmts91wf 1028 5atpa8vj2gakz 1007 7qskskbx91q7t 992
Now run an awrsqrpt.sql to see the hit for the SQL_ID in question:
If you have never run one of these, it’s a great report for unique info on a SQL Report for a given run. For this one, we just want to see WHAT was running:
You will need your snapshot ID’s and the SQL_ID to create the report and the report resides in $ORACLE_HOME/rdbms/admin. Yes, I prefer the text version of the HTML, but to each their own…
WORKLOAD REPOSITORY SQL Report
Snapshot Period Summary
DB Name DB Id Instance Inst Num Release RAC Host ------------ ----------- ------------ -------- ----------- --- ------------ DBMART xxxxxxxxx mart1 1 10.2.0.4.0 NO host1
Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- --------- Begin Snap: 709 27-Sep-11 14:30:31 57 6.3 End Snap: 727 27-Sep-11 23:30:53 37 5.1 Elapsed: 540.37 (mins) DB Time: 2,726.16 (mins)
SQL Summary DB/Inst: MARTF/martf Snaps: 709-727
Elapsed
SQL Id Time (ms)
------------- ----------
fjbbxnyc66t7c ##########
CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("IBID") INDEX
ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NOLOGGING parallel 4 as select
col1 , col2 , col3 , col4 , col5 , col6
-------------------------------------------------------------
SQL ID: fjbbxnyc66t7c DB/Inst: MARTF/martf Snaps: 709-727
-> 1st Capture and Last Capture Snap IDs
refer to Snapshot IDs witin the snapshot range
-> CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("COL...
Plan Hash Total Elapsed 1st Capture Last Capture # Value Time(ms) Executions Snap ID Snap ID --- ---------------- ---------------- ------------- ------------- -------------- 1 3019078213 115,239,821 1 710 726 -------------------------------------------------------------
Plan 1(PHV: 3019078213) -----------------------
Plan Statistics DB/Inst: DBMART/mart1 Snaps: 709-727 -> % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
Stat Name Statement Per Execution % Snap ---------------------------------------- ---------- -------------- ------- Elapsed Time (ms) ########## 115,239,820.7 70.5 CPU Time (ms) ########## 23,668,620.1 45.7 Executions 1 N/A N/A Buffer Gets ########## 940,627,725.0 70.9 Disk Reads 7,761,573 7,761,573.0 19.4 Parse Calls 9 9.0 0.0 Rows 0 0.0 N/A User I/O Wait Time (ms) 1,949,762 N/A N/A Cluster Wait Time (ms) 0 N/A N/A Application Wait Time (ms) 0 N/A N/A Concurrency Wait Time (ms) 301,390 N/A N/A Invalidations 0 N/A N/A Version Count 17 N/A N/A Sharable Mem(KB) 9,285 N/A N/A -------------------------------------------------------------
-------------------------------------------------------------
Execution Plan ------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------ | 0 | CREATE INDEX STATEMENT | | | | 256K| | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | 2 | PX SEND QC (ORDER) | :TQ10001 | 637M| 229G| | | | Q1,01 | P->S | QC (ORDER) | | 3 | INDEX BUILD UNIQUE | PRIM_PK | | | | | | Q1,01 | PCWP | | | 4 | SORT CREATE INDEX | | 637M| 229G| | | | Q1,01 | PCWP | | | 5 | PX RECEIVE | | 637M| 229G| 136K| | | Q1,01 | PCWP | | | 6 | PX SEND RANGE | :TQ10000 | 637M| 229G| 136K| | | Q1,00 | P->P | RANGE | | 7 | PX BLOCK ITERATOR | | 637M| 229G| 136K| 1 | 4 | Q1,00 | PCWC | | | 8 | TABLE ACCESS FULL| PRIM_TBL | 637M| 229G| 136K| 1 | 4 | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------
Note ----- - cpu costing is off (consider enabling it) - estimated index size: 257G bytes
Full SQL Text
SQL ID SQL Text
------------ -----------------------------------------------------------------
fjbbxnyc66t7 CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"
"("COL1") INDEX ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NO
LOGGING parallel 4 as select ...<want to see the full SQL statement, yes
this report will show you the output...>
We now can see that our “offender” was the primary key creation on the IOT create statement.
Now you have the Information on the “Unknown” offender that was showing in OEM. Nothing can hide from a DBA for long!
Oracle Open World 2011, Prep Time!
I’ve spent the week updating, packing and communicating in preparation for my trip to Oracle Open World 2011. As with most folks that are attending, there is a lot to prepare for, but I have that added challenge of three kids who find this a fine opportunity to drive their father to distraction, as my ex-husband is responsible with managing their week, which he is unaccustomed to. Although I’m thankful for Google calendar that will tell my ex where and when each child has to be at all times, what homework, social events, etc., it is not a lot of fun for him, as the schedule is commonly kept from my home and not his during the week so finding a way to keep it from becoming too chaotic for him may not be feasible…
My preparation has included a few last minute tasks for RMOUG, which turned out to be pretty painless, so that wasn’t impacting. I think one of my biggest challenges was a final rush to try and get what identification I needed sync’d with the correct name, as we all know, the government agencies are far from efficient when someone changes their name, no matter if it’s marriage or in may case, a divorce, (ask me how my identity crisis is going, considering how often I need to use a different last name depending on what company/agency I’m speaking with at any given time…sigh…)
Upon checking my calendar, it appears that I have a dinner to attend every night that we are in San Francisco. The company for each of these events, of course, is wonderful- some of the folks I know and many I only know virtually and look forward to meeting for the first time. I’m looking forward to the Pythian Blogger’s meetup on Wednesday night. It’s a great opportunity to meet up with ex-coworkers and meet fellow bloggers- always a fun, enterprising group!
I’m looking forward to meeting a long list of folks, including, Dan Norris, Lisa Dobson, Eddie Awad, Yuri Velikanov, many more folks than I can possibly list…) and those that I haven’t seen in quite a while, Alex Gorbachev, Cary Milsap, Deb Lilley, Carol Dacko and again, more than I can list…
)
There will be many that were unable to make it this year, including a last minute cancel due to family needs by Chet Justice. He will be missed by many and want to pass a long a thank you for my daughter Cait and I for the picture on oraclenerd.com. We love our matching “DBA Goth Princess” T-shirts. When asking my 16yr old son to take our picture, his response was, “You know Mom, there is this great feature called a self-timer found on cameras…” I had to respond with “You know, there is this great feature called, I keep feeding you…” He took a great picture of Cait and I for Chet’s site…
Packing will continue tonight, along with more planning in the calendars, an attempt to get my tasks for the latest Agile sprint at work done and a hope that what clothes I pack will work the slightly different weather in San Francisco. Yes, I’m bringing my cowboy boots…
See you there!
Tempfile Read /Writes and ASM
I truly believe that database myths are the scourge of the database world. With that said, I have to blame myself for not being specific enough in one of my own blog posts and propagating one myself! As it was brought to my attention by Tanel Poder and Greg Rahn, I hadn’t put a post out here until now, so apologies!
I have flipped back to an almost exclusively 10g environment the last three months, but even though I’m back to my old stomping grounds, now fully staffed with almost double the DBA’s that were present when I left, have been able to implement ASM. With this change, I also need to update my data on how tempfiles are written to. When you read my blog on temp usage in parallel, in must be reminded that this post is for a Non-ASM environment. I also think for those that have high waits in temp read and writes, this should be an extra selling feature of ASM.
ASM, when implemented in any Oracle environment, 10g, included, will change the way that tempfiles are written to. Instead of single file writes, even when multiple tempfiles per a temp tablespace are present, with ASM, it WILL stripe across the files efficiently, (unless statistics are in issue) with parallel. There would then only be the benefit of having multiple temp tablespaces if you would like to allocate a different temp tablespace to different processes, but even then, the performance gain was marginal in my tests.
I did receive similar performance gains of temp tablespace groups without ASM as a single temp tablespace with multiple files with ASM. I was only able to gain extra performance when spreading heavy hitting temp usage across temp tablespace groups during high concurrency situations. the overall performance gain was only 12%, which was no where near what I experienced in the Non-ASM environment choice to go to temp tablespace groups with parallel processing. These were processes that utilized over 50GB of temp per parallel process.
I still have processes that limit my tuning options that have me daydreaming about multiple temp tablespace groups on multiple ASM groups, residing on SSD, but again, I’m daydreaming …
A Tale of Session Parameter Settings
This is another blog post about how TEMP can kill you in performance.
The Program Global Area, aka PGA, is a memory region that allows Oracle to perform many processes that once fell to static calculations in parameters that had to be managed by a DBA.
As frustrating as it might be to a DBA to not have enough memory to allocate to the performance enhancing feature, I found it even more frustrating to find NO PGA allocation to window sorts. We were experiencing poor performance in one of our environments, but I noted it was only during heavy workloads, resulting from heavy IO, NO execution plan changes, but outrageous waits on temp reads and writes.
db file sequential read 61,275,579 161,054 44.7 User I/O direct path read temp 17,262,132 74,398 20.6 User I/O <--TEMP is on SSD!! CPU time 71,971 20.0 direct path write temp 10,572,848 38,611 10.7 User I/O <--TEMP is on SSD!!
Upon querying the work area, I discovered the following:
select vst.sql_text, swa.sql_id, swa.sid, swa.tablespace , swa.operation_type , trunc(swa.work_area_size/1024/1024) "PGA MB" , trunc(swa.max_mem_used/1024/1024)"Mem MB" , trunc(swa.tempseg_size/1024/1024)"Temp MB" from v$sql_workarea_active swa, v$session vs, v$sqltext vst where swa.sid=vs.sid and swa.sql_id=vs.sql_id and vs.sql_id=vst.sql_id and vst.piece=0 order by swa.operation_type;
| SQL_ID | Temp Tablespace | Operation Type | PGA | SGA | TEMP |
| 09am9sr9uca5y | TEMP_4 | WINDOW(SORT) |
0 |
0 |
1375 |
| 09am9sr9uca5y | TEMP_1 | WINDOW(SORT) |
0 |
0 |
1405 |
| 09am9sr9uca5y | TEMP_2 | WINDOW(SORT) |
0 |
0 |
1415 |
| 09am9sr9uca5y | TEMP_3 | WINDOW(SORT) |
0 |
0 |
1420 |
The process in question, as most processes in the database environments I work in was parallel, (degree 4) working with temp tablespace groups, (count of 4) on SSD, so quite fast, but to see this amount of processing “swapping” to temp was frustrating and painful to performance.
The IO was killing the performance, even with this running on SSD, we were 100% saturated on the disk!:
Device: reads/s writes/s rsec/s wsec/s %util
Fio1 1584.00 1460.33 101397.33 129344.00 73.97
Fio2 1603.33 1534.67 102720.00 134776.00 100.03
FROM AWR:
%Time Total Wait wait Waits
Wait Class Waits -outs Time (s) (ms) /txn -------------------- ---------------- ------ ---------------- ------- --------- User I/O 91,531,087 .0 279,954 3 56,223.0 Other 4,067,544 2.9 12,718 3 2,498.5
Now where the key came in was the rest of the parallel processes when looking at their allocation:
| SQL_ID | Operation Type | PGA | SGA | TEMP | |
| 09am9sr9uca5y | HASH-JOIN |
511 |
23 |
||
| 09am9sr9uca5y | LOAD Write Buffers |
0 |
0 |
||
| 09am9sr9uca5y | HASH-JOIN |
511 |
23 |
||
| 09am9sr9uca5y | HASH-JOIN |
511 |
23 |
Now even though the PGA is set to 32G and the allocating to numerous other processes, this one appeared very “odd” and why it’s the key the PGA. Consider what the overall process would have to be to allocate 511 MB of PGA to the three processes above. This would mean that 500MB was allocated to the process and then there was the approximate 10MB overhead. The number was too evenly distributed and too high and no PGA was allocated to two newer features to 10g, such as analytical functions, (window sorts) .
I remembered that at one time, this company had a developer employed to it that had a penchant for the following manual session alteration:
EXECUTE IMMEDIATE 'ALTER SESSION SET WORKAREA_SIZE_POLICY = MANUAL';
EXECUTE IMMEDIATE 'ALTER SESSION SET SORT_AREA_SIZE = 536870912';
EXECUTE IMMEDIATE 'ALTER SESSION SET HASH_AREA_SIZE = 536870912';
Upon a search of the code, it was found that this was in the main package, but:
- The beginning procedure which executed from the package was actually in the center of the package, so developers never noted it as being an issue and the DBA’s never dug into the code deep enough before to know this manual alteration was present.
- Due to a huge process change to how the code produced work, there had been a large insertion of analytical functions that caused the performance degradation upon heavy loads.
The testing process was simple.
- Removed the three lines from the initial procedure in the package.
- Run the same heavy load of processing from production in test with the new code.
Change in how the PGA is allocated:
| SQL_ID | Temp Tablespace | Operation Type | PGA | SGA | TEMP |
| 09am9sr9uca5y | TEMP_2 | WINDOW (SORT) |
26 |
95 |
598 |
| 09am9sr9uca5y | TEMP_3 | WINDOW (SORT) |
26 |
95 |
598 |
| 09am9sr9uca5y | TEMP_1 | WINDOW (SORT) |
26 |
95 |
598 |
| 09am9sr9uca5y | TEMP_4 | WINDOW (SORT) |
26 |
95 |
598 |
I also found the PGA savings to the hash joins interesting, as what was statically “forced” in the manual allocation wasn’t what the database really required and was simply wasted with the old settings, where the PGA dynamically set much smaller amounts:
| SQL_ID | Temp Tablespace | Operation Type | PGA | SGA | TEMP |
| 12bhautka6mvp | LOAD WRITE BUFFERS |
0 |
0 |
||
| 12bhautka6mvp | HASH-JOIN |
2 |
1 |
||
| 12bhautka6mvp | HASH-JOIN |
2 |
1 |
||
| 12bhautka6mvp | HASH-JOIN |
2 |
1 |
||
| 12bhautka6mvp | HASH-JOIN |
2 |
1 |
Result:
Increase in performance: 7 fold
The fix was moved to production that day, without hesitation and everyone shook their head as to why this was ever put into place by anyone, but I want to follow up on that question that many folks, I’m sure have. Although I do not support this, especially in 10g and 11g databases due to what we see here, you do have to understand what the developer’s goal was:
The developer :
- Saw benefits to straight 500MB allocation of PGA to processing that over-rides the percentage allocation maximum per process.
- Experienced huge gains in performance on hash joins due to this change and we did suffer some small performance hits when it was removed, but the overall performance gain was still outstanding!
- Wasn’t aware of future enhancements and code changes that would have been vulnerable to his choice in coding session parameter changes.
Warp Speed with Temp Tablespace Groups
The introduction of the temp tablespace was a crucial step in the RDBMS to separating the “work area” for sorting and specific join operations from permanent segments. To eliminate the performance challenges of sorts and hash against the I/O threshold involved with performing this word in the temp tablespace, Oracle 9i introduced the Program Global Area, (PGA) to complete sorts and hashes within memory.
DBA’s spent considerable time tuning the PGA, attempting to continually ensure that as more complex sorting and join features emerged, that these processes would complete their work within the PGA and not require such a large work area to force a, “swap” to the temp tablespace, again returning to an I/O bound process.
Throughout the years of 10g release 2, larger databases, required more sorting and hash operations, which in turn required larger work areas that for particular requirements, no PGA could meet. This would be seen in PGA cache hit graphs without a size recommendation to meet a 90% or better hit ratio. These sorting and/or hash “heavy” processes would then become burdensome on the database, often encompassing the primary utilization of temporary “swap” and extended wait time.
When a DBA or Developer observes high waits on tempfile reads and writes, the challenge is often not understood. Unlike a standard tablespace, that with high read and writes can have the I/O intensive demands spread across more area by adding multiple datafiles to write to simultaneously, a temp tablespace, when observed, will appear very different.
A simple ASH report can show how impacting this type of performance challenge can be to a VLDB:
|
Event |
% Event |
P1 Value, P2 Value, P3 |
% Activity |
Parameter 1 |
Parameter 2 |
Parameter 3 |
| direct path read temp |
43.27 |
“1001″,”3502″,”7″ |
0.01 |
file number | first dba | block cnt |
| PX Deq Credit: send blkd |
26.18 |
“268501001″,”2″,”0″ |
1.50 |
sleeptime/senderid | passes | qref |
|
“268501001″,”1″,”0″ |
1.18 |
|||||
|
“268501003″,”2″,”0″ |
1.06 |
|||||
| direct path read |
5.10 |
“50″,”11333″,”11″ |
0.01 |
file number | first dba | block cnt |
| direct path write temp |
1.47 |
“1003″,”50949″,”7″ |
0.01 |
file number | first dba | block cnt |
|
SQL ID |
Planhash |
% Activity |
Event |
% Event |
SQL Text |
|
2880106315 |
100.00 |
direct path read temp |
43.27 |
create table report_tbl1… | |
| PX Deq Credit: send blkd |
26.18 |
Temp tablespace groups have received little press and less discussion since their introduction in Oracle 10g. When utilizing temp tablespace groups, a DBA can offer impressive performance increases by spreading the I/O across multiple files vs. one tempfile. As tempfiles are utilized differently than datafiles, a temp tablespace group enables a process to consume temp space from multiple tablespaces.
Before the feature is implemented though, there are many considerations that need to be examined to ensure that benefits are seen and that complications are not introduced.
Myths about temptablespace groups:
Myth 1. A temp tablespace group must have more than 1 tablespace.
Answer- Untrue. The tablespace group is created with the initial temp tablespace defined with the temp tablespace group clause.
Myth 2. There is a limit to how many temp tablespaces can be added to a temp tablespace group.
Answer- There is no explicit limit on the maximum number of tablespaces that can be allocated to a tablespace group.
Myth 3. A temporary tablespace can belong to more than one temporary tablespace group.
Answer- It can belong to only one group.
Myth 4. Temporary tablespace groups improve all sorting issues.
Answer- Windows sort rank often perform worse when utilizing temp tablespace groups and parallel.
The feature can be allocated to one user as its temporary tablespace or to an entire database to increase performance.
To create a temporary tablespace group is quite simple:
CREATE TEMPORARY TABLESPACE <tablespace_name>
TEMPFILE <ASM DISKGROUP >
SIZE <SIZE REQ>
TABLESPACE GROUP <group_name>;
CREATE TEMPORARY TABLESPACE TEMP_G1
TEMPFILE '+DATA'
SIZE 20000M
TABLESPACE GROUP TEMP_G;
desc dba_tablespace_groups
SELECT *
FROM dba_tablespace_groups;
ALTER TABLESPACE <tablespace_name>
TABLESPACE GROUP <group_name>;
ALTER TABLESPACE DM_DATA1 TABLESPACE GROUP TEMP_G;
ALTER USER DM_USER TEMPORARY TABLESPACE TEMP_G;
SELECT TEMPORARY_TABLESPACE FROM DBA_USERS
WHERE USERNAME='DM_USER';
ALTER SYSTEM TEMPORARY TABLESPACE TEMP_G;
Planing to implement temp tablespace group(s)
1. Plan your temp tablespace group with careful consideration- the tablespaces should be of the same size, have the same amount of files with the tempfiles each the same size.
2. Keep in mind- inconsistent sizes across the tablespaces can affect the efficiency of the striping when using a non-ASM managed environment.
3. The temp work will stripe across the tablespaces quite evenly. When using a temp tablespace group, it will no longer simply use one temp tablespace at a time, so keep this in mind when monitoring temp tablespace usage.
4. If in a non-ASM environments, stripe your files in differing orders to avoid having the same # file on the same drive as another temp tablespace from the temp tablespace group- this will eliminate more I/O “hot spots”.
4. You should estimate the required size of each tablespace to be approximately 40-60% or your current temp tablespace needs, depending on your database usage, (parallel processing, rowid range scans and types of sorts…)
5. Temp tablespace groups should not be a replacement for SQL tuning. Larger than acceptable temp tablespace usage in correlation to the data set size the code is working with is a clear indicator that the code should be examined for tuning opportunities.
6. Match your quantity of temp tablespaces contained in a tablespace group to the amount of parallel you will be hinting in your statements using it.
7. Poor performance from statistics issues is only aggravated by temp tablespace groups. Sort and hash operations are two of the most sensitive tasks to poor statistics, so it is easy to understand that extensive performance issues should be expected if statistics are not addressed before temp tablespace group introduction.
Temp tablespace groups and parallelism
DBA’s who utilize parallel query efficiently in their environments often follow a strict set of guidelines on what scenarios, objects and design must be in place before parallel will offer a true benefit in performance.
- Partitioned Objects
- Where clause that utilizes a partitioning key.
- Large insert/update with defined variables.
- Non-stale statistics
- Setting the degree of parallel, (DOP) on the object.
Parallel can be utilized with temp tablespace groups to enhance performance to an incredible degree as well, but of the above “rules”, only one is preferable and it is most likely not the one most would guess from the above list.
Temp Tablespace groups should have the following for positive performance gains when utilizing parallel:
- Non-stale statistics
- Set the degree of parallel, (DOP) as a hint in the query, not at the object level.
- Combination works best with OLAP and DSS systems, rare occasion where an OLTP would benefit.
- Unless the insert and update is performing large sort or hash, it would offer little performance gain.
- Partitioned objects are not a requirement to receive performance benefit.
The last bullet point is often difficult to understand, but the mindset must be re-aligned from the idea that you are using parallel to benefit a “read or write on multiple objects” i.e partitioned objects and that we are using the parallel process to benefit “read and write to multiple tempfiles” which is the temp tablespace group. Visualizing the temp tablespace group as a “partitioned object” may assist with comprehending the benefits.
Performance Example
The process example:
- utilizes 720GB of temp tablespace.
- has an execution time of 16 hrs.
- Source tables are 472GB in a 10TB datamart environment.
The Goal to attain:
- Cut the execution time in 1/2 to re-allocate the resources to other processes.
- Minimize temp usage by 60%.
- Use temp tablespace groups to eliminate temp read/write waits in existing process.
Note** Second time documenting this test. Current iteration has enhancements (i.e. Added fourth temp tablespaces and changing amount of parallel used in the statements, etc…)
Create Temp Tablespace Group, three temp tablespaces total, 150GB each- CREATE TEMPORARY TABLESPACE TEMP_G1 TEMPFILE ‘/u06/oradata/sid1/temp_g1_01.dbf’ SIZE 51200M, ‘/u10/oradata/sid1/temp_g1_02.dbf’ SIZE 51200M, ‘/u20/oradata/sid1/temp_g1_03.dbf’ SIZE 51200M TABLESPACE GROUP TEMP_G;
CREATE TEMPORARY TABLESPACE TEMP_G2 TEMPFILE ‘/u30/oradata/sid1/temp_g2_01.dbf’ SIZE 51200M, ‘/u20/oradata/sid1/temp_g2_02.dbf’ SIZE 51200M, ‘/u10/oradata/sid1/temp_g2_03.dbf’ SIZE 51200M; ALTER TABLESPACE TEMP_G2 TABLESPACE GROUP TEMP_G;
CREATE TEMPORARY TABLESPACE TEMP_G3 TEMPFILE ‘/u25/oradata/martd/temp_g3_01.dbf’ SIZE 51200M, ‘/u10/oradata/martd/temp_g3_02.dbf’ SIZE 51200M, ‘/u04/oradata/martd/temp_g3_03.dbf’ SIZE 51200M; ALTER TABLESPACE TEMP_G3 TABLESPACE GROUP TEMP_G;
CREATE TEMPORARY TABLESPACE TEMP_G4 TEMPFILE ‘/u20/oradata/martd/temp_g4_01.dbf’ SIZE 51200M, ‘/u15/oradata/martd/temp_g4_02.dbf’ SIZE 51200M, ‘/u06/oradata/martd/temp_g4_03.dbf’ SIZE 51200M; ALTER TABLESPACE TEMP_G4 TABLESPACE GROUP TEMP_G;
Switch over database and user to new temp tablespace group—
ALTER DATABASE DEFAULT TEMPORARY TABLESPACE TEMP_G;
Break up original process from one process with complex aggregating to two CTAS, less one process sorting: SQL> create table orpt_test tablespace tbl_data1 compress nologging parallel 4 as SELECT /*+ full(c) parallel (c 4) */ <–Note Parallel here, same requested parallel as # of temp c.m_id, … PERCENT_RANK() OVER (ORDER BY c.orders_by_amt) pct_rnk FROM all_transactions c WHERE trans_type_cd != ‘F’; Table created. Elapsed: 02:56:46:22
Estimated time run was 16 hrs old way
SQL>CREATE INDEX orpt_tst_idx1 on orpt_test(m_id, file_id) PARALLEL 4 nologging tablespace idx_data1;
Index created.
Elapsed: 01:42:15:43
Set table to unparallel and gather stats— ALTER TABLE orpt_test noparallel; exec dbms_stats.gather_table_stats( ownname=>’OWNER’,tabname=>’ORPT_TEST’, ESTIMATE_PERCENT=>.00001, METHOD_OPT=> ‘FOR ALL INDEXED COLUMNS’, GRANULARITY=>’ALL’,CASCADE=>TRUE, DEGREE=>4);
Create the secondary table that utilizes the index to ease into the hash and limit the temp usage on the amount of data it ends up having to perform in the sort if it was performed in one CTAS—
create table NEW_RPT_TBL
tablespace tbl_data2 compress nologging pctfree 0 parallel 4 as SELECT /*+ parallel(o,4) */ b_id, MIN(t_dt) this one, MAX(t_dt)that one, COUNT(*) , SUM(o_amt) , SUM(o_amt) this one, ….so on and so forth, more and more aggregation FROM ( SELECT — /*+ parallel(c,4) index(c,orpt_tst_idx1) use_hash(c cm mm m) */ c.m_id, … FROM orpt_test c , < Here’s our new table, already much of the aggregation performed! m_tbl1 mm, crm_mtbl2 cm, files_form3 m where c.m_id = mm.m_id AND mm.m_cd = cm.m_cd AND c.file_id = m.file_id (+) AND c.m_id = m.m_id (+) ) o GROUP BY m_id, c_cd, m_cd, l_name, f_id, b_id;
Table created.
Elapsed: 02:12:32:56
A quick observations of the work area verifies how the temp sorts are spread across the temp tablespaces that encompass the temp tablespace group to eliminate waiting as we write to one tempfile at a time:
| SQL_TEXT | SID | TABLESPACE | OPERATION_TYPE | Temp MB |
| create table orpt_test tbl_data1 |
803 |
TEMP_G1 | WINDOW (SORT) |
10229 |
| create table orpt_test tbl_data1 |
835 |
TEMP_G4 | WINDOW (SORT) |
13872 |
| create table orpt_test tbl_data1 |
836 |
LOAD WRITE BUFFERS | ||
| create table orpt_test tbl_data1 |
847 |
TEMP_G1 | WINDOW (SORT) |
12974 |
| create table orpt_test tbl_data1 |
803 |
LOAD WRITE BUFFERS | ||
| create table orpt_test tbl_data1 |
847 |
LOAD WRITE BUFFERS | ||
| create table orpt_test tbl_data1 |
835 |
LOAD WRITE BUFFERS | ||
| create table orpt_test tbl_data1 |
827 |
TEMP_G3 | WINDOW (SORT) |
11408 |
| create table orpt_test tbl_data1 |
837 |
LOAD WRITE BUFFERS | ||
| create table orpt_test tbl_data1 |
837 |
TEMP_G4 | WINDOW (SORT) |
11535 |
| create table orpt_test tbl_data1 |
827 |
LOAD WRITE BUFFERS | ||
| create table orpt_test tbl_data1 |
828 |
LOAD WRITE BUFFERS | ||
| create table orpt_test tbl_data1 |
828 |
TEMP_G3 | WINDOW (SORT) |
15940 |
| create table orpt_test tbl_data1 |
853 |
LOAD WRITE BUFFERS | ||
| create table orpt_test tbl_data1 |
836 |
TEMP_G2 | WINDOW (SORT) |
10807 |
| create table orpt_test tbl_data1 |
853 |
TEMP_G2 | WINDOW (SORT) |
18028 |
Drop “work” Table/Indexes post final table creation.
DROP TABLE ORPT_TEST PURGE;
NEW PROCESS COMPLETE…
Was the goal met?
Largest amount of temp space used: 182GB, (Old way: 721GB)
Time to complete process: 5hrs, 23 min., (Old way: 16 hrs)
Temp Tablespace Group Benefits to this entire tuning process:
Temp space usage without temp tablespace group, new process: 512GB, (over twice as much!)
Process completion time of new process without temp tablespace group: 11hrs, (5 hrs longer!)
Temp Tablespace Waits Eliminated with temp tablespace groups from original process:
Reads- 66%
Writes- 29%
Where are the improvements seen?
1. I/O is distributed evenly across multiple datafiles instead of one datafile when presented with onle a single temp tablespace.
2. Sorts in parallel performance increased, on average, over 20% for window and buffer sorts, (improvement was not seen for sorts involving rowid range scans…)
3. Database overall improved as resources were re-allocated to all processes that may have been hindered by I/O bottleneck by heavy swapping to temp tablespace.
4. For large hash processes, where temp tablespace usage is unavoidable, temp tablespace groups offer significant improvement in I/O performance and more CPU utilization.
Addressing a Deadlock
I’ve always had a *thing* for trace files. They serve me well- I like them and they seem to really like me. I have a special affinity for deadlock trace files and someone needs to remind me to pester Cary Millsap about a tool to decipher them so I have even more data to go through…:) Even without any tools, I can easily pull out what is important and work from this data to run queries and reports to drill down to find work-arounds when changes to logic or design are not an option.
Deadlocks, although quite unpleasant, I find occur often in mature environments where the code, logic or design may have become complicated over time. Often it’s not anything that was intentional, but simply due to the natural complexity that can occur in database logic and as objects become, what I like to refer to as “heavy”.
An object, when receiving high traffic , both due to concurrent reporting and/or transactional, can become, what I like to refer to as “heavy” or “weighted”. As a DBA, it’s my job to find ways to lighten the object, by logical or physical tuning, often both. As deadlocks occur when more than one session is waiting on data that is locked by the other and often due to high complexity logic holding locks consistently for long periods of time, (or longer than the logic can tolerate…) I find I can often help eliminate deadlocking by eliminating anything that causes those longer lock times.
One such issue arose this week, was not available for a code rewrite, but I was able to offer some physical tuning. What the trace files and AWR reports below offer is that one of the major pieces of the complex puzzle wasn’t part of the original deadlock query.
__________________________________________________________________________________________________________________________
So here is the process and the fix…
- Step through the deadlock trace file and document the logic “tangle” and waits that lead to the deadlock. If the waits do not correspond to the statement review, then go onto the steps below, (which did occur…)
- Take an AWR for the snapshot in time of the deadlock.
- Take an AWR sql_id specific report for each of the top queries involving the object that also was involved in the deadlock query.
- Remove the “weight” from the objects that are causing the waits in the statements involved in the deadlock snapshot scenario.
- Inspect the improvements in the explain plans, the execution times and monitor for an occurrence of any new deadlocking.
TRACE FILE Example:
Tue Jul 5 09:01:07 2011
ORA-00060: Deadlock detected. More info in file
/u01/app/oracle/admin/xxxx/udump/xxxx_ora_20960.trc.
The first thing anyone will notice who reads a deadlock trace file is the following statement, (or should notice!):
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL.
This is Oracle telling you that it is not responsible for this error and you should automatically assume that something is wrong with the code, design, logic or all three. Now there are some bugs involved that will create a deadlock, but please do not go to Oracle and try to convince them you have one of these bugs unless you have clear proof. It will waste your time and only annoy Oracle Support…
The trace file will tell you about the deadlocked session and the other waiting session.
Information on the OTHER waiting sessions:
Session 779:
pid=83 serial=41199 audsid=117763439 user: 49/DW_USR
O/S info: user: dw_usr, term: unknown, ospid: 1234, machine: linuxora01
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current SQL Statement:
DELETE FROM MFG_TBL
WHERE mfg_id=:1
End of information on OTHER waiting sessions.
Current SQL statement for this session:
DELETE FROM MFG_TBL
WHERE mfg_id=:1
The trace file does not stop here- It will take you through the datafiles, objects involved and then take you through the logic in the session that were involved in the deadlock scenario. This allows you to see what occurred in the database, along with the timing and the wait time. This is incredibly valuable to a DBA to see how the logic is flowing, (or not flowing!) and how a code change would benefit and in the case where code changes are not an option, where removing “weight” may help significantly.
Pay attention to the bind variable values that are presented to you in these trace files. They can assist you in finding where bind peek issues may have caused a deadlock or when an explain plan is off and statistics are the cause of your troubles.
Doing a search for “sqltxt” and “Peeked Binds”, you will step through the trace file and see the statement s trapped and the bind variables “peeked” for each.
Peeked Binds
============
Bind variable information
position=1
datatype(code)=2
datatype(string)=NUMBER
precision=0
scale=0
max length=22
value=140344 <--This is my value for :1
The plan table for the statement that was utilized for the execution is shown in the trace, too:
============
Plan Table
============
------------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------+-----------------------------------+
| 0 | DELETE STATEMENT | | | | 1 | |
| 1 | DELETE | MFG_TBL | | | | |
| 2 | INDEX UNIQUE SCAN | MFG_PK | 1 | 12 | 1 | |
------------------------------------------------+-----------------------------------+
Was an outline or profile used for the statement? Yes, this is displayed as well..
Outline Data:
/*+BEGIN_OUTLINE_DATAIGNORE_OPTIM_EMBEDDED_HINTSOPTIMIZER_FEATURES_ENABLE('10.2.0.4')OPT_PARAM('_optimizer_cost_model' 'io')OPT_PARAM('optimizer_dynamic_sampling' 1)OPT_PARAM('optimizer_index_cost_adj' 10)OPT_PARAM('optimizer_index_caching' 80)ALL_ROWSOUTLINE_LEAF(@"DEL$1")INDEX(@"DEL$1" "MFG_TBL"@"DEL$1" ("MFG_TBL"."MFG_ID"))END_OUTLINE_DATA*/
The query that due to concurrency added “weight” to my deletes that ended up deadlocking on the MFG_TBL was actually a distinct select with joins to a number of tables during that hour, which sourced to another table in the distinct query, we will refer to in this article as FF_TBL, (I’m not telling you what I think FF stands for…J)
Along with the deadlock trace file, I utilized both AWR reports and , (awrsqrpt.sql) to drill down and gather more details about individual sql_id’s from the snapshot period.
__________________________________________________________________________________________________________________________
The AWR report from the time in question shows numerous queries with massive joins involving the table in question:
Elapsed CPU Elap per % Total
Time (s) Time (s) Executions Exec (s) DB Time SQL Id
---------- ---------- ------------ ---------- ------- -------------
272 222 55 5.0 2.8 55dnjt1894pw0
Module: JDBC Thin Client
SELECT DISTINCT mf.f_id, mf.l_dir, mf.f_name, st.stg_ type_cd,
pe.pr_stp_sts_cd, mfo.ll_dir AS odir, mfo.f_name AS oname, peo.pr_
stp_sts_cd AS ostat FROM MF_TBL mf JOIN STG_TBL st ON mf.b_rec_cd
= st.b_rec_cd AND st.cla_flag = 'Y' JOIN FF_TBL f
152 152 109 1.4 1.6 bc88sd1b2v10q
Module: oracle@server3 (S003)
SELECT DISTINCT NVL("A3"."M_ID",NVL("A5"."M_ID","A1"."M_ID")) FRO
M "MF_TBL" "A5","PRC_TBL" "A4","MF_TBL" "A3","MFG_TBL
" "A2","MF_TBL2" "A1" WHERE NVL("A4"."MFG_ID","A5"."M
FG_ID") IS NOT NULL AND "A1"."MF_ID"(+)="A2"."MF_ID
151 150 7 21.5 1.5 4vqa0hz2qgmr7
Module: oracle@server3 (S011)
SELECT "A1"."RCVD","A1"."MR_TBL","A1"."MIP_TBL","A1"."MA_TBL
","A1"."NC_TBL","A1"."NNC_TBL","A1"."WM_TBL","A1"."CF_TBL
","A1"."WM_TBL","A1"."DAR_TBL","A1"."WH_U_R","A1"."RC
VDS",COUNT(*) OVER () FROM (SELECT "A5"."RCVD" "RCVD",
__________________________________________________________________________________________________________________________
My First concern was the select distinct queries, which one output from below for a larger snapshot in time shows how poorly the first one had been performing.
| Stat Name | Statement | Per Execution | % Snap |
| Elapsed Time (ms) | 924,866 | 2,306.40 | 11.1 |
| CPU Time (ms) | 873,535 | 2,178.40 | 11.3 |
| Executions | 401 | N/A | N/A |
| Buffer Gets | ########## | 752,796.80 | 18.9 |
| Disk Reads | 24 | 0.1 | 8 |
| Parse Calls | 401 | 1 | 8 |
| Rows | 356 | 0.9 | N/A |
| User I/O Wait Time (ms) | 124 | N/A | N/A |
| Cluster Wait Time (ms) | 0 | N/A | N/A |
| Application Wait Time (ms) | 0 | N/A | N/A |
| Concurrency Wait Time (ms) | 64,100 | N/A | N/A |
| Invalidations | 0 | N/A | N/A |
| Version Count | 160 | N/A | N/A |
| Sharable Mem(KB) | 12,611 | N/A | N/A |
Global Parameter that may be affecting the deadlock scenario:
PARAMETER VALUE
_optimizer_cost_model IO
Session parameter changes that may have affected the performance(seen in the outline..):
PARAMETER VALUE
Dynamic_sampling 1
Optimizer_index_cost_adj 1
Optimizer_index_caching 80
Due to hardware options, the parameter, _optimizer_cost_model change offered a huge performance boost, allocating the database with huge I/O constraints to always prefer costs that favored less I/O hits than CPU when disk I/O was costly to the database.
Over time, investments have been made in hardware improvements that now make it pertinent for the DBA team to investigate if this parameter setting is still a good idea. During this exercise, it is valuable to look into it as well.
SQL> explain plan for
SELECT DISTINCT mf.f_id, mf.l_dir, mf.f_name, st.stg_type_cd, pe.pr_stp_cd,
mfo.l_dir AS odir, mfo.f_name AS oname, peo.pr_stp_cd AS ostat FROM MR_TBL mf
JOIN ST_TBL st ON mf.b_rec _cd = st.b_rec_cd AND st.ca_flag = 'Y'
JOIN FF_TBL ff ON ff.fr_type_id = mf.fr_type_id
JOIN PE_TBL pe ON pe.f_id = mf.pf_id
AND (pe.pr_stp_name = ‘<req 1>’ OR pe.pr_stp_name = '<req2>')
LEFT JOIN MF_TBL mfo ON mfo.pf_id = mf.f_id
LEFT JOIN PE_TBL peo
ON peo.f_id = mf.f_id AND peo.pr_stp_name = '<req3>'
WHERE mf.f_id IN ( SELECT f_id FROM MF_TBL mf
WHERE MIF_id IS NULL START WITH f_id
IN ( SELECT mfgi.f_id FROM MFGI_TBL mfgi
WHERE mfgi.mfgi_id = :1)
CONNECT BY PRIOR mf.f_id = mf.pf_id);
Explained.
SQL> set linesize 240
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————————————————————————————
| Operation | Name | Rows | Bytes | Cost |
| SELECT STATEMENT | 1 | 242 | 26 | |
| HASH UNIQUE | 1 | 242 | 26 | |
| HASH JOIN | 1 | 242 | 13 | |
| HASH JOIN | 1 | 237 | 9 | |
| NESTED LOOPS OUTER | 1 | 227 | 6 | |
| NESTED LOOPS | 1 | 147 | 5 | |
| NESTED LOOPS OUTER | 4 | 492 | 4 | |
| NESTED LOOPS | 4 | 396 | 2 | |
| VIEW | VW_NSO_1 | 4 | 20 | 1 |
| FILTER | ||||
| CONNECT BY WITH FILTERING | ||||
| FILTER | ||||
| TABLE ACCESS FULL | MF_TBL | 4 | 48 | 1 |
| INDEX UNIQUE SCAN | MFGI_PK | 1 | 9 | 1 |
| NESTED LOOPS | ||||
| CONNECT BY PUMP | ||||
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 4 | 48 | 1 |
| INDEX RANGE SCAN | MF_TBL_IDX01 | 3 | 1 | |
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 1 | 94 | 1 |
| INDEX UNIQUE SCAN | MF_PK | 1 | 1 | |
| TABLE ACCESS BY INDEX ROWID | PE_TBL | 1 | 24 | 1 |
| INDEX RANGE SCAN | PE__IDX01 | 4 | 1 | |
| TABLE ACCESS BY INDEX ROWID | PE_TBL | 1 | 24 | 1 |
| INDEX RANGE SCAN | PE__IDX01 | 4 | 1 | |
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 1 | 80 | 1 |
| INDEX RANGE SCAN | MF_TBL_IDX01 | 4 | 1 | |
| TABLE ACCESS FULL | ST_TBL | 10 | 100 | 2 |
| TABLE ACCESS FULL | FF_TBL | 9304 | 46520 | 3 |
Plan hash value: 2540126887
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("FF"."F_REC_ID"="MF"."F_REC_ID")
3 - access("MF"."BR_TYPE_CD"="ST"."BR_TYPE_CD")
9 - filter("MIF_ID" IS NULL)
10 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
11 - filter( EXISTS (SELECT 0 FROM "DW_USR"."MFG_TBL" "MFGI"
WHERE "MFGI"."F_ID"=:B1 AND "MFGI"."MFG_ID"=TO_NUMBER(:1)))
12 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
13 - access("MFGI"."MFG_ID"=TO_NUMBER(:1) AND "MFGI"."F_ID"=:B1)
17 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
18 - filter("MF"."PF_ID" IS NOT NULL AND "MF"."F_REC_ID" IS
NOT NULL AND "MF"."BR_TYPE_CD" IS NOT NULL)
19 - access("MF"."F_ID"="$nso_col_1")
20 - filter("PEO"."PS_NAME"(+)='<req2>')
21 - access("PEO"."F_ID"(+)="MF"."F_ID")
filter("PEO"."F_ID"(+) IS NOT NULL)
22 - filter("PE"."PS_NAME"='Concatenate' OR "PE"."PS_NAME"='File
Conversion')
23 - access("PE"."F_ID"="MF"."PF_ID")
filter("PE"."F_ID" IS NOT NULL)
25 - access("MFO"."PF_ID"(+)="MF"."F_ID")
filter("MFO"."PF_ID"(+) IS NOT NULL)
26 - filter("ST"."CA_FLAG"='Y')
Note
-----
- cpu costing is off (consider enabling it)
64 rows selected.
CREATE the stupid index for the quantity they run this silly thing for…
SQL> create index DW_USR.ff_idx1 on DW_USR.FF_TBL(F_REC_ID)
tablespace mf_idx;
Index created.
SQL> explain plan for
SELECT /*+ CPU_COSTING */ DISTINCT mf.f_id, mf.l_dir, mf.f_name,
st.stg_type_cd, pe.pr_stp_cd,
mfo.l_dir AS odir, mfo.f_name AS oname, peo.pr_stp_cd AS ostat FROM MR_TBL mf
JOIN ST_TBL st ON mf.b_rec _cd = st.b_rec_cd AND st.ca_flag = 'Y'
JOIN FF_TBL ff ON ff.fr_type_id = mf.fr_type_id
JOIN PE_TBL pe ON pe.f_id = mf.pf_id
AND (pe.pr_stp_name = ‘<req 1>’ OR pe.pr_stp_name = '<req2>')
LEFT JOIN MF_TBL mfo ON mfo.pf_id = mf.f_id
LEFT JOIN PE_TBL peo
ON peo.f_id = mf.f_id AND peo.pr_stp_name = '<req3>'
WHERE mf.f_id IN ( SELECT f_id FROM MF_TBL mf
WHERE MIF_id IS NULL START WITH f_id
IN ( SELECT mfgi.f_id FROM MFGI_TBL mfgi
WHERE mfgi.mfgi_id = :1)
CONNECT BY PRIOR mf.f_id = mf.pf_id);
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————————————————————————————
| Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
| SELECT STATEMENT | 1 | 242 | 39 (11) | 00:00:01 | |
| HASH UNIQUE | 1 | 242 | 39 (11) | 00:00:01 | |
| NESTED LOOPS OUTER | 1 | 242 | 38 (8) | 00:00:01 | |
| NESTED LOOPS OUTER | 1 | 162 | 37 (9) | 00:00:01 | |
| HASH JOIN | 1 | 138 | 36 (9) | 00:00:01 | |
| TABLE ACCESS BY INDEX ROWID | PE_TBL | 1 | 24 | 1 (0) | 00:00:01 |
| NESTED LOOPS | 1 | 133 | 9 (12) | 00:00:01 | |
| HASH JOIN | 2 | 218 | 8 (13) | 00:00:01 | |
| NESTED LOOPS | 4 | 396 | 2 (0) | 00:00:01 | |
| VIEW | VW_NSO_1 | 4 | 20 | 1 (0) | 00:00:01 |
| FILTER | |||||
| CONNECT BY WITH FILTERING | |||||
| FILTER | |||||
| TABLE ACCESS FULL | MF_TBL | 4 | 48 | 1 (0) | 00:00:01 |
| INDEX UNIQUE SCAN | MFGI_PK | 1 | 9 | 1 (0) | 00:00:01 |
| NESTED LOOPS | |||||
| CONNECT BY PUMP | |||||
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 4 | 48 | 1 (0) | 00:00:01 |
| INDEX RANGE SCAN | MF_TBL_IDX01 | 3 | 1 (0) | 00:00:01 | |
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 1 | 94 | 1 (0) | 00:00:01 |
| INDEX UNIQUE SCAN | MF_PK | 1 | 1 (0) | 00:00:01 | |
| TABLE ACCESS FULL | ST_TBL | 10 | 100 | 5 (0) | 00:00:01 |
| INDEX RANGE SCAN | PE__IDX01 | 4 | 1 (0) | 00:00:01 | |
| TABLE ACCESS FULL | FF_TBL | 9304 | 46520 | 26 (4) | 00:00:01 |
| TABLE ACCESS BY INDEX ROWID | PE_TBL | 1 | 24 | 1 (0) | 00:00:01 |
| INDEX RANGE SCAN | PE__IDX01 | 4 | 1 (0) | 00:00:01 | |
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 2 | 160 | 1 (0) | 00:00:01 |
| INDEX RANGE SCAN | MF_TBL_IDX01 | 4 | 1 (0) | 00:00:01 |
Plan hash value: 2202465807
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("FF"."F_REC_ID"="MF"."F_REC_ID")
5 - filter("PE"."PS_NAME"='Concatenate' OR "PE"."PS_NAME"='<req1>')
7 - access("MF"."BR_TYPE_CD"="ST"."BR_TYPE_CD")
10 - filter("MIF_ID" IS NULL)
11 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
12 - filter( EXISTS (SELECT 0 FROM "DW_USR"."MFG_TBL" "MFGI" WHERE
"MFGI"."F_ID"=:B1 AND "MFGI"."MFG_ID"=TO_NUMBER(:1)))
13 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
14 - access("MFGI"."MFG_ID"=TO_NUMBER(:1) AND "MFGI"."F_ID"=:B1)
18 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
19 - filter("MF"."PF_ID" IS NOT NULL AND "MF"."F_REC_ID" IS NOT NULL AND
"MF"."BR_TYPE_CD" IS NOT NULL)
20 - access("MF"."F_ID"="$nso_col_1")
21 - filter("ST"."CA_FLAG"='Y')
22 - access("PE"."F_ID"="MF"."PF_ID")
filter("PE"."F_ID" IS NOT NULL)
24 - filter("PEO"."PS_NAME"(+)='<req2>')
25 - access("PEO"."F_ID"(+)="MF"."F_ID")
filter("PEO"."F_ID"(+) IS NOT NULL)
27 - access("MFO"."PF_ID"(+)="MF"."F_ID")
filter("MFO"."PF_ID"(+) IS NOT NULL)
59 rows selected.
COLUMN STATS are obviously off by the behavior here… just not for the join, but why won’t it look at the index?
| OWNER | TABLE_NAME | COLUMN_NAME | NUM_BUCKETS | LAST_ANALYZED | SAMPLE_SIZE | HISTOGRAM |
| DW_USR | FF_TBL | FF_ID | 75 | 7/3/2011 18:41 | 4961 | HEIGHT BALANCED |
| DW_USR | FF_TBL | F_REC_ID | 1 | 7/6/2011 11:33 | 4450 | NONE |
| DW_USR | FF_TBL | H_CNT | 1 | 9/29/2009 11:22 | 4450 | NONE |
| DW_USR | FF_TBL | CET_CD | 3 | 9/29/2009 11:22 | 4450 | FREQUENCY |
| DW_USR | FF_TBL | F_SEP | 1 | 9/29/2009 11:22 | 4430 | NONE |
| DW_USR | FF_TBL | F_EXT | 8 | 9/29/2009 11:22 | 4308 | FREQUENCY |
| DW_USR | FF_TBL | FFS_CD | 2 | 9/29/2009 11:22 | 4450 | FREQUENCY |
| DW_USR | FF_TBL | LU_DT | 1 | 9/29/2009 11:22 | 4450 | NONE |
| DW_USR | FF_TBL | LM_BY | 1 | 9/29/2009 11:22 | 4450 | NONE |
| DW_USR | FF_TBL | LM_DT | 1 | 9/29/2009 11:22 | 4450 | NONE |
| DW_USR | FF_TBL | RT_SEP | 1 | 9/29/2009 11:22 | 4450 | NONE |
| DW_USR | FF_TBL | FNP | 1 | 9/29/2009 11:22 | 4423 | NONE |
| DW_USR | FF_TBL | ATM_ID | 254 | 9/29/2009 11:22 | 1075 | HEIGHT BALANCED |
| DW_USR | FF_TBL | FC_CD | 1 | 9/29/2009 11:22 | 4450 | NONE |
| DW_USR | FF_TBL | FC_ARG | 1 | 9/29/2009 11:22 | 160 | NONE |
| DW_USR | FF_TBL | CN_FLAG | 1 | 9/29/2009 11:22 | 4450 | NONE |
| DW_USR | FF_TBL | APTP_FLAG | 1 | 9/29/2009 11:22 | 4450 | NONE |
Yeah, so only the index columns are being updated…not so good as this will start to make the index appear less attractive as the row count gets higher and none of the others do when working with more advance joins…
Fix the stats and all are updated.
EXEC DBMS_STATS.GATHER_TABLE_STATS( ownname=> 'DW_USR', tabname=>'FF_TBL', ESTIMATE_PERCENT=>10,
METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO',CASCADE=>TRUE,DEGREE=>6);
EXEC DBMS_STATS.GATHER_TABLE_STATS( ownname=> 'DW_USR', tabname=>'MF_TBL', ESTIMATE_PERCENT=>10,
METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO',CASCADE=>TRUE,DEGREE=>6);
EXEC DBMS_STATS.GATHER_TABLE_STATS( ownname=> 'DW_USR', tabname=>'MFG_TBL', ESTIMATE_PERCENT=>10,
METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO',CASCADE=>TRUE,DEGREE=>6);
DB chose to drop the historgram on the FF_ID, too…now… **Note that the histograms that you see here are mostly shadows of their former selves…(so in my opinion is that they are doing more harm than good and I want to be rid of them…)
So how did we do?
SQL> explain plan for SELECT /*+ CPU_COSTING */ DISTINCT mf.f_id, mf.l_dir,
mf.f_name, st.stg_type_cd, pe.pr_stp_cd,
mfo.l_dir AS odir, mfo.f_name AS oname, peo.pr_stp_cd AS ostat FROM MR_TBL mf
JOIN ST_TBL st ON mf.b_rec _cd = st.b_rec_cd AND st.ca_flag = 'Y'
JOIN FF_TBL ff ON ff.fr_type_id = mf.fr_type_id
JOIN PE_TBL pe ON pe.f_id = mf.pf_id
AND (pe.pr_stp_name = ‘<req 1>’ OR pe.pr_stp_name = '<req2>')
LEFT JOIN MF_TBL mfo ON mfo.pf_id = mf.f_id
LEFT JOIN PE_TBL peo
ON peo.f_id = mf.f_id AND peo.pr_stp_name = '<req3>'
WHERE mf.f_id IN ( SELECT f_id FROM MF_TBL mf
WHERE MIF_id IS NULL START WITH f_id
IN ( SELECT mfgi.f_id FROM MFGI_TBL mfgi
WHERE mfgi.mfgi_id = :1)
CONNECT BY PRIOR mf.f_id = mf.pf_id);
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
| Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
| SELECT STATEMENT | 1 | 242 | 13 (16) | 00:00:01 | |
| HASH UNIQUE | 1 | 242 | 13 (16) | 00:00:01 | |
| NESTED LOOPS OUTER | 1 | 242 | 12 (9) | 00:00:01 | |
| NESTED LOOPS OUTER | 1 | 162 | 11 (10) | 00:00:01 | |
| NESTED LOOPS | 1 | 138 | 10 (10) | 00:00:01 | |
| NESTED LOOPS | 2 | 228 | 9 (12) | 00:00:01 | |
| HASH JOIN | 2 | 218 | 8 (13) | 00:00:01 | |
| NESTED LOOPS | 4 | 396 | 2 (0) | 00:00:01 | |
| VIEW | VW_NSO_1 | 4 | 20 | 1 (0) | 00:00:01 |
| FILTER | |||||
| CONNECT BY WITH FILTERING | |||||
| FILTER | |||||
| TABLE ACCESS FULL | MF_TBL | 4 | 48 | 1 (0) | 00:00:01 |
| INDEX UNIQUE SCAN | MFGI_PK | 1 | 9 | 1 (0) | 00:00:01 |
| NESTED LOOPS | |||||
| CONNECT BY PUMP | |||||
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 4 | 48 | 1 (0) | 00:00:01 |
| INDEX RANGE SCAN | MF_TBL_IDX01 | 3 | 1 (0) | 00:00:01 | |
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 1 | 94 | 1 (0) | 00:00:01 |
| INDEX UNIQUE SCAN | MF_PK | 1 | 1 (0) | 00:00:01 | |
| TABLE ACCESS FULL | ST_TBL | 10 | 100 | 5 (0) | 00:00:01 |
| INDEX RANGE SCAN | FF_IDX1 | 1 | 5 | 1 (0) | 00:00:01 |
| TABLE ACCESS BY INDEX ROWID | PE_TBL | 1 | 24 | 1 (0) | 00:00:01 |
| INDEX RANGE SCAN | PE__IDX01 | 4 | 1 (0) | 00:00:01 | |
| TABLE ACCESS BY INDEX ROWID | PE_TBL | 1 | 24 | 1 (0) | 00:00:01 |
| INDEX RANGE SCAN | PE__IDX01 | 4 | 1 (0) | 00:00:01 | |
| TABLE ACCESS BY INDEX ROWID | MF_TBL | 2 | 160 | 1 (0) | 00:00:01 |
| INDEX RANGE SCAN | MF_TBL_IDX01 | 4 | 1 (0) | 00:00:01 |
Plan hash value: 1233783205
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("MF"."BR_TYPE_CD"="ST"."BR_TYPE_CD")
9 - filter("MIF_ID" IS NULL)
10 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
11 - filter( EXISTS (SELECT 0 FROM "DW_USR"."MFG_TBL" "MFGI" WHERE
"MFGI"."F_ID"=:B1 AND "MFGI"."MFG_ID"=TO_NUMBER(:1)))
12 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
13 - access("MFGI"."MFG_ID"=TO_NUMBER(:1) AND "MFGI"."F_ID"=:B1)
17 - access("MF"."PF_ID"=PRIOR "MF"."F_ID")
18 - filter("MF"."PF_ID" IS NOT NULL AND "MF"."F_REC_ID" IS NOT NULL AND
"MF"."BR_TYPE_CD" IS NOT NULL)
19 - access("MF"."F_ID"="$nso_col_1")
20 - filter("ST"."CA_FLAG"='Y')
21 - access("FF"."F_REC_ID"="MF"."F_REC_ID")
22 - filter("PE"."PS_NAME"='Concatenate' OR "PE"."PS_NAME"='<req1>')
23 - access("PE"."F_ID"="MF"."PF_ID")
filter("PE"."F_ID" IS NOT NULL)
24 - filter("PEO"."PS_NAME"(+)='<req2>')
25 - access("PEO"."F_ID"(+)="MF"."F_ID")
filter("PEO"."F_ID"(+) IS NOT NULL)
27 - access("MFO"."PF_ID"(+)="MF"."F_ID")
filter("MFO"."PF_ID"(+) IS NOT NULL)
59 rows selected.
Yes, 1 row is so much better than 9213…even with good stats… This will take a lot more “weight” off the MF_TBL we are seeing due to concurrency by this query when joining to FF_TBL, as well as the ones involving the MFG_TBL.
I will now follow up in the next couple weeks monitoring for any new deadlocks, as well as performance gains through execution times.





