Solid Choices for Oracle Tuning on Solid State Disk
As I continue to work on very large databases (VLDBs), I am exposed to more opportunities to speed up I/O. This can involve Oracle’s Exadata solution or standalone improvements such as SSD (solid state disk), which can offer faster I/O performance at a fraction of the price. When this option becomes a reality, there will always be non-DBAs who advise on what would benefit most from the hardware, but it is important for the DBA to take the time to research what would truly benefit.
Just the Facts on Solid State Disk:
There are several types of SSD available:
- Flash memory-based
- DRAM-based
- Cache or Buffer
The SSD can have different types of host interfaces, depending on the main hardware you are interfacing with and/or vendor choices:
- PCI
- Fibre Channel
- ATA, (Serial or Parallel)
- SCSI, (Serial or Parallel)
- USB
Rarely do we get a chance to move entire Terabytes of data onto fast disk, but rather are offered limited, faster disk to utilize for crucial objects that can give us the “most bang for the buck.” Commonly this is due to the price of these specialized and impressive IO read/write drives, but it can also be due to limitations on the hardware they are interfacing with.
As I started working on databases that utilized faster disk, with or without ASM, it became apparent that what had been allocated to these speedy disks wasn’t always what SHOULD have been placed there. While indexes, lookup tables and temp tablespaces experienced impressive gains vs. the standard disk drives they had formerly resided on, I have been quick to dissuade anyone from placing redo logs on SSD.
I’m going to go through the data, reports and queries that I use to decide what should be on fast disk, along with my benchmark findings from when I had the opportunity to create an entire database on Fusion Octal fast disk.
Getting the most out of SSD is all about getting what won’t fit in memory (SGA and PGA) onto faster disk. Objects behind consistently large read tasks that the database must go to disk for, but that are not written to as often (think batch loads vs. heavy transactional activity), are excellent candidates for research when deciding what should be placed on SSD, but only ONCE TUNING OPPORTUNITIES HAVE BEEN EXHAUSTED. As a DBA, you can gather this information in multiple ways. AWR/ADDM and ASH reports can provide solid, high-level data to point you in the right direction if you are not as familiar with your data or wish to validate what you already know. For those who do not have the Diagnostics and Tuning Pack licenses, Statspack can do the same. Tracing can offer detailed output that tells you which objects you are often going to slower disk for. OEM can provide graphs that show I/O demands on a heavily “weighted” system, as can other GUI tools on the market.
AWR/Statspack and I/O Wait Indicators
Your group has already decided that I/O is an issue, and should have verified this in the top 5 wait events shown in AWR or Statspack. The snapshots used for this examination should cover times of heavy I/O in the database environment, as in the example in Table 1.
Table 1
Top 5 Timed Events
| Event | Waits | Time (s) | Avg wait (ms) | %Total Call Time | Wait Class |
| ----- | ----- | -------- | ------------- | ---------------- | ---------- |
| db file sequential read | 979,382 | 36,066 | 37 | 45.1 | User I/O |
| db file scattered read | 5,083,058 | 22,401 | 4 | 28 | User I/O |
| Direct path write temp | | 13,577 | | 17 | User I/O |
| db file parallel write | 464,287 | 5,136 | 11 | 6.4 | System I/O |
| direct path read temp | 366,956 | 2,671 | 7 | 3.3 | User I/O |
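As a quick sanity check on a Top 5 Timed Events section, the average wait can be recomputed from the wait count and total wait time. The sketch below does this for the Table 1 figures; the function name `avg_wait_ms` is made up for illustration, and the direct path write temp row is skipped because its row is incomplete in the table.

```python
# Rows from Table 1: (event, waits, total wait time in seconds).
# The "Direct path write temp" row is skipped: its row is incomplete in the table.
top_events = [
    ("db file sequential read", 979_382, 36_066),
    ("db file scattered read", 5_083_058, 22_401),
    ("db file parallel write", 464_287, 5_136),
    ("direct path read temp", 366_956, 2_671),
]


def avg_wait_ms(waits, time_s):
    """Average wait in milliseconds: total wait time divided by wait count."""
    return time_s * 1000.0 / waits


for event, waits, time_s in top_events:
    print(f"{event:<26} {avg_wait_ms(waits, time_s):6.1f} ms")
```

An average of roughly 37 ms per single-block read is the number that stands out here, and is the kind of figure that justifies looking at which segments those reads are hitting.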
From here, we inspect our AWR or Statspack reports. The section to inspect first and foremost is Segments by Physical Reads; example output from this section can be seen in Table 2.
Table 2
| Owner | Tablespace Name | Object Name | Obj. Type | Physical Reads | %Total |
| ----- | --------------- | ----------- | --------- | -------------- | ------ |
| SCHM_OWNR | TBLSPC1_DATA | TBL1_FILE_1 | TABLE | 86,788,592 | 47.87 |
| SCHM_OWNR | TBLSPC2_DATA | TBL1_FILE_PK | INDEX | 80,544,192 | 46.59 |
| SCHM_OWNR | TBLSPC1_IDX | TBL2_MR_PK | INDEX | 74,742,752 | 45.39 |
| SCHM_OWNR | TBLSPC1_IDX | TBL3_M_PK | INDEX | 40,924,576 | 28.43 |
| SCHM_OWNR | TBLSPC2_DATA | TBL4 | TABLE | 26,790,464 | 15.52 |
Tuning, Always the First Step
The first step in the process is to inspect I/O issues with large objects. Is there a partitioning strategy that can take the physical reads and I/O down for the objects in question? If there is not, or there is still a requirement for full scans or large index or partition scans, then you need to look at what tuning options exist for the code involved. If partitioning is already in place, is it the right partitioning key, and/or is sub-partitioning in order?
Once this process has completed, then inspect performance for physical reads again and verify the objects in question are still a bottleneck for IO. If so, then they may be a valid choice to relocate to a new ASM diskgroup residing on SSD.
Creating a dedicated ASM disk group for the SSD is the obvious choice, as the SSD cannot be part of the standard disk groups without performance and rebalance challenges. Once complete, you will have the new SSD diskgroup available for use.
Inspect the sizes of the objects left in your “top 5 physical IO objects” and decide what to move over for initial testing. I commonly make a copy of the objects and test the copy against the code to measure true performance gains, ensuring that no physical storage changes are required as well. ***over what you need for capacity growth estimates. What should you bring over next? If we are still using the same reports shown above, I would look carefully at what space I have available and start to inspect temp usage as a possible next candidate.
It is important, if you consider temp, that it is in a “controlled” state for your environment. It is not uncommon for DBAs to set TEMP to autoextend and not pay attention to temp tablespace usage. I fully advocate the opposite: track temp usage, and have scripts raise monitoring alerts any time a user or process consumes more than a set threshold per process on any production system.
Considering the amount of waits on temp reads and writes, tuning opportunities may be boundless on hash joins and sorting. Low-hanging fruit in these categories includes “order by” clauses that have been left in insert statements (I’m not sure how often I’ve seen this, but it’s a very common and unfortunate occurrence…). With regard to hash joins, there can be wide reporting tables where only one or two columns are actually required for the results and the join. A CTAS (create table as select) of only the columns required for the process, dropped after the join to the second table, can drastically trim time and temp usage for a hash join that involves only a few columns of a wide table, where an index is a less than efficient answer. This choice allows the performance gain of the hash join without the performance hit of swapping to temp when wide tables mean PGA is never enough.
After tuning temp usage due to large hash joins and sorting outside of PGA, inspect the max temp tablespace required. If this will now fit without impacting capacity planning requirements for the SSD, move the temp tablespace onto the SSD ASM disk group.
Scripts to Inspect IO Usage
Many scripts for inspecting I/O usage can be written or found on the web and in reports. The following is a good example of one:
select
io.cnt Count,
io.event Event,
substr(io.obj,1,20) Object_Name,
io.p1 P1_Value,
f.tablespace_name Tablespace_Name
from
(
select
count(*) cnt,
round(count(*)/(60*60),2) aas,
substr(event,0,15) event,
nvl(o.object_name,decode(CURRENT_OBJ#,-1,0,CURRENT_OBJ#)) obj,
ash.p1,
o.object_type otype
from v$active_session_history ash,
all_objects o
where ( event like 'db file s%' or event like 'direct%' )
and o.object_id (+)= ash.CURRENT_OBJ#
and sample_time > sysdate - 7/(60*24)
group by
substr(event,0,15) ,
CURRENT_OBJ#, o.object_name ,
o.object_type ,
ash.p1
) io,
dba_data_files f
where
f.file_id = io.p1
and f.tablespace_name not like '%RAM%' -- exclude SSD objects
Order by io.cnt desc
/
| COUNT | EVENT | OBJECT_NAME | P1_Value | TABLESPACE_NAME |
| 122 | db file sequent | TBL1_CHAIN | 102 | N_DATA |
| 33 | db file sequent | HH_TBL1_FDX01 | 161 | H_INDX1 |
| 28 | db file sequent | CA_TBL2_PK | 270 | C_INDX |
| 25 | db file sequent | I_TBL3_IDX02 | 225 | I_INDX2 |
| 21 | db file sequent | E_TBL4 | 43 | E_DATA |
| 20 | direct path rea | I_MRG_TBL | 75 | M_DATA |
| 23 | db file scatter | C_TBL3 | 50 | C_DATA |
The above script gives you clear examples of which objects to point your research toward: first the indexes (db file sequential reads) and, in this case, a lookup table (direct path read).
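To see at a glance which kind of I/O dominates, the sample counts from the output above can be totaled per wait event. This is only a small sketch over the rows shown; the helper name `samples_by_event` is an assumption for illustration, not part of the original script.

```python
from collections import defaultdict

# (sample count, event, object, tablespace) copied from the script output above.
ash_rows = [
    (122, "db file sequent", "TBL1_CHAIN", "N_DATA"),
    (33, "db file sequent", "HH_TBL1_FDX01", "H_INDX1"),
    (28, "db file sequent", "CA_TBL2_PK", "C_INDX"),
    (25, "db file sequent", "I_TBL3_IDX02", "I_INDX2"),
    (21, "db file sequent", "E_TBL4", "E_DATA"),
    (20, "direct path rea", "I_MRG_TBL", "M_DATA"),
    (23, "db file scatter", "C_TBL3", "C_DATA"),
]


def samples_by_event(rows):
    """Sum the ASH sample counts for each (truncated) wait event name."""
    totals = defaultdict(int)
    for count, event, _object_name, _tablespace in rows:
        totals[event] += count
    return dict(totals)


totals = samples_by_event(ash_rows)
grand_total = sum(totals.values())
for event, count in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{event:<16} {count:4d} samples  {100.0 * count / grand_total:5.1f}%")
```

In this sample, single-block sequential reads account for the large majority of the waits, which is why the objects behind them are the first to research.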
Building a Database Entirely on SSD
We were given this opportunity recently to test performance gains and decide if budget should be set aside for investing in the hardware to build entire databases on SSD vs. strategic objects within a database. We have a process that takes approximately five days to aggregate a snapshot in time of up to 12TB of data. The goal was to see whether we could accomplish this in two days if given all SSD for the database vs. a combination of standard disks on a disk array and SSD for high read/write data.
This sounds like a slam dunk, but it is more challenging than one might think. There are small things that have to be updated in the database, such as system statistics in 10g, to ensure the database fully knows the gift you have granted it, but you may also need to make significant logical changes to take advantage of the hardware, due to limitations in CPU and memory per process. The build was on a server that utilized hyper-threading, and some of the “performance settings” actually appeared to work against the database vs. the lesser setting that might stripe the CPU usage more efficiently. The graph below shows the hits against the first 32 of 64 “hyper-threaded” CPUs:
Figure 1
The graph in Figure 1 only shows CPU usage over a small snapshot of time, but over long intervals it showed the same differing data vs. SAR or other reports from the Admin side: the database continued to hit the same CPUs over and over, leaving other CPUs untouched for extended periods of time. This did not bode well for database performance, high read/write capability or not.
The build time improvements were impressive, but the one thing that must be included is that the improvement in performance was not just a hardware improvement step. There was first the additional hardware and then a tuning process at the database level to ensure the processes were able to achieve the best performance the solid state disk offered it, (comparison of columns New Run Time against the Final Run Time in Figure 2.)
| Process Step | Standard Disk/SSD | Total Min. | SSD Without Tuning | New Run Time | Initial Perf. Gain | SSD With Tuning | Final Run Time | Total Perf. Gain |
| DIM Table 1 CTAS | 4 HOURS 20 MINUTES 27 SECS | 260 min | 3 HOURS 38 MINUTES 24 SECS | 218 min | 19% | 2 HOURS 43 MINUTES 45 SECS | 164 min | 58% |
| CTAS Table 2 | 4 HOURS 23 MINUTES 11 SECS | 263 min | 0 HOURS 16 MINUTES 2 SECS | 16 min | 16 Times Perf | 0 HOURS 19 MINUTES 8 SECS | No Tuning | N/A |
| CTAS Table 3 | 1 HOURS 29 MINUTES 21 SECS | 89 min | 0 HOURS 44 MINUTES 27 SECS | 44 min | 2 Times Perf. | 0 HOURS 57 MINUTES 19 SECS | No Tuning | N/A |
| CTAS Table 4 | 2 HOURS 55 MINUTES 58 SECS | 175 min | 0 HOURS 42 MINUTES 16 SECS | 42 min | 4 Times Perf. | 0 HOURS 47 MINUTES 55 SECS | No Tuning | N/A |
| CTAS Table 5 | 10 HOURS 7 MINUTES 41 SECS | 607 min | 1 HOURS 50 MINUTES 7 SECS | 110 min | 6 Times Perf. | 1 HOURS 42 MINUTES 6 SECS | No Tuning | N/A |
| CTAS Table 6 | 11 HOURS 32 MINUTES 40 SECS | 692 min | 4 HOURS 51 MINUTES 17 SECS | 291 min | 2 Times Perf. | 5 HOURS 9 MINUTES 26 SECS | No Tuning | N/A |
| Multiple Table Aggregation | 25 HOURS 15 MINUTES 3 SECS | 1515 min | 9 HOURS 58 MINUTES 1 SECS | 598 min | 3 Times Perf. | 5 HOURS 16 MINUTES 31 SECS | 316 min | 5 Times Perf. |
| Summary Table 1 Agg. | 25 HOURS 24 MINUTES 35 SECS | 1524 min | 10 HOURS 0 MINUTES 20 SECS | 600 min | 3 Times Perf. | 5 HOURS 18 MINUTES 14 SECS | 318 min | 5 Times Perf. |
| Summary Table 2 Agg. | 25 HOURS 23 MINUTES 56 SECS | 1523 min | 10 HOURS 7 MINUTES 22 SECS | 607 min | 3 Times Perf. | 5 HOURS 25 MINUTES 54 SECS | 325 min | 5 Times Perf. |
| Index Creation Table 1 | 1 HOURS 16 MINUTES 33 SECS | 76 min | 0 HOURS 53 MINUTES 42 SECS | 54 min | 39% | 0 HOURS 53 MINUTES 14 SECS | No Tuning | N/A |
| Index Creation Table 2 | 1 HOURS 22 MINUTES 55 SECS | 82 min | 0 HOURS 59 MINUTES 55 SECS | 60 min | 28% | 0 HOURS 59 MINUTES 6 SECS | No Tuning | N/A |
| CTAS Aggr Table 3 | 6 HOURS 36 MINUTES 20 SECS | 396 min | 3 HOURS 21 MINUTES 18 SECS | 201 min | 50% | 3 HOURS 13 MINUTES 38 SECS | No Tuning | N/A |
| Index Creation Table 3 | 0 HOURS 52 MINUTES 2 SECS | 52 min | 0 HOURS 40 MINUTES 3 SECS | 40 min | 24% | 0 HOURS 48 MINUTES 15 SECS | No Tuning | N/A |
| CTAS Aggr. Table 4 | 2 HOURS 41 MINUTES 13 SECS | 161 min | 1 HOURS 32 MINUTES 8 SECS | 92 min | 43% | 1 HOURS 28 MINUTES 25 SECS | No Tuning | N/A |
| CTAS Aggr Table 5 | 3 HOURS 46 MINUTES 59 SECS | 226 min | 2 HOURS 58 MINUTES 29 SECS | 179 min | 21% | 2 HOURS 55 MINUTES 20 SECS | No Tuning | N/A |
| CTAS Aggr. Table 6 | 0 HOURS 51 MINUTES 27 SECS | 51 min | 0 HOURS 36 MINUTES 46 SECS | 37 min | 28% | 0 HOURS 34 MINUTES 33 SECS | No Tuning | N/A |
| Insert to Table 6 | 0 HOURS 5 MINUTES 24 SECS | 5 min | 0 HOURS 5 MINUTES 6 SECS | 5 min | NONE | 0 HOURS 4 MINUTES 52 SECS | 5 min | NONE |
| Update to Table 6 | 26 HOURS 40 MINUTES 41 SECS | 1640 min | 25 HOURS 9 MINUTES 52 SECS | 1510 min | 8% | 17 HOURS 44 MINUTES 2 SECS | 1084 min | 44% |
| CTAS Table 7 | 1 HOURS 1 MINUTES 48 SECS | 61 min | 0 HOURS 7 MINUTES 43 SECS | 8 min | 13 Times Perf. | 0 HOURS 6 MINUTES 37 SECS | No Tuning | N/A |
| CTAS Aggr Table 8 | 0 HOURS 28 MINUTES 31 SECS | 28 min | 0 HOURS 22 MINUTES 12 SECS | 22 min | 22% | 0 HOURS 19 MINUTES 25 SECS | No Tuning | N/A |
| CTAS Mod TBLS 9/10 | 1 HOURS 42 MINUTES 36 SECS | 102 min | 1 HOURS 42 MINUTES 22 SECS | 102 min | NONE | 1 HOURS 39 MINUTES 25 SECS | No Tuning | N/A |
| CTAS Table Aggr. 11 | 2 HOURS 26 MINUTES 58 SECS | 147 min | 1 HOURS 29 MINUTES 53 SECS | 90 min | 49% | 1 HOURS 24 MINUTES 42 SECS | No Tuning | N/A |
| CTAS Aggr. Table 12 | 7 HOURS 24 MINUTES 44 SECS | 445 min | 6 HOURS 7 MINUTES 48 SECS | 368 min | 18% | 6 HOURS 6 MINUTES 40 SECS | No Tuning | N/A |
| CTAS Aggr. Table 13 | 6 HOURS 47 MINUTES 31 SECS | 408 min | 4 HOURS 38 MINUTES 1 SECS | 278 min | 32% | 5 HOURS 5 MINUTES 32 SECS | No Tuning | N/A |
| CTAS Aggr. Table 14 | 25 HOURS 23 MINUTES 32 SECS | 1524 min | 10 HOURS 9 MINUTES 51 SECS | 610 min | 3 Times Perf. | 5 HOURS 27 MINUTES 17 SECS | 327 min | 5 Times Perf. |
| CTAS Aggr. Table 15 | 1 HOURS 21 MINUTES 59 SECS | 82 min | 0 HOURS 22 MINUTES 49 SECS | 23 min | 65% | 0 HOURS 4 MINUTES 33 SECS | 4 min | 20 Times Perf. |
| Update to Table 13 | 0 HOURS 12 MINUTES 45 SECS | 13 min | 0 HOURS 49 MINUTES 58 SECS | 50 min | 3 Times LOSS!! | 0 HOURS 1 MINUTES 22 SECS | 1 min | 9 Times Perf. |
Figure 2
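For anyone who wants to re-derive gains from run times like those in Figure 2, the sketch below computes a speedup factor and a percentage of time saved from the rounded minute columns. The function names and the two formulas are my own choices for illustration; the labels in the figure were rounded by hand and need not match these numbers exactly.

```python
# (step, standard disk minutes, SSD untuned minutes, SSD tuned minutes),
# taken from the rounded minute columns of Figure 2.
steps = [
    ("DIM Table 1 CTAS", 260, 218, 164),
    ("Multiple Table Aggregation", 1515, 598, 316),
    ("Summary Table 1 Agg.", 1524, 600, 318),
    ("CTAS Aggr. Table 15", 82, 23, 4),
]


def speedup(before_min, after_min):
    """How many times faster the new run is than the old run."""
    return before_min / after_min


def time_saved_pct(before_min, after_min):
    """Share of the original run time that was eliminated, in percent."""
    return 100.0 * (before_min - after_min) / before_min


for step, std, ssd, tuned in steps:
    print(
        f"{step:<28} hardware only: {speedup(std, ssd):5.2f}x "
        f"({time_saved_pct(std, ssd):4.1f}% saved)   "
        f"hardware + tuning: {speedup(std, tuned):5.2f}x "
        f"({time_saved_pct(std, tuned):4.1f}% saved)"
    )
```

Putting the hardware-only and hardware-plus-tuning factors side by side makes the point of this section visible: for the large aggregations, tuning roughly doubles the gain that the SSD alone delivered.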
I must note that the issue we were not able to resolve was waits on CPU, caused by the hyper-threaded CPU problems described above.
The tuning behind the third set of elapsed times (SSD With Tuning) involved the following:
- Bind variable additions
- Literal additions where bind peeking was an issue.
- A change from ASSM, (Automatic Segment Space Management) to manual segment space management where freelists could be set at the object level, (dynamically allocated freelists were not able to adjust quickly enough for some of the load processes…)
- Changes to INITRANS, PCTFREE and parallel degree where they made sense (raising them for some objects, lowering them for others that did not work with the partitioning or had a need for partitioning…)
Inspecting I/O by SQL_ID
This script (adapted from Tim Gorman’s sqlhistory.sql, from www.evdbt.com) does a wonderful job of pulling a clean, clear picture of the physical and logical I/O occurring for a single SQL_ID, seen here in Table 3:
Table 3
+--------------------------------------------------------------------------------------------------+
Plan HV     Min Snap  Max Snap  Execs  LIO          PIO         CPU        Elapsed
+--------------------------------------------------------------------------------------------------+
1766271350  659       659       1      593,134,283  12,961,814  14,657.45  15,067.05
+--------------------------------------------------------------------------------------------------+
========== PHV = 1766271350 ==========
First seen from "07/15/11 13:00:31" (snap #659)
Last seen from "07/15/11 13:00:31" (snap #659)
Execs  LIO          PIO         CPU        Elapsed
=====  ===          ===         ===        =======
1      593,134,283  12,961,814  14,657.45  15,067.05
Plan hash value: 1766271350
| Id | Operation | Name | Rows | Bytes | TempSpc | Cost (%CPU) | Pstart | Pstop | TQ | IN-OUT | PQ Distrib |
| 0 | CREATE TABLE STATEMENT | | | | | 1543M(100) | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 464M | 397G | | 4128K (7) | | | Q1,01 | P->S | QC (RAND) |
| 3 | LOAD AS SELECT | | | | | | | | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | 464M | 397G | | 4128K (7) | | | Q1,01 | PCWP | |
| 5 | PX SEND RANDOM LOCAL | :TQ10000 | 464M | 397G | | 4128K (7) | | | Q1,00 | P->P | RANDOM LOCA |
| 6 | PX PARTITION LIST ALL | | 464M | 397G | | 4128K (7) | 1 | 1000 | Q1,00 | PCWC | |
| 7 | HASH JOIN RIGHT OUTER | | 464M | 397G | 14G | 4128K (7) | | | Q1,00 | PCWP | |
| 8 | TABLE ACCESS FULL | HDN_TBL | 231M | 112G | | 576K (22) | 1 | 1000 | Q1,00 | PCWP | |
| 9 | HASH JOIN RIGHT OUTER | | 464M | 171G | 6967M | 1551K (7) | | | Q1,00 | PCWP | |
| 10 | TABLE ACCESS FULL | HD_TBL | 310M | 50G | | 144K (34) | 1 | 1000 | Q1,00 | PCWP | |
| 11 | TABLE ACCESS FULL | H_TBL | 464M | 95G | | 339K (13) | 1 | 1000 | Q1,00 | PCWP | |
Summary Execution Statistics Over Time
                                                           Avg             Avg
Snapshot               Avg LIO          Avg PIO          CPU (secs)      Elapsed (secs)
Time          Execs    Per Exec         Per Exec         Per Exec        Per Exec
------------  -----    ---------------  ---------------  --------------  --------------
15-JUL 13:00  1        593,134,283.00   12,961,814.00    14,657.45       15,067.05
              -----    ---------------  ---------------  --------------  --------------
avg                    593,134,283.00   12,961,814.00    14,657.45       15,067.05
sum           1
Per-Plan Execution Statistics Over Time
                                                                       Avg             Avg
Plan        Snapshot               Avg LIO          Avg PIO          CPU (secs)      Elapsed (secs)
Hash Value  Time          Execs    Per Exec         Per Exec         Per Exec        Per Exec
----------  ------------  -----    ---------------  ---------------  --------------  --------------
1766271350  15-JUL 13:00  1        593,134,283.00   12,961,814.00    14,657.45       15,067.05
**********                -----    ---------------  ---------------  --------------  --------------
avg                                593,134,283.00   12,961,814.00    14,657.45       15,067.05
sum                       1
+--------------------------------------------------------------------------------------------------+
This report clearly shows the amount of logical vs. physical I/O coming from the statement in question. This gives the DBA a clear indicator of whether any object in the poorly performing process would benefit from a move to SSD, or whether tuning is in order to eliminate the I/O performance challenge. A combination of both may be chosen, as there are multiple right outer hash joins which clearly show as the performance hit in the time elapsed and in the temp tablespace usage/significant I/O categories (note that the process needs to scan ALL the partitions for the objects in question…)
SSD and Forced Hash Joins on Indexes
When a database design is impacted by the front-end tool required to present data in a proper format, such as business analytics software, the price can be high for the DBA who has to manage resource usage. Many times the data must be presented in a very flat, wide format, which requires a large amount of data to be pulled across a network interface. This can be anywhere from a couple of hundred GB to multiple terabytes. When you are the DBA looking for ways to increase performance and logical performance tuning is limited, solid state disk can offer you gains not offered anywhere else.
Business analytics software will often query a few 100GB to 1TB objects, hash join them and then perform an order by. For the DBA, creating an index and then using a hint to force a hash join between the index and the large table can improve performance greatly, and moving the index onto SSD can speed up the hash join while limiting how much SSD is required.
create table new_ordertmp_tbl compress pctfree 0 tablespace data_1 as
SELECT /*+ USE_HASH(t,i) INDEX_FFS(i,I_TBL2_IDX) INDEX(t,CT1) */
cast(MOD(t.i_id, 1000) as number(3)) im_key
, LEAST(ROUND(MONTHS_BETWEEN(:b1, t.t_dt) + .4999 ), 48) AS r_key , t.i_id AS ib_id
, t.m_id, t.t_dt, cast(:b5 as varchar2(5)) m_cd, FIRST_VALUE(i.ib_id) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS ibcid, t.t_nbr, cast(TO_NUMBER(TO_CHAR(FIRST_VALUE(t.t_dt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
), 'YYYYMMDD')) as number(8)) AS d_id,
FIRST_VALUE(DECODE(t.oct_cd, NULL, 'O','W', 'O', 'E', 'O', 'R', 'R', 'F')
) OVER(PARTITION BY t.d_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS tct_cd, SUM(t.ot_amt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt) AS ot_amt
, FIRST_VALUE(NVL(t.pmt_cd, 'U')) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS pmt_cd, SUM(t.i_cnt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt) AS i_cnt
, FIRST_VALUE(t.cs_cd IGNORE NULLS) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS cs_cd, FIRST_VALUE(t.cc_cd IGNORE NULLS) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS cc_cd, t.oct_cd
FROM CT_TBL1 t, I_TBL2 i
WHERE t.m_id = :b5
AND t.t_dt BETWEEN :b1 AND :b2 AND i.ibid = t.i_id
order by i.i_id;
Object Sizes:
CT_TBL1, partition 7= 800GB
I_TBL2=1.2TB
I_TBL2_IDX, the index created on I_TBL2 that contains only the columns required for this routinely run query and leads with the I_ID column, is only 200GB.
Execution Plan for Query:
Table 4
| Description | Object | Cost | Cardinality | Bytes | PartitionID |
| SELECT STATEMENT, GOAL = ALL_ROWS | | 107587 | 16356015 | 10079496 | |
| WINDOW SORT | | 107587 | 16356015 | 10079496 | |
| WINDOW BUFFER | | 107587 | 16356015 | 10079496 | |
| WINDOW SORT | | 107587 | 16356015 | 10079496 | |
| WINDOW SORT | | 107587 | 16356015 | 10079496 | |
| FILTER | | | | | |
| HASH JOIN | | 107371 | 16356015 | 10079496 | |
| PARTITION LIST SINGLE | | 330 | 16356015 | 8166868 | 7 |
| TABLE ACCESS FULL | CT_TBL1 | 330 | 16356015 | 8166868 | 7 |
| INDEX FAST FULL SCAN | I_TBL2_IDX | 23597 | 6399400008 | 2120000 | |
The hash join is thus decreased to a total size of 1TB, vs. the much larger size it would have been if it had been run against the table. With the index residing on solid state disk, the CTAS in question created the table 12 times faster.
What does the I/O look like on the solid state disk vs. the old standard disk? The differences are startling when viewed through iostat (Table 5).
Table 5
| Device: | rsec/s | wsec/s | avgqu-sz | %util |
| Raid 5 Disk | 55200 | 30224 | 215.72 | 84.03 |
| SSD | 52394.67 | 41306 | 223.74 | 7.49 |
As you can see, the IO is much less impacting on the SSD than the standard disk.
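To put the iostat figures in more familiar units, the sector rates can be converted to MiB/s and compared against device utilization. This is a rough sketch that assumes Linux iostat, where rsec/s and wsec/s are counted in 512-byte sectors; the function name and the dictionary layout are illustrative only.

```python
SECTOR_BYTES = 512  # assumption: Linux iostat counts rsec/s and wsec/s in 512-byte sectors

# Values copied from Table 5.
devices = {
    "Raid 5 Disk": {"rsec_s": 55200.0, "wsec_s": 30224.0, "util_pct": 84.03},
    "SSD": {"rsec_s": 52394.67, "wsec_s": 41306.0, "util_pct": 7.49},
}


def throughput_mib_s(rsec_s, wsec_s):
    """Combined read + write throughput in MiB/s from sectors per second."""
    return (rsec_s + wsec_s) * SECTOR_BYTES / (1024 * 1024)


for name, d in devices.items():
    mib_s = throughput_mib_s(d["rsec_s"], d["wsec_s"])
    print(
        f"{name:<12} {mib_s:6.2f} MiB/s at {d['util_pct']:5.2f}% util "
        f"({mib_s / d['util_pct']:.2f} MiB/s per % util)"
    )
```

Both devices are moving a similar volume of data in this sample; the difference is that the SSD does so while sitting at a small fraction of the utilization of the RAID 5 disk.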
Via graphs, such as from Cacti, the differences in IO throughput can be seen for standard disk, (figure 3) and solid state disk, (figure 4.)
Figure 3
Figure 4
Summary
Solid state disk is here to stay and will often be seen as a “silver bullet” for production I/O issues. The goal of the DBA is to utilize this technology in a way that does not replace logical tuning, and instead supports positive changes that combine physical and logical tuning to get the most out of the new hardware available on the market today.
Restarting a Duplicate Process From a VERY Failed State
As part of an 11g Upgrade, it was found that a database environment could be built from one of the upgraded databases through a duplicate. As this process had never been performed before in this fashion, a test was in order. The test was an excellent chance to discover that the OSUser that performs the duplicate process was the proud owner of a .kshrc file with hard coded Oracle variables set which was an excellent choice if you want to really screw up a new duplicate database…
Scenario:
1. Duplicate has started with correct variables set.
2. Subsequent shell scripts then “upset” the environment mid-process, leaving a failed duplicate with ASM files for the controlfiles created, but the DBID and dbname still set to the target database, not the auxiliary.
3. After the failure, the auxiliary (duplicate) database can only be mounted, not opened.
| After the duplicate failure, set your environment to the auxiliary database environment; you will see in the spfile that the name of the db is no longer the auxiliary database. |
| 1. Shutdown the auxiliary database: |
| [oracledbs]$ sqlplus '/ as sysdba' |
| SQL> shutdown; |
| ORA-01109: database not open |
| Database dismounted. |
| ORACLE instance shut down. |
| In a second screen set to the +ASM instance for the auxiliary, use the asmcmd command console to remove the controlfiles that were created with the target db name: |
| [oracle@dbs]$ asmcmd |
| ASMCMD> ls |
| DATA_H/ |
| DATA_RAM_H/ |
| ASMCMD> cd DATA_H |
| ASMCMD> ls |
| DB_H/ |
| ASMCMD> cd DB_H |
| ASMCMD> ls |
| CONTROLFILE/ |
| ASMCMD> cd CONTROLFILE |
| ASMCMD> ls |
| control1.ctl <-- these are control files for the auxiliary, stuck with the dbname of the target; they can’t be renamed and the db can’t be mounted! |
| control2.ctl |
| control3.ctl |
| current.389.766743537 |
| current.445.766743537 |
| current.500.766743537 |
| ASMCMD> rm control* |
| You may delete multiple files and/or directories. |
| Are you sure? (y/n) y |
| ASMCMD> quit |
| Back on your original screen, restart the auxiliary with a pfile set to the correct dbname: |
| SQL> startup nomount pfile='/u01/app/oracle/product/11.2.0/dbhome_2/dbs/initdb_h.ora' |
| ORACLE instance started. |
| Total System Global Area 7.6964E+10 bytes |
| Fixed Size 2215704 bytes |
| Variable Size 3.0065E+10 bytes |
| Database Buffers 4.6708E+10 bytes |
| Redo Buffers 189513728 bytes |
| SQL> create spfile from pfile; |
| File created. |
| SQL> shutdown; |
| ORACLE instance shut down. |
| SQL> startup nomount; |
| ORACLE instance started. |
| Total System Global Area 7.6964E+10 bytes |
| Fixed Size 2215704 bytes |
| Variable Size 3.0065E+10 bytes |
| Database Buffers 4.6708E+10 bytes |
| Redo Buffers 189513728 bytes |
| Now you can restart the duplicate process and the database will again be recognized correctly. |
How to Break an EM12c and Other Hobbies
As many know, I’ve been busy trying to migrate our many-times-patched 10g Enterprise Manager to a new server with EM12c. I thought it would be interesting to see how fast I could break it; considering my skills, I knew it might not be a challenge.
Scenario
1. Bug with listener refusing to connect dynamically to EM12c repository database.
2. After releasing code to the database for advanced notification, the SYSMAN.MGMT_ADMIN_DATA package has gone invalid, which is used to manage much of the repository at the command line.
3. The repository owner password has become corrupted.
These were my high level notes as I went through the troubleshooting:
| Reconfigure Repository: |
| $AGENT_HOME/bin/emctl config oms -store_repos_details -repos_port 1521 -repos_sid emrep12c -repos_host host_nm -repos_user SYSMAN -repos_pwd password |
| Stopping Old Oracle OMS and configuring: |
| export ORACLE_HOME=/opt/oracle/app/OracleHomes/oms10g |
| cd $ORACLE_HOME/opmn/bin |
| ./opmnctl stopall |
| cd $ORACLE_HOME/bin |
| ./emctl config oms -change_repos_pwd |
| cd $ORACLE_HOME/opmn/bin |
| ./opmnctl startall |
| Invalid package after loop from listener bug, corrupt SYSMAN password and invalid MGMT_XXX pkgs! |
| Can’t reset password, pkg used for it is invalid! |
| ORA-04063: package body “SYSMAN.MGMT_ADMIN_DATA” has errors |
| ORA-06508: PL/SQL: could not find program unit being called: “SYSMAN.MGMT_ADMIN_DATA” |
| ORA-06512: at line 1 |
| Can’t start repository, password error in logs: |
| Error occurred. Check the log /local/u01/app/oracle/product/12.1.0/gc_inst/em/EMGC_OMS1/sysman/log/secure.log |
| [oracle@vwgrid01 bin]$ ./emctl start oms |
| Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0 |
| Soooo close…but did I get far enough?, (Nope, failure…have to remove everything!!) |
| Can’t drop repository: |
| <Database ORACLE HOME>/bin/emca -deconfig dbcontrol db -repos drop -SYS_PWD sys_password -SYSMAN_PWD password |
| Can’t reconfigure through the system, either! |
| <Database ORACLE HOME>/bin/emca -deconfig dbcontrol db -repos drop -cluster -SYS_PWD sys_password -SYSMAN_PWD password |
| No chance either…I’m stuck in a terrible loop! |
| Solution- |
| Back up and then edit out the 12c environment from the inventory.xml file in the ContentsXML dir of the Lsinventory |
| Kill any processes that are still running out of the EM12c home. |
| rm -rf the 12c home directory |
| Uninstall the emrep12 database in the 11g home |
| Create a new emrep12 database in the 11g home |
| recreate a new inventory file: |
| $ORACLE_HOME/oui/bin/runInstaller -silent -invPtrLoc "/u01/app/oraInventory/oraInst.loc" -attachHome ORACLE_HOME="/u01/app/oracle/product/11.2.0" ORACLE_HOME_NAME="OraDb11g_home2" |
| Install the EM12c once again, as now the installation appears to have never existed… |
| If a restart of the installation needs to be done, you need to look in the home it fails on, (*saying that it’s already installed there) and remove the *_temp file from the dir, then try again. |
The NO_INVALIDATE Option in DBMS_STATS with 10g
I had worked hard on a report, 47 SQL statements in all to tune it down from 5 hrs to under 30 minutes. The first runs had been quite successful, so when a third run sent an alert on temp usage, I knew something was wrong.
SID PROCESS MACHINE SQL_TEXT SQL_ID TOTAL MB
-------- ------------ -------------------- ---------------------
507 1137 HOST SELECT *** 7t3muww36xhzn 45516
600 1139 HOST SELECT *** 7t3muww36xhzn 45516
525 1132 HOST SELECT *** 7t3muww36xhzn 45516
509 1135 HOST SELECT *** 7t3muww36xhzn 45516
I checked the stats first, as one of the fixes was to ensure the staging tables in this process were collecting stats after the initial feeds came in, but both tables involved showed valid statistics:
SQL> select num_rows, last_analyzed from dba_tab_partitions
where table_name='<I_STAGE>'
and partition_name='P170';
NUM_ROWS LAST_ANAL
---------- ---------
480900000 17-OCT-11
SQL> select last_analyzed from dba_tables
where table_name='<SML_TBL>';
LAST_ANAL
---------
17-OCT-11
I ran a quick AWR report for the specific SQL_ID to see what I was dealing with, execution plan wise..
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 46198 18-Oct-11 09:00:09 282 7.9
End Snap: 46200 18-Oct-11 10:00:13 245 8.4
Elapsed: 60.07 (mins)
DB Time: 1,874.62 (mins)
SQL ID: 7t3muww36xhzn DB/Inst: PRODUCTION/PROD Snaps: 46198-46200
-> 1st Capture and Last Capture Snap IDs
refer to Snapshot IDs witin the snapshot range
-> SELECT ***
Plan Hash Total Elapsed 1st Capture Last Capture
# Value Time(ms) Executions Snap ID Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1 324636810 4,785,428 4 46199 46199
2 4097803110 1,047 1 46200 46200
-------------------------------------------------------------
Plan 1(PHV: 324636810)
----------------------
Plan Statistics DB/Inst: PRODBASE/prodbase Snaps: 46198-46200
-> % Total DB Time is the Elapsed Time of the SQL statement divided
into the Total Database Time multiplied by 100
Stat Name Statement Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms) 4,785,428 1,196,357.0 4.3
CPU Time (ms) 3,450,070 862,517.6 8.4
Executions 4 N/A N/A
Buffer Gets 759,453 189,863.3 0.0
Disk Reads 683,619 170,904.8 2.8
Parse Calls 35 8.8 0.0
Rows 0 0.0 N/A
Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |       |       |    9 |       |       |        |      |
|   1 |  COUNT STOPKEY                |          |       |       |      |       |       |        |      |
|   2 |   PX COORDINATOR              |          |       |       |      |       |       |        |      |
|   3 |    PX SEND QC (ORDER)         | :TQ10002 |     1 |   240 |    9 |       |       |  Q1,02 | P->S | QC (ORDER)
|   4 |     VIEW                      |          |     1 |   240 |    9 |       |       |  Q1,02 | PCWP |
|   5 |      SORT ORDER BY STOPKEY    |          |     1 |   120 |    9 |       |       |  Q1,02 | PCWP |
|   6 |       PX RECEIVE              |          |     1 |   240 |      |       |       |  Q1,02 | PCWP |
|   7 |        PX SEND RANGE          | :TQ10001 |     1 |   240 |      |       |       |  Q1,01 | P->P | RANGE
|   8 |         SORT ORDER BY STOPKEY |          |     1 |   240 |      |       |       |  Q1,01 | PCWP |
|   9 |          HASH JOIN            |          |     1 |   120 |    5 |       |       |  Q1,01 | PCWP |
|  10 |           PX RECEIVE          |          |     1 |   109 |    3 |       |       |  Q1,01 | PCWP |
|  11 |            PX SEND BROADCAST  | :TQ10000 |     1 |   109 |    3 |       |       |  Q1,00 | P->P | BROADCAST
|  12 |             PX BLOCK ITERATOR |          |     1 |   109 |    3 |   KEY |   KEY |  Q1,00 | PCWC |
|  13 |              TABLE ACCESS FULL| I_STAGE  |     1 |   109 |    3 |   KEY |   KEY |  Q1,00 | PCWP |
|  14 |           PX BLOCK ITERATOR   |          |  5002 | 55022 |    2 |       |       |  Q1,01 | PCWC |
|  15 |            TABLE ACCESS FULL  | SML_TBL  |  5002 | 55022 |    2 |       |       |  Q1,01 | PCWP |
------------------------------------------------------------------------------------------------------------------------
This is the one that was eating up all the temp! Note that even though the stats I checked were correct as of the previous day, with no changes to the partition stats, the execution plan shows only one row for I_STAGE. As anyone who listens to Maria Colgan knows, that’s just Oracle giving you the benefit of the doubt and saying, “I don’t think there are any rows in this object, (or sub-object in this case..) but I’ll give you 1 row for the fun of it!”
The second execution plan in the report is the one I desired:
Plan 2(PHV: 4097803110)
-----------------------
Plan Statistics                  DB/Inst: PRODUCTION/PROD  Snaps: 46198-46200
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                             1,047        1,046.7     0.0
CPU Time (ms)                                   967          967.0     0.0
Executions                                        1            N/A     N/A
Buffer Gets                                   2,007        2,007.0     0.0
Disk Reads                                        3            3.0     0.0
Parse Calls                                       9            9.0     0.0
Rows                                             40           40.0     N/A
User I/O Wait Time (ms)                           1            N/A     N/A
Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes |TempSpc| Cost | Pstart| Pstop |    TQ  |IN-OUT| PQ
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |       |       |       | 2834 |       |       |        |      |
|   1 |  COUNT STOPKEY                |          |       |       |       |      |       |       |        |      |
|   2 |   PX COORDINATOR              |          |       |       |       |      |       |       |        |      |
|   3 |    PX SEND QC (ORDER)         | :TQ10002 |  1272K|   291M|       | 2834 |       |       |  Q1,02 | P->S | QC
|   4 |     VIEW                      |          |  1272K|   291M|       | 2834 |       |       |  Q1,02 | PCWP |
|   5 |      SORT ORDER BY STOPKEY    |          |  1272K|   152M|   389M| 2834 |       |       |  Q1,02 | PCWP |
|   6 |       PX RECEIVE              |          |    40 |  9600 |       |      |       |       |  Q1,02 | PCWP |
|   7 |        PX SEND RANGE          | :TQ10001 |    40 |  9600 |       |      |       |       |  Q1,01 | P->P | RA
|   8 |         SORT ORDER BY STOPKEY |          |    40 |  9600 |       |      |       |       |  Q1,01 | PCWP |
|   9 |          HASH JOIN            |          |  1272K|   152M|       |    7 |       |       |  Q1,01 | PCWP |
|  10 |           PX RECEIVE          |          |  5005 | 55055 |       |    2 |       |       |  Q1,01 | PCWP |
|  11 |            PX SEND BROADCAST  | :TQ10000 |  5005 | 55055 |       |    2 |       |       |  Q1,00 | P->P | BR
|  12 |             PX BLOCK ITERATOR |          |  5005 | 55055 |       |    2 |       |       |  Q1,00 | PCWC |
|  13 |              TABLE ACCESS FULL| SML_TBL  |  5005 | 55055 |       |    2 |       |       |  Q1,00 | PCWP |
|  14 |           PX BLOCK ITERATOR   |          |  1271K|   139M|       |    4 |   KEY |   KEY |  Q1,01 | PCWC |
|  15 |            TABLE ACCESS FULL  | I_STAGE  |  1271K|   139M|       |    4 |   KEY |   KEY |  Q1,01 | PCWP |
------------------------------------------------------------------------------------------------------------------------
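If you would rather not run the full report, the same two plans can be pulled straight from AWR. A hedged sketch, assuming the SQL_ID and snapshot range shown in the report above and the standard DBMS_XPLAN.DISPLAY_AWR and DBA_HIST_SQLSTAT interfaces (the ELAPSED_MS alias is mine):

```sql
-- All plans AWR has captured for this SQL_ID, one section per plan hash value.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_AWR('7t3muww36xhzn'));

-- Which plan ran in which snapshot, and how much elapsed time it used.
-- ELAPSED_TIME_DELTA is stored in microseconds, so divide by 1000 for ms.
SELECT snap_id,
       plan_hash_value,
       executions_delta,
       ROUND(elapsed_time_delta / 1000) AS elapsed_ms
FROM   dba_hist_sqlstat
WHERE  sql_id = '7t3muww36xhzn'
AND    snap_id BETWEEN 46198 AND 46200
ORDER  BY snap_id, plan_hash_value;
```

The second query makes a plan flip easy to spot: the snapshot where the plan hash value changes is the window to investigate.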
So what changed? What impacted my statistics?
Upon investigation, I came to the conclusion that it is a combination of a “feature” with what I think is a bug in 10g dbms_stats.
A search of stats processing showed that during the one process that was executing against the P170 partition on the I_STAGE, there were a number of other partitions in this same table having stats gathered post loading.
declare
  v_stage_table_name varchar2(64);
begin
  select min(stage_table_name) into v_stage_table_name
  from stage_tables
  where stage_table_type_cd = '<I_STAGE>';
  dbms_stats.gather_table_stats (
    ownname => 'dw_user', tabname => v_stage_table_name,
    partname => 'P450', estimate_percent => .01, granularity => 'PARTITION',
    method_opt => 'for all columns size 1',
    no_invalidate => false, cascade => false, degree => 4);
end;
Now the key here in the statement above is:
no_invalidate=>false
If you read the description for this from Oracle:
no_invalidate: Does not invalidate the dependent cursors if set to TRUE. The procedure invalidates the dependent cursors immediately if set to FALSE. Use DBMS_STATS.AUTO_INVALIDATE to have Oracle decide when to invalidate dependent cursors. This is the default. The default can be changed using the SET_PARAM Procedure.
The surmised bug is one where even though the dbms_stats being performed by another process is partition level, the invalidation of the cursors is across all partitions in the object, causing them all to be invalidated, requiring them to re-parse the SQL. (There are a number of similar bugs already documented in 10.2.0.4.0 for partition level statistics gathering…)
The feature to allow Oracle to re-parse and take advantage of the newest statistics information in the data dictionary resulted in a poor performance challenge in this instance, as the cursors were invalidated on a process that needed no changes to statistics.
I tested repeatedly against partitions, collecting stats with no_invalidate set to false, to true and even to AUTO to see what would occur, and it consistently impacted my cursors against other partitions. I can find no documented bug, but as many know, I’m about to move everything to 11g in short order and expect it would be a waste of time to pursue it too far…
I notified the Java developer who owns this code to please change the call to no_invalidate=>true to correct the performance impact short term, and I look forward to 11g bugs to replace my exhaustion on 10g ones!
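As a rough illustration of the short-term change, here is the same gather with no_invalidate switched to TRUE, followed by a check on the report's cursors. This is a sketch under assumptions: the table name is a placeholder, (the original code looks it up from stage_tables) and the V$SQL check is my own way of watching for invalidations, not something from the original process.

```sql
-- Sketch: partition-level gather that leaves existing cursors alone.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'dw_user',
    tabname          => 'I_STAGE',   -- placeholder for the looked-up stage table name
    partname         => 'P450',
    estimate_percent => .01,
    granularity      => 'PARTITION',
    method_opt       => 'for all columns size 1',
    no_invalidate    => TRUE,        -- the only change from the original call
    cascade          => FALSE,
    degree           => 4);
END;
/

-- Before and after the gather, see whether the report's cursors were touched.
-- A rising INVALIDATIONS or LOADS count means the cursor was re-parsed.
SELECT sql_id, child_number, plan_hash_value,
       invalidations, loads, last_load_time
FROM   v$sql
WHERE  sql_id = '7t3muww36xhzn';
```

The trade-off with TRUE is that cursors that genuinely need the new statistics keep their old plans until they age out or are reloaded for another reason.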
Oracle Open World 2011 Followup
Oracle Open World is over for me now, but what a great event it was. I met so many people and actually was able to spend a little time getting to know a few of them. I attended parties, dinners and meetups. I networked myself, my company and RMOUG. It was a phenomenal turnout, even with a few folks missing that I knew we’d miss terribly.
Arriving- California Zephyr
If you hadn’t heard, we decided to take a train, Amtrak’s California Zephyr, from Denver, CO to San Francisco, CA. This was a 33 hour train ride, slowly trekking at times, racing traffic at others, through beautiful scenery, all from a second floor sleeper car. We had lovely meals with folks in the dinner lounge car and visited with others on the observation deck, (a specialized car with a glass, domed ceiling.)
Upon our arrival in San Fran, we quickly rented a car and headed up to a lovely party at Oracle’s one and only, Graham Woods. I quickly found Gwen Shapira, (we’ve been trying to meet for awhile now, so was glad to FINALLY have the opportunity to meet…) caught up with Alex Gorbachev and was teased by Cary Millsap about our challenges with understanding that we needed to PAY for our bottle of wine on the train and that it was not included in the price of the meal, (glad they couldn’t figure out how to pronounce my name over the intercom, but quickly figured out that it was me and my wonderful companion they were looking for- his name is not so difficult to pronounce!)
Debra Lilley arrived soon into the party and many found fun as she brought me from room to room in search of those she wanted to introduce me to. I’m to ensure she has a great birthday at February’s RMOUG, so it’s important she and I bond, ya know…
I ended up in a lovely embarrassing situation upon being introduced to “Greg Brown”, whom I had to ask repeatedly where I knew him from. He found this quite hilarious, considering our emails, until it hit me I was speaking to “Greg Rahn”. He was a good sport about my lack of IQ after the long trip and I appreciate his patience.
Sunday, OOW11
The first day consisted of me attending a few of Tim Gorman’s sessions that I, as a VLDB DBA, had lived, but had never really sat through before. I still picked up a few things from my first DBA God and yes, the Gods are good to me. At his second session, Tim pointed out Andy Klock and me to each other, knowing we’d been tweeting back and forth about meeting up, so we sat together and it gave me an opportunity to physically meet one of the major clients I supported while at Pythian. It was a pleasure to speak to her in person and I was glad to see someone not as in-depth in the database world revel in the festivities and presentations of Oracle Open World.
Sunday evening was the ACE dinner and attendance was fantastic, (along with the food!) It was easy to see why no one sat at any one table for very long and I did get to spend a good amount of time speaking with Mark Bobak, Kent Graziano, Doug Burns, James Morle, along with many others.
Monday, OOW11
I didn’t do too well on my schedule builder for Monday or Tuesday and if I go back over the actual schedule of sessions, I’m sure I’ll find a few that I should have gone to. Monday night was the Oak Table dinner which was great fun. We started out meeting up with Mike Swing and Craig Shalahammer for drinks before heading to the dinner. Craig showed me some of his Mathematica graphics for buffers and latch visuals, (yes, his is one of those sessions I obviously missed adding to my schedule!) We spoke about databases a bit, but the conversations regarding life were much more interesting.
Upon heading over to the Oak Table dinner, there was a huge line of people waiting for taxis, but Mike Swing, Tim Gorman and I went up to the bellhop instead. I’m not sure if it was Mike or Tim, but one of them asked if there was a better option and we had a personal SUV taking us to our dinner in just a few minutes for a few dollars more than a taxi would have cost, (note to future OOW attenders…screw the lines!)
As soon as we entered for the dinner, the gracious Carol Dacko made sure we had our places and Mogens Norrsgard was busy entertaining everyone. He and I quickly attained a quirky regard for each other, and the jokes about Tim, “I saw him first!”, started as the game of us challenging each other for Tim’s affections commenced.
I was seated next to Jeremy Schneider, who I thoroughly enjoyed conversing with. He’s a brilliant young DBA, so his company, along with Gwen Shapira, Robyn Sands, Tim Gorman, Andy Klock, Rihaj Shamsudeen and Alex Gorbachev guaranteed a lively conversation. This was also my first opportunity to meet up with Yury Velikanov from Pythian. I’ve only worked with him virtually, so this was a great chance considering he resides in Australia. He is technically skilled, easy-going and quick to make friends- a great representative for the Pythian name, like Gwen and Andy.
Tuesday, OOW11
Tuesday was the start for most of us feeling the heat from staying up too late and eliminating sleep from our diets. My voice was starting to sound a bit hoarse at this point, so I’m sure folks were starting to wonder how well I had fought off my cold, (not very well in the colder, wetter weather of San Francisco…) We met up with Ben Boise from Quest Software and spent a bit of time at the Enkitec booth. The Enkitec booth was hands-down, the winner for me. Kerry Osbourne had told me at Graham’s party, after I had finished teasing Frits Hoogland that he’d been given my copy of the Exadata book via Tanel, that if I came by, he’d have a copy for me. Yeah, wasn’t turning that down…
So while there, I spoke to Kerry about what a great DBA and all around good guy Karl Arao is. He’s succeeding there, and after the conversation I asked Kerry and Randy Johnson to sign my copy of the book. I went back a bit later to talk with Karl a bit more and for the fun of it, forced him to sign my copy, too. Tanel had tweeted that he was going to have a secret Exadata hacking session that day, so I headed out with my book and was able to get Tanel’s signature, too… No, none of you can have it…
On our way out of Moscone for the night, we spent about 20 minutes speaking with Jonathan Lewis. Dr. Steve Dorsey and a guest joined Tim Gorman, Jonathan and me to complete the conversation about the evening’s plans.
We traveled down to The Stinking Rose for a wonderful dinner of wonderful dishes with way too much garlic in them. I truly feared anyone who would come near us post the meal, but it was well worth it, (and apologies to anyone near us afterwards…)
Wednesday, OOW11
The day went quickly and the evening was the blogger meetup. There was a break between networking and dropping off postcards promoting RMOUG 2012, when we were able to head over to Mogens’ office, (i.e. back of Chevy’s restaurant this year…) where everyone had been hanging out throughout each day when needing a break from the chaos. Throughout the day, there was some conversation via Twitter on who was going to get my wristband, as we were bowing out of the concert/chaos that night. First it appeared that Greg Rahn would need it, but I had already pointed him towards Mike Swing who was offering him up one, so Doug Burns was the lucky winner and new owner of my band. The group at Chevy’s, as always, was fun to simply observe, let alone partake in conversation with, and how can you say no to Mogens?
The meetup was a quick hop and a jump over to Jillian’s where there were a number of private parties going on, but the bloggers meetup by Pythian was the top deal. We were all given a bandana and sharpies to get each other’s signature, so I was a happy blogger, just going around and meeting as many as I could. I enjoyed writing “Kellyn was here” and pointing arrows to the Pythian logo on the bandana or, as everyone was wearing them on their heads, it had humor all in itself, (note to self, another reason I’m glad no one decided to wear the bandana as a bustier…)
Paul Vallee did a lovely tribute to Steve Jobs at the beginning of the meetup. At the high point, Pythian first gave away an Apple TV to a blogger picked from those who had posts on randomly chosen dates, (happened to be RMOUG Training Days week, so I didn’t even have to check, I KNEW I had posts out there.. LOL) Yury won the TV and then they gave away an iPod Touch to the person who had received the most signatures. I was sure I was nowhere near the top, but then got a look at the leader, Tim Hall’s bandana. Upon counting mine up, I was two short of his number and he won, but Tim, the gracious guy he is, handed me the prize. I, confused easily as I am, asked why I was getting it when I came in second, and he replied, “I [worked] around to get my signatures and you just got them while meeting everyone, I’m disqualifying myself!” Thank you, Tim Hall, from me and my children, one of whom in particular has been jonesing for one of these!
Post the meetup, Tim Gorman and I were going to head out for some dinner and Alex Gorbachev joined us at a wonderful Indian restaurant called Amber. Wine and conversation flowed, while the fun and chaos of the Petty/Sting concert went on at Treasure Island. When we did finally finish, it was just in time to meet everyone back over at the night’s bar of choice, “W”. I sat and spoke most of the evening with Martin Paul Nash, Alex G. and Dan Norris, (Mogens slept in the corner, those Danes and their catnaps to catch up on jetlag really impress me!) Folks came in from the concert, Lisa Dobson, Connor McDonald, Andy Klock, Doug Burns and others, little by little. We stayed and talked until my voice had become so hoarse that I was starting to sound a bit like Barry White.
Gotta say, another brilliant, easy-going and friendly DBA, Martin Paul Nash. Between Martin, Andy, Jeremy, Connor and Dan, I’m feeling good about the future of our database administration world.
Thursday, OOW11
Surprise came the next morning when we found out many of the people we left the night before had never actually slept that night. They continued to enjoy the opportunity to see folks that many may only see once a year and had simply stayed up! A few of them were presenting on Thursday, so a lot of attendees may have wondered about that, too…
I attended only one session on Thursday, having slept in too late for the one I’d wanted to attend on optimal performance, (and had to answer to Gwen and others as to why I wasn’t there…) Maria Colgan was great, (as usual) and she was one of the last folks I really wanted to meet, but I had resigned myself to the fact, given the group that crowded her immediately after the presentation, that it just wasn’t going to happen. Tim and I went over to Chevy’s to have a last OOW11 lunch with Mogens’ group before heading to the airport and who shows up to have lunch there, too? Yes, Maria Colgan, so I did get to meet her…AND have lunch with her, (along with DBA Gods, Demi-Gods, you know the drill…)
During all of this, I did a lot of RMOUG networking to ensure that I added as much to the great plans for the 2012 conference that I could. I was thrilled to have so many folks dedicated to coming out to Denver in February to talk, (because the conference is second to Debra Lilley’s birthday, I swear the marketing is there!) Had a lovely conversation about bringing RAC Attack out for training days this year, which I think will be well received. Jeremy Schneider is in Africa the week of the conference, but we are working on others who can really take on this great opportunity for DBA’s to take advantage of.
I wish I could say the plane ride back was relaxing and a wonderful time to reflect on a great Oracle Open World, but as usual, the airlines were busy trying to ruin travel for all of us. I am thrilled with everyone I met while in San Francisco and although I should have attended more sessions, I wouldn’t have changed a thing.
Thank you Oracle, Pythian and all that I met this last week for such a wonderful experience!
OOW11 Dinners
Yes, typing on my tablet screen again, so patience with my short posts…:-)
Had the pleasure of attending both the ACE and Oak Table dinners the last two nights. Wonderful, impressive and technically gifted people at every table and a fantastic opportunity to meet so many that I’ve only known virtually. I enjoyed another set of high energy conversations with Gwen Shapira, Debra Lilley, Robyn Sands and Lisa Dobson- all women who make me proud of the representatives of my gender in the technical world.
Spent some time with Mark Bobak, Kent G., Alex G., Craig S., Jeremy Schneider, Mike Swing and Yuri Y. I was also so thrilled to spend time with a virtual team member, Andy Klock- great guy to work with and happy to meet in person. Carol Dacko did a phenomenal job planning the Oak Table event and I know Robyn Sands helped with some of the arrangements, too. Mogens N. is beyond entertaining and his legend is intact another year. I threatened to stalk Tanel unless he signed my Exadata book, (thank you, thank you Kerry Osbourne for the copy…) and still am missing mentioning many others that should be named here. All made an impact and were a pleasure to meet.
Food and spirits pale in comparison to the wonderful opportunity these dinners offer us all to sit and speak with the peers we admire so much…
First Day at Oracle Open World
After a long first day and a 33 hr train ride in, I’m ready for a fun and enjoyable ACE dinner that I’m a lucky guest to.
First day, being Sunday, is commonly quiet, but Tim had two sessions that I wanted to attend and time flew by with meetups, expert panels, etc. I met a number of great new people today and saw some more old friends, (or as “new blood”, can I refer to them as old?)
I was still recovering after a fantastic get-together at Graham Woods from last night. I was impressed with the great group of people in attendance. DBA Gods like Tim Gorman, Alex Gorbachev, Cary Millsap, Kerry Osbourne and even demi-Gods like Frits Hoogland and Greg Rahn were there. I met, (finally) so many others that have been on my list.
Locating UNKNOWN SQL_ID Info in OEM Through AWR
Rarely are reports based off large snapshot variances helpful to a DBA unless you come across an odd situation such as this one…Better yet, we need to know a little bit about our AWR tables behind our reports so we can piece together what the reports leave out…:)
Scenario: After-hours support has killed a session after high temp usage has occurred. You, as the primary DBA, are left to look into the issue the next day.
Your first attempt to inspect the issue is through Enterprise Manager, (OEM) and you are surprised that very little activity is actually showing up that is of concern. You can see pink areas, something that would flag you, but they aren’t showing up in the active sessions in the left- only showing in the right.
What’s going on here?

Now, taking the right side into consideration, you do see four sessions, easily assumed to be parallel slaves, processing away.
When drilling down into the left side SQL_ID sessions, you note that none of the top right side sessions are showing up for the SQL_ID’s in question.
When drilling down into the right side four sessions, SID’s, all of them show no SQL_ID’s connected and are listed as “UNKNOWN”.
How, as a DBA, do you locate this data? Your first quest will be to run an ASH report, but it will only show you the problem waits and the type of waits, not the SQL that was the cause of the issue. How do you find the SQL/SQL_ID for the SID’s?
ASH Report for the snapshot in question:
   DB Id    DB Name      Inst Num Instance
----------- ------------ -------- ------------
  xxxxxxxxx DBMART              1 mart1
Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type: text
Type Specified: text
Defaults to current database
Using database id: xxxxxxxxxxxxxxx
Defaults to current instance
Using instance number: 1
ASH Samples in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Oldest ASH sample available: 12-Sep-11 12:20:34 [ 24306 mins in the past]
Latest ASH sample available: 29-Sep-11 09:26:08 [ 0 mins in the past]
Specify the timeframe to generate the ASH report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter begin time for report:
-- Valid input formats:
-- To specify absolute begin time:
-- [MM/DD[/YY]] HH24:MI[:SS]
-- Examples: 02/23/03 14:30:15
-- 02/23 14:30:15
-- 14:30:15
-- 14:30
-- To specify relative begin time: (start with '-' sign)
-- -[HH24:]MI
-- Examples: -1:15 (SYSDATE - 1 Hr 15 Mins)
-- -25 (SYSDATE - 25 Mins)
Defaults to -15 mins
Enter value for begin_time: 09/27/11 16:00:00
Report begin time specified: 09/27/11 16:00:00
Enter duration in minutes starting from begin time:
Defaults to SYSDATE - begin_time
Press Enter to analyze till current time
Enter value for duration: 240
Enter a name for the report or a default will be used, (I prefer a naming convention of: <report type, (ash, awr, addm)>_<sid>_<snapshot>.<txt/html>)
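The prompts above come from the interactive ASH report script. If you prefer to skip the prompts, a minimal sketch of the same request through DBMS_WORKLOAD_REPOSITORY.ASH_REPORT_TEXT follows. It assumes instance 1 and the same window, (16:00 plus 240 minutes) and that only the first four arguments are needed in your release:

```sql
-- Sketch: text ASH report for 27-Sep-11 16:00 through 20:00 on instance 1.
-- The DBID is read from V$DATABASE instead of being typed in.
SELECT r.output
FROM   v$database d,
       TABLE(DBMS_WORKLOAD_REPOSITORY.ASH_REPORT_TEXT(
               d.dbid,
               1,
               TO_DATE('09/27/11 16:00:00', 'MM/DD/RR HH24:MI:SS'),
               TO_DATE('09/27/11 20:00:00', 'MM/DD/RR HH24:MI:SS'))) r;
```

Spool the output to a file named to your own convention and you have the same report the script would have written.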
View the ASH report for the time period, going to the top events first:
Event                          % Event  P1 Value, P2 Value, P3 Value % Activity
------------------------------ ------- ----------------------------- ----------
Parameter 1                Parameter 2                Parameter 3
-------------------------- -------------------------- --------------------------
PX Deq Credit: send blkd         58.37           "268501009","3","0"       1.97
sleeptime/senderid         passes                     qref
                                                 "268501010","3","0"       1.97
                                                 "268501010","4","0"       1.97
There’s our issue!
Now using this info, we know we are looking at a parallel, (PX) issue that we can then take to the next step to identify the process and see if it is connected to one of our SID’s we noted in the OEM console:
   Sid, Serial# % Activity Event                             % Event
--------------- ---------- ------------------------------ ----------
User                 Program                          # Samples Active     XIDs
-------------------- ------------------------------ ------------------ --------
     2170, 3249      16.27 PX Deq Credit: send blkd            14.75
DM_USER              oracle@host1 (P020)               11K/14K [ 74%]        0
                           CPU + Wait for CPU                   1.46
                                                     1,059/14K [  7%]        0
     2138,  585      16.13 PX Deq Credit: send blkd            14.55
DM_USER              oracle@host1 (P022)               11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.53
                                                     1,107/14K [  8%]        0
     2130,  581      16.11 PX Deq Credit: send blkd            14.53
DM_USER              oracle@host1 (P023)               11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.54
                                                     1,113/14K [  8%]        0
     2129,  533      16.03 PX Deq Credit: send blkd            14.54
DM_USER              oracle@host1 (P021)               11K/14K [ 73%]        0
                           CPU + Wait for CPU                   1.41
                                                     1,024/14K [  7%]        0
Note that it also reports back the same four SIDs, (2170, 2129, 2130 and 2138) all showing as “UNKNOWN”, that we see in the OEM console.
It just doesn’t have any SQL or SQL_ID’s to connect with them.
The coordinating SID is caught in the Top sessions running PQ’s, so we will see this as our identifier, even if it wasn’t seen in the OEM console:
Sid,Srl# (Inst) % Activity SQL ID        Event                          % Event
--------------- ---------- ------------- ----------------------------- --------
User                 Program
-------------------- ------------------------------
2113,UNKNOWN(1)      80.89               PX Deq Credit: send blkd         58.37
This is very important, as we need this to track further, outside of the report.
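The queries that follow filter on snapshot IDs, which the walkthrough does not derive. A small sketch of looking them up from DBA_HIST_SNAPSHOT, assuming instance 1 and using an afternoon-to-midnight window on 27-Sep-11 purely as a placeholder to adjust:

```sql
-- Sketch: list the snapshots that overlap the period of interest,
-- so the begin/end SNAP_IDs can be read off for later queries.
SELECT snap_id,
       begin_interval_time,
       end_interval_time
FROM   dba_hist_snapshot
WHERE  instance_number = 1
AND    end_interval_time   > TO_TIMESTAMP('09/27/11 16:00', 'MM/DD/RR HH24:MI')
AND    begin_interval_time < TO_TIMESTAMP('09/27/11 23:59', 'MM/DD/RR HH24:MI')
ORDER  BY snap_id;
```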
Using the coordinator’s SID, query DBA_HIST_ACTIVE_SESS_HISTORY for the snapshots in question to capture the main SQL_ID:
select * from sys.DBA_HIST_ACTIVE_SESS_HISTORY where session_id=2113 and snap_id between 717 and 727;
SAMPLE_TIME               SESSION_ID SESSION_SERIAL# USER_ID SQL_ID
27-SEP-11 10.33.39.946 PM       2113             143     255 5dpcst074nk81
27-SEP-11 10.33.49.976 PM       2113             143     255 5dpcst074nk81
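The same view can also tie the four “UNKNOWN” slave sessions back to their coordinator. A hedged sketch, assuming the QC_SESSION_ID column is populated for parallel slaves in your release, (the SAMPLES alias is mine):

```sql
-- Sketch: the coordinator (SID 2113) plus every session that ran under it,
-- with the SQL_ID and wait event seen in the retained ASH samples.
SELECT session_id,
       session_serial#,
       sql_id,
       event,
       COUNT(*) AS samples
FROM   dba_hist_active_sess_history
WHERE  snap_id BETWEEN 717 AND 727
AND    (session_id = 2113 OR qc_session_id = 2113)
GROUP  BY session_id, session_serial#, sql_id, event
ORDER  BY samples DESC;
```

If SIDs 2170, 2138, 2130 and 2129 come back under coordinator 2113, you have confirmed the link that OEM could not show.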
You can then pull the data that is connected to the parent that will tell you what session experienced the problem:
select * from sys.WRH$_SQLTEXT where sql_id='5dpcst074nk81';
SNAP_ID DBID       SQL_ID        SQL_TEXT COMMAND_TYPE REF_COUNT
    760 2778270765 5dpcst074nk81 <CLOB>              1         0
The CLOB then shows the SQL involved in the executed session, which then tells you what your pain is on:
create table dm_user.tbl_iot ( col1 , col2 , col3 , col4.... constraint tbl_pk primary key (col1)) organization index.... <--Here's our key, it's an IOT, which includes the PK!
tablespace mart_data1 pctfree 0 nologging parallel 4 as....
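The WRH$ tables are the internal storage behind AWR. The documented view over the same data is DBA_HIST_SQLTEXT, and a sketch like the following returns the start of the statement as plain text so you don't need a tool that can open the CLOB, (the 2000 character cut-off and the alias are my own choices):

```sql
-- Sketch: read the first 2000 characters of the captured SQL text.
SELECT sql_id,
       command_type,
       DBMS_LOB.SUBSTR(sql_text, 2000, 1) AS sql_text_start
FROM   dba_hist_sqltext
WHERE  sql_id = '5dpcst074nk81';
```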
We can now find the secondary SQL_ID that corresponds to most of the combined hit to the primary SQL, which will show in the following statement:
select distinct SQL_ID, optimizer_cost from dba_hist_sqlstat where snap_id between 717 and 727 and plsexec_time_delta=0 order by optimizer_cost desc;
SQL_ID        OPTIMIZER_COST
fjbbxnyc66t7c         256350
5dpcst074nk81         253906 <--Here was the one shown, it will be the one above!
1xg7tkc156a1m           5477
24hc2470c87up           5302
dyd4b36t1ppph           4865
gfjvxb25b773h           2331
3wy90uysqgcfx           1974
a55tay7577psc           1922
00a15fn17bx7p           1912
b92jqqxwt3tfd           1725
cf621qmts91wf           1028
5atpa8vj2gakz           1007
7qskskbx91q7t            992
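Optimizer cost is only an estimate, so it can help to rank the same snapshot range by what actually happened. A sketch using the delta columns in DBA_HIST_SQLSTAT, (the aliases are mine and this is a cross-check, not the query used above):

```sql
-- Sketch: rank SQL in the snapshot range by measured elapsed time.
-- ELAPSED_TIME_DELTA is in microseconds, so divide by 1,000,000 for seconds.
SELECT sql_id,
       plan_hash_value,
       MAX(optimizer_cost)                      AS optimizer_cost,
       ROUND(SUM(elapsed_time_delta) / 1000000) AS elapsed_secs,
       SUM(disk_reads_delta)                    AS disk_reads
FROM   dba_hist_sqlstat
WHERE  snap_id BETWEEN 717 AND 727
GROUP  BY sql_id, plan_hash_value
ORDER  BY elapsed_secs DESC;
```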
Now run awrsqrpt.sql to see the hit for the SQL_ID in question:
If you have never run one of these, it’s a great report for detailed info on a single SQL statement for a given run. For this one, we just want to see WHAT was running:
You will need your snapshot ID’s and the SQL_ID to create the report, and the script resides in $ORACLE_HOME/rdbms/admin. Yes, I prefer the text version over the HTML, but to each their own…
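For those who would rather call the report than answer the script’s prompts, a sketch using DBMS_WORKLOAD_REPOSITORY.AWR_SQL_REPORT_TEXT follows, assuming instance 1 and the snapshot range and SQL_ID from this example:

```sql
-- Sketch: text AWR SQL report for one SQL_ID between snapshots 709 and 727.
-- The DBID is read from V$DATABASE instead of being typed in.
SELECT r.output
FROM   v$database d,
       TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_SQL_REPORT_TEXT(
               d.dbid, 1, 709, 727, 'fjbbxnyc66t7c')) r;
```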
WORKLOAD REPOSITORY SQL Report
Snapshot Period Summary
DB Name      DB Id       Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
DBMART       xxxxxxxxx   mart1               1 10.2.0.4.0  NO  host1
              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:       709 27-Sep-11 14:30:31        57       6.3
  End Snap:       727 27-Sep-11 23:30:53        37       5.1
   Elapsed:      540.37 (mins)
   DB Time:    2,726.16 (mins)
SQL Summary DB/Inst: MARTF/martf Snaps: 709-727
Elapsed
SQL Id Time (ms)
------------- ----------
fjbbxnyc66t7c ##########
CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("IBID") INDEX
ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NOLOGGING parallel 4 as select
col1 , col2 , col3 , col4 , col5 , col6
-------------------------------------------------------------
SQL ID: fjbbxnyc66t7c DB/Inst: MARTF/martf Snaps: 709-727
-> 1st Capture and Last Capture Snap IDs
refer to Snapshot IDs witin the snapshot range
-> CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"("COL...
    Plan Hash           Total Elapsed                 1st Capture   Last Capture
#   Value                    Time(ms)    Executions       Snap ID        Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1   3019078213            115,239,821             1           710            726
-------------------------------------------------------------
Plan 1(PHV: 3019078213)
-----------------------
Plan Statistics                  DB/Inst: DBMART/mart1  Snaps: 709-727
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                        ##########  115,239,820.7    70.5
CPU Time (ms)                            ##########   23,668,620.1    45.7
Executions                                        1            N/A     N/A
Buffer Gets                              ##########  940,627,725.0    70.9
Disk Reads                                7,761,573    7,761,573.0    19.4
Parse Calls                                       9            9.0     0.0
Rows                                              0            0.0     N/A
User I/O Wait Time (ms)                   1,949,762            N/A     N/A
Cluster Wait Time (ms)                            0            N/A     N/A
Application Wait Time (ms)                        0            N/A     N/A
Concurrency Wait Time (ms)                  301,390            N/A     N/A
Invalidations                                     0            N/A     N/A
Version Count                                    17            N/A     N/A
Sharable Mem(KB)                              9,285            N/A     N/A
-------------------------------------------------------------
Execution Plan
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT   |          |       |       |  256K|       |       |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |      |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |   637M|   229G|      |       |       |  Q1,01 | P->S | QC (ORDER) |
|   3 |    INDEX BUILD UNIQUE    | PRIM_PK  |       |       |      |       |       |  Q1,01 | PCWP |            |
|   4 |     SORT CREATE INDEX    |          |   637M|   229G|      |       |       |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE          |          |   637M|   229G|  136K|       |       |  Q1,01 | PCWP |            |
|   6 |       PX SEND RANGE      | :TQ10000 |   637M|   229G|  136K|       |       |  Q1,00 | P->P | RANGE      |
|   7 |        PX BLOCK ITERATOR |          |   637M|   229G|  136K|     1 |     4 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| PRIM_TBL |   637M|   229G|  136K|     1 |     4 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------
Note
-----
   - cpu costing is off (consider enabling it)
   - estimated index size: 257G bytes
Full SQL Text
SQL ID SQL Text
------------ -----------------------------------------------------------------
fjbbxnyc66t7 CREATE UNIQUE INDEX "DM_USER"."TBL_PK" on "DM_USER"."TBL_IOT"
"("COL1") INDEX ONLY TOPLEVEL TABLESPACE "MART_DATA1" PCTFREE 0 NO
LOGGING parallel 4 as select ...<want to see the full SQL statement, yes
this report will show you the output...>
We can now see that our “offender” was the primary key creation within the IOT create statement.
Now you have the information on the “UNKNOWN” offender that was showing in OEM. Nothing can hide from a DBA for long!





