Jan 16 2012

Getting the Most Out of Enterprise Manager and Notifications

Category: Oracle | Kellyn Pot'Vin @ 7:20 pm

I ran out of time before I was able to provide an adequate white paper this year for my EM12c presentation, but there was some valuable info in what I had started, so I thought I’d turn it into a multi-part blog post…

Oracle Enterprise Manager (OEM) is the standard monitoring tool for Enterprise Edition Oracle databases.  The interface allows the DBA to manage the entire Oracle stack from a single console, and the installation and interface are easy for most DBAs to implement and use.  The newest version, EM12c, encompasses integrated systems management, application management, application-to-disk and cloud management.  The following documentation includes some 10g material but mostly covers the EM12c version of the product.

The goal for any DBA is to be notified of an issue, and only when there is an actual issue.  One of the most common downfalls of a monitored environment is the misconception that receiving emails upon success, or checks stating that a process is running, amounts to correctly configured monitoring.  This produces a “white noise” effect, where DBAs may mistake a notification of a real issue for one of the routine success notifications.

The optimal design includes redundancy checks of the monitoring system itself, so that if an issue with the monitoring environment prevents it from monitoring and sending alerts, a redundant check on a secondary server notifies the on-call DBA.  Multiple Oracle Management Server repositories residing on separate servers can address this, but in my opinion that would be overkill when simple additional scripts run from cron would suffice.

OEM Basics

Oracle Enterprise Manager, in both 10g and EM12c, comprises the following basic components:

  • The Oracle Management Server/Service, (OMS).
  • The OMS repository database.
  • The OMS Home, aka the EM state directory, which contains the bin files, log files, collection files and configuration files.
  • The Agent installation, application and configuration on each monitored host server.

EM12c automatically includes the additional WebLogic components, along with the Cloud support features, which can be installed.

Licensing

As long as the OMS is on its own server and is only used for the OMS repository and/or an RMAN backup catalog repository, individual Oracle licensing IS NOT required for the Oracle database utilized for the repositories (please see pg. 15 of the following PDF from Oracle).

http://download.oracle.com/docs/cd/B19306_01/license.102/b40010.pdf

Monitoring the OEM from a secondary server:

This can be performed easily from a shell script and allows the DBA(s) to rest easy, knowing that the interface to their database environment, if impacted, will notify them from a secondary server.  This allows for redundant checks without sending an “I’m OK” notification to grant comfort:

 

#!/usr/bin/ksh
#----------------------------------------------------------------------------
# Author:   Kellyn Pot’Vin
# Redundancy check against the OEM server to ensure EM is up and running!
# ORACLE_HOME, LOG_DIR and pass must be set in the local environment, and
# OMS_HOME must be set in the remote host env. vars.
#----------------------------------------------------------------------------
if (( $# != 2 ))
then
    echo "usage: $0 SID hostname"
    exit 1
fi
#----------------------------------------------------------------------------
# Set up Oracle environment...
#----------------------------------------------------------------------------
export ORACLE_SID=$1
export who_to_ping=$2
echo "Oracle SID: ${ORACLE_SID}"
export AVL_LOG=${LOG_DIR}/oem_avl.log
export AVL_ERR=${LOG_DIR}/oem_avl.err
export AVL_PNG_ERR=${LOG_DIR}/ping_avl.err
# Assumption: EM_LOG lives alongside the other logs; adjust as needed.
export EM_LOG=${LOG_DIR}/em_avl.log

# Check repository DB for access
$ORACLE_HOME/bin/sqlplus oem_chk/"${pass}"@${ORACLE_SID} <<EOF
spool ${AVL_LOG};
select sum(1+1) from dual@grid_chk;
spool off;
exit;
EOF
grep "ORA-" ${AVL_LOG} > ${AVL_ERR}
if [ -s ${AVL_ERR} ]
then
    mail -s "No Response from Grid Control from Oracle Management Server!" "<EML_Address>" < ${AVL_LOG}
fi

# Check to verify that EM12C is up!  This requires key-based SSH
# authentication from the remote server.
# Assumption: the OMS runs on the host passed as the second argument.
# Single quotes ensure OMS_HOME is expanded on the remote host.
ssh oracle@${who_to_ping} '$OMS_HOME/bin/emctl status oms' | grep Down > ${EM_LOG}
if [ -s ${EM_LOG} ]
then
    mail -s "No Response from EM12C Grid Control!" "<EML_Address>" < ${EM_LOG}
    exit
fi

# Check Grid server, ensure that you can ping it as well
date
ping -c 3 ${who_to_ping}
if [ $? -ne 0 ]
then
    sleep 5
    ping -c 3 ${who_to_ping}
    if [ $? -ne 0 ]
    then
        echo "`hostname` CANNOT PING ${who_to_ping} the EM Server!" > /tmp/ping.$$
        mail -s "`hostname` CANNOT PING ${who_to_ping} from Oracle Management Server!" "<EML_Address>" < /tmp/ping.$$
        rm -f /tmp/ping.$$
    fi
fi
rm -f ${AVL_LOG} ${AVL_ERR} ${EM_LOG}
exit

 

Pretty simple to schedule in cron:

0,15,30,45 * * * * /home/oracle/scripts/admin/chk_grid.ksh <dbname> <servername> > /dev/null  2>&1

I’ve chosen a 15 minute interval on the checks, but this can be done with any interval as requirements are set.

 

Escalation

Due to Sarbanes-Oxley and/or outside support contracts, an enhanced escalation process may be required, one that offers more choices and escalation paths than what is currently offered in the 10g and EM12c consoles.  A simple package and supporting-object implementation that works with OEM can be created to support this type of requirement.  The code presented here allows one to set the on-call DBA, schedule and escalation outside of the OEM interface, while all OEM alerts and escalation from the OMS utilize the data found in the supporting tables.

I will try to upload and post the supporting schema and code soon on dbakevlar.com

 

 

Blacking out DB from Agent Side with Shell Scripts:

Blackouts can be started from a shell script, so that automated processes that would otherwise trigger OEM alerts do not send false notifications.  A blackout script is often all that Unix admin or application support personnel require:

#!/usr/local/bin/ksh
# #######################################################
# start_blackout.ksh
# Usage ./start_blackout.ksh <oracle_sid>
# Rewrite Date: 4/22/2011
# Modified by:  reckl
#########################################################
usage="$0 <db_name>"
if (($# != 1))
then
    print $usage
    exit 1
fi
ORACLE_SID=$1
sudo su - oracle -c "$AGENT_HOME/bin/emctl start blackout ${ORACLE_SID}_blackout ${ORACLE_SID}"
exit
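
A matching step is needed to end the blackout once the automated process has finished. The following is only a minimal sketch: it assumes the same `<sid>_blackout` naming used in the script above and the agent-side `emctl stop blackout <name>` command. The `blackout_cmd` helper is hypothetical; it just builds the command string so it can be reviewed before being handed to `sudo su - oracle -c`.

```shell
#!/bin/sh
# blackout_cmd is a hypothetical helper: it prints the emctl arguments for
# starting or stopping the blackout named <sid>_blackout.
blackout_cmd() {
    action=$1
    sid=$2
    case "$action" in
        start) echo "emctl start blackout ${sid}_blackout ${sid}" ;;
        stop)  echo "emctl stop blackout ${sid}_blackout" ;;
        *)     echo "usage: blackout_cmd start|stop <db_name>" >&2
               return 1 ;;
    esac
}

# Example (assumes AGENT_HOME is set, as in the script above):
#   sudo su - oracle -c "$AGENT_HOME/bin/$(blackout_cmd stop ORCL)"
```

Keeping start and stop in one helper means both scripts derive the blackout name the same way, so a stop never misses the blackout that the start created.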

Patching

I am a supporter of patch deployments through OEM.  If you have not configured this or are working to get this feature approved in your database environments, I highly recommend it.  In the “Deployments” tab of the EM console, first ensure that the MOS credentials are configured.

Once this has been set up for your environment, you can then designate a patching strategy to deploy to development, test and then production with a full testing cycle that will make any DBA stop quaking in their boots when they receive the notification that new patches have arrived from Oracle Support.

The Deployment Procedure Manager allows the DBA group to schedule deployments of necessary patching with the most effective schedule and little DBA involvement required.

 

The DBA can then set up patching resource allocation and requirements from the “Offline Patching” UI and choose what to install for automatic patching.

To be continued in next post….

 

 


Jan 04 2012

RMOUG 2012!

Category: DBA Life, Oracle | Kellyn Pot'Vin @ 3:01 pm

As busy as I am with the 11g project, (no real weekends off for 9 weekends and counting… :P )  I wanted to take some time out to write on the upcoming RMOUG Training Days 2012.

For you Oracle techies, DBA or Developer, this is a must-attend, and for any who do choose to travel and attend: kudos to you, good choice.  As one of the directors on the RMOUG board, I can attest to the incredible amount of time and resources that have been invested into what is easily the best grass-roots Oracle conference around.  John Jeunette, the Training Days Director for the 2012 event, has done a bang-up job with planning, and none of us could get along without the continued support from those such as Peggy King and Team YCC.

Oracle folks who do attend are going to be treated to a keynote from one of my favorite DBA Gods, Cary Millsap, along with presentations from some of the greats in the DBA world, including Jonathan Lewis, Debra Lilley, Mark Farnham, John King, Alex Gorbachev, Guy Harrison, Dan Morgan,  Marco Gralike, James Morle and Graham Woods.  We also can’t forget the local favorites, like Tim Gorman and Randy Cunningham.

Upon quick count, I realized we have 14 Oracle ACEs and 14 ACE Directors speaking this year.  With all these ACE folks, we’ve decided to create a special event with them, something to really find out what it means to be an Oracle ACE.  Stay tuned, it’s shaping up to be an awesome session.

We also have the benefit for those interested in Oracle RAC of having “RAC Attack”, a great workshop, first offered at Oracle Open World and UKOUG, now also offered at RMOUG Training Days for 2012!  This is a great opportunity to get your “RAC on” and learn from some of the best on how to properly build a RAC environment and when.  Pythian and Apress will be sharing in the sponsorship of this great workshop at the RMOUG training days event. Show up with a laptop that meets the requirements for the workshop and build your own, how great is that?

That’s a pretty impressive count for a two day Oracle conference when you think of it!  I’m thrilled with the quality on the content of the presentations this year and how the event is coming together.  We had a record year for abstract submissions and it was a difficult decision deciding who would be in the schedule, so many great, solid abstracts submitted!

So if you are interested in attending, here’s the link to RMOUG’s Training Days Event.  You can view the current schedule, registration and biographies for the speakers.  There’s only a short time left to take advantage of the advance registration rate, saving even more if you become a member, (that’s me as the membership director just selling it a bit! :) )

http://www.teamycc.com/RMOUG_2012_Conference/Registration.html


Dec 21 2011

Presenting at KScope 2012!

Category: DBA Life | Kellyn Pot'Vin @ 12:34 pm


Dec 20 2011

It’s an RMOUG Christmas!

Category: DBA Life | Kellyn Pot'Vin @ 10:25 am


Dec 06 2011

Solid Choices for Oracle Tuning on Solid State Disk

Category: Oracle | Kellyn Pot'Vin @ 1:33 pm

As I continue to work on very large databases (VLDB), I am exposed to more opportunities to speed up IO.  This can involve Oracle’s solution of Exadata or stand-alone improvements with options such as SSD (Solid State Disk), which can offer faster IO performance at a fraction of the price.  When this option becomes a reality, there will always be non-DBAs who advise on what would best benefit from the hardware, but it is important for the DBA to take the time to research what would truly benefit.

Just the Facts on Solid State Disk:

There are several types of SSD available:

  • Flash memory-based
  • DRAM-based
  • Cache or Buffer

The SSD can have different types of host interfaces, depending on the main hardware you are interfacing with and/or vendor choices:

  • PCI
  • Fibre Channel
  • ATA, (Serial or Parallel)
  • SCSI, (Serial or Parallel)
  • USB

Rarely do we get a chance to move entire Terabytes of data onto fast disk, but rather are offered limited, faster disk to utilize for crucial objects that can give us the “most bang for the buck.”   Commonly this is due to the price of these specialized and impressive IO read/write drives, but it can also be due to limitations on the hardware they are interfacing with.

As I started working on databases that utilized faster disk, with or without ASM, it became apparent that what these speedy disks were allocated to wasn’t always what SHOULD have been placed in the new location.  Where indexes, look up tables and temp tablespace experienced impressive gains vs. the standard disk drives they had formerly resided on, I have been quick to dissuade anyone from placing redo logs on SSD.

I’m going to go through the data, reports and queries that I utilize to decide what should be on fast disk, along with my benchmark findings when I did have the opportunity to create an entire database on Fusion Octal fast disk.

Getting the most out of SSD is all about getting what won’t fit in memory (SGA and PGA) onto a faster disk.  Consistently large read tasks that the database must go to disk for, but that do not write to disk as often (think batch loads vs. heavy transactional), are excellent candidates for research when deciding what should be placed on SSD, but only ONCE TUNING OPPORTUNITIES HAVE BEEN EXHAUSTED.  As a DBA, you can obtain this information in multiple ways.  AWR/ADDM and ASH reports can provide solid, high-level data to point you in the right direction if you are not as familiar with your data or wish to validate some of what you already know.  For those of you who are not licensed for these reports (they require the Diagnostics Pack), Statspack can do the same.  Tracing can offer detailed output that will tell you which objects you are often going to slower disk for.  OEM can provide graphs that show IO demands on a heavily “weighted” system, as can other GUI tools on the market.

 

AWR/Statspack and I/O Wait Indicators

Your group has already decided that IO is an issue and should have verified this in the top 5 wait events that can be seen through AWR or statspack.  The snapshots utilized for this examination should be times of heavy IO in the database environment as can be seen in the example Table 1.

 

Table 1

Top 5 Timed Events                                          Avg %Total
~~~~~~~~~~~~~~~~~~                                         wait   Call
Event                                 Waits    Time (s)    (ms)   Time Wait Class
------------------------------ ------------ ----------- ------- ------ ----------
db file sequential read             979,382      36,066      37   45.1 User I/O
db file scattered read            5,083,058      22,401       4     28 User I/O
direct path write temp                           13,577             17 User I/O
db file parallel write              464,287       5,136      11    6.4 System I/O
direct path read temp               366,956       2,671       7    3.3 User I/O
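
The Avg wait column is simply the total wait time divided by the number of waits, which is worth recomputing when comparing snapshots by hand. A minimal sketch of that arithmetic follows; the `avg_wait_ms` helper name is an assumption for illustration and is not part of AWR or Statspack.

```shell
#!/bin/sh
# avg_wait_ms TIME_SECONDS WAITS
# Prints the average wait in milliseconds, rounded to a whole number.
avg_wait_ms() {
    awk -v t="$1" -v w="$2" 'BEGIN { printf "%.0f\n", t * 1000 / w }'
}

# db file sequential read from Table 1: 36,066 seconds over 979,382 waits
avg_wait_ms 36066 979382    # prints 37
```

An average of 37 ms for single-block reads is the kind of figure that points at slow disk rather than at the number of reads alone.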

From here, we inspect our AWR or Statspack reports.  The section that should be inspected first and foremost is Segments by Physical Reads; the output from this section can be seen in Table 2.

Table 2

           Tablespace                              Obj.      Physical
Owner      Name         Object Name          Type         Reads  %Total
---------- ------------ -------------------- ----- ------------ -------
SCHM_OWNR  TBLSPC1_DATA TBL1_FILE_1          TABLE   86,788,592   47.87
SCHM_OWNR  TBLSPC2_DATA TBL1_FILE_PK         INDEX   80,544,192   46.59
SCHM_OWNR  TBLSPC1_IDX  TBL2_MR_PK           INDEX   74,742,752   45.39
SCHM_OWNR  TBLSPC1_IDX  TBL3_M_PK            INDEX   40,924,576   28.43
SCHM_OWNR  TBLSPC2_DATA TBL4                 TABLE   26,790,464   15.52

Tuning, Always the First Step

The first step in the process is to inspect I/O issues with large objects. Is there a partitioning strategy that can take the physical reads and IO down for the objects in question? If there is not, or there is still a requirement for full scans or large index or partition scans, then you need to see what tuning options there are for the code involved.  If there is already partitioning in place, is it the right partitioning key, and/or is sub-partitioning in order?

Once this process has completed, then inspect performance for physical reads again and verify the objects in question are still a bottleneck for IO.  If so, then they may be a valid choice to relocate to a new ASM diskgroup residing on SSD.

Creating a specific ASM disk group for the SSD disk is the obvious choice, as the SSD will not be part of the standard disk groups without performance and rebalance challenges.  Once complete, you will then have the new SSD diskgroup available for use.

Inspect the sizes of the objects left in your “top 5 physical IO objects” and decide what you move over for initial testing.  I commonly make a copy of the objects and test the copy against the code to measure true performance gains, ensuring that no physical storage changes are required as well.  ***over what you need for capacity growth estimates.  What should you bring over next?  If we are still using the same reports shown above, I would look carefully at what I have available and would start to inspect temp usage as a possible next candidate.

It is important, if you consider temp, that it is in a “controlled” state for your environment.  It is not uncommon for DBAs to set TEMP to autoextend and not pay attention to temp tablespace usage.  I fully advocate the opposite: I track temp usage and use scripts to alert any time a user or process consumes more than a certain threshold per process on any of my production systems.

Considering the amount of waits on temp reads and writes, tuning opportunities may be boundless on hash joins and sorting.  Low-hanging fruit in these categories includes “order by” clauses that have been left in insert statements (I’m not sure how often I’ve seen this, but it’s a very common and unfortunate occurrence…).  For hash joins, there can be wide reporting tables where only one or two columns are actually required for the results and the join.  A CTAS (create table as select) of only the columns required for the process, dropped after the join to the second table, can drastically trim time and temp usage for a hash join that needs only a few columns from a wide table where an index is a less than efficient answer.  This choice allows the performance gain of the hash without the performance hit of swapping to temp when wide tables cause PGA to never be enough.
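
The threshold check itself can be very small. The sketch below is an illustration rather than my production script: it assumes the per-session temp usage has already been spooled from sqlplus as "USERNAME MB_USED" pairs (for example from a query against V$TEMPSEG_USAGE), and the `temp_over_threshold` helper name is hypothetical.

```shell
#!/bin/sh
# temp_over_threshold LIMIT_MB
# Reads "USERNAME MB_USED" pairs on stdin and prints only the pairs whose
# usage is above LIMIT_MB. The caller decides what to do with the output,
# e.g. mail it only when it is non-empty.
temp_over_threshold() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1, $2 }'
}

# Example with made-up figures and a 10 GB per-process threshold:
printf 'BATCH_USER 20480\nAPP_USER 512\n' | temp_over_threshold 10240
```

Because nothing is printed when every session is under the limit, the check stays silent unless there is an actual issue, in line with the notification approach described in the Enterprise Manager post.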

After tuning temp usage due to large hash joins and sorting outside of PGA, inspect the max temp tablespace required.  If this will now fit without impacting capacity planning requirements for the SSD, move the temp tablespace onto the SSD ASM disk group.

Scripts to Inspect IO Usage

There are many scripts, written yourself or available on the web and in reports, that inspect IO usage.  The following is a good example of one:

select
io.cnt Count,
io.event Event,
substr(io.obj,1,20) Object_Name,
io.p1 P1_Value,
f.tablespace_name Tablespace_Name
from
(
select
count(*) cnt,
round(count(*)/(60*60),2) aas,
substr(event,0,15) event,
nvl(o.object_name,decode(CURRENT_OBJ#,-1,0,CURRENT_OBJ#)) obj,
ash.p1,
o.object_type otype
from v$active_session_history ash,
all_objects o
where ( event like 'db file s%' or event like 'direct%' )
and o.object_id (+)= ash.CURRENT_OBJ#
and sample_time > sysdate - 7/(60*24)
group by
substr(event,0,15) ,
CURRENT_OBJ#, o.object_name ,
o.object_type ,
ash.p1
) io,
dba_data_files f
where
f.file_id = io.p1
and f.tablespace_name not like '%RAM%' --exclude SSD objects
order by io.cnt desc
/

 

COUNT EVENT           OBJECT_NAME    P1_Value TABLESPACE_NAME
----- --------------- -------------- -------- ---------------
  122 db file sequent TBL1_CHAIN          102 N_DATA
   33 db file sequent HH_TBL1_FDX01       161 H_INDX1
   28 db file sequent CA_TBL2_PK          270 C_INDX
   25 db file sequent I_TBL3_IDX02        225 I_INDX2
   21 db file sequent E_TBL4               43 E_DATA
   20 direct path rea I_MRG_TBL            75 M_DATA
   23 db file scatter C_TBL3               50 C_DATA

 

The above script gives you clear examples of which objects to point your research to: first the indexes (sequential reads) and, in this case, a look up table (direct path read).

Building a Database Entirely on SSD

We were given this opportunity recently to test performance gains and decide if budget should be set aside for investing in the hardware to build entire databases on SSD vs. strategic objects within a database.  We have a process that takes approximately five days to aggregate a snapshot in time, up to 12TB of data.  The goal was to see whether we could accomplish this in two days if given all SSD for the database vs. a combination of standard disks on a disk array and SSD for high read/write data.

This sounds like a slam dunk, but it is more challenging than one might think.  There are small things that have to be updated in the database, such as system statistics in 10g, to ensure the database fully knows the gift you have granted it, but then you may also need to make significant logical changes to take advantage of the hardware due to limitations in CPU and memory per process.  The build was on a server that utilized hyper-threading, and some of the “performance settings” actually appeared to work against the database vs. the lesser setting that might stripe the CPU usage more efficiently.  The graph below shows the hits against the first 32 of the 64 “hyper-threaded” CPUs:

Figure 1

The graph in Figure 1 only shows CPU usage over a small snapshot of time, but over longer intervals it showed the same pattern, which differed from SAR and other reports from the admin side: the database continued to hit the same CPUs over and over, leaving other CPUs untouched for extended periods of time.  This did not bode well for database performance, high read/write capability or not.

 

The build time improvements were impressive, but the one thing that must be included is that the improvement in performance was not just a hardware improvement step.  There was first the additional hardware and then a tuning process at the database level to ensure the processes were able to achieve the best performance the solid state disk offered it, (comparison of columns New Run Time against the Final Run Time in Figure 2.)

Process Step | Standard Disk/SSD | Total Min. | SSD Without Tuning | New Run Time | Initial Perf. Gain | SSD With Tuning | Final Run Time | Total Perf. Gain
DIM Table 1 CTAS | 4 HOURS 20 MINUTES 27 SECS | 260 min | 3 HOURS 38 MINUTES 24 SECS | 218 min | 19% | 2 HOURS 43 MINUTES 45 SECS | 164 min | 58%
CTAS Table 2 | 4 HOURS 23 MINUTES 11 SECS | 263 min | 0 HOURS 16 MINUTES 2 SECS | 16 min | 16 Times Perf | 0 HOURS 19 MINUTES 8 SECS | No Tuning | N/A
CTAS Table 3 | 1 HOURS 29 MINUTES 21 SECS | 89 min | 0 HOURS 44 MINUTES 27 SECS | 44 min | 2 Times Perf. | 0 HOURS 57 MINUTES 19 SECS | No Tuning | N/A
CTAS Table 4 | 2 HOURS 55 MINUTES 58 SECS | 175 min | 0 HOURS 42 MINUTES 16 SECS | 42 min | 4 Times Perf. | 0 HOURS 47 MINUTES 55 SECS | No Tuning | N/A
CTAS Table 5 | 10 HOURS 7 MINUTES 41 SECS | 607 min | 1 HOURS 50 MINUTES 7 SECS | 110 min | 6 Times Perf. | 1 HOURS 42 MINUTES 6 SECS | No Tuning | N/A
CTAS Table 6 | 11 HOURS 32 MINUTES 40 SECS | 692 min | 4 HOURS 51 MINUTES 17 SECS | 291 min | 2 Times Perf. | 5 HOURS 9 MINUTES 26 SECS | No Tuning | N/A
Multiple Table Aggregation | 25 HOURS 15 MINUTES 3 SECS | 1515 min | 9 HOURS 58 MINUTES 1 SECS | 598 min | 3 Times Perf. | 5 HOURS 16 MINUTES 31 SECS | 316 min | 5 Times Perf.
Summary Table 1 Agg. | 25 HOURS 24 MINUTES 35 SECS | 1524 min | 10 HOURS 0 MINUTES 20 SECS | 600 min | 3 Times Perf. | 5 HOURS 18 MINUTES 14 SECS | 318 min | 5 Times Perf.
Summary Table 2 Agg. | 25 HOURS 23 MINUTES 56 SECS | 1523 min | 10 HOURS 7 MINUTES 22 SECS | 607 min | 3 Times Perf. | 5 HOURS 25 MINUTES 54 SECS | 325 min | 5 Times Perf.
Index Creation Table 1 | 1 HOURS 16 MINUTES 33 SECS | 76 min | 0 HOURS 53 MINUTES 42 SECS | 54 min | 39% | 0 HOURS 53 MINUTES 14 SECS | No Tuning | N/A
Index Creation Table 2 | 1 HOURS 22 MINUTES 55 SECS | 82 min | 0 HOURS 59 MINUTES 55 SECS | 60 min | 28% | 0 HOURS 59 MINUTES 6 SECS | No Tuning | N/A
CTAS Aggr Table 3 | 6 HOURS 36 MINUTES 20 SECS | 396 min | 3 HOURS 21 MINUTES 18 SECS | 201 min | 50% | 3 HOURS 13 MINUTES 38 SECS | No Tuning | N/A
Index Creation Table 3 | 0 HOURS 52 MINUTES 2 SECS | 52 min | 0 HOURS 40 MINUTES 3 SECS | 40 min | 24% | 0 HOURS 48 MINUTES 15 SECS | No Tuning | N/A
CTAS Aggr. Table 4 | 2 HOURS 41 MINUTES 13 SECS | 161 min | 1 HOURS 32 MINUTES 8 SECS | 92 min | 43% | 1 HOURS 28 MINUTES 25 SECS | No Tuning | N/A
CTAS Aggr Table 5 | 3 HOURS 46 MINUTES 59 SECS | 226 min | 2 HOURS 58 MINUTES 29 SECS | 179 min | 21% | 2 HOURS 55 MINUTES 20 SECS | No Tuning | N/A
CTAS Aggr. Table 6 | 0 HOURS 51 MINUTES 27 SECS | 51 min | 0 HOURS 36 MINUTES 46 SECS | 37 min | 28% | 0 HOURS 34 MINUTES 33 SECS | No Tuning | N/A
Insert to Table 6 | 0 HOURS 5 MINUTES 24 SECS | 5 min | 0 HOURS 5 MINUTES 6 SECS | 5 min | NONE | 0 HOURS 4 MINUTES 52 SECS | 5 min | NONE
Update to Table 6 | 26 HOURS 40 MINUTES 41 SECS | 1640 min | 25 HOURS 9 MINUTES 52 SECS | 1510 min | 8% | 17 HOURS 44 MINUTES 2 SECS | 1084 min | 44%
CTAS Table 7 | 1 HOURS 1 MINUTES 48 SECS | 61 min | 0 HOURS 7 MINUTES 43 SECS | 8 min | 13 Times Perf. | 0 HOURS 6 MINUTES 37 SECS | No Tuning | N/A
CTAS Aggr Table 8 | 0 HOURS 28 MINUTES 31 SECS | 28 min | 0 HOURS 22 MINUTES 12 SECS | 22 min | 22% | 0 HOURS 19 MINUTES 25 SECS | No Tuning | N/A
CTAS Mod TBLS 9/10 | 1 HOURS 42 MINUTES 36 SECS | 102 min | 1 HOURS 42 MINUTES 22 SECS | 102 min | NONE | 1 HOURS 39 MINUTES 25 SECS | No Tuning | N/A
CTAS Table Aggr. 11 | 2 HOURS 26 MINUTES 58 SECS | 147 min | 1 HOURS 29 MINUTES 53 SECS | 90 min | 49% | 1 HOURS 24 MINUTES 42 SECS | No Tuning | N/A
CTAS Aggr. Table 12 | 7 HOURS 24 MINUTES 44 SECS | 445 min | 6 HOURS 7 MINUTES 48 SECS | 368 min | 18% | 6 HOURS 6 MINUTES 40 SECS | No Tuning | N/A
CTAS Aggr. Table 13 | 6 HOURS 47 MINUTES 31 SECS | 408 min | 4 HOURS 38 MINUTES 1 SECS | 278 min | 32% | 5 HOURS 5 MINUTES 32 SECS | No Tuning | N/A
CTAS Aggr. Table 14 | 25 HOURS 23 MINUTES 32 SECS | 1524 min | 10 HOURS 9 MINUTES 51 SECS | 610 min | 3 Times Perf. | 5 HOURS 27 MINUTES 17 SECS | 327 min | 5 Times Perf.
CTAS Aggr. Table 15 | 1 HOURS 21 MINUTES 59 SECS | 82 min | 0 HOURS 22 MINUTES 49 SECS | 23 min | 65% | 0 HOURS 4 MINUTES 33 SECS | 4 min | 20 Times Perf.
Update to Table 13 | 0 HOURS 12 MINUTES 45 SECS | 13 min | 0 HOURS 49 MINUTES 58 SECS | 50 min | 3 Times LOSS!! | 0 HOURS 1 MINUTES 22 SECS | 1 min | 9 Times Perf.

Figure 2
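
For anyone wanting to run the same comparison on their own build times, the arithmetic can be sketched as below. This is an assumption about a reasonable way to express the gains (speedup as old time divided by new time, reduction as the share of elapsed time saved); the helper names are hypothetical, and since the figures in the table were rounded by hand, not every row will reproduce exactly.

```shell
#!/bin/sh
# speedup OLD_MIN NEW_MIN: how many times faster the new run is.
speedup() {
    awk -v o="$1" -v n="$2" 'BEGIN { printf "%.1f\n", o / n }'
}

# pct_reduction OLD_MIN NEW_MIN: percentage of elapsed time saved.
pct_reduction() {
    awk -v o="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (o - n) / o * 100 }'
}

# CTAS Table 2: 263 min on standard disk vs. 16 min on SSD without tuning
speedup 263 16          # prints 16.4
# CTAS Aggr. Table 4: 161 min on standard disk vs. 92 min on SSD without tuning
pct_reduction 161 92    # prints 42.9
```

Using one consistent formula across every step makes it easier to see which steps gained from the hardware alone and which only improved after tuning.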

I must note that our unresolved challenges were waits on CPU due to hyper-threaded CPU issues.

The tuning behind the elapsed times in the SSD With Tuning column involved the following:

  • Bind variable additions
  • Literal additions where bind peeking was an issue.
  • A change from ASSM, (Automatic Segment Space Management) to manual segment space management where freelists could be set at the object level, (dynamically allocated freelists were not able to adjust quickly enough for some of the load processes…)
  • Changes to initial transactions, percent free and parallel that made sense, (upping it for some, downgrading it for others that did not work with the partitioning or a need for partitioning…)

Inspecting I/O by SQL_ID

This script (adapted from Tim Gorman’s sqlhistory.sql, from www.evdbt.com) does a wonderful job of pulling a clean, clear picture of what physical and logical I/O is occurring for a single SQL_ID, seen here in Table 3:

Table 3

+--------------------------------------------------------------------------------------------------+
Plan HV     Min Snap  Max Snap  Execs       LIO            PIO            CPU         Elapsed
+--------------------------------------------------------------------------------------------------+
1766271350  659       659       1           593,134,283    12,961,814     14,657.45   15,067.05
+--------------------------------------------------------------------------------------------------+
========== PHV = 1766271350==========
First seen from "07/15/11 13:00:31" (snap #659)
Last seen from  "07/15/11 13:00:31" (snap #659)
Execs          LIO            PIO            CPU            Elapsed
=====          ===            ===            ===            =======
1              593,134,283    12,961,814     14,657.45      15,067.05
Plan hash value: 1766271350

 

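| Id | Operation                     | Name     | Rows | Bytes | TempSpc | Cost (%CPU) | Pstart | Pstop | TQ    | IN-OUT | PQ Distrib  |
|  0 | CREATE TABLE STATEMENT        |          |      |       |         | 1543M(100)  |        |       |       |        |             |
|  1 |  PX COORDINATOR               |          |      |       |         |             |        |       |       |        |             |
|  2 |   PX SEND QC (RANDOM)         | :TQ10001 | 464M | 397G  |         | 4128K  (7)  |        |       | Q1,01 | P->S   | QC (RAND)   |
|  3 |    LOAD AS SELECT             |          |      |       |         |             |        |       | Q1,01 | PCWP   |             |
|  4 |     PX RECEIVE                |          | 464M | 397G  |         | 4128K  (7)  |        |       | Q1,01 | PCWP   |             |
|  5 |      PX SEND RANDOM LOCAL     | :TQ10000 | 464M | 397G  |         | 4128K  (7)  |        |       | Q1,00 | P->P   | RANDOM LOCA |
|  6 |       PX PARTITION LIST ALL   |          | 464M | 397G  |         | 4128K  (7)  | 1      | 1000  | Q1,00 | PCWC   |             |
|  7 |        HASH JOIN RIGHT OUTER  |          | 464M | 397G  | 14G     | 4128K  (7)  |        |       | Q1,00 | PCWP   |             |
|  8 |         TABLE ACCESS FULL     | HDN_TBL  | 231M | 112G  |         | 576K (22)   | 1      | 1000  | Q1,00 | PCWP   |             |
|  9 |         HASH JOIN RIGHT OUTER |          | 464M | 171G  | 6967M   | 1551K  (7)  |        |       | Q1,00 | PCWP   |             |
| 10 |          TABLE ACCESS FULL    | HD_TBL   | 310M | 50G   |         | 144K (34)   | 1      | 1000  | Q1,00 | PCWP   |             |
| 11 |          TABLE ACCESS FULL    | H_TBL    | 464M | 95G   |         | 339K (13)   | 1      | 1000  | Q1,00 | PCWP   |             |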

 

 

                         Summary Execution Statistics Over Time
                                                                 Avg              Avg
Snapshot                       Avg LIO          Avg PIO   CPU (secs)   Elapsed (secs)
Time          Execs           Per Exec         Per Exec     Per Exec         Per Exec
------------ ------ ------------------ ---------------- ------------ ----------------
15-JUL 13:00      1     593,134,283.00    12,961,814.00    14,657.45        15,067.05
             ------ ------------------ ---------------- ------------ ----------------
avg                     593,134,283.00    12,961,814.00    14,657.45        15,067.05
sum               1

                         Per-Plan Execution Statistics Over Time
                                                                            Avg              Avg
      Plan Snapshot                       Avg LIO          Avg PIO   CPU (secs)   Elapsed (secs)
Hash Value Time          Execs           Per Exec         Per Exec     Per Exec         Per Exec
---------- ------------ ------ ------------------ ---------------- ------------ ----------------
1766271350 15-JUL 13:00      1     593,134,283.00    12,961,814.00    14,657.45        15,067.05
**********              ------ ------------------ ---------------- ------------ ----------------
avg                                593,134,283.00    12,961,814.00    14,657.45        15,067.05
sum                          1

+--------------------------------------------------------------------------------------------------+

This report clearly shows the amount of logical vs. physical I/O coming from the statement in question.  This gives the DBA a clear indicator of whether any object in the poorly performing process would benefit from a move to SSD or whether tuning is in order to eliminate the I/O performance challenge.  A combination of both may be chosen, as there are multiple right outer hash joins which clearly show as the performance hit in the time elapsed and in the temp tablespace usage/significant I/O categories (note that the process needs to scan ALL the partitions for the objects in question…).

SSD and Forced Hash Joins on Indexes

When a database design is impacted by the front-end tool required to present data in a proper format, such as Business Analytics Software, the price can be high to the DBA who has to manage resource usage.  Many times the data must be presented in a very flat, wide format and requires a large amount of data pulled across a network interface.  This can be anywhere from a couple hundred GB to multiple terabytes.  When you are the DBA looking at ways to increase performance when logical performance tuning is limited, solid state disk can offer you gains not offered anywhere else.

Business Analytics Software will often query a few 100GB to 1TB objects, hash join them and then perform an order by.  For the DBA, creating an index and then using a hint to force a hash join between the index and the large table can improve performance greatly, and moving the index onto SSD can speed up the hash join while limiting the amount of SSD required.

create table    new_ordertmp_tbl  compress pctfree 0 tablespace data_1 as
SELECT /*+ USE_HASH(t,i) INDEX_FFS(i,I_TBL2_IDX) INDEX(t,CT1) */
cast(MOD(t.i_id, 1000) as number(3)) im_key
, LEAST(ROUND(MONTHS_BETWEEN(:b1,  t.t_dt) + .4999 ), 48) AS r_key , t.i_id AS ib_id
, t.m_id, t.t_dt, cast(:b5 as varchar2(5)) m_cd, FIRST_VALUE(i.ib_id) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS ibcid, t.t_nbr, cast(TO_NUMBER(TO_CHAR(FIRST_VALUE(t.t_dt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
), 'YYYYMMDD')) as number(8)) AS d_id,
FIRST_VALUE(DECODE(t.oct_cd, NULL, 'O','W', 'O', 'E', 'O', 'R', 'R', 'F')
) OVER(PARTITION BY t.d_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS tct_cd, SUM(t.ot_amt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt) AS ot_amt
, FIRST_VALUE(NVL(t.pmt_cd, 'U')) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS pmt_cd, SUM(t.i_cnt) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt) AS i_cnt
, FIRST_VALUE(t.cs_cd IGNORE NULLS) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS cs_cd, FIRST_VALUE(t.cc_cd IGNORE NULLS) OVER(
PARTITION BY t.i_id, t.m_id, t.t_nbr,t.t_dt ORDER BY t.t_dt ASC
) AS cc_cd, t.oct_cd
FROM CT_TBL1 t, I_TBL2 i
WHERE t.m_id = :b5
AND t.t_dt BETWEEN  :b1  AND  :b2 AND i.ibid = t.i_id
order by i.i_id; 

Object Sizes:

CT_TBL1, partition 7= 800GB

I_TBL2=1.2TB

The I_TBL2_IDX index, created on I_TBL2 with only the columns required for this routinely run query and leading with the I_ID column, is only 200GB.

Execution Plan for Query:

Table 4

Description                           Object       Cost    Cardinality  Bytes     PartitionID
SELECT STATEMENT, GOAL = ALL_ROWS                  107587  16356015     10079496
 WINDOW SORT                                       107587  16356015     10079496
  WINDOW BUFFER                                    107587  16356015     10079496
   WINDOW SORT                                     107587  16356015     10079496
    WINDOW SORT                                    107587  16356015     10079496
     FILTER
      HASH JOIN                                    107371  16356015     10079496
       PARTITION LIST SINGLE                       330     16356015     8166868   7
        TABLE ACCESS FULL             CT_TBL1      330     16356015     8166868   7
       INDEX FAST FULL SCAN           I_TBL2_IDX   23597   6399400008   2120000

The hash join input is thus decreased to a total size of about 1TB, vs. the much larger size it would have been had the hash join been run against the table itself.  With the index residing on solid state disk, the CTAS in question created the table about 12 times faster.
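The arithmetic behind that reduction can be sketched quickly.  The snippet below is only an illustration in Python, using the object sizes listed above and treating 1.2TB as 1200GB; the function name is made up for the example, and it ignores everything a real hash join does beyond reading its two inputs.

```python
# Object sizes quoted in the post, in GB (1.2TB treated as 1200GB for simplicity).
CT_TBL1_PARTITION_GB = 800
I_TBL2_TABLE_GB = 1200
I_TBL2_IDX_GB = 200


def join_input_gb(left_gb, right_gb):
    """Total amount of data the hash join has to read from both of its inputs."""
    return left_gb + right_gb


# Joining the partition to the full table vs. to the slimmer index.
against_table = join_input_gb(CT_TBL1_PARTITION_GB, I_TBL2_TABLE_GB)
against_index = join_input_gb(CT_TBL1_PARTITION_GB, I_TBL2_IDX_GB)

print(f"partition + table: {against_table} GB")
print(f"partition + index: {against_index} GB")
print(f"SSD needed for the index alone: {I_TBL2_IDX_GB} GB")
```

Only the 200GB index has to live on SSD, which is the point of forcing the join through the index rather than moving the whole 1.2TB table.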

What does the IO look like on the solid state disk vs. the old standard disk?  The differences are startling when viewed through iostat, (table 5).

Table 5

Device:      rsec/s    wsec/s  avgqu-sz  %util
Raid 5 Disk  55200     30224   215.72    84.03
SSD          52394.67  41306   223.74    7.49

As you can see, the I/O has far less impact on the SSD than on the standard disk: at comparable read and write rates, the SSD is under 8% utilized while the RAID 5 disk is at 84%.
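One rough way to put a number on that difference is to ask how many sectors per second each device moves for every percentage point of utilization.  This is a back-of-the-envelope metric invented for this sketch, not something iostat reports, and the class and method names are hypothetical; the figures are the ones from table 5.

```python
from dataclasses import dataclass


@dataclass
class IostatRow:
    device: str
    rsec_s: float    # sectors read per second
    wsec_s: float    # sectors written per second
    avgqu_sz: float  # average request queue length
    util_pct: float  # %util as reported by iostat

    def sectors_per_util_point(self):
        """Read + write sectors per second for each 1% of device utilization."""
        return (self.rsec_s + self.wsec_s) / self.util_pct


# Values copied from table 5.
raid5 = IostatRow("Raid 5 Disk", 55200, 30224, 215.72, 84.03)
ssd = IostatRow("SSD", 52394.67, 41306, 223.74, 7.49)

ratio = ssd.sectors_per_util_point() / raid5.sectors_per_util_point()
print(f"{raid5.device}: {raid5.sectors_per_util_point():.0f} sectors/s per %util")
print(f"{ssd.device}: {ssd.sectors_per_util_point():.0f} sectors/s per %util")
print(f"SSD does roughly {ratio:.0f}x the work per point of utilization")
```

Treat the ratio as a rough indicator only, since %util is a coarse measure of how busy a device really is.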

Graphs, such as those from Cacti, show the difference in I/O throughput between standard disk (figure 3) and solid state disk (figure 4).

Figure 3: Cacti I/O throughput graph, standard disk

Figure 4: Cacti I/O throughput graph, solid state disk

Summary

Solid state disk is here to stay and will often be seen as a “silver bullet” for production I/O issues.  The goal of the DBA is to use this technology in a way that does not replace logical tuning, and instead to focus on changes that combine physical and logical tuning to get the most out of the new hardware available on the market today.

Nov 22 2011

Restarting a Duplicate Process From a VERY Failed State

Category: DB ChaosKellyn Pot'Vin @ 8:38 pm

As part of an 11g upgrade, it was found that a database environment could be built from one of the upgraded databases through a duplicate.  As this process had never been performed in this fashion before, a test was in order.  The test was an excellent chance to discover that the OS user that performs the duplicate process was the proud owner of a .kshrc file with hard-coded Oracle variables set, which is an excellent choice if you want to really screw up a new duplicate database… :)

Scenario:

1. Duplicate has started with correct variables set.

2.  Subsequent shell scripts then “upset” the environment mid-process, leaving a failed duplicate with ASM files for the controlfiles created, but the DBID and dbname still set to the target database, not the auxiliary.

3.  After the failure, the auxiliary (duplicate) database can only be mounted, not opened.

After the duplicate failure, set your environment to the auxiliary database being built; you will see in the spfile that the name of the db is no longer the auxiliary database.
1.  Shut down the auxiliary database:
[oracle@dbs]$ sqlplus '/ as sysdba'
SQL> shutdown;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
In a second screen, set to the +ASM instance for the auxiliary, use the asmcmd console to remove the controlfiles that were created under the target db’s name:
[oracle@dbs]$ asmcmd
ASMCMD> ls
DATA_H/
DATA_RAM_H/
ASMCMD> cd DATA_H
ASMCMD> ls
DB_H/
ASMCMD> cd DB_H
ASMCMD> ls
CONTROLFILE/
ASMCMD> cd CONTROLFILE
ASMCMD> ls
control1.ctl <–these are control files for auxiliary, stuck with dbname of target, can’t be renamed, can’t mount db!
control2.ctl
control3.ctl
current.389.766743537
current.445.766743537
current.500.766743537
ASMCMD> rm control*
You may delete multiple files and/or directories.
Are you sure? (y/n) y
ASMCMD> quit
Back on your original screen, restart the auxiliary with a pfile set to the correct dbname:
SQL>  startup nomount pfile='/u01/app/oracle/product/11.2.0/dbhome_2/dbs/initdb_h.ora'
ORACLE instance started.
Total System Global Area 7.6964E+10 bytes
Fixed Size                  2215704 bytes
Variable Size            3.0065E+10 bytes
Database Buffers         4.6708E+10 bytes
Redo Buffers              189513728 bytes
SQL> create spfile from pfile;
File created.
SQL> shutdown;
ORACLE instance shut down.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 7.6964E+10 bytes
Fixed Size                  2215704 bytes
Variable Size            3.0065E+10 bytes
Database Buffers         4.6708E+10 bytes
Redo Buffers              189513728 bytes
Now you can restart the duplicate process and the database will again be recognized correctly.
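Since the root cause here was a profile script quietly changing the Oracle variables mid-process, a cheap guard is to re-check the environment before each step of the duplicate.  The sketch below is in Python rather than the shell the scripts were written in, and everything in it is hypothetical: the function name, the SID `db_h` (guessed from the pfile name above) and the idea of refusing to continue are illustrations, not what was actually run.

```python
import os

# Hypothetical expected values for the auxiliary (duplicate) database.
# The SID is an assumption inferred from the pfile name; adjust for your system.
EXPECTED = {
    "ORACLE_SID": "db_h",
    "ORACLE_HOME": "/u01/app/oracle/product/11.2.0/dbhome_2",
}


def environment_drift(expected, env=None):
    """Return {name: (expected, actual)} for every variable that differs."""
    env = os.environ if env is None else env
    return {
        name: (want, env.get(name))
        for name, want in expected.items()
        if env.get(name) != want
    }


# Example: a profile script has silently pointed the session at the target database.
polluted = {"ORACLE_SID": "prod", "ORACLE_HOME": EXPECTED["ORACLE_HOME"]}
drift = environment_drift(EXPECTED, polluted)
if drift:
    print("Refusing to continue, environment changed:", drift)
```

Called with no second argument, the function checks the real process environment, so it can sit at the top of each script in the chain.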


Nov 22 2011

How to Break an EM12c and Other Hobbies

Category: DB ChaosKellyn Pot'Vin @ 8:25 pm

As many know, I’ve been busy migrating our many-times-patched 10g Enterprise Manager to a new server running EM12c.  I thought it would be interesting to see how fast I could break it; considering my skills, I knew it might not be a challenge.

Scenario

1.  Bug with listener refusing to connect dynamically to EM12c repository database.

2.  After releasing code to the database for advanced notification, the SYSMAN.MGMT_ADMIN_DATA package, which is used to manage much of the repository at the command line, has gone invalid.

3.  The repository owner password has become corrupted.

These were my high level notes as I went through the troubleshooting:

Reconfigure Repository:
$AGENT_HOME/bin/emctl config oms -store_repos_details -repos_port 1521 -repos_sid emrep12c -repos_host host_nm -repos_user SYSMAN -repos_pwd password
Stopping Old Oracle OMS and configuring:
export ORACLE_HOME=/opt/oracle/app/OracleHomes/oms10g
cd $ORACLE_HOME/opmn/bin
./opmnctl stopall
cd $ORACLE_HOME/bin
./emctl config oms -change_repos_pwd
cd $ORACLE_HOME/opmn/bin
./opmnctl startall
Invalid package after loop from listener bug, corrupt SYSMAN password and invalid MGMT_XXX pkgs!
Can’t reset password, pkg used for it is invalid!
ORA-04063: package body "SYSMAN.MGMT_ADMIN_DATA" has errors
ORA-06508: PL/SQL: could not find program unit being called: "SYSMAN.MGMT_ADMIN_DATA"
ORA-06512: at line 1
Can’t start repository, password error in logs:
Error occurred. Check the log /local/u01/app/oracle/product/12.1.0/gc_inst/em/EMGC_OMS1/sysman/log/secure.log
[oracle@vwgrid01 bin]$ ./emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0
Soooo close…but did I get far enough?, (Nope, failure…have to remove everything!!)
Can’t drop repository:
<Database ORACLE HOME>/bin/emca -deconfig dbcontrol db -repos drop -SYS_PWD sys_password -SYSMAN_PWD password
Can’t reconfigure through the system, either!
<Database ORACLE HOME>/bin/emca -deconfig dbcontrol db -repos drop -cluster -SYS_PWD sys_password -SYSMAN_PWD password
No chance either…I’m stuck in a terrible loop!
Solution-
Back up and then edit the 12c environment out of the inventory.xml file in the ContentsXML dir of the Oracle inventory.
Kill any processes that are still running out of the EM12c home.
rm -rf the 12c home directory.
Uninstall the emrep12 database in the 11g home.
Create a new emrep12 database in the 11g home.
Recreate the inventory file:
$ORACLE_HOME/oui/bin/runInstaller -silent -invPtrLoc "/u01/app/oraInventory/oraInst.loc" -attachHome ORACLE_HOME="/u01/app/oracle/product/11.2.0" ORACLE_HOME_NAME="OraDb11g_home2"
Install EM12c once again, as now the installation appears to have never existed…
If the installation needs to be restarted, look in the home it fails on (*saying that it’s already installed there), remove the *_temp file from the dir, then try again.
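The inventory edit in the first step was done by hand, but the idea can be sketched in code.  The Python below works on a cut-down, made-up sample of inventory.xml (a real file has more elements, and the 12c home name and path here are assumptions for illustration) and simply drops every HOME entry whose location sits under a given directory.  The function name is hypothetical, and you would still take a backup of the real file first.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative stand-in for ContentsXML/inventory.xml.
# The 12c home name and location are invented for this example.
SAMPLE_INVENTORY = """<INVENTORY>
  <HOME_LIST>
    <HOME NAME="OraDb11g_home2" LOC="/u01/app/oracle/product/11.2.0" TYPE="O" IDX="1"/>
    <HOME NAME="oms12c1" LOC="/u01/app/oracle/product/12.1.0/oms" TYPE="O" IDX="2"/>
  </HOME_LIST>
</INVENTORY>"""


def drop_homes_under(inventory_xml, path_prefix):
    """Remove HOME entries located under path_prefix.

    Returns the edited XML text and the names of the homes that were removed.
    """
    root = ET.fromstring(inventory_xml)
    removed = []
    for home_list in list(root.iter("HOME_LIST")):
        # Copy the child list so removing entries does not disturb the loop.
        for home in list(home_list.findall("HOME")):
            if home.get("LOC", "").startswith(path_prefix):
                removed.append(home.get("NAME"))
                home_list.remove(home)
    return ET.tostring(root, encoding="unicode"), removed


new_xml, removed = drop_homes_under(SAMPLE_INVENTORY, "/u01/app/oracle/product/12.1.0")
print("removed homes:", removed)
print(new_xml)
```

Matching on the home's location rather than its name keeps the 11g database home untouched, which is what the later attachHome step relies on.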


Oct 20 2011

The NO_INVALIDATE Option in DBMS_STATS with 10g

Category: DB ChaosKellyn Pot'Vin @ 1:51 pm

I had worked hard on a report, 47 SQL statements in all, to tune it down from 5 hrs to under 30 minutes.  The first runs had been quite successful, so when a third run sent an alert on temp usage, I knew something was wrong.

     SID PROCESS      MACHINE  SQL_TEXT    SQL_ID         TOTAL MB
-------- ------------ -------- ----------- -------------- --------
     507 1137         HOST     SELECT ***  7t3muww36xhzn     45516
     600 1139         HOST     SELECT ***  7t3muww36xhzn     45516
     525 1132         HOST     SELECT ***  7t3muww36xhzn     45516
     509 1135         HOST     SELECT ***  7t3muww36xhzn     45516

I checked the stats first, as one of the fixes was to ensure the staging tables in this process were collecting stats after the initial feeds came in, but both tables involved showed valid statistics:

SQL> select num_rows, last_analyzed from dba_tab_partitions
  2  where table_name='<I_STAGE>'
  3  and partition_name='P170';

  NUM_ROWS LAST_ANAL
---------- ---------
 480900000 17-OCT-11
SQL> select last_analyzed from dba_tables
  2  where table_name='<SML_TBL>';

LAST_ANAL
---------
17-OCT-11

I ran a quick AWR report for the specific SQL_ID to see what I was dealing with, execution plan wise..


              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     46198 18-Oct-11 09:00:09       282       7.9
  End Snap:     46200 18-Oct-11 10:00:13       245       8.4
   Elapsed:               60.07 (mins)
   DB Time:            1,874.62 (mins)

 

SQL ID: 7t3muww36xhzn           DB/Inst: PRODUCTION/PROD  Snaps: 46198-46200
-> 1st Capture and Last Capture Snap IDs
   refer to Snapshot IDs witin the snapshot range
-> SELECT ***

    Plan Hash           Total Elapsed                 1st Capture   Last Capture
#   Value                    Time(ms)    Executions       Snap ID        Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1   324636810               4,785,428             4         46199          46199
2   4097803110                  1,047             1         46200          46200
          -------------------------------------------------------------

Plan 1(PHV: 324636810)
----------------------

Plan Statistics                 DB/Inst: PRODBASE/prodbase  Snaps: 46198-46200
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100

Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                         4,785,428    1,196,357.0     4.3
CPU Time (ms)                             3,450,070      862,517.6     8.4
Executions                                        4            N/A     N/A
Buffer Gets                                 759,453      189,863.3     0.0
Disk Reads                                  683,619      170,904.8     2.8
Parse Calls                                      35            8.8     0.0
Rows                                              0            0.0     N/A
Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |     9 |       |       |        |      |
|   1 |  COUNT STOPKEY                |             |       |       |       |       |       |        |      |
|   2 |   PX COORDINATOR              |             |       |       |       |       |       |        |      |
|   3 |    PX SEND QC (ORDER)         | :TQ10002    |     1 |   240 |     9 |       |       |  Q1,02 | P->S | QC (ORDER)
|   4 |     VIEW                      |             |     1 |   240 |     9 |       |       |  Q1,02 | PCWP |
|   5 |      SORT ORDER BY STOPKEY    |             |     1 |   120 |     9 |       |       |  Q1,02 | PCWP |
|   6 |       PX RECEIVE              |             |     1 |   240 |       |       |       |  Q1,02 | PCWP |
|   7 |        PX SEND RANGE          | :TQ10001    |     1 |   240 |       |       |       |  Q1,01 | P->P | RANGE
|   8 |         SORT ORDER BY STOPKEY |             |     1 |   240 |       |       |       |  Q1,01 | PCWP |
|   9 |          HASH JOIN            |             |     1 |   120 |     5 |       |       |  Q1,01 | PCWP |
|  10 |           PX RECEIVE          |             |     1 |   109 |     3 |       |       |  Q1,01 | PCWP |
|  11 |            PX SEND BROADCAST  | :TQ10000    |     1 |   109 |     3 |       |       |  Q1,00 | P->P | BROADCAST
|  12 |             PX BLOCK ITERATOR |             |     1 |   109 |     3 |   KEY |   KEY |  Q1,00 | PCWC |
|  13 |              TABLE ACCESS FULL| I_STAGE     |     1 |   109 |     3 |   KEY |   KEY |  Q1,00 | PCWP |
|  14 |           PX BLOCK ITERATOR   |             |  5002 | 55022 |     2 |       |       |  Q1,01 | PCWC |
|  15 |            TABLE ACCESS FULL  | SML_TBL     |  5002 | 55022 |     2 |       |       |  Q1,01 | PCWP |
------------------------------------------------------------------------------------------------------------------------

This is the one that was eating up all the temp!  Note that even though I had checked the stats, and they were current as of the previous day with no changes to the partition stats, the execution plan estimates only one row for I_STAGE.  As anyone who listens to Maria Colgan knows, that’s just Oracle giving you the benefit of the doubt and saying, “I don’t think there are any rows in this object (or sub-object in this case), but I’ll give you 1 row for the fun of it!”

The second execution plan in the report is the one I wanted:

Plan 2(PHV: 4097803110)
-----------------------

Plan Statistics                 DB/Inst: PRODUCTION/PROD  Snaps: 46198-46200
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100

Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                             1,047        1,046.7     0.0
CPU Time (ms)                                   967          967.0     0.0
Executions                                        1            N/A     N/A
Buffer Gets                                   2,007        2,007.0     0.0
Disk Reads                                        3            3.0     0.0
Parse Calls                                       9            9.0     0.0
Rows                                             40           40.0     N/A
User I/O Wait Time (ms)                           1            N/A     N/A
Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |    TQ  |IN-OUT| PQ
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |       |  2834 |       |       |        |      |
|   1 |  COUNT STOPKEY                |             |       |       |       |       |       |       |        |      |
|   2 |   PX COORDINATOR              |             |       |       |       |       |       |       |        |      |
|   3 |    PX SEND QC (ORDER)         | :TQ10002    |  1272K|   291M|       |  2834 |       |       |  Q1,02 | P->S | QC
|   4 |     VIEW                      |             |  1272K|   291M|       |  2834 |       |       |  Q1,02 | PCWP |
|   5 |      SORT ORDER BY STOPKEY    |             |  1272K|   152M|   389M|  2834 |       |       |  Q1,02 | PCWP |
|   6 |       PX RECEIVE              |             |    40 |  9600 |       |       |       |       |  Q1,02 | PCWP |
|   7 |        PX SEND RANGE          | :TQ10001    |    40 |  9600 |       |       |       |       |  Q1,01 | P->P | RA
|   8 |         SORT ORDER BY STOPKEY |             |    40 |  9600 |       |       |       |       |  Q1,01 | PCWP |
|   9 |          HASH JOIN            |             |  1272K|   152M|       |     7 |       |       |  Q1,01 | PCWP |
|  10 |           PX RECEIVE          |             |  5005 | 55055 |       |     2 |       |       |  Q1,01 | PCWP |
|  11 |            PX SEND BROADCAST  | :TQ10000    |  5005 | 55055 |       |     2 |       |       |  Q1,00 | P->P | BR
|  12 |             PX BLOCK ITERATOR |             |  5005 | 55055 |       |     2 |       |       |  Q1,00 | PCWC |
|  13 |              TABLE ACCESS FULL| SML_TBL     |  5005 | 55055 |       |     2 |       |       |  Q1,00 | PCWP |
|  14 |           PX BLOCK ITERATOR   |             |  1271K|   139M|       |     4 |   KEY |   KEY |  Q1,01 | PCWC |
|  15 |            TABLE ACCESS FULL  | I_STAGE     |  1271K|   139M|       |     4 |   KEY |   KEY |  Q1,01 | PCWP |
------------------------------------------------------------------------------------------------------------------------
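To see just how far apart the two plans are, it helps to put the per-execution numbers side by side.  The snippet below is only a quick illustration in Python with the figures copied from the two Plan Statistics sections above; the dictionary layout and function name are made up for the example.

```python
# Figures copied from the two "Plan Statistics" sections of the AWR SQL report.
plans = {
    324636810: {"elapsed_ms": 4_785_428, "executions": 4, "disk_reads": 683_619},
    4097803110: {"elapsed_ms": 1_047, "executions": 1, "disk_reads": 3},
}


def per_execution(plan, stat):
    """Average value of a statistic per execution of the plan."""
    return plan[stat] / plan["executions"]


bad = per_execution(plans[324636810], "elapsed_ms")
good = per_execution(plans[4097803110], "elapsed_ms")
slowdown = bad / good

print(f"plan 324636810:  {bad / 60000:.1f} min per execution")
print(f"plan 4097803110: {good / 1000:.2f} s per execution")
print(f"slowdown: about {slowdown:.0f}x")
```

The same function works for disk reads or any other statistic in the dictionary, which makes it easy to confirm that the bad plan loses on every measure and not just elapsed time.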

So what changed? What impacted my statistics?

Upon investigation, I came to the conclusion that it is a combination of a “feature” with what I think is a bug in 10g dbms_stats.

A search of stats processing showed that during the one process that was executing against the P170 partition on the I_STAGE, there were a number of other partitions in this same table having stats gathered post loading.

declare
  v_stage_table_name varchar2(64);
begin
  select min(stage_table_name) into v_stage_table_name
    from stage_tables
   where stage_table_type_cd = '<I_STAGE>';
  dbms_stats.gather_table_stats(
    ownname => 'dw_user', tabname => v_stage_table_name, partname => 'P450',
    estimate_percent => .01, granularity => 'PARTITION',
    method_opt => 'for all columns size 1',
    no_invalidate => false, cascade => false, degree => 4);
end;

Now the key here in the statement above is:

no_invalidate=>false

If you read the description for this from Oracle:

no_invalidate Does not invalidate the dependent cursors if set to TRUE. The procedure invalidates the dependent cursors immediately if set to FALSE. Use DBMS_STATS.AUTO_INVALIDATE. to have Oracle decide when to invalidate dependent cursors. This is the default. The default can be changed using the SET_PARAM Procedure.

The surmised bug is one where even though the dbms_stats being performed by another process is partition level, the invalidation of the cursors is across all partitions in the object, causing them all to be invalidated, requiring them to re-parse the SQL.  (There are a number of similar bugs already documented in 10.2.0.4.0 for partition level statistics gathering…)

The feature that allows Oracle to re-parse and take advantage of the newest statistics in the data dictionary resulted in poor performance in this instance, as the cursors were invalidated for a process whose statistics had not changed.

I tested repeatedly against partitions, collecting stats with no_invalidate set to false, true and even AUTO to see what would occur, and it consistently impacted my cursors against other partitions.  I can find no documented bug but, as many know, I’m about to move everything to 11g in short order and expect it would be a waste of time to pursue it too far…

I asked the Java developer who owns this code to change the call to no_invalidate=>true to correct the performance impact in the short term, and I look forward to 11g bugs replacing my exhaustion with 10g ones! :)


Oct 09 2011

Oracle Open World 2011 Followup

Category: DBA LifeKellyn @ 8:51 pm

Oracle Open World is over for me now, but what a great event it was.  I met so many people and was actually able to spend a little time getting to know a few of them.  I attended parties, dinners and meetups.  I networked myself, my company and RMOUG.  It was a phenomenal turnout, even with a few folks missing that I knew we’d miss terribly.

Arriving-  California Zephyr

If you hadn’t heard, we decided to take an Amtrak train, the California Zephyr, from Denver, CO to San Francisco, CA.  This was a 33 hour train ride, slowly trekking at times, racing traffic at others, through beautiful scenery, all from a second floor sleeper car.  We had lovely meals with folks in the dinner lounge car and visited with others on the observation deck (a specialized car with a glass, domed ceiling).

Upon our arrival in San Fran, we quickly rented a car and headed up to a lovely party at Oracle’s one and only, Graham Woods.  I quickly found Gwen Shapira (we’ve been trying to meet for a while now, so I was glad to FINALLY have the opportunity to meet…), caught up with Alex Gorbachev and was teased by Cary Millsap about our trouble understanding that we needed to PAY for our bottle of wine on the train and that it was not included in the price of the meal (glad they couldn’t figure out how to pronounce my name over the intercom, but we quickly figured out that it was me and my wonderful companion they were looking for; his name is not so difficult to pronounce!)

Debra Lilley arrived soon into the party and many found fun as she brought me from room to room in search of those she wanted to introduce me to.  I’m to ensure she has a great birthday at February’s RMOUG, so it’s important she and I bond, ya know… :)

I ended up in a lovely, embarrassing situation upon being introduced to “Greg Brown”, whom I had to ask repeatedly where I knew him from, which he found quite hilarious considering our emails, until it hit me that I was speaking to “Greg Rahn”.  He was a good sport about my lack of IQ after the long trip and I appreciate his patience.

Sunday, OOW11

The first day consisted of me attending a few of Tim Gorman’s sessions covering things that I, as a VLDB DBA, had lived, but had never really sat through before.  I still picked up a few things from my first DBA God and yes, the Gods are good to me.  At his second session, Tim pointed out Andy Klock and me to each other, knowing we’d been tweeting back and forth about meeting up, so we sat together, and it gave me an opportunity to physically meet one of the major clients I supported while at Pythian.  It was a pleasure to speak to her in person and I was glad to see someone not as in-depth in the database world revel in the festivities and presentations of Oracle Open World.

Sunday evening was the ACE dinner and attendance was fantastic, (along with the food!)  It was easy to see why no one sat at any one table for very long and I did get to spend a good amount of time speaking with Mark Bobak, Kent Graziano, Doug Burns, James Morle, along with many others.

Monday, OOW11

I didn’t do too well on my schedule builder for Monday or Tuesday and if I go back over the actual schedule of sessions, I’m sure I’ll find a few that I should have gone to.  Monday night was the Oak Table dinner, which was great fun.  We started out meeting up with Mike Swing and Craig Shalahammer for drinks before heading to the dinner.  Craig showed me some of his Mathematica graphics for buffers and latch visuals (yes, his is one of those sessions I obviously missed adding to my schedule!)  We spoke about databases a bit, but the conversations regarding life were much more interesting.

Upon heading over to the Oak Table dinner, there was a huge line of people waiting for taxis, but Mike Swing, Tim Gorman and I went up to the bellhop instead.  I’m not sure if it was Mike or Tim, but one of them asked if there was a better option and we had a personal SUV taking us to our dinner in just a few minutes for a few dollars more than a taxi would have cost (note to future OOW attendees…screw the lines! :) )

As soon as we entered for the dinner, the gracious Carol Dacko made sure we had our places and Mogens Norrsgard was busy entertaining everyone.  He and I quickly attained a quirky regard for each other, joking about Tim, “I saw him first!”, as the game of challenging each other for Tim’s affections commenced.

I was seated next to Jeremy Schneider, who I thoroughly enjoyed conversing with.  He’s a brilliant young DBA, so his company, along with Gwen Shapira, Robyn Sands, Tim Gorman, Andy Klock, Rihaj Shamsudeen and Alex Gorbachev guaranteed a lively conversation.  This was also my first opportunity to meet up with Yury Velikanov from Pythian. I’ve only worked with him virtually, so this was a great chance considering he resides in Australia.  He is technically skilled, easy-going and quick to make friends-  a great representative for the Pythian name, like Gwen and Andy.

Tuesday, OOW11

Tuesday was when most of us started feeling the heat from staying up too late and eliminating sleep from our diets.  My voice was starting to sound a bit hoarse at this point, so I’m sure folks were starting to wonder how well I had fought off my cold (not very well in the colder, wetter weather of San Francisco…)  We met up with Ben Boise from Quest Software and spent a bit of time at the Enkitec booth.  The Enkitec booth was, hands-down, the winner for me.  Kerry Osbourne had told me at Graham’s party, after I had finished teasing Frits Hoogland that he’d been given my copy of the Exadata book via Tanel, that if I came by, he’d have a copy for me.  Yeah, wasn’t turning that down… :)   So while there, I spoke to Kerry about what a great DBA and all around good guy Karl Arao was.  He’s succeeding there, and after the conversation I asked Kerry and Randy Johnson to sign my copy of the book.  I went back a bit later to talk with Karl a bit more and, for the fun of it, forced him to sign my copy, too.  Tanel had tweeted that he was going to have a secret Exadata hacking session that day, so I headed out with my book and was able to get Tanel’s signature, too…  No, none of you can have it… :)

On our way out of Moscone for the night, we spent about 20 minutes speaking with Jonathan Lewis.  Dr. Steve Dorsey and a guest joined Tim Gorman, Jonathan and me to complete the conversation about the evening’s plans.

We traveled down to The Stinking Rose for a wonderful dinner of wonderful dishes with way too much garlic in them.  I truly feared anyone who would come near us post the meal, but it was well worth it, (and apologies to anyone near us afterwards…)

Wednesday, OOW11

The day went quickly and the evening was the blogger meetup.  There was a break between networking and dropping off postcards promoting RMOUG 2012, when we were able to head over to Mogens’ office (i.e. the back of Chevy’s restaurant this year…), where everyone had been hanging out throughout each day when needing a break from the chaos.  Throughout the day, there was some conversation via Twitter on who was going to get my wristband, as we were bowing out of the concert/chaos that night.  First it appeared that Greg Rahn would need it, but I had already pointed him towards Mike Swing, who was offering him one, so Doug Burns was the lucky winner and new owner of my band.  The group at Chevy’s, as always, was fun to simply observe, let alone partake in conversation with, and how can you say no to Mogens?

The meetup was a quick hop and a jump over to Jillian’s where there were a number of private parties going on, but the bloggers meetup by Pythian was the top deal.  We were all given a bandana and sharpies to get each other’s signature, which I was a happy blogger to just go around and meet as many as I could.  I enjoyed writing “Kellyn was here” and pointing arrows to the Pythian logo on the bandana or as everyone was wearing them on their heads, it had humor all in itself, (note to self, another reason I’m glad no one decided to wear the bandana as a bustier…)

Paul Vallee did a lovely tribute to Steve Jobs at the beginning of the meetup.  At the high point, Pythian first gave away an Apple TV to a blogger who had posts from randomly chosen dates (it happened to be RMOUG Training Days week, so I didn’t even have to check, I KNEW I had posts out there.. LOL)  Yury won the TV and then they gave away an iPod Touch to the person who had received the most signatures.  I was sure I was nowhere near the top, but then got a look at the leader, Tim Hall’s bandana.  Upon counting mine up, I was two short of his number and he won, but Tim, the gracious guy he is, handed me the prize.  I, confused easily as I am, asked why I was getting it when I came in second, and he replied, “I [worked] around to get my signatures and you just got them while meeting everyone, I’m disqualifying myself!”  Thank you, Tim Hall, from me and my children, one of whom in particular has been jonesing for one of these! :)

After the meetup, Tim Gorman and I were going to head out for some dinner and Alex Gorbachev joined us at a wonderful Indian restaurant called Amber.  Wine and conversation flowed, while the fun and chaos of the Petty/Sting concert went on at Treasure Island.  When we did finally finish, it was just in time to meet everyone back over at the night’s bar of choice, “W”.  I sat and spoke most of the evening with Martin Paul Nash, Alex G. and Dan Norris (Mogens slept in the corner; those Danes and their catnaps to catch up on jetlag really impress me!)  Folks came in from the concert, Lisa Dobson, Connor McDonald, Andy Klock, Doug Burns and others, little by little.  We stayed and talked until my voice had become so hoarse that I was starting to sound a bit like Barry White.

Gotta say, another brilliant, easy-going and friendly DBA, Martin Paul Nash.  Between Martin, Andy, Jeremy, Connor and Dan, I’m feeling good about the future of our database administration world.

Thursday, OOW11

Surprise came the next morning when we found out many of the people we left the night before had never actually slept that night.  They continued to enjoy the opportunity to see folks that many may only see once a year and had simply stayed up!  A few of them were presenting on Thursday, so a lot of attendees may have wondered about that, too… :P

I attended only one session on Thursday, having slept in too late for the one I’d wanted to attend on optimal performance (and had to answer to Gwen and others as to why I wasn’t there… :) )  Maria Colgan was great (as usual) and she was one of the last folks I really wanted to meet, but I had resigned myself to the fact, given the group that crowded her immediately after the presentation, that it just wasn’t going to happen.  Tim and I went over to Chevy’s to have a last OOW11 lunch with Mogens’ group before heading to the airport and who shows up to have lunch there, too?  Yes, Maria Colgan, so I did get to meet her…AND have lunch with her (along with DBA Gods, Demi-Gods, you know the drill… :) )

During all of this, I did a lot of RMOUG networking to ensure that I added as much to the great plans for the 2012 conference as I could.  I was thrilled to have so many folks dedicated to coming out to Denver in February to talk (because the conference is second to Debra Lilley’s birthday, I swear the marketing is there!)  I had a lovely conversation about bringing RAC Attack out for Training Days this year, which I think will be well received.  Jeremy Schneider is in Africa the week of the conference, but we are working on others who can really take on this great opportunity for DBAs to take advantage of.

I wish I could say the plane ride back was relaxing and a wonderful time to reflect on a great Oracle Open World, but as usual, the airlines were busy trying to ruin travel for all of us.  I am thrilled with everyone I met while in San Francisco and although I should have attended more sessions, I wouldn’t have changed a thing.

Thank you Oracle, Pythian and all that I met this last week for such a wonderful experience!


Oct 04 2011

OOW11 Dinners

Category: DBA LifeKellyn Pot'Vin @ 12:27 pm

Yes, typing on my tablet screen again, so patience with my short posts…:-)

Had the pleasure of attending both the ACE and Oak Table dinners the last two nights.  Wonderful, impressive and technically gifted people at every table and a fantastic opportunity to meet so many that I’ve only known virtually.  I enjoyed another set of high energy conversations with Gwen Shapira, Debra Lilley, Robyn Sands and Lisa Dobson- all women who make me proud of the representatives of my gender in the technical world.

Spent some time with Mark Bobak, Kent G., Alex G., Craig S., Jeremy Schneider, Mike Swing and Yury V.  I was also thrilled to spend time with a virtual team member, Andy Klock- great guy to work with and happy to meet in person.  Carol Dacko did a phenomenal job planning the Oak Table event and I know Robyn Sands helped with some of the arrangements, too.  Mogens N. is beyond entertaining and his legend is intact another year.  I threatened to stalk Tanel unless he signed my Exadata book (thank you, thank you Kerry Osbourne for the copy…) and I am still missing many others that should be named here.  All made an impact and were a pleasure to meet.

Food and spirits pale in comparison to the wonderful opportunity these dinners offer us all to sit and speak with the peers we admire so much…

