Category: Oracle

The last couple weeks I’ve been lucky enough to get time with a new ZFSSA Simulator, (it will be out soon for everyone, so you’ll have to just play with the current one available, patience grasshopper! :)) and spent checking out the newest features available with Database as a Service, (DBaaS) Snapclone via Enteprise Manager 12c.  I’m really thrilled with Snapclone-  In two previous positions, I spent considerable time finding new ways of speeding up RMAN duplicates to ease the stress of weekly datamart builds that were sourced off of production and this feature would have been a lifesaver back then!

As Oracle’s DBaaS offering is so full featured, I think its easy to have misconceptions about the product or find blog posts and documentation on earlier releases that lead to misconceptions. Due to this, we thought it might help if I tried to dispel some of myths, letting more folks realize just how incredible Snapclone really is!

Misconception #1- Snapclone only works with the ZFS Storage Appliance

DBaaS does offer a fantastic hardware solution that requires a hardware NAS like Netapp and ZFS/ZFSSA, but that’s not the only option.  There is also the software solution which  can work on ANY storage.  There’s no requirement for the test master database, (used to track changes and save considerable space vs. a traditional cloning method…) on where it must reside and that means it can be on different storage than production.

There are benefits to both the hardware and software solutions of Snapclone.  Keep in mind, Oracle prefers to support hardware and software that are engineered to work together, but they realize that not all customers will have a ZFS or NetApp Solution in place, so they ensure that Snapclone solutions are available to all storage a customer may have in their shop.  Some of those benefits and requirements are listed below, but you can clearly see, Snapclone does not have a requirements of ZFS or NetApp:


Misconception #2-  Snapclone Requires a Cloud Implementation

It is true that Snapclone requires database pools to be configured and built, but this is in the environment’s best interest to ensure that it is properly governed placement is enforced. This feature is separate from an actual “cloud” implementation and the two shouldn’t be confused.  There are often dual definitions for terms and cloud is no different.  We have private cloud, public cloud, the over-all  term for cloud technology and then we have database pools that are part of Snapclone and those are configured in the cloud settings of DBaaS.  They should not be confused with having to implement “cloud”.

Misconception #3- Unable to Perform Continuous Refresh on Test Master Database

There are a couple easy ways to accomplish a refresh of the test master database used to clone from production.  Outside of the traditional profiles that would schedule a snapshot refresh, DBAs can set up active or physcial dataguard or can use storage replication technologies that they already have in place.  To see how these look from a high level, let’s first look at a traditional refresh of a test master database:



Now you’ll notice that the diagram states the test master is “regularly” refreshed with a current data set from production and if you inspect the diagram below, you will see an example of an example of a software or hardware refresh scenario to the test master database, (using datamasking and subsetting if required) and then creating Snap Clones.



Now as I said earlier, you can use a standby database, too, to perform a clone.  The following diagram shows the simplicity of a standby with Dataguard or Golden Gate.  Notice where Snap Clone takes over-  it’s at the Standby database tier, so you still get the benefit of the feature, but can utilize comfortable technology such as Dataguard or Golden Gate:


Misconception #4- Snapclone Can Only be Used with DB12c

Snapclone works with any supported version of the database 10gR2 to 12c. Per the DBaaS team of experts, it may work on earlier versions, they just haven’t certified it, so if there are any guinea pigs out there that want to test it out, we’d love to hear about your experience!

The process of Snapclone is very simple once the environment is set up.  With the Rapid Start option, it’s pretty much the “Easy button” for setting up the DBaaS environment and the process solidified once service templates are built and the service request has been submitted in the Self Service Portal.  There isn’t anymore confusion surrounding where an Oracle home installation should be performed or what prefix is used for database naming convention and other small issues that can end up costing an environment in unnecessary complexity later on.

Misconception #5- Snapclone doesn’t Support Exadata

A couple folks have asked me if Snapclone is suppored on Exadata and in truth, Exadata with Snapclone offers a unique opportunity with consolidations and creating a private cloud for the business.  I’ll go into it in depth in another blog post, as it deserves it’s own post, but the following diagram does offer a high level view of how Snapclone can offer a really cool option with Exadata:


There are so many features that are provided by Snapclone that its difficult to keep on top of everything, but trying to dispel the misconceptions is important so people don’t miss out on this impressive opportunity to save companies time, money and storage.  I know my whitepaper was over 40 pages on DBaaS and I only focused on NetApp and ZFS, so I do understand how easy it is, but hopefully this first post will get people investigating DBaaS options more!

This is the first of many posts I’ll do where we take a particular issue and trouble-shoot.  I’ll go through multiple scenarios to show the common steps to take via the Enterprise Manager 12c from discovery to identification to research to explanation.

This blog will use the following features:

  • ASH Analytics
  • SQL Details
  • ASH Report

The first thing we need is an issue-


Note the CPU red line-  the IO is considerably over that and quite a lot of IO, along with Commit and Concurrency waits.  We’ll start with this as our issue that we can see, has quickly escalated over our timeline, (15 minutes was requested from the view…)

Scanning down to the Top SQL, (by default) we can then see that this IO pressure is primarily caused by two-three SQL_ID during the timeline:


We can click on the first SQL_ID and display the insert statement.  We also note the escalation from 1pm to 1:15pm in the Statistics tab that shows the active sessions.


So I know a few of you are asking, “Why isn’t there any Plan Hash Value?  If it is a simple insert, with no subquery, etc. then would it have a Plan Hash Value?   Think about that as we inspect the next tab, Activity.


We are now inspecting just the wait events connected with THIS SQL_ID and the sessions that are involved with it.  On the right, we can quickly see that there is a huge percentage of cell single block physical read waits involved with this insert.  What does that tell you about the hardware that the insert is running on?  Yes, it’s part of an Exadata environment.

Scrolling down allows us to view the sessions involved in this timeline.  This is not just the sessions for the 15 minutes of escalation we are interested in.  These are ALL the sessions in the timeline shown above.  We can view from the top, .77% of the activity for the first one and note that it decreases as we go through the list.  Also notice that it’s 1-10 of 200 sessions total that have been captured during the total timeline.  We already know a majority of them are part of our escalation in IO on the system.


Inspecting the last 10 helps us see the decrease in IO resources as concurrency waits increased-


We are down to .39% by the time we get to the 200th session captured in this timeline.  Now getting back to our “Should we have a plan hash value for our insert statement?” question.  No, we shouldn’t and there isn’t a plan, nor is there plan control that will be found for this simple insert.  This is verified when you inspect the Plan and Plan Control tabs for this SQL_ID.


We then can look at the Tuning History.


You might wonder how could it be that this insert, that doesn’t have a plan or plan hash value is found in the tuning task?  If it is a top SQL that is deemed an impact to performance, then it will show up here.  There is no escape for this insert.

At this time, we can see that we are looking at a concurrency issue that is not coming from a user or from a specific session in any way.  So the next step is to get a clearer look at the 5+ minutes that lead up to the heavy IO to see what is really going on and eliminate assumptions.


I simply click on “Run ASH Report” in the SQL Details window, change the times to 1:10pm –> 1:16pm and notice that we are focused on this SQL_ID right still.

The report shows the standard header and high level info, but then once you dig in, you’ll see the top events:


We see that yes, it is almost 96% of the activity with an average of 21 active sessions running at a given time.


If you were to have traced this session, you’d see something similar to the following in the trace file to represent the data in the first three lines that we now see in the ASH report:

WAIT #3: nam='cell single block physical read' ela= 945 cellhash#=3983857215 diskhash#=3558547146 bytes=8192 
obj#=12894 tim=1912764532115123
WAIT #3: nam='cell single block physical read' ela= 483 cellhash#=1017296094 diskhash#=1180032005 bytes=8192 
obj#=46129 tim=1213898462783761
WAIT #3: nam='cell single block physical read' ela= 697 cellhash#=1017296094 diskhash#=3896957901 bytes=8192 
obj#=47883 tim=1158963811267355

Note that there are THREE cell single block physical reads for the insert event.  Why would we see this?  This is the one question to the readers of this blog-  What would cause the single block physical reads on these three indexes?



It’s not the table we are having an issue with but the indexes that are on the table.  Three indexes show up very clearly in our Top DB Objects section showing our wait event, cell single block physical read.  The ASH report gained this information from the obj#, just like you see in the trace file info in red up above that I displayed for you.  For each of those entries shown, a very similar percentage of the event is displayed farther up in the Top Event P1/P2/P3 Values.

So from here, I think we’ve inspected the SQL_ID pretty fully and we know why the problem is occurring.  There are three indexes that require a scan

The final answer we want is why is this heavy load occurring?  Well, ASH answers this for us, too-


Yeah, someone is TRYING to put a heavy load on the database with Swingbench and the active average sessions matches our active average sessions overall.   We’ll leave it as is-  no one wants to be responsible for killing someone’s load test… :)


There are a number of Verification Utilities for Enterprise Manager 12c, (EM12c) and I’ve written about them before, but today I’m going to start on the Repository Verification Utility, (REPVFY).  This will be an ongoing series, as there are so many valuable features rolled into the utility and new ones that will be added as new patches and releases happen.

For this post, I’m going to be reviewing a level 2 diagnostics report from a new repository database.  This is going to offer us all kinds of valuable data about the environment and should be considered by any EM12c administrator when doing a health check of their EM environment.

Inspecting the files from the output, you’ll note that they don’t require a lot of space, even when unzipped and this allows the admin to retain historical information about the EM12c environment.

Note that the files are broken up by type of review and marked with the timestamp for many of these files.


For this first blog post, I’m going to jump right into the first log alphabetically, the advisors log file, which as you might guess, is advising on performance.  The interesting thing is as small as this file is, it is actually comprised of a number of reports that one would be surprised how much work to run all of it manually.

The Advisors Log is comprised of the following reports:

  • ADDM Report
    • Space Advisor Report
    • ASH Report
  • AWR Report
  • SGA Resizing Report

The report then ends with the CBO Repository information before closing up this one of many log files generated by a simple and single execution from the command line.  I would definitely read through to this final section to understand the stats collection in the EM Repository database a bit more clearly, as I do touch on it in this post.

Advisor Log Header

When viewing the log file, the first thing it tells you is where the source SQL for the report comes from.  This is helpful if you want to inspect the SQL used.

The next is information about the release versions and such involved.  This is standard for most of the log files in the repvfy output:


------------------ ----------------------------------------
EMDIAG Version     2014.0318
Repository Version
Database Version
Test Version       2014.0402
Repository Type    CENTRAL

The next sections of the report will go over the repository database and report on the type of database, (single instance or RAC, protected, version, etc.) and other pertinent information.  This is multiple query output I would like to see rewritten and compiled into one result set, but it’s still good to have this data, no matter what the output.

We then jump into SGA information and init/sp parameters.  What surprised me is that next section was dedicated to System statistics and the OS stats data.  This data is often only updated when a change to hardware has been made or if parallel is implemented and it’s set up in an upper level of the report indicating that the author of the report found this information valuable to the performance of the repository database, (not agreeing or disagreeing here, found it more curious than anything as I find it can be important in large databases, which the EM repository rarely is…)

--- Database OS statistics
SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- ----------------------------------------
SYSSTATS_INFO                  STATUS                                    COMPLETED
SYSSTATS_INFO                  DSTART                                    03-29-2014 02:00
SYSSTATS_INFO                  DSTOP                                     03-29-2014 02:00
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  CPUSPEEDNW                           1405
SYSSTATS_MAIN                  IOSEEKTIM                              18
SYSSTATS_MAIN                  IOTFRSPEED                           4096


ADDM Report

Next the utility runs an ADDM report for the previous hour’s snapshot.  Now ADDM reports rarely receive a lot of love from DBAs.  It’s not the top info it includes or the findings, it’s the recommendations that commonly leave them less than thrilled.  How often do we read one that doesn’t recommend some type of hardware solution to a problem that we know with hardware applied, will only rear its ugly head again a few months to a year later?  If there was one recommendation I could make to this valuable report is to take those recommendations out and I think you’d see a lot more DBAs embracing ADDM reports.

The report for the repository database is true to form, only this time it’s Oracle’s objects being offered hardware solutions!

ECMAM_PK                                  0          0          0          0
Move object SYSMAN.ECMAM_PK to flash storage, estimated I/O wait reduction is 10 millisec

EM_CS_SCORE_HIST                          0          0          0          0
Move object SYSMAN.EM_CS_SCORE_HIST partition 2014-03-25 00:00 to flash storage, estimated I/O wait reduction is 125 millisec.

EM_EVENT_MSGS_PK                          0          0          0          0
Move object SYSMAN.EM_EVENT_MSGS_PK partition OPEN to flash storage, estimated I/O wait reduction is 5 millisec.

EM_METRIC_KEY_GUID_IDX                    0          0          0          0
Move object SYSMAN.EM_METRIC_KEY_GUID_IDX to flash storage, estimated I/O wait reduction is 336 millisec

Now who is to say that a savings of 10-336ms is worth the cost of flash storage?  I’m not, but this is just more proof to me that the recommendations section is still not where most DBAs would like to see the ADDM report.

Further sections of the report do live up to the hype and offer similar value to what we see in AWR for the EM repository database:

Top Background Events            DB/Inst: OEM/OEM  (Apr 02 14:00 to 15:00)

Avg Active
Event                               Event Class     % Activity   Sessions
----------------------------------- --------------- ---------- ----------
os thread startup                   Concurrency          47.91       0.03
CPU + Wait for CPU                  CPU                   9.77       0.01


To clearly show that it is Concurrency that is causing the OS Thread Startup % of activity is important.  This helps keep the administrator on the right track if they were to investigate the wait event.

The following section goes through the code and action involved in the last hour’s ADDM activity.  You can see if there is anything causing an issue, but as we can see in our repository, not much of anything-  it’s nice an quiet…. J

Top Service/Module               DB/Inst: OEM/OEM  (Apr 02 14:00 to 15:00)

Service        Module                   % Activity Action               % Action
-------------- ------------------------ ---------- ------------------ ----------
SYS$USERS      DBMS_SCHEDULER                 8.84 EM_REPOS_SEV_EVAL        2.79
EM_TASK_WORKER_3         1.40
MGMT_COLLECTION.COLLECTI       8.37 Task:881 Repositor       4.19
Task:37 Performanc       1.40
OEM.PbsSystemPool              6.51 PingHeartBeatWork:       2.33
LoaderWorker             1.40
GATHER_SYSMAN_STATS            2.33 UNNAMED                  0.93


Next it reports on sessions and then on percentage of activity was allocated to different SQL types and the percentage allocated to different phases of the SQL, (what percentage of activity was consumed by hard parsing vs. SQL or PL/SQL execution….)

Just like an ADDM report, the top SQL statements, including SQL_ID and percentage of activity is reported:

Sampled #
SQL ID             Planhash        of Executions     % Activity
----------------------- -------------------- -------------------- --------------
Event                          % Event Top Row Source                    % RwSrc
------------------------------ ------- --------------------------------- -------
7mqykaw68fkhb           2691428406                    6           4.19
CPU + Wait for CPU                4.19 TABLE ACCESS - FULL                  2.79


The report will go through this for top row source, literals, actions, etc.  It will then jump into the top procedures that were called:

PLSQL Entry Subprogram                                            % Activity
----------------------------------------------------------------- ----------
PLSQL Current Subprogram                                           % Current
----------------------------------------------------------------- ----------
SYSMAN.EM_TASK.WORKER                                                   8.84
SQL                                                                  6.05

SYS.DBMS_ASSERT.SIMPLE_SQL_NAME                                      1.40
SYSMAN.EM_PING.RECORD_BATCH_HEARTBEAT                                   3.26
SQL                                                                  1.86


This can help you quickly find what procedures to search for the SQL shown in the sections before.

Blocking sessions, top sessions and top latches are just some of the areas, just as in a standard AWR report that are covered, but this is all for the EM repository database!

This section of the report is finished up with information about the AWR retention so if you need to go back and investigate further, you know up front what you have to work with in the AWR, (Automatic Workload Repository.)

--- AWR retention

---------- ----------
60      11520


No, the report is not even close to being finished for the Advisor log at this point.  It jumps promptly into a workload report to give you a good view of what kind of processing this repository database is handling daily.


Just as with a standard AWR report, the basic information about the repository for the previous hour is included with the Advisor log.  Keep this in mind as you review this report.  If this is not a heavy use time for your EM repository database, the data may not be as valid as if we were looking at a larger timeframe.

If you are more comfortable with statspack or AWR reports, then you will want to spend more time here than the ADDM section, but both provide their own view on the data provided.

The IO Statistics by Function Summary

Reads:   Reqs   Data    Writes:  Reqs   Data    Waits:    Avg
Function Name   Data    per sec per sec Data    per sec per sec Count    Tm(ms)
--------------- ------- ------- ------- ------- ------- ------- ------- -------
Buffer Cache Re    818M     2.1 .227007      0M     0.0      0M    7628     0.1
Others              62M     1.1 .017205     44M     0.8 .012210    6643     0.8
DBWR                 0M     0.0      0M     94M     1.7 .026086    5938     0.5
LGWR                 1M     0.0 .000277     40M     2.2 .011100   15.3K     0.5
Direct Reads         2M     0.0 .000555      0M     0.0      0M      17     0.0
Direct Writes        0M     0.0      0M      1M     0.0 .000277      11     0.7
TOTAL:             883M     3.2 .245045    179M     4.6 .049675   35.5K     0.5

Direct Reads
2M     0.0 .000555      0M     0.0      0M       0     N/A
Direct Reads (Data File)
2M     0.0 .000555      0M     0.0      0M       0     N/A
Direct Writes
0M     0.0      0M      1M     0.0 .000277       0     N/A
Direct Writes (Data File)
0M     0.0      0M      1M     0.0 .000277       0     N/A
883M     3.2 .245045    178M     4.6 .049397   11.6K     0.1

Enqueue Activity

Enqueue Type (Request Reason)
Requests    Succ Gets Failed Gets       Waits  Wt Time (s) Av Wt Time(ms)
------------ ------------ ----------- ----------- ------------ --------------
JS-Job Scheduler (queue lock)
25,724       25,724           0          59            0           1.19
KO-Multiple Object Checkpoint (fast object checkpoint)
30           30           0           3            0           6.67
SQ-Sequence Cache
75           75           0           3            0           3.33

Also spend some time inspecting the Latch and Mutex sections.  These are areas of memory that many Oracle databases are experiencing challenges with from time to time.  These allocations in memory are good to understand and even the EM repository database experiences issues with both when concurrency is commonly involved.

I’ve been a long time proponent of the Segment reporting section in the AWR.  This report can often seal up remaining questions that DBAs have after seeing the top SQL by elapsed time that were answered upon viewing the execution plan.

Segments by Physical Reads
-> Total Physical Reads:         105,015
-> Captured Segments account for   98.1% of Total

Tablespace                      Subobject  Obj.      Physical
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
SYS        SYSAUX     SCHEDULER$_JOB_RUN_D            TABLE       59,463   56.62
SYS        SYSAUX     SCHEDULER$_EVENT_LOG            TABLE       43,048   40.99
SYS        SYSTEM     OBJ$                            TABLE          266     .25
SYS        SYSAUX     I_WRI$_OPTSTAT_H_OBJ            INDEX          107     .10
SYSMAN     MGMT_TABLE EM_EV_METRIC_ALERT   P201404    TABLE           47     .04


There will be sections for logical, physical and unoptimized reads, along with row locks, table scans, writes and even ITL waits.  The report continues with library cache and all variations of memory information.  This is an area that I feel more than an hour could be beneficial for the REPVFY utility output and a future enhancement might be to follow with an argument for time frame of snapshots to be utilized for the report.

SGA Resizing Report

This report provides not just SGA information, but also PGA and caches.  It starts out by giving the reader a view of any specially sized buffer caches and each of the pools, (large, java, ASM, etc.)

-- SGA Dynamic Components
--   Mode: DEF:DEFerred mode, IMM:IMMediate mode

Last Oper  Last
Component              Size (M) Type/Mode  Operation Time
---------------------- -------- ---------- --------------------
D: 2K buffer cache            0 STATIC
D: 4K buffer cache            0 STATIC
D: 8K buffer cache            0 STATIC
D:16K buffer cache            0 STATIC
D:32K buffer cache            0 STATIC
D:buffer cache              448 INITIALIZI

From there, it steps into resize operations for SGA and then PGA.

-- PGA Resize Operations overview

---- -------------------- ---------- --------- -------- ----------- -----------
1 02-APR-2014 15:00:00        493       281       60        0.00        0.00
1 02-APR-2014 14:00:00        493       290       66        0.00        0.00
1 02-APR-2014 13:00:00        493       311       70        0.00        0.00
1 02-APR-2014 12:00:00        493       286       66        0.00        0.00


This part of the report is not held to the 1 hour snapshot as we experience in the ADDM and AWR section of the Advisor log.  The data for these sections went back for one week and provided a clear picture of resizing activities for the SGA and PGA.

Repository CBO Report

This section is built out of SQL added to the advisor.sql in the REPVFY utility.  Its only found in the Advisor report and is essential to understanding specific information in terms to CBO, (cost based optimizer) information to the repository database.  To think of this as schema specific is more in line with the value provided from the information.

If you remember back to when you installed the EM12c on the repository database, it has you disable the automated dbms_stats job?  This is now providing you with the internal stats job information that provides the correct statistical information to the CBO-

-------------------------------- ----------- --------------- ---------
ADP_METRIC0                      TRUE
ADP_METRIC1                      TRUE
ADP_METRIC2                      TRUE
EM_EVENTS_INCIDENT               TRUE                        10
EM_EVENT_ANNOTATIONS             TRUE                        10
EM_EVENT_CONTEXT                 TRUE
EM_EVENT_MSGS                    TRUE                        10
EM_EVENT_RAW                     TRUE                        10
EM_EVENT_SEQUENCES               TRUE                        10
EM_EVENT_SEQ_STATE                                           10
EM_METRIC_VALUES                             AUTO            10
EM_METRIC_VALUES_DAILY                       AUTO            10
EM_METRIC_VALUES_HOURLY                      AUTO            10


As the repository database scales, the EM stats feature can be scaled to support more extensive statistics information as well.  I am cautious to say this and will say it with a grain of salt, but it backs up my experience that 95% of databases are served successfully with the current automated stats job, but that there are types that require a DBA with knowledge of the schema, data and code to ensure the CBO has what it needs to perform efficiently.  The biggest challenge is that there aren’t a lot of DBAs with that knowledge or time to invest in this task, (I know, I did this in two companies and it must be a true passion.)  EM12c has removed this challenge from the DBAs plate by setting up the one-off stats collection requirement for them as part of the repository installation.

They’ve enhanced that in the REPVFY by having a utility step that checks to see what is currently in place and then the ability to review it as part of this output.

TABLE_NAME                       RUN_TYPE                         INCREMENTAL GRANULARITY     STALE_PCT
-------------------------------- -------------------------------- ----------- --------------- ---------
EM_EVENTS_INCIDENT               DAILY                                                        20
EM_EVENT_ANNOTATIONS             DAILY                                                        20
EM_EVENT_MSGS                    DAILY                                                        20
EM_EVENT_RAW                     DAILY                                                        20


Curious about when and how long the jobs that do this work run?  This is included as well-

[----- CBO statistics gathering log (last 3 days) ---------------------------]

-------------------- -------------------------------- ---------- ---------------- ------------- -------------- --------------------------------------------------------------------------------
02-APR-2014 14:24:16 PROACTIVE                                 5                3            69             24 P:1.86:EM_EVENT_ANNOTATIONS;1.13:MGMT_JOB_EXECUTION;.71:EM_EV_METRIC_ALERT;
02-APR-2014 12:24:16 PROACTIVE                                 9                5            71             24 P:2.47:EM_EVENT_RAW;1.97:EM_EVENT_SEQUENCES;1.34:MGMT_JOB_EXECUTION;.87:EM_EVENT
02-APR-2014 10:24:16 DAILY                                   123              272           457                D:6.91:EM_JOB_METRICS;5.66:MGMT_SYSTEM_PERFORMANCE_LOG;4.81:MGMT_ARU_PATCH_FIXES
02-APR-2014 08:24:16 PROACTIVE                                13                7            74             24


Table stats in the repository?  Yep, that’s included, too!

[----- Table statistics -----------------------------------------------------]

-------------------------------- ------------------------------ -------------------- ---------- ---------- ----------- --- --- ----- ---
EM_METRIC_VALUES_DAILY                                          29-MAR-2014 10:25:19      95257       2040          16 YES NO  ALL   YES
EM_METRIC_VALUES_HOURLY                                         01-APR-2014 10:26:49     681018      15820          16 YES NO  ALL   YES
MGMT_DB_FEATUREUSAGE                                            02-APR-2014 10:25:13       5713        244         149 YES NO  ALL   YES


And with all of that,  (pages and pages…) we’re done with the first of many reports.  I’ll keep to the most essential data in each, but hopefully you’ve learned something you hadn’t known before about the REPVFY utility and I’ll update with the next log report soon!

Now that we learned in Part I how to create a notification schedule for a specified user so that it will only page one individual on an oncall rotation, I’ll now show you how to use this in conjunction with rulesets to complete the process of modernizing and automating your oncall.

In my Test scenario, I’ve already made copies of the main rule sets, have deviated rules by three rules and four groups, (mission critical, production, test and development.)

  • Mission critical are all systems that I WANT to be paged for after hours and need to know immediately if there is an issue.
  • Production are important systems that I want to be notified of, but do NOT want to wake up resources at night time of, as these incidents will not cause an outage and can be addressed in the business day.
  • Development and test, for this example, will be treated the same way and will only email issues to the DBA team, as the development group has privileges to often address issues they have caused in the process of developing and testing.

With this said, now we look at our Rule Sets.  Notice I’ve disabled the default rules that come with the EM12c, but have made copies that I can use to offer the best value to the business.



Let’s click on the left box to the rule named Mission Critical Incident Rule Set and click on Actions, Edit.



Notice that the rule set is only specific to Specific Targets by group and the group is Mission Critical.  We then click on the Rules tab and we can update the current rule setting to use our new DBA_PAGE user!

Editing the Rule Set

There are two things we are going to do to the first, major rule set for metric alerts.

1.  We are going to limit the categories that we want to be notified for.  I want you to see this option so you can decide what categories that you want to page on and can update this as you want.  For our example, I am going to limit our metric rule set to the categories of Capacity and Availability.  At the end, I’ll show you how to create a second rule to email on the rest.

Click on the left of the rule set and then click on Edit up in the menu bar.



We’ll then edit the categories to limit to the two we spoke about above.  You’ll be able to see all the categories in the drop down and note in your incident notifications you receive what categories are important to you for email vs. pages.



Click on Next to continue onto the actions.

2.  We are going to update the rule to use the new DBA_PAGE user to page for the metric alerts in these two categories.




We’ve already got a pretty advanced set of actions to perform in the Action Summary.  I like to automate many of the Incident Manager tasks, but this time around, we are going to add a page option to the actions.  The firsts rule is already highlighted, so simply click on Edit in the menu bar to proceed.


I’m going to have all paged Incidents assigned automatically to the DBA_PAGE user and then I’m going to add the DBA_PAGE in the Page section in the Basic Notification section.


You can also enable Repeat notifications if you want to take this one step further, (we’ll go into this more in another blog post, but I thought I’d at least show it to you in this one… :))

If satisfied, I’d recommend clicking in the box to clear events permanently to keep your repository cleaned up and then click continue to review the changes you’ve made:



You’ll now see that the owner for the incident will be set to the DBA_PAGE user and you now page the DBA_PAGE user along with emails sent.  Keep in mind, only during the notification schedule active times will this be able to notify, so Monday-Friday, 8-5, no paging will occur as that’s how we set it up in our Part I of this blog post series.

We now need to do this for each of the rules that we want to page on from this rule set:

  • Create incident for critical metric alerts, (for only two categories…)
  • Create incident for critical Service Level Agreement Alerts
  • Incident creation Rule for Target Down availability stattus
  • Incident creation rule for high-availability events.

As promised, I think created a new rule set to cover the categories no longer covered in the original.  Click on Create in the menu bar and create the following rule:



Add the following Actions, ENSURING you do not choose to PAGE and assign the rule to SYSMAN as you would do your standard rules you did before.  This is just to cover all your other categories, but NOT PAGE on them.



Complete the review, name the rule Metric Alert for All Other Categories to Email and then click Save and then SAVE ALL YOUR CHANGES you’ve made.

You have now updated all the rule sets to use the DBA_PAGE user we set up in Part I of this series and it will only notify that user during the scheduled time it is to be active, in this case, just for after hours support and to rotate on a one week schedule between three DBA’s SMS contact information!





I came across a discussion on Oracle-l on how after hours paging was handled for many companies and was kind of surprised how many DBAs still carry around a secondary pager/cell phone or are just expected to be woke up if on call or not.  I’m not one to go back to sleep once I’m woke, so I’ve been a proponent of EM notification schedules for after hours paging.  Now there are other ways to handle this in Enterprise Manager 12c as well, but we’ll use this method, as it is backward compatible to OEM 10g, too.

The requirement of this setup is to have an inbox alerting option separate from your SMS/Texting option on your smartphone, which can be Android or iphone, even blackberry, any of these are more than acceptable to satisfy the requirement.  The next is know your SMS notification email address.

Example-  TMobiles would be <cell number> and ATT would be <cell number>

Your mobile provider can verify what your email extension is for your SMS address.  With this information in hand, you can then proceed with the setup.  The design is simple-  All email notification will continue to go to your email, no matter if warnings or critical, but we will create a secondary user that will have the SMS addresses for the DBAs oncall and set them up for a rotation to be notified of critical issues.

Sit down and first figure out how often you want to rotate your schedule, weekly, every two weeks, once per month and any vacation or time off coming up.  That should tell you what your rotation needs to be to keep things sane.

Create the Paging User in Enterprise Manager

First we need to create the user that will then be utilized for paging.  This can be done from the Setup, Security, Administrators menu in the console.


We have all our DBAs currently listed, but for this user, we will need each of their SMS addresses and once you have those, click on Create.

Name the user DBA_PAGE and put in the following values, and it can be set to an Administrator, but just for this example, I shortened the process and created it as a Super Admin, (this is not required…)


Note that I’ve entered each of the DBA’s SMS addresses into the Email Address section, separated by commas and that I’ve entered a clear description of what this login is used for.

Click on Finish and your new DBA_PAGE user is now listed.


Managing the DBA_PAGE User

Once you’ve completed the entry and then log out of the EM12c console as your current user and log back in as the DBA_PAGE user, (just easier to manage the notification schedule as this user…)



Next you need to create the notification schedule, but I’ll first show you how you can add more users easily to this Administrator:



Once you’ve accessed this page, you will note that new email addresses can be added easily:



The Add Another Row option will allow you to add another SMS address and then you can check mark just that address and test it.  You can also remove addresses from here of DBAs that have moved on from the company.

Editing the My Notification Schedule



This is again accessed from the Setup menu on the right hand of screen and once you are in the My Notification Schedule, you will see the following:


This is the default schedule, starts from the day the user was created and would notify ALL email addresses in the list.  We are going to replace this schedule with a new one that supports a three week rotation of oncall, notifying on one DBA each week, switching automatically each week to the next DBAs SMS address.

Click on Edit Schedule Definition in the upper right of the screen.


As starting a rotation in the middle of a schedule would be difficult, we are going to choose the following Monday as our beginning.  You can see the default is a one week rotation and that the time zone is set to Canadian Central Time, (I have no idea where that came from… :))



For our example, we are going for a three week rotation, (three DBAs, three week rotation… :)) We’ll start the rotation this following Monday, April 21st and we’re going to say that the DBAs are on Mountain Time.

We are then given a schedule of three weeks to fill in for who is oncall.  By default, it will want to offer us all three SMS addresses entered for the user.  We will only use the first one, we will click on Week 1 to fill in, ensure we are VIEWING Week 1 and now to fill in our schedule.



To create a schedule that DOES NOT notify during business hours, there is a batch file for each week of three steps.  There is no way around this, so I’ll use the following schedule creation and then repeat it for the 2nd and 3rd week for the two other DBAs.



Now notice that we’re still viewing and editing week 1, and choosing all the days, but I’ve changed the time to show 12am-8am to fill in with the first DBA’s SMS address.

Finally, we fill in for the last time slot, Saturday and Sunday during the weekend daytime.



Your scheduled should now look something like this, with all sections above and below business hours filled in with the 1st DBA’s SMS Address for notifications:



We then will click on the down arrow button on the View menu and choose the second week, then do the same for the check mark in the box for Weeks, ensuring we are changing the schedule for Week 2.  We’ll repeat the scheduling, but will also change the Email Address to the second DBA’s SMS Address by clicking on the search.


Once you have filled in everything for the Second DBA, you will switch to View Week 3 and edit Week 3 for the third DBA and fill in that schedule:



Once completed, Save the changes.

The default view may alarm you at first, as it will be empty:



The reason is the first date that is shown, which is the current date and if you remember from our example, we decided to start our rotation on April 21st.  Go to the View Week Containing and change it to April 21st.

You will now see the rotation schedule for the first DBA for after hours.



Updating the date for the next week will display the 2nd DBA’s rotation:



And then if you switch the date to first week of May, we see the third DBA:



If you go past that, you’ll see that the rotation continues, starting again with the first DBA for the next week and will continue on without change unless you edit or replace the schedule.

If for some reason, let’s say a DBA can’t take a shift that is in their weekly schedule for one night, you can go into that DAY’s shift and edit it to one of the other two DBAs in the list.

Now you’ve finished setting up an after hours pager in a notification schedule.  In Part II, I will show you how to use the notification schedule/user with rule sets to page on critical notifications.





EM12c Auditing

Lately I’ve been having more discussions on securing the EM12c environment.  All of IT has a tendency to treat the Enterprise Manager as a afterthought in both hardware allocation, as well as security best practices.  No one is sure of exactly why this is-  they all have their theories, but we do know it happens often.

Today we are going to go over some of the auditing options within EM12c.  Basic auditing is turned on by default in the environment, but only covers basics processes.  There are over 150 auditing options and extensive information can be collected, retained within the repository, as well as turned into an externalized service to reside as log files on the OS file system.  These options include login/logout information, updates, OMS password changes and EM key copy and removals from the repository.

Basic auditing information can be gained through the console via the Setup, Security, Auditing Data menu option, but the auditing configuration, additional features, updates and externalized service setup, must be performed through the Enterprise Manager command line interface, (EM CLI).

If you haven’t used the EM CLI before, please refer to my blog post on Beginning with the Command Line Interface, otherwise log in a user with appropriate rights to run the EM CLI and connect to the repository.

First, let’s inspect the current operations list and what will impact the infrastructure if executed:


Note that the last option, APPLY_UPDATE, is to update the repository and yes, it will impact the infrastructure by doing so.

Next, let’s look at the current settings. As I stated earlier, auditing is turned on by default, but the next options are disabled for the externalized service, so it is marked as disabled.


The defaults for the externalized service, outside of the directory, (configured in the DBA_DIRECTORIES and read/write privileges granted to SYSMAN) are pre-configured with default information.

  • File prefix is the prefix used for all audit log files so that they are easily identified in the directory.
  • File size is default to 50M
  • Retention is default to 365 days.  Keep this in mind before enabling, as this could be impacting to disk space if you OS directory has limited space.

Notice that there is also a note informing you that Infrastructure Audit is always on, (go inspect the access.log and you will see information that can be sync’d up with the emctl.log and others to create a solid picture that this feature can create for you.)

Enabling/Disabling Features

To enable or disable audit features, the following syntax is used:

>emcli update_audit_settings -audit_switch="ENABLE/DISABLE" -
operations_to_enable="<insert operation name here or just say ALL>" -
operations_to_disable="<insert operation name here or just say ALL>" 

To demonstrate this, we’ll enable auditing for logins and logouts:


The response letting us know if the change was successful in the auditing configuration completes the task and we can move on to other tasks.

Next, we’ll configure the externalized service for auditing.  This is an excellent choice and should be considered for all EM12c environments.  Even with high availability options, the idea of keeping a minimum of 7-31 days of auditing information regarding the EM12c environment, especially considering the access and power of the EM12c, is a good idea.

The syntax for the configuration for the externalized auditing service is:

>emcli update_audit_settings -file_prefix=<file_prefix> -
directory_name=<directory_name> -file_size = <file size> -data_retention_period=<period in days> 

And in our example, we will update the service to file sizes of 25M each, with a prefix of “em12c_audit” and retain 31 days of audit files that our OS file system can easily handle.

>emcli update_audit_settings -externalization_switch=ENABLE -file_prefix=em12c_audit -directory=AUD_DMP -file_size=25000000 -data_retention_period=31

After executing this statement, the audit files will automatically start generating to the directory, (make sure you HAVE created a DBA Directory to hold this data first!) and we can then view logs as needed to inspect what activity is occurring in the EM12c environment.

This is a solid best practice to ensure you are offering one more line of protection to the database and software that is essential to you, your business and your environment.


I know that folks have been having some challenges downloading my slides from Collaborate for a couple of my sessions and I know I’ve received errors when updating two of them the other day, so I’ve added them to my slideshare location for your convenience.

Thank you for everyone who attended my sessions-  such great turn outs and fantastic attendee participation!  I had some real EM12c stars in the audience and appreciated the support and banter!

My slideshare list for download can be accessed from here.

Thank you to IOUG, Oracle and everyone for their support!

This is the second post on ASH Analytics Detail.   You can find the first post here.

This post we’re going to work into more Activity data within ASH Analytics, but I’m focusing on Resource Consumption, which you can find once you are into Ash Analtyics, (you may want to stop refreshes, which can be done at the upper right in the refresh settings) under the Activity button.


We’ll be working with the last three options in this menu marked off in red, as the Wait Class is just a similar view of what we already get in the main Activity graph.

Wait Class

I’m sure you’ve already noticed, but ASH Analytics has a tendency to always arrange any of the graphs by the heaviest usage at the bottom to the lightest at the top.  This makes it easier to view for the user.   This will be the case for pretty much all the activity graphs, where Load Maps will arrange largest to smallest, left to right.


Nope, not much new here-  We are still seeing the resource consumption, but as wait events.  If we compare it to the standard wait events view, not much different.  The reason we would really find value in this is if we had IORM, (IO Resource Manager) enabled.  Since we don’t, we’re going to skip it for now and it will get to be part of that blog post in the future… :)

Wait Event

when we filter by Wait Event, we get a very different view of our data than we did by Wait class.  Note the actual categories that are graph is broken up by, listed down on the right hand side.


Scanning from the bottom –> up on the right hand side, we can see the largest resource consumption is “LNS Wait on SENDREQ” and the second highest consumer is “LGWR- LNS Wait on Channel”.  You are also seeing log file sync and parallel log file write.  All of this comes to a clear understanding of what we are dealing with here… Why?  This is a dataguard environment and the LNS* type waits are quite common and occur when-

  • LGWR writes redo to the online redo log on this primary database (when LGWR SYNC is not used, user commits are acknowledged once this step completes except when the parameter COMMIT NOWAIT is used.
  • The Data Guard LNS process on this primary database performs a network send to the Data Guard RFS process on the standby database. For redo write sizes larger than a MB, LNS will issue multiple network sends to the RFS process on the standby.
  • LNS posts LGWR that the all the redo has been successfully received and written to disk by the standby, which is a heavy consumer, as well.

On the primary database here is extended resource consumption represented with the “log file parallel write” wait event. We will then note repeated “LNS wait on SENDREQ” wait event, too. You can further divide the “LNS wait on SENDREQ” wait event into network time and RFS I/O time by subtracting the “RFS write” wait event obtained on the standby. These wait events can be assessed on the standby by using multiple queries in the “Data Guard Specific Wait Events” section for a physical standby or by using AWR for a logical standby if you need further proof.


I’ve hidden the exact object names involved in this database, but you’ll note, they weren’t involved in much of the resource consumption anyway.  Now on an environment with heavy IO, this would change significantly and you would see a lot more of one or more objects being the focus of this type of graph.  This is where having the Top Activity box check marked is helpful, as it clearly shows you that object IO is not much to focus on for this environment.  The black mark showing the upper max of activity for any given time in the activity graph gives you a clear line to compare with.


Blocking Session

This is where many of you will become very interested again.  We all want to know about blocking sessions and how they may be impacting the resource consumption in our environment.  Although there was no *true* impact to production processing, there was some blocking sessions that we could inspect in our current example.

Only sessions that are experiencing some type of bl0cking are displayed in this view.  No matter if it’s transaction, (TX- Tran Lock contention or TM- DML enqueue contention , etc.) or UL, (user defined)  these sessions will be shown in this graph.


I’ve included the bottom portion of the ASH Analytics section so you’re able to see the SQL_ID’s involved in the sessions and then to the right, you can also view the session information, displayed by Instance ID, (for RAC) session ID, (sid) and serial number, (serial#).

You can then click on a session in the lower section, which will filter the ASH graph by the session and the wait class to give you another view, demonstrating in our example that we are waiting on a commit(s) and then our favorite wait, “other”…. :)


What’s helpful, is that the SQL_ID is clearly shown on the left and  you’re able to click on any of these, (although I’d highly recommend clicking on the top one that is 5 times the concern vs. the others. :))

This completes the review of the Resource Consumption Menu for the ASH Analytics Activity Page.  We’ll continue on in Part III later this week!

I have a request to hit on some EM performance blog posts, so we’re going to start with breaking down some of the ASH Analytics areas.  ASH Analytics is not your grandfather’s “Top Activity” and I recommend everyone begin to embrace it as it is the the future of performance activity in the Enterprise Manager.  The idea that we will be able to pull directly from ASH and AWR to present our performance data via the EM12c is exciting, to say the least.  The added accuracy and value of the aggregated historical data must be recognized as well.

The standard output for ASH Analytics looks quite similar to Top Activity in the way of focusing on activity graphed out by wait events, but I’m going to dig into a different ways to present the activity data, as it may answer questions that simply won’t show via wait event graphing.

Once you first enter the ASH Analytics interface, you’ll be presented with the section at the top, which will display the timeline for your examination and then below, the actual wait event graph as seen here in our example:


You’ll note that we haven’t used any filters, so we’re viewing all activity data and the standard wait classes we are accustomed to viewing as shown, i.e. I/O, (dark blue) CPU, (kelly green) and System, (light blue).  We can see the system is rather busy during multiple intervals, bypassing the CPU cores line, displayed in red across the image.

Now let’s display the data in a different format.  This is done by changing the Wait Class, shown under the Activity button, to another filtering type.


The section highlighted in red is where we will be spending our time today.  I’ll display the same section of time we see in the standard wait event displayed in our standard wait event view, but focus on the SQL defined displays and we’ll go over how they might assist you in troubleshooting an issue.


Now this one might seem a bit self-explanatory, but displaying the data from the activity pane appears very different, then, let’s say from the load map:


Now let’s look at it from the Activity View defined by SQL_ID:


I find the load map an excellent way to get my point across to non-technical folks, but I’m a DBA-  I’m used to spending my time looking at the data in terms of activity, so this is a more comfortable view for me to move from seeing the data in terms of wait events and transitioning that view to seeing the percentage of activity allocated to each SQL_ID.

We can click on the SQL_ID displayed on the right to go to it’s detailed ASH page, which will show you all data pertaining to that SQL_ID and wait information involved with it.  By clicking on different sections, we’re not just digging down into more detailed information, keep in mind, we are also “filtering” out more data that could have been masking valuable information.


You may also think that this one is not going to look any different than the last, that’s where you’re wrong.  This is not the SQL_ID that has resources allocated to it in activity, but the top level SQL_ID that is executing the SQL_ID active.  This is helpful if you are trying to locate what packages and procedures should be first on the list for code review or if you want to quickly locate the package or procedure responsible for a specific SQL_ID.


These can then be identified to sessions by clicking on links and then traced to users, hosts, etc.  If I click on one of the top SQL_IDs, it will take me to all SQL_ID’s involved in that Top SQL_ID and all the wait events, displayed in the same graph timeline.  From there, I can then dig down into the data pertaining to the waits, the SQL_ID’s involved as part of the Top SQL_ID or even switch to other views, such as a load map to present the data in another format for peers to view more easily.

Force Matching Signature

Force matching signature is a way of ensuring that SQL is using the same plan/profile, etc. even when it has literals in it.  It’s kind of like setting cursor_sharing=FORCE, but in a “hinted” way throughout a database.  This can be both good and bad, as let’s say that it forces the value to “1″, where the value “1″ really only makes up 2% of the rows and it would have been better if it knew what it was working with.


SQL Plan Hash Value

You should be seeing this next image [hopefully] as a mirror of the SQL_ID one.  We do like plan stability and the idea that we have a ton of hash plan values changing erratically is enough to make most of us feel queasy.  Having a quick visual that we’re not experiencing a lot of plan changes can be quite helpful.  It’s not foolproof, again, this is a visual representation of the data, but it’s helpful.


SQL Plan Operation

If you’re a DBA managing a data warehouse, you know when any DML is running and on what schedule.  The SQL Plan Operation view can give you a quick verification if something is amiss and there are updates, deletes or inserts happening that shouldn’t be.  You can see the percentage of activity that may also quickly tell you a change has occurred vs. the normal database activity.


You can also see just how much of your activity is going to certain type of processing.

SQL Plan Operation Line

The plan operation line, you can see the operation type for the process, along with the description.  the process is then broke down by both the SQL_Id and the step the statement is performing.


If you hover over any SQL_ID listed, it will also show this to you in the highlighted area:


SQL OpCode

This one is self-explanatory.  We are simply looking at the activity level per statement type.  It is easy to see that during our busy intervals, queries were around a 1/3 or the activity, as were inserts.  This view can be helpful if you are retaining screenshots at different snapshot intervals for comparison.


Top Level SQL Opcode

You can also view the data simply by Operation code, displaying by each code.  Notice that these are displayed by the Instance number, (if on RAC), Session ID, (sid) and Serial number, (serial#).  This is a global view of the operations that are occurring in the environment and can offer a clear view of activity, but at an operation type level.



We’ve just touched on what the SQL section of the ASH Analytics Activity view can offer.  Stay Tuned and we’ll dig into each section and each area of this great interface for performance monitoring and analysis.





I’m in the midst of transferring over to my travel laptop to run all my VMs on and retiring my previous ASUS work laptop to my youngest son.  I was surprised to find out that not ALL laptops are set up to run virtual environments these days.

1.  Virtualization may not be enabled in the BIOS, (i.e. On-Boot UEFI in the Lenovo Yoga 11s.)


Once this is enabled, save the configuration and reboot, allowing you to now run a VM on your laptop.

If you are importing appliances, make sure you have set the location to import the disks to the appropriate location, especially if you are using an external disk to house your VMs.

Before you click, “Import”, look at the very bottom where the location of the virtual disk will reside and verify that is where you want them to live.  The default is commonly the data location for the PC user on Windows or the /home/user for Linux users.



Now onto the VM

2.  All VM images are not the same.  If you are using an image file or have created the image from scratch, be prepared to do some preparation to get the image ready before you can successfully install Oracle on it and/or any other product such as Enterprise Manager, etc.

The nicest part is that Oracle will let you know what is wrong in its prereq checks during the installation of Oracle.  It’s going to let you know if your OS and virtual host has what it takes to support the installation-



Most of these are pretty straight forward for anyone who’s installed Oracle for awhile, maybe not for those that are newer to the product.

1.  Most VM images have a small /tmp and/or it can be filled quickly by installation.

  • clear out files from /tmp after each installation or in this case, failed install.
  • create a soft link to point to a second temp directory that has more space and make it the default temporary directory.

2.  Increase is the swapfile space.  This can be created in a location you have space to reserve and read/write swap to.  On  our VM, we’re kind of limited at the moment, so we’re just going to create a swapfile off of /u01 and give it 2GB of space:

dd if=/dev/zero of=/u01/swapfile bs=1024 count=2097152
mkswap /u01/swapfile
swapon /u01/swapfile
swapon -a

3.  Hard limit/soft limit-  These can be fixed by the following inside your VM, most of the steps must be performed as ROOT, but verify if it’s looking for ROOT to perform or a check as the OS user-

Next, we’re alerted that the max open file descriptors is an issue, so let’s look at those for the system, the user and then increase them:

cat /proc/sys/fs/file-max
Do as the user!!
ulimit -Hn
ulimit -Sn

We’ve now verified that they are smaller than the requested amount, (in our installation, it requested 65536) so we’ll add this and then verify the change in the file-

sysctl -w fs.file-max=65536
vi /etc/sysctl.conf
We can also simply append to file, (as ROOT):
fs.file-max = 100000

Now save and verify with the following:

sysctl -p

Log out and log back in to set the new values.

Next post we’ll take on all those darn lib files… :)