Category: Oracle


There are a number of Verification Utilities for Enterprise Manager 12c, (EM12c) and I’ve written about them before, but today I’m going to start on the Repository Verification Utility, (REPVFY).  This will be an ongoing series, as there are so many valuable features rolled into the utility and new ones that will be added as new patches and releases happen.

For this post, I’m going to be reviewing a level 2 diagnostics report from a new repository database.  This is going to offer us all kinds of valuable data about the environment and should be considered by any EM12c administrator when doing a health check of their EM environment.
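If you want to generate the same output and follow along, the report comes out of the EMDIAG repvfy kit that I’ve written about before.  I’m paraphrasing the syntax from memory here, so treat this as a sketch and confirm the exact verb and flags with the kit’s help output on your version:

cd $ORACLE_HOME/emdiag/bin     # common location for the EMDIAG kit; adjust to wherever you unzipped it
./repvfy -h                    # lists the verbs and options available in your version
./repvfy dump -level 2         # my assumption of the level 2 diagnostic collection syntax; verify against the help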

Inspecting the files from the output, you’ll note that they don’t require a lot of space, even when unzipped, which allows the admin to retain historical information about the EM12c environment.

Note that the files are broken up by type of review, and many of them are marked with a timestamp.

em_blog_post_0418

For this first blog post, I’m going to jump right into the first log alphabetically, the advisors log file, which, as you might guess, is advising on performance.  The interesting thing is that, as small as this file is, it is actually comprised of a number of reports, and one would be surprised how much work it would take to run all of them manually.

The Advisors Log is comprised of the following reports:

  • ADDM Report
  • Space Advisor Report
  • ASH Report
  • AWR Report
  • SGA Resizing Report

The report then ends with the CBO repository information before closing out this one of many log files generated by a simple, single execution from the command line.  I would definitely read through to this final section to understand the stats collection in the EM repository database a bit more clearly, as I touch on it later in this post.

Advisor Log Header

When viewing the log file, the first thing it tells you is where the source SQL for the report comes from.  This is helpful if you want to inspect the SQL used.

Next is information about the release versions involved.  This is standard for most of the log files in the repvfy output:

 

COMPONENT          INFO
------------------ ----------------------------------------
EMDIAG Version     2014.0318
Repository Version 12.1.0.3.0
Database Version   11.2.0.3.0
Test Version       2014.0402
Repository Type    CENTRAL

The next sections of the report go over the repository database and report on the type of database, (single instance or RAC, protected, version, etc.) and other pertinent information.  This is output from multiple queries that I would like to see rewritten and compiled into one result set, but it’s still good to have this data, no matter what the format.

We then jump into SGA information and init/sp parameters.  What surprised me is that the next section is dedicated to system statistics and the OS stats data.  This data is often only updated when a change to hardware has been made or when parallelism is implemented, yet it’s placed in an upper level of the report, indicating that the author of the report found this information valuable to the performance of the repository database.  (I’m not agreeing or disagreeing here, I just found it more curious than anything, as it can be important in large databases, which the EM repository rarely is…)

--- Database OS statistics
SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- ----------------------------------------
SYSSTATS_INFO                  STATUS                                    COMPLETED
SYSSTATS_INFO                  DSTART                                    03-29-2014 02:00
SYSSTATS_INFO                  DSTOP                                     03-29-2014 02:00
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  CPUSPEEDNW                           1405
SYSSTATS_MAIN                  IOSEEKTIM                              18
SYSSTATS_MAIN                  IOTFRSPEED                           4096
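If you’d like to pull those same system statistics yourself between REPVFY runs, they live in sys.aux_stats$; a quick sketch, run on the repository database:

sqlplus / as sysdba <<'EOF'
-- the OS/system statistics the CBO uses; these are the SYSSTATS_INFO/SYSSTATS_MAIN rows above
select sname, pname, pval1, pval2
  from sys.aux_stats$
 order by sname, pname;
EOF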

 

ADDM Report

Next the utility runs an ADDM report for the previous hour’s snapshot.  Now ADDM reports rarely receive a lot of love from DBAs.  It’s not the top info it includes or the findings, it’s the recommendations that commonly leave them less than thrilled.  How often do we read one that doesn’t recommend some type of hardware solution to a problem that we know, with the hardware applied, will only rear its ugly head again a few months to a year later?  If there were one recommendation I could make for this valuable report, it would be to take those recommendations out; I think you’d see a lot more DBAs embracing ADDM reports.
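If you want to rerun the ADDM report by hand against a different snapshot pair than the one REPVFY picked, the standard script in the database home works fine on the repository database; it will prompt you for the begin and end snapshots:

sqlplus / as sysdba @?/rdbms/admin/addmrpt.sql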

The report for the repository database is true to form, only this time it’s Oracle’s objects being offered hardware solutions!

ECMAM_PK                                  0          0          0          0
Move object SYSMAN.ECMAM_PK to flash storage, estimated I/O wait reduction is 10 millisec

EM_CS_SCORE_HIST                          0          0          0          0
Move object SYSMAN.EM_CS_SCORE_HIST partition 2014-03-25 00:00 to flash storage, estimated I/O wait reduction is 125 millisec.

EM_EVENT_MSGS_PK                          0          0          0          0
Move object SYSMAN.EM_EVENT_MSGS_PK partition OPEN to flash storage, estimated I/O wait reduction is 5 millisec.

EM_METRIC_KEY_GUID_IDX                    0          0          0          0
Move object SYSMAN.EM_METRIC_KEY_GUID_IDX to flash storage, estimated I/O wait reduction is 336 millisec

Now who is to say that a savings of 10-336ms is worth the cost of flash storage?  I’m not, but this is just more proof to me that the recommendations section is still not where most DBAs would like to see the ADDM report.

Further sections of the report do live up to the hype and offer similar value to what we see in AWR for the EM repository database:

Top Background Events            DB/Inst: OEM/OEM  (Apr 02 14:00 to 15:00)

Avg Active
Event                               Event Class     % Activity   Sessions
----------------------------------- --------------- ---------- ----------
os thread startup                   Concurrency          47.91       0.03
CPU + Wait for CPU                  CPU                   9.77       0.01

 

Clearly showing that it is Concurrency causing the OS Thread Startup percentage of activity is important.  This helps keep the administrator on the right track if they were to investigate the wait event.

The following section goes through the code and action involved in the last hour’s ADDM activity.  You can see if there is anything causing an issue, but as we can see in our repository, not much of anything-  it’s nice and quiet. :)

Top Service/Module               DB/Inst: OEM/OEM  (Apr 02 14:00 to 15:00)

Service        Module                   % Activity Action               % Action
-------------- ------------------------ ---------- ------------------ ----------
SYS$USERS      DBMS_SCHEDULER                 8.84 EM_REPOS_SEV_EVAL        2.79
                                                   EM_SYSTEM_MEMBER_S       1.40
                                                   EM_TASK_WORKER_3         1.40
               MGMT_COLLECTION.COLLECTI       8.37 Task:881 Repositor       4.19
                                                   Task:37 Performanc       1.40
               OEM.PbsSystemPool              6.51 PingHeartBeatWork:       2.33
                                                   LoaderWorker             1.40
               GATHER_SYSMAN_STATS            2.33 UNNAMED                  0.93

 

Next it reports on sessions, then on the percentage of activity allocated to different SQL types and to different phases of the SQL, (what percentage of activity was consumed by hard parsing vs. SQL or PL/SQL execution.)

Just like in an ADDM report, the top SQL statements, including SQL_ID and percentage of activity, are reported:

Sampled #
SQL ID             Planhash        of Executions     % Activity
----------------------- -------------------- -------------------- --------------
Event                          % Event Top Row Source                    % RwSrc
------------------------------ ------- --------------------------------- -------
7mqykaw68fkhb           2691428406                    6           4.19
CPU + Wait for CPU                4.19 TABLE ACCESS - FULL                  2.79
SELECT COUNT(*) FROM DBA_SCHEDULER_JOB_RUN_DETAILS WHERE OWNER = MGMT_USER.GET_R
EPOSITORY_OWNER AND JOB_NAME = UPPER(:B3 ) AND STATUS = 'SUCCEEDED' AND ACTUAL_S
TART_DATE >= :B2 AND ACTUAL_START_DATE < :B1

 

The report will go through this for top row source, literals, actions, etc.  It will then jump into the top procedures that were called:

PLSQL Entry Subprogram                                            % Activity
----------------------------------------------------------------- ----------
PLSQL Current Subprogram                                           % Current
----------------------------------------------------------------- ----------
SYSMAN.EM_TASK.WORKER                                                   8.84
SQL                                                                  6.05

SYS.DBMS_ASSERT.SIMPLE_SQL_NAME                                      1.40
SYSMAN.EM_PING.RECORD_BATCH_HEARTBEAT                                   3.26
SQL                                                                  1.86

 

This can help you quickly find which procedures to search for the SQL shown in the previous sections.

Blocking sessions, top sessions and top latches are just some of the areas covered, just as in a standard AWR report, but this is all for the EM repository database!

This section of the report is finished up with information about the AWR retention so if you need to go back and investigate further, you know up front what you have to work with in the AWR, (Automatic Workload Repository.)

--- AWR retention

INTERVAL  RETENTION
---------- ----------
60      11520
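Both values are stored in minutes, so this repository is snapshotting every hour and keeping eight days of history.  If you want to confirm or widen that yourself, a quick sketch, (the retention in the second call is just an example of 14 days):

sqlplus / as sysdba <<'EOF'
-- current snapshot interval and retention
select snap_interval, retention from dba_hist_wr_control;
-- widen the window if you need more history to work with; values are in minutes
exec dbms_workload_repository.modify_snapshot_settings(interval => 60, retention => 20160)
EOF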

 

No, the report is not even close to being finished for the Advisor log at this point.  It jumps promptly into a workload report to give you a good view of what kind of processing this repository database is handling daily.

FULL AWR Report

Just as with a standard AWR report, the basic information about the repository for the previous hour is included with the Advisor log.  Keep this in mind as you review this report.  If this is not a heavy use time for your EM repository database, the data may not be as valid as if we were looking at a larger timeframe.

If you are more comfortable with statspack or AWR reports, then you will want to spend more time here than in the ADDM section, but both provide their own view of the data.

The IO Statistics by Function Summary

Reads:   Reqs   Data    Writes:  Reqs   Data    Waits:    Avg
Function Name   Data    per sec per sec Data    per sec per sec Count    Tm(ms)
--------------- ------- ------- ------- ------- ------- ------- ------- -------
Buffer Cache Re    818M     2.1 .227007      0M     0.0      0M    7628     0.1
Others              62M     1.1 .017205     44M     0.8 .012210    6643     0.8
DBWR                 0M     0.0      0M     94M     1.7 .026086    5938     0.5
LGWR                 1M     0.0 .000277     40M     2.2 .011100   15.3K     0.5
Direct Reads         2M     0.0 .000555      0M     0.0      0M      17     0.0
Direct Writes        0M     0.0      0M      1M     0.0 .000277      11     0.7
TOTAL:             883M     3.2 .245045    179M     4.6 .049675   35.5K     0.5
------------------------------------------------------------

Direct Reads
2M     0.0 .000555      0M     0.0      0M       0     N/A
Direct Reads (Data File)
2M     0.0 .000555      0M     0.0      0M       0     N/A
Direct Writes
0M     0.0      0M      1M     0.0 .000277       0     N/A
Direct Writes (Data File)
0M     0.0      0M      1M     0.0 .000277       0     N/A
TOTAL:
883M     3.2 .245045    178M     4.6 .049397   11.6K     0.1
------------------------------------------------------------

Enqueue Activity

Enqueue Type (Request Reason)
------------------------------------------------------------------------------
Requests    Succ Gets Failed Gets       Waits  Wt Time (s) Av Wt Time(ms)
------------ ------------ ----------- ----------- ------------ --------------
JS-Job Scheduler (queue lock)
25,724       25,724           0          59            0           1.19
KO-Multiple Object Checkpoint (fast object checkpoint)
30           30           0           3            0           6.67
SQ-Sequence Cache
75           75           0           3            0           3.33
------------------------------------------------------------

Also spend some time inspecting the Latch and Mutex sections.  These are areas of memory that many Oracle databases experience challenges with from time to time.  These memory allocations are good to understand, and even the EM repository database experiences issues with both when concurrency is involved.

I’ve been a long time proponent of the Segment reporting section in the AWR.  This section can often seal up the remaining questions DBAs have after seeing the top SQL by elapsed time and viewing the execution plans.

Segments by Physical Reads
-> Total Physical Reads:         105,015
-> Captured Segments account for   98.1% of Total

Tablespace                      Subobject  Obj.      Physical
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
SYS        SYSAUX     SCHEDULER$_JOB_RUN_D            TABLE       59,463   56.62
SYS        SYSAUX     SCHEDULER$_EVENT_LOG            TABLE       43,048   40.99
SYS        SYSTEM     OBJ$                            TABLE          266     .25
SYS        SYSAUX     I_WRI$_OPTSTAT_H_OBJ            INDEX          107     .10
SYSMAN     MGMT_TABLE EM_EV_METRIC_ALERT   P201404    TABLE           47     .04

 

There will be sections for logical, physical and unoptimized reads, along with row locks, table scans, writes and even ITL waits.  The report continues with library cache and all variations of memory information.  This is an area where I feel more than an hour of data could be beneficial in the REPVFY utility output, and a future enhancement might be an argument for the time frame of snapshots to be utilized for the report.
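Until something like that shows up, nothing stops you from pulling a wider AWR report yourself on the repository database and reading it alongside the Advisor log; the standard script prompts for the number of days and the snapshot range:

sqlplus / as sysdba @?/rdbms/admin/awrrpt.sql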

SGA Resizing Report

This report provides not just SGA information, but also PGA and caches.  It starts out by giving the reader a view of any specially sized buffer caches and each of the pools, (large, java, ASM, etc.)

-- SGA Dynamic Components
--  Cache: D:DEFAULT, K:KEEP, R:RECYCLE
--   Mode: DEF:DEFerred mode, IMM:IMMediate mode

Last Oper  Last
Component              Size (M) Type/Mode  Operation Time
---------------------- -------- ---------- --------------------
D: 2K buffer cache            0 STATIC
D: 4K buffer cache            0 STATIC
D: 8K buffer cache            0 STATIC
D:16K buffer cache            0 STATIC
D:32K buffer cache            0 STATIC
D:buffer cache              448 INITIALIZI

From there, it steps into resize operations for SGA and then PGA.

-- PGA Resize Operations overview

I# SNAP_TIME            PGA_TARGET PGA_ALLOC    PROCS        AUTO      MANUAL
---- -------------------- ---------- --------- -------- ----------- -----------
1 02-APR-2014 15:00:00        493       281       60        0.00        0.00
1 02-APR-2014 14:00:00        493       290       66        0.00        0.00
1 02-APR-2014 13:00:00        493       311       70        0.00        0.00
1 02-APR-2014 12:00:00        493       286       66        0.00        0.00

 

This part of the report is not held to the 1 hour snapshot as we experienced in the ADDM and AWR sections of the Advisor log.  The data for these sections went back a full week and provided a clear picture of resizing activities for the SGA and PGA.
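The underlying views are easy to query directly if you want to go back even further or check on this between REPVFY runs; a sketch run on the repository database, (column lists trimmed down):

sqlplus / as sysdba <<'EOF'
-- recent SGA component resize operations
select component, oper_type, oper_mode,
       initial_size/1024/1024 init_mb, final_size/1024/1024 final_mb,
       to_char(end_time,'DD-MON-YYYY HH24:MI') end_time
  from v$sga_resize_ops
 order by end_time;

-- PGA allocation history kept in AWR, one row per snapshot
select to_char(s.end_interval_time,'DD-MON-YYYY HH24:MI') snap_time,
       round(p.value/1024/1024) pga_alloc_mb
  from dba_hist_pgastat p, dba_hist_snapshot s
 where p.snap_id = s.snap_id
   and p.dbid = s.dbid
   and p.instance_number = s.instance_number
   and p.name = 'total PGA allocated'
 order by s.snap_id;
EOF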

Repository CBO Report

This section is built out of SQL added to advisor.sql in the REPVFY utility.  It’s only found in the Advisor report and is essential to understanding CBO, (cost based optimizer) information specific to the repository database.  Thinking of this as schema specific is more in line with the value the information provides.

If you remember back to when you installed EM12c, the repository database installation had you disable the automated dbms_stats job.  This section now provides you with the internal stats job information that supplies the correct statistical information to the CBO-

TABLE_NAME                       INCREMENTAL GRANULARITY     STALE_PCT
-------------------------------- ----------- --------------- ---------
ADP_METRIC0                      TRUE
ADP_METRIC1                      TRUE
ADP_METRIC2                      TRUE
EMCT_CBA_CHARGE_1DAY             TRUE
EMCT_CBA_CHARGE_1HOUR            TRUE
EM_EVENTS_INCIDENT               TRUE                        10
EM_EVENT_ANNOTATIONS             TRUE                        10
EM_EVENT_CONTEXT                 TRUE
EM_EVENT_MSGS                    TRUE                        10
EM_EVENT_PROCESSING_STATE        TRUE
EM_EVENT_RAW                     TRUE                        10
EM_EVENT_SEQUENCES               TRUE                        10
EM_EVENT_SEQ_STATE                                           10
EM_METRIC_VALUES                             AUTO            10
EM_METRIC_VALUES_DAILY                       AUTO            10
EM_METRIC_VALUES_HOURLY                      AUTO            10

 

As the repository database scales, the EM stats feature can be scaled to support more extensive statistics collection as well.  I am cautious to say this and will say it with a grain of salt, but it backs up my experience that 95% of databases are served successfully by the current automated stats job, while the rest are types that require a DBA with knowledge of the schema, data and code to ensure the CBO has what it needs to perform efficiently.  The biggest challenge is that there aren’t a lot of DBAs with that knowledge or the time to invest in this task, (I know, I did this at two companies and it must be a true passion.)  EM12c has removed this challenge from the DBA’s plate by setting up the one-off stats collection requirement for them as part of the repository installation.

They’ve enhanced that in REPVFY with a utility step that checks what is currently in place and gives you the ability to review it as part of this output.

TABLE_NAME                       RUN_TYPE                         INCREMENTAL GRANULARITY     STALE_PCT
-------------------------------- -------------------------------- ----------- --------------- ---------
EM_EVENTS_INCIDENT               DAILY                                                        20
EM_EVENT_ANNOTATIONS             DAILY                                                        20
EM_EVENT_MSGS                    DAILY                                                        20
EM_EVENT_RAW                     DAILY                                                        20

 

Curious about when and how long the jobs that do this work run?  This is included as well-

[----- CBO statistics gathering log (last 3 days) ---------------------------]

LOG_DATE             RUN_TYPE                           DURATION OBJECTS_ANALYZED OBJECTS_STALE OBJECTS_FILTER VALUE
-------------------- -------------------------------- ---------- ---------------- ------------- -------------- --------------------------------------------------------------------------------
02-APR-2014 14:24:16 PROACTIVE                                 5                3            69             24 P:1.86:EM_EVENT_ANNOTATIONS;1.13:MGMT_JOB_EXECUTION;.71:EM_EV_METRIC_ALERT;
02-APR-2014 12:24:16 PROACTIVE                                 9                5            71             24 P:2.47:EM_EVENT_RAW;1.97:EM_EVENT_SEQUENCES;1.34:MGMT_JOB_EXECUTION;.87:EM_EVENT
02-APR-2014 10:24:16 DAILY                                   123              272           457                D:6.91:EM_JOB_METRICS;5.66:MGMT_SYSTEM_PERFORMANCE_LOG;4.81:MGMT_ARU_PATCH_FIXES
02-APR-2014 08:24:16 PROACTIVE                                13                7            74             24

 

Table stats in the repository?  Yep, that’s included, too!

[----- Table statistics -----------------------------------------------------]

TABLE_NAME                       PARTITION_NAME                 LAST_ANALYZED          NUM_ROWS     BLOCKS AVG_ROW_LEN GLO USE STATT STA
-------------------------------- ------------------------------ -------------------- ---------- ---------- ----------- --- --- ----- ---
EM_METRIC_VALUES_DAILY                                          29-MAR-2014 10:25:19      95257       2040          16 YES NO  ALL   YES
EM_METRIC_VALUES_HOURLY                                         01-APR-2014 10:26:49     681018      15820          16 YES NO  ALL   YES
MGMT_DB_FEATUREUSAGE                                            02-APR-2014 10:25:13       5713        244         149 YES NO  ALL   YES
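The same table statistics are sitting in dba_tab_statistics whenever you want them between REPVFY runs; a sketch:

sqlplus / as sysdba <<'EOF'
-- current optimizer statistics on the repository tables, comparable to the output above
select table_name, partition_name, last_analyzed, num_rows, blocks, avg_row_len, stale_stats
  from dba_tab_statistics
 where owner = 'SYSMAN'
 order by table_name, partition_name;
EOF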

 

And with all of that,  (pages and pages…) we’re done with the first of many reports.  I’ll keep to the most essential data in each, but hopefully you’ve learned something you hadn’t known before about the REPVFY utility and I’ll update with the next log report soon!

Now that we’ve learned in Part I how to create a notification schedule for a specified user so that it will only page one individual on an oncall rotation, I’ll show you how to use this in conjunction with rule sets to complete the process of modernizing and automating your oncall.

In my test scenario, I’ve already made copies of the main rule sets, divided into three rules and four groups, (mission critical, production, test and development.)

  • Mission critical are all systems that I WANT to be paged for after hours and need to know immediately if there is an issue.
  • Production are important systems that I want to be notified of, but do NOT want to wake up resources for at night, as these incidents will not cause an outage and can be addressed during the business day.
  • Development and test, for this example, will be treated the same way and will only email issues to the DBA team, as the development group often has the privileges to address issues they have caused in the process of developing and testing.

With this said, now we look at our Rule Sets.  Notice I’ve disabled the default rules that come with the EM12c, but have made copies that I can use to offer the best value to the business.

rule_sets_dba_notif2

 

Let’s click the box to the left of the rule set named Mission Critical Incident Rule Set and click on Actions, Edit.

rule_sets_dba_notif3

 

Notice that the rule set is only specific to Specific Targets by group and the group is Mission Critical.  We then click on the Rules tab and we can update the current rule setting to use our new DBA_PAGE user!

Editing the Rule Set

There are two things we are going to do to the first, major rule set for metric alerts.

1.  We are going to limit the categories that we want to be notified for.  I want you to see this option so you can decide what categories you want to page on and can update this as you want.  For our example, I am going to limit our metric rule set to the categories of Capacity and Availability.  At the end, I’ll show you how to create a second rule to email on the rest.

Click on the left of the rule set and then click on Edit up in the menu bar.

rule_sets_dba_notif4

 

We’ll then edit the categories to limit them to the two we spoke about above.  You’ll be able to see all the categories in the drop down; note from the incident notifications you receive which categories are important to you for email vs. pages.

rule_sets_dba_notif5

 

Click on Next to continue onto the actions.

2.  We are going to update the rule to use the new DBA_PAGE user to page for the metric alerts in these two categories.

rule_sets_dba_notif6

 

 

We’ve already got a pretty advanced set of actions to perform in the Action Summary.  I like to automate many of the Incident Manager tasks, but this time around, we are going to add a page option to the actions.  The first rule is already highlighted, so simply click on Edit in the menu bar to proceed.

rule_sets_dba_notif7

I’m going to have all paged Incidents assigned automatically to the DBA_PAGE user and then I’m going to add the DBA_PAGE in the Page section in the Basic Notification section.

rule_sets_dba_notif8

You can also enable Repeat notifications if you want to take this one step further, (we’ll go into this more in another blog post, but I thought I’d at least show it to you in this one… :))

If satisfied, I’d recommend clicking in the box to clear events permanently to keep your repository cleaned up and then click continue to review the changes you’ve made:

rule_sets_dba_notif9

 

You’ll now see that the owner of the incident will be set to the DBA_PAGE user and that the DBA_PAGE user will be paged along with the emails sent.  Keep in mind, notification will only occur during the notification schedule’s active times, so Monday-Friday, 8-5, no paging will occur, as that’s how we set it up in Part I of this blog post series.

We now need to do this for each of the rules that we want to page on from this rule set:

  • Create incident for critical metric alerts, (for only two categories…)
  • Create incident for critical Service Level Agreement Alerts
  • Incident creation Rule for Target Down availability status
  • Incident creation rule for high-availability events.

As promised, I then created a new rule set to cover the categories no longer covered in the original.  Click on Create in the menu bar and create the following rule:

rule_sets_dba_notif10

 

Add the following Actions, ENSURING you do not choose to PAGE, and assign the rule to SYSMAN as you would for the standard rules you created before.  This is just to cover all your other categories, but NOT page on them.

rule_sets_dba_notif11

 

Complete the review, name the rule Metric Alert for All Other Categories to Email, click Save and then SAVE ALL the changes you’ve made.

You have now updated all the rule sets to use the DBA_PAGE user we set up in Part I of this series, and it will only notify that user during the scheduled time it is to be active, in this case, just for after hours support, rotating on a one week schedule between three DBAs’ SMS contact information!


I came across a discussion on Oracle-l on how after hours paging is handled at many companies and was kind of surprised how many DBAs still carry around a secondary pager/cell phone or are just expected to be woken up whether on call or not.  I’m not one to go back to sleep once I’m awake, so I’ve been a proponent of EM notification schedules for after hours paging.  Now there are other ways to handle this in Enterprise Manager 12c as well, but we’ll use this method, as it is backward compatible to OEM 10g, too.

The requirement of this setup is to have an inbox alerting option separate from your SMS/texting option on your smartphone, which can be Android or iPhone, even BlackBerry; any of these are more than acceptable to satisfy the requirement.  The next requirement is to know your SMS notification email address.

Example-  T-Mobile’s would be <cell number>@tmomail.net and AT&T’s would be <cell number>@mms.att.net.

Your mobile provider can verify what your email extension is for your SMS address.  With this information in hand, you can then proceed with the setup.  The design is simple-  all email notifications will continue to go to your email, whether warnings or critical, but we will create a secondary user that will have the SMS addresses for the DBAs oncall and set them up in a rotation to be notified of critical issues.

Sit down and first figure out how often you want to rotate your schedule, (weekly, every two weeks, or once per month) and account for any vacation or time off coming up.  That should tell you what your rotation needs to be to keep things sane.

Create the Paging User in Enterprise Manager

First we need to create the user that will then be utilized for paging.  This can be done from the Setup, Security, Administrators menu in the console.

admin_pg_0416

We have all our DBAs currently listed, but for this user, we will need each of their SMS addresses and once you have those, click on Create.

Name the user DBA_PAGE and put in the following values.  It can be set up as an Administrator, but just for this example, I shortened the process and created it as a Super Admin, (this is not required…)

dba_page_0416

Note that I’ve entered each of the DBAs’ SMS addresses into the Email Address section, separated by commas, and that I’ve entered a clear description of what this login is used for.

Click on Finish and your new DBA_PAGE user is now listed.

admin_pg_2_0416
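If you’d rather script this step, the EM CLI can create the same administrator.  A sketch, assuming you’re already logged into emcli; the SMS addresses below are made-up examples, and if I remember right the email list is semicolon separated, (check emcli help create_user on your version):

emcli create_user -name="DBA_PAGE" -password="<password>" -email="5035551001@tmomail.net;5035551002@mms.att.net;5035551003@tmomail.net" -desc="Shared after hours paging login for the DBA oncall rotation"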

Managing the DBA_PAGE User

Once you’ve completed the entry, log out of the EM12c console as your current user and log back in as the DBA_PAGE user, (it’s just easier to manage the notification schedule as this user…)

dba_page_login_0416

 

Next you need to create the notification schedule, but I’ll first show you how you can add more users easily to this Administrator:

dba_page_login_3_0416

 

Once you’ve accessed this page, you will note that new email addresses can be added easily:

dba_page_info

 

The Add Another Row option will allow you to add another SMS address, and then you can check mark just that address and test it.  You can also remove the addresses of DBAs who have moved on from the company.

Editing the My Notification Schedule

dba_page_notif_sched1

 

This is again accessed from the Setup menu on the right hand side of the screen, and once you are in the My Notification Schedule, you will see the following:

dba_page_notif_sched2

This is the default schedule; it starts from the day the user was created and would notify ALL email addresses in the list.  We are going to replace this schedule with a new one that supports a three week oncall rotation, notifying one DBA each week and switching automatically each week to the next DBA’s SMS address.

Click on Edit Schedule Definition in the upper right of the screen.

dba_page_notif_sched3

As starting a rotation in the middle of a schedule would be difficult, we are going to choose the following Monday as our beginning.  You can see the default is a one week rotation and that the time zone is set to Canadian Central Time, (I have no idea where that came from… :))

dba_page_notif_sched4

 

For our example, we are going for a three week rotation, (three DBAs, three week rotation… :)) We’ll start the rotation this following Monday, April 21st and we’re going to say that the DBAs are on Mountain Time.

We are then given a schedule of three weeks to fill in for who is oncall.  By default, it will want to offer us all three SMS addresses entered for the user.  We will only use the first one, so click on Week 1 to fill in, ensure we are VIEWING Week 1, and now fill in our schedule.

dba_page_notif_sched5

 

To create a schedule that DOES NOT notify during business hours, there is a batch of three steps for each week.  There is no way around this, so I’ll use the following schedule creation and then repeat it for the 2nd and 3rd weeks for the two other DBAs.

dba_page_notif_sched6

 

Now notice that we’re still viewing and editing week 1, and choosing all the days, but I’ve changed the time to show 12am-8am to fill in with the first DBA’s SMS address.

Finally, we fill in for the last time slot, Saturday and Sunday during the weekend daytime.

dba_page_notif_sched7

 

Your schedule should now look something like this, with all sections above and below business hours filled in with the 1st DBA’s SMS Address for notifications:

dba_page_notif_sched8

 

We then click the down arrow button on the View menu and choose the second week, then do the same with the check mark in the box for Weeks, ensuring we are changing the schedule for Week 2.  We’ll repeat the scheduling, but will also change the Email Address to the second DBA’s SMS Address by clicking on the search.

dba_page_notif_sched9

Once you have filled in everything for the Second DBA, you will switch to View Week 3 and edit Week 3 for the third DBA and fill in that schedule:

dba_page_notif_sched10

 

Once completed, Save the changes.

The default view may alarm you at first, as it will be empty:

dba_page_notif_sched11

 

The reason is that the first date shown is the current date, and if you remember from our example, we decided to start our rotation on April 21st.  Go to the View Week Containing and change it to April 21st.

You will now see the rotation schedule for the first DBA for after hours.

dba_page_notif_sched13

 

Updating the date for the next week will display the 2nd DBA’s rotation:

dba_page_notif_sched14

 

And then if you switch the date to first week of May, we see the third DBA:

dba_page_notif_sched15

 

If you go past that, you’ll see that the rotation continues, starting again with the first DBA for the next week and will continue on without change unless you edit or replace the schedule.

If for some reason, let’s say a DBA can’t take a shift that is in their weekly schedule for one night, you can go into that DAY’s shift and edit it to one of the other two DBAs in the list.

Now you’ve finished setting up an after hours pager in a notification schedule.  In Part II, I will show you how to use the notification schedule/user with rule sets to page on critical notifications.


EM12c Auditing

Lately I’ve been having more discussions on securing the EM12c environment.  All of IT has a tendency to treat the Enterprise Manager as an afterthought in both hardware allocation and security best practices.  No one is sure of exactly why this is-  they all have their theories, but we do know it happens often.

Today we are going to go over some of the auditing options within EM12c.  Basic auditing is turned on by default in the environment, but only covers basic processes.  There are over 150 auditing options, and extensive information can be collected, retained within the repository, as well as turned into an externalized service to reside as log files on the OS file system.  These options include login/logout information, updates, OMS password changes and EM key copies and removals from the repository.

Basic auditing information can be gained through the console via the Setup, Security, Auditing Data menu option, but the auditing configuration, additional features, updates and externalized service setup must be performed through the Enterprise Manager command line interface, (EM CLI).

If you haven’t used the EM CLI before, please refer to my blog post on Beginning with the Command Line Interface; otherwise log in as a user with appropriate rights to run the EM CLI and connect to the repository.

First, let’s inspect the current operations list and what will impact the infrastructure if executed:

audit_blog_1

Note that the last option, APPLY_UPDATE, is to update the repository and yes, it will impact the infrastructure by doing so.
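For reference, that operations list came from the EM CLI; a sketch of the session, assuming you log in as SYSMAN, (verify the verb names against emcli help if your version differs):

emcli login -username=sysman
emcli sync
emcli show_operations_list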

Next, let’s look at the current settings.  As I stated earlier, auditing is turned on by default, but the externalized service options that follow are not yet configured, so it is marked as disabled.

audit_blog_2

The settings for the externalized service, outside of the directory, (configured in DBA_DIRECTORIES with read/write privileges granted to SYSMAN) are pre-configured with default values.

  • File prefix is the prefix used for all audit log files so that they are easily identified in the directory.
  • File size defaults to 50M.
  • Retention defaults to 365 days.  Keep this in mind before enabling, as this could impact disk space if your OS directory has limited space.

Notice that there is also a note informing you that Infrastructure Audit is always on, (go inspect the access.log and you will see information that can be sync’d up with the emctl.log and others to create the solid picture that this feature can provide for you.)
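The current settings shown above map to another verb; again, confirm the name with emcli help on your version:

emcli show_audit_settings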

Enabling/Disabling Features

To enable or disable audit features, the following syntax is used:

>emcli update_audit_settings -audit_switch="ENABLE/DISABLE" -operations_to_enable="<insert operation name here or just say ALL>" -operations_to_disable="<insert operation name here or just say ALL>"

To demonstrate this, we’ll enable auditing for logins and logouts:

audit_blog_3

A response letting us know whether the change to the auditing configuration was successful completes the task, and we can move on to other tasks.
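For reference, the command behind that screenshot was along these lines, with the operation names taken from the operations list we pulled earlier, (treat the exact names as an assumption and verify them against your own list):

emcli update_audit_settings -audit_switch="ENABLE" -operations_to_enable="LOGIN;LOGOUT"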

Next, we’ll configure the externalized service for auditing.  This is an excellent choice and should be considered for all EM12c environments.  Even with high availability options, keeping a minimum of 7-31 days of auditing information about the EM12c environment, especially considering the access and power EM12c has, is a good idea.

The syntax for the configuration for the externalized auditing service is:

>emcli update_audit_settings -file_prefix=<file_prefix> -directory_name=<directory_name> -file_size=<file size> -data_retention_period=<period in days>

And in our example, we will update the service to file sizes of 25M each, with a prefix of “em12c_audit” and retain 31 days of audit files that our OS file system can easily handle.

>emcli update_audit_settings -externalization_switch=ENABLE -file_prefix=em12c_audit -directory=AUD_DMP -file_size=25000000 -data_retention_period=31

After executing this statement, the audit files will automatically start generating in the directory, (make sure you HAVE created a DBA Directory to hold this data first!) and we can then view the logs as needed to inspect what activity is occurring in the EM12c environment.
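If you haven’t created that directory yet, it’s a quick one-off on the repository database; a sketch, assuming an OS path of /u01/app/oracle/audit_export that already exists and is writable:

sqlplus / as sysdba <<'EOF'
create directory AUD_DMP as '/u01/app/oracle/audit_export';
grant read, write on directory AUD_DMP to sysman;
EOF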

This is a solid best practice to ensure you are offering one more line of protection to the database and software that is essential to you, your business and your environment.

 

I know that folks have been having some challenges downloading my slides from Collaborate for a couple of my sessions, and I received errors when updating two of them the other day, so I’ve added them to my slideshare location for your convenience.

Thank you to everyone who attended my sessions-  such great turn outs and fantastic attendee participation!  I had some real EM12c stars in the audience and appreciated the support and banter!

My slideshare list for download can be accessed from here.

Thank you to IOUG, Oracle and everyone for their support!

This is the second post on ASH Analytics Detail.   You can find the first post here.

In this post we’re going to dig into more Activity data within ASH Analytics, but I’m focusing on Resource Consumption, which you can find once you are in ASH Analytics, (you may want to stop refreshes, which can be done at the upper right in the refresh settings) under the Activity button.

ash_0321_resource_consum

We’ll be working with the last three options in this menu marked off in red, as the Wait Class is just a similar view of what we already get in the main Activity graph.

Wait Class

I’m sure you’ve already noticed, but ASH Analytics has a tendency to always arrange the graphs from the heaviest usage at the bottom to the lightest at the top.  This makes it easier for the user to view.   This will be the case for pretty much all the activity graphs, whereas Load Maps arrange largest to smallest, left to right.

ash_0321_main

Nope, not much new here-  We are still seeing the resource consumption, but as wait events.  If we compare it to the standard wait events view, not much is different.  The reason we would really find value in this is if we had IORM, (IO Resource Manager) enabled.  Since we don’t, we’re going to skip it for now and it will get its own blog post in the future… :)

Wait Event

When we filter by Wait Event, we get a very different view of our data than we did by Wait Class.  Note the actual categories the graph is broken up by, listed down the right hand side.

ash_0321_wait_event

Scanning from the bottom -> up on the right hand side, we can see the largest resource consumption is “LNS Wait on SENDREQ” and the second highest consumer is “LGWR- LNS Wait on Channel”.  You are also seeing log file sync and parallel log file write.  All of this comes together into a clear understanding of what we are dealing with here… Why?  This is a Data Guard environment, and the LNS* type waits are quite common, occurring when-

  • LGWR writes redo to the online redo log on this primary database, (when LGWR SYNC is not used, user commits are acknowledged once this step completes, except when the parameter COMMIT NOWAIT is used.)
  • The Data Guard LNS process on this primary database performs a network send to the Data Guard RFS process on the standby database. For redo write sizes larger than a MB, LNS will issue multiple network sends to the RFS process on the standby.
  • LNS posts LGWR that all the redo has been successfully received and written to disk by the standby, which is a heavy consumer as well.

On the primary database, extended resource consumption is represented by the “log file parallel write” wait event. We will then note the repeated “LNS wait on SENDREQ” wait event, too. You can further divide the “LNS wait on SENDREQ” wait event into network time and RFS I/O time by subtracting the “RFS write” wait event obtained on the standby. These wait events can be assessed on the standby by using multiple queries in the “Data Guard Specific Wait Events” section for a physical standby or by using AWR for a logical standby if you need further proof.
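If you ever want to sanity-check what the Wait Event view is graphing, the raw ASH samples are easy to roll up by event; a sketch for the last hour, run on the database you’re looking at:

sqlplus / as sysdba <<'EOF'
-- activity by wait event over the last hour of ASH samples; CPU shows as a null event
select nvl(event,'CPU + Wait for CPU') event, count(*) samples
  from v$active_session_history
 where sample_time > sysdate - 1/24
 group by nvl(event,'CPU + Wait for CPU')
 order by samples desc;
EOF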

Object

I’ve hidden the exact object names involved in this database, but you’ll note, they weren’t involved in much of the resource consumption anyway.  Now on an environment with heavy IO, this would change significantly and you would see a lot more of one or more objects being the focus of this type of graph.  This is where having the Top Activity box check marked is helpful, as it clearly shows you that object IO is not much to focus on for this environment.  The black mark showing the upper max of activity for any given time in the activity graph gives you a clear line to compare with.

ash_0321_object

Blocking Session

This is where many of you will become very interested again.  We all want to know about blocking sessions and how they may be impacting the resource consumption in our environment.  Although there was no *true* impact to production processing, there were some blocking sessions that we could inspect in our current example.

Only sessions that are experiencing some type of blocking are displayed in this view.  No matter if it’s a transaction, (TX- Tran Lock contention or TM- DML enqueue contention, etc.) or UL, (user defined) these sessions will be shown in this graph.

ash_0321_blocking_sess

I’ve included the bottom portion of the ASH Analytics section so you’re able to see the SQL_ID’s involved in the sessions and then to the right, you can also view the session information, displayed by Instance ID, (for RAC) session ID, (sid) and serial number, (serial#).
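The same blockers can be pulled out of the raw ASH samples if you want the session details in text form; a sketch for the last hour:

sqlplus / as sysdba <<'EOF'
-- sessions that blocked others in the last hour of ASH samples
select blocking_session, blocking_session_serial#, count(*) samples
  from v$active_session_history
 where sample_time > sysdate - 1/24
   and blocking_session is not null
 group by blocking_session, blocking_session_serial#
 order by samples desc;
EOF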

You can then click on a session in the lower section, which will filter the ASH graph by the session and the wait class to give you another view, demonstrating in our example that we are waiting on a commit(s) and then our favorite wait, “other”…. :)

ash_0321_blocking_sess_det

What’s helpful is that the SQL_ID is clearly shown on the left and  you’re able to click on any of these, (although I’d highly recommend clicking on the top one that is 5 times the concern vs. the others. :))

This completes the review of the Resource Consumption Menu for the ASH Analytics Activity Page.  We’ll continue on in Part III later this week!

I have a request to hit on some EM performance blog posts, so we’re going to start with breaking down some of the ASH Analytics areas.  ASH Analytics is not your grandfather’s “Top Activity” and I recommend everyone begin to embrace it, as it is the future of performance activity in the Enterprise Manager.  The idea that we will be able to pull directly from ASH and AWR to present our performance data via EM12c is exciting, to say the least.  The added accuracy and value of the aggregated historical data must be recognized as well.

The standard output for ASH Analytics looks quite similar to Top Activity in the way of focusing on activity graphed out by wait events, but I’m going to dig into different ways to present the activity data, as they may answer questions that simply won’t show via wait event graphing.

Once you first enter the ASH Analytics interface, you’ll be presented with the section at the top, which will display the timeline for your examination and then below, the actual wait event graph as seen here in our example:

ash_standard

You’ll note that we haven’t used any filters, so we’re viewing all activity data and the standard wait classes we are accustomed to viewing as shown, i.e. I/O, (dark blue) CPU, (kelly green) and System, (light blue).  We can see the system is rather busy during multiple intervals, bypassing the CPU cores line, displayed in red across the image.

Now let’s display the data in a different format.  This is done by changing the Wait Class, shown under the Activity button, to another filtering type.

ash_menu

The section highlighted in red is where we will be spending our time today.  I’ll display the same window of time we saw in the standard wait event view, but focus on the SQL defined displays, and we’ll go over how they might assist you in troubleshooting an issue.

SQL_ID

Now this one might seem a bit self-explanatory, but displaying the data from the activity pane appears very different than, let’s say, the load map:

ash_bld_out

Now let’s look at it from the Activity View defined by SQL_ID:

ash_by_sqlid

I find the load map an excellent way to get my point across to non-technical folks, but I’m a DBA-  I’m used to spending my time looking at the data in terms of activity, so this is a more comfortable view for me, moving from seeing the data in terms of wait events to seeing the percentage of activity allocated to each SQL_ID.

We can click on the SQL_ID displayed on the right to go to its detailed ASH page, which will show you all data pertaining to that SQL_ID and the wait information involved with it.  By clicking on different sections, we’re not just digging down into more detailed information; keep in mind, we are also “filtering” out more data that could have been masking valuable information.

Top SQL_ID

You may also think that this one is not going to look any different than the last, but that’s where you’d be wrong.  This is not the SQL_ID that has resources allocated to it in the activity, but the top level SQL_ID that is executing the active SQL_ID.  This is helpful if you are trying to locate what packages and procedures should be first on the list for code review or if you want to quickly locate the package or procedure responsible for a specific SQL_ID.

ash_top_level_sqlid

These can then be identified to sessions by clicking on links and then traced to users, hosts, etc.  If I click on one of the top SQL_IDs, it will take me to all SQL_ID’s involved in that Top SQL_ID and all the wait events, displayed in the same graph timeline.  From there, I can then dig down into the data pertaining to the waits, the SQL_ID’s involved as part of the Top SQL_ID or even switch to other views, such as a load map to present the data in another format for peers to view more easily.
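You can reproduce this rollup from the ASH samples as well, which is handy when you want to hand the list to someone without console access; a sketch for the last hour:

sqlplus / as sysdba <<'EOF'
-- activity rolled up by the calling (top level) SQL and the SQL it executed
select top_level_sql_id, sql_id, count(*) samples
  from v$active_session_history
 where sample_time > sysdate - 1/24
 group by top_level_sql_id, sql_id
 order by samples desc;
EOF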

Force Matching Signature

Force matching signature is a way of ensuring that SQL is using the same plan/profile, etc. even when it has literals in it.  It’s kind of like setting cursor_sharing=FORCE, but in a “hinted” way throughout a database.  This can be both good and bad; let’s say that it forces the value to “1”, where the value “1” really only makes up 2% of the rows, and it would have been better if the optimizer knew what it was working with.

ash_force_matching
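If you want to see which of your statements collapse down to the same signature outside of the console, v$sql carries it for every cursor; a quick sketch:

sqlplus / as sysdba <<'EOF'
-- signatures shared by multiple cursors usually mean literals instead of binds
select force_matching_signature, count(distinct sql_id) cursors, sum(executions) execs
  from v$sql
 where force_matching_signature <> 0
 group by force_matching_signature
having count(distinct sql_id) > 1
 order by cursors desc;
EOF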

SQL Plan Hash Value

You should be seeing this next image [hopefully] as a mirror of the SQL_ID one.  We do like plan stability and the idea that we have a ton of hash plan values changing erratically is enough to make most of us feel queasy.  Having a quick visual that we’re not experiencing a lot of plan changes can be quite helpful.  It’s not foolproof, again, this is a visual representation of the data, but it’s helpful.

ash_sql_plan_hash

SQL Plan Operation

If you’re a DBA managing a data warehouse, you know when any DML is running and on what schedule.  The SQL Plan Operation view can give you a quick verification if something is amiss and there are updates, deletes or inserts happening that shouldn’t be.  You can see the percentage of activity that may also quickly tell you a change has occurred vs. the normal database activity.

ash_sql_oper

You can also see just how much of your activity is going to certain type of processing.

SQL Plan Operation Line

In the plan operation line view, you can see the operation type for the process, along with the description.  The process is then broken down by both the SQL_ID and the step the statement is performing.

ash_sql_plan_op_line

If you hover over any SQL_ID listed, it will also show this to you in the highlighted area:

spec_load_type_op

SQL OpCode

This one is self-explanatory.  We are simply looking at the activity level per statement type.  It is easy to see that during our busy intervals, queries were around a 1/3 of the activity, as were inserts.  This view can be helpful if you are retaining screenshots at different snapshot intervals for comparison.

ash_sql_op_code

Top Level SQL Opcode

You can also view the data simply by Operation code, displaying by each code.  Notice that these are displayed by the Instance number, (if on RAC), Session ID, (sid) and Serial number, (serial#).  This is a global view of the operations that are occurring in the environment and can offer a clear view of activity, but at an operation type level.

 

ash_sql_op_code

We’ve just touched on what the SQL section of the ASH Analytics Activity view can offer.  Stay Tuned and we’ll dig into each section and each area of this great interface for performance monitoring and analysis.


I’m in the midst of transferring over to my travel laptop to run all my VMs on and retiring my previous ASUS work laptop to my youngest son.  I was surprised to find out that not ALL laptops are set up to run virtual environments these days.

1.  Virtualization may not be enabled in the BIOS, (i.e. On-Boot UEFI in the Lenovo Yoga 11s.)

enable_vm

Once this is enabled, save the configuration and reboot, allowing you to now run a VM on your laptop.

If you are importing appliances, make sure you have set the location to import the disks to the appropriate location, especially if you are using an external disk to house your VMs.

Before you click, “Import”, look at the very bottom where the location of the virtual disk will reside and verify that is where you want them to live.  The default is commonly the data location for the PC user on Windows or the /home/user for Linux users.

imp_disk

 

Now onto the VM

2.  All VM images are not the same.  If you are using an image file or have created the image from scratch, be prepared to do some preparation to get the image ready before you can successfully install Oracle on it and/or any other product such as Enterprise Manager, etc.

The nicest part is that Oracle will let you know what is wrong in its prereq checks during the installation of Oracle.  It’s going to let you know if your OS and virtual host has what it takes to support the installation-

vm_updates

 

Most of these are pretty straightforward for anyone who’s installed Oracle for a while, maybe not for those who are newer to the product.

1.  Most VM images have a small /tmp and/or it can be filled quickly by installation.

  • Clear out files from /tmp after each installation, (or in this case, failed install.)
  • Point to a second temp directory that has more space and make it the default temporary directory, either with a soft link or by redirecting the installer’s temp variables, (see the sketch below.)
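Here’s a minimal sketch of that second option.  Rather than a soft link, I usually just repoint the installer’s temp variables at a roomier location, which accomplishes the same thing; this assumes /u01 has the space and the install runs as the oracle OS user:

mkdir -p /u01/tmp                  # assumed roomier location; adjust to your own layout
chown oracle:oinstall /u01/tmp     # assuming the standard oracle/oinstall owner
export TMP=/u01/tmp                # the installer honors these in the shell it is launched from
export TMPDIR=/u01/tmp

Run the installer from that same shell session so it picks the variables up.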

2.  Increase the swapfile space.  This can be created in a location where you have space to reserve and read/write swap to.  On our VM, we’re kind of limited at the moment, so we’re just going to create a swapfile off of /u01 and give it 2GB of space:

dd if=/dev/zero of=/u01/swapfile bs=1024 count=2097152
mkswap /u01/swapfile
swapon /u01/swapfile
swapon -a

3.  Hard limit/soft limit-  These can be fixed with the following inside your VM.  Most of the steps must be performed as ROOT, but verify whether each is meant to be performed as ROOT or checked as the OS user-

Next, we’re alerted that the max open file descriptors is an issue, so let’s look at those for the system, the user and then increase them:

# check the system-wide limit, (as root)
cat /proc/sys/fs/file-max
# check the per-user limits, (do this as the OS user!!)
ulimit -Hn
4096
ulimit -Sn
1024

We’ve now verified that they are smaller than the requested amount, (in our installation, it requested 65536) so we’ll add this and then verify the change in the file-

# set the new value immediately, (as root)
sysctl -w fs.file-max=65536
# then make it permanent in /etc/sysctl.conf, either with vi or by simply appending the line, (as root):
vi /etc/sysctl.conf
fs.file-max = 100000

Now save and verify with the following:

sysctl -p

Log out and log back in to set the new values.
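The hard limit/soft limit warning itself, (nofile and nproc for the install user) is normally handled in /etc/security/limits.conf rather than sysctl; the typical Oracle prereq values look like this, assuming the install user is oracle:

oracle   soft   nofile   1024
oracle   hard   nofile   65536
oracle   soft   nproc    2047
oracle   hard   nproc    16384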

Next post we’ll take on all those darn lib files… :)

 

I love my Lenovo Yoga 11s special build ultrabook.  It has 16G of memory, an Intel Core i7 and a 256GB SSD, but that’s nowhere near the amount of space that I’m going to require to be running numerous virtual environments on it.  To help me out, I went on a fun buying and testing spree with a number of external disk solutions to find what worked best for my needs.

The contenders, (yes, I was all over the board on these)-

The goal was to come up with a solid combination of performance, portability and storage.

I’m not going to go over any benchmarks as you can see the claims by the manufacturer and the reviews from folks who purchased them on the links, but I can tell you what ended up working for me.

The My Book was just too large and clunky.  I didn’t get the speed increases with the external power supply, (there were recommendations to get an external with added power, as my Yoga doesn’t offer a whole lot in its tiny package…)  I think it would make a good backup drive, but not good for running VMs, and no way was I hauling the literally “book” sized external drive overseas and to conferences!

The Toshiba Canvio, along with two other 5400rpm drives I already had, were solid, with capacities from 500MB to 1.7TB, but they didn’t show the performance I required and the USB was 2.0 on the older ones.  It was good, but not good enough, so the Toshiba Canvio lost due to size vs.  price.

The PNY 128 GB flash drive was fast enough, but it’s just too darn small.  The price is really great at $50, (and for those Best Buy shoppers, Amazon has it for a third of what you’re paying at Best Buy!) so it is only good enough to be used as an external flash drive to SUPPORT the VM environment, (which I’ll explain in a little bit.)

Which leads me to the Patriot 256GB SuperSonic-  This 256GB, blazing fast flash drive is just big enough and plenty fast enough to run your VM for demos and webinars with the speed you want, but the price is the same as what you’d be paying for the Best Buy PNY version.  This is a fantastic deal for a 256GB fast USB 3.0 flash drive.

Then comes the Touro 1TB 7200rpm, USB 3.0 external drive.  It outperformed all the other drives, and the price was fantastic.  The combination of this for my standard VMs and the Patriot 256GB SuperSonic flash drive for demos is a nice combination.  I also kept the 128GB PNY flash drive-  why?  As you work with VMs, you come across the need to download and copy files as you are building.  It’s nice to have a fast flash drive with plenty of space to download to.

The end setup is portable enough that I can easily travel and present, but it’s fast enough that I can avoid some of the technical difficulties we all have run into when attempting to demonstrate via a VM.  A special thanks to Leighton Nelson, Tyler Muth and Connor McDonald for their recommendations.  This saved me a lot of time and I was able to test out just what fell inside that range to build out what would work for the setup I needed! :)

20140316_192114 

Yes, you heard that right-  DBA Goth Cowgirl is getting an upgrade to Enterprise Manager, (EM12c) Goth Cowgirl!  I will be starting as Oracle’s Consulting Member of the Technical Staff for the Strategic Customers Program, specializing in Enterprise Manager on March 17th.  The Strategic Customer Program is a group that rolls up under the Systems Management product line and is one of four primary product lines, (the other three being Database, Fusion Middleware and Applications.)  We comprise the whole of Enterprise Manager and OpsCenter.

This opportunity has been long in the making and I look forward to focusing on the Enterprise Manager product, both as part of the development team and presenting at conferences.  It’s been made quite clear to me that both my technical and marketing/presentation skills will be highly sought after in my new role and I’m looking forward to being an integral part of the EM team.  I will be primarily working remotely, but will travel for short term technical engagements and marketing efforts whenever requested.

So for those of you who want to continue to master EM12c with me, EM12c Goth Cowgirl, continue to follow me, as I’ll be offering invaluable expertise as I deep dive into the source of all that is EM12c with Oracle!

I want to thank all those that helped me in some way with this opportunity, including Dan Koloski, Pete Sharman, Tyler Muth, Mary Melgaard, Wendy Delmolino, Pramod Chowbey, Will Scelzo and Adilson Jardim.  These are all incredible folks within Oracle and their guidance and support is tremendously appreciated now and in the future.

em12c_img