
The last couple of weeks I’ve been lucky enough to get time with a new ZFSSA Simulator, (it will be out soon for everyone, so you’ll have to just play with the current one available, patience grasshopper! :)) which I spent checking out the newest features available with Database as a Service, (DBaaS) Snapclone via Enterprise Manager 12c.  I’m really thrilled with Snapclone-  In two previous positions, I spent considerable time finding new ways of speeding up RMAN duplicates to ease the stress of weekly datamart builds that were sourced off of production, and this feature would have been a lifesaver back then!

As Oracle’s DBaaS offering is so full featured, I think it’s easy to have misconceptions about the product, or to find blog posts and documentation on earlier releases that lead to misconceptions.  Due to this, we thought it might help if I tried to dispel some of the myths, letting more folks realize just how incredible Snapclone really is!

Misconception #1- Snapclone only works with the ZFS Storage Appliance

DBaaS does offer a fantastic hardware solution that requires a hardware NAS like NetApp or the ZFS Storage Appliance, but that’s not the only option.  There is also a software solution which can work on ANY storage.  There’s no requirement on where the test master database, (used to track changes and save considerable space vs. a traditional cloning method…) must reside, which means it can be on different storage than production.

There are benefits to both the hardware and software solutions of Snapclone.  Keep in mind, Oracle prefers to support hardware and software that are engineered to work together, but they realize that not all customers will have a ZFS or NetApp solution in place, so they ensure that Snapclone solutions are available for whatever storage a customer may have in their shop.  Some of those benefits and requirements are listed below, but you can clearly see that Snapclone does not have a requirement of ZFS or NetApp:

tm_profile_2

Misconception #2-  Snapclone Requires a Cloud Implementation

It is true that Snapclone requires database pools to be configured and built, but this is in the environment’s best interest, ensuring that it is properly governed and that placement is enforced.  This feature is separate from an actual “cloud” implementation and the two shouldn’t be confused.  There are often dual definitions for terms and cloud is no different.  We have private cloud, public cloud, the overall term for cloud technology, and then we have database pools that are part of Snapclone and are configured in the cloud settings of DBaaS.  They should not be confused with having to implement “cloud”.

Misconception #3- Unable to Perform Continuous Refresh on Test Master Database

There are a couple of easy ways to accomplish a refresh of the test master database used to clone from production.  Outside of the traditional profiles that would schedule a snapshot refresh, DBAs can set up active or physical Data Guard, or can use storage replication technologies that they already have in place.  To see how these look from a high level, let’s first look at a traditional refresh of a test master database:

tm_profile_1

 

Now you’ll notice that the diagram states the test master is “regularly” refreshed with a current data set from production, and if you inspect the diagram below, you will see an example of a software or hardware refresh scenario to the test master database, (using data masking and subsetting if required) and then creating Snap Clones.

 

tm_using_testmaster

Now as I said earlier, you can use a standby database, too, to perform a clone.  The following diagram shows the simplicity of a standby with Data Guard or GoldenGate.  Notice where Snap Clone takes over-  it’s at the standby database tier, so you still get the benefit of the feature, but can utilize comfortable technology such as Data Guard or GoldenGate:

tm_using_standby

Misconception #4- Snapclone Can Only be Used with DB12c

Snapclone works with any supported version of the database from 10gR2 to 12c.  Per the DBaaS team of experts, it may work on earlier versions, they just haven’t certified it, so if there are any guinea pigs out there that want to test it out, we’d love to hear about your experience!

The process of Snapclone is very simple once the environment is set up.  With the Rapid Start option, it’s pretty much the “Easy button” for setting up the DBaaS environment, and the process is solidified once service templates are built and the service request has been submitted in the Self Service Portal.  There isn’t any more confusion surrounding where an Oracle home installation should be performed, what prefix is used for the database naming convention, or other small issues that can end up costing an environment in unnecessary complexity later on.

Misconception #5- Snapclone doesn’t Support Exadata

A couple folks have asked me if Snapclone is supported on Exadata and in truth, Exadata with Snapclone offers a unique opportunity for consolidations and for creating a private cloud for the business.  I’ll go into it in depth in another blog post, as it deserves its own post, but the following diagram does offer a high level view of how Snapclone can offer a really cool option with Exadata:

tm_exadata

There are so many features provided by Snapclone that it’s difficult to keep on top of everything, but trying to dispel the misconceptions is important so people don’t miss out on this impressive opportunity to save companies time, money and storage.  I know my whitepaper was over 40 pages on DBaaS and I only focused on NetApp and ZFS, so I do understand how easy it is to get lost in the details, but hopefully this first post will get people investigating DBaaS options more!

This is the first of many posts I’ll do where we take a particular issue and troubleshoot it.  I’ll go through multiple scenarios to show the common steps to take in Enterprise Manager 12c, from discovery to identification to research to explanation.

This blog will use the following features:

  • ASH Analytics
  • SQL Details
  • ASH Report

The first thing we need is an issue-

ash_b_0421

Note the CPU red line-  the IO is considerably over that, and there is quite a lot of IO, along with Commit and Concurrency waits.  We’ll start with this as our issue; we can see it has quickly escalated over our timeline, (15 minutes was requested from the view…)

Scanning down to the Top SQL, (by default) we can then see that this IO pressure is primarily caused by two to three SQL_IDs during the timeline:

ash_b_0421_2

We can click on the first SQL_ID and display the insert statement.  We also note the escalation from 1pm to 1:15pm in the Statistics tab that shows the active sessions.

ash_b_0421_3

So I know a few of you are asking, “Why isn’t there any Plan Hash Value?  If it is a simple insert, with no subquery, etc., would it have a Plan Hash Value?”   Think about that as we inspect the next tab, Activity.

ash_b_0421_4

We are now inspecting just the wait events connected with THIS SQL_ID and the sessions that are involved with it.  On the right, we can quickly see that there is a huge percentage of cell single block physical read waits involved with this insert.  What does that tell you about the hardware that the insert is running on?  Yes, it’s part of an Exadata environment.

Scrolling down allows us to view the sessions involved in this timeline.  This is not just the sessions for the 15 minutes of escalation we are interested in.  These are ALL the sessions in the timeline shown above.  We can view from the top, .77% of the activity for the first one and note that it decreases as we go through the list.  Also notice that it’s 1-10 of 200 sessions total that have been captured during the total timeline.  We already know a majority of them are part of our escalation in IO on the system.

ash_b_0421_5

Inspecting the last 10 helps us see the decrease in IO resources as concurrency waits increased-

ash_b_0421_8

We are down to .39% by the time we get to the 200th session captured in this timeline.  Now, getting back to our “Should we have a plan hash value for our insert statement?” question-  No, we shouldn’t, and there isn’t a plan, nor is there any plan control to be found for this simple insert.  This is verified when you inspect the Plan and Plan Control tabs for this SQL_ID.
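If you’d like to verify this outside of the console, a quick check against V$SQL makes the same point.  This is just a hedged sketch-  the SQL_ID is a placeholder for the insert’s SQL_ID, which I haven’t shown here:

-- Hypothetical SQL_ID substitution used for illustration only.
-- A plain INSERT ... VALUES has no row source plan, so PLAN_HASH_VALUE is 0.
SELECT sql_id, plan_hash_value, executions, elapsed_time
  FROM v$sql
 WHERE sql_id = '&insert_sql_id';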

ash_b_0421_6

We then can look at the Tuning History.

ash_b_0421_7

You might wonder how it could be that this insert, which doesn’t have a plan or plan hash value, is found in the tuning task.  If it is a top SQL that is deemed an impact to performance, then it will show up here.  There is no escape for this insert.

At this time, we can see that we are looking at a concurrency issue that is not coming from a user or from a specific session in any way.  So the next step is to get a clearer look at the 5+ minutes that led up to the heavy IO, to see what is really going on and eliminate assumptions.

ash_b_0421_9

I simply click on “Run ASH Report” in the SQL Details window, change the times to 1:10pm –> 1:16pm and notice that we are still focused on this SQL_ID.
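If you prefer the command line, the same ASH report can be generated from SQL*Plus with DBMS_WORKLOAD_REPOSITORY.  A minimal sketch follows; the dbid, instance number and the exact date are placeholders you would substitute for your own environment, and the 1:10pm-1:16pm window matches this example:

-- Minimal sketch: generate an ASH text report for the same six minute window.
-- dbid and instance number come from v$database and v$instance; the date is a
-- placeholder for the day being investigated.
SET PAGESIZE 0 LINESIZE 200
SELECT output
  FROM TABLE(DBMS_WORKLOAD_REPOSITORY.ASH_REPORT_TEXT(
               l_dbid     => 1234567890,   -- SELECT dbid FROM v$database;
               l_inst_num => 1,            -- SELECT instance_number FROM v$instance;
               l_btime    => TO_DATE('21-APR-2014 13:10', 'DD-MON-YYYY HH24:MI'),
               l_etime    => TO_DATE('21-APR-2014 13:16', 'DD-MON-YYYY HH24:MI')));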

The report shows the standard header and high level info, but then once you dig in, you’ll see the top events:

ash_b_0421_10

We see that yes, it is almost 96% of the activity with an average of 21 active sessions running at a given time.

ash_b_0421_11

If you were to have traced this session, you’d see something similar to the following in the trace file to represent the data in the first three lines that we now see in the ASH report:

WAIT #3: nam='cell single block physical read' ela= 945 cellhash#=3983857215 diskhash#=3558547146 bytes=8192 
obj#=12894 tim=1912764532115123
WAIT #3: nam='cell single block physical read' ela= 483 cellhash#=1017296094 diskhash#=1180032005 bytes=8192 
obj#=46129 tim=1213898462783761
WAIT #3: nam='cell single block physical read' ela= 697 cellhash#=1017296094 diskhash#=3896957901 bytes=8192 
obj#=47883 tim=1158963811267355

Note that there are THREE cell single block physical reads for the insert event.  Why would we see this?  This is the one question to the readers of this blog-  What would cause the single block physical reads on these three indexes?

 

ash_b_0421_13

It’s not the table we are having an issue with, but the indexes on the table.  Three indexes show up very clearly in our Top DB Objects section for our wait event, cell single block physical read.  The ASH report gained this information from the obj#, just like you see in the trace file info I displayed above.  For each of those entries, a very similar percentage of the event is displayed farther up in the Top Event P1/P2/P3 Values.
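If you wanted to do the same mapping by hand, the obj# values from the raw trace lines can be resolved against the data dictionary.  A hedged example, using the obj# values from the trace above and checking both id columns since the one recorded can vary by event:

-- Resolve the obj# values from the trace waits to their segments.  Depending
-- on the event, obj# may correspond to OBJECT_ID or DATA_OBJECT_ID, so check both.
SELECT owner, object_name, object_type, object_id, data_object_id
  FROM dba_objects
 WHERE object_id      IN (12894, 46129, 47883)
    OR data_object_id IN (12894, 46129, 47883);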

So from here, I think we’ve inspected the SQL_ID pretty fully and we know why the problem is occurring:  there are three indexes on the table that each require single block reads as rows are inserted.

The final answer we want is why this heavy load is occurring.  Well, ASH answers this for us, too-

ash_b_0421_answer

Yeah, someone is TRYING to put a heavy load on the database with Swingbench, and its average active sessions match our overall average active sessions.   We’ll leave it as is-  no one wants to be responsible for killing someone’s load test… :)

 

There are a number of Verification Utilities for Enterprise Manager 12c, (EM12c) and I’ve written about them before, but today I’m going to start on the Repository Verification Utility, (REPVFY).  This will be an ongoing series, as there are so many valuable features rolled into the utility, and new ones will be added as new patches and releases happen.

For this post, I’m going to be reviewing a level 2 diagnostics report from a new repository database.  This is going to offer us all kinds of valuable data about the environment and should be considered by any EM12c administrator when doing a health check of their EM environment.

Inspecting the files from the output, you’ll note that they don’t require a lot of space, even when unzipped and this allows the admin to retain historical information about the EM12c environment.

Note that the files are broken up by type of review, and many of them are marked with a timestamp.

em_blog_post_0418

For this first blog post, I’m going to jump right into the first log alphabetically, the advisors log file, which as you might guess, is advising on performance.  The interesting thing is that, as small as this file is, it is actually comprised of a number of reports that would take a surprising amount of work to run manually.

The Advisors Log is comprised of the following reports:

  • ADDM Report
  • Space Advisor Report
  • ASH Report
  • AWR Report
  • SGA Resizing Report

The report then ends with the CBO repository information before closing out this one of many log files generated by a single, simple execution from the command line.  I would definitely read through to this final section to understand the stats collection in the EM repository database a bit more clearly, as I touch on it later in this post.

Advisor Log Header

When viewing the log file, the first thing it tells you is where the source SQL for the report comes from.  This is helpful if you want to inspect the SQL used.

The next is information about the release versions and such involved.  This is standard for most of the log files in the repvfy output:

 

COMPONENT          INFO
------------------ ----------------------------------------
EMDIAG Version     2014.0318
Repository Version 12.1.0.3.0
Database Version   11.2.0.3.0
Test Version       2014.0402
Repository Type    CENTRAL

The next sections of the report go over the repository database and report on the type of database, (single instance or RAC, protected, version, etc.) and other pertinent information.  This is multiple-query output I would like to see rewritten and compiled into one result set, but it’s still good to have this data, no matter what form the output takes.

We then jump into SGA information and init/sp parameters.  What surprised me is that the next section was dedicated to system statistics and the OS stats data.  This data is often only updated when a hardware change has been made or parallelism is implemented, and its placement near the top of the report indicates that the author of the report found this information valuable to the performance of the repository database.  (Not agreeing or disagreeing here, I found it more curious than anything, as I find it can be important in large databases, which the EM repository rarely is…)

--- Database OS statistics
SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- ----------------------------------------
SYSSTATS_INFO                  STATUS                                    COMPLETED
SYSSTATS_INFO                  DSTART                                    03-29-2014 02:00
SYSSTATS_INFO                  DSTOP                                     03-29-2014 02:00
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  CPUSPEEDNW                           1405
SYSSTATS_MAIN                  IOSEEKTIM                              18
SYSSTATS_MAIN                  IOTFRSPEED                           4096
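For reference, the SNAME/PNAME/PVAL1/PVAL2 layout above strongly suggests this output comes from sys.aux_stats$, where DBMS_STATS stores workload and noworkload system statistics, so you can pull the same data yourself between REPVFY runs:

-- System statistics as stored by DBMS_STATS; matches the SNAME/PNAME/PVAL1/PVAL2
-- columns shown in the REPVFY output above.
SELECT sname, pname, pval1, pval2
  FROM sys.aux_stats$
 ORDER BY sname, pname;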

 

ADDM Report

Next the utility runs an ADDM report for the previous hour’s snapshot.  Now ADDM reports rarely receive a lot of love from DBAs.  It’s not the top info it includes or the findings, it’s the recommendations that commonly leave them less than thrilled.  How often do we read one that doesn’t recommend some type of hardware solution to a problem that we know, with hardware applied, will only rear its ugly head again a few months to a year later?  If there were one recommendation I could make for this valuable report, it would be to take those recommendations out; I think you’d see a lot more DBAs embracing ADDM reports.

The report for the repository database is true to form, only this time it’s Oracle’s objects being offered hardware solutions!

ECMAM_PK                                  0          0          0          0
Move object SYSMAN.ECMAM_PK to flash storage, estimated I/O wait reduction is 10 millisec

EM_CS_SCORE_HIST                          0          0          0          0
Move object SYSMAN.EM_CS_SCORE_HIST partition 2014-03-25 00:00 to flash storage, estimated I/O wait reduction is 125 millisec.

EM_EVENT_MSGS_PK                          0          0          0          0
Move object SYSMAN.EM_EVENT_MSGS_PK partition OPEN to flash storage, estimated I/O wait reduction is 5 millisec.

EM_METRIC_KEY_GUID_IDX                    0          0          0          0
Move object SYSMAN.EM_METRIC_KEY_GUID_IDX to flash storage, estimated I/O wait reduction is 336 millisec

Now who is to say that a savings of 10-336ms is worth the cost of flash storage?  I’m not, but this is just more proof to me that the recommendations section is still not where most DBAs would like to see the ADDM report.
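If you want to re-run ADDM yourself for a snapshot pair of your choosing, rather than relying on REPVFY’s previous-hour run, a minimal sketch looks like this.  The snapshot IDs are placeholders; pick real ones from DBA_HIST_SNAPSHOT:

-- Minimal sketch: run ADDM manually for a begin/end snapshot pair and print it.
-- Snapshot IDs below are placeholders.
VARIABLE task_name VARCHAR2(60)
BEGIN
  :task_name := 'REPOS_ADDM_MANUAL';
  DBMS_ADDM.ANALYZE_DB(:task_name, 1234, 1235);  -- begin snap_id, end snap_id
END;
/
SET LONG 1000000 PAGESIZE 0
SELECT DBMS_ADDM.GET_REPORT(:task_name) FROM dual;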

Further sections of the report do live up to the hype and offer similar value to what we see in AWR for the EM repository database:

Top Background Events            DB/Inst: OEM/OEM  (Apr 02 14:00 to 15:00)

Avg Active
Event                               Event Class     % Activity   Sessions
----------------------------------- --------------- ---------- ----------
os thread startup                   Concurrency          47.91       0.03
CPU + Wait for CPU                  CPU                   9.77       0.01

 

It is important that the report clearly shows it is the Concurrency class behind the os thread startup percentage of activity.  This helps keep the administrator on the right track if they investigate the wait event.

The following section goes through the code and action involved in the last hour’s ADDM activity.  You can see if there is anything causing an issue, but as we can see in our repository, not much of anything-  it’s nice and quiet…. :)

Top Service/Module               DB/Inst: OEM/OEM  (Apr 02 14:00 to 15:00)

Service        Module                   % Activity Action               % Action
-------------- ------------------------ ---------- ------------------ ----------
SYS$USERS      DBMS_SCHEDULER                 8.84 EM_REPOS_SEV_EVAL        2.79
                                                   EM_SYSTEM_MEMBER_S       1.40
                                                   EM_TASK_WORKER_3         1.40
               MGMT_COLLECTION.COLLECTI       8.37 Task:881 Repositor       4.19
                                                   Task:37 Performanc       1.40
               OEM.PbsSystemPool              6.51 PingHeartBeatWork:       2.33
                                                   LoaderWorker             1.40
               GATHER_SYSMAN_STATS            2.33 UNNAMED                  0.93

 

Next it reports on sessions, and then on what percentage of activity was allocated to different SQL types and to different phases of the SQL, (what percentage of activity was consumed by hard parsing vs. SQL or PL/SQL execution….)

Just like an ADDM report, the top SQL statements, including SQL_ID and percentage of activity, are reported:

Sampled #
SQL ID             Planhash        of Executions     % Activity
----------------------- -------------------- -------------------- --------------
Event                          % Event Top Row Source                    % RwSrc
------------------------------ ------- --------------------------------- -------
7mqykaw68fkhb           2691428406                    6           4.19
CPU + Wait for CPU                4.19 TABLE ACCESS - FULL                  2.79
SELECT COUNT(*) FROM DBA_SCHEDULER_JOB_RUN_DETAILS WHERE OWNER = MGMT_USER.GET_R
EPOSITORY_OWNER AND JOB_NAME = UPPER(:B3 ) AND STATUS = 'SUCCEEDED' AND ACTUAL_S
TART_DATE >= :B2 AND ACTUAL_START_DATE < :B1

 

The report will go through this for top row source, literals, actions, etc.  It will then jump into the top procedures that were called:

PLSQL Entry Subprogram                                            % Activity
----------------------------------------------------------------- ----------
PLSQL Current Subprogram                                           % Current
----------------------------------------------------------------- ----------
SYSMAN.EM_TASK.WORKER                                                   8.84
SQL                                                                  6.05

SYS.DBMS_ASSERT.SIMPLE_SQL_NAME                                      1.40
SYSMAN.EM_PING.RECORD_BATCH_HEARTBEAT                                   3.26
SQL                                                                  1.86

 

This can help you quickly find which procedures to search for the SQL shown in the previous sections.

Blocking sessions, top sessions and top latches are just some of the areas covered, just as in a standard AWR report, but this is all for the EM repository database!

This section of the report finishes up with information about the AWR retention, so if you need to go back and investigate further, you know up front what you have to work with in the AWR, (Automatic Workload Repository.)

--- AWR retention

INTERVAL  RETENTION
---------- ----------
60      11520

 

No, the report is not even close to being finished for the Advisor log at this point.  It jumps promptly into a workload report to give you a good view of what kind of processing this repository database is handling daily.

FULL AWR Report

Just as with a standard AWR report, the basic information about the repository for the previous hour is included with the Advisor log.  Keep this in mind as you review this report.  If this is not a heavy use time for your EM repository database, the data may not be as valid as if we were looking at a larger timeframe.

If you are more comfortable with statspack or AWR reports, then you will want to spend more time here than in the ADDM section, but both provide their own view of the data.
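If an hour is too narrow a window for your review, the same AWR content can be generated manually for a wider snapshot range.  A hedged sketch-  the dbid, instance and snapshot IDs are placeholders you would take from DBA_HIST_SNAPSHOT:

-- Minimal sketch: text AWR report over a snapshot range of your choosing.
-- All four values below are placeholders.
SELECT output
  FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT(
               l_dbid     => 1234567890,
               l_inst_num => 1,
               l_bid      => 1230,    -- begin snap_id
               l_eid      => 1235));  -- end snap_id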

The IO Statistics by Function Summary

Reads:   Reqs   Data    Writes:  Reqs   Data    Waits:    Avg
Function Name   Data    per sec per sec Data    per sec per sec Count    Tm(ms)
--------------- ------- ------- ------- ------- ------- ------- ------- -------
Buffer Cache Re    818M     2.1 .227007      0M     0.0      0M    7628     0.1
Others              62M     1.1 .017205     44M     0.8 .012210    6643     0.8
DBWR                 0M     0.0      0M     94M     1.7 .026086    5938     0.5
LGWR                 1M     0.0 .000277     40M     2.2 .011100   15.3K     0.5
Direct Reads         2M     0.0 .000555      0M     0.0      0M      17     0.0
Direct Writes        0M     0.0      0M      1M     0.0 .000277      11     0.7
TOTAL:             883M     3.2 .245045    179M     4.6 .049675   35.5K     0.5
------------------------------------------------------------

Direct Reads
2M     0.0 .000555      0M     0.0      0M       0     N/A
Direct Reads (Data File)
2M     0.0 .000555      0M     0.0      0M       0     N/A
Direct Writes
0M     0.0      0M      1M     0.0 .000277       0     N/A
Direct Writes (Data File)
0M     0.0      0M      1M     0.0 .000277       0     N/A
TOTAL:
883M     3.2 .245045    178M     4.6 .049397   11.6K     0.1
------------------------------------------------------------

Enqueue Activity

Enqueue Type (Request Reason)
------------------------------------------------------------------------------
Requests    Succ Gets Failed Gets       Waits  Wt Time (s) Av Wt Time(ms)
------------ ------------ ----------- ----------- ------------ --------------
JS-Job Scheduler (queue lock)
25,724       25,724           0          59            0           1.19
KO-Multiple Object Checkpoint (fast object checkpoint)
30           30           0           3            0           6.67
SQ-Sequence Cache
75           75           0           3            0           3.33
------------------------------------------------------------

Also spend some time inspecting the Latch and Mutex sections.  These are memory structures that many Oracle databases experience challenges with from time to time.  They are good to understand, and even the EM repository database experiences issues with both when concurrency is involved.

I’ve been a long time proponent of the Segment reporting section in the AWR.  This report can often seal up the questions that remain for DBAs after seeing the top SQL by elapsed time and viewing the execution plans.

Segments by Physical Reads
-> Total Physical Reads:         105,015
-> Captured Segments account for   98.1% of Total

Tablespace                      Subobject  Obj.      Physical
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
SYS        SYSAUX     SCHEDULER$_JOB_RUN_D            TABLE       59,463   56.62
SYS        SYSAUX     SCHEDULER$_EVENT_LOG            TABLE       43,048   40.99
SYS        SYSTEM     OBJ$                            TABLE          266     .25
SYS        SYSAUX     I_WRI$_OPTSTAT_H_OBJ            INDEX          107     .10
SYSMAN     MGMT_TABLE EM_EV_METRIC_ALERT   P201404    TABLE           47     .04

 

There will be sections for logical, physical and unoptimized reads, along with row locks, table scans, writes and even ITL waits.  The report continues with library cache and all variations of memory information.  This is an area where I feel more than an hour of data could be beneficial in the REPVFY utility output, and a future enhancement might be an argument for the time frame of snapshots to be utilized for the report.

SGA Resizing Report

This report provides not just SGA information, but also PGA and caches.  It starts out by giving the reader a view of any specially sized buffer caches and each of the pools, (large, java, ASM, etc.)

-- SGA Dynamic Components
--  Cache: D:DEFAULT, K:KEEP, R:RECYCLE
--   Mode: DEF:DEFerred mode, IMM:IMMediate mode

Last Oper  Last
Component              Size (M) Type/Mode  Operation Time
---------------------- -------- ---------- --------------------
D: 2K buffer cache            0 STATIC
D: 4K buffer cache            0 STATIC
D: 8K buffer cache            0 STATIC
D:16K buffer cache            0 STATIC
D:32K buffer cache            0 STATIC
D:buffer cache              448 INITIALIZI

From there, it steps into resize operations for SGA and then PGA.

-- PGA Resize Operations overview

I# SNAP_TIME            PGA_TARGET PGA_ALLOC    PROCS        AUTO      MANUAL
---- -------------------- ---------- --------- -------- ----------- -----------
1 02-APR-2014 15:00:00        493       281       60        0.00        0.00
1 02-APR-2014 14:00:00        493       290       66        0.00        0.00
1 02-APR-2014 13:00:00        493       311       70        0.00        0.00
1 02-APR-2014 12:00:00        493       286       66        0.00        0.00

 

This part of the report is not held to the 1 hour snapshot as we saw in the ADDM and AWR sections of the Advisor log.  The data for these sections went back one week and provided a clear picture of resizing activities for the SGA and PGA.
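If you’d rather watch the resize activity directly rather than waiting for the next REPVFY run, my assumption is that the same kind of information is available from the dynamic performance views; for the SGA side, something like this:

-- Recent SGA component resize operations; sizes converted to MB.
-- (v$memory_resize_ops covers AMM environments as well.)
SELECT component, oper_type, oper_mode,
       initial_size/1024/1024 AS initial_mb,
       final_size/1024/1024   AS final_mb,
       status, start_time
  FROM v$sga_resize_ops
 ORDER BY start_time DESC;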

Repository CBO Report

This section is built from SQL added to advisor.sql in the REPVFY utility.  It’s only found in the Advisor report and is essential to understanding CBO, (cost based optimizer) information specific to the repository database.  Thinking of it as schema specific is more in line with the value the information provides.

If you remember back to when you installed EM12c, the installation has you disable the automated dbms_stats job on the repository database.  This section now provides you with the internal stats job information that supplies the correct statistical information to the CBO-

TABLE_NAME                       INCREMENTAL GRANULARITY     STALE_PCT
-------------------------------- ----------- --------------- ---------
ADP_METRIC0                      TRUE
ADP_METRIC1                      TRUE
ADP_METRIC2                      TRUE
EMCT_CBA_CHARGE_1DAY             TRUE
EMCT_CBA_CHARGE_1HOUR            TRUE
EM_EVENTS_INCIDENT               TRUE                        10
EM_EVENT_ANNOTATIONS             TRUE                        10
EM_EVENT_CONTEXT                 TRUE
EM_EVENT_MSGS                    TRUE                        10
EM_EVENT_PROCESSING_STATE        TRUE
EM_EVENT_RAW                     TRUE                        10
EM_EVENT_SEQUENCES               TRUE                        10
EM_EVENT_SEQ_STATE                                           10
EM_METRIC_VALUES                             AUTO            10
EM_METRIC_VALUES_DAILY                       AUTO            10
EM_METRIC_VALUES_HOURLY                      AUTO            10

 

As the repository database scales, the EM stats feature can be scaled to support more extensive statistics information as well.  I am cautious to say this and will say it with a grain of salt, but it backs up my experience that 95% of databases are served successfully with the current automated stats job, while there are types that require a DBA with knowledge of the schema, data and code to ensure the CBO has what it needs to perform efficiently.  The biggest challenge is that there aren’t a lot of DBAs with that knowledge or time to invest in this task, (I know, I did this in two companies and it must be a true passion.)  EM12c has removed this challenge from the DBA’s plate by setting up the one-off stats collection requirement for them as part of the repository installation.

They’ve enhanced that in REPVFY with a utility step that checks what is currently in place and provides the ability to review it as part of this output.

TABLE_NAME                       RUN_TYPE                         INCREMENTAL GRANULARITY     STALE_PCT
-------------------------------- -------------------------------- ----------- --------------- ---------
EM_EVENTS_INCIDENT               DAILY                                                        20
EM_EVENT_ANNOTATIONS             DAILY                                                        20
EM_EVENT_MSGS                    DAILY                                                        20
EM_EVENT_RAW                     DAILY                                                        20

 

Curious about when and how long the jobs that do this work run?  This is included as well-

[----- CBO statistics gathering log (last 3 days) ---------------------------]

LOG_DATE             RUN_TYPE                           DURATION OBJECTS_ANALYZED OBJECTS_STALE OBJECTS_FILTER VALUE
-------------------- -------------------------------- ---------- ---------------- ------------- -------------- --------------------------------------------------------------------------------
02-APR-2014 14:24:16 PROACTIVE                                 5                3            69             24 P:1.86:EM_EVENT_ANNOTATIONS;1.13:MGMT_JOB_EXECUTION;.71:EM_EV_METRIC_ALERT;
02-APR-2014 12:24:16 PROACTIVE                                 9                5            71             24 P:2.47:EM_EVENT_RAW;1.97:EM_EVENT_SEQUENCES;1.34:MGMT_JOB_EXECUTION;.87:EM_EVENT
02-APR-2014 10:24:16 DAILY                                   123              272           457                D:6.91:EM_JOB_METRICS;5.66:MGMT_SYSTEM_PERFORMANCE_LOG;4.81:MGMT_ARU_PATCH_FIXES
02-APR-2014 08:24:16 PROACTIVE                                13                7            74             24

 

Table stats in the repository?  Yep, that’s included, too!

[----- Table statistics -----------------------------------------------------]

TABLE_NAME                       PARTITION_NAME                 LAST_ANALYZED          NUM_ROWS     BLOCKS AVG_ROW_LEN GLO USE STATT STA
-------------------------------- ------------------------------ -------------------- ---------- ---------- ----------- --- --- ----- ---
EM_METRIC_VALUES_DAILY                                          29-MAR-2014 10:25:19      95257       2040          16 YES NO  ALL   YES
EM_METRIC_VALUES_HOURLY                                         01-APR-2014 10:26:49     681018      15820          16 YES NO  ALL   YES
MGMT_DB_FEATUREUSAGE                                            02-APR-2014 10:25:13       5713        244         149 YES NO  ALL   YES
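The GLO/USE/STATT/STA headers above map to GLOBAL_STATS, USER_STATS, STATTYPE_LOCKED and STALE_STATS, so the same view of table stats can be pulled from DBA_TAB_STATISTICS if you want to check it between REPVFY runs:

-- Table and partition statistics for the repository schema.
SELECT table_name, partition_name, last_analyzed, num_rows, blocks,
       avg_row_len, global_stats, user_stats, stattype_locked, stale_stats
  FROM dba_tab_statistics
 WHERE owner = 'SYSMAN'
 ORDER BY table_name, partition_name;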

 

And with all of that,  (pages and pages…) we’re done with the first of many reports.  I’ll keep to the most essential data in each, but hopefully you’ve learned something you hadn’t known before about the REPVFY utility and I’ll update with the next log report soon!

Now that we learned in Part I how to create a notification schedule for a specified user so that it will only page one individual on an oncall rotation, I’ll now show you how to use this in conjunction with rulesets to complete the process of modernizing and automating your oncall.

In my test scenario, I’ve already made copies of the main rule sets and have divided them by three rules and four groups, (mission critical, production, test and development.)

  • Mission critical are all systems that I WANT to be paged for after hours and need to know immediately if there is an issue.
  • Production are important systems that I want to be notified of, but do NOT want to wake up resources for at night, as these incidents will not cause an outage and can be addressed during the business day.
  • Development and test, for this example, will be treated the same way and will only email issues to the DBA team, as the development group has privileges to often address issues they have caused in the process of developing and testing.

With this said, now we look at our Rule Sets.  Notice I’ve disabled the default rules that come with the EM12c, but have made copies that I can use to offer the best value to the business.

rule_sets_dba_notif2

 

Let’s click on the box to the left of the rule set named Mission Critical Incident Rule Set and click on Actions, Edit.

rule_sets_dba_notif3

 

Notice that the rule set applies only to Specific Targets by group, and the group is Mission Critical.  We then click on the Rules tab and we can update the current rule setting to use our new DBA_PAGE user!

Editing the Rule Set

There are two things we are going to do to the first, major rule set for metric alerts.

1.  We are going to limit the categories that we want to be notified for.  I want you to see this option so you can decide what categories you want to page on and can update this as you want.  For our example, I am going to limit our metric rule set to the categories of Capacity and Availability.  At the end, I’ll show you how to create a second rule to email on the rest.

Click on the left of the rule set and then click on Edit up in the menu bar.

rule_sets_dba_notif4

 

We’ll then edit the categories to limit to the two we spoke about above.  You’ll be able to see all the categories in the drop down and note in your incident notifications you receive what categories are important to you for email vs. pages.

rule_sets_dba_notif5

 

Click on Next to continue onto the actions.

2.  We are going to update the rule to use the new DBA_PAGE user to page for the metric alerts in these two categories.

rule_sets_dba_notif6

 

 

We’ve already got a pretty advanced set of actions to perform in the Action Summary.  I like to automate many of the Incident Manager tasks, but this time around, we are going to add a page option to the actions.  The first rule is already highlighted, so simply click on Edit in the menu bar to proceed.

rule_sets_dba_notif7

I’m going to have all paged Incidents assigned automatically to the DBA_PAGE user and then I’m going to add the DBA_PAGE in the Page section in the Basic Notification section.

rule_sets_dba_notif8

You can also enable Repeat notifications if you want to take this one step further, (we’ll go into this more in another blog post, but I thought I’d at least show it to you in this one… :))

If satisfied, I’d recommend clicking in the box to clear events permanently to keep your repository cleaned up and then click continue to review the changes you’ve made:

rule_sets_dba_notif9

 

You’ll now see that the owner for the incident will be set to the DBA_PAGE user and that the DBA_PAGE user will now be paged along with the emails sent.  Keep in mind, notifications will only go out during the notification schedule’s active times, so Monday-Friday, 8-5, no paging will occur, as that’s how we set it up in Part I of this blog post series.

We now need to do this for each of the rules that we want to page on from this rule set:

  • Create incident for critical metric alerts, (for only two categories…)
  • Create incident for critical Service Level Agreement Alerts
  • Incident creation Rule for Target Down availability status
  • Incident creation rule for high-availability events.

As promised, I then created a new rule set to cover the categories no longer covered in the original.  Click on Create in the menu bar and create the following rule:

rule_sets_dba_notif10

 

Add the following Actions, ENSURING you do not choose to PAGE, and assign the rule to SYSMAN as you did for your standard rules before.  This is just to cover all your other categories, but NOT PAGE on them.

rule_sets_dba_notif11

 

Complete the review, name the rule Metric Alert for All Other Categories to Email and then click Save and then SAVE ALL YOUR CHANGES you’ve made.

You have now updated all the rule sets to use the DBA_PAGE user we set up in Part I of this series, and it will only notify that user during the scheduled time it is to be active, in this case, just for after hours support, rotating on a one week schedule between three DBAs’ SMS contact information!

I came across a discussion on Oracle-l about how after hours paging is handled at many companies and was kind of surprised how many DBAs still carry around a secondary pager/cell phone, or are just expected to be woken up whether on call or not.  I’m not one to go back to sleep once I’m awake, so I’ve been a proponent of EM notification schedules for after hours paging.  Now there are other ways to handle this in Enterprise Manager 12c as well, but we’ll use this method, as it is backward compatible to OEM 10g, too.

The requirement for this setup is to have an inbox alerting option separate from your SMS/texting option on your smartphone, which can be Android, iPhone, or even BlackBerry; any of these are more than acceptable to satisfy the requirement.  The next requirement is to know your SMS notification email address.

Example-  T-Mobile’s would be <cell number>@tmomail.net and AT&T’s would be <cell number>@mms.att.net.

Your mobile provider can verify what your email extension is for your SMS address.  With this information in hand, you can then proceed with the setup.  The design is simple-  All email notifications will continue to go to your email, whether warnings or critical, but we will create a secondary user that holds the SMS addresses for the DBAs on call and set them up on a rotation to be notified of critical issues.

Sit down and first figure out how often you want to rotate your schedule-  weekly, every two weeks, once per month-  and any vacation or time off coming up.  That should tell you what your rotation needs to be to keep things sane.

Create the Paging User in Enterprise Manager

First we need to create the user that will then be utilized for paging.  This can be done from the Setup, Security, Administrators menu in the console.

admin_pg_0416

We have all our DBAs currently listed, but for this user, we will need each of their SMS addresses and once you have those, click on Create.

Name the user DBA_PAGE and put in the following values.  It can be set to an Administrator, but just for this example, I shortened the process and created it as a Super Admin, (this is not required…)

dba_page_0416

Note that I’ve entered each of the DBAs’ SMS addresses into the Email Address section, separated by commas, and that I’ve entered a clear description of what this login is used for.

Click on Finish and your new DBA_PAGE user is now listed.

admin_pg_2_0416

Managing the DBA_PAGE User

Once you’ve completed the entry, log out of the EM12c console as your current user and log back in as the DBA_PAGE user, (it’s just easier to manage the notification schedule as this user…)

dba_page_login_0416

 

Next you need to create the notification schedule, but I’ll first show you how you can add more users easily to this Administrator:

dba_page_login_3_0416

 

Once you’ve accessed this page, you will note that new email addresses can be added easily:

dba_page_info

 

The Add Another Row option will allow you to add another SMS address, and then you can check mark just that address and test it.  You can also remove the addresses of DBAs that have moved on from the company.

Editing the My Notification Schedule

dba_page_notif_sched1

 

This is again accessed from the Setup menu on the right hand side of the screen, and once you are in the My Notification Schedule, you will see the following:

dba_page_notif_sched2

This is the default schedule; it starts from the day the user was created and would notify ALL email addresses in the list.  We are going to replace this schedule with a new one that supports a three week on-call rotation, notifying one DBA each week and switching automatically each week to the next DBA’s SMS address.

Click on Edit Schedule Definition in the upper right of the screen.

dba_page_notif_sched3

As starting a rotation in the middle of a schedule would be difficult, we are going to choose the following Monday as our beginning.  You can see the default is a one week rotation and that the time zone is set to Canadian Central Time, (I have no idea where that came from… :))

dba_page_notif_sched4

 

For our example, we are going for a three week rotation, (three DBAs, three week rotation… :)) We’ll start the rotation this following Monday, April 21st and we’re going to say that the DBAs are on Mountain Time.

We are then given a schedule of three weeks to fill in for who is on call.  By default, it will want to offer us all three SMS addresses entered for the user.  We will only use the first one.  Click on Week 1 to fill in, ensure you are VIEWING Week 1, and now fill in the schedule.

dba_page_notif_sched5

 

To create a schedule that DOES NOT notify during business hours, there is a batch of three steps for each week.  There is no way around this, so I’ll use the following schedule creation and then repeat it for the 2nd and 3rd weeks for the two other DBAs.

dba_page_notif_sched6

 

Now notice that we’re still viewing and editing week 1, and choosing all the days, but I’ve changed the time to show 12am-8am to fill in with the first DBA’s SMS address.

Finally, we fill in for the last time slot, Saturday and Sunday during the weekend daytime.

dba_page_notif_sched7

 

Your schedule should now look something like this, with all sections above and below business hours filled in with the 1st DBA’s SMS address for notifications:

dba_page_notif_sched8

 

We then click on the down arrow button on the View menu and choose the second week, then do the same for the check mark in the box for Weeks, ensuring we are changing the schedule for Week 2.  We’ll repeat the scheduling, but will also change the Email Address to the second DBA’s SMS address by clicking on the search.

dba_page_notif_sched9

Once you have filled in everything for the Second DBA, you will switch to View Week 3 and edit Week 3 for the third DBA and fill in that schedule:

dba_page_notif_sched10

 

Once completed, Save the changes.

The default view may alarm you at first, as it will be empty:

dba_page_notif_sched11

 

The reason is that the first date shown is the current date, and if you remember from our example, we decided to start our rotation on April 21st.  Go to the View Week Containing field and change it to April 21st.

You will now see the rotation schedule for the first DBA for after hours.

dba_page_notif_sched13

 

Updating the date for the next week will display the 2nd DBA’s rotation:

dba_page_notif_sched14

 

And then if you switch the date to first week of May, we see the third DBA:

dba_page_notif_sched15

 

If you go past that, you’ll see that the rotation continues, starting again with the first DBA for the next week and will continue on without change unless you edit or replace the schedule.

If for some reason a DBA can’t take a shift in their weekly schedule for one night, you can go into that DAY’s shift and edit it to use one of the other two DBAs in the list.

Now you’ve finished setting up an after hours pager in a notification schedule.  In Part II, I will show you how to use the notification schedule/user with rule sets to page on critical notifications.

EM12c Auditing

Lately I’ve been having more discussions on securing the EM12c environment.  All of IT has a tendency to treat Enterprise Manager as an afterthought in both hardware allocation and security best practices.  No one is sure exactly why this is-  they all have their theories, but we do know it happens often.

Today we are going to go over some of the auditing options within EM12c.  Basic auditing is turned on by default in the environment, but only covers basic processes.  There are over 150 auditing options, and extensive information can be collected, retained within the repository, and even turned into an externalized service that writes log files on the OS file system.  These options include login/logout information, updates, OMS password changes, and EM key copies and removals from the repository.

Basic auditing information can be gained through the console via the Setup, Security, Auditing Data menu option, but the auditing configuration, additional features, updates and externalized service setup, must be performed through the Enterprise Manager command line interface, (EM CLI).

If you haven’t used the EM CLI before, please refer to my blog post on Beginning with the Command Line Interface; otherwise, log in as a user with appropriate rights to run the EM CLI and connect to the repository.

First, let’s inspect the current operations list and what will impact the infrastructure if executed:

audit_blog_1

Note that the last option, APPLY_UPDATE, is to update the repository and yes, it will impact the infrastructure by doing so.

Next, let’s look at the current settings.  As I stated earlier, auditing is turned on by default, but the options for the externalized service are not yet enabled, so it is marked as disabled.

audit_blog_2

The settings for the externalized service, outside of the directory, (configured in DBA_DIRECTORIES with read/write privileges granted to SYSMAN) are pre-configured with default values.

  • File prefix is the prefix used for all audit log files so that they are easily identified in the directory.
  • File size is default to 50M
  • Retention is default to 365 days.  Keep this in mind before enabling, as this could impact disk space if your OS directory has limited space.

Notice that there is also a note informing you that Infrastructure Audit is always on, (go inspect the access.log and you will see information that can be sync’d up with the emctl.log and others to create the solid picture this feature can provide for you.)

Enabling/Disabling Features

To enable or disable audit features, the following syntax is used:

>emcli update_audit_settings -audit_switch="ENABLE/DISABLE" -
operations_to_enable="<insert operation name here or just say ALL>" -
operations_to_disable="<insert operation name here or just say ALL>" 

To demonstrate this, we’ll enable auditing for logins and logouts:

audit_blog_3

The response letting us know whether the change to the auditing configuration was successful completes the task, and we can move on to other tasks.

Next, we’ll configure the externalized service for auditing.  This is an excellent choice and should be considered for all EM12c environments.  Even with high availability options, keeping a minimum of 7-31 days of auditing information about the EM12c environment, especially considering the access and power EM12c has, is a good idea.

The syntax for the configuration for the externalized auditing service is:

>emcli update_audit_settings -file_prefix=<file_prefix> -
directory_name=<directory_name> -file_size=<file size> -data_retention_period=<period in days>

And in our example, we will update the service to file sizes of 25M each, with a prefix of “em12c_audit” and retain 31 days of audit files that our OS file system can easily handle.

>emcli update_audit_settings -externalization_switch=ENABLE -file_prefix=em12c_audit -directory=AUD_DMP -file_size=25000000 -data_retention_period=31

After executing this statement, the audit files will automatically start generating to the directory, (make sure you HAVE created a DBA Directory to hold this data first!) and we can then view logs as needed to inspect what activity is occurring in the EM12c environment.
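If you also want a quick look at the audit records retained inside the repository itself, my assumption (worth verifying in your release) is that the SYSMAN.MGMT$AUDIT_LOG repository view carried over from earlier Enterprise Manager releases is still available; a simple sanity check would be:

-- Assumption: the MGMT$AUDIT_LOG repository view exists in your release.
-- This just confirms that audit rows are being recorded.
SELECT *
  FROM sysman.mgmt$audit_log
 WHERE ROWNUM <= 20;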

This is a solid best practice to ensure you are offering one more line of protection to the database and software that is essential to you, your business and your environment.

 

I know that folks have been having some challenges downloading my slides from Collaborate for a couple of my sessions, and I received errors when updating two of them the other day, so I’ve added them to my Slideshare location for your convenience.

Thank you to everyone who attended my sessions-  such great turn outs and fantastic attendee participation!  I had some real EM12c stars in the audience and appreciated the support and banter!

My slideshare list for download can be accessed from here.

Thank you to IOUG, Oracle and everyone for their support!

I’m often present to overhear comments like the following when issues arise:

  • I *think* someone changed something.
  • I bet some DBA changed a parameter!
  • I know <insert name of person on the bad list> is running that process I told him/her not to!

Making assumptions vs. having data is a good way to alienate peers, coworkers and customers.

There is a great feature in the EM12c that I’ve recommended that can easily answer the “What changed?” questions and deter folks from making so many assumptions about who’s guilty without any data to support the conclusion.  It’s called the ADDM Compare Report.

The ADDM Compare Report is easy to access, but may require the installation of a set of views to populate the report in the EM console.  If asked to install, simply ensure you are logged in as a Super Administrator to the console and as a user with AWR rights to install to the EM repository.

Once installed, you return to the same Performance menu on the database page to access the comparison:

addm_cm_0326_menu

Now let’s give an example of why you might want to use this feature.  In the following example, we can see that performance in this RAC environment has degraded significantly in a short period of time:

addm_cm_0326_main

We can see from this snapshot from the EM Top Activity graph that performance has degraded significantly from about 9am with a significant issue with concurrency around 11am-12pm on March 21st.

Gather Information and Run the ADDM Compare Report

We then go to our ADDM Compare report and enter in the time for the comparison period that we are concerned about:

addm_cm_0326_times

I’ve chosen a custom window for my base period, picking 8-9am, when performance was normal in Top Activity.  If you receive an error, the first thing to check is your date and times.  It is very easy to mistakenly choose a wrong date or time.  Also remember that, as this is ADDM, it works on AWR snapshots, so you must work in snapshot intervals, (1 hour by default).  Depending on your snapshot interval, no matter what time you enter, the ADDM Compare report will choose the closest snapshot to the time entered.
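Because the comparison works off AWR snapshots, it can also help to confirm which snapshots bracket the times you plan to enter; a quick hedged check:

-- Snapshots captured over the last day, to confirm which ones bracket the
-- base and comparison windows you intend to use.
SELECT snap_id, instance_number, begin_interval_time, end_interval_time
  FROM dba_hist_snapshot
 WHERE begin_interval_time > SYSTIMESTAMP - INTERVAL '1' DAY
 ORDER BY snap_id, instance_number;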

The High Level Comparison

The report will then be generated and displayed.

addm_cm_0326_top_grph

What we see in the top section of our report is a graph display of activity of our Base Period, (8-9am) vs. our Comparison Period, (11am-12pm).  You can see a significant change in activity, but you can also see at the top left of the graph that the SQL Commonality is only 43%.  This means that only 43% of the SQL is the same in the two times we are comparing.  57% is a large difference to chase down, but now we are aware there were significant differences between what SQL was running in the base period and what ran during the comparison.  Also note the Average Active Sessions, shown in red at the bottom center of our example.  The Comparison Period has 5.19 on average, which is significantly more than the Base Period.

It’s All in the Details

The default in the second section of the report is the Configuration.  This is where you would quickly discern if there were any changes to parameters, memory, etc. that are global to the database system.

addm_cm_0326_config

As you can see, our example had no changes to any global settings, including parameters, etc.  If you would like to inspect the current settings, you can click on the All radio button and view everything.

The next step is to view the Findings for the report.  Depending on your report, you will have different data showing in the Performance Differences column.  In this example, we’ll go over a couple of findings and what you should look for, expand upon, etc. to get the most from this feature.

addm_cm_0326_f_io

This section of the report is broken down into two or three sections, sometimes requiring you to click on a + sign to expand into the section of the findings and/or use drop down arrows to extend the data detail.  The first section displayed here shows the current segments causing significant increases in User IO in the comparison period.  It is then calculated by Change Impact, Base Period and Comparison Period.  This allows the user to quickly assess the total impact, as well as see the exact percentage of impact in each of the periods.

In the bottom section, if there were any global changes responsible for part of the impact, (System Change Breakdown) each one will be shown.  As we see so far, only the difference in average active sessions has caused concurrency, which has been directly responsible for the performance degradation.

We’ll now move into the next section down, Buffer Busy waits.  There is a significant decrease from the previous finding’s impact in the Change Period, (5.08% down to 1.64%) but we can still clearly see the increase from the Base Period, (.87% and .17%).   We are able to extend the information provided by inspecting the bottom Description, which shows us each of the objects involved in the Buffer Busy Waits, the percentage of Change Impact, the Base Period % and the Comparison Period %.

addm_cm_0326_f_buffer_busy

We can then drop down to the bottom of the report and dig into the System Change Breakdown.  This includes the Workload Change, which we do see here, and the data is provided by SQL_ID.  We see that the buffer busy wait was null for the Base Period, but contains values for the Comparison Period.  The data may be a bit misleading as it shows 0% Percent Change Impact.  The calculation for this report must have some type of value in the Base Period for it to function correctly, so take care when analyzing this part of the report.

addm_cm_0326_f_bb2_wrkld_chng

System Changes in Workload

What you can do is hover your mouse over the SQL_ID and inspect the SQL statement.

addm_cm_0326_f_bb3_sqlid

We can then …….

addm_cm_f_commits

We also have the option to decide the percentage of changes we would like to inspect.  The default is 5%, but if we want to look at less or more, we can update the value and do a search, then click on the + to show the SQL Breakdown.  We can also display only the Regression or Improvement information for any given section to filter as needed for the investigation.

addm_cm_0326_look_for

Once into the SQL Breakdown, we can hover over any SQL_ID to display the SQL in question.  We are able to see whether the performance improved or degraded from the base period and the percentage of impact.

addm_cm_0326_top_sql

With each section displayed in the Performance Difference category, we can then go down to the description and view the data, (as we see below in the table for Row Lock Waits).  You can hide any area that is providing too much information for the inspection period, or filter what is displayed in the middle options area.

addm_cm_0326_row_lock

Resource Tab

The final tab in the ADDM Compare Report is the Resource section.  This is broken down into graphs that, when any area is hovered over, display details about any given resource wait.  The first section, highlighted in light blue, is the CPU.  Both the Base period and Comparison are noted to not be CPU bound.  The CPU usage by instance is shown in varying shades of green for this two instance RAC environment.

addm_cm_0326_resource_cpu

The second link is for Memory.  Both the Base and Comparison periods show that no virtual memory, (paging) was experienced.  The amounts are displayed in a table format with totals.

addm_cm_0326_resource_vm

Our third Resource link is for I/O.  There is more than one graph involved in this view-  the first shows whether I/O was an issue in either period, and then there is a breakdown of the type of I/O usage seen in both.  If you hover your mouse over any of the graphed waits, such as Data Read in the example, the percentage and totals for the period are displayed.  Note-  Temp usage for reads and writes is displayed in this section.

addm_cm_0326_resource_throughput

The right hand graph for I/O is all about Single Block Read Latency.  This is helpful on many systems, but especially on Exadata, where single block reads can be especially painful.  Note that this data is displayed for the base and comparison periods, in milliseconds and with no instance filtering.

addm_cm_0326_resource_latency
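For anyone who wants to sanity check that latency figure from SQL*Plus, a rough sketch against AWR is below.  The :begin_snap/:end_snap bind variables are placeholders for the period boundaries, and it assumes neither instance was restarted between the two snapshots:

-- Approximate single block read latency for a period, per instance.
-- DBA_HIST_SYSTEM_EVENT values are cumulative, so the delta between the two
-- snapshots that bound the period gives the period average.
SELECT e.instance_number,
       ROUND( (MAX(e.time_waited_micro) - MIN(e.time_waited_micro))
            / NULLIF(MAX(e.total_waits) - MIN(e.total_waits), 0) / 1000, 2) AS avg_single_block_ms
FROM   dba_hist_system_event e
WHERE  e.event_name = 'db file sequential read'
AND    e.snap_id IN (:begin_snap, :end_snap)
GROUP  BY e.instance_number;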

The last Resource link is for Interconnect.  For RAC environments, the left hand graph will display the base and comparison throughput, along with whether either period was Interconnect Bound.  You can switch from the default display of GBit to GBit per second and remember, there won’t be any per-instance display, as this is the interconnect between the nodes on average.

addm_cm_0326_resource_interconnect_bnd

The right hand graph then displays Interconnect Latency.  This is based off averages and displays graphs for 8KB Ping, Fast Grant and Fast Block Transfer for the base and comparison periods.

addm_cm_0326_resource_interconnect_latency

Now this data may make some DBAs scratch their heads on the value, so we’ll talk about this for just a minute and try not to make this blog post any longer than it already is… :)

8KB Ping- this can be found via SQL*Plus in the DBA_HIST_INTERCONNECT_PINGS view.

The 8KB Ping comes from the WAIT_8K column, which is the sum of the round-trip times for messages of size 8 KB from the INSTANCE_NUMBER to the TARGET_INSTANCE since instance startup.  Dividing by CNT_8K, (another column in the view) gives the average latency.  So think of this as a measure of response time across your interconnect.
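A minimal sketch of that calculation, assuming the 12c column names, (WAIT_8K and CNT_8K- verify them against your release) and using placeholder snapshot bind variables:

-- Average 8 KB ping latency between instance pairs for a snapshot range.
-- The wait and count columns are cumulative, so the delta between the two
-- snapshots gives the period average (units are as stored in the view).
SELECT p.instance_number,
       p.target_instance,
       ROUND( (MAX(p.wait_8k) - MIN(p.wait_8k))
            / NULLIF(MAX(p.cnt_8k) - MIN(p.cnt_8k), 0), 2) AS avg_8k_ping
FROM   dba_hist_interconnect_pings p
WHERE  p.snap_id IN (:begin_snap, :end_snap)
GROUP  BY p.instance_number, p.target_instance
ORDER  BY p.instance_number, p.target_instance;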

Fast Grant- Cache Fusion transfers are in most cases faster than disk access latencies, i.e. faster than a single instance grant followed by a disk read, so Fast Grant reflects when the RAC environment experiences this, with totals averaged vs. how often a single instance grant is offered.

Fast Block Transfer-  There are multiple situations that occur for block transfers, but this displays the percentage of only FAST block transfers experienced on average across the nodes of the RAC environment.  Because resolving contention for database blocks involves sending the blocks across the cluster interconnect, efficient inter-node messaging is the key to coordinating fast block transfers between nodes.  A simple way to sanity check block transfer speed from SQL*Plus is shown after the list below.

The calculation for this graph is dependent on three primary factors:

  • The number of messages required for each synchronization sequence
  • The frequency of synchronization– the less frequent, the better
  • The latency, (i.e. speed) of inter-node communications
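Oracle doesn’t publish the exact formula behind the graph, but a common cross-check is to average the gc block receive time statistics, which are reported in centiseconds.  This is only a familiar sanity check, not the report’s own calculation:

-- Rough average global cache block receive time per instance, in milliseconds.
SELECT inst_id,
       ROUND(SUM(CASE WHEN name = 'gc cr block receive time' THEN value END) * 10
           / NULLIF(SUM(CASE WHEN name = 'gc cr blocks received' THEN value END), 0), 2) AS avg_cr_block_ms,
       ROUND(SUM(CASE WHEN name = 'gc current block receive time' THEN value END) * 10
           / NULLIF(SUM(CASE WHEN name = 'gc current blocks received' THEN value END), 0), 2) AS avg_cur_block_ms
FROM   gv$sysstat
WHERE  name IN ('gc cr block receive time', 'gc cr blocks received',
                'gc current block receive time', 'gc current blocks received')
GROUP  BY inst_id;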

What the Report Told us About our Example

In the example we used today, the change had been the addition of 200+ monitored targets without the next step of tuning the OMS having been taken yet.  As we were only seeing a change in the number of executions, added weight to existing objects, etc., this was quickly addressed once the administrators of the system updated the OMS to handle the additional resource demands.

Summary

With all this data in one simple report, you’ll find that you can quickly diagnose and answer the question “What changed?” without any assumptions required.  The high-level data provided heads off the quick conclusions that can send a group of highly skilled technical folks in the wrong direction.

Let’s be honest- not having a map on a road trip really is asking to wander aimlessly.

Tim and I leave for Oslo on Monday for the OUGN, (Oracle User Group Norway) conference.  This is a great conference that many of my friends have talked about for the last couple years and I finally let Oyvind talk me into going.  The conference is on a cruise ship that leaves from Oslo and sails for three days.

We’ll be quite busy-  Tim and I will be doing a keynote, “Assumptions, Killer of First Dates and Databases Everywhere”, then I’ll be presenting “DBaaS in a DBAs World” and “Master EM12c Monitoring”.  I will also have the opportunity to host my first European WIT round table, which I’m looking forward to.  I’ve had the opportunity to discuss WIT issues with men and women from Europe in the last year, so I’m really excited about talking about the initiatives they are taking on in Europe and the challenges still to be addressed.

So, hide your vikings, I’m on my way to the great North and sorry Canada, not you this time… :)

 

We’re going to continue on to Session Identifiers in the ASH Analytics Activity view.  You can use the links to view Part I and Part II to catch up on the ASH Analytics fun!  Knowing just how many ways you can break down ASH data is very helpful when you are chasing an issue and need specific data to resolve it.   We’ve already taken a deeper look into SQL and Resource Consumption, so now onto the session identifiers.

Session identifiers in ASH Analytics cover the following-

ash_0324_main

A session identifier provides distinct information about the session or sessions.  As in the previous blog posts, all data for each graph is from the same timeline, so the graphs can be compared to more readily understand the differences in the data.

Instance

The data is broken down by total sessions per instance, which is extremely helpful when working in a RAC environment.  The graph will show if there is “weight” in database resource consumption, unusual sessions, etc. on one instance or another.

ash_0324_instance

As you can see in our example, it’s pretty much 50/50 between the two, with a couple of minimal spikes from sessions that we can drill down into on the right or by SQL_ID on the left.  Notice that you can see the percentage of SQL that is being executed on each of the instances.
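For reference, this instance breakdown is roughly what you would get by counting in-memory ASH samples per instance yourself.  A sketch for the last hour, assuming each sample represents about one second of activity:

-- Approximate average active sessions per instance over the last hour.
SELECT inst_id,
       ROUND(COUNT(*) / 3600, 2) AS avg_active_sessions
FROM   gv$active_session_history
WHERE  sample_time > SYSDATE - 1/24
GROUP  BY inst_id
ORDER  BY inst_id;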

Services

Your services are your connections into the database that each session is using.  The database environment we are using for our example only has one distinct “user” service, which is shown in hot pink.  If you have more, you can gauge the amount of resources per user service that’s been deployed to each application/user group/host, etc.

ash_0324_services

As we’ve seen in the other activity graphs, we can easily drill down by SQL_ID on the left or instance/SID/Serial# on the right.

User Session

The activity in this graph is quite heavy.  We can identify the user sessions that are individually consuming the highest amount of resources.  The rest, which can’t be isolated and graphed out individually, are then grouped at the bottom and marked as “Other”.  This view comes in handy when you have one session whose overall consumption is out of the norm.

ash_0324_sessions

Notice the mix of sessions that are involved in each of the SQL_IDs on the left.  This tells us that most of the users are either accessing via an application or through packaged code.  If you were to see one session running a top SQL_ID, this might be a reason to quickly investigate for an issue.  The “Other” category may look interesting, but remember, if any one of these sessions or SQL_IDs were a significant consumer of resources, they would have their own identified section in the graph.  Don’t fall into the rabbit hole… :)
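The same idea as a hedged SQL sketch- the handful of sessions busy enough to earn their own band in the graph would float to the top of a query like this, and everything else is the “Other” noise:

-- Top sessions by sampled activity over the last hour.
SELECT *
FROM  (SELECT inst_id, session_id, session_serial#, COUNT(*) AS samples
       FROM   gv$active_session_history
       WHERE  sample_time > SYSDATE - 1/24
       GROUP  BY inst_id, session_id, session_serial#
       ORDER  BY samples DESC)
WHERE ROWNUM <= 10;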

Parallel Processes

There isn’t a lot that I get to show you for activity here.  The environment that I’m using for my example just isn’t set up to use or need a lot of parallelism.  I have all of two parallel processes and you can see a process running in parallel on the first instance and another on the second instance, identified by color.

ash_0324_parallel

Notice that the bottom section shows our parallel processes on both the left and the right side.  You have the option to start filtering by more information, same as what you see at the top, but this is secondary filtering.  A simple change on the right could gather a different set of data for the same processes, such as wait event or resource consumption.

User ID

As you can see, two primary User IDs are accessing the environment.  We already know that it’s numerous sessions, but this could signify an application ID used for login, meaning the user login security is housed at the application or OS level.  We can see that the primary usage is from the User ID with the magenta graphed section.

ash_0324_user_id

The bottom section would show the same as the top, just broken down by percentage.  What we can do instead is break it down by a different type of data, such as wait event or even objects, as I’ve shown here:

ash_0324_break_it_dwn
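A sketch of the same breakdown in SQL, joining USER_ID back to DBA_USERS and splitting each user’s samples by wait event, (CPU samples have a null event in ASH, so they’re labeled separately):

-- Sampled activity per database user, broken down by wait event.
SELECT u.username,
       NVL(h.event, 'ON CPU') AS event,
       COUNT(*)               AS samples
FROM   gv$active_session_history h
       JOIN dba_users u ON u.user_id = h.user_id
WHERE  h.sample_time > SYSDATE - 1/24
GROUP  BY u.username, NVL(h.event, 'ON CPU')
ORDER  BY samples DESC;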

Program

For the Program activity graph, I’ve done something a bit more with the bottom section.  As you can see, we have the programs displayed across the graph, and notice that for this one the OMS makes up the majority of the usage, which is expected, (this is an OMS environment…. :))  Now look at the bottom section-

ash_0324_program

The default would have been to show the Programs by percentage, but I’ve updated both sections to display SQL information instead.  In the left section, I’ve requested to see which SQL Plan Operations there are and from which programs.  On the right, I’m displaying the SQL_IDs and the Programs they are sourced from.
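That customized bottom section maps fairly directly to a grouping like the sketch below, which ties each program to the SQL plan operations it spent its sampled time in, (samples not attached to a SQL plan step are bucketed as ‘Non-SQL’):

-- Sampled activity by program and SQL plan operation over the last hour.
SELECT program,
       NVL(sql_plan_operation, 'Non-SQL') AS plan_operation,
       COUNT(*)                           AS samples
FROM   gv$active_session_history
WHERE  sample_time > SYSDATE - 1/24
GROUP  BY program, NVL(sql_plan_operation, 'Non-SQL')
ORDER  BY samples DESC;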

Session Type

The last one is Session Type.  This simply distinguishes background vs. foreground processes.  The bottom section already displays data by SQL_ID on the left and SID on the right.  You’re welcome to mix up this data and filter any way you need to, but this is the default for this activity view.

ash_0324_session_type

Keeping to the areas we’ve already discussed in previous posts, I’ll switch up the left and break it down further by Wait Event.  On the right, I’m just starting to choose a new filter-  notice that I’m simply clicking on the drop down menu and picking the event type I want to use for my new filter.

ash_0324_break_it_dwn_2

Using the activity graphs in ASH Analytics this way grants the DBA access to various types of diagnostic and analysis information that can be used to troubleshoot an issue or dig deeper into a performance tuning challenge when it comes to session information.