
Now that we’ve learned in Part I how to create a notification schedule for a specified user so that it only pages one individual on an oncall rotation, I’ll show you how to use it in conjunction with rule sets to complete the process of modernizing and automating your oncall coverage.

In my test scenario, I’ve already made copies of the main rule sets and have deviated from the defaults, organizing the rules around four groups, (mission critical, production, test and development.)

  • Mission critical covers all systems that I WANT to be paged for after hours and need to know about immediately if there is an issue.
  • Production covers important systems that I want to be notified about, but do NOT want to wake up resources for at night, as these incidents will not cause an outage and can be addressed during the business day.
  • Development and test, for this example, will be treated the same way and will only email issues to the DBA team, as the development group often has the privileges to address issues it has caused in the process of developing and testing.

With this said, let’s now look at our Rule Sets.  Notice I’ve disabled the default rules that come with EM12c, but have made copies that I can use to offer the best value to the business.

rule_sets_dba_notif2

 

Let’s click on the box to the left of the rule set named Mission Critical Incident Rule Set and then click on Actions, Edit.

rule_sets_dba_notif3

 

Notice that the rule set applies only to Specific Targets, selected by group, and the group is Mission Critical.  We then click on the Rules tab, where we can update the current rule settings to use our new DBA_PAGE user!

Editing the Rule Set

There are two things we are going to do to the first, and most important, rule in this rule set: the one for metric alerts.

1.  We are going to limit the categories that we want to be notified for.  I want you to see this option so you can decide which categories you want to page on and update this as you see fit.  For our example, I am going to limit our metric rule to the categories of Capacity and Availability.  At the end, I’ll show you how to create a second rule to email on the rest.

Click on the left of the rule set and then click on Edit up in the menu bar.

rule_sets_dba_notif4

 

We’ll then edit the categories to limit them to the two we spoke about above.  You’ll be able to see all the categories in the drop down; take note, from the incident notifications you already receive, of which categories are important to you for email vs. pages.

rule_sets_dba_notif5

 

Click on Next to continue onto the actions.

2.  We are going to update the rule to use the new DBA_PAGE user to page for the metric alerts in these two categories.

rule_sets_dba_notif6

 

 

We’ve already got a pretty advanced set of actions to perform in the Action Summary.  I like to automate many of the Incident Manager tasks, but this time around, we are going to add a page option to the actions.  The first rule is already highlighted, so simply click on Edit in the menu bar to proceed.

rule_sets_dba_notif7

I’m going to have all paged Incidents assigned automatically to the DBA_PAGE user and then I’m going to add DBA_PAGE to the Page field in the Basic Notification section.

rule_sets_dba_notif8

You can also enable Repeat notifications if you want to take this one step further, (we’ll go into this more in another blog post, but I thought I’d at least show it to you in this one… :))

If you’re satisfied, I’d recommend checking the box to clear events permanently, to keep your repository cleaned up, and then clicking Continue to review the changes you’ve made:

rule_sets_dba_notif9

 

You’ll now see that the owner of the incident will be set to the DBA_PAGE user and that the DBA_PAGE user will be paged along with the emails that are sent.  Keep in mind, notifications will only go out during the active times in the notification schedule, so Monday-Friday, 8-5, no paging will occur, as that’s how we set it up in Part I of this blog post series.

We now need to do this for each of the rules that we want to page on from this rule set:

  • Create incident for critical metric alerts, (for only two categories…)
  • Create incident for critical Service Level Agreement Alerts
  • Incident creation rule for Target Down availability status
  • Incident creation rule for high-availability events.

As promised, I then created a new rule to cover the categories no longer covered in the original.  Click on Create in the menu bar and create the following rule:

rule_sets_dba_notif10

 

Add the following Actions, ENSURING you do not choose to PAGE, and assign the rule to SYSMAN as you did with your standard rules before.  This is just to cover all your other categories, but NOT page on them.

rule_sets_dba_notif11

 

Complete the review, name the rule Metric Alert for All Other Categories to Email, click Save and then SAVE ALL THE CHANGES you’ve made.

You have now updated all the rule sets to use the DBA_PAGE user we set up in Part I of this series, and it will only notify that user during the scheduled times it is active, in this case just for after hours support, rotating on a one week schedule between three DBAs’ SMS contact information!

 

 

 

 

I came across a discussion on Oracle-l about how after hours paging is handled at many companies and was kind of surprised how many DBAs still carry around a secondary pager/cell phone or are just expected to be woken up whether they are on call or not.  I’m not one to go back to sleep once I’m awake, so I’ve been a proponent of EM notification schedules for after hours paging.  Now there are other ways to handle this in Enterprise Manager 12c as well, but we’ll use this method, as it is backward compatible with OEM 10g, too.

The first requirement of this setup is to have an inbox alerting option separate from your SMS/texting option on your smartphone; Android, iPhone, even BlackBerry, any of these are more than acceptable to satisfy the requirement.  The next is to know your SMS notification email address.

Example-  T-Mobile’s would be <cell number>@tmomail.net and AT&T’s would be <cell number>@mms.att.net.

Your mobile provider can verify what your email extension is for your SMS address.  With this information in hand, you can then proceed with the setup.  The design is simple-  all email notifications will continue to go to your email, whether warnings or critical, but we will create a secondary user that holds the SMS addresses for the DBAs oncall and set them up on a rotation to be notified of critical issues.

Sit down and first figure out how often you want to rotate your schedule, weekly, every two weeks or once per month, and factor in any vacation or time off coming up.  That should tell you what your rotation needs to be to keep things sane.

Create the Paging User in Enterprise Manager

First we need to create the user that will then be utilized for paging.  This can be done from the Setup, Security, Administrators menu in the console.

admin_pg_0416

We have all our DBAs currently listed, but for this user, we will need each of their SMS addresses and once you have those, click on Create.

Name the user DBA_PAGE and put in the following values.  It can be set up as a regular Administrator, but just for this example, I shortened the process and created it as a Super Admin, (this is not required…)

dba_page_0416

Note that I’ve entered each of the DBA’s SMS addresses into the Email Address section, separated by commas and that I’ve entered a clear description of what this login is used for.

Click on Finish and your new DBA_PAGE user is now listed.

admin_pg_2_0416

Managing the DBA_PAGE User

Once you’ve completed the entry, log out of the EM12c console as your current user and log back in as the DBA_PAGE user, (it’s just easier to manage the notification schedule as this user…)

dba_page_login_0416

 

Next you need to create the notification schedule, but I’ll first show you how you can add more users easily to this Administrator:

dba_page_login_3_0416

 

Once you’ve accessed this page, you will note that new email addresses can be added easily:

dba_page_info

 

The Add Another Row option will allow you to add another SMS address, and you can then check mark just that address and test it.  You can also remove the addresses of DBAs that have moved on from the company.

Editing the My Notification Schedule

dba_page_notif_sched1

 

This is again accessed from the Setup menu on the right hand side of the screen, and once you are in the My Notification Schedule, you will see the following:

dba_page_notif_sched2

This is the default schedule; it starts from the day the user was created and would notify ALL email addresses in the list.  We are going to replace this schedule with a new one that supports a three week oncall rotation, notifying one DBA each week and switching automatically each week to the next DBA’s SMS address.

Click on Edit Schedule Definition in the upper right of the screen.

dba_page_notif_sched3

As starting a rotation in the middle of a schedule would be difficult, we are going to choose the following Monday as our beginning.  You can see the default is a one week rotation and that the time zone is set to Canadian Central Time, (I have no idea where that came from… :))

dba_page_notif_sched4

 

For our example, we are going for a three week rotation, (three DBAs, three week rotation… :)) We’ll start the rotation this following Monday, April 21st and we’re going to say that the DBAs are on Mountain Time.

We are then given a three week schedule to fill in with who is oncall.  By default, it will want to offer us all three SMS addresses entered for the user, but we will only use the first one.  Click on Week 1 to fill in, ensure we are VIEWING Week 1, and now on to filling in our schedule.

dba_page_notif_sched5

 

To create a schedule that DOES NOT notify during business hours, each week has to be filled in as a batch of three steps.  There is no way around this, so I’ll use the following schedule creation and then repeat it for the 2nd and 3rd weeks for the two other DBAs.

dba_page_notif_sched6

 

Now notice that we’re still viewing and editing week 1, and choosing all the days, but I’ve changed the time to show 12am-8am to fill in with the first DBA’s SMS address.

Finally, we fill in for the last time slot, Saturday and Sunday during the weekend daytime.

dba_page_notif_sched7

 

Your schedule should now look something like this, with all sections above and below business hours filled in with the 1st DBA’s SMS Address for notifications:

dba_page_notif_sched8

 

We then click the down arrow on the View menu and choose the second week, then do the same with the check mark in the box for Weeks, ensuring we are changing the schedule for Week 2.  We’ll repeat the scheduling, but will also change the Email Address to the second DBA’s SMS Address by clicking on the search.

dba_page_notif_sched9

Once you have filled in everything for the Second DBA, you will switch to View Week 3 and edit Week 3 for the third DBA and fill in that schedule:

dba_page_notif_sched10

 

Once completed, Save the changes.

The default view may alarm you at first, as it will be empty:

dba_page_notif_sched11

 

The reason is the first date that is shown, which is the current date, and if you remember from our example, we decided to start our rotation on April 21st.  Go to View Week Containing and change it to April 21st.

You will now see the rotation schedule for the first DBA for after hours.

dba_page_notif_sched13

 

Updating the date for the next week will display the 2nd DBA’s rotation:

dba_page_notif_sched14

 

And then if you switch the date to the first week of May, we see the third DBA:

dba_page_notif_sched15

 

If you go past that, you’ll see that the rotation continues, starting again with the first DBA for the next week and will continue on without change unless you edit or replace the schedule.

If for some reason a DBA can’t take a shift in their weekly schedule for one night, you can go into that DAY’s shift and edit it to use one of the other two DBAs in the list.

Now you’ve finished setting up an after hours pager in a notification schedule.  In Part II, I will show you how to use the notification schedule/user with rule sets to page on critical notifications.

 

 

 

 

EM12c Auditing

Lately I’ve been having more discussions on securing the EM12c environment.  All of IT has a tendency to treat the Enterprise Manager as an afterthought in both hardware allocation and security best practices.  No one is sure exactly why this is-  they all have their theories, but we do know it happens often.

Today we are going to go over some of the auditing options within EM12c.  Basic auditing is turned on by default in the environment, but only covers basic processes.  There are over 150 auditing options and extensive information can be collected, retained within the repository, as well as turned into an externalized service that resides as log files on the OS file system.  These options include login/logout information, updates, OMS password changes and EM key copies and removals from the repository.

Basic auditing information can be gained through the console via the Setup, Security, Auditing Data menu option, but the auditing configuration, additional features, updates and externalized service setup, must be performed through the Enterprise Manager command line interface, (EM CLI).

If you haven’t used the EM CLI before, please refer to my blog post on Beginning with the Command Line Interface; otherwise log in as a user with appropriate rights to run the EM CLI and connect to the repository.

First, let’s inspect the current operations list and what will impact the infrastructure if executed:

audit_blog_1

Note that the last option, APPLY_UPDATE, is to update the repository and yes, it will impact the infrastructure by doing so.

Next, let’s look at the current settings. As I stated earlier, basic auditing is turned on by default, but the options for the externalized service have not been enabled, so it is marked as disabled.

audit_blog_2

The settings for the externalized service, outside of the directory, (configured in DBA_DIRECTORIES with read/write privileges granted to SYSMAN) are pre-configured with default values.

  • File prefix is the prefix used for all audit log files so that they are easily identified in the directory.
  • File size defaults to 50M.
  • Retention defaults to 365 days.  Keep this in mind before enabling, as this could impact disk space if your OS directory has limited space.

Notice that there is also a note informing you that Infrastructure Audit is always on, (go inspect the access.log and you will see information that can be sync’d up with the emctl.log and others to create the solid picture of activity this feature offers.)

Enabling/Disabling Features

To enable or disable audit features, the following syntax is used:

>emcli update_audit_settings -audit_switch="ENABLE/DISABLE" -operations_to_enable="<insert operation name here or just say ALL>" -operations_to_disable="<insert operation name here or just say ALL>"

To demonstrate this, we’ll enable auditing for logins and logouts:

audit_blog_3

The response letting us know if the change was successful in the auditing configuration completes the task and we can move on to other tasks.

Next, we’ll configure the externalized service for auditing.  This is an excellent choice and should be considered for all EM12c environments.  Even with high availability options, keeping a minimum of 7-31 days of auditing information regarding the EM12c environment, especially considering the access and power of EM12c, is a good idea.

The syntax for the configuration for the externalized auditing service is:

>emcli update_audit_settings -externalization_switch=ENABLE -file_prefix=<file_prefix> -directory=<directory_name> -file_size=<file size> -data_retention_period=<period in days>

And in our example, we will update the service to file sizes of 25M each, with a prefix of “em12c_audit” and retain 31 days of audit files that our OS file system can easily handle.

>emcli update_audit_settings -externalization_switch=ENABLE -file_prefix=em12c_audit -directory=AUD_DMP -file_size=25000000 -data_retention_period=31

After executing this statement, the audit files will automatically start generating to the directory, (make sure you HAVE created a DBA Directory to hold this data first!) and we can then view logs as needed to inspect what activity is occurring in the EM12c environment.
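If you haven’t created that directory object yet, here’s a minimal SQL*Plus sketch of what it looks like; the OS path below is just a placeholder, so point it at a location on your repository database server with enough space, and note that AUD_DMP matches the directory name used in the emcli command above.

-- Create the directory object in the repository database and grant SYSMAN access.
-- The path is a placeholder; choose a file system location with sufficient space.
CREATE OR REPLACE DIRECTORY AUD_DMP AS '/u01/app/oracle/em12c_audit';
GRANT READ, WRITE ON DIRECTORY AUD_DMP TO SYSMAN;

-- Verify the directory is registered before enabling the externalized service.
SELECT directory_name, directory_path
  FROM dba_directories
 WHERE directory_name = 'AUD_DMP';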

This is a solid best practice to ensure you are offering one more line of protection to the database and software that is essential to you, your business and your environment.

 

I know that folks have been having some challenges downloading my slides from Collaborate for a couple of my sessions, and I received errors myself when updating two of them the other day, so I’ve added them to my Slideshare location for your convenience.

Thank you to everyone who attended my sessions-  such great turn outs and fantastic attendee participation!  I had some real EM12c stars in the audience and appreciated the support and banter!

My slideshare list for download can be accessed from here.

Thank you to IOUG, Oracle and everyone for their support!

I’m often present to overhear comments like the following when issues arise:

  • I *think* someone changed something.
  • I bet some DBA changed a parameter!
  • I know <insert name of person on the bad list> is running that process I told him/her not to!

Making assumptions vs. having data is a good way to alienate peers, coworkers and customers.

There is a great feature in the EM12c that I’ve recommended that can easily answer the “What changed?” questions and deter folks from making so many assumptions about who’s guilty without any data to support the conclusion.  It’s called the ADDM Compare Report.

The ADDM Compare Report is easy to access, but may require the installation of a set of views to populate the report in the EM console.  If asked to install, simply ensure you are logged in as a Super Administrator to the console and as a user with AWR rights to install to the EM repository.

Once installed, you will return to the same access menu from the Performance menu in the database page to access the comparison:

addm_cm_0326_menu

Now let’s give an example of why you might want to use this feature.  In the following example, we can see that performance in this RAC environment has degraded significantly in a short period of time:

addm_cm_0326_main

We can see from this snapshot from the EM Top Activity graph that performance has degraded significantly from about 9am with a significant issue with concurrency around 11am-12pm on March 21st.

Gather Information and Run the ADDM Compare Report

We then go to our ADDM Compare report and enter in the time for the comparison period that we are concerned about:

addm_cm_0326_times

I’ve chosen a custom window for my base period, picking 8-9am, when performance was normal in Top Activity.  If you receive an error, the first thing to check is your dates and times.  It is very easy to mistakenly choose a wrong date or time.  Also remember that as this is ADDM, it works on AWR snapshots, so it is tied to the snapshot interval, (1 hour by default).  No matter what time you enter, the ADDM Compare report will choose the closest snapshot to the time entered.
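If you want to double check which snapshots will bracket the windows you’re about to enter, a quick peek at DBA_HIST_SNAPSHOT from SQL*Plus will tell you; this is just a sketch, and the date literals below simply stand in for the base and comparison windows in our example.

-- List the AWR snapshots covering the morning of March 21st, so we can see
-- which snap_ids the base (8-9am) and comparison (11am-12pm) periods map to.
SELECT snap_id,
       instance_number,
       begin_interval_time,
       end_interval_time
  FROM dba_hist_snapshot
 WHERE begin_interval_time >= TIMESTAMP '2014-03-21 07:00:00'
   AND end_interval_time   <= TIMESTAMP '2014-03-21 13:00:00'
 ORDER BY snap_id, instance_number;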

The High Level Comparison

The report will then be generated and displayed.

addm_cm_0326_top_grph

What we see in the top section of our report is a graph of the activity of our Base Period, (8-9am) vs. our Comparison Period, (11am-12pm).  You can see a significant change in activity, but you can also see at the top left of the graph that the SQL Commonality is only 43%.  This means that only 43% of the SQL is the same between the two periods we are comparing.  57% is a large difference to chase down, but now we are aware that there were significant differences in what SQL was running in the base period vs. what ran during the comparison.  Also note the Average Active Sessions, shown in red at the bottom center of our example.  The Comparison Period has 5.19 on average, which is significantly more than the Base Period.
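If you like to sanity check the Average Active Sessions figure outside the console, you can approximate it from the AWR ASH data; this is only a rough sketch, it assumes the default 10 second sample interval for DBA_HIST_ACTIVE_SESS_HISTORY and reuses the comparison window from our example.

-- Approximate Average Active Sessions for the comparison period (11am-12pm).
-- Each DBA_HIST_ACTIVE_SESS_HISTORY row represents roughly 10 seconds of activity,
-- so AAS ~= (sample count * 10) / elapsed seconds in the window.
SELECT ROUND(COUNT(*) * 10 / 3600, 2) AS approx_avg_active_sessions
  FROM dba_hist_active_sess_history
 WHERE sample_time >= TIMESTAMP '2014-03-21 11:00:00'
   AND sample_time <  TIMESTAMP '2014-03-21 12:00:00';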

It’s All in the Details

The default in the second section of the report is the Configuration.  This is where you would quickly discern if there were any changes to parameters, memory, etc. that are global to the database system.

addm_cm_0326_config

As you can see, our example had no changes to any global settings, including parameters, etc.  If you would like to inspect the current settings, you can click on the All radio button and view everything.

The next step is to view the Findings for the report.  Depending on your report, you will have different data showing in the Performance Differences column.  We’ll go over a couple of examples and what you should look for, expand upon, etc. to get the most from this feature.

addm_cm_0326_f_io

This section of the report is broken down into two or three sections, sometimes requiring you to click on a + sign to expand into the findings and/or use drop down arrows to extend the data detail.  The first section displayed here shows the segments causing significant increases in User IO in the comparison period.  Each is then quantified by Change Impact, Base Period and Comparison Period.  This allows the user to quickly assess the total impact, as well as see the exact percentage of impact for each of the periods.

In the bottom section, if there were any global changes responsible for part of the impact, (System Change Breakdown) they will show here, one by one.  As we can see so far, only the difference in average active sessions has caused concurrency, which has been directly responsible for the performance degradation.

We’ll now move into the next section down, Buffer Busy waits.  There is a significant decrease from the previous impact in the Change Period, (5.08% down to 1.64%) but we can still clearly see the increase from the Base Period, (.87% and .17%).   We are able to extend the information provided by inspecting the bottom Description, which shows us each of the objects involved in the Buffer Busy Waits, percentage of Change Impact, the Base Period % and the Comparison Period %.

addm_cm_0326_f_buffer_busy

We can then drop down to the bottom of the report and dig into the System Change Breakdown.  This includes the Workload Change, which we do see here and data is provided by SQL_ID.  We see that the buffer busy wait was null for the Base Period, but contains values for the Comparison Period.  The data may be a bit misleading as it shows 0% Percent Change Impact.  The calculation for this report must have some type of value in the Base Period for it to function correctly, so take care when analyzing this part of the report.

addm_cm_0326_f_bb2_wrkld_chng

System Changes in Workload

What you can do is hover your mouse over the SQL_ID and inspect the SQL statement.

addm_cm_0326_f_bb3_sqlid

We can then …….

addm_cm_f_commits

We also have the option to decide the percentage of change we would like to inspect.  The default is 5%, but if we want to look at less or more, we can update the value and do a search, then click on the + to show the SQL Breakdown.  We can also display only the Regression or Improvement information for any given section to filter as needed for the investigation.

addm_cm_0326_look_for

Once into the SQL Breakdown, we can hover over any SQL_ID to display the SQL in question.  We are able to see if the performance increased or degraded from the base period and the percentage of impact.

addm_cm_0326_top_sql

With each section displayed in the Performance Difference category, we can then go down to the description and view the data, (as we see below in the table for Row Lock Waits).  You can hide any area that is providing too much information for the inspection period, or filter what is displayed in the middle options area.

addm_cm_0326_row_lock

Resource Tab

The final tab in the ADDM Compare Report is the Resource section.  This is broken down into graphs that, when any area is hovered over, display details about the given resource wait.  The first section, highlighted in light blue, is the CPU.  Both the Base period and the Comparison period are noted as not being CPU bound.  The CPU usage by instance is shown in varying shades of green for this two instance RAC environment.

addm_cm_0326_resource_cpu

The second link is for Memory.  Both the Base and Comparison show that no virtual memory, (paging) was experienced.  The amounts are displayed in a table format with totals.

addm_cm_0326_resource_vm

Our third Resource link is for I/O.  There is more than one graph involved in this view; the first shows whether I/O was an issue in either period, and then there is a breakdown of the type of I/O usage seen in both.  If you hover your mouse over any of the graphed waits, such as Data Read in the example, the percentage and totals for the period are displayed.  Note-  Temp usage for reads and writes is displayed in this section.

addm_cm_0326_resource_throughput

The right hand graph for I/O is all about Single Block Read Latency.  This is helpful in many systems, but especially on Exadata, where single block reads can be particularly painful.  Note that this data is displayed by the base and comparison period, in milliseconds, with no instance filtering.

addm_cm_0326_resource_latency

The last Resource link is for Interconnect.  For RAC environments, this graph will display the base and comparison throughput in the left hand graph, along with if either period was Interconnect Bound.  You can switch from the default display of GBit to GBit per second and remember, there won’t be any instance displayed, as this is the interconnect between the nodes on average.

addm_cm_0326_resource_interconnect_bnd

The right hand side then displays Interconnect Latency.  This is based on averages and displays graphs for 8KB Ping, Fast Grant and Fast Block Transfer for the base and comparison periods.

addm_cm_0326_resource_interconnect_latency

Now this data may make some DBAs scratch their heads on the value, so we’ll talk about this for just a minute and try not to make this blog post any longer than it already is… :)

8KB Ping- this can be found via SQL*Plus in the DBA_HIST_INTERCONNECT_PINGS view.

The 8KB Ping value comes from the WAIT_8KB column, which is the sum of the round-trip times for messages of size 8 KB from the INSTANCE_NUMBER to the TARGET_INSTANCE since startup of the primary instance.  Dividing by CNT_8KB, (another column) gives the average latency.  So think of this as a measure of response time across your interconnect.
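A rough SQL*Plus sketch of that calculation is below; I’m using the column names as described above, so verify them against the DBA_HIST_INTERCONNECT_PINGS view in your own version before relying on it.

-- Running average 8 KB ping latency between instances as of each snapshot.
-- The wait and count columns are cumulative since instance startup.
SELECT snap_id,
       instance_number,
       target_instance,
       ROUND(wait_8kb / NULLIF(cnt_8kb, 0), 2) AS avg_8kb_ping
  FROM dba_hist_interconnect_pings
 ORDER BY snap_id, instance_number, target_instance;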

Fast Grant- Cache fusion transfers are in most cases faster than disk access latencies, i.e. faster than a single instance grant, so Fast Grant is when the RAC environment experiences this, and the totals are averaged vs. how often a single instance grant is offered.

Fast Block Transfer-  There are multiple situations that occur for block transfers, but this is displaying a percentage of only FAST block transfers experienced on average across the nodes of the RAC environment.  Due to the fact that resolving contention for database blocks involves sending the blocks across the cluster interconnect, efficient inter-node messaging is the key to coordinating fast block transfers between nodes.

This calculation for the graph is dependent on three primary factors:

  • The number of messages required for each synchronization sequence
  • The frequency of synchronization– the less frequent, the better
  • The latency, (i.e. speed) of inter-node communications

What the Report Told us About our Example

In the example we used today, the change had been the addition of 200+ targets monitored without the next step in tuning the OMS having been taken yet.  As we were only seeing a change in the amounts of executions, weight to existing objects, etc., this was quickly addressed as the administrators of the system updated the OMS to handle the additional resource demands.

Summary

With all this data in one simple report, you’ll find that you can quickly diagnose and answer the question “What changed?” without any assumptions required.  The high-level data provided heads off the quick conclusions that can send a group of highly skilled technical folks in the wrong direction.

Let’s be honest- not having a map on a road trip really is asking to wander aimlessly.

Tim and I leave for Oslo on Monday for the OUGN, (Oracle User Group Norway) conference.  This is a great conference that many of my friends have talked about for the last couple years and I finally let Oyvind talk me into going.  The conference is on a cruise ship that leaves from Oslo and sails for three days.

We’ll be quite busy-  Tim and I will be doing a keynote, “Assumptions, Killer of First Dates and Databases Everywhere”, then I’ll be presenting “DBaaS in a DBAs World” and “Master EM12c Monitoring”.  I will also have the opportunity to host my first European WIT round table, which I’m looking forward to.  I’ve had the opportunity to discuss WIT issues with men and women from Europe in the last year, so I’m really excited about talking about the initiatives they are taking on in Europe and the challenges still to be addressed.

So, hide your vikings, I’m on my way to the great North and sorry Canada, not you this time… :)

 

We’re going to continue on to Session Identifiers in the ASH Analytics Activity view.  You can use the links to view Part I and Part II to catch up on the ASH Analytics fun!  Knowing just how many ways you can break down ASH data is very helpful when you are chasing an issue and need specific data to resolve it.  We’ve already taken a deeper look into SQL and Resource Consumption, so now onto the session identifiers.

Session identifiers in ASH Analytics cover the following-

ash_0324_main

A session identifier provides distinct information about the session or sessions.  As in the previous blog posts, all the data for each graph is from the same timeline, so you can compare the graphs and more readily understand the differences in the data.

Instance

The data is broken down by total sessions per instance, which is extremely helpful when working in a RAC environment.  The graph will show if there is “weight” to the database resource consumption or unusual sessions, etc. from one instance or another.

ash_0324_instance

As you can see in our example, it’s pretty much 50/50 between the two, with a couple of minimal spikes from sessions that we can drill down into on the right or by SQL_ID on the left.  Notice that you can see the percentage of SQL that is being executed on each of the instances.
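If you want to see roughly the same instance breakdown from SQL*Plus, a simple sketch against GV$ACTIVE_SESSION_HISTORY is below; the one hour window is just an example, and counting samples is only an approximation of what the console graphs.

-- Share of ASH samples (i.e. activity) per instance over the last hour.
SELECT inst_id,
       COUNT(*) AS ash_samples,
       ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1) AS pct_of_activity
  FROM gv$active_session_history
 WHERE sample_time > SYSDATE - 1/24
 GROUP BY inst_id
 ORDER BY inst_id;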

Services

Your services are the connections into the database that each session is using.  The database environment we are using for our example only has one distinct “user” service, which is shown in hot pink.  If you have more, you can gauge the amount of resources per user service deployed to each application/user group/host, etc.

ash_0324_services

As we’ve seen in the other activity graphs, we can easily drill down by SQL_ID on the left or instance/SID/Serial# on the right.

User Session

The activity in this graph is quite heavy.  We can identify the individual user sessions that are consuming the highest amount of resources.  The rest, which can’t be isolated and graphed out individually, are then grouped at the bottom and marked as “Other”.  This view comes in handy when you have one session whose overall consumption is out of the norm.

ash_0324_sessions

Notice the mix of sessions that are involved in each of the SQL_ID’s on the left.  This tells us that most of the users are either accessing via an application or through packaged code.  If you were to see one session running on top SQL_ID, this might be a reason to quickly investigate for an issue.  The “Other” category may look interesting, but remember, if any one of these sessions or SQL_ID were a significant consumer of resources, they would have their own identified section in the graph.  Don’t fall into the rabbit hole… :)
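A rough SQL*Plus equivalent of this view, for those who like to double check the graph, might look like the sketch below; again, the one hour window is just an example.

-- Top individual sessions by ASH sample count over the last hour; anything
-- outside the top consumers is what the graph lumps into "Other".
SELECT *
  FROM (SELECT inst_id,
               session_id,
               session_serial#,
               COUNT(*) AS ash_samples
          FROM gv$active_session_history
         WHERE sample_time > SYSDATE - 1/24
           AND session_type = 'FOREGROUND'
         GROUP BY inst_id, session_id, session_serial#
         ORDER BY COUNT(*) DESC)
 WHERE ROWNUM <= 10;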

Parallel Processes

There isn’t a lot that I get to show you for activity here.  The environment that I’m using for my example just isn’t set up to use or need a lot of parallel.  I have all of two parallel processes and you can see a process running in parallel on the first instance and another on the second instance, identified by color.

ash_0324_parallel

Notice that the bottom section shows our parallel processes on both the left and the right side.  You have the option to start filtering by more information, same as what you see at the top, but this is secondary filtering.  A simple change on the right could gather a different set of data for the same processes, such as wait event or resource consumption.

User ID

As you can see, two primary User IDs are accessing the environment.  We already know that it’s numerous sessions, but this could signify an application ID that is used for a login, resulting in the user login security being housed at the application or OS level.  We can see that the primary usage is from the User ID with the magenta graphed section.

ash_0324_user_id

The bottom section would show the same as the top, just broken down by percentage.  What we can do instead, is break it down by a different type of data, such as wait event or even objects as I’ve shown here:

ash_0324_break_it_dwn
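Since ASH stores a numeric USER_ID rather than a username, a quick join to DBA_USERS will tell you who those two users actually are; here’s a sketch, again with an example one hour window.

-- Map ASH activity to database usernames over the last hour.
SELECT u.username,
       COUNT(*) AS ash_samples
  FROM gv$active_session_history a
  JOIN dba_users u
    ON u.user_id = a.user_id
 WHERE a.sample_time > SYSDATE - 1/24
 GROUP BY u.username
 ORDER BY ash_samples DESC;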

Program

For the Programs Activity graph, I’ve done something a bit more with the bottom section.  As you can see in the graph, we have the programs displayed across the graphs and notice for this one, the OMS is the majority of the usage, which is expected, (this is an OMS environment…. :))  Now look at the bottom section-

ash_0324_program

The default would have been to show the Programs by percentage, but I’ve updated both sections to display SQL information instead.  In the left section, I’ve requested to see which SQL Plan Operations there are and from which programs.  On the right, I’m displaying the SQL_IDs and the Programs they are sourced from.

Session Type

The last one is Session Type.  This simply distinguishes background vs. foreground processes.  The bottom section already displays data by SQL_ID on the left and SID on the right.  You’re welcome to mix up this data and filter any way you need to, but this is the default for this activity view.

ash_0324_session_type

Keeping to our areas we’ve already discussed in our previous posts, I’ll switch up the left and break it down further by Wait Event.  On the right, I’m just starting to choose a new filter, but notice that I’m simply clicking on the drop down menu and picking the event type I want to use for my new filter.

ash_0324_break_it_dwn_2

Using the activity graphs in ASH Analytics this way grants the DBA access to various types of diagnostic and analysis information that can be used to troubleshoot an issue or dig deeper into a performance tuning challenge when it comes to session information.

 

First Week at Oracle

I was warned it would be difficult.  I was warned that it would be time-consuming and painful, but I survived without as much as a scratch on me.

This last week was my first week at Oracle and it was really, really great.  The onboarding process wasn’t difficult at all.

Maybe it was because I was told to be prepared for a difficult and challenging week, so I was prepared for something very difficult and it was much easier than I imagined.

Maybe it was because I had new challenges and interesting new environments to work in, which made the week go by fast and the onboarding tasks seem minimal.

Maybe it was because I have an excellent manager and peers who made sure I had everything I needed.

Maybe it was because I had excellent advisers and support inside so when I had a question, Tyler, Jeff, Courtney, Pete, Werner and others were there to quickly help me out.

Maybe it’s just that Oracle has a lot of processes, applications and good people in place that makes coming on board a pretty pleasant experience.

Onto week two! :)

This is the second post on ASH Analytics Detail.   You can find the first post here.

In this post we’re going to work through more Activity data within ASH Analytics, but I’m focusing on Resource Consumption, which you can find once you are in ASH Analytics, (you may want to stop refreshes, which can be done at the upper right in the refresh settings) under the Activity button.

ash_0321_resource_consum

We’ll be working with the last three options in this menu marked off in red, as the Wait Class is just a similar view of what we already get in the main Activity graph.

Wait Class

I’m sure you’ve already noticed, but ASH Analytics has a tendency to always arrange the graphs from the heaviest usage at the bottom to the lightest at the top.  This makes it easier for the user to read.  This will be the case for pretty much all the activity graphs, whereas Load Maps arrange largest to smallest, left to right.

ash_0321_main

Nope, not much new here-  We are still seeing the resource consumption, but as wait events, and if we compare it to the standard wait events view, it’s not much different.  The reason we would really find value in this is if we had IORM, (IO Resource Manager) enabled.  Since we don’t, we’re going to skip it for now and it will get to be part of a future blog post… :)

Wait Event

When we filter by Wait Event, we get a very different view of our data than we did by Wait Class.  Note the actual categories the graph is broken up by, listed down the right hand side.

ash_0321_wait_event

Scanning from the bottom –> up on the right hand side, we can see the largest resource consumption is “LNS Wait on SENDREQ” and the second highest consumer is “LGWR- LNS Wait on Channel”.  You are also seeing log file sync and log file parallel write.  All of this adds up to a clear understanding of what we are dealing with here… Why?  This is a Data Guard environment, and these LNS* type waits are quite common; they occur when-

  • LGWR writes redo to the online redo log on this primary database (when LGWR SYNC is not used, user commits are acknowledged once this step completes, except when the parameter COMMIT NOWAIT is used).
  • The Data Guard LNS process on this primary database performs a network send to the Data Guard RFS process on the standby database. For redo write sizes larger than a MB, LNS will issue multiple network sends to the RFS process on the standby.
  • LNS posts LGWR that all the redo has been successfully received and written to disk by the standby, which is a heavy consumer as well.

On the primary database, the extended resource consumption is represented by the “log file parallel write” wait event.  We will also note the repeated “LNS wait on SENDREQ” wait event.  You can further divide the “LNS wait on SENDREQ” wait event into network time and RFS I/O time by subtracting the “RFS write” wait event obtained on the standby.  These wait events can be assessed on the standby by using the queries in the “Data Guard Specific Wait Events” section for a physical standby or by using AWR for a logical standby if you need further proof.
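If you want to confirm how heavy these waits are outside of ASH Analytics, a quick sketch against V$SYSTEM_EVENT, (cumulative since instance startup) is below.

-- Cumulative totals for the Data Guard transport and redo write waits
-- discussed above, since instance startup.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6, 1) AS time_waited_secs,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
  FROM v$system_event
 WHERE event LIKE 'LNS%'
    OR event IN ('log file sync', 'log file parallel write')
 ORDER BY time_waited_micro DESC;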

Object

I’ve hidden the exact object names involved in this database, but you’ll note, they weren’t involved in much of the resource consumption anyway.  Now on an environment with heavy IO, this would change significantly and you would see a lot more of one or more objects being the focus of this type of graph.  This is where having the Top Activity box check marked is helpful, as it clearly shows you that object IO is not much to focus on for this environment.  The black mark showing the upper max of activity for any given time in the activity graph gives you a clear line to compare with.

ash_0321_object
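For an environment where object I/O does matter, here’s a sketch of how you could pull a similar object breakdown from SQL*Plus; CURRENT_OBJ# is only populated for certain waits, so treat it as an approximation of what the console graphs.

-- Objects most frequently referenced in ASH samples over the last hour.
SELECT o.owner,
       o.object_name,
       o.object_type,
       COUNT(*) AS ash_samples
  FROM gv$active_session_history a
  JOIN dba_objects o
    ON o.object_id = a.current_obj#
 WHERE a.sample_time > SYSDATE - 1/24
 GROUP BY o.owner, o.object_name, o.object_type
 ORDER BY ash_samples DESC;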

Blocking Session

This is where many of you will become very interested again.  We all want to know about blocking sessions and how they may be impacting the resource consumption in our environment.  Although there was no *true* impact to production processing, there was some blocking sessions that we could inspect in our current example.

Only sessions that are experiencing some type of blocking are displayed in this view.  No matter if it’s transaction, (TX- Tran Lock contention or TM- DML enqueue contention, etc.) or UL, (user defined) these sessions will be shown in this graph.

ash_0321_blocking_sess

I’ve included the bottom portion of the ASH Analytics section so you’re able to see the SQL_ID’s involved in the sessions and then to the right, you can also view the session information, displayed by Instance ID, (for RAC) session ID, (sid) and serial number, (serial#).

You can then click on a session in the lower section, which will filter the ASH graph by the session and the wait class to give you another view, demonstrating in our example that we are waiting on a commit(s) and then our favorite wait, “other”…. :)

ash_0321_blocking_sess_det

What’s helpful is that the SQL_ID is clearly shown on the left and you’re able to click on any of these, (although I’d highly recommend clicking on the top one that is 5 times the concern vs. the others. :))
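If you’d rather pull the blockers straight from SQL*Plus, a rough sketch is below; it simply counts, over an example one hour window, how often each session shows up as the blocker in ASH samples.

-- Sessions most often reported as blockers in ASH over the last hour,
-- along with the wait events their victims were stuck on.
SELECT blocking_session,
       blocking_session_serial#,
       event,
       COUNT(*) AS samples_blocked
  FROM gv$active_session_history
 WHERE sample_time > SYSDATE - 1/24
   AND blocking_session IS NOT NULL
 GROUP BY blocking_session, blocking_session_serial#, event
 ORDER BY samples_blocked DESC;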

This completes the review of the Resource Consumption Menu for the ASH Analytics Activity Page.  We’ll continue on in Part III later this week!

I have a request to hit on some EM performance blog posts, so we’re going to start with breaking down some of the ASH Analytics areas.  ASH Analytics is not your grandfather’s “Top Activity” and I recommend everyone begin to embrace it, as it is the future of performance activity in Enterprise Manager.  The idea that we can pull directly from ASH and AWR to present our performance data via EM12c is exciting, to say the least.  The added accuracy and value of the aggregated historical data must be recognized as well.

The standard output for ASH Analytics looks quite similar to Top Activity in the way it focuses on activity graphed out by wait events, but I’m going to dig into the different ways to present the activity data, as they may answer questions that simply won’t show via wait event graphing.

Once you first enter the ASH Analytics interface, you’ll be presented with the section at the top, which will display the timeline for your examination and then below, the actual wait event graph as seen here in our example:

ash_standard

You’ll note that we haven’t used any filters, so we’re viewing all activity data and the standard wait classes we are accustomed to viewing, i.e. I/O, (dark blue) CPU, (kelly green) and System, (light blue).  We can see the system is rather busy during multiple intervals, surpassing the CPU cores line, displayed in red across the image.

Now let’s display the data in a different format.  This is done by changing the Wait Class, shown under the Activity button, to another filtering type.

ash_menu

The section highlighted in red is where we will be spending our time today.  I’ll display the same section of time we saw in the standard wait event view, but focus on the SQL-defined displays, and we’ll go over how they might assist you in troubleshooting an issue.

SQL_ID

Now this one might seem a bit self-explanatory, but the data displayed in the activity pane appears very different than, let’s say, the load map:

ash_bld_out

Now let’s look at it from the Activity View defined by SQL_ID:

ash_by_sqlid

I find the load map an excellent way to get my point across to non-technical folks, but I’m a DBA-  I’m used to spending my time looking at the data in terms of activity, so this is a more comfortable view for me: moving from seeing the data in terms of wait events to seeing the percentage of activity allocated to each SQL_ID.

We can click on a SQL_ID displayed on the right to go to its detailed ASH page, which will show you all the data pertaining to that SQL_ID and the wait information involved with it.  By clicking on different sections, we’re not just digging down into more detailed information; keep in mind, we are also “filtering” out more data that could have been masking valuable information.
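For reference, the same “percentage of activity per SQL_ID” idea can be sketched out in SQL*Plus against the ASH view; the one hour window below is just an example.

-- Activity share per SQL_ID over the last hour, roughly what the
-- Activity view graphs when broken down by SQL_ID.
SELECT sql_id,
       COUNT(*) AS ash_samples,
       ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1) AS pct_of_activity
  FROM gv$active_session_history
 WHERE sample_time > SYSDATE - 1/24
   AND sql_id IS NOT NULL
 GROUP BY sql_id
 ORDER BY ash_samples DESC;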

Top SQL_ID

You may also think that this one is not going to look any different than the last; that’s where you’re wrong.  This is not the SQL_ID that has resources allocated to it in activity, but the top level SQL_ID that is executing the active SQL_ID.  This is helpful if you are trying to locate which packages and procedures should be first on the list for code review or if you want to quickly locate the package or procedure responsible for a specific SQL_ID.

ash_top_level_sqlid

These can then be identified to sessions by clicking on links and then traced to users, hosts, etc.  If I click on one of the top SQL_IDs, it will take me to all SQL_ID’s involved in that Top SQL_ID and all the wait events, displayed in the same graph timeline.  From there, I can then dig down into the data pertaining to the waits, the SQL_ID’s involved as part of the Top SQL_ID or even switch to other views, such as a load map to present the data in another format for peers to view more easily.
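The TOP_LEVEL_SQL_ID column in the ASH view is what drives this display, so a sketch like the one below can show you which child SQL_IDs roll up under each top level statement; the window is again just an example.

-- Which SQL_IDs were executing under each top level (calling) SQL_ID
-- over the last hour, by ASH sample count.
SELECT top_level_sql_id,
       sql_id,
       COUNT(*) AS ash_samples
  FROM gv$active_session_history
 WHERE sample_time > SYSDATE - 1/24
   AND top_level_sql_id IS NOT NULL
 GROUP BY top_level_sql_id, sql_id
 ORDER BY top_level_sql_id, ash_samples DESC;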

Force Matching Signature

Force matching signature is a way of ensuring that SQL uses the same plan/profile, etc. even when it contains literals.  It’s kind of like setting cursor_sharing=FORCE, but in a “hinted” way throughout a database.  This can be both good and bad; let’s say it forces the value to “1”, where the value “1” really only makes up 2% of the rows, and it would have been better if the optimizer knew what it was working with.

ash_force_matching
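A classic way to see this from SQL*Plus is to look for force matching signatures that map to many distinct SQL_IDs in V$SQL, which usually means literals; here’s a sketch, with the threshold of 10 picked arbitrarily.

-- Force matching signatures shared by many distinct SQL_IDs: a good hint
-- that the statements differ only by literal values.
SELECT force_matching_signature,
       COUNT(DISTINCT sql_id) AS distinct_sql_ids,
       SUM(executions)        AS total_executions
  FROM v$sql
 WHERE force_matching_signature <> 0
 GROUP BY force_matching_signature
HAVING COUNT(DISTINCT sql_id) > 10
 ORDER BY distinct_sql_ids DESC;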

SQL Plan Hash Value

You should be seeing this next image [hopefully] as a mirror of the SQL_ID one.  We do like plan stability, and the idea that we have a ton of plan hash values changing erratically is enough to make most of us feel queasy.  Having a quick visual confirmation that we’re not experiencing a lot of plan changes can be quite helpful.  It’s not foolproof; again, this is a visual representation of the data, but it’s helpful.

ash_sql_plan_hash
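To back up the visual with numbers, here’s a sketch that flags SQL_IDs running under more than one plan in recent ASH data; the one hour window is only an example.

-- SQL_IDs that have run under more than one plan in the last hour's ASH samples.
SELECT sql_id,
       COUNT(DISTINCT sql_plan_hash_value) AS plan_count
  FROM gv$active_session_history
 WHERE sample_time > SYSDATE - 1/24
   AND sql_id IS NOT NULL
   AND sql_plan_hash_value <> 0
 GROUP BY sql_id
HAVING COUNT(DISTINCT sql_plan_hash_value) > 1
 ORDER BY plan_count DESC;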

SQL Plan Operation

If you’re a DBA managing a data warehouse, you know when any DML is running and on what schedule.  The SQL Plan Operation view can give you a quick verification if something is amiss and there are updates, deletes or inserts happening that shouldn’t be.  You can see the percentage of activity that may also quickly tell you a change has occurred vs. the normal database activity.

ash_sql_oper

You can also see just how much of your activity is going to certain type of processing.

SQL Plan Operation Line

In the SQL Plan Operation Line view, you can see the operation type for the process, along with the description.  The process is then broken down by both the SQL_ID and the step the statement is performing.

ash_sql_plan_op_line

If you hover over any SQL_ID listed, it will also show this to you in the highlighted area:

spec_load_type_op

SQL OpCode

This one is self-explanatory.  We are simply looking at the activity level per statement type.  It is easy to see that during our busy intervals, queries were around 1/3 of the activity, as were inserts.  This view can be helpful if you are retaining screenshots at different snapshot intervals for comparison.

ash_sql_op_code

Top Level SQL Opcode

You can also view the data simply by Operation code, displaying by each code.  Notice that these are displayed by the Instance number, (if on RAC), Session ID, (sid) and Serial number, (serial#).  This is a global view of the operations that are occurring in the environment and can offer a clear view of activity, but at an operation type level.

 

ash_sql_op_code

We’ve just touched on what the SQL section of the ASH Analytics Activity view can offer.  Stay Tuned and we’ll dig into each section and each area of this great interface for performance monitoring and analysis.