Nothing is more annoying that getting alerted on things that are not critical to you or that you already know is occurring and there is not a darn thing you can do about it.

I’ve also been frustrated to wake up in the morning and to see my inbox flooded with a ton of alerts from numerous EM12c systems and really-  none of them are truly critical, but can appear to be simply overwhelming!

How to we steer the inbox from madness to manageable?

The answer to this question is not just managing metric settings and thresholds-  there are a number of different ways to control what end up in your inbox or alerting you at every waking and sleeping hour.

The part in this series is focused on fine grain rule sets.  This is a post that will show you how in a few, simple steps, with the hopes that you can recreate this as simple or complex as you wish, based to make your EM12c server you better.

Note:  This next post is a change, that as I said, is about changing a rule set-  Remember-  changing a rule set is a change to ALL targets utilizing the rule set, vs. changing a metric policy or threshold at a target level, so think through what you want to change and how you want to make the change BEFORE you make the change.  All options I show you here can be performed in many, many different ways to address the problem.  Rationally think about the change you need to make, test it out and use common sense.

So for our email today, one of my clients systems have started to send this incident to my inbox at night:

 Host=hostxyz.client7.com
 Target type=Oracle Management Service
 Target name=hostxyz.client7.com:4892_Management_Service
 Categories=Performance
 Message=Loader Throughput (rows per second) for Loader_D exceeded the critical threshold (75). Current value: 65.08 
 Severity=Critical
 Event reported time=Dec 4, 2013 8:50:10 AM CST
 Operating System=Linux
 Platform=x86_64
 Associated Incident Id=51218
 Associated Incident Status=New
 Associated Incident Owner=SYSMAN
 Associated Incident Acknowledged By Owner=No
 Associated Incident Priority=Very High
 Associated Incident Escalation Level=0
 Event Type=Metric Alert
 Event name=Management_Loader_Status:load_processing
 Metric Group=Active Loader Status
 Metric=Loader Throughput (rows per second)
 Metric value=65.08
 Key Value=Loader_D
 Key Column 1=Loader Name
 Rule Name= Incident Management Ruleset,Incident creation Rule for metric alerts.
 Rule Owner=SYSMAN
 Update Details:
 Loader Throughput (rows per second) for Loader_D exceeded the critical threshold (75). Current value: 65.08
 Incident created by rule (Name = Incident Management Ruleset, Incident creation Rule for metric alerts.; Owner = SYSMAN).

I could really care less about this and am already receiving this information once it hits the warning threshold for review.  I don’t want to be woke up in the middle of the night, nor is it something that I can really address as a critical outage for this client.

Let’s edit our rule set by taking the information offered us in the email from the “Rule Name” section:

Rule Name= Incident Management Ruleset,Incident creation Rule for metric alerts.

Incident Rule Sets

Log into the EM12c and click on “Setup”, “Incidents” and then “Incident Rules”

You should see your Incident Rule Sets listed.

em_incident_rules

 

We’ll take the following information from our Incident email we recevied:

Rule Name= Incident Management Ruleset,Incident creation Rule for metric alerts.

and then one more section, “Categories”, (remember, some Incidents can belong to more than one category):

Categories=Performance

Taking just these two lines above, this tells us what incident rule is alerting us.  What we may not realize, is that by default, critical metric alerts notify on ALL categories and you then distinguish this by rule set, by target, by group, etc.  This is where the EM12c again proves itself to be a self-service product, giving the power to the administrator to receive notifications on any incident in the way that they want to receive or NOT receive.

Armed with this information, we are now going to take this example for our client and edit the rule set-

em_edit_rule

 

We’ve highlighted the rule set that matches the first part of the Rule name “combination” and click on Edit.

em_edit_rule2

Then upon entering the info for the rule set, we’ll need to edit the actual rule, which is the second part of the combination offered in the “Rule Name” from the email.  Click on the Rule Tab, the rule which we wish to edit, then click on “Edit”.

em_edit_incident_rule

 

This will take you into the rules basic information on what it uses for requirements it needs to trigger an incident.

By default, most rules are created to be triggered by very simple choices-

  • Type of Alert, in this case-  Metric Alert
  • Severity, in this case-  Critical

All other granularity has been left wide open, but you can change this and finely tune the granularity.

Now in the above email, we are told that the Category involved is “Performance”, but we really don’t want this waking us up in the middle of the night, as-

1.  This may be a server that regularly has high resource usage.

2.  It’s not a critical “pending outage” issue, but an issue that we would need to investigate in the morning or may already be a known issue that is scheduled to be addressed.

To address the emails,  we are going to make two changes.

1.  Only email on what is mission critical outage issues for Metric Alerts

2.  Create a new rule that will create an incident for any categories that are outside of what we want to be notified of for metric alerts.

em_incident_rule_categories

 

As you can see in the above example, I’ve added a check-mark for the Category and chosen the following:

  • Availability
  • Capacity
  • Error
  • Fault

I can then click on Next and Save the change.

We still have the other categories that are important to us, but we just don’t want them emailing anymore.  I want them to create an incident and I’ll review them when I review my Incident Manager, as I’ve been a strong proponent of using the Incident manager in this manner.

Create a New Rule for Category Level Metric Alert Coverage

We have now returned to the Rules that make up the Rule Set-  Click on “Create” to create a new rule.  We are going to create a rule very similar to the one we just edited, (just in case you need an example to use as a reference…) but we are going to choose the other categories and have the rule handle these categories of Metric Alerts incidents differently.

em_create_new_rule2

 

Choose the default radio button, “Incoming events and updates to events” and click Continue, which will take you to the rule wizard.

em_new_rule_metric_2

 

For this rule, we choose the following:

  • Type:  Metric Alert
  • Severity: Critical
  • Category: All the categories we didn’t choose for our rule that is still in place.

Click on Next to proceed to the next set the step of actions when the Metric is triggered in the wizard:

em_add_Actions_rule2

We want the EM to ALWAYS perform the actions when a metric alert for these categories are triggered, so leave the default here.

To save us time and energy, I believe is automating the assignment of the metric alerts and other common incident.

  • Assign to SYSMAN or a User created for this purpose in the EM12c.  You can even assign a specific email address if one person or group are in charge of addressing these types of incidents, (more options, more options… :))
  • This is still critical, so assign the priority to “Urgent” or “Very High”.

We want these categories to NOT email, so skip the email section.

Choose to clear events permanently, unless you wish to retain this data.

Proceed to the next section of the wizard, where you can review your Condition and Action Summary.

em_new_rule_actions_rvw2

 

Click to proceed to the next screen and add a meaningful name for your rule and a meaningful description-

em_rule_desc_2

 

Click on Save and then you will need to click on OK to save the rule to your existing rule set.

 

By following this process, we have 

1.  Removed specific categories from emailing from a specific rule so that only critical “possible” production outage incidents are going to email/paging.

2.  Added a second rule to handle those categories no longer in the original and to create incidents for review when appropriate time.

This could easily be built out so that a unique user is created with an email address to page uniquely and only assign mission critical, production OUTAGE to alert.

One rule could handle this for just production admin group, one business line, etc. to happen only after hours by editing the notification schedule.

You have the power in your hands to build out your Enterprise Manager 12c environment in the way you need it to support you and to do what your business needs to be more productive.

 

 

 

« »