Metric Thresholds and the Power to Adapt

Metric thresholds have come a long way since I started working with OEM 10g.  I remember how frustrating it could be if an ETL load impacted the metric values that had to be set for a given IO or CPU load for a database when during business hours, a much lower value would be preferable.  Having to explain to the business why a notification wasn’t sent during the day due to the threshold set for resource usage for night time batch processing often went unaccepted.

With EM12c, release 4, we now have Time-based Static thresholds and Adaptive thresholds.  Both are incredibly valuable to ensuring the administrator is aware of issues before they become a problem and not let environments with askew workloads leave them unaware.

Both of these new features are available once you are logged into a target, then from the left side menu, <Target Type Drop Down Below Target Name>, Monitoring, Metric and Collection Settings.  Under the Metrics tab you will find a drop down that can be changed from the default of Metrics with Thresholds to Time-based Static and Adaptive Thresholds which will allow you to view any current setup for either of these advanced threshold management.

adv_thresh_page2

To access the configuration, look below on the page for the Advanced Threshold Management link-

adv_thresh_page3

Time-Based Static Thresholds

The concept behind Time-based Static thresholds is that you have very specific workloads in a 24hr period and you wish to set thresholds based on the resource cycle.  This will require the administrator to be very familiar with the workload to set this correctly.  I understand this model very well, as most places I’ve been the DBA for, I was known for memorizing EXACTLY the standard flow of resource usage for any given database.

In the Time-based Static Threshold tab from the Metrics tab, we can configure, per target, (host, database, cluster) the thresholds by value and time that makes sense for the target by clicking on Register Metrics.

This will take you to a Metric Selector page that will help you set up the time-based static thresholds for the target and remember, this is target specific.  You can choose to set up as many metrics for a specific target or just one or two.  The search option allows for easy access to the metrics.

adaptive12

Choose which metrics you wish to set the time-based static thresholds for and click OK.

You can then set the values for each metric that was chosen for weekday or weekend, etc.

adaptive13

You will be warned that your metric thresholds will not be set until you hit the Save button.  Note: You won’t be able to click on it until you close this warning, as the Save button is BEHIND the pop-up warning.

If the default threshold changes for weekday day/night and weekend day/night are not adequate to satisfy the demands of the system workload, you can edit and change these to be more definitive-

adv_thrhld5

Once you’ve chosen the frequency change, you can then set up the threshold values for the more comprehensive plan and save the changes.  That’s all there is to it, but I do recommend tweaking as necessary if any “white noise” pages result from the static settings.

Removing Time-based Static Thresholds

To remove a time-based threshold for any metric(s), click on the select for each metric with thresholds that you wish to remove and click the Remove button.  You will be asked to confirm and the metric(s) time-based static threshold settings will be reverted to the default values or to values set in a default monitoring template for the target type.

Adaptive Thresholds

Unlike the Time-based Static Thresholds, which are based off of settings configured manually, Adaptive Thresholds source their threshold settings off of a “collected” baseline.  This is more advanced than static set thresholds as it takes the history of the workload collected in a baseline into consideration when calculating the thresholds.  The most important thing to remember is to ensure to use a baseline that includes a clear example of a standard workload of the system in the snapshot.

There are two types of baselines, static and moving.  A static baseline is for a given snapshot of time and does not change.  A moving baseline is recollected on a regular interval and can be for anywhere from 7-31 days.

The reason to use a moving baseline over a static one is that a moving baseline will incorporate changes to the workload over time, resulting in a system that has metric growth to go with system growth.  The drawback?  If there is a problem that happens on a regular interval, you may not catch it, where the static baseline could be verified and not be impacted by this type of change.

After a baseline of performance metric data has been collected from a target, you can then access the Adaptive Thresholds configuration tab via the Advanced Threshold Management page.

You have the option from the Advanced Threshold Management page to set up the default settings for the baseline type, threshold change frequency and how long the accumulation of baseline data should be used to base the adaptive threshold value on.

adaptive11

Once you choose the adaptive settings you would like to make active, click on the Save button to keep the configuration.

Now let’s add the metrics we want to configure adaptive thresholds for by clicking on Register Metrics-

adaptive14

You will be taken to a similar window that you saw for the Time-based Static Thresholds.  Drill down in the list and choose the metrics that could benefit from an adaptive threshold setting and once you are done choosing all the metrics that you want from the list, click on OK.

Note:  Once you hit OK, there is no other settings that have to be configured.  Cloud Control will then complete the configuration, so ensure you have the correct you wish to have registered for the target.

adaptive15

Advanced Reporting on Adaptive Thresholds

For any adaptive threshold that you have register, you can click on the Select, (on the right side of the Metric list) and view analysis of the threshold data to see how the adaptive thresholds are supporting the metric.

adaptvie16

You can also test out different values and preview how they will support the metric and decide if you want to move away from an adaptive threshold and to a static one.

You can also choose click on the Test All which will look at previous data and see how the adaptive thresholds will support in theory in the future by how data in the baseline has been analyzed for the frequency window.

For my metric, I didn’t have time behind my baseline to give much in the way of a response, but the screenshot gives you an idea of what you will be looking at-

adaptive18

Removing Adaptive Thresholds

If there is a metric that you wish to no longer have a metric threshold on, simply put a check mark in the metric’s Select box and then click on Deregister

adaptive17

You will be asked if you want to continue, click Yes and the adaptive threshold will be removed from the target for the metric(s) checked.

Advanced threshold management offers the administrator a few more ways to gain definitive control over monitoring of targets via EM12c.  I haven’t found an environment yet that didn’t have at least one database or host that could benefit from these valuable features.

 

 

 

 

Print Friendly
October 29th, 2014 by

facebook comments:

  • John Dillinger

    Thanks for shedding light on this. This is a welcomed enhancement from EM 11g. Given the available options today I’m not clear how I can adequately monitor a custom KPI which encounters a monthly workload spike on the 1st of the month. The highest adaptive setting only looks at the last 28 days which if fine for the month of February three years out of four 😉

  • Facebook
  • Google+
  • LinkedIn
  • Twitter