EM12c- Clearing Stateless Alerts vs. Clearing Other Alerts
I won’t apologize to Oracle for saying so: the new Incident Manager has some quirks, especially when it comes to managing incidents and events and clearing out both. In this post I’ll explain how to check whether you have alerts that can be cleared with an EMCLI command, and then take you through clearing alerts from the GUI, which is available in Release 2 of EM12c.
If you want to know which kinds of stateless alerts can be cleared manually, log into the EMCLI (Enterprise Manager Command Line Interface) and run the following:
./emcli login -username=<login name>
./emcli get_metrics_for_stateless_alerts -target_type=osm_instance
This command returns all metric types and alert types that can be cleared for ASM instance targets (osm_instance). You can do the same for oracle_database or any of the other target types.
As always, if you need help, EMCLI offers a good help option. You can save the verb list to a file for reference and then get more help on specific verbs by doing the following:
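If you have several target types to check, a small wrapper saves some typing. This is only a sketch: the EMCLI variable and the list_stateless_metrics function name are my own inventions, not part of EMCLI, and it assumes you have already logged in with ./emcli login.

```shell
#!/bin/sh
# Assumed location of the emcli binary; override EMCLI if yours lives elsewhere.
EMCLI="${EMCLI:-./emcli}"

# Hypothetical helper: print the clearable metrics for each target type
# passed as an argument, with a small banner between them.
list_stateless_metrics() {
    for target_type in "$@"; do
        printf '== %s ==\n' "$target_type"
        "$EMCLI" get_metrics_for_stateless_alerts -target_type="$target_type"
    done
}
```

Once the function is defined, calling list_stateless_metrics osm_instance oracle_database would run the verb once per target type.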
./emcli help > emcli_verb.lst    # creates an output file of the emcli verbs and descriptions
./emcli help <verb>              # gives you a description, syntax and examples for a specific verb
If you aren’t sure of Target Type names, you can query the database or you can do wild card searches through the EMCLI:
./emcli get_targets -targets=%database%
./emcli get_targets -targets=%osm%
./emcli get_targets -targets=%
The first command above returns all targets that are database-type targets, the second returns all ASM targets, and the third returns everything. Some of the output from the third looks like the following:
ID  Status  Target Type         Target Name
0   Down    oracle_listener     LISTENER_SCAN2_exa1-cluster
1   Up      oracle_oms          oemdb02.orcl.com:48
1   Up      oracle_oms_console  oemdb02.orcl.com:48
1   Up      oracle_oms_pbs      oemdb02.orcl.com:48
1   Up      oracle_database     orcl_db3.world_orcl2
1   Up      oracle_database     orcl_db4.world_orcl1
1   Up      oracle_exa_pdu      exa1-pdua.orcl.com
1   Up      oracle_exadata      exa1cel07.orcl.com
1   Up      osm_cluster         +ASM_exa1-cluster
1   Up      osm_cluster         +ASM_exa1-cluster
1   Up      osm_instance        +ASM1_exa1db01.orcl.com
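With a long target list, you may only care about what is down. As a rough sketch, and assuming the output is whitespace-separated with the status in the second column as in the sample above, a small awk filter can pull those rows out. The filter_by_status name is hypothetical, not part of EMCLI.

```shell
#!/bin/sh
# Hypothetical helper: read get_targets output on stdin and print the
# target type and target name of every row whose Status column (2nd field)
# equals the requested status. The header row is skipped naturally because
# its second field is the word "Status".
# Assumption: target names contain no spaces.
filter_by_status() {
    awk -v want="$1" '$2 == want { print $3, $4 }'
}
```

You could then pipe the third command into it, for example ./emcli get_targets -targets=% | filter_by_status Down.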
Before attempting to clear incidents, let’s look at how to check for them. Note that we use two arguments here with the clear_stateless_alerts verb:
- unacknowledged_only: only those alerts that have not been acknowledged, which helps distinguish them from any that someone may currently be working on.
- preview: I just want to check and see how many are available to be cleared. The command will not clear anything this way; it only tells me how many!
$ ./emcli clear_stateless_alerts -older_than=0 -target_type=oracle_database -target_name=<target_name> -unacknowledged_only -ignore_notifications -preview
The following alerts can be cleared:
Total Alerts
==============
1
Now, if you want to clear the one alert that is available to be cleared as stateless, you can:
$ ./emcli clear_stateless_alerts -older_than=0 -target_type=oracle_database -target_name=<target_name>
1 alerts were cleared successfully.
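The preview-then-clear routine above can be wrapped into one step. Treat this as a sketch under assumptions rather than a tested tool: preview_count and clear_if_any are names I made up, the EMCLI variable is my own convention, and the parsing assumes the alert count is the last token of the preview output, which may differ in your release. Unlike the plain clear above, it passes the same -unacknowledged_only and -ignore_notifications flags to the clear as to the preview, so both look at the same set of alerts.

```shell
#!/bin/sh
# Assumed location of the emcli binary; override EMCLI if yours lives elsewhere.
EMCLI="${EMCLI:-./emcli}"

# Hypothetical helper: read preview output on stdin and print the alert count.
# Assumes the count is the last whitespace-separated token of the output;
# prints 0 when no trailing number is found.
preview_count() {
    awk 'NF { last = $NF }
         END { if (last ~ /^[0-9]+$/) print last; else print 0 }'
}

# Hypothetical helper: preview first, then clear only when the preview
# reports at least one clearable alert for the given target.
clear_if_any() {
    target_type=$1
    target_name=$2
    count=$("$EMCLI" clear_stateless_alerts -older_than=0 \
        -target_type="$target_type" -target_name="$target_name" \
        -unacknowledged_only -ignore_notifications -preview | preview_count)
    if [ "$count" -gt 0 ]; then
        "$EMCLI" clear_stateless_alerts -older_than=0 \
            -target_type="$target_type" -target_name="$target_name" \
            -unacknowledged_only -ignore_notifications
    else
        echo "Nothing to clear for $target_name"
    fi
}
```

After logging in, a call would look like clear_if_any oracle_database <target_name>.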
Now, if you have a huge quantity of alerts that cannot be cleared with this process, the EM12c console Incident interface offers the ability to mass-clear alerts in Release 2. (Apologies to anyone still using BP1 or lower; this option will not work!)
From the EM12c console, there are a number of ways to access alerts: via the Incident Manager from the Enterprise drop-down, or from a database target summary page. For this example, we will work from groups.
Clicking on All Targets, then Groups, brings us to the groups summary page. In this example, our Prod group has 932 open incidents.
By clicking on the incidents link (which is the number of incidents, so for this group we would click on the “932”), we are taken to the overall incidents summary for this group.
If you hover your cursor over the left-hand column, just left of the Severity column, you can hold down your left mouse button and select multiple alerts.
By dragging the mouse down, you can select as many incidents/events as you wish. Once you have selected the ones you wish to remove, note that the “Rows Selected” count shows in the bottom-left corner (for our example, we have 7). You can then click the “Clear” button, identified by the pencil eraser.
The confirmation pop-up will then ask you to confirm and click OK, but remember to change it first so that it does not send email notifications, so as not to “spam” everyone.
The worst thing you can do is send ridiculous amounts of emails telling everyone you are removing each of these incidents. Remember, we hate “white noise”… 🙂 Once you have confirmed this is what you want to do, click “OK”. You can do this for as many or as few incidents at a time as you would like to clear out of the system.
Remember- Any incidents that are still an incident and need attention will recreate on the next metric collection for the target.
I tried it and it cleared 500 of 6154 alerts… Then it automatically created a job to clear the rest. The trouble is that the job is single threaded and is failing…. Kind of a mess…
What error are you receiving when it fails? A bit more info on the failure(s) would help. Can you also tell me what the generated alerts were for?
Thanks!
Kellyn
Typical Oracle error: Error attempting to invoke console command, retrying….
I’ll keep you updated on what I find…. Take care and THANKS!
Understood… Some have found relief from this error by increasing the job_queue_processes parameter value.
Good luck!
Kellyn
Hi Kellyn,
2 questions on this post.
1) Why is it required to clear the incident/alert manually? As I understand it, it should auto-clear once the issue is resolved.
2) When we clear it forcefully without solving the underlying issue on the target, what happens to it in the next collection?
Thanks,
Happy New Year!
I think you may have missed the last statement in this post: “Remember- Any incidents that are still an incident and need attention will recreate on the next metric collection for the target.”
1. The first section is to remove “stateless” incidents. Let’s say your job is set to clear out every 7 days, but your Incident Manager is “cluttered” with hundreds or thousands of incidents from an issue that no longer exists and didn’t clear automatically. This would be a reason to use the first option.
2. As stated in the post, if you clear the stateless alert:
a. It must not have an active metric threshold attached to it.
b. If it occurs again, a new incident will promptly be created.
c. Clearing a stateless incident/alert is not the same as “suppressing” it.
Thanks,
Kellyn
Thanks Kellyn & Wish you a very Happy new year.
Kellyn,
Thank you for a lot of great information. I have just recently started to play with Incident Rules. I was hoping you could help me with a problem I am having:
We have a requirement to be Paged for Fatal/Critical errors. This is no problem, I can set this up very generically for now. However, I cannot figure out how to send a Page when a Critical/Fatal event is Cleared. When I selected “Cleared”, I started receiving all sorts of Cleared pages and that is not good for sleep.
Is there something easy I’m missing, or has OEM 12c made this (seemingly simple) task very difficult?
Thank you,
Jeff
I may need you to elaborate: you say you want to receive a notification when a critical alert is cleared, but you are receiving all sorts of pages? The best way to figure out what is being received and shouldn’t be is to start by emailing me at dbakevlar at gmail, sending one notification that you want and one that you hadn’t wanted to receive… 🙂