OEM OC4J False Down/Timeouts

May 3, 2010 - By Kellyn Gorman

After upgrading to 10.2.0.5 on Linux, our Oracle Enterprise Manager (OEM) would intermittently report that the OC4J was down:

Target Name=EnterpriseManager0.serv3r
Target type=Oracle Application Server
Host=mtlincoln
Occurred At=March 12, 2010 3:09:52 PM MDT
Message=The application server instance is down
Severity=Critical
Acknowledged=No
Notification Rule Name=Application Server Availability and Critical States
Notification Rule Owner=SYSMAN

If you checked the status of the OEM immediately afterward, every process reported an acceptable status:

./opmnctl status

Processes in Instance: EnterpriseManager0.serv3r

I first blamed the introduction of Flash and the additional targets being monitored by the OEM, so I extended the thread timeout intervals behind the alert errors, per numerous recommendations from Oracle and from others affected by the same issue. The settings are in:
$OMS_HOME/Apache/Apache/conf/httpd.conf

#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 300
#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On
# Changed parameter to address bug 5717633 KJP, 4/26/10
#KeepAlive Off <-- Commented out for the bug shown above
#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100
#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 15
#

Unfortunately, this did not correct the problem, and we continued to be paged from time to time with no particular event that could be identified as the trigger.
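Because the excerpt above mixes active and commented-out directives, it can help to list only the settings that are actually in effect. The following is a minimal sketch, assuming a POSIX shell with `grep -E`; the function name `show_active_directives` is hypothetical and not part of OEM or Apache.

```shell
#!/bin/sh
# Hypothetical helper: print only the uncommented timeout and keep-alive
# directives from an Apache configuration file.
show_active_directives() {
    conf_file="$1"
    # The pattern anchors on the start of the line (allowing leading
    # whitespace), so commented lines such as "#KeepAlive Off" are skipped.
    grep -E '^[[:space:]]*(Timeout|KeepAlive|MaxKeepAliveRequests|KeepAliveTimeout)[[:space:]]' "$conf_file"
}

# Example call, using the path from this post:
# show_active_directives "$OMS_HOME/Apache/Apache/conf/httpd.conf"
```

Run against the file shown above, this should print the four active lines and leave out the commented `KeepAlive Off` entry.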

I was finally able to locate the actual source of the problem while digging around deep in the agent for the Oracle Application Server that is part of the OEM.

Through the OEM interface, go to the OEM host > Middleware > Application Server Name:

Status Up

Availability (%) 99 (Last 24 Hours)
Application URL http://serv3r:3338/
Version 10.1.2.3.0
Installation Type J2EE and Web Cache
Oracle Home /u01/app/oracle/product/10.2.0/oms10g
Host : Serv3r

Components

Name	Type	Current Status
home	OC4J	Up
HTTP_Server	Oracle HTTP Server	Up
OC4J_EM	OC4J	Up
OC4J_EMPROV	OC4J	Up
Web Cache	Web Cache	Up

Each link worked well except for one, OC4J_EM. When I clicked on it, I received the error “Can’t load oc4j_all_instances_rollup”. A quick Google search on “oc4j_all_instances_rollup” returned only two results, but one of them pointed to the OEM XML file that supports this final “up check” for the OC4J processes:

$OMS_HOME/j2ee/OC4J_EM/applications/em/em/WEB-INF/config/webappTargetTypes.xml

I noted that two lines in my copy did not match the example: mine referred to “oc4j_instances_rollup”, not the “oc4j_all_instances_rollup” that the OEM was searching for. Since the example was otherwise very close to my own file, I updated the metric names on those two lines to match the web example, but only after making a backup copy of the original (always best to keep a copy!).
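The backup-then-edit step can be sketched in shell. This is only an illustration under stated assumptions: the post does not show the contents of webappTargetTypes.xml, so the sketch simply replaces every occurrence of the old metric name with the new one, and `rename_metric` is a hypothetical helper name. Review the result with `diff` before reloading anything.

```shell
#!/bin/sh
# Hypothetical helper: back up a file, then replace one metric name with
# another throughout it. Assumes the names contain only letters, digits
# and underscores, so they are safe to use inside a sed expression.
rename_metric() {
    target="$1"
    old_name="$2"
    new_name="$3"
    backup="${target}.bak"

    # Refuse to overwrite an earlier backup of the original file.
    if [ -e "$backup" ]; then
        echo "backup $backup already exists, leaving $target untouched" >&2
        return 1
    fi
    cp -p "$target" "$backup" || return 1

    # Read from the backup and write the edited text over the original,
    # which keeps the original file's ownership and permissions.
    sed "s/${old_name}/${new_name}/g" "$backup" > "$target"
}

# Example call, using the path and metric names from this post:
# f="$OMS_HOME/j2ee/OC4J_EM/applications/em/em/WEB-INF/config/webappTargetTypes.xml"
# rename_metric "$f" oc4j_instances_rollup oc4j_all_instances_rollup
# diff "$f.bak" "$f"
```

Writing from the backup back onto the original means the untouched copy always exists before any change is made, so the edit can be reverted by copying the `.bak` file back.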

I then saved the file and reloaded the OEM:

./opmnctl reload

When I viewed the same link in the GUI after the OEM reload, no error was returned and the response times displayed successfully. The timeout alerts stopped now that ALL of the up-status checks resolve successfully. The cause turned out to be an inaccurately reported error deep in the agent's monitoring mechanism for the OC4J, which does not reside in the Apache or other standard directories we would normally inspect for misconfiguration.

One comment on “OEM OC4J False Down/Timeouts”

Hi,
I have the same problem.
After a server crash (Linux/Oracle NFS), I rebooted the system… but in order to start the repository, OMS, and OMA I had to unlock a bunch of files (control file, datafiles, … emkey.ora, …), and somehow the GC started to work, but…
After this “crash” I’m getting the same error message as you:
Notification Rule Name=Application Server Availability and Critical States
I checked the OMS (./opmnctl status) and everything was up and running (exactly the same problem).
I took the following steps to fix this problem:
– in the file $OMS_HOME/jdk/jre/lib/security/java.security
I set networkaddress.cache.ttl=180 (default -1)
The system became more stable.
Meanwhile I unlocked all the files and rebooted the system.
The problem is gone!
Unfortunately I’m not sure which step fixed this problem (networkaddress.cache.ttl=180 or unlock/reboot)!
Sometimes encrypted data in Enterprise Manager will become unusable if the emkey.ora file is lost or corrupted.
So check the emkey.ora:
emctl status emkey

Regards,
Andjelko Miovcic
