Monitoring a Microsoft OS Failover Cluster
I know, I know: none of you are using Microsoft Windows. That’s why I get so many questions on this topic and why there is so much interest in a white paper that no one thought I needed to write. Well, while that pesky ol’ white paper is in review, I’m going to move on to a secondary topic: how to monitor a Microsoft Active/Passive cluster with Enterprise Manager 12c, Release 4.
There are some disclaimers I’m going to put in this blog post:
1. Monitoring a clustered host is not explicitly listed as unsupported with EM12c, but some documentation I’m still poring over is a bit contradictory…
2. I am only covering MS Windows Server 2008 R2 in this article. It should be quite simple to cover Windows Server 2012, but no guarantees on 2003, as the security model for clustering, etc. is very different in the earlier releases.
3. This is a blog post. It’s not a MOS note, it’s not a white paper, so remember, this is just me blogging on how I accomplished this and there is no official support documentation behind this topic.
And so here we go!
The concept of OS level clustering can be very foreign to DBAs who’ve spent much of their time with application level clustering (i.e. RAC, Real Application Clusters), but I believe it is important for us to understand different clustering models, along with their benefits and drawbacks.
Microsoft OS level clustering in an Active/Passive configuration includes the following basic physical design:
- Two hosts, each identified by a physical name and physical IP address
- A virtual cluster name, along with a virtual cluster IP address
- A quorum disk (similar to a voting disk in RAC)
- Shared storage
The overall design appears similar to the following in active/passive mode, with the Management Agent installed and two databases monitored:
The most important thing to recognize is that no matter what the HOSTS are named or what their IP addresses are, the databases and the Management Agent (along with the listener, which I didn’t include in my diagram!) are all configured with the VIRTUAL CLUSTER NAME. They do not recognize the host names at all, and they run and monitor only on the ACTIVE host via the VIRTUAL cluster name.
The agent is installed the same way as on any other Windows host. I recommend a silent installation, even with EM12c Release 4 (12.1.0.4), using a response file to supply the installation parameters and entering all information as you normally would, but with the cluster information.
1. The first requirement is that your cluster MUST BE STABLE.
If a Windows Server cluster is not stable, don’t proceed. Correct whatever DNS, Active Directory or shared storage issues exist before proceeding to install Oracle or the agent. This will save you a lot of headaches.
2. A Failover Group must exist to add the Management Agent to.
If an application failover group doesn’t already exist in the Microsoft Failover Cluster Manager, you will need to request a virtual host name and virtual IP address to be used for the Client Access Point (CAP). This is the group service that will manage failover for the databases, etc. If the databases are already clustered, this group should already exist and all you need to do is add the agent to it.
Test all cluster name resolution with Name Server lookup (nslookup) commands:
nslookup <cluster name>
nslookup <cluster IP Address>
nslookup <CAP name>
nslookup <CAP IP Address>
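If you’d rather script this check than eyeball four nslookup results, here is a minimal Python sketch that performs the same forward (name) and reverse (IP address) lookups with the standard socket module. It is an illustration, not part of the official procedure; the function names and the values you pass in are hypothetical, and it goes through the OS resolver rather than querying a DNS server directly the way nslookup does.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable, Optional


def is_ip(value: str) -> bool:
    """True if value parses as an IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def reverse_lookup(ip: str) -> str:
    """Reverse (PTR) lookup through the OS resolver; returns the host name."""
    return socket.gethostbyaddr(ip)[0]


def check_resolution(
    entries: Iterable[str],
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = reverse_lookup,
) -> Dict[str, Optional[str]]:
    """Resolve names forward and IP addresses in reverse, like the nslookup
    checks. Maps each entry to its answer, or None if the lookup failed."""
    results: Dict[str, Optional[str]] = {}
    for entry in entries:
        lookup = reverse if is_ip(entry) else forward
        try:
            results[entry] = lookup(entry)
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            results[entry] = None
    return results
```

Pass it your cluster name, cluster IP, CAP name and CAP IP; any entry that comes back as None failed to resolve and should be fixed before going any further.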
The Management Agent will be installed on the shared storage, which means it will only know about the active host in the cluster (disclaimer alert!). To perform the installation, we are going to use the newer PsExec deployment method for Windows servers.
Download Agent Software
Check your software library for the correct version of the agent and download it via EM CLI commands:
emcli get_supported_platforms
emcli get_agentimage -destination=G:/swlib/ -platform="Microsoft Windows x64 (64-bit)"
Exit from EM CLI, unzip the file and prep for installation.
Setting up PsExec on the OMS
Create the folder you wish to install to: C:\agent12c
Following the instructions in MOS Doc ID 1636851.1, download the agentDeployPsExec.bat file and the PsExec utility to a folder, C:\psexec.
Create the psexec.rsp response file with the following information and save it to the C:\psexec directory with the other files:
HOST_NAMES=<Virtual cluster host name>
USER_NAME=<domain\login name>
PASSWORD=<OS Password>
AGENT_IMAGE_PATH=C:\agent12c_ins
AGENT_BASE_DIR=C:\agent12c
OMS_HOST=<OMS_HOSTNAME>
EM_UPLOAD_PORT=<Port>
AGENT_REGISTRATION_PASSWORD=<OMS Reg Password>
PSEXEC_DIR=C:\psexec
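A single leftover <placeholder> or typo in the response file is enough to make the deployment fail, so a small pre-flight check can save a round trip. This is a minimal Python sketch assuming the file is plain KEY=VALUE lines; it only checks the keys used in the example above, which is not an authoritative list of what agentDeployPsExec.bat accepts, and the function names are hypothetical.

```python
from typing import Dict, List

# Keys taken from the psexec.rsp example in this post. This is NOT an
# authoritative list of what agentDeployPsExec.bat accepts.
REQUIRED_KEYS = [
    "HOST_NAMES", "USER_NAME", "PASSWORD", "AGENT_IMAGE_PATH", "AGENT_BASE_DIR",
    "OMS_HOST", "EM_UPLOAD_PORT", "AGENT_REGISTRATION_PASSWORD", "PSEXEC_DIR",
]


def parse_rsp(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and lines starting with # are skipped.
    Only the first '=' splits, so passwords may contain '='."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def check_rsp(text: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks complete."""
    values = parse_rsp(text)
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value.startswith("<") and value.endswith(">"):
            problems.append(f"{key} still has the placeholder {value}")
    port = values.get("EM_UPLOAD_PORT", "")
    if port and not port.startswith("<") and not port.isdigit():
        problems.append(f"EM_UPLOAD_PORT should be numeric, got {port!r}")
    return problems
```

Run it against the contents of C:\psexec\psexec.rsp before deploying; an empty list only means nothing obvious is missing, not that the values are correct.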
If you are unsure of the values for your upload port, etc., run the following on the OMS:
emctl status oms -details
It requires the SYSMAN password and will return all information pertinent to your EM environment.
Once you have all of this filled in, you are ready to deploy from the OMS to the first node of the OS cluster.
Run the following:
agentDeployPsExec.bat PROPERTIES_FILE=<directory with response file>\psexec.rsp
C:\psexec>agentDeployPsExec.bat PROPERTIES_FILE=C:\psexec\psexec.rsp
C:\psexec>echo off
===================================================================
Agent deployment started on host : host1.us.oracle.com
Creating installation base directory ...
Note: You must have remote admin privileges on the target host for this to succeed. If the account in the response file does NOT have the privileges to create directories, start remote services and so on, the deployment will fail.
Deploying the agent takes some time; once complete, it will show the following:
Agent deployed successfully.
Duplicate registry and services to second host
Click on Start, Run and type in regedit.exe on the first host of the failover cluster.
Go to HKEY_LOCAL_MACHINE\SOFTWARE\oracle and right click on the folder and choose Export. Save the registry file in a secure location.
FTP the registry file to the second host.
Log into the second host and double-click the registry file. When it asks whether you are sure you want to add the registry information, confirm it.
Create the service
The service for the agent must now be duplicated on the second host. This is OS level clustering, so no application level service creation should be performed (emctl, emcli, oradim…). Use sc commands (Windows Service Control) to create the service.
Open a command prompt in administrator mode and duplicate the OracleAgent12c1 service:
The syntax for creating the service is as follows:
sc create <Service Name> binPath= "<Path to executable>" start= auto
Open Windows Services on the first host, then double-click the OracleAgent12c1 service to view the values you need for the command above:
Run the sc command on the second host to create the service that supports the installation performed on the first host.
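One gotcha with sc is that each option is written as option= value: the equals sign is attached to the option name and a space separates it from the value. This hypothetical Python helper builds the arguments that way so the rule is explicit. The service name and executable path in the example are placeholders; take the real values from the service properties on the first host. Nothing is executed unless you call run_sc_create yourself.

```python
import subprocess
from typing import List, Optional


def sc_create_args(service_name: str, bin_path: str, start: str = "auto",
                   display_name: Optional[str] = None) -> List[str]:
    """Build the argv for 'sc create'. sc wants 'option=' followed by a
    separate value token, so each option name and value are separate items."""
    args = ["sc", "create", service_name, "binPath=", bin_path, "start=", start]
    if display_name:
        args += ["DisplayName=", display_name]
    return args


def sc_create_command_line(*args, **kwargs) -> str:
    """Same command as a single string you could paste into cmd.exe.
    list2cmdline adds quotes, e.g. around a path that contains spaces."""
    return subprocess.list2cmdline(sc_create_args(*args, **kwargs))


def run_sc_create(*args, **kwargs) -> None:
    """Actually create the service (Windows only, needs an elevated prompt)."""
    subprocess.run(sc_create_args(*args, **kwargs), check=True)
```

For example, sc_create_command_line("Svc", r"C:\Program Files\a.exe") yields sc create Svc binPath= "C:\Program Files\a.exe" start= auto, which is the same shape as the syntax shown earlier.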
Add the Agent to the Failover Group
In Server Manager, go to Failover Cluster Manager and open the failover group:
Right-click the group, choose Add a resource and then Generic Service. Then choose the agent service listed (for the example above, Oracleagent12c1Agent) and accept the defaults to finish.
The agent service is now cluster aware and will fail over if the host it was installed on becomes unavailable for some reason.
The installation on the failover cluster’s shared storage is now complete for the first host, and you’ve copied over the registry settings and duplicated the service, so you are ready to test failover and make sure the agent comes up successfully.
There are a couple of ways to test the failover:
1. Reboot the first host; this will cause a failover.
2. Right click on the Failover group and click on More Actions, then Simulate failure of this resource.
You should now see the drives and services, including the agent, fail over and start on the second host. Verify that all services come online, and log into EMCC (the EM12c Cloud Control console) to verify that the agent is uploading to the OMS. Verify that all the targets you added show correctly for the virtual hostname.
Test failover multiple times to both hosts. If a failure occurs on the second host, compare its services against those on the first host and check the startup dependencies of your failover group.
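Comparing the agent service between the two hosts by eye is tedious, so one option is to capture the output of sc qc <service name> on each host (sc \\<host> qc <service name> also works remotely) and diff the two. This is a minimal Python sketch assuming the usual KEY : value layout of sc qc output, with multi-valued fields such as DEPENDENCIES continued on lines that begin with a colon; the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# "        START_TYPE         : 2   AUTO_START"  ->  ("START_TYPE", "2   AUTO_START")
_FIELD = re.compile(r"^\s*([A-Z_]+)\s*:\s*(.*?)\s*$")
# "                           : Tcpip"  (continuation of the previous field)
_CONT = re.compile(r"^\s+:\s*(.*?)\s*$")


def parse_sc_qc(text: str) -> Dict[str, str]:
    """Turn 'sc qc <service>' output into {FIELD: value}; continuation lines
    (e.g. extra DEPENDENCIES) are appended to the previous field with ', '."""
    fields: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in text.splitlines():
        m = _FIELD.match(line)
        if m:
            last_key = m.group(1)
            fields[last_key] = m.group(2)
            continue
        c = _CONT.match(line)
        if c and last_key is not None:
            fields[last_key] += ", " + c.group(1)
    return fields


def diff_services(first: str, second: str) -> Dict[str, Tuple[str, str]]:
    """Fields whose values differ between the two hosts' sc qc output."""
    a, b = parse_sc_qc(first), parse_sc_qc(second)
    return {
        key: (a.get(key, "<missing>"), b.get(key, "<missing>"))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

An empty result means the two service definitions match; anything else (a different START_TYPE, BINARY_PATH_NAME or DEPENDENCIES) is a good first suspect when the agent won’t start after failover.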
The clear drawback of monitoring OS level clustering through the agent is that only one host is monitored at a time. Because the monitored targets (MSSQL databases, applications, etc.) are active on only one host at a time, deploying dual agents would require manual intervention.
One workaround is a monitoring script that pings the hosts at all times and alerts only when no response is received, providing a second level of host availability monitoring.
I’m also investigating Oracle Fail Safe with EM12c (not currently supported) and the new Partner Agents to see if there are more opportunities for monitoring OS level clustering.
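Here is a minimal Python sketch of that workaround, assuming the OS ping command is available on the machine running it. The host names, retry count and alerting mechanism are placeholders for your own; check_hosts only returns the hosts that missed every attempt, and what you do with that list (email, event log, etc.) is up to you.

```python
import platform
import subprocess
from typing import Callable, Iterable, List


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo with the OS ping command; True if ping exits 0.
    Note: Windows ping can still exit 0 on some 'Destination host unreachable'
    replies, so treat this as a basic reachability check."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]  # -w is in ms
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]  # Linux iputils: -W in s
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_hosts(hosts: Iterable[str], probe: Callable[[str], bool] = ping_once,
                attempts: int = 3) -> List[str]:
    """Return the hosts that failed every attempt; only these should alert,
    so a single dropped packet does not page anyone."""
    down: List[str] = []
    for host in hosts:
        if not any(probe(host) for _ in range(attempts)):
            down.append(host)
    return down
```

Scheduling this from a machine outside the cluster (for example, with Windows Task Scheduler) against both physical host names keeps visibility of the passive node, which the clustered agent can’t see.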