I’ve been part of multiple conversations, via Twitter, Facebook, personal and professional email on the choice of housing Enterprise Managers on a Virtual Machine.
The title is not to be taken seriously. It is a bit of sarcasm, so please know I am having some fun with it, and I hope to look seriously at the logic of why I avoid VMs for my Enterprise Manager homes.
The conversations didn’t get heated, but they included many who were passionately against an EM on a VM and another set who stated that “Virtualization is the future, including the Enterprise Manager- EMBRACE!” I agree with both sides of the argument, but will continue to sit in the first group for reasons of experience.
My history involves four Enterprise Manager environments that resided on VMs. One was an EM12c Linux/Windows VM x86/VMware combo, one was 11g on a Linux VM x86, and two were on 10g, one Linux and one Windows, but too far back to remember much in the way of specs. I did not build or design any of them; I came in solely as support after the environment was already in production. For three of these environments, I was a remote DBA supporting the customers, so this must be taken into consideration.
I want to start off with the goal of any Enterprise Manager environment:
- Robust, 24×7 environment monitoring.
- No “white noise”- in other words, no alerting or paging outside of actual issues/incidents.
- Secure, not impacted by other applications/systems.
- Complete, multi-tiered monitoring for host, database, application, cloud environments and anything else the DBA can find and need a plug-in for.
Some of the cool features and goals of a VM:
- Software and hardware isolation within each VM, while still sharing one set of hardware.
- Para-virtualization, in which the guest OS cooperates with the hypervisor (this can also be seen as a form of load-balancing)
- IO resource allocation across VMs as needed
- Memory resource allocation across VMs as needed
- Cost-saving distribution of a server for many purposes
- Resource scheduling- yes, a schedule of resources and where they may be needed most at scheduled times.
From these two lists, we can summarize a quick one-line goal for each:
Enterprise Manager- Consistent, reliable monitoring and alerting of the environment a DBA is responsible for.
Virtual Machine- Flexible architecture allowing dynamic re-allocation of resources to where most needed and often saving money by extending use of one server to many.
Looking at this again, different wording:
The concept of each leads the DBA and those they support to view the Enterprise Manager as their window, the sentinel of their environment. It must be trusted to be there for them 24 hours a day, 7 days a week, rain or shine. The administrator and those they support view virtualization as a less expensive way to get things done, with dynamic allocation of resources and hosts on demand.
On a standalone server, what is the DBA most concerned with, especially for their Enterprise Manager environment?
- Do I have enough Memory?
- Do I have the IO to perform the tasks that are necessary for day-to-day business?
- Is my environment secure?
- Who has access to these resources and could impact what I think I have, restricting my database and in turn, impacting me?
In each environment where I’ve been a DBA, Lead DBA or DBA and Developer, the server administration team had to climb a learning curve to manage VM servers. A VM doesn’t appear much different to the DBA than any other server, so the learning curve was much smaller on our side. I was very sensitive to this for the administrators. I experienced it for Oracle, SQL Server and MySQL installations, along with applications that interacted with the databases I supported. Many administrators released the servers to users with the default settings, having received no additional training that would let them offer anything more. That was acceptable for development or test, or for something less critical such as a file server, but for anything mission critical it caused repeated impacts to service up-time.
For Windows environments, I’ve experienced automatic updates causing outages in my EM environment, once during a critical incident period, so no notifications were sent by the EM. I received a page that the EM was down, thanks to an EM monitoring cron job on a secondary server, but this didn’t tell me there was an issue in another environment being monitored by that EM. When the hosting company was contacted regarding the outage, I completely understood when I was informed that they hosted over 4000 Windows VMware hosts and that the SLA states automatic updates are turned on.
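The secondary-server check described above could be as simple as the following Python sketch, run from cron. This is only an illustration under assumptions: the original cron script is not shown, testing plain TCP reachability is my stand-in for whatever it actually did, and the host and port are whatever your own console endpoint happens to be.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_em(host: str, port: int) -> str:
    """Build a one-line status message suitable for a pager or mail hook."""
    state = "UP" if is_port_open(host, port) else "DOWN"
    return f"EM console {host}:{port} is {state}"
```

A cron entry on the secondary server would call `check_em` with your own host and port and page only when the result says DOWN. As noted, this tells you the EM itself is unreachable, not what the EM would have told you about its targets.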
For a Linux VM that housed my OMS repository, pages were escalated to the on-call DBA each night, somewhere between 1:30am and 3am, due to loss of contact between the OMS and the console: Message=Agent is unable to communicate with the OMS. (REASON = Agent is Unreachable (REASON : Agent to OMS Communication is broken OMS application is unavailable )
What is the issue with this? Can’t you just ignore it? Yes, you can, but these are the risks:
1. White noise- you learn to ignore pages at these times, assuming it’s just this bogus alert and sooner or later, you experience a real issue and ignore it. This is not efficient alerting!
2. If the console can’t communicate with the OMS, how do you know your agents are uploading all appropriate data in a timely manner?
3. If the console can’t communicate with the OMS, are you receiving alerts in a timely manner if there is an issue in your environment from any given target?
The cause of the OMS communication error? A second VM sharing resources was used as an FTP server and flooded the network each night around this time, impacting the ability of the OMS and console to communicate.
As there is definitive hardware isolation, it took a while and some research to figure out what was causing the outage. This is the catch-22 of the VM environment when troubleshooting issues.
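One way to spot this kind of recurring nightly pattern sooner is to bucket alert timestamps by hour of day and look for hours that fire on many different dates. The sketch below is a rough illustration, not how I diagnosed the issue at the time. The function name, the `min_days` threshold and the sample timestamps are all made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def recurring_alert_hours(timestamps, min_days=3):
    """Return the hours of day (0-23) in which alerts fired on at least
    min_days distinct calendar dates."""
    days_by_hour = defaultdict(set)
    for ts in timestamps:
        days_by_hour[ts.hour].add(ts.date())
    return sorted(h for h, days in days_by_hour.items() if len(days) >= min_days)


# Made-up sample: an alert a little after 2am on four consecutive nights,
# plus one unrelated afternoon alert.
sample = [datetime(2013, 1, day, 2, 10) for day in range(1, 5)]
sample.append(datetime(2013, 1, 2, 14, 0))
print(recurring_alert_hours(sample))  # [2]
```

An hour that shows up here is a hint to look at what else is scheduled on the shared hardware at that time, such as a nightly FTP transfer on a neighboring VM.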
With 10g, as many have noted, there was the standard whining about memory and CPU starvation, poor management of the immature VM environments, etc. At my previous employer, I started down the path of building an 11g EM on a Windows VMware host. Due to poor network connectivity and inconsistent resource allocation, it sat completely idle and was quickly replaced by a standalone Linux server with EM12c last fall as part of a huge 11g upgrade project.
I think any DBA knows what is required for their Enterprise Manager hardware. I think this is a solid case for RAC or Data Guard, but then we get into higher licensing costs and remember, the choice to put the EM on a VM was most likely to KEEP COSTS DOWN. I believe that the EM is often looked upon by the business as a luxury for the DBA. It is not producing revenue. The users do not utilize it; it is for support. It is easy to comprehend why it is often going to be an application/system that is deemed perfect for virtualization.
Pointing fingers is also easy when the EM does not get the resources it requires due to the basic nature and features of the VM, without clear and skilled knowledge of what the EM requires. Because of this type of issue, I have spent a large amount of my time creating very politically correct explanations of why the DBA staff was not aware of an outage, a critical issue or a flood of bogus alerts, and it is important to me to do so. Everyone is a professional and they are doing their job to the best of their ability. I know how critical my Enterprise Manager environment is to the business: it notifies me, as the technical specialist, so I can address issues and keep revenue flowing. So why would I put myself, the administrator of the VM environment or the company I am there to support in the position of having this critical environment reside on a technology whose basic nature is better suited to other applications/uses?
There are times when a VM is going to be the only choice for hardware for an Enterprise Manager project or environment a customer has and I will do everything in my power to ensure it has the best support and that I continue to learn more about virtualization. I hope that enhancements and education on how best to build virtualized environments to support production 24×7 mission critical systems will continue and that at some point VM will be as easy a choice to recommend as other options when it comes to Oracle’s Enterprise Manager.