Solaris Cluster and Decreasing Resource Needs After Upgrade

August 30, 2017 - By Kellyn

Delphix Engineering and Support are pretty amazing folks. They continue to pursue solutions, no matter how much time it takes or how complex the challenges of supporting heterogeneous environments, hardware configurations and customer needs.

This post is in support of the effort from our team that restored stability to a previously impacted Solaris 11.3 cluster configuration. The research, patching, testing and resulting certification from Oracle was a massive undertaking for our team. I hope this information serves the community, but it is in no way a recommendation from Delphix. It’s simply what was done to resolve the problem, after our team made logical decisions based on how the system is used.

Challenge

Environment: Solaris 11.3 (with SRU 17.5) + Oracle 12.2 RAC + ESX 5.5

Situation:

After an upgrade to 12.2, environments were experiencing significant cluster instability and memory starvation due to the new memory demands of the upgrade.

Upon inspection, it was found that numerous features required more memory than before and the system simply didn’t have the means to support them. As our environment was a Solaris environment with 12.2, there was a documented patch we needed to request from Oracle for RAC performance and node evictions. Even with the patch, the environment was still experiencing node evictions, and the data showed that we’d have to triple the memory on each node to continue using the environment as we had before. Our folks aren’t ones to give up that easily, so secondary research was performed to find out if some of the memory use could be trimmed down.

What we discovered is that what is old can become new again. My buddy and fellow Oakie, Marc Fielding, had blogged (with links to other posts, including credit to another Oakie, Jeremy Schneider) about how he’d limited resources back in 2015 after patching to 12.1.0.2. That post really helped the engineers at Delphix get past the last hump on the environment, even after implementing the patch to address a memory leak. Much of what you’re going to see here came from that post, focused on its use in a development/test system (Delphix’s sweet spot).

Research

Kernel memory out of control

Starting with kernel memory usage, the ::memstat dcmd in mdb -k can be used to inspect memory at a percentage level:

$ echo "::memstat" | mdb -k
Page Summary           Pages                 MB          %Tot
------------     ----------------  ----------------      ----
  Kernel               151528              3183            24%
  Anon                 185037              1623            12%
  ...
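
To track these numbers before and after a change, the output can be parsed instead of eyeballed. What follows is a minimal sketch, assuming the column layout shown above (category, pages, MB, percent). The function names parse_memstat and kernel_over_threshold are hypothetical helpers, not part of mdb, and the sample text is just the two rows quoted in this post.

```python
import re

# The two data rows quoted in this post, pasted in as sample input.
SAMPLE = """\
Page Summary           Pages                 MB          %Tot
Kernel               151528              3183            24%
Anon                 185037              1623            12%
"""

# A data row is: a category name, then pages, MB and a percentage.
ROW = re.compile(
    r"^\s*(?P<name>\S.*?)\s+(?P<pages>\d+)\s+(?P<mb>\d+)\s+(?P<pct>\d+)%\s*$"
)


def parse_memstat(text):
    """Return {category: (pages, mb, pct)} for every data row found."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and separator lines do not match and are skipped
            rows[m["name"]] = (int(m["pages"]), int(m["mb"]), int(m["pct"]))
    return rows


def kernel_over_threshold(rows, limit_pct):
    """True when the Kernel row uses more than limit_pct of total memory."""
    return rows["Kernel"][2] > limit_pct


rows = parse_memstat(SAMPLE)
print(rows["Kernel"])                   # (151528, 3183, 24)
print(kernel_over_threshold(rows, 20))  # True
```

In practice the sample string would be replaced by the captured output of the mdb command, and the 20% limit is only an illustrative threshold.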

We can also look at it a second way, breaking down the kernel memory areas with kmastat:

::kmastat

cache                        buf    buf    buf    memory     alloc alloc 
name                        size in use  total    in use   succeed  fail 
------------------------- ------ ------ ------ --------- --------- ----- 
kmem_magazine_1               16   3371   3556     57344      3371     0 
kmem_magazine_3               32  16055  16256    524288     16055     0 
kmem_magazine_7               64  29166  29210   1884160     29166     0 
kmem_magazine_15             128   6711   6741    876544      6711     0 
...
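
With hundreds of caches in the real output, it helps to rank them by the "memory in use" column. This is a rough sketch, assuming each data row has exactly seven whitespace-separated fields with plain byte counts, as in the excerpt above (some releases may print size suffixes, which this does not handle). parse_kmastat and top_caches are hypothetical helper names.

```python
# The four rows quoted in this post, pasted in as sample input.
SAMPLE = """\
kmem_magazine_1               16   3371   3556     57344      3371     0
kmem_magazine_3               32  16055  16256    524288     16055     0
kmem_magazine_7               64  29166  29210   1884160     29166     0
kmem_magazine_15             128   6711   6741    876544      6711     0
"""


def parse_kmastat(text):
    """Return a list of (cache_name, memory_in_use_bytes) tuples."""
    caches = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows: a name followed by six numeric columns.
        if len(fields) == 7 and all(f.isdigit() for f in fields[1:]):
            caches.append((fields[0], int(fields[4])))
    return caches


def top_caches(caches, n=3):
    """Return the n caches holding the most memory, largest first."""
    return sorted(caches, key=lambda c: c[1], reverse=True)[:n]


caches = parse_kmastat(SAMPLE)
total_bytes = sum(mem for _, mem in caches)
print(total_bytes)               # 3342336
print(top_caches(caches, 1)[0])  # ('kmem_magazine_7', 1884160)
```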

Oracle ZFS ARC Cache

Next, Oracle ZFS has a very smart cache layer, also referred to as ARC (Adaptive Replacement Cache). Both a blessing and a curse, ARC consumes as much memory as is available, but is supposed to free up memory for other applications when they need it. This memory is used to compensate for slow disk I/O. When inspecting our environment, a significant amount of memory was being over-allocated to ARC. This may be due to the newness of Oracle 12.2, but in a cluster, memory starvation can be a common cause of node eviction.

We can inspect the size stats for the ARC in the following file:

view /proc/spl/kstat/zfs/arcstats

That path is where ZFS on Linux exposes the ARC statistics, so your actual arcstats may reside in a different location than shown above. On Solaris, the same counters can be read with kstat -p zfs:0:arcstats. Either way, review the following values:

c is the target size of the ARC in bytes
c_max is the maximum size of the ARC in bytes
size is the current size of the ARC in bytes
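
The relationship between these three values is easier to see with numbers. Here is a minimal sketch, assuming the three-column name/type/data layout used by the arcstats file at the path above. The sample figures are invented for illustration and are not from our environment, and parse_arcstats and arc_summary are hypothetical helper names.

```python
GIB = 1024 ** 3

# Invented sample values, for illustration only (not measured data).
SAMPLE = f"""\
name                            type data
c                               4    {4 * GIB}
c_max                           4    {6 * GIB}
size                            4    {3 * GIB}
"""


def parse_arcstats(text):
    """Return {stat_name: value} for every 'name type data' row."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # The header row has a non-numeric third column and is skipped.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_summary(stats, total_ram_bytes):
    """Relate the current ARC size to its target, its maximum and total RAM."""
    return {
        "size_pct_of_ram": 100 * stats["size"] / total_ram_bytes,
        "size_pct_of_c_max": 100 * stats["size"] / stats["c_max"],
        "at_or_above_target": stats["size"] >= stats["c"],
    }


stats = parse_arcstats(SAMPLE)
summary = arc_summary(stats, total_ram_bytes=8 * GIB)
print(summary)
```

A size that sits at c_max, with c_max close to total RAM, is the pattern to watch for on a memory-starved node.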

Ours was consuming 100% of whatever memory was left, as we’ll discuss later in this post.

Oracle Clusterware Memory

Oracle Clusterware was the third area investigated for frivolous memory usage that could be trimmed down. There are clearly documented steps from Oracle for investigating misconfigurations and feature issues that can assist in identifying many of these.

So, post upgrade and patching, what can you do to trim down memory usage to avoid memory upgrades to support the cluster upgrade?

Changes

From the list of features and installations that weren’t offering a benefit to a development/test environment, here is what made the cut and why:

Updates were made to the /etc/system file (requires a reboot and must be performed as root):

Added set user_reserve_hint_pct=80
- This change was made to limit how much memory ZFS can use for the ARC cache. There was a significant issue for the customer when CRS processes weren’t able to allocate memory. 80% was the highest value this could be set to without experiencing a node reboot, something we all prefer not to happen.
Stopped the Cluster Health Monitor (CHM) process. This is a new background process in 12c Clusterware that collects workload data, which is significantly more valuable in a production environment, but in development and test? It can easily be a substantial drain on CPU and memory that could be better put to use for more virtual databases.
To do this, the following commands were run as the root user:

$ crsctl stop res ora.crf -init

$ crsctl delete res ora.crf -init

Removed the Trace File Analyzer Collector (tfactl). This background process collects the many trace files Oracle generates into a single location. Handy for troubleshooting, but it’s Java-based, has a significant memory footprint and is subject to Java heap issues.
It was uninstalled with the following command, run as the $ORACLE_HOME owner on each node of the cluster:

$ tfactl uninstall

Engineering stopped and disabled the Cluster Verification Utility (CVU). In previous versions this was a utility that could be manually added to the installation or run afterward by an admin to troubleshoot issues. This is another feature that simply eats up resources that could be reallocated to dev and test environments, so it was time to stop and disable it with the following:

$ srvctl stop cvu
$ srvctl disable cvu

Additional Changes

Reduced memory allocation for the ASM instance.
- The ASM instance in 12.2 now uses 1GB of memory, where it previously used 256MB. That’s a huge change that can impact other features dependent on that memory.
- Upon research, it was found that 750MB was adequate, so if more memory needs to be reclaimed, consider lowering the ASM memory on each node to 750MB.
To make this set of instance-level parameter changes, run the following on any one of the nodes, then restart each node in turn until the whole cluster has been cycled and the change is in effect:

$ export ORACLE_HOME=<Grid Home>

$ export ORACLE_SID=<Local ASM SID>

$ sqlplus / as sysasm
alter system set "_asm_allow_small_memory_target"=true scope=spfile;
alter system set memory_target=750m scope=spfile;
alter system set memory_max_target=750m scope=spfile;

High CPU usage features can be troubling for most DBAs, but when they show up on development and test databases that are often granted fewer resources than production to begin with, a change can often enhance the stability and longevity of these environments.

Disabled high-res time ticks in all databases, including ASM DBs, regular DBs, and the Grid Infrastructure Management Repository DB (GIMR, SID is -MGMTDB). High-res ticks are a new feature in 12c, and they seem to cause a lot of CPU usage from cluster time-keeping background processes like VKTM. Here’s the SQL to disable high-res ticks (must be run once in each DB):

alter system set "_disable_highres_ticks"=TRUE scope=spfile;

After all these changes, the team found the Solaris kernel was still consuming more memory than before the upgrade, but the usage was more justifiable:

Solaris Kernel: 1GB of RAM
ARC Cache: between 1-2GB
Oracle Clusterware: 3GB

Memory Upgrade

We did add memory, but not as much as expected.

After all the adjustments, we were still using over 5GB of memory for these three areas, so we upped each node from 8GB to 16GB to ensure enough resources to support all dev and test demands after the upgrade. We wanted to provision as many virtual databases (VDBs) as the development or test groups needed, so having more than 3GB free for databases was going to be required!
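
The arithmetic behind that decision can be written out as a quick sketch. It uses only the three figures listed above (kernel 1GB, ARC 1 to 2GB, Clusterware 3GB) and treats them as a simple sum, which is an assumption of this example; the function names are hypothetical.

```python
# Figures from the list above, in whole gigabytes.
KERNEL_GB = 1
ARC_GB_RANGE = (1, 2)  # ARC floated between 1GB and 2GB
CLUSTERWARE_GB = 3


def overhead_range_gb():
    """Lowest and highest combined footprint of the three areas."""
    low = KERNEL_GB + ARC_GB_RANGE[0] + CLUSTERWARE_GB
    high = KERNEL_GB + ARC_GB_RANGE[1] + CLUSTERWARE_GB
    return low, high


def free_range_gb(node_ram_gb):
    """Memory left for databases on a node: (worst case, best case)."""
    low, high = overhead_range_gb()
    return node_ram_gb - high, node_ram_gb - low


print(overhead_range_gb())  # (5, 6)
print(free_range_gb(8))     # (2, 3)
print(free_range_gb(16))    # (10, 11)
```

At 8GB per node, at most 3GB is left over for databases, which falls short of the "more than 3GB free" goal. Doubling to 16GB leaves 10 to 11GB.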

The Solaris cluster, at this time, has experienced no more kernel panics, node evictions or unexpected reboots, which we need to admit is the most important outcome. It’s more difficult to explain an outage to users than to explain to Oracle why we shut down and uninstalled unused features…. 🙂