Making Technology Bulletproof

OracleOracle Basics

Oracle RAC Monitoring: V$, GV$ and the Rest

Furthering on our Oracle Real Application Clusters (RAC) knowledge, we’re going to go deeper into what we watch for a RAC database that may be different than a single instance.  RAC is built for scale and instance resilience, distributing workloads across multiple nodes.  At the same time, what gives it strength introduces monitoring complexity, especially when you’re not just watching a single instance but multiple, interconnected ones. To manage performance effectively in RAC, you need to understand the difference between V$ and GV$ views, what they show you, and how to interpret cluster-level wait events.  Along with performance, the overall health of the RAC cluster and interconnect must be known, too.

Let’s break it down.

Local vs. Global Perspectives

Using the V$ and GV$ views, along with the services and utilities provided with Oracle Grid Control clusterware, we can provide significant information around the health and performance of a RAC database.  Understanding when to use one set of views over the other is important.

V$ Views (i.e., V$SESSION, V$SQL, V$SYSTEM_EVENT)

  • Show instance-level data (local to the connected node).
  • Useful when you’re logged into a specific RAC node and want to see what’s happening there.
  • Faster to query for pinpoint analysis.

GV$ Views (i.e., GV$SESSION, GV$SQL, GV$SYSTEM_EVENT)

  • Show global data across all RAC instances.
  • Include a INST_ID column to identify the instance that collected the row.
  • Essential for holistic RAC performance and correlation between nodes.

Clusterware Service and Utilities

  • OSWatcher for network latency
  • Cluster Health Monitor (CHM / oclumon)
  • Cluster Ready Services (CRSD / CSSD)

Rule of Thumb:

  • Use V$ when you’re diagnosing local issues.
  • Use GV$ when understanding cross-instance contention or load balancing.

Important Wait Classes in Oracle RAC

In RAC, you’ll want to focus on the Cluster wait class and associated events, in addition to usual suspects like I/O or concurrency.  We enter into the waits at the class level, (category) and then dig into the wait event to troubleshoot the detailed wait issue.  The below list is not exhaustive, but just common ones I’ve had to work with when managing Oracle RAC environments.

Key Wait Classes

  • Cluster
    Related to interconnect traffic and block transfer delays.
    • gc cr request
    • gc current block busy
    • gc buffer busy acquire
    • gc buffer busy release
  • Concurrency
    Related to latching, locks, and row-level contention.
    • buffer busy waits
    • enq: TX – row lock contention
  • User I/O
    Physical read/write bottlenecks.
    • db file sequential read
    • db file scattered read
  • CPU + Scheduler
    Waits from overloaded CPUs.
    • resmgr:cpu quantum
    • os thread startup

What to Monitor in RAC

There’s significant complexity in a RAC environment that needs to be monitored to ensure health and performance.  Knowing the above wait classes and subsequent wait events is important, as are the following queries which can be used to gather vital information about a RAC environment.

  1. Instance Health and Load Distribution

This query shows how load is distributed across RAC nodes. Imbalances might point to connection issues or misconfigured services.  I’ve had considerable experiences where load balancing was replaced with specific instance targeted services.  Although not my favorite design for a RAC environment, this is an option and an awareness of the different configurations for RAC should be noted.

SELECT INST_ID, COUNT(*) AS active_sessions
FROM GV$SESSION
WHERE STATUS = 'ACTIVE'
GROUP BY INST_ID;
  1. Global Wait Events and Bottlenecks

The below query helps identify which nodes are waiting and on what. You’ll want to focus on interconnect-related events in RAC as it can demonstrate communication latency between nodes.

SELECT INST_ID, EVENT, TOTAL_WAITS, TIME_WAITED
FROM GV$SYSTEM_EVENT
WHERE EVENT NOT LIKE 'rdbms ipc message'
AND TIME_WAITED > 0
ORDER BY TIME_WAITED DESC;
  1. Global Cache Statistics

These stats reveal how much inter-node communication is happening—i.e., how often one instance is asking another for data blocks.  GC waits are one of the common wait classes in a RAC database.  There an incredible amount of communication between each node to keep each comprised of the status of another.

SELECT INST_ID, NAME, VALUE
FROM GV$SYSSTAT
WHERE NAME IN ('gc cr blocks received', 'gc buffer busy', 'gc current blocks received');

SQL Examples to Analyze RAC Performance

We scale out with a RAC solution when vertical scaling isn’t an option or may not be enough.  RAC architecture offers numerous nodes to process the workload vs. a single host, but how do you check the performance and ensure from a global view there isn’t a performance bottleneck?

Identify High GC Waits (Block Contention Between Nodes)

SELECT EVENT, INST_ID, COUNT(*) AS waits, SUM(SECONDS_IN_WAIT) AS total_seconds
FROM GV$SESSION
WHERE EVENT LIKE 'gc%'
GROUP BY EVENT, INST_ID
ORDER BY total_seconds DESC;

Check Service Load Distribution

SELECT SERVICE_NAME, INST_ID, COUNT(*) AS session_count
FROM GV$SESSION
WHERE USERNAME IS NOT NULL
GROUP BY SERVICE_NAME, INST_ID;

The above query shows how Oracle services are distributing sessions. Uneven service allocation can result in overloading one instance.

Track SQL Using Most Cluster Waits

What if you have a query in parallel that’s running across multiple nodes?  Can you identify it at the global level when it’s a performance issue?  The following query helps trace problematic SQL statements causing RAC-wide contention.

SELECT ss.INST_ID, se.EVENT, sq.SQL_ID, sq.PARSING_SCHEMA_NAME, sq.SQL_TEXT
FROM GV$SESSION ss
JOIN GV$SESSION_WAIT se ON ss.SID = se.SID AND ss.INST_ID = se.INST_ID
JOIN GV$SQL sq ON ss.SQL_ID = sq.SQL_ID AND ss.INST_ID = sq.INST_ID
WHERE se.EVENT LIKE 'gc%'
AND ROWNUM <= 10;

Best Practices for RAC Monitoring

Monitoring a RAC database isn’t that difficult, although it’s essential that you monitor at both the global database level and the instance.  There are times where an issue will only be at a node level (instance) and globally it simply doesn’t show as impactful.  The most important thing to remember is to not overthink it and use traditional troubleshooting and critical thinking skills to work through the issue.  Focus on what has changes, what is different and what consumes time – everything in Oracle is about DB Time, so if you focus on what is waiting, because something else is consuming resources, using up the resource time that another process needs, then you can identify and address.  If you are assessing health, it’s all about availability and accessibility.  Alert on what matters, present what gives value and the rest will come together.

  • Always correlate wait events with load (GV$ACTIVE_SESSION_HISTORY is useful for time-based queries).
  • Set up baselines for gc wait times, session distribution, and block transfers – know your workload!
  • Use Monitoring Tools for visualization and alerting.
  • Monitor interconnect health—even fast databases can stall on slow networks.
  • RAC uses shared storage, so if storage isn’t healthy, another node in RAC can’t take over. Fast IO is essential to support a RAC workload.

Monitoring Oracle RAC isn’t just about watching one node—it’s about watching the whole orchestra and then digging down to the details as required. The difference between V$ and GV$ is the difference between hearing a single instrument and hearing the symphony. RAC brings complexity, but it also brings power, if you know where to look.  Understanding how RAC nodes interact, where waits accumulate, and how SQL flows between instances is key to optimizing your environment. With the right queries and alerting in place, you can ensure that your cluster not only stays healthy, but performs well. 

Kellyn

http://about.me/dbakevlar