I keep saying I’m going to start sharing what I’m doing in the Analytics space soon, but heck, there’s too much I need to keep adding to on the Oracle in Azure arena!
So, as most people know, I’m not a big fan of Oracle RAC, (Real Application Cluster). My opinion was that it was often sold for use cases that it doesn’t serve, (such as HA) and the resource demands between the nodes, as well as what happens when a node is evicted to those that are left are not in the best interest for most use cases. On the other hand, I LOVE Oracle Data Guard, active or standard, don’t matter, the product is great and it’s an awesome option for those migrating their Oracle databases to Azure VMs.
What is Oracle Data Guard
Oracle Data Guard is a cross between what Always on Availability Groups and SQL Server log shipping. It “ships” the redo logs from the primary database to a standby database, (or active database that can be used for reporting, hence the similarities to AG) and with a sync process. They is a complex configuration, but surprisingly simple failover process for the business to switch to one of the standbys if there is a failure on the primary.
The product has received excellent support from Oracle and the community and it’s something I highly recommend for those businesses now migrating databases over to Azure, but want something similar to what they had on-premises. Unlike RAC, I can put a standby, (secondary replica for you Microsoft folks) in a second location, then use a Far Sync Instance as a “pass through” from the primary to the standby vs. using the built in Sync direct to the standby database.
Why Use a Far Sync Instance
Oracle has done some great research on the benefit of a Far Sync Instance and the truth is, it improves the performance of the Data Guard sync on a VM by over 12% vs. using a standard Sync between a primary and a standby in Data Guard.
Twelve may not seem like a high number, but when you’re looking to decrease latency any way you can and working between two different location zones in the cloud, this can be significant improvement.
A Far Sync Instance, is just a small allocation of memory for the Oracle background processes, a matching number of redo logs, (+1 for threads used by the Fast Sync) and no actual datafiles. The primary then “ships” the redo logs to the Fast Sync Instance and then the Fast Sync can easily focus on its one task- send the redo logs wherever they need to go, no matter where that standby is located.
All of this is build out on Azure Virtual Machines, (VMs) with the appropriate version of Oracle installed on each VM, but of course, the Fast Sync instance can be placed on a very low VM sizing tier, as it doesn’t require many resources or it can be on a shared server if the existing product sharing the VM isn’t very “chatty”. The Oracle SGA for the Fast Sync can be as small as 300Mb and the CPU_COUNT=1. Remember, this is pushing redo logs to and from it all day, so it’s going to be busy on that front.
The above diagram demonstrates the best practices of both Oracle, as well as Azure for an Oracle Database, using Oracle Data Guard, but having the standby in a secondary location. By using the Far Sync instance, we’re able to ensure that the standby commit of the redo logs is timely enough for the primary, (as this is part of the configuration for the Oracle Data Guard.)
In this configuration, you’ll notice that to support Oracle Data Guard in Azure, we’ve included Express Route to ensure solid performance to the users, since we all know, the network is the last bottleneck.
There are a few best practices to consider when building out this solution and this includes:
- Use Async compression to decrease latency on the Far Sync transfers.
- Configure redundant network links on the customer side of the Express Route to tolerate outages.
- Choose IOPS for the standby VM that is faster than that of the redo on the primary VM to keep up with the sync, (something you might not consider for a standby server, but it’s necessary for this configuration)
- The Far Sync instance should have the same number of redo logs as the primary, +1 for every thread
- Instead of using RMAN, use Azure Site Recovery to take snapshots of each of the VMs, including the Far Sync VM, eliminating extra stress on the database tier.
If you’ve wondered how you would design an Oracle Data Guard environment, either standard or active in Azure, this is how I recommend my customers to do it.