Troubleshooting RAC GoldenGate relocation
During Oracle RAC online patching, the GoldenGate resource must be relocated to a surviving node to maintain availability. When this relocation fails, several logs become critical for identifying the root cause.
In this scenario, Oracle Grid Infrastructure Standalone Agents for Oracle Clusterware (XAG) was not installed as a standalone product; the cluster used the version bundled in $GRID_HOME.
Identifying the Environment and GoldenGate Resources
Before troubleshooting the failure, you must verify the versions of the tools in use and identify the exact name of the GoldenGate resource managed by the cluster.
Verifying Software Versions
You can determine the XAG and srvctl versions using the following commands:
[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/agctl query releaseversion
The Oracle Grid Infrastructure Agents release version is 3.1.0
[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/srvctl -V
srvctl version: 12.1.0.2.0
Locating the GoldenGate Instance
To find the specific instance name (in this case, gg_xx) and its current status across the RAC nodes, use crsctl:
[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/crsctl stat res -t | grep -A2 xag
xag.gg_xx-vip.vip
      1        ONLINE  ONLINE       racnode-dc1-2            STABLE
xag.gg_xx.goldengate
      1        ONLINE  ONLINE       racnode-dc1-2            STABLE
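When the resource name and hosting node are needed inside a script, the tabular output can be reduced to one line per resource. The following is a minimal sketch, assuming the usual `crsctl stat res -t` layout in which each resource name sits on its own line and is followed by an indented state line. The function name `gg_resource_nodes` is hypothetical and not part of any Oracle tool.

```shell
# Hypothetical helper: read `crsctl stat res -t` output on stdin and print
# "<resource> <state> <node>" for every xag.* resource.
# Assumption: a state line looks like "1 ONLINE ONLINE <server> STABLE";
# when the server column is empty (fewer than 5 fields) "-" is printed.
gg_resource_nodes() {
  awk '
    $1 ~ /^xag\./ { res = $1; next }      # resource name line
    res != "" && $1 ~ /^[0-9]+$/ {        # state line: id TARGET STATE SERVER DETAILS
      print res, $3, (NF >= 5 ? $4 : "-")
      res = ""
    }
  '
}

# Usage (assumes GRID_HOME is set in the environment):
#   "$GRID_HOME/bin/crsctl" stat res -t | gg_resource_nodes
```

The parsing is kept separate from the `crsctl` call so it can be checked against saved output before being run on a live cluster.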
Analyzing the Relocation Failure
The relocation process can be initiated from any node. However, if the start of the GoldenGate process fails on the target node, Oracle Clusterware will attempt to "clean" the resource and restore it to its original location.
The Relocation Error Stack
Below is the output of a failed relocation attempt from racnode-dc1-2 to racnode-dc1-1. Note the transition from a successful VIP start to a failed GoldenGate start:
[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/agctl relocate goldengate gg_xx --node racnode-dc1-1
CRS-2673: Attempting to stop 'xag.gg_xx.goldengate' on 'racnode-dc1-2'
CRS-2677: Stop of 'xag.gg_xx.goldengate' on 'racnode-dc1-2' succeeded
CRS-2673: Attempting to stop 'xag.gg_xx-vip.vip' on 'racnode-dc1-2'
CRS-2677: Stop of 'xag.gg_xx-vip.vip' on 'racnode-dc1-2' succeeded
CRS-2672: Attempting to start 'xag.gg_xx-vip.vip' on 'racnode-dc1-1'
CRS-2676: Start of 'xag.gg_xx-vip.vip' on 'racnode-dc1-1' succeeded
CRS-2672: Attempting to start 'xag.gg_xx.goldengate' on 'racnode-dc1-1'
CRS-2674: Start of 'xag.gg_xx.goldengate' on 'racnode-dc1-1' failed
CRS-2679: Attempting to clean 'xag.gg_xx.goldengate' on 'racnode-dc1-1'
CRS-2681: Clean of 'xag.gg_xx.goldengate' on 'racnode-dc1-1' succeeded
CRS-2564: Failed to relocate resource 'xag.gg_xx.goldengate'. Will attempt to restore it on 'racnode-dc1-2' now.
...
CRS-4000: Command Relocate failed, or completed with errors.
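In a long relocation transcript, the line that matters most is the CRS-2674 message, because it names the resource that failed to start and the node on which the start was attempted. A small sketch for pulling that pair out of saved output follows. The function name `failed_starts` is hypothetical, and the pattern is based only on the message wording shown above.

```shell
# Hypothetical helper: read relocation output on stdin and print
# "<resource> <node>" for each CRS-2674 (start failed) message.
failed_starts() {
  sed -n "s/^CRS-2674: Start of '\([^']*\)' on '\([^']*\)' failed.*/\1 \2/p"
}

# Usage (assumes the agctl output was saved to a file named relocate.out):
#   failed_starts < relocate.out
```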
After the failure, a status check confirms the resource has rolled back to the original node:
[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/agctl status goldengate gg_xx
Goldengate instance 'gg_xx' is running on racnode-dc1-2
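The same check can be scripted to confirm where the instance ended up after a relocation attempt. This is a sketch that assumes the status message has exactly the "is running on" wording shown above; other states are not handled and produce no output. The name `gg_running_node` is hypothetical.

```shell
# Hypothetical helper: read `agctl status goldengate <name>` output on stdin
# and print the node name from the "is running on <node>" message.
# Prints nothing if the message does not match that wording.
gg_running_node() {
  sed -n "s/^Goldengate instance '[^']*' is running on \([^ ]*\).*/\1/p"
}

# Usage (assumes GRID_HOME is set and gg_xx is the instance name):
#   "$GRID_HOME/bin/agctl" status goldengate gg_xx | gg_running_node
```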
Strategic Troubleshooting: Which Logs to Check?
When a relocation fails, efficiency is key. Checking the right logs in the correct order can significantly reduce downtime. Here are the logs to investigate, ranked by personal preference:
1. GoldenGate Error Log
This is the most direct source of information on why the GoldenGate processes (Manager, Extract, or Replicat) failed to initialize.
- Location:
$GG_HOME/ggserr.log
2. XAG Agent Trace File
Since GoldenGate is managed by the XAG agent, this trace file captures the interaction between the clusterware and the GoldenGate scripts.
- Location:
$ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace/crsd_scriptagent_ggsuser.trc
- (Note: Adjust the username ggsuser to match your environment's GoldenGate owner.)
3. Clusterware Alert Log
Use this to see the broader cluster perspective and any high-level resource dependency failures.
- Location:
$ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace/alert.log
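The three locations above can be reviewed in one pass by tailing each file in the listed order. The sketch below does this with a hypothetical function, `show_logs`. It assumes GG_HOME and ORACLE_BASE are set and that ggsuser is the GoldenGate owner, so adjust these to your environment.

```shell
# Hypothetical helper: print the last N lines of each readable log, in the
# order given, and flag any file that is missing or unreadable.
show_logs() {
  lines=$1
  shift
  for f in "$@"; do
    if [ -r "$f" ]; then
      printf '===== %s (last %s lines) =====\n' "$f" "$lines"
      tail -n "$lines" "$f"
    else
      printf '===== %s: not found or not readable =====\n' "$f"
    fi
  done
}

# Usage, following the order above (GG_HOME, ORACLE_BASE and the ggsuser
# owner name are assumptions about the environment):
#   trace_dir="$ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace"
#   show_logs 50 "$GG_HOME/ggserr.log" \
#     "$trace_dir/crsd_scriptagent_ggsuser.trc" \
#     "$trace_dir/alert.log"
```

Reporting missing files explicitly, rather than skipping them, makes it obvious when a path or owner name does not match the node being checked.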
In summary, GoldenGate relocation issues in a RAC environment often involve multiple layers of the stack. By verifying your resource names and starting your search with the ggserr.log and XAG agent traces, you can make your troubleshooting process far more efficient.