
Troubleshooting RAC GoldenGate relocation

Written by Michael Dinh | Aug 17, 2018 4:00:00 AM

During Oracle RAC online patching, the GoldenGate resource must be relocated to a surviving node to maintain availability. When this relocation fails, several logs become critical for identifying the root cause.

In this scenario, Oracle Grid Infrastructure Standalone Agents for Oracle Clusterware (XAG) was not installed as a standalone product; the version bundled in $GRID_HOME was used instead.

Identifying the Environment and GoldenGate Resources

Before troubleshooting the failure, you must verify the versions of the tools in use and identify the exact name of the GoldenGate resource managed by the cluster.

Verifying Software Versions

You can determine the XAG and srvctl versions using the following commands:

[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/agctl query releaseversion
The Oracle Grid Infrastructure Agents release version is 3.1.0

[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/srvctl -V
srvctl version: 12.1.0.2.0

Locating the GoldenGate Instance

To find the specific instance name (in this case, gg_xx) and its current status across the RAC nodes, use crsctl:

[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/crsctl stat res -t | grep -A2 xag
xag.gg_xx-vip.vip
      1        ONLINE  ONLINE       racnode-dc1-2            STABLE
xag.gg_xx.goldengate
      1        ONLINE  ONLINE       racnode-dc1-2            STABLE
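If you only need the hosting node, for example to decide where to relocate, the lookup can be scripted. The following is a minimal sketch, not part of the original procedure: resource_node is a hypothetical helper name, and it assumes the tabular layout puts the resource name on one line and the status on the next, with the server name as the fourth field.

```shell
# Hypothetical helper: print the node hosting a resource.
# Reads `crsctl stat res -t` output on stdin and assumes a name line
# followed by a status line whose 4th field is the server name.
resource_node() {
  awk -v res="$1" '
    $1 == res { found = 1; next }   # line holding the resource name
    found     { print $4; exit }    # first status line after it
  '
}

# Example use on a cluster node (not executed here):
#   $GRID_HOME/bin/crsctl stat res -t | resource_node xag.gg_xx.goldengate
```

If the resource name is not found, the function prints nothing, so an empty result should be treated as "unknown" rather than "offline".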

Analyzing the Relocation Failure

The relocation process can be initiated from any node. However, if the start of the GoldenGate process fails on the target node, Oracle Clusterware will attempt to "clean" the resource and restore it to its original location.

The Relocation Error Stack

Below is the output of a failed relocation attempt from racnode-dc1-2 to racnode-dc1-1. Note the transition from a successful VIP start to a failed GoldenGate start:

[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/agctl relocate goldengate gg_xx --node racnode-dc1-1
CRS-2673: Attempting to stop 'xag.gg_xx.goldengate' on 'racnode-dc1-2'
CRS-2677: Stop of 'xag.gg_xx.goldengate' on 'racnode-dc1-2' succeeded
CRS-2673: Attempting to stop 'xag.gg_xx-vip.vip' on 'racnode-dc1-2'
CRS-2677: Stop of 'xag.gg_xx-vip.vip' on 'racnode-dc1-2' succeeded
CRS-2672: Attempting to start 'xag.gg_xx-vip.vip' on 'racnode-dc1-1'
CRS-2676: Start of 'xag.gg_xx-vip.vip' on 'racnode-dc1-1' succeeded
CRS-2672: Attempting to start 'xag.gg_xx.goldengate' on 'racnode-dc1-1'
CRS-2674: Start of 'xag.gg_xx.goldengate' on 'racnode-dc1-1' failed
CRS-2679: Attempting to clean 'xag.gg_xx.goldengate' on 'racnode-dc1-1'
CRS-2681: Clean of 'xag.gg_xx.goldengate' on 'racnode-dc1-1' succeeded
CRS-2564: Failed to relocate resource 'xag.gg_xx.goldengate'. Will attempt to restore it on 'racnode-dc1-2' now.
...
CRS-4000: Command Relocate failed, or completed with errors.
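When the relocation output is long, it can help to pull out only the failing steps. The sketch below is an illustration rather than part of the original write-up: crs_failures is a hypothetical helper, relocate.out is an assumed file name, and the pattern simply matches CRS messages containing "failed" or "Failed", as seen in the output above.

```shell
# Hypothetical helper: keep only CRS messages that report a failure.
# Reads saved relocation output on stdin.
crs_failures() {
  grep -E '^CRS-[0-9]+: .*[Ff]ailed'
}

# Example use (not executed here); relocate.out is an assumed file name:
#   $GRID_HOME/bin/agctl relocate goldengate gg_xx --node racnode-dc1-1 2>&1 | tee relocate.out
#   crs_failures < relocate.out
```

The first failing line, here the CRS-2674 start failure on the target node, tells you which resource and node to focus on in the logs.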

After the failure, a status check confirms the resource has rolled back to the original node:

[oracle@racnode-dc1-1 ~]$ $GRID_HOME/bin/agctl status goldengate gg_xx
Goldengate instance 'gg_xx' is running on racnode-dc1-2

Strategic Troubleshooting: Which Logs to Check?

When a relocation fails, efficiency is key. Checking the right logs in the correct order can significantly reduce downtime. Here are the logs to investigate, ranked by personal preference:

1. GoldenGate Error Log

This is the most direct source of information on why the GoldenGate processes (Manager, Extract, or Replicat) failed to initialize.

  • Location: $GG_HOME/ggserr.log

2. XAG Agent Trace File

Since GoldenGate is managed by the XAG agent, this trace file captures the interaction between the clusterware and the GoldenGate scripts.

  • Location: $ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace/crsd_scriptagent_ggsuser.trc
  • (Note: Adjust the username ggsuser to match your environment's GoldenGate owner.)

3. Clusterware Alert Log

Use this to see the broader cluster perspective and any high-level resource dependency failures.

  • Location: $ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace/alert.log
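To walk the three logs in that order on the target node, something like the following can be used. This is a rough sketch under stated assumptions: tail_logs is a hypothetical helper, $GG_HOME and $ORACLE_BASE are assumed to be set in your shell, and the line count of 50 is arbitrary.

```shell
# Hypothetical helper: show the last N lines of each log, in the order given.
# Files that are missing or unreadable are reported and skipped.
tail_logs() {
  lines="$1"; shift
  for log in "$@"; do
    if [ -r "$log" ]; then
      printf '==> %s <==\n' "$log"
      tail -n "$lines" "$log"
    else
      printf 'skipped (not readable): %s\n' "$log"
    fi
  done
}

# Example use (not executed here), in the priority order listed above;
# replace ggsuser with your GoldenGate owner:
#   trace_dir="$ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace"
#   tail_logs 50 "$GG_HOME/ggserr.log" \
#                "$trace_dir/crsd_scriptagent_ggsuser.trc" \
#                "$trace_dir/alert.log"
```

Run it on the node where the start failed (racnode-dc1-1 in this example), since the trace and alert log paths are specific to each host.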

In summary, GoldenGate relocation issues in a RAC environment often involve multiple layers of the stack. By verifying your resource names and starting your search with the ggserr.log and XAG agent traces, you can make your troubleshooting process far more efficient.
