How to troubleshoot a failure to mount DBFS

Written by Michael Dinh | Apr 24, 2019 4:00:00 AM

When attempting to shut down Oracle Clusterware, you might encounter the frustrating CRS-2675: Stop of 'ora.crsd' on 'host02' failed error. This usually indicates that a managed resource is refusing to stop, preventing the entire stack from shutting down.

I will demonstrate my attempts to troubleshoot and identify the cause for this failure, specifically focusing on a problematic dbfs_mount resource.

1. The Initial Failure: Shutdown Command Errors

During a routine maintenance window, the command to stop Cluster Ready Services (CRS) failed repeatedly on host02.

# crsctl stop crs
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2799: Failed to shut down resource 'dbfs_mount' on 'host02'
CRS-2799: Failed to shut down resource 'ora.GG_PROD.dg' on 'host02'
CRS-2799: Failed to shut down resource 'ora.asm' on 'host02'
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'host02' has failed
CRS-2675: Stop of 'ora.crsd' on 'host02' failed
CRS-4687: Shutdown command has completed with errors.

The command output shows that dbfs_mount is the first resource that fails to stop; the failures that follow build on it. In Oracle environments, DBFS (Database File System) is often used for storing GoldenGate trails or shared binaries, making it critical but occasionally stubborn during unmounts.
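When the stop output is long, it can help to pull out only the resource names reported in the CRS-2799 lines. The following is a minimal sketch rather than part of the original session; the function name failed_resources is hypothetical, and it assumes the message format shown in the output above.

```shell
# Hypothetical helper: read `crsctl stop crs` output on stdin and print
# the name of each resource reported in a CRS-2799 message.
# Assumed format: CRS-2799: Failed to shut down resource 'NAME' on 'HOST'
failed_resources() {
  sed -n "s/.*CRS-2799: Failed to shut down resource '\([^']*\)' on '.*/\1/p"
}
```

If the output was saved to a file (stop_crs.log is a placeholder name), it could be read with: failed_resources < stop_crs.log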

2. Analyzing the Resource Configuration

To understand how CRS manages this mount, we can check the resource properties using crsctl.

DBFS Mount Definition

The resource is defined as a local_resource with a specific action script responsible for mounting and unmounting the filesystem.

$ $GRID_HOME/bin/crsctl stat res -w "TYPE = local_resource" -p
NAME=dbfs_mount
TYPE=local_resource
ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh
STOP_DEPENDENCIES=hard(ora.dbfs.db)
CLEAN_TIMEOUT=60

Because dbfs_mount has a hard stop dependency on the database (ora.dbfs.db), the database cannot shut down while the filesystem is still mounted. That in turn prevents the disk groups and ASM from stopping.
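To check a single attribute without scanning the whole profile, the NAME=VALUE output can be filtered. This is a minimal sketch; res_attr is a hypothetical helper name, and it assumes one attribute per line as in the listing above.

```shell
# Hypothetical helper: read NAME=VALUE lines (the format printed by
# `crsctl stat res ... -p`) on stdin and print the value of one attribute.
res_attr() {
  # $1 is the attribute name, e.g. STOP_DEPENDENCIES.
  # Strip everything up to the first "=" so values containing "=" survive.
  awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print }'
}
```

For example: $GRID_HOME/bin/crsctl stat res dbfs_mount -p | res_attr STOP_DEPENDENCIES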

3. Investigating System and Trace Logs

When a script-based resource fails, the first places to look are the OS system logs and the Oracle Grid Infrastructure agent trace files.

System Logs (/var/log/messages)

The system log provides a high-level view of the mount-dbfs.sh script execution:

Apr 17 19:42:26 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata'
Apr 17 19:42:26 host02 DBFS_/ggdata: Stop - stopped, but still mounted, error
Apr 17 21:01:36 host02 dbfs_client[71957]: OCI_ERROR 3114 - ORA-03114: not connected to ORACLE

Script Agent Trace Logs

For deeper detail, we examine the crsd_scriptagent_oracle.trc file. This log records the exact reason why fusermount failed:

2019-04-17 20:56:43.365201 :[dbfs_mount] [stop] unmounting DBFS from /ggdata
2019-04-17 20:56:43.415516 :[dbfs_mount] [stop] umounting the filesystem using '/bin/fusermount -u /ggdata'
2019-04-17 20:56:43.415541 :[dbfs_mount] [stop] /bin/fusermount: failed to unmount /ggdata: Device or resource busy
2019-04-17 20:56:43.415552 :[dbfs_mount] [stop] Stop - stopped, but still mounted, error

The error "Device or resource busy" confirms that an active process is still accessing the /ggdata directory, preventing the unmount.
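The agent trace file can be large, so filtering it down to the stop action of one resource makes the failure easier to spot. The sketch below is an illustration only; stop_trace is a hypothetical name, and it assumes trace lines carry the "[resource] [stop]" tags seen in the excerpt above.

```shell
# Hypothetical helper: print the stop-action lines for one resource from a
# script agent trace file. Assumes lines contain "[<resource>] [stop]".
stop_trace() {
  # $1 = resource name, $2 = path to the trace file
  # -F treats the pattern as a fixed string, so the brackets are literal.
  grep -F "[$1] [stop]" "$2"
}
```

Usage would look like: stop_trace dbfs_mount /path/to/crsd_scriptagent_oracle.trc, where the directory depends on the Grid Infrastructure installation.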

4. Identifying the "Busy" Process

To resolve this, we need to find the specific processes holding the mount open. The Linux fuser command is the ideal tool for this.

# fuser -mv /ggdata/
                     USER        PID ACCESS COMMAND
/ggdata:             root     kernel mount /ggdata
                     ggsuser   64776 F.... extract
                     oracle    65049 f.... oracle_65049_ih
                     ggsuser   84987 F.... extract

In this case, two GoldenGate Extract processes (PIDs 64776 and 84987) and one Oracle shadow process (PID 65049) still had files open on the DBFS mount point. In the ACCESS column, f marks an open file and F a file open for writing.
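For scripting, the PIDs can be extracted from the verbose listing. This is a minimal sketch under the assumption that the output has the column layout shown above (PID followed by ACCESS); busy_pids is a hypothetical helper name.

```shell
# Hypothetical helper: read `fuser -mv` style output on stdin and print the
# PID of every process whose ACCESS column contains f (open file) or
# F (open file for writing). The kernel mount line has no numeric PID and
# is skipped.
busy_pids() {
  awk '{
    for (i = 1; i < NF; i++) {
      # A numeric field followed by an ACCESS flag field that includes f or F
      if ($i ~ /^[0-9]+$/ && $(i + 1) ~ /^[.cefFrm]+$/ && $(i + 1) ~ /[fF]/) {
        print $i
        break
      }
    }
  }'
}
```

A possible invocation is: fuser -mv /ggdata 2>&1 | busy_pids. The 2>&1 is there because fuser sends everything except the bare PID list to stderr.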

Conclusion: Tools for Investigation

When troubleshooting CRS shutdown failures, keep these key diagnostic files and tools in your toolkit:

crsd_scriptagent_oracle.trc: The most detailed log for script-based resources.
/var/log/messages: Useful for seeing the sequence of mount/unmount attempts.
fuser -mv <mount_point>: Essential for identifying which PIDs are locking the resource.

Identifying the specific processes causing the unmount failure is half the battle. Stay tuned, as I will share other options to resolve the "failed to unmount" error in a future post.
