When attempting to shut down Oracle Clusterware, you might encounter the frustrating "CRS-2675: Stop of 'ora.crsd' on 'host02' failed" error. It usually means a managed resource is refusing to stop, which blocks the shutdown of the entire stack.
In this post, I walk through troubleshooting this failure and identifying its cause, focusing on a problematic dbfs_mount resource.
During a routine maintenance window, the command to stop Cluster Ready Services (CRS) failed repeatedly on host02.
# crsctl stop crs
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2799: Failed to shut down resource 'dbfs_mount' on 'host02'
CRS-2799: Failed to shut down resource 'ora.GG_PROD.dg' on 'host02'
CRS-2799: Failed to shut down resource 'ora.asm' on 'host02'
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'host02' has failed
CRS-2675: Stop of 'ora.crsd' on 'host02' failed
CRS-4687: Shutdown command has completed with errors.
The logs show that the dbfs_mount resource is the primary bottleneck. In Oracle environments, DBFS (Database File System) is often used for storing GoldenGate trails or shared binaries, making it critical but occasionally stubborn during unmounts.
To understand how CRS manages this mount, we can check the resource properties using crsctl.
The resource is defined as a local_resource with a specific action script responsible for mounting and unmounting the filesystem.
$ $GRID_HOME/bin/crsctl stat res -w "TYPE = local_resource" -p

NAME=dbfs_mount
TYPE=local_resource
ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh
STOP_DEPENDENCIES=hard(ora.dbfs.db)
CLEAN_TIMEOUT=60
Because dbfs_mount has a hard stop dependency on the database (ora.dbfs.db), if the mount doesn't close, the database cannot shut down, which in turn prevents the disk groups and ASM from closing.
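When a stop hangs, these are the attributes worth pulling out of the resource profile first. A minimal sketch of that extraction, with the profile text inlined from the crsctl output above so it runs standalone (the parsing helper is my own, not part of any Oracle tooling):

```shell
#!/bin/sh
# Sketch: extract the stop-relevant attributes from a saved resource profile.
# On a live cluster you would first capture the profile, e.g.:
#   $GRID_HOME/bin/crsctl stat res dbfs_mount -p > /tmp/dbfs_mount.profile
# Here a saved copy is inlined so the parsing can be demonstrated standalone.
cat > /tmp/dbfs_mount.profile <<'EOF'
NAME=dbfs_mount
TYPE=local_resource
ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh
STOP_DEPENDENCIES=hard(ora.dbfs.db)
CLEAN_TIMEOUT=60
EOF

# Keep only the attributes that matter for a hung stop:
# the script CRS runs, what must stop with it, and the clean timeout.
awk -F= '/^(ACTION_SCRIPT|STOP_DEPENDENCIES|CLEAN_TIMEOUT)=/ { print $1 " -> " $2 }' \
    /tmp/dbfs_mount.profile
```

Seeing the hard(ora.dbfs.db) dependency alongside CLEAN_TIMEOUT=60 explains both why the shutdown cascades and why CRS gives the action script only a bounded window before declaring the stop failed.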
When a script-based resource fails, the first places to look are the OS system logs and the Oracle Grid Infrastructure agent trace files.
The system log provides a high-level view of the mount-dbfs.sh script execution:
Apr 17 19:42:26 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata'
Apr 17 19:42:26 host02 DBFS_/ggdata: Stop - stopped, but still mounted, error
Apr 17 21:01:36 host02 dbfs_client[71957]: OCI_ERROR 3114 - ORA-03114: not connected to ORACLE
For deeper detail, we examine the crsd_scriptagent_oracle.trc file. This log records the exact reason why fusermount failed:
2019-04-17 20:56:43.365201 :[dbfs_mount] [stop] unmounting DBFS from /ggdata
2019-04-17 20:56:43.415516 :[dbfs_mount] [stop] umounting the filesystem using '/bin/fusermount -u /ggdata'
2019-04-17 20:56:43.415541 :[dbfs_mount] [stop] /bin/fusermount: failed to unmount /ggdata: Device or resource busy
2019-04-17 20:56:43.415552 :[dbfs_mount] [stop] Stop - stopped, but still mounted, error
The error "Device or resource busy" confirms that an active process is still accessing the /ggdata directory, preventing the unmount.
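On a busy cluster this trace file grows quickly, so it helps to filter for the resource's stop entries rather than reading it end to end. A sketch of that filtering, with a short excerpt of the trace inlined so it runs standalone (the live file typically sits under the Grid Infrastructure diagnostic directory, e.g. a path like $ORACLE_BASE/diag/crs/&lt;hostname&gt;/crs/trace/ on 12c, though your layout may differ):

```shell
#!/bin/sh
# Sketch: filter a scriptagent trace for dbfs_mount stop-time errors.
# An excerpt is inlined here; on a live system you would point the greps
# at the real crsd_scriptagent_oracle.trc instead.
cat > /tmp/crsd_scriptagent_oracle.trc <<'EOF'
2019-04-17 20:56:43.365201 :[dbfs_mount] [stop] unmounting DBFS from /ggdata
2019-04-17 20:56:43.415516 :[dbfs_mount] [stop] umounting the filesystem using '/bin/fusermount -u /ggdata'
2019-04-17 20:56:43.415541 :[dbfs_mount] [stop] /bin/fusermount: failed to unmount /ggdata: Device or resource busy
2019-04-17 20:56:43.415552 :[dbfs_mount] [stop] Stop - stopped, but still mounted, error
EOF

# First restrict to the resource's stop action, then keep only the failures.
grep '\[dbfs_mount\] \[stop\]' /tmp/crsd_scriptagent_oracle.trc \
    | grep -iE 'busy|error'
```

Two lines survive the filter: the fusermount failure itself and the script's "stopped, but still mounted" verdict, which is exactly the pair that points at a held mount.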
To resolve this, we need to find the specific processes holding the mount open. The Linux fuser command is the ideal tool for this.
# fuser -mv /ggdata/
                     USER        PID ACCESS COMMAND
/ggdata:             root     kernel mount  /ggdata
                     ggsuser   64776 F.... extract
                     oracle    65049 f.... oracle_65049_ih
                     ggsuser   84987 F.... extract
In this case, several GoldenGate Extract processes and Oracle shadow processes were still holding file handles (F or f) on the DBFS mount point.
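Once fuser has named the holders, the next step is usually to pull out just the PIDs so they can be inspected (for example with ps) before deciding how to stop them gracefully. A sketch using a saved copy of the fuser output above (the extraction logic is my own; note that fuser -v writes its table to stderr, so a live capture needs 2>&1):

```shell
#!/bin/sh
# Sketch: pull the blocking PIDs out of saved `fuser -mv` output.
# A captured copy is inlined here; on a live host you might capture it with:
#   fuser -mv /ggdata 2>&1 | tee /tmp/fuser.out
cat > /tmp/fuser.out <<'EOF'
                     USER        PID ACCESS COMMAND
/ggdata:             root     kernel mount  /ggdata
                     ggsuser   64776 F.... extract
                     oracle    65049 f.... oracle_65049_ih
                     ggsuser   84987 F.... extract
EOF

# Keep rows whose second field is a numeric PID, which skips both the
# header line and the kernel mount entry.
awk '$2 ~ /^[0-9]+$/ { print $2 }' /tmp/fuser.out
```

On a live system those PIDs could then be fed to something like ps -fp to see the full command lines, which makes it clear whether they are GoldenGate processes that should be stopped through GoldenGate itself rather than killed.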
When troubleshooting CRS shutdown failures, keep these key diagnostic files and tools in your toolkit:
- crsd_scriptagent_oracle.trc: The most detailed log for script-based resources.
- /var/log/messages: Useful for seeing the sequence of mount/unmount attempts.
- fuser -mv <mount_point>: Essential for identifying which PIDs are locking the resource.

Identifying the specific processes causing the unmount failure is half the battle. Stay tuned, as I will share other options to resolve the "failed to unmount" error in a future post.