Cleaning up PID files when Oracle GRID/RAC upgrades and patches fail
[root@oravm01 ~]# find /u01/ -name \*.pid
/u01/app/11.2.0/grid/crf/admin/run/crfmond/soravm01.pid
/u01/app/11.2.0/grid/crf/admin/run/crflogd/loravm01.pid
/u01/app/11.2.0/grid/gipc/init/oravm01.pid
/u01/app/11.2.0/grid/mdns/init/oravm01.pid
/u01/app/11.2.0/grid/gpnp/init/oravm01.pid
/u01/app/11.2.0/grid/ctss/init/oravm01.pid
/u01/app/11.2.0/grid/ologgerd/init/oravm01.pid
/u01/app/11.2.0/grid/ohasd/init/oravm01.pid
/u01/app/11.2.0/grid/evm/init/oravm01.pid
/u01/app/11.2.0/grid/osysmond/init/oravm01.pid
/u01/app/11.2.0/grid/log/oravm01/agent/crsd/oraagent_oracle/oraagent_oracle.pid
/u01/app/11.2.0/grid/log/oravm01/agent/crsd/orarootagent_root/orarootagent_root.pid
/u01/app/11.2.0/grid/log/oravm01/agent/ohasd/oraagent_oracle/oraagent_oracle.pid
/u01/app/11.2.0/grid/log/oravm01/agent/ohasd/orarootagent_root/orarootagent_root.pid
/u01/app/11.2.0/grid/log/oravm01/gpnpd/oravm01.pid
What is in those files? Just a PID. Here's an example:
[root@oravm01 ~]# cat /u01/app/11.2.0/grid/crs/init/oravm01.pid
4999
[root@oravm01 ~]#
[root@oravm01 ~]# ps -p 4999 -o cmd
CMD
/u01/app/11.2.0/grid/bin/crsd.bin reboot
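The two manual steps above (read the PID, then look for the process) can be folded into one helper. The following is only a sketch: the function name `pid_file_status` is made up for this example, and instead of `ps -p` it tests for a `/proc/<pid>` directory, which assumes a Linux host with `/proc` mounted.

```shell
#!/bin/bash
# pid_file_status FILE  (hypothetical helper, not part of Oracle's tooling)
# Prints "running" if the PID recorded in FILE has a /proc entry,
# "stale" if it does not, and "invalid" if FILE cannot be read or
# does not start with a number.
# Assumption: Linux with /proc mounted.
pid_file_status() {
    local pidfile=$1 pid=
    # Take the first whitespace-delimited token of the file as the PID.
    read -r pid _ 2>/dev/null < "$pidfile" || true
    if [[ ! $pid =~ ^[0-9]+$ ]]; then
        echo "invalid"
    elif [[ -d /proc/$pid ]]; then
        echo "running"
    else
        echo "stale"
    fi
}
```

Called as `pid_file_status /u01/app/11.2.0/grid/crs/init/oravm01.pid`, it should report `running` for the crsd example above for as long as that process is alive.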
When a patch fails partway through, a number of these PID files may be left behind even though the processes they represent are no longer running. When an attempt is then made to roll back the patch, Oracle will not restart these processes, because it reads the PID file and believes the process is already running. Why Oracle does not clean up dead PID files is something of a mystery to me. In any case, here is a small script to rename all PID files that do not have a corresponding process.
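#!/bin/bash
# chkpid.sh - rename PID files whose recorded process is no longer running
find /u01/ -name '*.pid' -print0 | while IFS= read -r -d '' pidfile
do
    pid=$(cat "$pidfile")
    # ps -p exits non-zero when no process with that PID exists
    if ! ps -p "$pid" > /dev/null 2>&1; then
        echo "#######################"
        echo " PID: $pid"
        echo " Pid not found for file:"
        ls -ld "$pidfile"
        # Rename rather than delete, so the old PID can still be looked up
        mv "$pidfile" "${pidfile}.old"
    fi
done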
I have used this script a number of times now when it became impossible to roll back a failed patch. It doesn't always work, but frequently it does.
Why do I not just delete the files? Because I may want to check some process IDs against log and trace files.
How often does this actually happen? I have personally seen it occur a number of times. The most recent was a two-node RAC. The script made it possible to restart one of the nodes; the other node, however, required RAC reconfiguration. Even so, it was a relief to get one node up immediately and confirm that all was OK with the database.
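Since the renamed files are kept so that PIDs can be checked against logs later, it helps to be able to list them along with the PID each one recorded. A minimal sketch, assuming the `.old` suffix used by the script above and a `find` and `sort` that support NUL-delimited output (GNU does); the function name `list_renamed_pidfiles` is invented for this example.

```shell
#!/bin/bash
# list_renamed_pidfiles [DIR]  (hypothetical helper)
# Prints one "PID<TAB>path" line for every *.pid.old file under DIR
# (default /u01), sorted by path. A "?" is printed when the file is empty.
list_renamed_pidfiles() {
    local dir=${1:-/u01} oldfile pid
    while IFS= read -r -d '' oldfile; do
        pid=
        # First whitespace-delimited token of the file is the recorded PID.
        read -r pid _ < "$oldfile" || true
        printf '%s\t%s\n' "${pid:-?}" "$oldfile"
    done < <(find "$dir" -type f -name '*.pid.old' -print0 | sort -z)
}
```

Running `list_renamed_pidfiles /u01` after the cleanup gives a short list of PIDs to search for in the log and trace files.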