ASM disk group just will not mount

I recently had to deal with a damaged ASM disk group. Every attempt to mount it failed with ORA-00600 [kfrValAcd30]. Network and I/O issues had been reported in this environment, so something had clearly gone wrong. The usual tricks of using kfed to repair the headers did not work. The GI version was 11.2.0.2, and it smelled like a bug. The client's SR with Oracle Support yielded a recommendation to delete the disk headers and recreate the disk group. Hmmm…. That sounded a bit extreme as a first step. This was the vendor's verbatim response:
"ORA-600 [kfrValAcd30] signaled when the expected change sequence doesn't match with the sequence we find, during the recovery of the diskgroup. This is some kind of inconsistency in the ACD block which is usually caused by a platform specific install/operational issue. It may also happen if there is any IO or storage issue . This should be further investigated by the OS/Storage vendor.
Solution
The only way to resolve this is by recreating the respective diskgroup. How To Drop and Recreate ASM Diskgroup DOC ID - 563048.1"
The proposed solution in MOS Note 563048.1 starts off with:
Erase ASM metadata information for disk using OS command:
!!! YOU HAVE TO BE CAREFUL WHEN TYPING DISK NAME !!!
For Unix platforms using 'dd' Unix command.
Example:
dd if=/dev/zero of=/dev/raw/raw13 bs=1048576 count=50
Okay, any client following this recommendation has now officially nuked the first 50 MB of data on the disk without backing up anything. Whenever possible, back up as much of the damaged environment as you can first, so that if what you try fails, you can restart from the last known good point.
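A minimal sketch of such a backup, assuming a bash shell and enough free space for 50 MB per disk: it runs dd in the opposite direction from the example above, with the disk as the input file rather than the output, and records a checksum of each copy. The function name backup_disk_headers and the HEADER_MB variable are hypothetical; the disk paths are whatever your disk group actually uses.

```shell
#!/usr/bin/env bash
# Sketch: copy the first HEADER_MB megabytes (default 50, matching count=50 in
# the MOS example) of each ASM disk to a file before any destructive step.
# The disk is only ever the dd *input* here, so the disks are not modified.
# Usage: backup_disk_headers BACKUP_DIR DISK [DISK ...]
backup_disk_headers() {
  [ $# -ge 2 ] || { echo "usage: backup_disk_headers BACKUP_DIR DISK ..." >&2; return 1; }
  local dest=$1
  shift
  local mb=${HEADER_MB:-50}
  local disk name
  mkdir -p "$dest" || return 1
  for disk in "$@"; do
    # Turn the device path into a file name: /dev/raw/raw13 -> _dev_raw_raw13
    name=$(printf '%s' "$disk" | tr '/' '_')
    if ! dd if="$disk" of="$dest/$name.hdr" bs=1048576 count="$mb" 2>/dev/null; then
      echo "backup of $disk failed" >&2
      return 1
    fi
    # Keep a checksum so the copy can be verified before it is ever used.
    (cd "$dest" && cksum "$name.hdr") >> "$dest/checksums.txt"
  done
}
```

For example, backup_disk_headers /backup/asm_hdr /dev/raw/raw13 /dev/raw/raw14. Putting a copy back later is the same dd with if= and of= swapped, which is itself a destructive step and deserves the same care as the original wipe.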
To be fair, earlier in the MOS note (in the Goals section), this course of action is qualified with: "Erasing the header using "dd" command is very dangerous operation and need to be done under support supervision and when it [is] confirmed by support that fixing the header is impossible. ... Backup disk header for all member disks in diskgroup"
This MOS note has a purpose: it tackles the problem of dropping a disk group that cannot be dropped with SQL commands. So the note itself is fine; the advice, less so, for two reasons. Firstly, no attempt had been made to restore data availability, either with regular restore procedures or by extracting whatever data could be salvaged from the disks. Dropping the disk group as a first step destroys all the data in it. Secondly, I was not convinced that, when the time came, the disk group could not be dropped without resorting to wiping the headers. I ignored the "dd" advice and tested several other options in a 12.1.0.2 two-node RAC lab environment.
ASM disk groups occasionally run into problems, sometimes because of bugs but more commonly because of I/O subsystem failures. The software is generally resilient enough to repair itself, but if a disk group refuses to mount, here are two options to consider.
Option 1: DB has regular RMAN backups
Solution 1 Overview:
- Collect the metadata needed to recreate the disk group after it is dropped.
- Collect database structure information to identify which datafiles reside in the affected disk group.
- Restore the datafiles that resided in this disk group from the RMAN backup sets to a healthy location, and recover them.
- Drop and recreate the faulty disk group.
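The restore step of Option 1 could be scripted roughly as follows. This is a sketch under assumptions that are not in the original post: the database is mounted (or the affected datafiles are offline), the datafile numbers were gathered in step 2 with a query such as SELECT file#, name FROM v$datafile WHERE name LIKE '+DATA/%', and a healthy disk group (the hypothetical +RECO) is available to receive the restored files. make_rman_restore is a hypothetical helper that only prints RMAN commands; it does not run anything.

```shell
#!/usr/bin/env bash
# Sketch: write an RMAN command file that restores the datafiles which lived in
# the faulty disk group into a healthy disk group, then recovers them.
# Assumptions: database mounted (or affected datafiles offline); datafile
# numbers gathered beforehand, e.g. with
#   SELECT file#, name FROM v$datafile WHERE name LIKE '+DATA/%';
# NEW_DEST is a healthy disk group, e.g. the hypothetical '+RECO'.
# Usage: make_rman_restore NEW_DEST FILE# [FILE# ...] > restore.rman
make_rman_restore() {
  [ $# -ge 2 ] || { echo "usage: make_rman_restore NEW_DEST FILE# ..." >&2; return 1; }
  local dest=$1
  shift
  local f list
  # Validate every datafile number before printing anything.
  for f in "$@"; do
    case $f in
      ''|*[!0-9]*) echo "not a datafile number: $f" >&2; return 1 ;;
    esac
  done
  list=$(IFS=,; echo "$*")   # e.g. "5,7"
  echo "RUN {"
  for f in "$@"; do
    # The faulty disk group cannot be mounted, so each file needs a new home.
    echo "  SET NEWNAME FOR DATAFILE $f TO '$dest';"
  done
  echo "  RESTORE DATAFILE $list;"
  echo "  SWITCH DATAFILE ALL;"   # point the control file at the restored copies
  echo "  RECOVER DATAFILE $list;"
  echo "}"
}
```

For example, make_rman_restore '+RECO' 5 7 > restore.rman, then review the file and run it with rman target / cmdfile=restore.rman. SET NEWNAME is used because the restored files cannot go back into the disk group that will not mount; once that disk group has been recreated, the files can be moved back if desired.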
Option 2: DB has NO regular RMAN backups, but recent archivelogs are available
Solution 2 Overview:
- Same as Option 1 for the first two steps. From step 3 on, there is no RMAN backup set to restore from, so we use AMDU to extract the datafiles from the disk group instead.
- Use AMDU to report on the metadata of the unmountable disk group, and review the resulting report for corruptions and other problems.
- Use AMDU to extract the datafiles from the unmountable disk group.
- Rename the datafiles in the control file to point to the AMDU-extracted copies, apply recovery, and open the database.
- Drop and recreate the faulty disk group.
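For Option 2, the AMDU extraction commands and the control file renames could be prepared along these lines. The sketch relies on assumptions worth checking in your own environment: ASM datafile names follow the +GROUP/DB/DATAFILE/TAG.FILE#.INCARNATION pattern, so the ASM file number that AMDU's -extract option needs is the second-to-last dot-separated field; AMDU writes each extracted file as GROUP_FILE#.f in its own amdu_<timestamp> output directory; and those .f files are moved into the staging directory before the SQL is run. amdu_plan, extract.sh and rename.sql are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: prepare the AMDU extraction and control file renames for Option 2.
# Assumptions (verify before use):
#   - ASM names look like +GROUP/DB/DATAFILE/TAG.FILE#.INCARNATION
#   - AMDU writes an extracted file as GROUP_FILE#.f, which you then move
#     from its amdu_<timestamp> directory into STAGE_DIR
# Usage: amdu_plan DISKSTRING STAGE_DIR ASM_NAME [ASM_NAME ...]
# Writes STAGE_DIR/extract.sh (AMDU commands) and STAGE_DIR/rename.sql.
amdu_plan() {
  [ $# -ge 3 ] || { echo "usage: amdu_plan DISKSTRING STAGE_DIR ASM_NAME ..." >&2; return 1; }
  local diskstring=$1 stage=$2
  shift 2
  local asm_name group base fileno
  # Start both output files empty.
  { : > "$stage/extract.sh" && : > "$stage/rename.sql"; } || return 1
  for asm_name in "$@"; do
    group=${asm_name#+}      # drop the leading '+'
    group=${group%%/*}       # disk group name, e.g. DATA
    base=${asm_name##*/}     # TAG.FILE#.INCARNATION
    fileno=${base%.*}        # TAG.FILE#
    fileno=${fileno##*.}     # FILE#
    case $fileno in
      ''|*[!0-9]*)
        echo "cannot find an ASM file number in $asm_name" >&2
        return 1 ;;
    esac
    printf "amdu -diskstring '%s' -extract %s.%s\n" \
      "$diskstring" "$group" "$fileno" >> "$stage/extract.sh"
    printf "ALTER DATABASE RENAME FILE '%s' TO '%s/%s_%s.f';\n" \
      "$asm_name" "$stage" "$group" "$fileno" >> "$stage/rename.sql"
  done
}
```

Generating files rather than running AMDU and SQL*Plus directly keeps a human in the loop: review extract.sh and rename.sql, run the extraction, move the .f files into place, then apply the renames with the database mounted before recovering and opening it.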