5: Troubleshooting
This post covers a few issues that we faced and resolved. From a 'lessons learned' perspective, they are worth sharing to help others. All of the fixes below have been applied in real life on X4 and/or X5 Exadatas.
5.1 - Cell patching issue
- It happened when the patch failed on a cell:
myclustercel05 2016-05-31 03:46:42 -0500 Patch failed during wait for patch finalization and reboot.
2016-05-31 03:46:43 -0500 4 Done myclustercel05 :FAILED: Details in files .log /patches/April_bundle_patch/22738457/Infrastructure/12.1.2.3.1/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.1.160411/patchmgr.stdout, /patches/April_bundle_patch/22738457/Infrastructure/12.1.2.3.1/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.1.160411/patchmgr.stderr
2016-05-31 03:46:43 -0500 4 Done myclustercel05 :FAILED: Wait for cell to reboot and come online.
- The following logfile on the cell shows that the patch failed due to reduced redundancy:
/opt/oracle/cell12.1.2.1.2_LINUX.X64_150617.1/.install_log.txt
CELL-02862: Deactivation of grid disks failed due to reduced redundancy of the following grid disks: DATA_CD_00_myclustercel05, DATA_CD_01_myclustercel05, DATA_CD_02_myclustercel05, DATA_CD_03_myclustercel05, DATA_CD_04_myclustercel05, DATA_CD_05_myclustercel05, DATA_CD_06_myclustercel05, DATA_CD_07_myclustercel05, DATA_CD_08_myclustercel05, DATA_CD_09_myclustercel05, DATA_CD_10_myclustercel05, DATA_CD_11_myclustercel05....
- This happened because the disks of the previous cell had not been brought back online after its reboot. In this case, we have to bring the disks online manually and then resume the patch on the remaining cells.
- Bring disks online manually on the failed cell:
# ssh root@myclustercel05
# cellcli -e alter griddisk all active
# cellcli -e list griddisk attributes name, asmmodestatus # to check the status of the disks
... wait until all disks are "ONLINE" ...
- Restart the patch on the remaining cells (cel06 and cel07)
# cd
# grep '[67]' ~/cell_group > cells_6_and_7
# ./patchmgr -cells cells_6_and_7 -cleanup
# ./patchmgr -cells cells_6_and_7 -patch_check_prereq -rolling
# ./patchmgr -cells cells_6_and_7 -patch -rolling
# ./patchmgr -cells ~/cell_group -cleanup
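The "wait until all disks are ONLINE" step above lends itself to a small polling loop. The following is a minimal sketch, not part of the original procedure: the function names all_disks_online and wait_for_online and the 60-second default interval are my own assumptions; only the cellcli command itself comes from the steps above. It also assumes that every grid disk on the cell is expected to reach ONLINE; a disk that legitimately stays in another state would make the loop wait forever.

```shell
# Succeeds only if every non-empty "name asmmodestatus" line read from
# stdin has ONLINE in its second column (and at least one line was read).
all_disks_online() {
    awk 'NF { seen = 1; if ($2 != "ONLINE") bad = 1 }
         END { exit ((seen && !bad) ? 0 : 1) }'
}

# Polls the local cell until all grid disks report ONLINE.
# $1: seconds between polls (hypothetical default: 60).
wait_for_online() {
    until cellcli -e list griddisk attributes name, asmmodestatus | all_disks_online; do
        sleep "${1:-60}"
    done
}
```

Run on the cell itself, "wait_for_online 30" would poll every 30 seconds and return once no disk reports anything other than ONLINE. The check is kept in its own function so it can be tried against saved cellcli output before being trusted in a maintenance window.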
5.2 - CRS does not restart Issue
After a failed Grid patch, CRS was unable to restart. We opened an SR and Oracle came back with an action plan to restart the GI. Let's say the issue happened on server myclusterdb03 here:
- Stop the clusterware
[root@myclusterdb03]# crsctl stop crs -f
- Remove the network sockets
[root@myclusterdb03]# cd /var/tmp/.oracle && rm -f *
- Rename the maps file to move it out of the way
[root@myclusterdb03]# cd /etc/oracle/maps/
[root@myclusterdb03]# mv myclusterdb03_gipcd1318_cc0d4e3b8eedcf02bf179a98a71ce468-0000000000 X-myclusterdb03_gipcd1318_cc0d4e3b8eedcf02bf179a98a71ce468-0000000000
- Start the clusterware
[root@myclusterdb03]# crsctl start crs
On startup, the Clusterware recreates the network sockets and the maps file.
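The rename used for the maps file above (prefixing the name with X- instead of deleting it) can be wrapped in a small helper, which keeps the original file around in case it is needed later. This is only a sketch: set_aside is a hypothetical function name of mine, and the glob pattern in the usage note is an assumption based on the single file name shown above, not something Oracle's action plan specified.

```shell
# set_aside DIR PATTERN
# Renames each regular file in DIR whose name matches the shell glob
# PATTERN to X-<name>. Files that already start with X- are skipped.
set_aside() (
    dir=$1
    pattern=$2
    for f in "$dir"/$pattern; do
        [ -f "$f" ] || continue
        base=${f##*/}
        case $base in X-*) continue ;; esac
        mv -- "$f" "$dir/X-$base"
    done
)
```

For the case above, the call would look like: set_aside /etc/oracle/maps 'myclusterdb03_gipcd*'. The helper only handles regular files, so it is not a substitute for the socket cleanup in /var/tmp/.oracle.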
5.3 - A Procedure to Add Instances to A Database
The following is a procedure that I performed after a CRS patch failed on node 3. Some databases were running only on nodes 3 and 4. Since CRS patching had failed on node 3, we opted to move these databases to nodes 1 and 2 before the end of the maintenance window, so that we could then work on the failed node 3 calmly and with no downtime. The patch on node 4 came next and also completed with no downtime. The goal was to add two instances, on nodes 1 and 2, to the database mydb:
select tablespace_name, file_name from dba_data_files where tablespace_name like 'UNDO%' ;
create undo tablespace UNDOTBS1 datafile '+DATA' ;
create undo tablespace UNDOTBS2 datafile '+DATA' ;
alter system set undo_tablespace='UNDOTBS1' sid='mydb1' ;
alter system set undo_tablespace='UNDOTBS2' sid='mydb2' ;
show spparameter instance
alter system set instance_number=3 sid='mydb1' scope=spfile ;
alter system set instance_number=4 sid='mydb2' scope=spfile ;
alter system set instance_name='mydb1' sid='mydb1' scope=spfile ;
alter system set instance_name='mydb2' sid='mydb2' scope=spfile ;
show spparameter thread ;
alter system set thread=1 sid='mydb1' scope=spfile ;
alter system set thread=2 sid='mydb2' scope=spfile ;
set lines 200
set pages 999
select * from gv$log ;
alter database add logfile thread 1 group 11 ('+DATA', '+RECO') size 100M, group 12 ('+DATA', '+RECO') size 100M, group 13 ('+DATA', '+RECO') size 100M, group 14 ('+DATA', '+RECO') size 100M ;
alter database add logfile thread 2 group 21 ('+DATA', '+RECO') size 100M, group 22 ('+DATA', '+RECO') size 100M, group 23 ('+DATA', '+RECO') size 100M, group 24 ('+DATA', '+RECO') size 100M ;
select * from gv$log ;
alter database enable public thread 1 ;
alter database enable public thread 2 ;
srvctl add instance -db mydb -i mydb1 -n myclusterdb01
srvctl add instance -db mydb -i mydb2 -n myclusterdb02
srvctl status database -d mydb
sqlplus / as sysdba
select host_name, status from gv$instance ;
srvctl modify service -d mydb -s myservice -modifyconfig -preferred 'mydb1,mydb2,mydb3,mydb4'
srvctl start service -d mydb -s myservice -i mydb1
srvctl start service -d mydb -s myservice -i mydb2
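The two long "add logfile" statements above follow a regular pattern: one statement per thread, with groups numbered as the thread number followed by a counter, each multiplexed on +DATA and +RECO. When there are more threads to add, the statements can be generated instead of typed. The sketch below is my own illustration; gen_add_logfile is a hypothetical helper name, and the numbering scheme is simply the convention seen in the statements above, so it only makes sense for up to 9 groups per thread.

```shell
# gen_add_logfile THREAD COUNT SIZE
# Prints one "alter database add logfile" statement for THREAD with COUNT
# groups numbered <THREAD>1, <THREAD>2, ... multiplexed on +DATA and +RECO.
gen_add_logfile() (
    thread=$1
    count=$2
    size=$3
    sep=""
    i=1
    printf 'alter database add logfile thread %s' "$thread"
    while [ "$i" -le "$count" ]; do
        printf "%s group %s%s ('+DATA', '+RECO') size %s" "$sep" "$thread" "$i" "$size"
        sep=","
        i=$((i + 1))
    done
    printf ' ;\n'
)
```

Calling "gen_add_logfile 1 4 100M" and "gen_add_logfile 2 4 100M" prints the two statements used above. The output should still be reviewed against gv$log before it is pasted into sqlplus, since existing group numbers may clash.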
5.4 - OPatch Resume
As general advice, if an opatch or opatchauto operation fails, try to resume it:
[root@myclusterdb03]# cd /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103
[root@myclusterdb03 24448103]# /u01/app/12.1.0.2/grid/OPatch/opatchauto resume -oh /u01/app/12.1.0.2/grid