Background
I ran into a situation where we needed to recover from an old online backup which (due to some issues with the RMAN "KEEP" command) was missing the archived redo log backups/files needed to make the backup consistent. The client wasn't concerned about data that changed during the backup; they were interested in checking some very old data from long before this online backup had started.

Visualizing the scenario using a timeline (not to scale):

|-------|------------------|---------|------------------|
t0      t1                 t2        t3                 t4
     Data is added                                  Present

The client thought that some data had become corrupted, and while they weren't sure when, they knew it wasn't recently, so the flashback technologies were not an option. Hence they wanted a restore of the database into a new temporary server as of time t1, which was in the distant past. An online (hot) backup taken between t2 and t3 was considered to be old enough, or close enough to t1; the problem was that all archived redo log backups were missing. The client was certain that the particular data they were interested in would not have changed during the online backup.

Hence the question is: without the necessary redo data to make the online backup consistent (between times t2 and t3), can we still open the database to extract data from prior to when the online backup began?

The official answer is "no" - the database must be made consistent to be opened. And with an online backup the redo stream is critical to making the backed-up datafiles consistent. So without the redo vectors in the redo stream, the files cannot be made consistent with each other, and hence the database cannot be opened.

However the unofficial, unsupported answer is that it can be done. This article covers the unsupported and unofficial methods for opening a database with consistency corruption so that certain data can be extracted. Other scenarios can lead to the same situation.
Basically this technique can be used to open the Oracle database any time the datafiles cannot be made consistent.
Demo Setup
To illustrate the necessary steps I've set up a test 12c non-container database called NONCDB, and to simulate user transactions against it I ran a light workload in the background using the Swingbench Order Entry (SOE) benchmark from another computer. Before beginning any backups or recoveries I added two simple tables to the SCOTT schema and some rows to represent the "old" data (with the words "OLD DATA" in the C2 column):

SQL> create table scott.parent (c1 int, c2 varchar2(16), constraint parent_pk primary key (c1)) tablespace users;

Table created.

SQL> create table scott.child (c1 int, c2 varchar2(16), foreign key (c1) references scott.parent(c1)) tablespace soe;

Table created.

SQL> insert into scott.parent values(1, 'OLD DATA 001');

1 row created.

SQL> insert into scott.parent values(2, 'OLD DATA 002');

1 row created.

SQL> insert into scott.child values(1, 'OLD DETAILS A');

1 row created.

SQL> insert into scott.child values(1, 'OLD DETAILS B');

1 row created.

SQL> insert into scott.child values(1, 'OLD DETAILS C');

1 row created.

SQL> insert into scott.child values(2, 'OLD DETAILS D');

1 row created.

SQL> commit;

Commit complete.

SQL>

Notice that I added a PK-FK referential integrity constraint and placed each table in a different tablespace so they could be backed up at different times. These first entries represent my "old data" from time t1.
The Online Backup
The next step is to perform the online backup. For simulation purposes I'm adjusting the steps a little to represent a real-life situation where the data in my tables is being modified while the backup is running. Hence my steps are:
- Run an online backup of all datafiles except for the USERS tablespace.
- Add some more data to my test tables (so the CHILD table data is inserted after the SOE tablespace backup, and the PARENT table data before the USERS tablespace backup).
- Record the current archived redo log and then delete it to simulate the lost redo data.
- Backup the USERS tablespace.
- Add some post backup data to the test tables.
Notice that in the above steps, since I'm using Oracle Database 12c, I'm able to execute normal SQL commands directly from RMAN - an RMAN 12c new feature.

$ rman

Recovery Manager: Release 12.1.0.2.0 - Production on Thu Feb 26 15:59:36 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

RMAN> connect target

connected to target database: NONCDB (DBID=1677380280)

RMAN> backup datafile 1,2,3,5;

Starting backup at 26-FEB-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=46 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00005 name=/u01/app/oracle/oradata/NONCDB/datafile/SOE.dbf
input datafile file number=00001 name=/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_system_b2k8dsno_.dbf
input datafile file number=00002 name=/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_sysaux_b2k8f3d4_.dbf
input datafile file number=00003 name=/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_undotbs1_b2k8fcdm_.dbf
channel ORA_DISK_1: starting piece 1 at 26-FEB-15
channel ORA_DISK_1: finished piece 1 at 26-FEB-15
piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T155942_bgz9ol3g_.bkp tag=TAG20150226T155942 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:11:16
Finished backup at 26-FEB-15

Starting Control File and SPFILE Autobackup at 26-FEB-15
piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/autobackup/2015_02_26/o1_mf_s_872698259_bgzb0647_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 26-FEB-15

RMAN> alter system switch logfile;

Statement processed

RMAN> commit;

Statement processed

RMAN> alter system switch logfile;

Statement processed

RMAN> insert into scott.parent values (3, 'NEW DATA 003');

Statement processed

RMAN> insert into scott.child values (3, 'NEW DETAILS E');

Statement processed

RMAN> commit;

Statement processed

RMAN> select sequence# from v$log where status='CURRENT';

 SEQUENCE#
----------
        68

RMAN> alter system switch logfile;

Statement processed

RMAN> alter database backup controlfile to '/tmp/controlfile_backup.bkp';

Statement processed

RMAN> backup datafile 4;

Starting backup at 26-FEB-15
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_users_b2k8gf7d_.dbf
channel ORA_DISK_1: starting piece 1 at 26-FEB-15
channel ORA_DISK_1: finished piece 1 at 26-FEB-15
piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T165814_bgzdrpmk_.bkp tag=TAG20150226T165814 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 26-FEB-15

Starting Control File and SPFILE Autobackup at 26-FEB-15
piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/autobackup/2015_02_26/o1_mf_s_872701095_bgzdrrrh_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 26-FEB-15

RMAN> alter database backup controlfile to '/tmp/controlfile_backup.bkp';

Statement processed

RMAN> insert into scott.parent values (4, 'NEW DATA 004');

Statement processed

RMAN> insert into scott.child values (4, 'NEW DETAILS F');

Statement processed

RMAN> commit;

Statement processed

RMAN> exit

Recovery Manager complete.
$
Corrupting the Backup
Now I'm going to corrupt my backup by removing one of the archived redo logs needed to make the backup consistent:

SQL> set pages 999 lines 120 trims on tab off
SQL> select 'rm '||name stmt from v$archived_log where sequence#=68;

STMT
------------------------------------------------------------------------------------------------------------------------
rm /u01/app/oracle/fast_recovery_area/NONCDB/archivelog/2015_02_26/o1_mf_1_68_bgzcnv04_.arc

SQL> !rm /u01/app/oracle/fast_recovery_area/NONCDB/archivelog/2015_02_26/o1_mf_1_68_bgzcnv04_.arc

SQL>

Finally I'll remove the "OLD" data to simulate the data loss (representing t4):
SQL> select * from scott.parent order by 1;

        C1 C2
---------- ----------------
         1 OLD DATA 001
         2 OLD DATA 002
         3 NEW DATA 003
         4 NEW DATA 004

SQL> select * from scott.child order by 1;

        C1 C2
---------- ----------------
         1 OLD DETAILS A
         1 OLD DETAILS B
         1 OLD DETAILS C
         2 OLD DETAILS D
         3 NEW DETAILS E
         4 NEW DETAILS F

6 rows selected.

SQL> delete from scott.child where c2 like 'OLD%';

4 rows deleted.

SQL> delete from scott.parent where c2 like 'OLD%';

2 rows deleted.

SQL> commit;

Commit complete.

SQL>
Attempting a Restore and Recovery
Now let's try to recover from our backup on a secondary system so we can see if we can extract that old data. After copying over all of the files, the first thing to do is to try a restore as per normal:

$ rman target=/

Recovery Manager: Release 12.1.0.2.0 - Production on Mon Mar 2 08:40:12 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

connected to target database (not started)

RMAN> startup nomount;

Oracle instance started

Total System Global Area    1577058304 bytes

Fixed Size                     2924832 bytes
Variable Size                503320288 bytes
Database Buffers            1056964608 bytes
Redo Buffers                  13848576 bytes

RMAN> restore controlfile from '/tmp/controlfile_backup.bkp';

Starting restore at 02-MAR-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=12 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=/u01/app/oracle/oradata/NONCDB/controlfile/o1_mf_b2k8d9nq_.ctl
output file name=/u01/app/oracle/fast_recovery_area/NONCDB/controlfile/o1_mf_b2k8d9v5_.ctl
Finished restore at 02-MAR-15

RMAN> alter database mount;

Statement processed
released channel: ORA_DISK_1

RMAN> restore database;

Starting restore at 02-MAR-15
Starting implicit crosscheck backup at 02-MAR-15
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=12 device type=DISK
Crosschecked 4 objects
Finished implicit crosscheck backup at 02-MAR-15

Starting implicit crosscheck copy at 02-MAR-15
using channel ORA_DISK_1
Crosschecked 2 objects
Finished implicit crosscheck copy at 02-MAR-15

searching for all files in the recovery area
cataloging files...
cataloging done

using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /u01/app/oracle/oradata/NONCDB/datafile/o1_mf_system_b2k8dsno_.dbf
channel ORA_DISK_1: restoring datafile 00002 to /u01/app/oracle/oradata/NONCDB/datafile/o1_mf_sysaux_b2k8f3d4_.dbf
channel ORA_DISK_1: restoring datafile 00003 to /u01/app/oracle/oradata/NONCDB/datafile/o1_mf_undotbs1_b2k8fcdm_.dbf
channel ORA_DISK_1: restoring datafile 00005 to /u01/app/oracle/oradata/NONCDB/datafile/SOE.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T155942_bgz9ol3g_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T155942_bgz9ol3g_.bkp tag=TAG20150226T155942
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:01:46
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to /u01/app/oracle/oradata/NONCDB/datafile/o1_mf_users_b2k8gf7d_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T165814_bgzdrpmk_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T165814_bgzdrpmk_.bkp tag=TAG20150226T165814
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 02-MAR-15

RMAN>

Notice that it did restore the datafiles from both the SOE and USERS tablespaces; however, we know that those two are inconsistent with each other. Attempting the recovery should give us an error due to the missing redo required for consistency:
RMAN> recover database;

Starting recover at 02-MAR-15
using channel ORA_DISK_1
starting media recovery

archived log for thread 1 with sequence 67 is already on disk as file /u01/app/oracle/fast_recovery_area/NONCDB/archivelog/2015_02_26/o1_mf_1_67_bgzcn05f_.arc
archived log for thread 1 with sequence 69 is already on disk as file /u01/app/oracle/fast_recovery_area/NONCDB/archivelog/2015_02_26/o1_mf_1_69_bgzdqo9n_.arc

Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_system_bh914cx2_.dbf'

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/02/2015 08:44:21
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 68 and starting SCN of 624986 found to restore

RMAN>

As expected, we got the dreaded ORA-01547, ORA-01194, and ORA-01110 errors, meaning that we don't have enough redo to make the recovery successful.
Attempting a Recovery
Now for the crux of the situation. We're stuck with the common inconsistency error stack that most seasoned DBAs should be familiar with:

Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_system_bh914cx2_.dbf'

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/02/2015 08:44:21
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 68 and starting SCN of 624986 found to restore

And of course we need to be absolutely positive that we don't have the missing redo somewhere: for example, in an RMAN backup piece on disk or on tape from an archived log backup that can be restored, or possibly still in one of the current online redo logs. DBAs should explore all possible options for retrieving the missing redo vectors in some form or another before proceeding. However, if we're absolutely certain of the following, we can continue:
- We definitely can't find the missing redo anywhere.
- We absolutely need to extract data from prior to the start of the online backup.
- Our data definitely wasn't modified during the online backup.
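Before concluding that the redo truly is gone, a couple of quick checks can confirm whether sequence 68 survives in some form. This is a sketch against the standard dictionary views; the sequence number 68 matches this demo, so substitute your own:

```sql
-- Does the controlfile still know of any copy of the archived log?
select name, status, deleted
  from v$archived_log
 where sequence# = 68;

-- Is the redo perhaps still sitting in an online log group
-- that hasn't been reused yet?
select group#, sequence#, status, archived
  from v$log;
```

Similarly, in RMAN, LIST BACKUP OF ARCHIVELOG SEQUENCE 68; should show whether any backup piece still contains that log.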
We can gauge the extent of the inconsistency by checking the datafile headers:

SQL> select fuzzy, status, checkpoint_change#,
  2         to_char(checkpoint_time, 'DD-MON-YYYY HH24:MI:SS') as checkpoint_time,
  3         count(*)
  4    from v$datafile_header
  5   group by fuzzy, status, checkpoint_change#, checkpoint_time
  6   order by fuzzy, status, checkpoint_change#, checkpoint_time;

FUZZY STATUS  CHECKPOINT_CHANGE# CHECKPOINT_TIME        COUNT(*)
----- ------- ------------------ -------------------- ----------
NO    ONLINE              647929 26-FEB-2015 16:58:14          1
YES   ONLINE              551709 26-FEB-2015 15:59:43          4

SQL>

The fact that there are two rows returned, and that not all files have FUZZY=NO, indicates that we have a problem and that more redo is required before the database can be opened with the RESETLOGS option. But our problem is that we don't have that redo, and we're desperate to open our database anyway.
Recovering without Consistency
Again, recovering without consistency is not supported and should only be attempted as a last resort. Opening the database with the data in an inconsistent state is actually pretty simple: we just need to set the "_allow_resetlogs_corruption" hidden initialization parameter and temporarily switch undo management to "manual":

SQL> alter system set "_allow_resetlogs_corruption"=true scope=spfile;

System altered.

SQL> alter system set undo_management='MANUAL' scope=spfile;

System altered.

SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL>

Now, will the database open? The answer is still "probably not". Giving it a try we get:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2663], [0], [551715], [0], [562781], [], [], [], [], [], [], []
Process ID: 4538
Session ID: 237 Serial number: 5621

SQL>

Doesn't look good, right? Actually the situation is not that bad. Put simply, this ORA-00600 means that a datafile has a recorded SCN that's ahead of the database SCN. The current database SCN is shown as the 3rd argument (in this case 551715) and the datafile SCN is shown as the 5th argument (in this case 562781). Hence a difference of:
562781 - 551715 = 11066

In this example that's not too large a gap, but in a real system the difference may be more significant. Also, if multiple datafiles are ahead of the current SCN, you should expect to see multiple ORA-00600 errors. The solution to this problem is quite simple: roll the current SCN forward until it exceeds the datafile SCN. The database automatically generates a number of internal transactions on each startup, so the way to roll the database SCN forward is simply to perform repeated shutdowns and startups. Depending on how big the gap is, it may be necessary to shutdown abort and startup repeatedly - the gap between the 5th and 3rd arguments of the ORA-00600 will decrease each time. Eventually the gap will reduce to zero and the database will open.
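Reading the two SCNs out of the error text and computing the remaining gap can be scripted; here's a throwaway sketch (the ORA-00600 line is hard-coded from this example, and in practice you would grep it from the alert log):

```shell
# Parse an ORA-00600 [2663] message: argument 3 is the current database SCN
# and argument 5 is the datafile SCN. Splitting on '[' and ']' puts those
# values in awk fields 6 and 10 respectively.
line='ORA-00600: internal error code, arguments: [2663], [0], [551715], [0], [562781], [], [], [], [], [], [], []'
db_scn=$(echo "$line" | awk -F'[][]' '{print $6}')
file_scn=$(echo "$line" | awk -F'[][]' '{print $10}')
echo "gap remaining: $((file_scn - db_scn))"   # prints: gap remaining: 11066
```

Each shutdown abort / startup cycle should shrink this number until it reaches zero and the OPEN RESETLOGS succeeds.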
SQL> connect / as sysdba
Connected to an idle instance.
SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.
SQL>

Now, presumably, we want to query or export the old data, so the first thing we should do is switch back to automatic undo management using a new undo tablespace:
SQL> create undo tablespace UNDOTBS2 datafile size 50M;

Tablespace created.

SQL> alter system set undo_tablespace='UNDOTBS2' scope=spfile;

System altered.

SQL> alter system set undo_management='AUTO' scope=spfile;

System altered.

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.
SQL>

Finally the database is open (although the data is inconsistent) and the "old" data can be queried:
SQL> select * from scott.parent;

        C1 C2
---------- ----------------
         1 OLD DATA 001
         2 OLD DATA 002
         3 NEW DATA 003

SQL> select * from scott.child;

        C1 C2
---------- ----------------
         1 OLD DETAILS A
         1 OLD DETAILS B
         1 OLD DETAILS C
         2 OLD DETAILS D

SQL>

As we can see, all of the "old" data (rows beginning with "OLD") from before the backup began (before t2) is available, and only part of the data inserted during the backup (the rows where C1=3), as would be expected - that's our data inconsistency. We've already seen that we can SELECT the "old" data; we can also export it:
$ expdp scott/tiger dumpfile=DATA_PUMP_DIR:OLD_DATA.dmp nologfile=y

Export: Release 12.1.0.2.0 - Production on Mon Mar 2 09:39:11 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Starting "SCOTT"."SYS_EXPORT_SCHEMA_02":  scott/******** dumpfile=DATA_PUMP_DIR:OLD_DATA.dmp nologfile=y
Estimate in progress using BLOCKS method...
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 640 KB
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/COMMENT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT
Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type SCHEMA_EXPORT/STATISTICS/MARKER
. . exported "SCOTT"."CHILD"                             5.570 KB       4 rows
. . exported "SCOTT"."PARENT"                            5.546 KB       3 rows
Master table "SCOTT"."SYS_EXPORT_SCHEMA_02" successfully loaded/unloaded
******************************************************************************
Dump file set for SCOTT.SYS_EXPORT_SCHEMA_02 is:
  /u01/app/oracle/admin/NONCDB/dpdump/OLD_DATA.dmp
Job "SCOTT"."SYS_EXPORT_SCHEMA_02" successfully completed at Mon Mar 2 09:39:46 2015 elapsed 0 00:00:34

$

At this point we've either queried or extracted that critical old data, which was the point of the exercise, and we should immediately discard the restored database. Remember, it has data inconsistencies, possibly including in internal tables, and hence shouldn't be used for anything beyond querying or extracting that "old" data. Frequent crashes or other bizarre behavior of this restored database should be expected. So get in, get the data, get out, and get rid of it!
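For completeness, loading the extracted rows back into another database is a standard Data Pump import. Something along these lines should work, though the directory object, credentials, and options here are illustrative only - import into a scratch schema first and verify the rows before touching production tables:

```shell
# Hypothetical import of the extracted dump on the destination system.
# Paths and credentials are placeholders carried over from this demo.
impdp scott/tiger dumpfile=DATA_PUMP_DIR:OLD_DATA.dmp nologfile=y \
      tables=SCOTT.PARENT,SCOTT.CHILD table_exists_action=append
```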
Conclusion
If "desperate times call for desperate measures", and you're in the situation described in detail above - you need the data, are missing the necessary redo vectors, and are not concerned about the relevant data having been modified during the backup - then there are options. The "more redo needed for consistency" error stack should be familiar to most DBAs:

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent

And they may also be somewhat familiar with the "_allow_resetlogs_corruption" hidden initialization parameter. However, don't let the resulting ORA-00600 error make the recovery attempt seem unsuccessful:
ORA-00600: internal error code, arguments: [2663], [0], [551715], [0], [562781], [], [], [], [], [], [], []

This error can be overcome, and the database can likely still be opened so that the necessary data can be queried or extracted.

Note: this process has been tested with Oracle Database 10g, Oracle Database 11g, and Oracle Database 12c.