Emergency Recovery or RMAN DUPLICATE Rerun Just Works!

Sep 12, 2010 12:00:00 AM

Working as a DBA often means facing high-pressure situations, especially during an on-call weekend. Recently, I was tasked with an emergency 700GB database clone to troubleshoot a critical functional issue. The environment was Oracle 10.2.0.5 (10gR2) 64-bit on Linux.

To perform the clone, I used the following RMAN commands:

export NLS_DATE_FORMAT='YYYY-MM-DD:HH24:MI:SS'
rman target / nocatalog auxiliary sys/syspwd@TARGET log=dup_TARGET.log

run {
  set until time "to_date('10-JAN-2010 23:05:25', 'DD-MON-YYYY HH24:MI:SS')";
  allocate auxiliary channel c1 type disk;
  allocate auxiliary channel c2 type disk;
  duplicate target database to TARGET;
}
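If you clone like this repeatedly, you can generate the run block from a few parameters instead of editing the recovery point and channel list by hand under pressure. The following is a minimal Python sketch and was not part of the procedure I used. `build_duplicate_script` is a hypothetical helper that only produces the same RMAN text shown above. You would still paste the result at the RMAN prompt, or save it to a file and pass it with RMAN's `cmdfile` option.

```python
from datetime import datetime


def build_duplicate_script(db_name: str, until: datetime, channels: int = 2) -> str:
    """Return an RMAN run block like the one above (hypothetical helper)."""
    if channels < 1:
        raise ValueError("at least one auxiliary channel is required")
    # %b gives the English month abbreviation under Python's default C locale;
    # RMAN then parses it with the explicit DD-MON-YYYY mask below.
    until_str = until.strftime("%d-%b-%Y %H:%M:%S").upper()
    lines = [
        "run {",
        f"  set until time \"to_date('{until_str}', 'DD-MON-YYYY HH24:MI:SS')\";",
    ]
    # One auxiliary disk channel per requested stream: c1, c2, ...
    for i in range(1, channels + 1):
        lines.append(f"  allocate auxiliary channel c{i} type disk;")
    lines.append(f"  duplicate target database to {db_name};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the run block used for this clone.
    print(build_duplicate_script("TARGET", datetime(2010, 1, 10, 23, 5, 25)))
```

Because the `to_date` call carries its own format mask, the generated recovery point does not depend on how `NLS_DATE_FORMAT` is set in the shell.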

The Skeptic’s View of the DUPLICATE Command

Historically, I haven't been a proponent of the DUPLICATE command. My hesitations were rooted in three main concerns:

  • Reliability and Transparency: If a command-by-command manual restore fails, you can simply fix the issue and resume from the last failed step. With DUPLICATE, the process feels like a "black box": if it fails, do you have to restart the entire multi-hour process?
  • Production Isolation: DUPLICATE requires a connection to the production database. In many high-security environments, we aim to isolate Production from Dev/Test to avoid any unnecessary performance overhead or security risks.
  • Infrastructure Requirements: The target (auxiliary) host must be able to read the source backups at the same paths the source database uses, which isn't always feasible.
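You can check the infrastructure requirement before starting a multi-hour DUPLICATE. The following is a minimal sketch. It assumes you have already exported the backup piece paths from the source into a text file with one path per line, for example the HANDLE column of V$BACKUP_PIECE. The file name and the helper `missing_backup_pieces` are hypothetical. Run the script as the oracle OS user on the clone host. Any path it reports is a piece that this user cannot read there.

```python
import os
import sys
from typing import Iterable, List


def missing_backup_pieces(paths: Iterable[str]) -> List[str]:
    """Return backup piece paths that are not readable files on this host."""
    missing = []
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # ignore blank lines in the exported list
        # The clone host must see each piece at the same path the source recorded.
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            missing.append(path)
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_pieces.py pieces.txt
    with open(sys.argv[1]) as f:
        unreadable = missing_backup_pieces(f)
    for p in unreadable:
        print("not readable on this host:", p)
    print(f"{len(unreadable)} piece(s) missing")
```

`os.access` checks permissions for the real user running the script. That is why it matters which OS account runs the check.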

However, in this specific case, our NAS storage was centralized and shared across all hosts, making the infrastructure ideal for the DUPLICATE process. It was time to put old habits aside and embrace modern best practices.

When Disaster Strikes: The RMAN-03002 Error

After four hours of processing, I was met with a dreaded error stack. Under the pressure of management and tight timelines, seeing "Failure of Duplicate Db command" is a nightmare scenario.

Analyzing the Error Stack

At first glance, the errors suggested missing data in the backup sets:

ORA-19615: some files not found in backup set
ORA-19613: datafile 50 not found in backup set

However, a closer look at the full log revealed the true culprit, a simple lack of disk space on the target node:

ORA-19502: write error on file "/u03/TRGD03/db/apps_st/data/undotbs1.362.667992025"
Linux-x86_64 Error: 28: No space left on device
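The first errors in the stack pointed at the wrong cause, so it pays to list every error code in the log before acting. The following small Python sketch assumes the RMAN log is a plain text file, such as dup_TARGET.log from the command above. The helper name and the patterns are my own hypothetical choices, not something RMAN provides. The sketch groups lines by ORA-/RMAN- code and also picks up operating-system error lines, which carry no Oracle prefix and are easy to overlook.

```python
import re
import sys
from typing import Dict, Iterable, List

# Oracle and RMAN message codes, e.g. ORA-19502 or RMAN-03002.
CODE_RE = re.compile(r"\b(?:ORA|RMAN)-\d{5}\b")
# Operating-system error lines such as "Linux-x86_64 Error: 28: No space left on device".
OS_ERROR_RE = re.compile(r"Linux(?:-x86_64)? Error: (\d+):")


def summarize_errors(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by error code, in the order each code first appears."""
    found: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        codes = CODE_RE.findall(line)
        os_match = OS_ERROR_RE.search(line)
        if os_match:
            codes.append(f"OS-{os_match.group(1)}")
        for code in codes:
            found.setdefault(code, []).append(line)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py dup_TARGET.log
    with open(sys.argv[1]) as f:
        summary = summarize_errors(f)
    for code, hits in summary.items():
        print(f"{code}: {len(hits)} line(s); first: {hits[0]}")
    if "OS-28" in summary:
        # errno 28 is ENOSPC on Linux: the filesystem being written to is full.
        print("errno 28 (No space left on device) found: check free space first")
```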

The "Rerun" Miracle: Validating Reliability

This failure presented the perfect opportunity to test the "rerun" capabilities of the DUPLICATE command in 10gR2. I cleared the necessary disk space and executed the exact same command again.
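Before rerunning, it is worth confirming that the destination filesystems can hold what is left to restore. The following is a minimal sketch using Python's `shutil.disk_usage`. The mount points and required byte counts are placeholders you would fill in yourself, for example from the datafile sizes on the source. The 10% headroom factor is an arbitrary assumption, and `short_filesystems` is a hypothetical helper.

```python
import shutil
from typing import Dict, List


def short_filesystems(required: Dict[str, int], headroom: float = 0.10) -> List[str]:
    """Return mount points whose free space is below the required bytes plus headroom."""
    short = []
    for mount, need in required.items():
        # Raises FileNotFoundError if the path does not exist on this host.
        free = shutil.disk_usage(mount).free
        # Keep some slack beyond the bare requirement (assumed 10% by default).
        if free < need * (1 + headroom):
            short.append(mount)
    return short
```

For instance, `short_filesystems({"/u03": 50 * 1024**3})` returns `["/u03"]` when /u03 has less than about 55 GiB free, and an empty list otherwise.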

The results were impressive. Instead of starting from scratch, RMAN recognized the files that had already been successfully restored. The alert log showed that files previously taking hours were now "restored" in minutes:

Sun Sep 12 04:28:34 EDT 2010
Full restore complete of datafile 66 to datafile copy ... Elapsed time: 0:03:47
Sun Sep 12 04:29:30 EDT 2010
Full restore complete of datafile 94 to datafile copy ... Elapsed time: 0:04:43
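To see how quickly each datafile completed on the rerun, you can pull the "Full restore complete ... Elapsed time" lines out of the alert log. The following small sketch assumes the message format shown above and simply skips whatever appears between the datafile number and the elapsed time. The function name is hypothetical.

```python
import re
from datetime import timedelta
from typing import Dict, Iterable

# e.g. "Full restore complete of datafile 66 to datafile copy ... Elapsed time: 0:03:47"
RESTORE_RE = re.compile(
    r"Full restore complete of datafile (\d+)\b.*?Elapsed time: (\d+):(\d{2}):(\d{2})"
)


def restore_times(lines: Iterable[str]) -> Dict[int, timedelta]:
    """Map each datafile number to its reported restore elapsed time."""
    times: Dict[int, timedelta] = {}
    for line in lines:
        m = RESTORE_RE.search(line)
        if m:
            file_no, h, mi, s = (int(g) for g in m.groups())
            times[file_no] = timedelta(hours=h, minutes=mi, seconds=s)
    return times
```

For the two log lines above, this yields 3 min 47 s for datafile 66 and 4 min 43 s for datafile 94.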

Final Result

The rerun completed in just 90 minutes. RMAN restored the remaining files, applied the archive logs, recreated the control file, and opened the database with RESETLOGS.

Conclusion

The DUPLICATE command proved itself to be reliable, transparent, and incredibly efficient. By skipping already-restored files during a rerun, it saved me nearly six hours of work during an emergency. For those running Oracle 10.2.0.5, this command is a robust tool that adheres to modern best practices.
