Failure Scenarios for your Restore Practice

Dec 9, 2010 / By Gwen Shapira

Tags: , ,

This post was inspired by Maris Elsins, who tweeted:  “Someone gzipped the redo logs of a running no-backups test DB. It was a nice surprise when I tried to start it up after shut immediate :D” and then “So the quiz: How do I recover from this situation?”

I solved it with ease, because 3 years ago our encrypted storage solution corrupted the keys for the volume with the redo logs. While our SAs were busy fixing the storage, I was responsible for the restore strategy.  Once you solve this problem once, it is easy to know how to restore from similar scenarios.

I always tell DBAs to practice their restore skills because this is one area that you are not allowed to get wrong.  Having experience with different types of restores give you the confidence to do the right thing in an emergency. However, I noticed that when it comes to practicing, not all DBAs are equally imaginative on what to practice. Practicing the same failure over and over is not the best possible practice.

Here are few scenarios you should be able to recover from:

1. Solve Maris’s problem: Your colleague just gzipped the redo logs of a DB without backups and restarted. What would you do? How much data did you lose? Would backups make any difference in this case? What if he zipped a data file? system data file? undo file? temp file?

2.You are using local storage (SSD + ASM). You just had horrible hardware failure. Your sysadmin gives you a new server, oracle software is installed and asks you to restore the data there.
How much data did you just lose? What should you do to avoid or minimize data loss? What if you don’t have an RMAN catalog?

3. Your last backup is from yesterday. 6 hours ago you added a new datafile to one of the tablespaces. Repeat scenario #1 in this case.

4. Something horrible happened to ASM storage on your RAC 2-node database. You are asked to restore the data on a standalone server.

5. Your colleague just deleted ORACLE_HOME. You were using ASM, and this was also ORACLE_HOME for your ASM.

Bonus question:

6. You are running bi-directional streams for one schema out of many. One of your colleagues dropped a table on a schema that isn’t replicated and exists only on the NY database. You are asked to do PITR to get the lost table back. How do you get streams to work again after running PITR only on one of the sides of bi-directional streams?

Any ideas for additional practice?

2 Responses to “Failure Scenarios for your Restore Practice”

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>