THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

Viewing RMAN jobs status and output

Yesterday I was discussing with a fellow DBA about ways to check the status of existing and/or past RMAN jobs. Good backup scripts usually write their output to some sort of log file so, checking the output is usually a straight-forward task. However, backup jobs can be scheduled in many different ways (crontab, Grid Control, Scheduled Tasks, etc) and finding the log file may be tricky if you don’t know the environment well.
Furthermore, log files may also have already been overwritten by the next backup or simply just deleted. An alternative way of accessing that information, thus, may come handy.

Fortunately, RMAN keeps the backup metadata around for some time and it can be accessed through the database’s V$ views. Obviously, if you need this information because your database just crashed and needs to be restored, the method described here is useless.

Read the rest of this entry . . .

MacGyvering a Remote Disk Usage Utility

For my backups at home I have a DNS-323 which, after a minimal twist of the arm was applied, can be accessed via ‘ssh’ and ‘rsync’. Methodology-wise, I’m following the general idea described in this forum. Namely: for each of my machines I have one backup directory per week of the year (01, 02, 03, etc) and — and this is the über-cool part of it — rsync‘s ‘--link-dest‘ is used to hard-link files that didn’t change since the previous backup run. This means that I can have weekly snapshots of all my machines at the fraction of the space a full backup would take. Very nice.

But, just like gas in a vacuum, files have a nasty habit to fill up all available disk space. In my case, the 100% mark was hit last week. Obviously, I had to delete stuff. But… which stuff?

Sure, I have access to du on the DNS-323 to find out. But du’ing 350G worth of disk space is no fun. And there is the matter of the hard-links that further complicates things. Even though the backup directory ‘enkidu/03′ weights 5G, if 95% of its content is hard-linked by other weeks, deleting it won’t give me back much. To properly get the job done, what I really want is a way to get a snapshot of the disk usage locally, and in a format that would allow me to navigate and examine it at will.

Read the rest of this entry . . .

Managing production systems: document, test, verify, try again.

A couple of days ago I was reading a paper Paxos Made Live – An Engineering Perspective written by Google engineers. It is an interesting reading about implementation of Paxos algorithm for building a fault-tolerant database. But one paragraph made me think I am reading something very familiar:

We decided to err on the side of caution and to rollback our system to the old version of Chubby (based on 3DB) in one of our data centers. At that point, the rollback mechanism was not properly documented (because we never expected to use it), its use was non-intuitive, the operator performing the roll-back had no experience with it, and when the rollback was performed, no member of the development team was present. As a result, an old snapshot was accidentally used for the rollback. By the time we discovered the error, we had lost 15 hours of data and several key datasets had to be rebuilt.

This really looked like one of the incident notification we at Pythian send to our customers in case of production outage or any other significant issue. Don’t get me wrong, I am not saying: “Look, big guys, like Google, have problems too!”. The point here is that when you manage production environment of any scale, whether it is a multi-terabyte heavily loaded system, or a “one database” website, you face similar organizational problems. This short paragraph points to some very important questions you should be asking yourself everyday if you are in charge of a production system.

  • Are all of your standard procedures, like backup/restore well documented? You never know who will be dealing with issues when thunder strikes.
  • Do you have a proper escalation procedures, so every production support team member knows where to look for help, in case he is stuck or in doubt?
  • Do you crosscheck work done by others? This can help you catch things like wrong backup used for restore, suggest a way to improve one’s work, or learn something new from your colleagues and make existing process better.
  • Stop trusting your own procedures. Test and verify them from time to time. Things tend to change, sometimes unnoticed. So if you haven’t tried to restore your backup for 3 months, you can’t really be sure it works.
  • When Was Your Last Disaster Recovery Test?

    If you answer anything else but something like “last month and every month before that”, then you are probably in troubles. Learn from Wikipedia’s Data Center Overheating.

    It doesn’t mean that they didn’t regularly test their disaster recovery process. Maybe they did but the failover mechanism was broken after the last test.

    A regular DR procedure validation is designed to minimize the risk of a broken process to go unnoticed. If the failure is detected during a regular switchover process, you are prepared to handle it way better (or potentially just leave services on the currently primary site) than during emergency failover when you get to the “Oh shit!” moment under the tremendous pressure to get services back.
    Read the rest of this entry . . .

    Let Your Oracle Backups Be Up in the Clouds

    There are a lot of discussions going on in the Internet regarding whether we can trust third parties to look after our data. I am not going to add fuel to the fire. I am going to show you how simple it is to backup an Oracle database to the Storage Cloud using the Oracle Secure Backup (OSB) Cloud Module.

    First, you need to sign up for an Amazon S3 account and get an Access Key ID and a Secret Access Key. Check AWS pricing first.

    Next, download the OSB Cloud Module and unzip it. Read the rest of this entry . . .

    Taste test: Innobackup vs. Xtrabackup

    Firstly, I have to thank my co-workers Singer Wang and Gerry Narvaja for doing a lot of the work that resulted in this comparison.

    After running both InnoDB Hot Backup and Xtrabackup, we have found that there is a measurable but not large difference between the resources that Xtrabackup and InnoDB Hot Backup consume.

    Xtrabackup:

    • Free
    • takes 1.1% longer (2 min during a 3 hour backup)
    • uses 1.4% more space (1G more in a 70G backup — this was for uncompressed backups)
    • uses 1.115% more cpu overall
    • split as 0.12% user, 0.66% nice, 0.025% system, 0.31% more iowait, 0% more steal
    • Read the rest of this entry . . .

    Easier SQL Server Database Restores

    Have you ever been asked to restore a database and wondered which backup files were available? And if many were available, which ones you should use, and in which order?

    Getting familiar with the msdb schema, especially the backupset and backupmediafamily tables, helps to answer that question since all the backup history is stored in that database. But you would still have to check if the backups are still available on disk (tape?) and figure out what are the commands to restore, and the order in which to restore each file.

    Okay, this is not a difficult thing to do, but when you do it over and over again, it becomes tedious, and the automation bug in you starts to look for a better and quicker way to handle it. The bug in me found the following answer.

    Read the rest of this entry . . .

    Concerns and What Does Not Work in XtraDB Backup

    A short time ago I posted how I was Using XtraDB Backup to backup InnoDB. Overall, the blog post was positive, but experiences that others have had (and commented to on that blog post) have made me want to put out another short article about using XtraDB backup.

    The first few points remain the same — the backup process is stable, we were able to use the binaries without compiling, and using Innobackupex as the wrapper script, analogous to Innobackup.pl.

    However, we did figure out why Xtrabackup had to be run as the mysql user:

    Xtrabackup writes to the data dictionary file (ibdata1, for example). We have not examined if it also writes to the data and index files (still ibdata1 by default, or the .ibd files when using innodb_file_per_table). [EDIT: The authors of Xtrabackup have commented below as to why the write occurs:

    xtrabackup is kind of very small and restricted instance of InnoDB, and reuses a lot of InnoDB code.

    InnoDB by default requires O_RDWR option on ibdata1 files at start, and xtrabackup therefore also did that. In the latest push to trunk it was fixed, now xtrabackup opens file with O_RDONLY flag.

    When the new version is released, we will be sure to test it so that we can run the backup as a read-only user, and report back.]

    On the one hand, Xtrabackup is a free tool. On the other hand, modifying InnoDB’s underlying files risks corrupting all the InnoDB tables in the system. Which is a tricky situation when it is your backup tool that might cause corruption that is beyond repair, as you do not know if you can trust your backups.
    Read the rest of this entry . . .

    Log Buffer #150

    This is the 150th edition of Log Buffer, the weekly review of database blogs. Someone accidentally left Dave Edwards’ cage unlocked, and he escaped, thus leaving me with the pleasurable duty of compiling the 150th weekly Log Buffer.

    Many people other than Dave are finding release this week. Read the rest of this entry . . .

    Using Xtradb Backup

    As Baron points out, Percona’s Xtrabackup tool can be used just like InnoDB Hot Backup.

    Many are wondering, “is it good enough?” In fact, I wondered the same thing, and after a few weeks of using and testing Xtrabackup (on machines that have MyISAM and InnoDB tables), I can say:

    0) We have not run into any problems with backing up InnoDB tables; on the machines we’ve run it on (RHEL 5.3 and Debian) it has been stable.

    1) We were able to use one of the binaries provided by Percona — we did not have to compile anything. Binaries are provided for 64-bit versions of Linux

    2) Innobackupex is the script analogous to Innobackup. Basically, this is a wrapper script for Xtrabackup that will copy the non-InnoDB tables.

    3) Usually we run InnoDB Hot Backup from a non-privileged user. We had to run Xtrabackup as the mysql user; we think this is because it writes a state file in the datadir, but we did not dig too hard — once we saw that running it from the mysql user worked, we just left it like that.

    4) Xtrabackup does not have a –compress option like InnoDB hot backup does. Having the option would be useful, but I believe this option will come soon.

    Currently, for clients who want a hot backup solution but do not want to pay for InnoDB Hot Backup, Pythian recommends Xtrabackup.

    Start NowWith Pythian - database design, management and emergency handling capabilities...

    Live Updates

    pythian: RT @pythianfielding: My #ukoug2011 #Exadata IORM presentation starts in a few mins in hall 7A
    more



    Testimonials

    • Serge Racine

      DBA, Brookfield Energy

      We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more