MySQL Backup Concepts for (Linux) System Administrators – Part 1
Apr 7, 2011 / By James McOrmond
Hello, My name is James McOrmond. I am a MySQL DBA (and part time SysAdmin) at the ‘The Pythian Group’ headquarters here in Ottawa.
In my previous employment I was the primary System Administrator for a local Linux OS software company (for over 8 years). Server backups were of course something I was doing as part of my job, but i’ve learned that the backups I was doing for the MySQL databases were probably not what a DBA would have set up.
In this blog post, I will cover 3 basic types of MySQL backups for stand-alone database systems. I will not be covering fancy GUI applications, or really complicated processes – just the basic concepts – which is what I think System Administrators (ie – non DBA’s) need to know to have a good/valid backup.
This is the most basic form of MySQL backup – and likely the one most system administrators have used in the past.
This program reads data from mysqld, and ‘dumps’ clear text to stdout which you can redirect to a file anywhere on the system, or directly to a file. The text output is actually a sequence of SQL “INSERT” commands which would be used to reload the data. Commonly I see the filenames containing portions of the current date, but you can do whatever you want based on your overall backup retention policies.
Various arguments can be given to mysqldump to include all databases on the system, just a subset of databases, or even limiting to specific tables in one database. If you’re looking for “quick and dirty” – the ‘–all-databases’ argument will get you a dump of all the databases on the system. If you have stored procedures or events, you’ll also want to make sure these are part of your dumps.
mysqldump --all-databases | gzip - >/mnt/backups/mysqlbackup-`date +%Y-%m-%d_%H.%M.%S`.gz
If your databases are large, it’s probably advisable to at least export each database schema into a different file. This would allow you to have different retention schedules for each database. It also makes restoring a single database (or table) easier – if for example someone deletes a single table. This of course never happens in the ‘real world’ right? :-)
While using mysqldump is definitely the “easiest” way to perform a backup, it also has a significant issue that make many serious DBA’s shutter if these are the only backups available.
The main problem of course relates to making sure the data is consistent across all tables within the same backup. If you don’t, one table may reference entries in another table that may have been modified (or even no longer exists) between the start and end of the backup process.
The most convenient way to enforce this is by using the ‘–lock-all-tables’ argument. “Unfortunately” this does exactly what it’s supposed to do – it puts a read only lock all tables (in all databases). That means no other process can modify data in the database until the backup is done. If your database has mostly reads – or if you can put your application into a read only or maintenance mode, this may be fine, but otherwise you will probably eventually notice several write statements in locked state – waiting for the backup to finish.
If your backup takes hours, having a read lock on the data may be quite inconvenient, and you should investigate another backup method.
2) File System Snapshots
File System snapshots (as the name suggests), happen at the File System layer, not within the database itself. Depending on your environment, you may already be doing snapshots to backup other parts of your environment.
If you are using a SAN for your storage, snapshotting is likely a built in feature. If not – you can use LVM snapshots at the OS level, as long as you have created the necessary logical volumes of the correct areas of your system.
As with mysqldump, you need to make sure the data is consistent in your backup, or else it may not be very useful to you when you try to do a restore, at 3am on a Saturday morning, with a customer on the phone ‘asking’ you when everything will be working again :-)
There are two kinds of flows typical with this type of backup:
perform a 'FLUSH TABLES" perform a 'FLUSH TABLES WITH READ LOCK" create the snapshot perform an 'UNLOCK TABLES'
At this point, you’re safe to mount the snapshot to a new location, and then copy the database files somewhere else – possibly using whatever method you’re using to backup the rest of the system (tar, rsync or a tool like bacula).
The interruption to the active database is fairly minimal here – only between the “FLUSH TABLES WITH READ LOCK”, and the “UNLOCK TABLES” commands. Performing the first “FLUSH TABLES” ahead of time makes sure the interruption is minimal.
Unfortunately, while this sounds like the perfect solution, some storage engines like InnoDB may essentially ignore the “FLUSH TABLES WITH READ LOCK”, so data may not be perfectly consistent on disk when the snapshot happens. It will be closer than just doing a mysqldump, but still not perfect.
All is not lost, as mysqld will actually do a “crash recovery” based on the various log files available to clean up the database as part of the startup process.
The best and easiest way to be absolutely sure the data is consistent for the snapshot, is to actually shut down the mysqld process entirely while the snapshot is being performed.
perform a 'FLUSH TABLES" shutdown the database create the snapshot restart the database
The outage in this case will be complete – no reads or writes will be possible at all. The duration of the outage will be quite short so this outage may be acceptable. It’s entirely possible it will be less than a minute from the database shutdown, to the restart.
Your application should still be put into a maintenance mode if possible, but it should be scheduled during an off time, and you may decide it is short enough that end users won’t notice.
xtrabackup is an open source tool available from Percona. Among other things, it is capable of performing backups of InnoDB databases – while the database is still running.
Percona provides yum/apt repositories for Red Hat/CentOS/Fedora, Debian and Ubuntu, as well a generic .tar.gz for everyone else. You should be able to get it installed on your chosen distribution fairly easily. The package will contain many different files most notably the xtrabackup binary itself, as well as the innobackupex-1.5.1 wrapper script.
This innobackupex wrapper script will save you a lot of time when you’re trying to get your backups done. In it’s simplest mode, it will take a single argument – the backup destination directory, and then it will do a fairly complete backup into that directory. If your my.cnf is not in /etc/ then you should use the –defaults-file argument.
Typically you’ll want these backups run through a program like tar, so the –stream option is your friend. This also allows you to compress the backups using a program like gzip, which is of course always a good idea.
innobackupex-1.5.1 --stream=tar /mnt/backups | gzip - > mysqlbackup.`date +%Y-%m-%d_%H.%M.%S`.tar.gz
To really have the backup at the fully consistent level we want, xtrabackup needs to be done in multiple stages. The first stage simply does the backup into a directory (which it will name based on the date), and the second one re-runs innobackupex with the –apply-log argument so it applies the updates. If you wish, you can also then create a compressed tarball of this backup directory.
These additional stages however don’t need to be done at the time of your backups, they can be done at the time of the restore which is likely far less often. It all depends on when you want your system spending the CPU cycles.
This blog is by no means a fully exhaustive list of the tools available, but they essentially cover one example of the 3 basic types – logical, cold and hot.
It is entirely valid and common to use more than one backup type. Often I have seen a logical backup along with one of the other two backup types.
Care should be taken with the security of your backups of course, and please remember that if you specific user name and password on the command line, that they can be viewed by any user logged into the system that can run commands like “ps” or “top”.
If you are using multiple servers in a master/slave configuration, command line arguments also exist in the different tools for recording relevant binlog positions and those definitely should be used.
…But that is a topic for another time.
Some relevant links
mysqldump documentation on mysql.com – http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html
Linux LVM HOWTO – http://www.tldp.org/HOWTO/LVM-HOWTO/
Percona Xtrabackup – http://www.percona.com/docs/wiki/percona-xtrabackup:start