Multi-terabyte migration using mydumper
In this post I will explain how to transfer a multi-terabyte database between two MySQL instances using mydumper, a logical backup and restore tool that works in parallel. I will also cover the case where you need to transfer only a subset of data. Big tables are often problematic because of the amount of time a mysqldump/restore cycle takes, as it is single threaded; it is not uncommon for this type of process to take several days. From MySQL 5.6 onwards we also have transportable tablespaces, but there are some limitations with that approach. Physical backups (e.g. xtrabackup) have the advantage of being faster, but there are scenarios where you still need/want a logical backup (e.g. migrating to RDS/Aurora/Google Cloud SQL).
Install mydumper/myloader

The first thing you'll need to do is get mydumper installed. The latest version at the time of this writing is 0.9.1 and you can get the rpm here. You can also compile it manually as follows (the instructions are for a RHEL-based system):
wget https://launchpad.net/mydumper/0.9/0.9.1/+download/mydumper-0.9.1.tar.gz
tar zxvf mydumper-0.9.1.tar.gz

You can also get the source from GitHub:
git clone https://github.com/maxbube/mydumper.git

Here's how to compile mydumper. This will put the binaries in /usr/local/bin/:
sudo yum install glib2-devel mysql-devel zlib-devel pcre-devel openssl-devel cmake gcc-c++
cmake .
make
make install
Export data using mydumper

By default, mydumper works by creating several parallel processes (a rule of thumb is one export process per core), each reading one table and writing that table's contents to its own file. In this case, I'm exporting the City and State tables:
./mydumper -t 8 -B world_innodb -T City,State

This is an improvement over traditional mysqldump, but we can still do better. mydumper can split each table into chunks (e.g. 100k rows) and write each chunk to a separate file, allowing the import to be parallelized later on. I have omitted the -T argument here, so all tables from world_innodb will be exported.
./mydumper -t 8 --rows=100000 -B world_innodb

If you are exporting InnoDB tables only, it makes sense to use --trx-consistency-only so the tool uses less locking. You will still get the binlog file/position needed to seed a slave.
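To get a feel for what --rows buys you at import time, here's a quick back-of-envelope calculation (the table size is a made-up example):

```shell
# With --rows=100000, a table is split into roughly ceil(rows / 100000)
# files, and myloader can later insert those chunks in parallel.
TABLE_ROWS=2500000   # hypothetical table size
CHUNK_ROWS=100000
CHUNKS=$(( (TABLE_ROWS + CHUNK_ROWS - 1) / CHUNK_ROWS ))
echo "$CHUNKS chunk files"   # 25 chunk files
```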
./mydumper -t 8 --rows=100000 --trx-consistency-only -B world_innodb

You can also specify a regular expression to export only some databases, say db1 and db2.
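Before kicking off a long export, you can sanity-check the pattern against your schema names outside of mydumper. mydumper uses PCRE, but for a simple anchored alternation like this, grep -E matches the same way (the schema list here is hypothetical):

```shell
# Show which schema names the filter would keep. Note that '^(db1|db2)'
# is a prefix match, so a schema named db10 would also be exported.
MATCHED=""
for db in db1 db2 db3 mysql; do
  if echo "$db" | grep -qE '^(db1|db2)'; then
    MATCHED="$MATCHED $db"
    echo "export: $db"
  fi
done
# export: db1
# export: db2
```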
./mydumper -t 8 --rows=100000 --trx-consistency-only --regex '^(db1|db2)' -B world_innodb

Other options include the ability to compress the exported data on the fly, and to export triggers, stored routines and events. Finally, I also recommend the --verbose option for added visibility into what each thread is doing. Here is an example of the complete command:
./mydumper -t 8 \
  --rows=100000 \
  --regex '^(db1|db2)' \
  --compress \
  --triggers \
  --routines \
  --events \
  -v 3 \
  --trx-consistency-only \
  -B world_innodb \
  --outputdir /data/export \
  --logfile /data/mydumper.log

While running multiple threads, I noticed some contention related to the adaptive hash index (this was on 5.5; I haven't tested whether it happens on other versions as well). Disabling AHI can have an impact on read queries, so if you can afford having the host out of production while the export is running, I recommend disabling AHI temporarily. It is probably a good idea to run the export on a slave, as you will be hitting it hard with reads. Also, if you keep the slave SQL thread stopped, the export will be much faster; in my case it ran up to 4x faster.
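A minimal sketch of toggling AHI around the export window. innodb_adaptive_hash_index is a dynamic variable, so no restart is needed; the mysql calls are commented out and assume client credentials are already configured, so treat this as a template rather than something to paste blindly:

```shell
# Build the statements first so they can be reviewed before running them.
AHI_OFF="SET GLOBAL innodb_adaptive_hash_index = OFF;"
AHI_ON="SET GLOBAL innodb_adaptive_hash_index = ON;"
echo "$AHI_OFF"

# mysql -e "$AHI_OFF"    # disable AHI right before the export
# ./mydumper ...         # run the export as shown above
# mysql -e "$AHI_ON"     # restore AHI once the export finishes
```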
Import data using myloader

The load phase is the most painful part, usually taking far longer than the export. If you run mydumper with the --rows option as described above, several myloader threads can insert concurrently into the same table, speeding up the process. Otherwise, you only get multiple tables imported in parallel, which helps, but the benefit shrinks if you have a handful of huge tables among mostly small ones. Keep in mind, though, that a single-threaded import can insert rows in primary key ascending order, which optimizes disk space; running multiple insert threads on a single table makes the row distribution less optimal, potentially using significantly more disk space.

Another way to reduce import time is to temporarily relax consistency by setting innodb_flush_log_at_trx_commit=0 and sync_binlog=0. Also set query_cache_type=0 and query_cache_size=0 to prevent the query cache mutex from being used. You can control myloader transaction size with the --queries-per-transaction parameter. The default value (1000) produced really big transactions; I had better results after reducing it to 100. The -o option will drop the tables on the destination database if they already exist. Here is an example of the command (the rule of thumb in this case is import threads = cores / 2):
myloader --threads=4 -d /data/export/ -B db1 -q 100 -o -v 3

NOTE: myloader sets sql-log-bin=0 for the import session by default, so make sure to override that (the option is -e) if you have any slaves down the chain.
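If the destination will replicate from the source, mydumper records the binlog coordinates in a metadata file inside the output directory. Here is a hedged sketch of turning that into a CHANGE MASTER TO statement; the sample file below mimics the layout I have seen, so verify it against the metadata your mydumper version actually produces:

```shell
# Fake a metadata file for illustration; a real one lives in --outputdir.
cat > /tmp/metadata <<'EOF'
Started dump at: 2015-10-01 12:00:00
SHOW MASTER STATUS:
	Log: mysql-bin.000123
	Pos: 456789
EOF

# Pull out the binlog file and position recorded at export time.
MASTER_LOG_FILE=$(awk '/Log:/ {print $2; exit}' /tmp/metadata)
MASTER_LOG_POS=$(awk '/Pos:/ {print $2; exit}' /tmp/metadata)
echo "CHANGE MASTER TO MASTER_LOG_FILE='$MASTER_LOG_FILE', MASTER_LOG_POS=$MASTER_LOG_POS;"
# CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=456789;
```

Run the emitted statement (plus host and credential options) on the destination once the import finishes to seed it as a slave.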