Using gh-ost for online schema changes
Visit the download page at https://github.com/github/gh-ost/releases/latest to get the link for the latest version to use below.
wget https://github.com/github/gh-ost/releases/download/v[VERSION]/gh-ost-binary-linux-[VERSION].tar.gz
tar -xvf gh-ost-binary-linux-[VERSION].tar.gz

I tested on bare-bones CentOS 7 servers and did not find that any other software needed to be installed. There are just a few changes to make on the database.

Add these permissions:

master> grant ALTER, CREATE, DELETE, DROP, INDEX, INSERT, LOCK TABLES, SELECT, TRIGGER, UPDATE on *.* to ghost@'192.168.56%' identified by 'xxx';
master> grant super, replication slave on *.* to ghost@'192.168.56%';

gh-ost requires ROW binlog format on the node it reads binlogs from. You don't have to set it explicitly when running from a replica, because gh-ost will set it for you (via --switch-to-rbr), but note the requirement still stands:

slave> set global binlog_format=ROW;

You do need to set it yourself when running on a master; otherwise you will receive an error message like this:

2016-09-01 16:07:29 FATAL 192.168.56.68:3306 has STATEMENT binlog_format, but I'm too scared to change it to ROW because it has replicas. Bailing out

Notes about the tables created during the process (here, for a table named t1):

_t1_ghc is the record of gh-ost's own changes, not data change capture
_t1_gho is the ghost table: an initial copy of the data, which then receives changes replayed from the binlogs
_t1_del is the cut-over table; unless you specify otherwise, it remains after the process is complete, as a backup
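Before kicking off a migration, it's worth sanity-checking the grants and binlog format above. A minimal sketch, assuming the example host and credentials from the grant statements:

# Pre-flight checks, using the example host and ghost user from above.
# The node gh-ost reads binlogs from should report ROW here (or you'll
# rely on --switch-to-rbr to change it):
mysql -h 192.168.56.144 -u ghost -pxxx -e "SHOW GLOBAL VARIABLES LIKE 'binlog_format';"

# Confirm the grants landed as expected:
mysql -h 192.168.56.144 -u ghost -pxxx -e "SHOW GRANTS FOR CURRENT_USER();"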
Sample statements for various setups are below. A few brief notes on usage first:

As mentioned above, the tool is flexible about where you run it from; the numbered setups below cover those locations.
Drop the --execute flag from the statements below to do a dry run.
Using --exact-rowcount causes a select count(*), and it doesn't make the percentage-complete output meaningfully more accurate. In a federated setup, select count(*) can crash the server, so I'd avoid this flag unless you've tested it well for your setup. In any case, the percentage complete and estimated time to completion will not be very accurate even with an exact row count; this is especially true on a cluster with heavy writes.
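To see the cost difference for yourself, compare a cheap statistics-based estimate with the full count(*) the flag triggers. A minimal sketch using the example schema and table from the commands below (the estimate shown is just an illustration, not necessarily gh-ost's internal method):

# Approximate row count from table statistics -- cheap, no table scan:
mysql -h 192.168.56.144 -u ghost -pxxx -e "SELECT table_rows FROM information_schema.tables WHERE table_schema='mysqlslap' AND table_name='t1';"

# Exact count -- a full scan, which is the cost --exact-rowcount incurs:
mysql -h 192.168.56.144 -u ghost -pxxx -e "SELECT COUNT(*) FROM mysqlslap.t1;"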
1. Running gh-ost on a replica, with changes going to the master and all replicas

This is the preferred method, because it avoids load on the master. The tool detects the master from a "show slave status" statement, which causes obvious issues with non-native replication such as Tungsten; see more on this at https://github.com/github/gh-ost/issues/212.

./gh-ost \
  --max-load=Threads_running=25 \
  --critical-load=Threads_running=1000 \
  --chunk-size=1000 \
  --throttle-control-replicas="192.168.56.144" \
  --max-lag-millis=1500 \
  --user="ghost" \
  --password="ghost" \
  --host=192.168.56.144 \
  --database="mysqlslap" \
  --table="t1" \
  --verbose \
  --alter="add column whatever3 varchar(50)" \
  --switch-to-rbr \
  --allow-master-master \
  --cut-over=default \
  --default-retries=120 \
  --panic-flag-file=/tmp/ghost.panic.flag \
  --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
  --initially-drop-ghost-table \
  --execute
2. Test on a replica, without affecting the master or other replicas

See more at https://github.com/github/gh-ost/blob/master/doc/testing-on-replica.md.

./gh-ost \
  --test-on-replica \
  --max-load=Threads_running=25 \
  --critical-load=Threads_running=1000 \
  --chunk-size=1000 \
  --throttle-control-replicas="192.168.56.20" \
  --max-lag-millis=1500 \
  --user="ghost" \
  --password="ghost" \
  --host=192.168.56.20 \
  --database="mysqlslap" \
  --table="t1" \
  --verbose \
  --alter="add column charcol6 varchar(25)" \
  --switch-to-rbr \
  --cut-over=default \
  --default-retries=120 \
  --panic-flag-file=/tmp/ghost.panic.flag \
  --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
  --execute
3. Migrate on the replica only (perhaps used with the "shell game" method)

./gh-ost \
  --migrate-on-replica \
  --max-load=Threads_running=25 \
  --critical-load=Threads_running=1000 \
  --chunk-size=1000 \
  --throttle-control-replicas="192.168.56.20" \
  --max-lag-millis=1500 \
  --user="ghost" \
  --password="ghost" \
  --host=192.168.56.20 \
  --database="mysqlslap" \
  --table="t1" \
  --verbose \
  --alter="add column charcol6 varchar(25)" \
  --switch-to-rbr \
  --cut-over=default \
  --default-retries=120 \
  --panic-flag-file=/tmp/ghost.panic.flag \
  --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
  --execute
4. Run from the master, with changes replicating down to the slaves as configured within MySQL

./gh-ost \
  --allow-on-master \
  --max-load=Threads_running=25 \
  --critical-load=Threads_running=1000 \
  --chunk-size=1000 \
  --throttle-control-replicas="192.168.56.144" \
  --max-lag-millis=1500 \
  --user="ghost" \
  --password="ghost" \
  --host=192.168.56.145 \
  --database="mysqlslap" \
  --table="t1" \
  --verbose \
  --alter="add column whatever6 varchar(50)" \
  --cut-over=default \
  --default-retries=120 \
  --switch-to-rbr \
  --panic-flag-file=/tmp/ghost.panic.flag \
  --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
  --execute
Sample output:

Here is some sample status output. You can see the automatic throttling when lag exceeds my setting of --max-lag-millis. Adding the --debug flag gives even more output.
Copy: 8000/11966 66.9%; Applied: 0; Backlog: 0/100; Time: 28s(total), 28s(copy); streamer: mysqld-bin.000075:73975014; ETA: 13s
Copy: 8000/11966 66.9%; Applied: 0; Backlog: 0/100; Time: 29s(total), 29s(copy); streamer: mysqld-bin.000075:73975014; ETA: 14s
Copy: 8000/11966 66.9%; Applied: 0; Backlog: 0/100; Time: 30s(total), 30s(copy); streamer: mysqld-bin.000075:73975014; ETA: throttled, 192.168.56.20:3306 replica-lag=2.000000s
Copy: 9000/11966 75.2%; Applied: 0; Backlog: 0/100; Time: 31s(total), 31s(copy); streamer: mysqld-bin.000075:84539586; ETA: throttled, lag=2.070653s
Copy: 9000/11966 75.2%; Applied: 0; Backlog: 0/100; Time: 32s(total), 32s(copy); streamer: mysqld-bin.000075:84539586; ETA: 10s
Copy: 9000/11966 75.2%; Applied: 0; Backlog: 0/100; Time: 33s(total), 33s(copy); streamer: mysqld-bin.000075:84539586; ETA: throttled, 192.168.56.20:3306 replica-lag=2.000000s
Copy: 9000/11966 75.2%; Applied: 0; Backlog: 0/100; Time: 34s(total), 34s(copy); streamer: mysqld-bin.000075:95104741; ETA: throttled, lag=2.070911s
Copy: 10000/11966 83.6%; Applied: 0; Backlog: 0/100; Time: 35s(total), 35s(copy); streamer: mysqld-bin.000075:95106041; ETA: 6s
Copy: 10000/11966 83.6%; Applied: 0; Backlog: 0/100; Time: 36s(total), 36s(copy); streamer: mysqld-bin.000075:95106041; ETA: 7s
Copy: 10000/11966 83.6%; Applied: 0; Backlog: 0/100; Time: 37s(total), 37s(copy); streamer: mysqld-bin.000075:95106041; ETA: throttled, 192.168.56.20:3306 replica-lag=2.000000s

The "downtime" for this tool is during the rename. Here is sample output during a heavy read and write load; the timing is similar to other online schema change tools.

2016-08-22 11:56:06 INFO Lock & rename duration: 3.02072298s. During this time, queries on `sbtest5` were blocked
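Since the rename is the only blocking step, the --postpone-cut-over-flag-file and --panic-flag-file flags used in the commands above give you control over it. While the postpone file exists, gh-ost keeps synchronizing but holds off on cut-over; removing it triggers the cut-over. Creating the panic file terminates the migration immediately, without cleanup:

# Trigger the cut-over at a moment of your choosing:
rm /tmp/ghost.postpone.flag

# Or abort the migration immediately (no cleanup is performed):
touch /tmp/ghost.panic.flag

You can also send "unpostpone" over the socket interface described below to the same effect as removing the postpone file.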
Operational controls:

In addition to not using triggers (and avoiding their related load), a great feature of gh-ost is its operational controls. You can change the configuration while the process is running, and throttle it (that is, pause the process so it generates no load).
Run operational commands against the gh-ost*.sock file created in the /tmp directory. The file is named according to the change you're making, which keeps multiple concurrent changes distinct.

Manually throttle (pause) the process via the command-line interface:

echo throttle | nc -U /tmp/gh-ost.mysqlslap.t1.sock

Resume a throttled process:

echo no-throttle | nc -U /tmp/gh-ost.mysqlslap.t1.sock

Change configuration while the process is running, for example:

echo "chunk-size=250" | nc -U /tmp/gh-ost.mysqlslap.t1.sock

View the status of the process (useful, for example, if you started the process in screen and don't want to reattach). "sup" is brief and "status" is detailed:

echo sup | nc -U /tmp/gh-ost.mysqlslap.t1.sock
echo status | nc -U /tmp/gh-ost.mysqlslap.t1.sock
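If you send these commands often, a small wrapper saves typing. This is just a hypothetical convenience script (not part of gh-ost), assuming the default /tmp socket naming shown above:

#!/bin/bash
# ghost-ctl.sh -- hypothetical helper for the gh-ost socket interface.
# Usage: ./ghost-ctl.sh <schema> <table> <command>
#   e.g. ./ghost-ctl.sh mysqlslap t1 sup
#        ./ghost-ctl.sh mysqlslap t1 chunk-size=250
schema=$1
table=$2
shift 2
echo "$@" | nc -U "/tmp/gh-ost.${schema}.${table}.sock"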