gh-ost with Amazon RDS / Aurora
Doing online schema changes in a managed database environment is not as straight-forward as one may think. Using gh-ost with Amazon RDS / Aurora can help you overcome commonly found issues with DDL changes, including:
- Long transactions
- Slave lag
- Temporary tables filling the ephemeral storage
- Lack of information about the progress
- Inability to pause/throttle ongoing changes
Pre-requisitesThe first thing we need to run gh-ost is to make sure binlogs are enabled. The way to do this is set backup retention to at least 1 day in your instance properties. For an already existing instance that means selecting it, then clicking on Instance actions -> Modify, and choosing an appropriate retention period in the drop-down box. The other thing gh-ost needs in order to run is binlog_format set to ROW. This needs to be set in the Parameter group of your instance. If you are using Aurora make sure you are looking at the DB Cluster parameter group instead.
Running gh-ost on the masterThe easiest way is to run gh-ost directly against the master writer endpoint of your instance. gh-ost will attach itself as a virtual slave, and capture changes to data from the binlogs. It will use those changes to keep data in sync as it copies the rows from the source table into the destination table. Assuming you already have the pre-requisites describe above in place, here's how to run gh-ost on the master:
./gh-ost \ … --assume-rbr \ --allow-on-master \ --host=<master writer endpoint> \ --database="testdb" \ --table="mytable" \ --alter=... \ --verbose \ --executeThe --assume-rbr parameter is required to prevent gh-ost from trying to modify the binlog_format. It is not possible to do so directly in RDS/Aurora, as SUPER privilege is not available to end users. The --allow-on-master option is also required, as gh-ost is designed to use a slave by default. Finally, we need to specify the master writer endpoint in --host argument for gh-ost to connect.
Using Read Replicasgh-ost has the option to run against a replica to minimize load on the master. This is actually the recommended mode to run it. RDS read replicas are not a problem since they are similar to traditional MySQL slaves. The caveat here is Aurora standard read replicas share the same underlying storage as the master, so gh-ost will detect it is running on the master, even if you have it connect to the reader/replica endpoint. The workarounds for that include creating a cross-region read replica, which behind the scenes does a full copy of the storage layer to create a new Aurora instance on a different region, and sets up traditional MySQL replication to keep it in sync with the master. You also have the option of manually creating your own slave server on a dedicated ec2 instance or even another Aurora cluster, and manually configuring replication. As before, you need to make sure row-based binlogs are enabled on replica, and that read_only is set to 0 on the replica parameter group.
Migrate on replicaIn this mode, gh-ost will do the table copy and swap directly on the replica. This is useful for a rolling schema change or simply for testing purposes. The host argument points to the replica where we will be doing the actual migration. This can be either a RDS read replica, ec2 instance, Aurora cross-region read replica or standalone Aurora instance manually configured as a slave, as discussed on the previous section. We need to manually specify the master host endpoint using the --assume-master-host option, as gh-ost will detect the master's internal IP instead and time-out trying to connect to it. I need to dig deeper to find out why a connection to the master is required in this case as no operations are performed against it. Here is an example with all the required options:
./gh-ost \ ... --assume-master-host=<master endpoint> \ --assume-rbr \ --host=<replica writer endpoint> \ --migrate-on-replica \ --database="test" \ --table="t1" \ --execute
Test on replicaHere's what gh-ost manual says about the --test-on-replica option: Indicates the migration is for purpose of testing only. Before cut-over takes place, replication is stopped. Tables are swapped and then swapped back: your original table returns to its original place. Both tables are left with replication stopped. You may examine the two and compare data. So from the above, we need to make some tweaks: First, we need to specify --test-on-replica-skip-replica-stop option, as in RDS/Aurora the start/stop slave commands won't work. We also have to use the AWS-provided procedures to handle replication. The way to do this is leverage the hooks functionality of gh-ost as follows: 1. Create a hook file called /tmp/ghost_hooks/gh-ost-on-stop-replication-rds-command.sh and put the following contents:
#!/bin/bash mysql -h$replica_endpoint -e "call mysql.rds_stop_replication"Make sure the file name starts with gh-ost-on-stop-replication prefix as that is the name gh-ost looks for 2. Give execute permissions to the hook file
chmod +x /tmp/ghost_hooks/gh-ost-on-stop-replication-rds-command.sh3. Run gh-ost with --hooks-path=/tmp/ghost_hooks/ option Here's an example with all the required options:
./gh-ost \ ... --assume-master-host=<master endpoint> \ --assume-rbr \ --host=<replica writer endpoint> \ --test-on-replica \ --test-on-replica-skip-replica-stop \ --hooks-path=/tmp/hooks \ --database="test" \ --table="t1" \ --execute