Replicating into Google Cloud SQL using Tungsten

While investigating alternatives to migrate to Google Cloud SQL, I encountered a lack of support for external masters. However, it's possible to overcome this limitation by replicating into Google Cloud SQL using Tungsten Replicator. Cloud SQL is Google's database-as-a-service offering, similar to RDS on Amazon Web Services: you get a fully managed database in only a few clicks (or API calls). At the time of writing, the only supported engines are MySQL and PostgreSQL.
Cloud SQL alternatives
Google offers two different options for MySQL deployments (see the provisioning example after the lists).

1st generation instances:
- Only MySQL versions 5.5 and 5.6 can be provisioned
- Maximum memory is limited to 16 GB
- Maximum of 250 GB storage (up to 500 GB with a Silver or higher support package)
- MyISAM and InnoDB storage engines
- Asynchronous replication for read replicas

2nd generation instances:
- Only MySQL versions 5.6 and 5.7 can be deployed
- Maximum memory is limited to 205 GB
- Maximum of 10 TB storage
- InnoDB storage engine only
- Semi-sync replication only
- GTID replication only
- No support for external masters
- No SUPER privilege
- No triggers
- No performance schema
- No replication between 1st and 2nd generation instances is possible
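For reference, a 2nd generation instance can be provisioned with a single gcloud call. A minimal sketch, assuming the gcloud SDK is already set up; the instance name, machine tier and region below are placeholders:

# Create a 2nd generation MySQL 5.7 instance
gcloud sql instances create my-cloudsql-instance \
    --database-version=MYSQL_5_7 \
    --tier=db-n1-standard-1 \
    --region=us-central1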
The solution
The answer is to use Tungsten Replicator, so that replication is handled completely outside the database. Note that since we only need the replicator, the FOSS version available on GitHub is enough for our purposes; there is no need to buy the commercial Tungsten version, which adds the cluster functionality and official support. We will need to install two separate Tungsten processes: one will attach to the source database and read transactions from its binary logs, while the second will apply those transactions to the Cloud SQL instance. This is what it looks like:
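Roughly, the flow of events is the following (a sketch of the setup just described, using the same hostnames as the configuration files later in this post):

instance-2 (source MySQL, binary logs)
        |
        v
instance-1: extractor (mysqlreader) --> THL --> applier (writetocloudsql)
                                                        |
                                                        v
                                               Cloud SQL instance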
Preparing the environment
The first thing you will need is a place to install the replicator. I suggest provisioning a dedicated instance (instance-1 in the diagram) in the same zone as your Cloud SQL instance. At minimum you will want an n1-standard-1 size, as Tungsten does consume its fair share of memory. Tungsten needs some packages (do check out the complete requirements list here), so let's go ahead and install them:

apt-get install ruby default-jre
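It does not hurt to confirm the runtimes are actually in place before going further; a quick check could be:

# Verify the Ruby and Java runtimes Tungsten depends on
ruby --version
java -version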
Now we need a database user for Tungsten on the source and target databases:
GRANT ALL ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
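Before going further, it is also worth verifying that instance-1 can reach both databases with these credentials. Something along these lines should work (cloudsql_ip_address is a placeholder for your instance's IP, as used in the configs below):

# From instance-1: check connectivity to the source and to Cloud SQL
mysql -h instance-2 -u tungsten -psecret -e "SELECT @@hostname, @@version;"
mysql -h cloudsql_ip_address -u tungsten -psecret -e "SELECT @@hostname, @@version;"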
It is usually a good idea to have a dedicated OS user as well:
useradd -m -d /opt/continuent tungsten
Get the replicator package from GitHub and extract the contents as the tungsten OS user:

wget https://github.com/continuent/tungsten-replicator/releases/download/5.2.1/tungsten-replicator-5.2.1.tar.gz
tar zxf tungsten-replicator-5.2.1.tar.gz
Tungsten installation
Prepare the Tungsten config files on instance-1 as follows. instance-1 is the server Tungsten will run from, and instance-2 is the server that has the source database.

Process that will read from MySQL:

vi /etc/tungsten/tungsten-mysqlreader.ini
[defaults]
replication-user=tungsten
replication-password=secret
skip-validation-check=MySQLUnsupportedDataTypesCheck
skip-validation-check=MySQLPermissionsCheck
skip-validation-check=MySQLMyISAMCheck

[mysqlreader]
install-directory=/opt/continuent/mysqlreader
master=instance-1
members=instance-1
datasource-host=instance-2
datasource-user=tungsten
datasource-password=secret
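Since the extractor reads from the source's binary logs, it may be worth confirming that binary logging is enabled on instance-2 and noting its current settings before installing anything. For example:

# Run against the source database (instance-2)
mysql -h instance-2 -u tungsten -psecret \
    -e "SHOW VARIABLES LIKE 'log_bin'; SHOW VARIABLES LIKE 'binlog_format'; SHOW MASTER STATUS;"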
Process that will write to Cloud SQL:

vi /etc/tungsten/tungsten-writetocloudsql.ini
[defaults]
replication-user=tungsten
replication-password=secret

[writetocloudsql]
datasource-type=mysql
install-directory=/opt/continuent/writetocloudsql
master=instance-1
members=instance-1
topology=master-slave
datasource-host=cloudsql_ip_address
datasource-user=tungsten
datasource-password=secret
privileged-slave=false
skip-validation-check=InstallerMasterSlaveCheck
skip-validation-check=MySQLPermissionsCheck
skip-validation-check=MySQLBinaryLogsEnabledCheck
rmi-port=10002
master-thl-port=2112
master-thl-host=instance-1
thl-port=2113

Note the use of privileged-slave=false and the various validation checks that need to be skipped for the applier process. That is what allows us to get past the SUPER requirement and the other Cloud SQL limitations. Since we are running both the extractor and the applier on the same instance, we need to manually specify the ports for the second process so that there are no conflicts. Now we are ready to install the replicators, by running the following as the tungsten OS user:
cd tungsten-replicator-5.2.1/tools
./tpm install
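If the installation went through cleanly, both replicator processes should now be running and answering on the RMI ports configured above; a quick way to check might be:

# List the replication service handled by each replicator process
/opt/continuent/mysqlreader/tungsten/tungsten-replicator/bin/trepctl -port 10000 services
/opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl -port 10002 services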
Replicating into Google Cloud SQL using Tungsten
At this point you would start the reader process with the trepctl online command, so it begins capturing events from the source instance, and use something like mysqldump to dump and load your dataset into the Google Cloud SQL instance. Make sure you have the binlog coordinates of that dump available. Once the Cloud SQL instance is properly seeded, start the applier process from the appropriate position, e.g.:

/opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl online -from-event 'mysql-bin.000011:0000000000002552;0'
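A minimal sketch of the seeding step, assuming a single application schema called mydb (the schema name and credentials are placeholders; adjust to your environment):

# On instance-1: take a consistent dump of the source and record the binlog coordinates.
# --master-data=2 writes the coordinates as a comment near the top of the dump file.
mysqldump -h instance-2 -u tungsten -psecret --single-transaction --master-data=2 --databases mydb > seed.sql

# These are the coordinates to pass to 'trepctl online -from-event' above:
grep -m1 "CHANGE MASTER" seed.sql

# Load the dump into the Cloud SQL instance
mysql -h cloudsql_ip_address -u tungsten -psecret < seed.sql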
Checking the status of each process:
tungsten@instance-1:~$ /opt/continuent/mysqlreader/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000007:0000000000000520;112
appliedLastSeqno       : 1
appliedLatency         : 0.355
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : 1
clusterName            : mysqlreader
currentEventId         : mysql-bin.000007:0000000000000520
currentTimeMillis      : 1511354659857
dataServerHost         : instance-2
extensions             :
host                   : instance-2
latestEpochNumber      : 0
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://instance-1:2112/
maximumStoredSeqNo     : 1
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://instance-2:3306/tungsten_mysqlreader?noPrepStmtCache=true
relativeLatency        : 3.857
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : mysqlreader
serviceType            : local
simpleServiceName      : mysqlreader
siteName               : default
sourceId               : instance-2
state                  : ONLINE
timeInStateSeconds     : 72.806
timezone               : GMT
transitioningTo        :
uptimeSeconds          : 74.46
useSSLConnection       : false
version                : Tungsten Replicator 5.2.1
Finished status command...
tungsten@instance-1:/etc/tungsten$ /opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : -1
clusterName            : writetocloudsql
currentEventId         : NONE
currentTimeMillis      : 1510939133227
dataServerHost         : cloudsql
extensions             :
host                   : cloudsql
latestEpochNumber      : -1
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://instance-1:2113/
maximumStoredSeqNo     : -1
minimumStoredSeqNo     : -1
offlineRequests        : NONE
pendingError           : Replicator configuration failed
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: Unable to translate property value: key=serverId value = 3555962359
pipelineSource         : UNKNOWN
relativeLatency        : -1.0
resourcePrecedence     : 99
rmiPort                : 10002
role                   : master
seqnoType              : java.lang.Long
serviceName            : writetocloudsql
serviceType            : unknown
simpleServiceName      : writetocloudsql
siteName               : default
sourceId               : 35.184.133.21
state                  : OFFLINE:ERROR
timeInStateSeconds     : 431.65
timezone               : GMT
transitioningTo        :
uptimeSeconds          : 433.596
useSSLConnection       : false
version                : Tungsten Replicator 5.2.1
Finished status command...
I've discovered there is a bug in Tungsten 5.2 where high values of the server-id parameter prevent the replicator from working. Cloud SQL sets very high server IDs by default, and they cannot be modified by the user. I have already reported this to Continuent, so hopefully they will come up with a fix soon. In the meantime, the only way to get past this is to open a ticket with Google support and have them change the server-id on the Cloud SQL instance for you.
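You can check the value your own instance was assigned with a simple query; reading the variable does not require any special privileges:

# Show the server-id assigned to the Cloud SQL instance
mysql -h cloudsql_ip_address -u tungsten -psecret -e "SELECT @@global.server_id;"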