When we want to perform a heavy ALTER TABLE against a several gigabytes table the options are quite a few but the one I tend to use the most is pt-online-schema-change (pt-osc). This tool is very mature and solid and helped me in the past with Galera Cluster, standalone and regular asynchronous MySQL topologies for many years. This time I encountered something that it’s worth keeping in mind when we’re performing a given DDL command in a regular primary-secondary cluster with one requisite: do NOT replicate this ALTER TABLE among the cluster.
The best practice in these situations is to perform any DDL change in the replica, then perform the proper promotion/demotion of servers and continue with the ALTER operations in the new replicas. Now, in this particular case, instance’s promotion was not an option as the replicas had a hard role designed as backup servers so any DDL would have to happen keeping the original roles.
In the case of an ALTER TABLE of this nature to be performed with pt-osc the natural way to go is this:
./pt-online-schema-change --alter="FORCE" --statistics --print --recursion-method=none --execute --set-vars sql_log_bin=OFF
Executing the above command will perform the command without any issue and it won’t replicate any temporary table or trigger creation so the alter will run isolated on the target master.. Right? Well, no.
Whether it’s true that no command like above will be replicated, there is a caveat here: Any query created by the trigger itself will be inserted in the binlog as part of the original event. This is, in case of an INSERT-statement happening in the original table during the pt-osc execution will create a single event that will contain the original statement plus the INSERT-statement on the temporary table identified by _OriginalTableName_new since the variable defined only affects to the active session and not the whole instance.
The problem here is obvious, as soon as that transaction comes into the replica it will break complaining about trying to apply an event in a non existing table (the temporary one as it’s not been replicated thanks to our –set-vars sql_log_bin=OFF) and skipping this event would skip the original DML query too.
Not everything is lost, keep reading…
Let’s perform a full test to prove my words. For this test i have a dummy script just writing data in the background and the pt-osc version is 3.3.1
mysql> show global variables like 'binlog_format'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | binlog_format | ROW | +---------------+-------+ 1 row in set (0.00 sec)
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000003
Position: 166814
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: mysql57-01
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 166814
Relay_Log_File: mysql57-02-relay-bin.000008
Relay_Log_Pos: 167027
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
…/…
root@mysql57-01:~# ./pt-online-schema-change --alter="FORCE" --statistics --print --recursion-method=none --execute --set-vars sql_log_bin=OFF h=localhost,D=dummydb,t=dummyt,u=root,p=letmein No slaves found. See --recursion-method if host mysql57-01 has slaves. Not checking slave lag because no slaves were found and --check-slave-lag was not specified. .../... 2022-06-24T13:22:08 Dropped triggers OK. # Event Count # ================== ===== # INSERT 1 # mysql_warning_1062 1 Successfully altered `dummydb`.`dummyt`.
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: mysql57-01
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 1192694
Relay_Log_File: mysql57-02-relay-bin.000008
Relay_Log_Pos: 888957
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: No
…/…
Last_Errno: 1146
Last_Error: Error executing row event: 'Table 'dummydb._dummyt_new' doesn't exist'
…/…
BEGIN /*!*/; # at 889320 #220624 13:22:08 server id 1 end_log_pos 889378 CRC32 0x648d1d8b Table_map: `dummydb`.`dummyt` mapped to number 233 # at 889378 #220624 13:22:08 server id 1 end_log_pos 889441 CRC32 0xa4727bb7 Table_map: `dummydb`.`_dummyt_new` mapped to number 230 # at 889441 #220624 13:22:08 server id 1 end_log_pos 889493 CRC32 0x10a18dfd Write_rows: table id 233 # at 889493 #220624 13:22:08 server id 1 end_log_pos 889545 CRC32 0x3ed50ab7 Write_rows: table id 230 flags: STMT_END_F ### INSERT INTO `dummydb`.`dummyt` ### SET ### @1=51 ### @2='89' ### @3=1656076928 ### @4=338 ### INSERT INTO `dummydb`.`_dummyt_new` ### SET ### @1=51 ### @2='89' ### @3=1656076928 ### @4=338 # at 889545 #220624 13:22:08 server id 1 end_log_pos 889576 CRC32 0x855d1d3f Xid = 6257 COMMIT/*!*/;
root@mysql57-01:~# mysqlbinlog --base64-output=decode-rows --verbose /var/log/mysql/mysql-bin.000003 > /tmp/bin03 root@mysql57-01:~# cat /tmp/bin03 | grep "_new" | wc -l 74
It’s clear that our triggers are writing data that shouldn’t be there. Let’s change the binlog format to STATEMENT:
mysql> show global variables like 'binlog_format'; +---------------+-----------+ | Variable_name | Value | +---------------+-----------+ | binlog_format | STATEMENT | +---------------+-----------+ 1 row in set (0.00 sec)
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000006
Position: 154
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: mysql57-01
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 154
Relay_Log_File: mysql57-02-relay-bin.000014
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
…/…
root@mysql57-01:~# ./pt-online-schema-change --alter="FORCE" --statistics --print --recursion-method=none --execute --set-vars sql_log_bin=OFF h=localhost,D=dummydb,t=dummyt,u=root,p=letmein No slaves found. See --recursion-method if host mysql57-01 has slaves. Not checking slave lag because no slaves were found and --check-slave-lag was not specified. …/… 2022-06-24T13:37:31 Dropped triggers OK. # Event Count # ====== ===== # INSERT 1 Successfully altered `dummydb`.`dummyt`.
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: mysql57-01
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 332460
Relay_Log_File: mysql57-02-relay-bin.000014
Relay_Log_Pos: 332673
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
…/…
root@mysql57-01:~# mysqlbinlog --base64-output=decode-rows --verbose /var/log/mysql/mysql-bin.000006 > /tmp/bin06 root@mysql57-01:~# cat /tmp/bin06 | grep "_new" | wc -l 0
If you have the particular need of running pt-osc in such a way that the effects of it must be isolated and affecting only that server, ensure the binlog format is STATEMENT. Otherwise it will end up breaking replication and the solution will be to create all those temporary tables manually and populate them to ensure that any DML command gets properly applied and so the original one too, avoiding any data integrity issue. In any case, keep in mind that the recommended value for binlog format is ROW for consistency reasons as STATEMENT could lead to undeterministic binlog entries that could insert different values in replicas.
Looking to optimize your MySQL use?