DECIMAL columns and a non-expected replication error
A few weeks ago one of our customers reported a slave sql thread stopped with the following error: "Column 3 of table X cannot be converted from type decimal(0,?) to type decimal(4,3)" on one out of 3 slaves. The other two were already beyond the offending event and no errors were raised. This was not just another replication issue. In this opportunity I wanted to share another "complex" replication error I encountered and how it was resolved. As it says above, we have one out of 3 slaves reporting that a decimal value could not be converted to ... decimal?. Allow me to give you some more context to understand the problem.
Case details
The MySQL server version was 5.5 on all nodes. MySQL was showing that tables were using DECIMAL(4,3) on both master and slave, using any of the common methods (DESC, SHOW CREATE TABLE, information_schema.COLUMNS,
etc) but it was clear that something was different between them. After reviewing a related internal ticket, I found out that the tables on that slave were recently imported from a logical backup to enable file-per-table and that the error arose right after resuming replication. So,
IF the data set was born before 5.0,
AND the server was in-place upgraded
AND the slave was seeded from a master's physical backup
AND the affected table was never recreated 'from scratch' on the master
THEN the table on the master could still be using the old DECIMAL data type (MYSQL_TYPE_DECIMAL).
As a side note, and in case you are wondering, NO, altering/repairing a table is not enough to start using any new data types. MySQL will keep the same data types to avoid errors when copying the rows from the old to the new table as part of the ALTER process. Also, creating a new table using CREATE TABLE ... LIKE
will make the new table to use the exact same data types as the source table.
Fixing the issue
There are two things that needs to be corrected in this scenario: first we need to make DECIMAL format on the master and the slave to match by recreating the table on the former and second, we need to resume replication. Resuming replication If the affected table is not heavily written, and the offending event is simple, you could resume replication by executing the event's statements manually on the slave to then skip the event. If the events are huge and the table is being modified often (which was my case), perhaps a good solution is to ignore that table and then restore it from the master once it was recreated on the master. Settingslave_type_conversions
won't do the trick as converting from old decimal to new decimal is not a supported
attribute promotion. Also, to completely fix the problem, it's a good idea to find which other tables could fall into the same problem by checking for other tables on the master using DECIMAL columns that are still using the old format. That information is stored on the table's frm file and could be exposed using
dbsake's frmdump. [code lang="bash"] root@mypc:~# ./dbsake frmdump -t /var/lib/mysql/frm/tab1.frm -- -- Table structure for table `tab1` -- Created with MySQL Version 5.5.47 -- CREATE TABLE `tab1` ( `col1` decimal(5,2) DEFAULT NULL /* MYSQL_TYPE_NEWDECIMAL */, `tinyin` tinyint(4) DEFAULT NULL /* MYSQL_TYPE_TINY */, `bigin` bigint(20) DEFAULT NULL /* MYSQL_TYPE_LONGLONG */ ) ENGINE=InnoDB DEFAULT CHARSET=latin1; root@mypc:~# ./dbsake frmdump -t /var/lib/mysql/frm/tab1old.frm -- -- Table structure for table `tab1old` -- Created with MySQL Version 5.5.47 -- CREATE TABLE `tab1old` ( `col1` decimal(5,2) DEFAULT NULL /* MYSQL_TYPE_DECIMAL */, `tinyin` tinyint(4) DEFAULT NULL /* MYSQL_TYPE_TINY */, `bigin` bigint(20) DEFAULT NULL /* MYSQL_TYPE_LONGLONG */ ) ENGINE=InnoDB DEFAULT CHARSET=latin1; [/code]
Recreating the table on the master To recreate the table, we can use
pt-online-schema change which will not only create the table from scratch but will also perform this task automatically and with minimum locking. A dummy alter like ENGINE=<current engine> will be enough unless..
One last problem...
You might run into another issue related to the old DECIMAL format while recreating the table using pt-online-schema-change: the old format allocates one more character than the precision specified to store the value's sign. This fact allows you to store positive numbers one digit longer from what's specified for the column precision. For example, if the precision is (5,2), you will be able to store numbers with a four digit integer part (ie. 4321,12). This is no longer true forMYSQL_TYPE_NEWDECIMAL
and you might probably receive "Out of range" errors from pt-online-schema-change while copying the data to the new table. Depending of the version of the tool, you might need to execute it with PTDEBUG=1 to see the error.
....... # pt_online_schema_change:10337 2996 SHOW WARNINGS # Retry:3671 2996 Try code failed: Copying rows caused a MySQL error 1264: # Level: Warning # Code: 1264 # Message: Out of range value for column 'column1' at row 5 # Query: INSERT LOW_PRIORITY IGNORE INTO ... .......To workaround this, you could extend the column precision by one digit to allow any larger values to be stored on the new table. For example, if your current column precision is (4,3), you could change it to (5,3) using pt-online-schema-change, fixing both problems with a single execution.
Final words
Perhaps this issue is one of many good reasons why MySQL always recommends logical over in-place upgrades. Any new data types will be used only when the table is created from scratch. Regarding fixing the problem, of course you could always reseed the slave from a physical backup of the master and forget all this ever happened, but if you don't want to loose any work made on the slave or if the slave cannot be easily reseeded, then the above may be the best method to achieve resolution.Share this
You May Also Like
These Related Stories
MySQL 5.7 Multi-threads replication operation tips
MySQL 5.7 Multi-threads replication operation tips
Feb 16, 2018
2
min read
Merge Replication Identity Range Management - An Identity Disorder? Part 2
Merge Replication Identity Range Management - An Identity Disorder? Part 2
Feb 4, 2014
5
min read
Mystery Solved: Replication lag in InnoDB
Mystery Solved: Replication lag in InnoDB
Jun 19, 2012
4
min read
No Comments Yet
Let us know what you think