Restructuring your database environments can be a high-stakes operation. Recently, I’ve encountered several reports of DDL activity causing MySQL crash scenarios. These range from dropping multiple databases in rapid succession to the more complex case of large-scale table migrations during version upgrades.
In a recent upgrade to MySQL 5.7, we encountered a significant failure while running mysql_upgrade. The process triggered an ALTER TABLE FORCE on a 2.2Tb table to convert it to the new data format supporting microsecond precision.
The issue appeared at a critical juncture: the intermediate table had been completely filled with data, and MySQL was attempting to swap the existing table for the new one. Instead of a smooth transition, the system stalled and eventually crashed.
A review of the InnoDB monitor output in the error log revealed a fatal bottleneck. The diagnostic info highlighted three specific warnings:
[ERROR] InnoDB: The age of the last checkpoint is 379157140, which exceeds the log group capacity 377483674dict0dict.cc).[ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server...The key takeaway from the log is that a purge event (row0purge.cc) on the data dictionary (dict0dict.cc) created a semaphore that lasted longer than the 600-second threshold. When InnoDB detects a semaphore wait exceeding 10 minutes, it assumes a hang and intentionally crashes the server to prevent data corruption or further inconsistency.
Furthermore, the checkpoint age warning indicated that redo log accumulation was exceeding InnoDB’s ability to flush changes to persistent storage. This created a massive I/O bottleneck during the table swap phase.
To resolve this and ensure the ALTER TABLE could complete successfully, we implemented three major changes:
We increased the redo log space to 8G. Initial testing indicated that an ALTER TABLE FORCE on a 2.2Tb table writes approximately that much data to the redo log. Providing this "breathing room" prevents the checkpoint age from exceeding log group capacity.
We increased both the overall system memory and the InnoDB Buffer Pool size. This allowed for faster data processing and reduced the time required for the intermediate table to be populated and swapped.
The most critical change was adjusting the innodb_purge_threads variable. While MySQL 5.7 defaults this to 4, we reduced it to 1.
Expert Tip: According to the MySQL Reference Guide, you should keep this setting low to prevent threads from contending with each other for access to busy tables.
Reducing the threads eliminated the contention that caused the 600-second semaphore wait on the data dictionary, allowing the table swap to complete without triggering a fatal error.
The takeaway from this is simply to suggest that if you are running MySQL 5.7, you will want to consider changing the variable innodb_purge_threads from it’s new default value of 4, to the 5.6 default value of 1 to avoid purge thread contention. That is, unless you have a strong need for multiple purge threads running. This also begs the question of why the default was changed from 1 to 4 in 5.7 as it could be considered as an unsafe value.
Looking to optimize your MySQL use?