Watch Out When Running Out of Disk Space With InnoDB Group Replication Cluster

Oracle MySQL Community Edition comes with a native, virtually synchronous clustering solution called InnoDB Cluster. Under the hood, this solution uses a technology called Group Replication. It is very similar to Galera Cluster, although where Galera relies on its own wsrep (write-set replication) concepts, InnoDB Cluster builds on native MySQL concepts such as binary logs, GTIDs and more.
InnoDB Cluster has been available since version 5.7, but it is much improved in 8.0. A best-practice setup has a minimum of three nodes running MySQL 8.0. This is the setup I was running when I encountered the issue that triggered this blog post.
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 14c7f5b3-67d2-11eb-9e96-08002799f743 | node0       |        3306 | ONLINE       | PRIMARY     | 8.0.23         |
| group_replication_applier | fffa336e-67d2-11eb-a791-08002799f743 | node1       |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
| group_replication_applier | 89170bb4-67d3-11eb-91eb-08002799f743 | node2       |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
Introducing a failure to the cluster
One of the observations I recently made in an environment I was managing, was that one of the nodes was running out of disk space for storing binary logs. So in my testing environment I chose to become a bad DBA and wreak some havoc into my nicely operating cluster.
node1:~ # dd if=/dev/zero of=/mysql/bin/junk
dd: writing to '/mysql/bin/junk': No space left on device
1815769+0 records in
1815768+0 records out
929673216 bytes (930 MB, 887 MiB) copied, 10.3299 s, 90.0 MB/s
MySQL is aware of the situation:
2021-02-07T10:07:54.879371Z 14 [ERROR] [MY-000035] [Server] Disk is full writing '/mysql/bin/node1-bin.000002' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
At this point my node1 can no longer commit any transactions to its local data copy, as it is configured with sync_binlog = 1. What I have effectively done is introduce replication lag in my synchronous cluster.
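If you want to see this for yourself on the affected node, a quick sanity check (just a sketch, run on node1; your file names and positions will differ) is to confirm the durability setting and watch the binary log position, which stops advancing once the volume is full:

-- Confirm that every commit must be synced to the binary log
SELECT @@global.sync_binlog;

-- Current binary log file and position; on a full volume this position
-- stops moving even though the group keeps delivering transactions
SHOW MASTER STATUS;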
What troubled me in this situation is that the cluster does not seem to be aware of it. The status of node1 has not changed at all in the cluster status: it still reports ONLINE, just like in the output at the beginning of this post. As for cluster operations, we have two out of three nodes working fine, one as PRIMARY and the other as SECONDARY. So even with the “broken” node, our cluster should be operating fine, and it actually is.
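Even though the member state stays ONLINE, the backlog does show up if you know where to look. A rough check, assuming the stalled member is node1, is the applier queue exposed in performance_schema, which keeps growing while commits are blocked:

-- Transactions received from the group but not yet applied, per member;
-- on the stalled node this queue keeps growing
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,
       COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats;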
Wreaking some more havoc
With node1 falling further and further behind in replication, I decided to test the cluster's response to failure by taking down my other SECONDARY node (node2). Rather than cleanly shutting down MySQL, I chose to kill -9 the instance. To prevent systemd from restarting the instance, I also used systemctl stop mysql.
My replication group noticed the change and removed node2 from the group.
2021-02-07T10:20:29.130938Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address node2:3306 has become unreachable.'
2021-02-07T10:20:45.956837Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: node2:3306'
2021-02-07T10:20:45.957419Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to node0:3306, node1:3306 on view 16126893147346366:12.'
At this time the cluster status was as follows:
- node0: PRIMARY – ONLINE – processing transactions
- node1: SECONDARY – ONLINE – NOT processing transactions
- node2: DOWN
I had assumed the group would lose quorum, since node0 is proposing transactions but there is no healthy node available to (reliably) certify these transactions.
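It turns out quorum is determined by which members the group communication layer can reach, not by whether a member can actually persist transactions, so node1 still counts towards the majority. A quick way to see how the members are distributed over states:

-- A majority of members must be reachable for the group to keep making progress;
-- node1 still counts towards that majority even though it cannot commit anything
SELECT MEMBER_STATE, COUNT(*) AS members
FROM performance_schema.replication_group_members
GROUP BY MEMBER_STATE;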
Relying on binary logs
In the existing situation, if we just restarted node2, it would be able to connect to node0, fetch the binary logs, and perform an incremental recovery to bring itself back into a healthy state. But what if node0 no longer had the binary logs needed to do that? On node0, we purge the binary logs that node2 needs.
MySQL node0:33060+ ssl JS > \sql FLUSH BINARY LOGS;
Query OK, 0 rows affected (0.0983 sec)
MySQL node0:33060+ ssl JS > \sql PURGE BINARY LOGS TO 'node0-bin.000002';
Query OK, 0 rows affected (0.0234 sec)
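Before restarting node2, it is worth double-checking what node0 can still serve as a donor; a small sketch (the exact file names and GTID ranges will of course differ in your environment):

-- Binary log files still present on node0 after the purge
SHOW BINARY LOGS;

-- GTIDs that are no longer available in node0's binary logs,
-- next to everything node0 has executed
SELECT @@global.gtid_purged, @@global.gtid_executed\G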
After this we start node2 again and watch the error log:
2021-02-07T10:23:03.792160Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.23) starting as process 28032
...
2021-02-07T10:23:06.220509Z 0 [System] [MY-013587] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' is starting.'
...
2021-02-07T10:23:07.236538Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.23' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server - GPL.
...
2021-02-07T10:23:10.078442Z 2 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address node0:3306.'
2021-02-07T10:23:11.079997Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2021-02-07T10:23:11.080484Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to node0:3306, node2:3306, node1:3306 on view 16126893147346366:13.'
...
2021-02-07T10:23:22.075506Z 22 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_15973@node1:3306',replication started in log 'FIRST' at position 4
We see that node2 links up to node1 for incremental recovery. There is no reason why it shouldn't, since node1 is reporting ONLINE with no issues. In reality, however, node1 has not been replicating, has not been committing transactions and thus has not been generating binary logs. As a result, my node2 is stuck in RECOVERING state until I fix node1 and, hopefully, it catches up with the group.
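On node2 this is visible in performance_schema: the member sits in RECOVERING and the distributed recovery channel shows which donor it is connected to and any error it last hit. A hedged example of what I would check:

-- Run on node2: how node2 sees its own state and the rest of the group
SELECT MEMBER_HOST, MEMBER_STATE
FROM performance_schema.replication_group_members;

-- State of the distributed recovery channel and its most recent error, if any
SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
FROM performance_schema.replication_connection_status
WHERE CHANNEL_NAME = 'group_replication_recovery';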
All this time, node0 has been writing and committing without any issues, while these writes were not being replicated anywhere. In my opinion, in an InnoDB Cluster environment, this is problematic.
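One way to quantify how much data is only present on the primary is to compare GTID sets between the nodes. A sketch: run the first statement on node0 and node1, then feed node1's value into the subtraction on node0 (the GTID set below is a placeholder):

-- On node0 and on node1: what has each server actually committed?
SELECT @@global.gtid_executed;

-- On node0: which transactions exist on node0 but not on node1?
-- Replace the placeholder set with the gtid_executed value taken from node1.
SELECT GTID_SUBTRACT(@@global.gtid_executed,
                     'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-5') AS missing_on_node1;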
Conclusion
The obvious conclusion here is to make sure you are properly monitoring your disk space and never let your MySQL server run out of space. Make sure to monitor all volumes if you have spread your data files, binary log files and other log files across different volumes. All writes are important.
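Besides OS-level disk monitoring, it can also help to keep an eye on how much space the binary logs themselves occupy and how long they are retained. A small sketch; the retention value is only an example and, as this post shows, purging binary logs that a group member still needs creates its own problems:

-- Per-file binary log sizes (File_size is in bytes)
SHOW BINARY LOGS;

-- How long binary logs are kept before automatic purging (default is 30 days)
SELECT @@global.binlog_expire_logs_seconds;

-- Example only: persist a 7-day retention
SET PERSIST binlog_expire_logs_seconds = 604800;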
To report this issue to the development team, I opened a bug report on bugs.mysql.com. Feel free to click “affects me” if, like me, you consider this a significant problem.