Recently, we worked with a customer seeking assistance in monitoring their MySQL 5.7 InnoDB cluster. Following a network outage that caused a split-brain scenario with no writable primaries, they required a robust monitoring strategy to prevent future downtime.
While I initially expected to pull node roles (Primary vs. Secondary) directly from the performance_schema tables—a standard feature in version 8.0—it turns out this functionality was never back-ported to 5.7. Despite documentation suggesting otherwise, the MEMBER_ROLE column is missing in 5.7, leading me to file a bug report with Oracle for either a back-port or a documentation update.
Our primary strategy focuses on tracking the online node count. For a cluster to remain writable, it requires a quorum (at least two nodes, though three are preferred for fault tolerance).
Here is how a healthy 5.7 cluster appears in the MySQL Shell:
```
node1:3306 ssl JS > cluster.status();
{
    "clusterName": "innodb_cluster",
    "defaultReplicaSet": {
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "node1:3306": {
                "mode": "R/W",
                "role": "HA",
                "status": "ONLINE"
            },
            "node2:3306": {
                "mode": "R/O",
                "role": "HA",
                "status": "ONLINE"
            },
            "node3:3306": {
                "mode": "R/O",
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}
```
In the performance_schema, this translates to the following:
```sql
SELECT * FROM performance_schema.replication_group_members;
```

| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_STATE |
| --- | --- | --- | --- |
| group_replication_applier | ...bcf1 | node2 | ONLINE |
| group_replication_applier | ...00ec | node1 | ONLINE |
| group_replication_applier | ...0c64 | node3 | ONLINE |
To simulate the customer's failure, I used iptables to drop traffic on the cluster communication adapter:
```shell
iptables -A INPUT -p tcp -i eth1 -j DROP
```
The error log immediately captures the split:
```
[Warning] Member with address node3:3306 has become unreachable.
[Note] Group membership changed to node2:3306, node1:3306.
```

To monitor this via SQL, use this count query:
```sql
SELECT COUNT(*)
FROM performance_schema.replication_group_members
WHERE MEMBER_STATE = 'ONLINE';
```
Alert Thresholds:

- 3 ONLINE members: healthy; the cluster can tolerate one node failure.
- 2 ONLINE members: warning; quorum still holds, but one more failure leaves the cluster without a writable primary.
- Fewer than 2 ONLINE members: critical; quorum is lost and the cluster is no longer writable.
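These thresholds can be sketched as a small POSIX-shell helper. The `alert_level` function and its names are hypothetical, and the thresholds assume a three-node cluster; the `mysql` invocation in the comment is a placeholder for your own connection settings:

```shell
#!/bin/sh
# Map the number of ONLINE members to an alert level (3-node cluster assumed).
alert_level() {
    count="$1"
    if [ "$count" -ge 3 ]; then
        echo "OK"         # full redundancy: can tolerate one failure
    elif [ "$count" -eq 2 ]; then
        echo "WARNING"    # quorum holds, but no further failures are tolerated
    else
        echo "CRITICAL"   # quorum lost: no writable primary
    fi
}

# In production, feed it the real count from the query above, e.g.:
# count=$(mysql -N -B -e "SELECT COUNT(*) FROM performance_schema.replication_group_members WHERE MEMBER_STATE = 'ONLINE'")
alert_level 2   # prints WARNING
```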
A more targeted approach is to verify that at least one PRIMARY node is active. Since InnoDB clusters can technically support multiple primaries (though not by default), we check for "at least one" instead of "exactly one."
You can identify the current primary by joining the global_status with replication_group_members:
```sql
SELECT COUNT(*)
FROM performance_schema.replication_group_members
WHERE member_id = (
        SELECT variable_value
        FROM performance_schema.global_status
        WHERE variable_name = 'group_replication_primary_member'
    )
    AND MEMBER_STATE = 'ONLINE';
```
A result of 1 confirms the current primary is online and functional.
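The same idea works as an automated check: wrap the query result in a helper that alerts when no online primary is found. This is a sketch; `check_primary` is my own naming, and exit code 2 follows the common Nagios-style CRITICAL convention rather than anything MySQL-specific:

```shell
#!/bin/sh
# Alert when the primary-verification query returns 0 (no online primary).
check_primary() {
    if [ "$1" -ge 1 ]; then
        echo "OK: an online primary exists"
    else
        echo "CRITICAL: no online primary"
        return 2    # Nagios-style CRITICAL exit code
    fi
}

# In production the argument would come from the query above, e.g.:
# n=$(mysql -N -B -e "SELECT COUNT(*) FROM performance_schema.replication_group_members WHERE ... AND MEMBER_STATE = 'ONLINE'")
check_primary 1   # prints "OK: an online primary exists"
```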
Because the cluster is designed to promote a new primary automatically as long as quorum exists, monitoring the cluster.status() JSON output is the most comprehensive method. Even for a 5.7 cluster, MySQL Shell 8.0 is recommended for its enhanced scripting capabilities.
Using mysqlsh combined with the jq command-line tool allows for precise status parsing:
| Cluster State | Command | Result |
| --- | --- | --- |
| Healthy | `mysqlsh root@127.0.0.1 -- cluster status \| jq .defaultReplicaSet.status` | `"OK"` |
| Degraded | `mysqlsh root@127.0.0.1 -- cluster status \| jq .defaultReplicaSet.status` | `"OK_NO_TOLERANCE"` |
Handling "NO_QUORUM" States:
When a cluster loses quorum, mysqlsh outputs warnings that can break JSON parsing. You can use grep to clean the output before passing it to jq:
```shell
mysqlsh root@127.0.0.1 -- cluster status \
    | grep -v WARNING \
    | grep -v 'Write operations on the InnoDB cluster will not be allowed.' \
    | jq .defaultReplicaSet.status
```
Result: "NO_QUORUM"
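Putting this together, the jq output can be classified into the alerting buckets above. The `classify` helper is a hypothetical sketch; note that without jq's `-r` flag the status value keeps its JSON quotes, which the case patterns below account for:

```shell
#!/bin/sh
# Classify defaultReplicaSet.status as reported by jq (quoted JSON strings).
classify() {
    case "$1" in
        '"OK"')                              echo "healthy" ;;
        '"OK_PARTIAL"'|'"OK_NO_TOLERANCE"')  echo "degraded" ;;
        '"NO_QUORUM"')                       echo "critical" ;;
        *)                                   echo "unknown" ;;
    esac
}

# In production the status would come from the pipeline above, e.g.:
# status=$(mysqlsh root@127.0.0.1 -- cluster status | grep -v WARNING \
#     | grep -v 'Write operations on the InnoDB cluster will not be allowed.' \
#     | jq .defaultReplicaSet.status)
classify '"NO_QUORUM"'   # prints critical
```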
By implementing one or more of these three options—Node Counting, Primary Verification, or JSON Status Parsing—you can gain reliable insights into your InnoDB cluster's health and ensure your monitoring system alerts you before a total partition occurs.