It is not uncommon that we need to filter out some DBs or Tables while setting up replication. It is important to understand how MySQL evaluates/processes the replication filtering rules to avoid conflict or confusion while setting them up. The purpose of this blog is to illustrate the rules and provide some suggestions for best practice.
MySQL provides 3 levels of filters for setting up replication: Binary log, DB, and Table. The binlog filters apply on the master to control how to log the changes. Since MySQL replication is based on the binlog, it is the first level filter and has the highest priority.
While the DB-level and Table-level filters apply on the slaves, since each table belongs to a schema, the DB-level filters have higher priority than the Table-level ones. Inside the Table-level filters, MySQL will evaluate the options in the order of: --replicate-do-table, --replicate-ignore-table, --replicate-wild-do-table, --replicate-wild-ignore-table.
Based on that, we have the following suggestions for setting up MySQL replication filter as best practice:
--replicate-do-db or --replicate-ignore-db. Never use both at the same time.binlog_format='statement' OR 'mixed' (in mixed mode, if a transaction is deterministic then it will be stored in statement format) and set up --replicate-do-db or --replicate-ignore-db on slaves, make sure never make changes on the tables across the default database on master otherwise you might lose the changes on slave due to default database not matching.--replicate-ignore-table and --replicate-wild-do-table to avoid conflicting and confusing.For MariaDB replication filters within Galera cluster, it should be used with caution. As a general rule except for InnoDB DML updates, the following replication filters are not honored in a Galera cluster: binlog-do-db, binlog-ignore-db, replicate-wild-do-db, replicate-wild-ignore-db. However, replicate-do-db, replicate-ignore-db filters are honored for DDL and DML for both InnoDB & MyISAM engines. As they might create discrepancies and replication may abort (see MDEV-421, MDEV-6229). (https://mariadb.com/kb/en/mariadb/mariadb-galera-cluster-known-limitations/), For the slaves replicating from cluster, the rules are similar with normal replication settings as above.
There are 2 options for setting binlog filter on master: --binlog-do-db and --binlog-ignore-db. MySQL will check --binlog-do-db first, if there are any options, it will apply this one and ignore --binlog-ignore-db. If the --binlog-do-db is NOT set, then mysql will check --binlog-ignore-db. If both of them are empty, it will log changes for all DBs.
See the below examples. In scenario 1) no binlog level filters are set and so all changes were logged; In scenario 2) --binlog-do-db and --binlog-ignore-db are all set to m_test and changes on the DB m_test were logged and changes on the DB test were NOT logged; In scenario 3) only --binlog-ignore-db is set to m_test and so changes on the DB m_test were NOT logged and changes on the DB test were logged;
scenario 1)—--binlog-do-db and --binlog-ignore-db is NOT set:
SQL
mysql> show master status; +------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-------------------+ | vm-01-bin.000003 | 120 | | | | +------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) mysql> show binlog events in "vm-01-bin.000003" from 120; Empty set (0.00 sec) mysql> insert into t1(id,insert_time) values(10,now()); Query OK, 1 row affected (0.05 sec) mysql> show binlog events in "vm-01-bin.000003" from 120; +------------------+-----+------------+-----------+-------------+---------------------------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +------------------+-----+------------+-----------+-------------+---------------------------------------------------------------+ | vm-01-bin.000003 | 120 | Query | 1 | 211 | BEGIN | | vm-01-bin.000003 | 211 | Query | 1 | 344 | use `m_test`; insert into t1(id,insert_time) values(10,now()) | | vm-01-bin.000003 | 344 | Xid | 1 | 375 | COMMIT /* xid=17 */ | +------------------+-----+------------+-----------+-------------+---------------------------------------------------------------+ 3 rows in set (0.00 sec)
scenario 2)—--binlog-do-db=m_test and --binlog-ignore-db=m_test:
SQL
-- insert into tables of DB m_test was logged mysql> show master status; +------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-------------------+ | vm-01-bin.000004 | 656 | m_test | m_test | | +------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) mysql> use m_test mysql> insert into t1(insert_time) values(now()); Query OK, 1 row affected (0.02 sec) mysql> show binlog events in "vm-01-bin.000004" from 656; +------------------+-----+------------+-----------+-------------+---------------------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +------------------+-----+------------+-----------+-------------+---------------------------------------------------------+ | vm-01-bin.000004 | 656 | Query | 1 | 747 | BEGIN | | vm-01-bin.000004 | 747 | Intvar | 1 | 779 | INSERT_ID=13 | | vm-01-bin.000004 | 779 | Query | 1 | 906 | use `m_test`; insert into t1(insert_time) values(now()) | | vm-01-bin.000004 | 906 | Xid | 1 | 937 | COMMIT /* xid=26 */ | +------------------+-----+------------+-----------+-------------+---------------------------------------------------------+ 4 rows in set (0.00 sec) -- insert into tables of DB test was NOT logged mysql> use test; mysql> show master status ; +------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-------------------+ | vm-01-bin.000004 | 937 | m_test | m_test | | +------------------+----------+--------------+------------------+-------------------+ mysql> insert into t1(`a`) values('ab'); Query OK, 1 row affected (0.03 sec) mysql> show binlog events in "vm-01-bin.000004" from 937; Empty set (0.00 sec)
scenario 3)—--Binlog_Do_DB=null --binlog-ignore-db=m_test:
SQL
mysql> use m_test mysql> show master status; +------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-------------------+ | vm-01-bin.000005 | 120 | | m_test | | +------------------+----------+--------------+------------------+-------------------+ mysql> insert into t1(insert_time) values(now()); Query OK, 1 row affected (0.01 sec) mysql> show binlog events in "vm-01-bin.000005" from 120; Empty set (0.00 sec) mysql> use test mysql> insert into t1(`a`) values('ba'); Query OK, 1 row affected (0.03 sec) mysql> show binlog events in "vm-01-bin.000005" from 120; +------------------+-----+------------+-----------+-------------+----------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +------------------+-----+------------+-----------+-------------+----------------------------------------------+ | vm-01-bin.000005 | 120 | Query | 1 | 199 | BEGIN | | vm-01-bin.000005 | 199 | Query | 1 | 305 | use `test`; insert into t1(`a`) values('ba') | | vm-01-bin.000005 | 305 | Xid | 1 | 336 | COMMIT /* xid=22 */ | +------------------+-----+------------+-----------+-------------+----------------------------------------------+ 3 rows in set (0.00 sec)
So, for Binlog-level filter, we will use either one (and ONLY one or none) of the 2 options: --binlog-do-db to make MySQL log changes for the DBs in the list. OR, --binlog-ignore-db to make MySQL log changes for the DBs NOT in the list. Or leave both of them empty to log changes for all the DBs.
However, we usually recommend NOT to setup any binlog-level filters. The reason is that to log changes for all DBs and set up filters only on slaves will achieve the same purpose and let us have an extra full copy of data changes for the master, in case we will need that for recovery.
There are 2 options for setting DB-level filters: --replicate-do-db or --replicate-ignore-db. MySQL processes these two filters the similar way as it processes the Binlog-level filters, the difference is that it ONLY applies on the slaves and so affects how the slaves replicate from its master. It will check --replicate-do-db first, if there are any options, it will replicate the DBs in the list and ignore --replicate-ignore-db. If the --replicate-do-db is NOT set, then mysql will check --replicate-ignore-db and replicate all the DBs except for the ones in this list. If both of them are empty, it will replicate all the DBs. You can find the process in the chart in the official documentation.
There is a trick for DB-level filters though If the binlog_format is set as statement or mixed. (The binlog_format =mixed also applies here, it is because that in mixed mode replication, in case the transaction is deterministic it will be resolved to statement which is equivalent to statement mode).
Since "With statement-based replication, the default database is checked for a match." (MySQL Documentation). If you set up --replicate-do-db and you update a table out of the default database in master, the update statement will not be replicated if the default database you are running command from is not in the --replicate-do-db. For example, there are 2 DBs in master, you set binlog_format='statement' OR ‘mixed’ and set --replicate-do-db=DB1 on slave. When execute the following commands: use DB2; update DB1.t1 ... This update command will not be executed on slave. To make the update statement replicated to slave, you need to do: use DB1; update t1 …
For example: with binlog_format=statement or binlog_format=mixed, we insert into m_test.t1 in two approaches: one is using default DB as m_test, the other one is using default DB test, the changes are all logged in the master. But in slave, after it caught up, only the insert (default DB is m_test) was replicated to slave, and the insert (default DB is test) was NOT replicated.
Scenario 1) binlog_format=statement
In master:
SQL
mysql> use m_test Database changed mysql> insert into m_test.t1(insert_time) values(now()); -- Logged mysql> use test; Database changed mysql> insert into m_test.t1(insert_time) values(now()); -- Logged
In slave: Only the first insert was replicated because the default DB matched the filter.
Scenario 2) binlog_format=mixed
In slave: Similar result as above, as deterministic statements are treated as statement-based.
--replicate-do-db or --replicate-ignore-db. Never use both at the same time.binlog_format='statement' OR ‘mixed’ and set up DB filters on slaves, make sure never make changes on the tables across the default database, otherwise data discrepancy will be expected in the slaves.There are 4 options for setting Table-level filters: --replicate-do-table, --replicate-ignore-table, --replicate-wild-do-table or --replicate-wild-ignore-table. MySQL evaluates the options in order.
The above chart shows us that MySQL will first check --replicate-do-table, the tables listed here will be replicated and so won't be ignored by the following options. Then MySQL will check --replicate-ignore-table, the tables listed here will be ignored even if it shows up in the following wildcard options. The lowest priority is --replicate-wild-ignore-table.
Due to the priorities for the 4 Table-level options, to avoid confusing/conflicting, we suggest using only one of the 4 options, or using the following two options: --replicate-ignore-table and replicate-wild-do-table so that it is clear that the tables in --replicate-ignore-table will be ignored and the tables in replicate-wild-do-table will be replicated.
Looking to optimize your MySQL use?