Simple auditing, i.e., knowing what changed recently, can save you tons of time while troubleshooting.
I know that, in the ideal world:
- Everything is supposed to be done through configuration management.
- Everything is documented and all changes are tracked through a VCS.
- Every DDL or set global is trapped via MySQL Proxy and logged.
But there are always ways to bypass the gatekeepers. Changes can go in unnoticed. An hour or so later, your database performance suddenly changes for the worse, and you get that phone call.
First you check if anything caused an actual error. You look around at a few log files and nothing shows up. The next thing you ask yourself is, did someone change anything in the last little while. Of course, everybody says no. After a few hours of digging, comparing schemas, diff-ing old and current config files, you actually find what has changed, put it back the way it was and everything is back to normal. You ask the question again, did anybody change the global variable from X to Y, and someone finally admits it. But they thought that it was not relevant since they did that change on the weekend and the system performance didn’t go down the toilet until Monday morning around 9:00am. Sound familiar?
We have all gone through this many times, and it doesn’t matter what process you have in place — something always slips through the cracks.
I came up with some simple stored procedure that will compare two data sets and keep track of the changes historically. It’s loosely based on slowly-changing dimension type 2 in the data warehouse world (google Ralph Kimball if you want to get all the gory details). This method tracks only changes, so you should be able to keep historical rows forever, unless you are constantly dropping and creating whatever you are tracking.
Say you want to keep track of
my.cnf or global variables changes. This one of the simplest tables to track, the
information_schema — it has only two columns. Others, such as tables or routines, would make the SQL to compare a little more involved.
First you need a table to track this with. I called mine
historical_global_variable. It has the two columns from the original table plus three additional ones:
- row_effective_date: Time of change e.g. when the row was the current value
- row_end_date: Time of next change e.g. when the row was no longer the current value
- current_value_flag : 1 if its the current value 0 otherwise
There are three conditions that we might encounter:
- The first execution of our auditing process or a new variable, which can occur after an upgrade.Insert new value into the
row_end_dateto some date far in the future — say, 100 or 500 years.
- Variable change through a
my.cnfchange followed by a bounce, a
set global variablecommand, or a new default value which can happen for an unset parameter after an upgrade
- Set the
current_value_flag = 0; the
row_end_date = now()for the current value of the variable
- Insert new value into the h
row_end_dateto some date far in the future — say, 100 or 500 years
- Set the
- Deprecated variable which can occur after an upgradeSet the
current_value_flag = 0; the
row_end_date = now()for the current value of the variable
I did my tests in MySQL 5.1 so I was able to create a procedure and schedule it with events.
The table structure:
CREATE TABLE `historical_global_variable` (`variable_name` varchar(64) NOT NULL DEFAULT '', `variable_value` varchar(20480) DEFAULT NULL, `row_effective_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00', `row_end_date` datetime DEFAULT NULL, `current_value_flag` smallint(6) DEFAULT NULL, PRIMARY KEY (`variable_name`,`row_effective_date`) ) ;
The stored procedure:
DELIMITER $$DROP PROCEDURE IF EXISTS `audit`.`populate_historical_global_variable`$$ CREATE PROCEDURE `populate_historical_global_variable`() MODIFIES SQL DATA DETERMINISTIC BEGIN declare done int default 0; declare c_variable_name varchar(64); declare p_variable_name varchar(64); declare c_variable_value varchar(20480); declare p_variable_value varchar(20480); declare v_now timestamp default now(); declare c1 cursor for select t.variable_name, t.variable_value c_variable_value, h.variable_value p_variable_value, h.variable_name p_variable_name from temp_global_variables t left join (select * from historical_global_variable where current_value_flag=1) h on h.variable_name = t.variable_name union select t.variable_name, t.variable_value c_variable_value, h.variable_value p_variable_value, h.variable_name p_variable_name from (select * from historical_global_variable where current_value_flag=1) h left join temp_global_variables t on h.variable_name = t.variable_name; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1; delete from temp_global_variables; insert into temp_global_variables select * from information_schema.global_variables; open c1; repeat fetch c1 into c_variable_name, c_variable_value, p_variable_value, p_variable_name; if not done then # New variable not seen yet if p_variable_name is null then insert into historical_global_variable (variable_name, variable_value, row_effective_date, current_value_flag, row_end_date) values (c_variable_name, c_variable_value, v_now, 1, date_add(v_now, interval 500 year)); # Deprecated variable elseif c_variable_name is null then update historical_global_variable set current_value_flag=0, row_end_date = v_now where variable_name=p_variable_name and current_value_flag=1; # Variable change elseif ifnull(p_variable_value,'a a a') <> ifnull(c_variable_value,'a a a') then update historical_global_variable set current_value_flag=0, row_end_date = v_now where variable_name=c_variable_name and current_value_flag=1; insert into historical_global_variable (variable_name, variable_value, row_effective_date, current_value_flag, row_end_date) values (c_variable_name, c_variable_value, v_now, 1, date_add(v_now, interval 500 year)); end if; end if; until done end repeat; close c1; END$$ DELIMITER ;
I scheduled this to run daily, as there are not supposed to be many changes. You can do it hourly if you need to.
Now, how can you use it? There are typical queries that you can run, and these queries can be executed on anything audited.
Current values (what does it look like now):
select variable_name, variable_value from historical_global_variable where current_value_flag=1; variable_name variable_value AUTOCOMMIT ON AUTOMATIC_SP_PRIVILEGES ON AUTO_INCREMENT_INCREMENT 1 AUTO_INCREMENT_OFFSET 1 BACK_LOG 50 BASEDIR /usr/local/mysql-5.1.24/ etc
This is a long list! It’s too bad that the
information_schema.global_variables view does not have a default value indicator. The query can be used to regenerate your
my.cnf if it gets mangled beyond repair.
Active values for a specific point in time(what did it look like for a specific timestamp):
select variable_name, variable_value from historical_global_variable where '2008-04-17' between row_effective_date and row_end_date;
Historical value changes for MAX_TMP_TABLES (Historical changes to a specific object ):
select row_effective_date, row_end_date, current_value_flag, variable_value from historical_global_variable where variable_name='MAX_TMP_TABLES' order by row_effective_date; row_effective_date row_end_date current variable value value flag 2008-04-16 17:21:38 2008-04-16 17:37:23 0 36 2008-04-16 17:38:04 2508-04-16 17:38:04 1 32
Changes in the last 2 days:
select row_effective_date, row_end_date, current_value_flag, variable_name, variable_value from historical_global_variable where row_effective_date >= date_sub(now(), interval 2 day) order by row_effective_date; row_effective_date row_end_date current variable variable value name value flag 2008-06-09 12:00:00 2008-06-10 12:00:00 0 TIMESTAMP 1213027200 2008-06-10 12:00:00 2008-06-10 15:46:22 0 TIMESTAMP 1213113600 2008-06-10 15:46:22 2508-06-10 15:46:22 1 TIMESTAMP 1213127182
Deprecated Parameters (what has been removed):
select row_effective_date, row_end_date, variable_name, variable_value from historical_global_variable h where current_value_flag = 0 and row_effective_date = ( select max(max_h.row_effective_date) from historical_global_variable max_h where max_h.variable_name = h.variable_name);
It returned no rows for me since I haven’t upgraded since the procedure was created but you get the picture.
New Parameters (what has been Added):
select row_effective_date, row_end_date, variable_name, variable_value from historical_global_variable h where row_effective_date = ( select min(row_effective_date) from historical_global_variable max_h where max_h.variable_name = h.variable_name) and row_effective_date >= date_sub(now(), interval 2 day);
This is a rare occurrence, but it can be valid if you track both columns and tables.
I have attempted to do the same with the
information_schema.routines table with the same level of success, but as I said earlier, the procedure is little more daunting to write, since you must compare every column to which you wish to track changes.
It’s simple, but once written and scheduled, it’s a time saver when you need to know what changed. The next step would be to write a script that can generate the stored procedures and then we can track any change.
Let me know if you think it was a worthwhile exercise, and if there is any other topic in this area you would like me to address.
Interested in working with Robert? Schedule a tech call.