This is the first place I am announcing this: The Pythian Group has made me a Team Lead. I am extremely honored and somewhat humbled by this, and I am determined to do a good job. I started officially on Monday, March 3rd, and my first week went pretty well.
On Saturday, I spent a little time automating one process, and while I was waiting for a 300 GB backup to copy from one machine to another, I automated some more.
Currently I have one somewhat junior DBA working for me, and another DBA joins the team tomorrow. But yesterday, I put in more than half a day of work. Why? I was automating more of our work to make my team members' lives easier.
As part of the DBA service, we offer monitoring in the form of alerting, and also in the form of daily checks to ensure everything is running smoothly. Daily checks cover things that do not need to be checked every minute but should still be checked regularly. For instance, one of our daily checks ensures that the running database configuration matches the configuration file (my.cnf, or my.ini on Windows). This is very valuable for making sure no changes get lost. A DBA might adjust the running configuration but forget to put the final changes in the config file. In that case, the next day our daily checks will throw a warning, and that DBA will say “oh yeah, I forgot to put that into the config file!”
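To make the config-drift check concrete, here is a minimal Python sketch of the comparison. It is not our actual monitoring tool; the function names (read_mysqld_options, config_drift) are hypothetical, and it assumes the running values have already been fetched (for example from SHOW GLOBAL VARIABLES) into a dict. It also assumes a simple option file that Python's configparser can read, so directives such as !includedir are not handled.

```python
import configparser


def normalize_name(name):
    # MySQL treats dashes and underscores in option names interchangeably.
    return name.strip().lower().replace("-", "_")


def read_mysqld_options(text, section="mysqld"):
    """Return {option: value} for one section of an option-file string.

    Bare flags such as skip-bdb come back with the value None.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,  # accept flags that have no "= value"
        interpolation=None,   # do not treat % in values specially
        strict=False,         # repeated options: the later one wins
    )
    parser.read_string(text)
    if not parser.has_section(section):
        return {}
    return {normalize_name(k): v for k, v in parser.items(section)}


def config_drift(file_options, running_vars):
    """Return (name, file_value, running_value) tuples that disagree."""
    running = {normalize_name(k): str(v) for k, v in running_vars.items()}
    drift = []
    for name, file_value in sorted(file_options.items()):
        if file_value is None or name not in running:
            continue  # bare flags and non-variable options are skipped here
        if file_value.strip().lower() != running[name].strip().lower():
            drift.append((name, file_value, running[name]))
    return drift


MY_CNF = """
[mysqld]
max_connections = 500
query-cache-size = 0
skip-bdb
"""

# Hypothetical running values; in practice, from SHOW GLOBAL VARIABLES.
running = {"max_connections": "300", "query_cache_size": "0"}
print(config_drift(read_mysqld_options(MY_CNF), running))
# [('max_connections', '500', '300')]
```

This only compares case-insensitive strings. A real check would also need to normalize sizes such as 1G versus 1073741824 and boolean spellings such as ON versus 1, or it would produce warnings of its own.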
As many monitoring systems do, our system produces false positives: warnings about things that are not actually problems. Though we do not normally “do the dailies” on weekends, I spent some time with them on Saturday. I took the checks that were producing false positives and fixed them so they no longer show errors or warnings in those cases. For instance, many of our machines complained that the have-bdb parameter was set to DISABLED in the database but to YES in the defaults file, because we used skip-bdb in /etc/my.cnf. We did not set something like have-bdb=0, so the check complained. I fixed that problem, and a few others.
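The have-bdb fix amounts to translating option-file flags into the running values they imply before comparing. Here is a minimal sketch, assuming MySQL 5.0/5.1 behavior, where starting the server with skip-bdb (or skip-innodb) makes it report have_bdb (or have_innodb) as DISABLED. The SKIP_FLAG_IMPLIES table and the function names are illustrative only, not part of our monitoring system or of any MySQL API.

```python
# Hypothetical table: bare flags in the option file and the running
# value each one implies (based on MySQL 5.0/5.1 have_* variables).
SKIP_FLAG_IMPLIES = {
    "skip_bdb": ("have_bdb", "DISABLED"),
    "skip_innodb": ("have_innodb", "DISABLED"),
}


def normalize_name(name):
    # Option names may use dashes or underscores; treat them the same.
    return name.strip().lower().replace("-", "_")


def expected_from_options(file_options):
    """Map option-file settings to the running values they should produce."""
    expected = {}
    for name, value in file_options.items():
        key = normalize_name(name)
        if value is None:
            # Bare flag such as skip-bdb: only translate the ones we know.
            if key in SKIP_FLAG_IMPLIES:
                var, implied = SKIP_FLAG_IMPLIES[key]
                expected[var] = implied
        else:
            expected[key] = value
    return expected


def mismatches(expected, running_vars):
    """Sorted names whose running value differs from the expected value."""
    running = {
        normalize_name(k): str(v).strip().upper() for k, v in running_vars.items()
    }
    return sorted(
        name
        for name, value in expected.items()
        if name in running and running[name] != str(value).strip().upper()
    )


file_options = {"skip-bdb": None, "max_connections": "500"}
running = {"have_bdb": "DISABLED", "max_connections": "500"}
print(mismatches(expected_from_options(file_options), running))  # []
```

Keeping the translations in an explicit table, rather than assuming every skip-X flag maps to a have_X variable, avoids inventing variables for flags like skip-name-resolve that have no have_* counterpart.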
The result is that the share of my team’s machines reporting “daily checks are OK, nothing to look at” went from 26% to 46%. This means that my team can be more productive and spend time on the real errors, instead of clicking through and taking the time to realize “oh, yeah, that, it’s just a false positive.”