Customizing pt-stalk to capture the diagnostics data you really need

Pythian Marketing

February 7, 2017

Tags: Oracle, Technical Track, Technical Blog, Database, Database Troubleshooting, Group Blog Posts, Oracle Cursor

Valeriy Kravchuk's great recent post on using oprofile to profile MySQL mentioned that pt-stalk, the Percona Toolkit script that captures diagnostics data when it detects a given condition, does not currently support the new operf command and relies instead on opcontrol, which is deprecated and soon to be removed. Fortunately, in the Open Source world we deal with these situations by contributing, and this seemed a simple enough change that I could get a PR ready quickly and reply on Valeriy's post with the announcement.

However, I don't want to let it end here. Instead, I want to take this opportunity to show how you too can customize pt-stalk to suit your data capture needs, which is what this blog post is about. We must start by deciding which part of the puzzle handles the behavior we want to modify, as that decision determines the path to making a contribution: updating one of the modules, writing a plugin, or forking the whole project.

Updating a Module

My very simple PR modifies the --collect-oprofile option which, as the name implies, is part of the collect module. This means that, from the root directory of the repo, we need to edit the lib/bash/collect.sh file.

There is one small catch: the toolkit's scripts that we end up executing as users (i.e. those that live under the bin/ directory of the repo) include all (or most, in the case of the Perl tools) of the modules used by the tool. In the Perl-based tools' comments, this is described as being 'fat-packed', and it means there is one more step we need to take in order to update the end-user script with the changes made to modules. In the case of my PR, here is what I had to run before committing my changes:

    ./util/update-modules bin/pt-stalk collect

This can be read as 'update the modules used by pt-stalk to include the latest version of collect'.

If you think this means the repo has redundant code, you are right. If you think this carries the risk of a given branch being internally inconsistent, once again, you are right. It is up to the maintainer of a branch (or the author of a PR, in this case) to make sure this does not happen. I think it's a trade-off worth making in order to keep the code reasonably maintainable and easy to use at the same time.
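To give a feel for the size of change involved, here is a minimal sketch of preferring operf and falling back to opcontrol. It is not the code from the PR or from collect.sh: the function name pick_profiler is hypothetical, and the sketch only assumes that both tools, when installed, are reachable through PATH.

```shell
# Hypothetical helper, not part of Percona Toolkit: print the name of the
# oprofile front end to use, preferring operf over the deprecated opcontrol.
pick_profiler() {
   if command -v operf >/dev/null 2>&1; then
      echo "operf"
   elif command -v opcontrol >/dev/null 2>&1; then
      echo "opcontrol"
   else
      # Neither tool was found in PATH; let the caller decide how to react.
      return 1
   fi
}
```

A caller could then branch on the printed name, or skip oprofile collection altogether when the function returns a non-zero status.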

Writing Plugins

What if we wanted to alter the way pt-stalk monitors for the trigger condition? You'll notice there is no stalk module under lib/bash, so, in this case, we may have to modify the pt-stalk script directly. However, the script supports plugins, so in most cases, we should be able to achieve the desired behavior just by writing one. Over the years, I have been collecting such plugins, as written by community members, including myself, in this repo. You'll notice there are two types of plugins there:

function plugins, which alter the way pt-stalk monitors for a condition, and
general plugins, which alter the way pt-stalk collects data.

Function plugins are provided as an argument to --function, not --plugin. Open any file in that sub-directory and you'll see the interface is dead simple: just write a trg_plugin function that outputs a number, and pt-stalk will compare that number against its --threshold value. That's all there is to it, but that can go a long way, as seen from the examples that let you trigger collections based on replication delay, load average, running transaction time, etc.
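As an illustration of that interface, here is a minimal sketch of a load average trigger. It is not one of the plugins from the repo mentioned above; it assumes a Linux host where /proc/loadavg is available, and the LOADAVG_FILE variable is a hypothetical override added only so the function can be tried against a sample file. It prints an integer on the assumption that a whole number is the safest thing to compare against --threshold.

```shell
# Function plugin sketch: pt-stalk loads this file and calls trg_plugin,
# comparing the number it prints against the --threshold value.
# LOADAVG_FILE is a hypothetical override for trying the function out;
# by default the Linux /proc/loadavg file is read.
trg_plugin() {
   local loadavg_file="${LOADAVG_FILE:-/proc/loadavg}"
   # The first field is the 1-minute load average; print its integer part.
   awk '{ print int($1) }' "$loadavg_file"
}
```

Saved as, say, loadavg_plugin.sh (a hypothetical file name), it could be used with something like pt-stalk --function loadavg_plugin.sh --threshold 8, so that a collection is triggered when the 1-minute load average stays above 8.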

Forking the Project

Finally, sometimes a change is big enough that none of the clean, 'proper' ways to extend pt-stalk work. In that case, you have to fork. Doing so is not complicated, but doing so in a way that makes the resulting code easy to merge back takes some effort, which is why I am, for now, keeping this branch with MongoDB support as 'just a fork'. Maybe one day I will decide that the effort to keep up with upstream's changes is enough to warrant some invested time to make a PR, but for now, whenever I need to use pt-stalk with MongoDB, I just grab that code.

What factors do I take into account when deciding if a fork is worth the work of creating a PR? Among others, I ask myself these questions:

Can I easily test the new behavior using the existing test cases and environment? (The answer is 'yes' for my changes to collect-oprofile, but 'no' for adding MongoDB support. As expected, there are no MongoDB tests in pt-stalk, and the sandboxes used to test Percona Toolkit are MySQL-only.)
Does this behavior have the potential to benefit lots of people? If not, it does not make sense for me to spend time creating the PR, and for the upstream maintainers to review it and merge it.
What are the odds that my changes will break something else in a way that is not detected by existing test cases? Adding MongoDB support is a pretty big change. It is a risk worth taking if I will use my fork only on MongoDB servers, and use upstream's on MySQL. But if it were to be integrated into the project, a lot more effort would be needed to make sure that MySQL users aren't negatively impacted by any changes made to support MongoDB.

Conclusion

While showing its age, pt-stalk is still a very good option to monitor servers for rare-to-catch problems, and it incorporates a lot of knowledge that we may miss if we were to write a similar tool from scratch. Important knowledge, like how to make sure the monitoring tool itself won't cause a service disruption by filling up a partition or eating up all available database connections. Hopefully, this post shows that 'showing its age' is something that anyone with some time and determination can help with, by:

updating some modules,
writing plugins, or
forking entire tools.

If you ask me, that is one of the beauties of Open Source, one that I appreciate even more now that I work at Pythian, where we help clients with an amazingly wide spectrum of technologies. This means we are constantly challenged to cross-pollinate the toolsets from different ecosystems, and we are in a very good position to make some contributions back to the community.
