Offline Visualization of Azkaban Workflows

Posted in: Technical Track

As mentioned in my past adventures, I’m often working with the workflow management tool ominously called Azkaban. Its foreboding name is not really deserved; it’s relatively straightforward to use, and offers a fairly decent workflow visualization. For that last part, though, there is a catch: to be able to visualize the workflow, you have to (quite obviously) upload the project bundle to the server. Mind you, it’s not that much of a pain, and could easily managed by, say, a Gulp-fueled watch job. But still, it would be nice to tighten the feedback loop there, and be able to look at the graphs without having to go through the server at all.

Happily enough, all the information we need is available in the Azkaban job files themselves, and in a format that isn’t too hard to deal with. Typically, a job file will be called ‘foo.job’ and look like

command=echo "some command goes here"

So what we need to do to figure out a whole workflow is to begin at its final job, and recursively walk down all its dependencies.

use 5.12.0;

use Path::Tiny;

sub create_workflow {
  my $job = path(shift);
  my $azkaban_dir = $job->parent;

  my %dependencies;

  my @files = ($job);

  while( my $file = shift @files ) {
    my $job = $file->basename =~ s/\.job//r;

    next if $dependencies{$job}; # already processed

    my @deps = map  { split /\s*,\s*/ }
               grep { s/^dependencies=\s*// }
                    $file->lines( { chomp => 1 } );

    $dependencies{$job} = \@deps;

    push @files, map { $azkaban_dir->child( $_.'.job' ) } @deps;

  return %dependencies;

Once we have that dependency graph, it’s just a question of drawing the little boxes and the little lines. Which, funnily enough, is a much harder job one would expect. And better left off to the pros. In this case, I decided to go with Graph::Easy, which output text and svg.

use Graph::Easy;

my $graph = Graph::Easy->new;

while( my( $job, $deps ) = each %dependencies ) {
    $graph->add_edge( $_ => $job ) for @$deps;

print $graph->as_ascii;

And there we go. We put those two parts together in a small script, and we have a handy cli workflow visualizer.

$ target/azkaban/foo.job

  |                        v
+------+     +-----+     +-----+     +-----+
| zero | --> | baz | --> | bar | --> | foo |
+------+     +-----+     +-----+     +-----+
               |                       ^

Or, for the SVG-inclined,

$ -f=svg target/azkaban/foo.job

which gives us

Screen Shot 2014-08-10 at 3.09.42 PM

Interested in working with Yanick? Schedule a tech call.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *