Offline Visualization of Azkaban Workflows

Aug 11, 2014 / By Yanick Champoux

Tags: ,

As mentioned in my past adventures, I’m often working with the workflow management tool ominously called Azkaban. Its foreboding name is not really deserved; it’s relatively straightforward to use, and offers a fairly decent workflow visualization. For that last part, though, there is a catch: to be able to visualize the workflow, you have to (quite obviously) upload the project bundle to the server. Mind you, it’s not that much of a pain, and could easily managed by, say, a Gulp-fueled watch job. But still, it would be nice to tighten the feedback loop there, and be able to look at the graphs without having to go through the server at all.

Happily enough, all the information we need is available in the Azkaban job files themselves, and in a format that isn’t too hard to deal with. Typically, a job file will be called ‘foo.job’ and look like

type=command
command=echo "some command goes here"
dependencies=bar,baz

So what we need to do to figure out a whole workflow is to begin at its final job, and recursively walk down all its dependencies.

use 5.12.0;

use Path::Tiny;

sub create_workflow {
  my $job = path(shift);
  my $azkaban_dir = $job->parent;

  my %dependencies;

  my @files = ($job);

  while( my $file = shift @files ) {
    my $job = $file->basename =~ s/.job//r;

    next if $dependencies{$job}; # already processed

    my @deps = map  { split /s*,s*/ }
               grep { s/^dependencies=s*// }
                    $file->lines( { chomp => 1 } );

    $dependencies{$job} = @deps;

    push @files, map { $azkaban_dir->child( $_.'.job' ) } @deps;
  }

  return %dependencies;
}

Once we have that dependency graph, it’s just a question of drawing the little boxes and the little lines. Which, funnily enough, is a much harder job one would expect. And better left off to the pros. In this case, I decided to go with Graph::Easy, which output text and svg.

use Graph::Easy;

my $graph = Graph::Easy->new;

while( my( $job, $deps ) = each %dependencies ) {
    $graph->add_edge( $_ => $job ) for @$deps;
}

print $graph->as_ascii;

And there we go. We put those two parts together in a small script, and we have a handy cli workflow visualizer.

$ azkaban_flow.pl target/azkaban/foo.job

  +------------------------+
  |                        v
+------+     +-----+     +-----+     +-----+
| zero | --> | baz | --> | bar | --> | foo |
+------+     +-----+     +-----+     +-----+
               |                       ^
               +-----------------------+

Or, for the SVG-inclined,

$ azkaban_flow.pl -f=svg target/azkaban/foo.job

which gives us

Screen Shot 2014-08-10 at 3.09.42 PM

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>