Making Oozie Logs A Little Easier On The Eyes

Feb 13, 2014 / By Yanick Champoux

Tags: , ,

Today we’re having a quick one.

Earlier during the day, I had to peruse an Oozie log for the first time. And it looked like:


2014-02-11 20:13:14,211  INFO ActionStartXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@:start:] Start action [0004636-140111040403753-oozie-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:14,212  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@:start:] [***0004636-140111040403753-oozie-W@:start:***]Action status=DONE
2014-02-11 20:13:14,212  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@:start:] [***0004636-140111040403753-oozie-W@:start:***]Action updated in DB!
2014-02-11 20:13:14,271  INFO ActionStartXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@a-first-action] Start action [0004636-140111040403753-oozie-W@a-first-action] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:15,079  WARN HiveActionExecutor:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@a-first-action] credentials is null for the action
2014-02-11 20:13:18,306  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@a-first-action] checking action, external ID [job_201401070500_217582] status [RUNNING]
2014-02-11 20:13:18,408  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@a-first-action] [***0004636-140111040403753-oozie-W@a-first-action***]Action status=RUNNING
2014-02-11 20:13:18,409  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@a-first-action] [***0004636-140111040403753-oozie-W@a-first-action***]Action updated in DB!
2014-02-11 20:13:34,367  INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@a-first-action] callback for action [0004636-140111040403753-oozie-W@a-first-action]
2014-02-11 20:13:34,424  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@a-first-action] action completed, external ID [job_201401070500_217582]
2014-02-11 20:13:34,443  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@a-first-action] action produced output
2014-02-11 20:13:34,653  INFO ActionStartXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] Start action [0004636-140111040403753-oozie-W@some-action] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:35,418  WARN HiveActionExecutor:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] credentials is null for the action
2014-02-11 20:13:38,628  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] checking action, external ID [job_201401070500_217583] status [RUNNING]
2014-02-11 20:13:38,731  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] [***0004636-140111040403753-oozie-W@some-action***]Action status=RUNNING
2014-02-11 20:13:38,731  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] [***0004636-140111040403753-oozie-W@some-action***]Action updated in DB!
2014-02-11 20:13:57,659  INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] callback for action [0004636-140111040403753-oozie-W@some-action]
2014-02-11 20:13:57,712  INFO HiveActionExecutor:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] action completed, external ID [job_201401070500_217583]
2014-02-11 20:13:57,729  WARN HiveActionExecutor:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10044]
2014-02-11 20:13:57,895  INFO ActionEndXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@some-action] ERROR is considered as FAILED for SLA
2014-02-11 20:13:57,964  INFO ActionStartXCommand:539 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@fail] Start action [0004636-140111040403753-oozie-W@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:57,965  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@fail] [***0004636-140111040403753-oozie-W@fail***]Action status=DONE
2014-02-11 20:13:57,965  WARN ActionStartXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@fail] [***0004636-140111040403753-oozie-W@fail***]Action updated in DB!
2014-02-11 20:13:58,036  WARN CoordActionUpdateXCommand:542 - USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
Finished: FAILURE

wut?

Okay, Java peeps have a predilection for verbose logs, but surely there is a way to make the whole thing a little more readable for poor, poor human eyes… So I quickly hacked the following:


use 5.10.0;

use strict;

my @lines = map { parse_line($_) } <>;

# the job id is big, and doesn't give us much, remove
for my $useless ( map { $_->{JOB} } @lines ) {
    for ( @lines ) {
        $_->{msg} =~ s/Q$useless//g;
        $_->{ACTION} =~ s/^Q$useless//;
    }
}

my %previous;
for my $l ( @lines ) {
    # we'll only print metadata that changed
    my @changes = grep { $l->{$_} ne $previous{$_} } 
                       qw/ USER GROUP TOKEN APP JOB ACTION /;

    say join ' ', map { $_ . "[" . $l->{$_} . "] " } @changes if @changes;

    say "t", $l->{time}, " ", $l->{msg};
    %previous = %$l;
}

sub parse_line {
    my $line = shift;

    # try to parse the line as a typical log line
    my( $time, $info ) = /^d{4}-d{2}-d{2}s*  # the date. Don't care
                           (d+:dd:dd)       # the time, More interesting
                           ,d+s*.*?-           # log level and stuff. Meh
                           (.*)                  # the message itself
                         /x
        or return ();

    my %data = ( time => $time );

    # capture some repeated metadata
    for my $k ( qw/ USER GROUP TOKEN APP JOB ACTION / ) {
        $data{$k} = $1 if $info =~ s/$k[(.*?)]s*//;
    }

    # useless and long, scrap it
    $info =~ s/[*{3}.*?*{3}]//;

    $data{msg} = $info;

    return %data;
}

And there we go,  a log trace that is a mite easier on the eyes…


$ perl filter oozie_mess.log
USER[running_user]  GROUP[-]  APP[some-big-job-workflow]  JOB[0004636-140111040403753-oozie-W]  ACTION[@:start:] 
    20:13:14  Start action [@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    20:13:14  Action status=DONE
    20:13:14  Action updated in DB!
ACTION[@a-first-action] 
    20:13:14  Start action [@a-first-action] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    20:13:15  credentials is null for the action
    20:13:18  checking action, external ID [job_201401070500_217582] status [RUNNING]
    20:13:18  Action status=RUNNING
    20:13:18  Action updated in DB!
USER[-]  TOKEN[-]  APP[-] 
    20:13:34  callback for action [@a-first-action]
USER[running_user]  TOKEN[]  APP[some-big-job-workflow] 
    20:13:34  action completed, external ID [job_201401070500_217582]
    20:13:34  action produced output
ACTION[@some-action] 
    20:13:34  Start action [@some-action] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    20:13:35  credentials is null for the action
    20:13:38  checking action, external ID [job_201401070500_217583] status [RUNNING]
    20:13:38  Action status=RUNNING
    20:13:38  Action updated in DB!
USER[-]  TOKEN[-]  APP[-] 
    20:13:57  callback for action [@some-action]
USER[running_user]  TOKEN[]  APP[some-big-job-workflow] 
    20:13:57  action completed, external ID [job_201401070500_217583]
    20:13:57  Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10044]
    20:13:57  ERROR is considered as FAILED for SLA
ACTION[@fail] 
    20:13:57  Start action [@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
    20:13:57  Action status=DONE
    20:13:57  Action updated in DB!
ACTION[-] 
    20:13:58  E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100

2 Responses to “Making Oozie Logs A Little Easier On The Eyes”

  • Geen says:

    None of the “wtf” is Java’s fault. This is how Oozie is logging.
    Please open a Jira ticket with how you’d like things to change so we can fix it! A patch would be even nicer :)

    • [Not Java's fault] Maybe not technically, but I think it’s hard to argue the fact that the Java sphere and its logging underworld are inclined to, ahem, let’s call it prolixity. ;-)

      Also, I have to be fair: I just looked at that specific log. I’m pretty sure there are options and knobs one can adjust to make the logs lean more toward human consumption. … right?

      [patches are welcome] … you realize how dangerous it is to say something like this to a guy like me, right? 8-D

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>