Avro MapReduce Jobs in Oozie

Normally, when using Avro files as input to or output from a MapReduce job, you write a Java main() method to set up the Job using AvroJob. That documentation page does a good job of explaining where to use AvroMappers, AvroReducers, and the AvroKey and AvroValue (N.B. if you want a file full of a particular…

Making Oozie Logs A Little Easier On The Eyes

Today’s post is a quick one. Earlier today, I had to peruse an Oozie log for the first time, and it looked like this:

2014-02-11 20:13:14,211 INFO ActionStartXCommand:539 – USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] JOB[0004636-140111040403753-oozie-W] ACTION[0004636-140111040403753-oozie-W@:start:] Start action [0004636-140111040403753-oozie-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-02-11 20:13:14,212 WARN ActionStartXCommand:542 – USER[running_user]…
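One way to make a line like the one above easier to scan is to split it into its timestamp, level, source, the leading KEY[value] pairs, and the trailing message. The following is only a rough sketch, not the approach the full post takes: the helper names parse_line and format_entry are hypothetical, the layout is inferred solely from the sample line, and the sample uses a plain hyphen as the separator (the pattern accepts an en dash too).

```python
import re

# Timestamp, level, "Class:line", a dash separator, then the rest of the line.
# Layout inferred from the sample log line only; other Oozie versions may differ.
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<source>\S+)\s+[-\u2013]\s+"
    r"(?P<rest>.*)$"
)
# Leading KEY[value] pairs such as USER[running_user] or TOKEN[].
FIELD = re.compile(r"\s*([A-Z]+)\[([^\]]*)\]")

SAMPLE = (
    "2014-02-11 20:13:14,211 INFO ActionStartXCommand:539 - "
    "USER[running_user] GROUP[-] TOKEN[] APP[some-big-job-workflow] "
    "JOB[0004636-140111040403753-oozie-W] "
    "ACTION[0004636-140111040403753-oozie-W@:start:] "
    "Start action [0004636-140111040403753-oozie-W@:start:] with user-retry "
    "state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]"
)


def parse_line(line):
    """Split one log line into a dict, or return None if it does not match."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    fields = {}
    pos = 0
    # Consume KEY[value] pairs from the front until the free-text message starts.
    while True:
        f = FIELD.match(rest, pos)
        if f is None:
            break
        fields[f.group(1)] = f.group(2)
        pos = f.end()
    return {
        "ts": m.group("ts"),
        "level": m.group("level"),
        "source": m.group("source"),
        "fields": fields,
        "message": rest[pos:].strip(),
    }


def format_entry(entry):
    """Render a parsed entry over several lines, skipping empty or '-' fields."""
    lines = [f"{entry['ts']} {entry['level']} {entry['source']}"]
    for key, value in entry["fields"].items():
        if value and value != "-":
            lines.append(f"    {key}={value}")
    lines.append(f"    {entry['message']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_entry(parse_line(SAMPLE)))
```

Dropping the empty TOKEN[] and placeholder GROUP[-] fields is a design choice made here for readability; keep them if you need to tell "unset" apart from "missing".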

Oozing Caribou

Meet Oozie’s Workflows

Oozie is a workflow scheduler for Hadoop, but that’s not terribly important right now. What is important is that it defines its workflows using an XML dialect. And as all XML things go, the result is… shall we say, less than easy on the eyes and the typing fingers.