Hadoop

Calculating Business Days in HiveQL

One of the common tasks in data processing is to calculate the number of days between two given dates. You can easily achieve this by using Hive DATEDIFF function. You can also get weekday number by using this more obscure function: SELECT FROM_UNIXTIME(your_date,’u’) FROM some_table; This will return 1 for Monday, 7 for Sunday and…

Watch: Hadoop vs. Riak

Every data platform has its value, and deciding which one will work best for your big data objectives can be tricky—Alex Gorbachev, Oracle ACE Director, Cloudera Champion of Big Data, and Chief Technology Officer at Pythian, has recorded a series of videos comparing the various big data platforms and presents use cases to help you…

Watch: Hadoop vs. HBase

Every data platform has its value, and deciding which one will work best for your big data objectives can be tricky—Alex Gorbachev, Oracle ACE Director, Cloudera Champion of Big Data, and Chief Technology Officer at Pythian, has recorded a series of videos comparing the various big data platforms and presents use cases to help you…

Avro MapReduce Jobs in Oozie

Normally when using Avro files as input or output to a MapReduce job, you write a Java main[] method to set up the Job using AvroJob. That documentation page does a good job of explaining where to use AvroMappers, AvroReducers, and the AvroKey and AvroValue (N.B. if you want a file full of a particular…

Is X a Big Data Product?

Virtually everyone in data space claims today that they are a Big Data vendor and that their products are Big Data products. Of course, if you are not in Big Data then you are legacy. So how do you know whether a product is a Big Data product?

Small Files on MapR-FS

One of the well-known best practices for HDFS is to store data in few large files, rather than a large number of small ones. There are a few problems related to using many small files but the ultimate HDFS killer is that the memory consumption on the name node is proportional to the number of…

Cloudera Challenge 2014

Yesterday, Cloudera released the score reports for their Data Science Challenge 2014 and I was really ecstatic when I received mine with a “PASS” score! This was a real challenge for me and I had to put a LOT of effort into it, but it paid off in the end! Note: I won’t bother you…

Essential Hadoop Concepts for Systems Administrators

Of course, everyone knows Hadoop as the solution to Big Data. What’s the problem with Big Data? Well, mostly it’s just that Big Data is too big to access and process in a timely fashion on a conventional enterprise system. Even a really large, optimally tuned, enterprise-class database system has conventional limits in terms of…

Page 1 of 212