Tag: Hadoop

Calculating Business Days in HiveQL

One common task in data processing is calculating the number of days between two given dates. You can easily achieve this by using the Hive DATEDIFF function. You can also get the weekday number by using this more obscure…

Watch: Hadoop vs. Riak

Every data platform has its value, and deciding which one will work best for your big data objectives can be tricky—Alex Gorbachev, Oracle ACE Director, Cloudera Champion of Big Data, and Chief Technology Officer at Pythian, has recorded a series…

Watch: Hadoop vs. HBase

Every data platform has its value, and deciding which one will work best for your big data objectives can be tricky—Alex Gorbachev, Oracle ACE Director, Cloudera Champion of Big Data, and Chief Technology Officer at Pythian, has recorded a series…

Avro MapReduce Jobs in Oozie

Normally, when using Avro files as input or output to a MapReduce job, you write a Java main() method to set up the Job using AvroJob. That documentation page does a good job of explaining where to use AvroMappers, AvroReducers,…

Is X a Big Data Product?

Virtually everyone in the data space claims today that they are a Big Data vendor and that their products are Big Data products. Of course, if you are not in Big Data, then you are legacy. So how do you know whether a product is a Big Data product?

Small Files on MapR-FS

One of the well-known best practices for HDFS is to store data in a few large files rather than a large number of small ones. There are a few problems related to using many small files, but the ultimate HDFS killer…

Cloudera Challenge 2014

Yesterday, Cloudera released the score reports for their Data Science Challenge 2014 and I was really ecstatic when I received mine with a “PASS” score! This was a real challenge for me and I had to put a LOT of…

Essential Hadoop Concepts for Systems Administrators

Of course, everyone knows Hadoop as the solution to Big Data. What’s the problem with Big Data? Well, mostly it’s just that Big Data is too big to access and process in a timely fashion on a conventional enterprise system….

Microsoft Analytics Platform System: Name Overhaul in Big Data War!

I had the chance to attend a course about what used to be called Parallel Data Warehouse (PDW). PDW was introduced a few years ago with the offering of SQL Server 2008 R2 Parallel Data Warehouse, something very few people…

C14 OakTable World Las Vegas

The OakTable Network will be holding its OakTable World for the very first time during the COLLABORATE conference.
