Tag: Hadoop

C14 OakTable World Las Vegas

The OakTable Network will be holding OakTable World during the COLLABORATE conference for the very first time.


Oozing Caribou

Meet Oozie’s Workflows

Oozie is a workflow scheduler for Hadoop, but that’s not terribly important right now. What is important is that it defines its workflows using an XML dialect. And as all XML things go, the result is… shall…


Connection Resets When Importing from Oracle with Sqoop

I’ve been using Sqoop to load data into HDFS from Oracle. I’m using version 1.4.3 of Sqoop, running on a Linux machine and using the Oracle JDBC driver with JDK 1.6. I was getting intermittent connection resets when trying to…


Deploying Cloudera Impala on EC2 with Example Live Demo


A little while ago I blogged about (and open sourced) an Impala-powered soccer visualization demo, designed to demonstrate just how responsive Impala queries can be. Since not everyone has the time or resources to run the project themselves, we’ve decided to host it ourselves on an EC2 instance.


Using Ansible to Secure Cloudera Manager Installation on a Hadoop Cluster

Building a secure Hadoop cluster requires protecting a number of services that make up the Hadoop infrastructure. If you are using the CDH distribution, then Cloudera Manager (CM) is one of the components that needs to be secured. There is a good step-by-step guide in the CM documentation, and it’s easy to follow for one server, but what about when you have hundreds of them? There are different approaches to the problem of managing server configuration at scale, but I’d like to focus on Ansible, which is a neat framework for parallel command execution and complex rollouts.


Big Data is the Commercial Supercomputing in the Age of Datafication

Modern commercial supercomputing in the age of Datafication is what we today call Big Data. I think a better term for it would be Data Supercomputing, but the industry has already spoken, so Big Data it is. The architecture has shifted from environments that required massively parallel, compute-intensive number crunching to massively parallel, data-volume-intensive processing.


HDFS Authentication Puzzle

The HDFS authentication model changed in recent releases, but the documentation is stale, which can lead people into thinking HDFS uses very primitive authentication.


Hadoop FAQ – But What About the DBAs?

Do DBAs have a role to play with Hadoop clusters? If so, what is that role, and what skills do they need to get there? I provide the answers to these questions in this post.


A First Foray Into Hadoop Territory

Before I dig into the mechanics under the hood of the Hadoop beastie (which is the part, I assume, that is going to be heady as hell), I thought it would be a good idea to play a little bit with some of its applications to give me a feel for the lay of the land.
