Big Data

Deploying Cloudera Impala on EC2 with Example Live Demo

A little while ago I blogged about (and open sourced) an Impala-powered soccer visualization demo, designed to demonstrate just how responsive Impala queries can be. Since not everyone has the time or resources to run the project themselves, we’ve decided to host it ourselves on an EC2 instance.

COLLABORATE 13 à la Pythian …

As with many previous IOUG/OAUG/Quest shows, Pythian will be in Denver next week! It was a sunny day in the fall of 1991 when I gave my first paper at International Oracle User Week (IOUW), a precursor to COLLABORATE and a few earlier incarnations called IOUG-Live and IOUG-Alive! It has been a whirlwind of…

Using Ansible to Secure Cloudera Manager Installation on a Hadoop Cluster

Building a secure Hadoop cluster requires protecting a number of services that make up the Hadoop infrastructure. If you are using the CDH distribution, then Cloudera Manager (CM) is one of the components that needs to be secured. There is a good step-by-step guide in the CM documentation, and it’s easy to follow for one server, but what about when you have hundreds of them? There are different approaches to the problem of managing server configuration at scale, but I’d like to focus on Ansible, which is a neat framework for parallel command execution and complex rollouts.

Big Data is the Commercial Supercomputing in the Age of Datafication

Modern commercial supercomputing in the age of Datafication is what we today call Big Data. I think a better term for it would be Data Supercomputing, but the industry has already spoken, so Big Data it is. The architecture shifted from environments that required massively parallel, compute-intensive number crunching to massively parallel, data-volume-intensive processing.

Love Your MongoDB

Pythian now officially supports MongoDB as both On-Demand and Managed Services offerings, including monitoring, capacity planning, tuning, and troubleshooting.
