Big Data

"The key to problem solving is to never stop listening. You have to be open-minded. Alert to the full range of possibilities, and at the same time very focused. The goal is to arrive at the right solution as quickly as possible; right for the system and right for the client."
— Alex Gorbachev, Chief Technology Officer, Pythian

Each individual on Pythian’s big data team brings passion, insight, and knowledge. As a team, that collective wisdom and vision have put Pythian at the forefront of the emerging big data market. Our top-calibre team comprises sought-after speakers, published authors, and frequent bloggers who’ve never met a challenge they couldn’t solve.

Pythian has acted as a trusted advisor to clients with sophisticated big data teams. We’ve filled knowledge gaps in our clients’ existing teams. We’ve been their team. Working with multiple clients, each with customized data systems, gives us a unique perspective that translates into deep skills in all aspects of big data. From defining the strategy, to choosing a platform and architecting it, to tuning and monitoring it in production, we produce results tailored to your environment and business objectives.

Our team includes:

  • Certified Hadoop Administrators and Developers
  • One of the world’s thirteen Cloudera Champions of Big Data
  • Big Data Solution Architects
  • Data Scientists
  • Big Data Platform Engineers


We assess your requirements, turn them into detailed architectures, and then take them from proof of concept, to pilot, to production. No surprises. You’ll know exactly what’s involved each step of the way.


We build and deploy production-ready platforms, flawlessly, so you can begin collecting and analyzing useful insights from your data—immediately.


Our team has the specialized knowledge required to get the best performance from your big data technologies, and ensure maximum efficiency without having to add costly hardware.


We ensure that all components of your big data platform can safely tolerate a failure without affecting the platform as a whole, so your business always has access to its data.


Our big data experts engineer fully secure deployments that significantly improve out-of-the-box security and comply with even the most stringent standards.


We combine new technologies with decades of experience in sound design principles to predict the resources that your big data platform will require, so your business is prepared to handle increasing volumes of data.


Upgrades bring improved stability, performance, and security, but because big data technologies are constantly evolving, upgrades are usually more frequent and less straightforward than traditional database upgrades. …, poorly documented, and require a unique action plan for each situation.


We build on decades of experience using traditional data systems and adapt that knowledge to the unstructured world of big data. Whether your focus is on ingestion speed, query scalability, data quality, or anything else, we can construct the most appropriate data models for your organization.


We develop scalable and efficient data ingestion pipelines by integrating various data sources into your big data platform. Our extensive expertise and deep knowledge mean we can execute this critical step quickly and efficiently.


We build robust event-processing pipelines using the latest technology, giving your enterprise the ability to process data in near real time so your business can make better decisions faster.


We recommend business intelligence tools that fit your specific needs and use cases, and integrate them into your big data platform so that your users can easily access and analyze data.


We select and deploy appropriate monitoring practices and efficient systems management tools, giving your enterprise another tool in its arsenal to ensure high availability, performance, and security.



Below is a partial list of the technologies we work with.

  • Hadoop Distributions – Cloudera, MapR, Hortonworks
  • Hadoop Ecosystem – Apache Hive, Apache Pig, Apache HBase, Apache Oozie, Azkaban, Apache Mahout, Apache ZooKeeper, Apache Spark and more
  • Hadoop Security – Kerberos, LDAP, Active Directory, encryption
  • Cloudera Technologies – Cloudera Impala, Cloudera Search, Apache Sentry, Cloudera Manager
  • BI Tools / Visualization – Platfora, Tableau Software
  • NoSQL – Apache HBase, Apache Cassandra, MongoDB, Couchbase
  • Data Ingestion – Apache Kafka, Apache Flume, Apache Sqoop
  • Complex Event Processing – Apache Storm, Spark Streaming
  • Search Engines – Apache Solr, Elasticsearch


Hadoop in the Cloud

Watch this video from Pythian CTO, Alex Gorbachev, as he shares some of his recommendations about using Hadoop in the Cloud.

Learn More

The Hadoop and the Hare

Can data get into the warehouse in real-time? Can we record everything the user does on the site in real-time? Real-time is a magic phrase. Learn more about the drive for real-time and its real value to business.

Learn More

Advanced Hadoop Security Features

In this video, Pythian CTO, Alex Gorbachev, provides an overview of the advanced security features within Hadoop.

Learn More