Big Data on Microsoft Azure – HDInsight
Introduction
HDInsight
Since 2013, Microsoft has been helping its customers get the most out of the Big Data ecosystem. Through its partnership with Hortonworks, it expanded its capabilities and enriched its solutions across the Big Data spectrum. HDInsight is a fully managed, open-source analytics service for enterprises that want to use the Hadoop technology stack to tackle Big Data problems. The platform offers a set of products that are entirely managed by Microsoft Azure. In a nutshell, Azure HDInsight is a cloud distribution of Hadoop components from the Hortonworks Data Platform (HDP), which makes it easy, fast and cost-effective to process massive amounts of data in a hyper-scale environment. There are several reasons why companies are looking for managed Big Data solutions nowadays: low cost and scalability, security and compliance, monitoring, productivity, extensibility and, most importantly, the global availability of the selected products.
Cluster types
HDInsight offers different cluster types to address the different problems you may face in your business. Clusters are billed by the hour and use a decoupled architecture, in which compute is separate from storage. That means you can process the data you want and afterwards destroy the cluster, keeping the data in Azure Blob Storage or Azure Data Lake Store. The data remains there, neither removed nor changed, once the process is over. Most companies that use HDInsight adopt this approach to achieve fast performance and, at the same time, reduce their infrastructure costs. In an on-premises environment, we cannot turn off the compute layer, because HDFS and the processing layer are coupled. A PaaS (Platform-as-a-Service) solution makes it easy to work around this and also gives you a set of tools to help you manage, orchestrate and monitor the entire data workflow.
HDInsight offers the following cluster types:
- Apache Hadoop
- Apache Spark
- Apache HBase
- R Server
- Apache Storm
- Interactive Query (Hive 2.0)
- Apache Kafka
HDInsight is the only PaaS platform that offers this many fully managed cluster types in a cloud environment.
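To make the cost argument behind hourly billing and decoupled storage concrete, here is a minimal Python sketch that compares an always-on cluster with one that is created for a job and destroyed afterwards. The `ClusterPlan` class, the node count, the hourly rate and the hours per month are all hypothetical values chosen for illustration; they are not real Azure prices, and storage charges for Azure Blob Storage or Azure Data Lake Store are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPlan:
    """Hypothetical description of how a cluster runs during one month."""
    nodes: int
    hourly_rate_per_node: float  # assumed price, NOT a real Azure rate
    hours_per_month: float       # hours the cluster actually exists

    def monthly_compute_cost(self) -> float:
        # Hourly billing: you pay only for the hours the cluster is up.
        return self.nodes * self.hourly_rate_per_node * self.hours_per_month


def monthly_saving(always_on: ClusterPlan, transient: ClusterPlan) -> float:
    """Compute cost avoided by destroying the cluster between jobs."""
    return always_on.monthly_compute_cost() - transient.monthly_compute_cost()


# Illustrative numbers only: a 4-node cluster at an assumed 0.50 per node-hour.
always_on = ClusterPlan(nodes=4, hourly_rate_per_node=0.50, hours_per_month=720)
# Same cluster, but created for a 2-hour job each day and then deleted.
transient = ClusterPlan(nodes=4, hourly_rate_per_node=0.50, hours_per_month=60)

print("always-on:", always_on.monthly_compute_cost())
print("transient:", transient.monthly_compute_cost())
print("saving:   ", monthly_saving(always_on, transient))
```

The comparison only holds because the data lives outside the cluster: if storage were coupled to compute, as with HDFS on-premises, deleting the cluster would also delete the data.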
Common scenarios by cluster type
In this section, we are going to walk through the cluster types and review the best-fit solution as well as the everyday use cases for each of them.
- Apache Hadoop
- Apache Spark
- Apache HBase
- R Server
- Apache Storm
- Interactive Query (Hive 2.0)
- Apache Kafka
Learn more about Pythian's services and solutions for Microsoft Azure.