How to choose the best data management option
As your data estate grows into a data empire, more and more of your users are vying for, and expecting, better access to this wealth of internal and external information. That’s why finding the right home for your data is so important. This post gives an overview of the main data management options, the features each offers for organizing and managing your data, and what it means to move from a legacy data system to a modern, cloud-native data platform.
Who can blame your users for wanting more, and more trustworthy, data at their fingertips? Data has become an essential business asset that helps companies analyze and understand the past, present and future of their organization (and beyond), letting them make more efficient and profitable decisions.
But making trustworthy data accessible to different kinds of users can be a major challenge.
That’s why many companies are turning to full enterprise data platforms. With automated processes to clean, organize and integrate data for both user self-service and machine learning, data platforms are meeting this critical need for insight.
So how is a data platform different from a data warehouse, or a data lake, or a data mart? And do you have to say goodbye to all of this legacy data storage when you move to a platform?
The short answer is ‘no.’ But to explain, we should sort out the data management options that are available to empower every type of data user, from the dabbler to the power user.
Enterprise data platform
A cloud-native enterprise data platform is the new model for organizing and enabling access to data.
Cloud data platforms offer a single, unified repository for both relational and non-relational data. Users get accurate data and fast, high-performance analytics for many different use cases, from open-ended exploration to targeted queries. Data platforms can group together a data lake, a data warehouse and data marts, so this option doesn’t force you to choose. You get the best of all the options to enable access to data by many users and systems for a full range of use cases, all with the scalability, flexibility and future-readiness of the cloud.
On-prem or cloud? Gartner predicts the cloud market will top $380 billion by 2020. While some feel they are not ready to take their data platform off premises, many experts cite the ability to affordably plug into more sophisticated security and performance as one of the top reasons for moving to the cloud.
Pro Tip: A gradual transition can help you start taking advantage of the improvements in security and performance available at lower cost.
Data warehouse
Warehouses have historically been the central repository for data needed to track business performance within an organization. More expensive to create and maintain, warehouses should only store the governed data you have decided is the most important to your organization. Traditional warehouses are quickly being upgraded to modern ones, and here’s why:
In a traditional warehouse, storage and computing are tied together. Extraction, transformation and load are done by one monolithic ETL service. But since you normally only process about 10 percent of your data, you’re overpaying for computing.
Modern warehouses separate storage and computing so each can scale as needed, saving you costs. ETL is also separated so you can choose, optimize and change the technology for each process without affecting the others. Choosing a modern data warehouse on the cloud is not a matter of rejecting the enterprise data platform as there’s a clear migration path from one to the other. In fact, deploying a modern data warehouse is a major step toward building a cloud-based enterprise data platform.
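The cost argument above can be made concrete with a little arithmetic. The sketch below is only an illustration: the `Workload` class, both cost functions and every price in it are hypothetical, chosen to mirror the "about 10 percent" figure rather than any vendor's actual pricing model.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    stored_tb: float        # total data held in the warehouse, in terabytes
    active_fraction: float  # share of that data actually processed


def coupled_cost(w: Workload, storage_price: float, compute_price: float) -> float:
    """Traditional model: compute is provisioned for every stored terabyte."""
    return w.stored_tb * (storage_price + compute_price)


def decoupled_cost(w: Workload, storage_price: float, compute_price: float) -> float:
    """Modern model: store everything, pay for compute only on active data."""
    active_tb = w.stored_tb * w.active_fraction
    return w.stored_tb * storage_price + active_tb * compute_price


# Hypothetical figures: 100 TB stored, 10 percent processed,
# 20 per TB for storage and 80 per TB for compute (arbitrary units).
workload = Workload(stored_tb=100, active_fraction=0.10)
print(coupled_cost(workload, 20, 80))
print(decoupled_cost(workload, 20, 80))
```

With these made-up numbers the coupled model costs about 10,000 units and the decoupled one about 2,800, because compute is billed only for the active tenth of the data. Real savings depend entirely on your own workload and pricing.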
Pro Tip: Modernizing your data warehouse can save you a bundle. Your organizational needs will help you decide whether to use a rip-and-replace or a phased approach.
Data lake
Data lakes are large data sets containing structured, semi-structured and unstructured data: images, streaming video, audio, you name it. You can pour it all in quickly without worrying about applying robust data governance. Data scientists and other power users can just swim around and find stuff. They’re happy because they get access to the full load straight away, but you’re not making them any promises about how well the data is organized or governed.
The first data lakes were built on-premises on the Hadoop framework, but they never delivered on their promise of enabling self-service analytics. Today, data lakes have evolved to become part of a larger cloud data platform that can be used for data science exploration, integrating seamlessly with cloud data warehouses and cloud-based analytics and self-service tools.
Pro Tip: Use the lake as a staging area for tracking use and prioritizing the most important data for governance and inclusion in your data warehouse.
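One way to act on this tip is to count how often each data set in the lake is read and treat the most-used ones as candidates for governance. The following is a minimal sketch, assuming you can export an access log as (dataset, user) pairs; the function name, the log format and the sample entries are all hypothetical.

```python
from collections import Counter


def rank_datasets_by_use(access_log, top_n=3):
    """Return the top_n most-read datasets as (name, read_count) pairs."""
    counts = Counter(dataset for dataset, _user in access_log)
    return counts.most_common(top_n)


# Hypothetical access log: one (dataset, user) pair per read.
access_log = [
    ("clickstream", "ana"),
    ("clickstream", "raj"),
    ("clickstream", "ana"),
    ("support_tickets", "raj"),
    ("support_tickets", "mei"),
    ("sensor_dump", "mei"),
]

for name, reads in rank_datasets_by_use(access_log, top_n=2):
    print(f"{name}: {reads} reads")
```

Read counts are only one possible signal. You might equally weight by the number of distinct users or by business criticality.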
Data swamp
You guessed it: a data swamp is an overloaded and un-refreshed data lake that has gone stagnant. In effect, it bogs down users and stalls out progress.
Pro Tip: Refresh your data lake often to keep it from getting stale and becoming swampy.
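A simple staleness check can flag data sets that are drifting toward swamp territory. This sketch assumes you track a last-refreshed timestamp per data set; the `find_stale` helper, the 30-day threshold and the sample dates are hypothetical and should be tuned to your own refresh policy.

```python
from datetime import datetime, timedelta


def find_stale(last_refreshed, now, max_age=timedelta(days=30)):
    """Return names of datasets whose last refresh is older than max_age."""
    return sorted(name for name, ts in last_refreshed.items() if now - ts > max_age)


# Hypothetical refresh timestamps for three data sets in the lake.
last_refreshed = {
    "clickstream": datetime(2024, 5, 28),
    "sensor_dump": datetime(2024, 1, 15),
    "support_tickets": datetime(2024, 4, 2),
}
now = datetime(2024, 6, 1)  # fixed "current" date so the example is repeatable
print(find_stale(last_refreshed, now))
```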
Data mart
A subset of your data warehouse, a mart contains curated data geared for specific business lines or departments such as marketing, finance or HR. These audiences develop, manipulate and own their marts, which makes it fast and easy for them to access the insight they need.
Pro Tip: To prevent your data marts from becoming siloed, add new, relevant data sources as they become available in the data warehouse, and automate refreshes so users get real-time data.
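To illustrate the idea of a mart as a curated subset of the warehouse, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for a real warehouse. The table `warehouse_metrics`, the view `marketing_mart` and the sample rows are hypothetical. Defining the mart as a view is just one possible design, chosen here because a view reflects new warehouse rows without a separate refresh job.

```python
import sqlite3

# In-memory database standing in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_metrics (department TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO warehouse_metrics VALUES (?, ?, ?)",
    [
        ("marketing", "leads", 120.0),
        ("finance", "revenue", 9500.0),
        ("marketing", "campaigns", 4.0),
    ],
)

# The mart exposes only the marketing slice of the warehouse.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT metric, value FROM warehouse_metrics WHERE department = 'marketing'"
)


def read_mart(connection):
    """Return all rows currently visible in the marketing mart."""
    query = "SELECT metric, value FROM marketing_mart ORDER BY metric"
    return connection.execute(query).fetchall()


print(read_mart(conn))

# A new warehouse row shows up in the mart on the next read.
conn.execute("INSERT INTO warehouse_metrics VALUES ('marketing', 'signups', 37.0)")
print(read_mart(conn))
```

In production, a mart is often materialized as its own tables for performance, in which case the refresh would need to be scheduled rather than implicit.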
By integrating your current data containers into an enterprise data platform, you can serve up the insight your users want and need. Cloud solutions let you serve users more securely at lower cost.
Let Pythian help you develop a migration plan to a data platform that works best for you. Read our e-book “10 signs that it’s time to modernize your data warehouse” to determine if you’re experiencing any of the challenges associated with an outdated data warehouse.