Breaking down data silos: BI’s broken data integration promise
This is the first in a series of four posts on breaking down data silos to gain more complete, accurate business insights.
When setting up a data program for your organization, it can make sense to start by adopting one of the many available out-of-the-box BI or visualization tools. After all, these tools are supposed to be relatively easy to integrate into your organization. Just sign up, connect the tool to your data sources (on-prem or cloud service of choice), add as many users in your organization as makes sense, and then kick back and watch the insights roll in.
But while it’s true that out-of-the-box, self-service BI tools offer slick user interfaces and powerful visualizations, they often tout themselves as one-size-fits-all solutions to nearly everything, including getting insights from multisource data, governance, data prep and data access. The reality is that a lot of work has to happen behind the scenes to get data ready for analysis and visualization, particularly if you’re trying to get insights from multiple, disparate data sources.
BI tools are, of course, very useful in many cases. But using only out-of-the-box software to scale an enterprise-level data program may not solve the most common barrier to getting complete insights: data silos. At best, these tools will present a single data set in a cleaner, easier-to-digest visual format. At worst, they can result in analysis built on bad data, which inevitably begets the most unfortunate outcome of all: business decisions based on inaccurate or incomplete insights.
So what good are these tools if they can’t solve the most common data-related issues? The short answer is that they are very valuable. The problem lies not in the tools themselves, but in the fact that companies adopt them without a full understanding of what it takes to get data prepared for analysis and visualization.
Companies don’t always account for complex processes like cleaning, unifying and integrating data for consumption by end-users and their tools. But this is something savvy data users understand all too well. In fact, a majority of data scientists agree that the most time-consuming element of analysis isn’t analysis at all. It’s cleaning the data so it can be used in the first place.
What is data integration?
Data integration is a critical component of the data preparation process, particularly when the data is coming from multiple sources. The preparation process includes data integration, cleansing, formatting and organizing. It also involves validation for accuracy and consistency so the data can be analyzed using business intelligence and visualization software, or used as input for systems like decision support. The data preparation process also focuses on business user requirements, improving data quality and transforming data into a format that meets user needs.
At its core, data integration is the combining of data from different sources so users see a unified view of all relevant data, even when that data is of different types, resides in different places, or is generated by different departments or business units, instead of just one source or type. Because business units and departments are prone to walling off their data for structural, political, or other reasons, organizations must be proactive to avoid falling into the data silo trap.
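To make the idea of a "unified view" concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the record layouts, the `employee_id` key and the `integrate` helper are all hypothetical, and a real integration pipeline would also need the cleansing and validation steps described above.

```python
# Hypothetical records from two departmental sources (illustrative data only).
hr_records = [
    {"employee_id": 1, "name": "Ana", "department": "Sales"},
    {"employee_id": 2, "name": "Ben", "department": "Sales"},
]
sales_records = [
    {"employee_id": 1, "quarterly_revenue": 120000},
    {"employee_id": 3, "quarterly_revenue": 45000},
]


def integrate(key, *sources):
    """Combine records from several sources into one record per key value."""
    unified = {}
    for source in sources:
        for record in source:
            # Start a new unified record the first time a key is seen,
            # then layer on the fields contributed by each source.
            unified.setdefault(record[key], {}).update(record)
    return unified


unified_view = integrate("employee_id", hr_records, sales_records)
for employee_id, record in sorted(unified_view.items()):
    print(employee_id, record)
```

Note that employee 3 appears only in the sales source, so the unified record has revenue but no name or department. Gaps like this are exactly what stays invisible when each source is viewed on its own.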
Data silos
Data silos are data sets segregated from the rest of the enterprise. They are essentially islands of data unto themselves, and their existence makes it difficult and expensive to analyze for trends and insights across an organization as a whole. Data silos are also prone to containing duplicate or conflicting data and can lead to bad analysis and false conclusions. There are several data integration benefits that organizations should keep in mind:
- It’s best to speak the same language: Datasets, data catalogs and data definitions within your organization are best created and managed in a consistent, governed, repeatable and shareable way. Reinventing the wheel every time analysis is required is a recipe for disaster.
- All data types have their strengths: When combined effectively, various data types from different sources paint a much more coherent picture than siloed data pulled from your HR or sales departments and viewed in a vacuum.
- It allows for unified data governance levels: Using data governance maturity models and integrating data from all sources, organizations can holistically determine which level of data governance is right for them. This is much more difficult when data is siloed in various departments.
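The earlier point that silos are prone to duplicate or conflicting data can also be sketched in a few lines of Python. This is a hedged illustration, not a prescribed method: the CRM and billing record sets, the `customer_id` key and the `find_conflicts` helper are hypothetical names invented for the example.

```python
def find_conflicts(key, field, *sources):
    """Return {key_value: set_of_values} where `field` disagrees across sources."""
    seen = {}
    for source in sources:
        for record in source:
            if field in record:
                # Collect every distinct value reported for this key.
                seen.setdefault(record[key], set()).add(record[field])
    # Keep only keys for which the sources reported more than one value.
    return {k: values for k, values in seen.items() if len(values) > 1}


# Hypothetical silos: the same customers held by two different departments.
crm_records = [
    {"customer_id": "C1", "email": "pat@example.com"},
    {"customer_id": "C2", "email": "lee@example.com"},
]
billing_records = [
    {"customer_id": "C1", "email": "pat@example.com"},
    {"customer_id": "C2", "email": "lee.old@example.com"},
]

conflicts = find_conflicts("customer_id", "email", crm_records, billing_records)
print(conflicts)
```

Here customer C1 is a harmless duplicate, while C2 has two different email addresses. Neither department can see the disagreement from inside its own silo.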