Metadata-Driven Data Governance: the How and Why

In our previous discussion, we explored the role of data stewards and their vital function in data governance programs. They’re the champions who identify data quality shortfalls and work with business partners to improve data quality. Data stewards are the subject matter experts on specific data sets, their value to the organization, and how that data is consumed.


Maximizing the impact of our data stewards requires technical capabilities that store data and descriptions of our data sets, and that enable programmatic measurement and monitoring to ensure compliance with corporate policies and data use objectives. Metadata enables this programmatic implementation of our data governance policies. Its value comes from the ability to describe our complex data sets, create a business layer on integrated data, and identify how data is being consumed by teams across the organization. Metadata comes in many forms, including business descriptions of data, usage information about a given data set, geographic information about where data was created, the source of third-party data sets, and descriptions of quality minimums for a data set.
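To make these forms of metadata concrete, here is a minimal sketch of a single metadata record and one programmatic check against its quality minimum. The `DatasetMetadata` class, its field names, and the `meets_quality_minimum` helper are hypothetical and chosen only for illustration; a real catalog would define its own schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; the field names are illustrative only."""
    name: str
    business_description: str
    origin_region: Optional[str] = None       # where the data was created
    third_party_source: Optional[str] = None  # vendor, if externally sourced
    min_completeness: float = 0.0             # quality minimum, from 0.0 to 1.0
    consumers: List[str] = field(default_factory=list)  # teams using the data


def meets_quality_minimum(meta: DatasetMetadata, rows: List[Dict], column: str) -> bool:
    """Return True when the share of non-null values in `column` meets the minimum."""
    if not rows:
        return False  # an empty data set cannot demonstrate completeness
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows) >= meta.min_completeness
```

A data steward could set `min_completeness` once in the record, and a scheduled job could then run the check on each load without anyone re-reading the policy document.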

Effectively leveraging our metadata requires three actions:

  • Governing: Our first step is to define our policies for metadata. This includes which data points we capture and store, the systems responsible for storing metadata, the data sets the metadata describes, and the approved corporate uses of the data.
  • Finding: Once we have identified the types of metadata we would like to govern and the controls for accessing the data, we must locate it across the organization. This often runs on two tracks: human-centric processes for analysis and inventory, coupled with programmatic approaches for metadata discovery.
  • Managing: Once we have inventoried our existing metadata and identified the metadata we will create and store in the future, we can implement programmatic processes that use this metadata to drive key business processes for data integration, measurement of business impact, and better risk management of our data assets.
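As one small illustration of the programmatic track of the Finding step, the sketch below reads technical metadata (table names, column names, types, nullability) straight from a database's own system tables. It assumes a SQLite database purely for convenience, and the `discover_table_metadata` function name is hypothetical; other databases expose similar information through their own catalog views.

```python
import sqlite3
from typing import Dict, List


def discover_table_metadata(conn: sqlite3.Connection) -> Dict[str, List[dict]]:
    """Build a simple catalog of tables and columns from SQLite's system tables."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    catalog: Dict[str, List[dict]] = {}
    for table in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {"column": col[1], "type": col[2], "nullable": not col[3]}
            for col in columns
        ]
    return catalog
```

Discovery like this only recovers technical metadata. Business descriptions and approved uses still come from the human-centric track.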

Metadata unlocks several kinds of value in our data ecosystems. One is the ability to automate key business processes that model future behavior, and to evolve those models as the physical world they represent evolves. The most common implementation of this use of metadata to bridge the divide between the physical and virtual worlds is the digital twin, used in supply chains, manufacturing and civil engineering. This metadata-driven automation allows for rapid adjustments to processes as physical systems and users change behavior.

A growing use of metadata is the programmatic capture of who is consuming our data sets and how. This measurement gives product management and data governance teams deeper visibility into data usage across the organization and informs future third-party data investments, risk management strategies, and technology investments. It also allows product managers to determine whether the cost of producing a given report is worth the investment or whether those dollars would be better spent on other data products and consumption models. Product managers can further explore which teams across the organization consume which data most often, and use that insight to enhance data products for the most impactful power users.
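The usage measurement described above can be sketched in a few lines. This example assumes access events are already available as simple `(team, dataset)` pairs; the `summarize_usage` and `cost_per_access` helpers, and the idea of dividing production cost by access count, are illustrative assumptions rather than a prescribed method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def summarize_usage(access_log: Iterable[Tuple[str, str]]):
    """Count accesses per data set and per team from (team, dataset) events."""
    per_dataset: Counter = Counter()
    per_team: Dict[str, Counter] = defaultdict(Counter)
    for team, dataset in access_log:
        per_dataset[dataset] += 1
        per_team[team][dataset] += 1
    return per_dataset, per_team


def cost_per_access(production_cost: float, access_count: int) -> Optional[float]:
    """Rough cost of each access; None when the data product was never used."""
    if access_count == 0:
        return None
    return production_cost / access_count
```

A report with a high cost per access, or with no accesses at all, becomes a candidate for retirement, while each team's most-used data sets point to where further investment is likely to matter.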

Another common use of metadata is the detailed description of data sets and their contained data elements for the calculation of the risk posture of the organization. This use is dominant in organizations that generate and store sensitive data about individuals, commonly consumer-facing, financial services or healthcare organizations that hold deep details about consumer behavior and store large quantities of private data. This use of metadata lets organizations focus their protection efforts on sensitive and regulated data. Data governance teams can weigh the cost of compliance failures against the value of open data access, while simulating the potential risks of more or less restrictive data access policies. Automated control of processes, including data access, data product creation, access auditing and automated checks for data quality and completeness, is another way metadata is consumed programmatically to drive and measure data transformation through the enterprise.
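A minimal sketch of this idea follows, assuming each data element carries a sensitivity tag in its metadata. The tag names, the numeric weights in `SENSITIVITY_WEIGHTS`, and both helper functions are hypothetical; a real program would take its classifications and scoring from corporate policy and the applicable regulations.

```python
from typing import Dict, Set

# Assumed weights per sensitivity tag; real values would come from policy.
SENSITIVITY_WEIGHTS: Dict[str, int] = {"public": 0, "internal": 1, "pii": 5, "phi": 8}


def dataset_risk_score(element_tags: Dict[str, str]) -> int:
    """Sum the weights of the sensitivity tags attached to each data element."""
    return sum(SENSITIVITY_WEIGHTS[tag] for tag in element_tags.values())


def can_access(role_clearances: Set[str], element_tags: Dict[str, str]) -> bool:
    """Allow access only when the role is cleared for every tag in the data set."""
    return all(tag in role_clearances for tag in element_tags.values())
```

Because both functions read only metadata, a governance team could rerun them under a stricter or looser set of clearances to compare policies before touching any production access controls.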

This concept of programmatic implementation of controls, data quality and risk management, along with bridging the physical and virtual worlds, creates new opportunities for technology innovation. DataHub is one innovation in this space, facilitating the centralized storage of metadata to drive federated and automated processes for complex data environments.

Metadata becomes the connective glue in modern data architectures, allowing different technologies to share common layers of reference for the automation of processes and access. Metadata gives data consumers unified access to diverse data across multiple integration points. Centralized metadata storage through common data catalogs and feature stores ensures a corporate-wide view into data, complexity, risk and value, and enables separate engineering teams to implement their preferred technologies while maintaining common reference points centrally.

Join us for our next data governance conversation as we explore data lineage and its value in helping organizations produce diverse data products, meet regulatory obligations and simplify complex data transformation pipelines. Don’t forget to sign up so you don’t miss it.

