Data Governance in Legacy Environments
In our previous post we discussed the governance requirements for creating, managing and deploying analytical models. Analytical models do not stand on their own in today’s complex data landscapes. They have unique needs that must be captured in policy, automated through ML Ops platforms and regularly reviewed and updated to account for changing technology capabilities and market conditions.
Modern data and analytics governance is about the lightweight application of policy through reproducible, automated methods. Instrumentation is built into our data pipelines rather than added afterward as a secondary activity to inventory and report on compliance with policies and regulatory requirements.
One of the most difficult challenges data governance teams face is governing data that is stored and processed in legacy technology platforms. Legacy environments often hold an organization’s most critical and highest-risk data. Financial services, insurance and manufacturing organizations that have been around for generations often still rely heavily on mainframes or legacy UNIX systems for core processing, and they face large compliance demands that drive data governance needs.
Legacy can take many meanings depending on the organization, its level of technology adoption and its plans for modernization. From a data governance perspective, legacy systems are those that add complexity, time or cost when implementing and automating controls around the access, retention, inventory or quality measurement of data. Legacy systems will often be under limited levels of vendor support, receiving security patches but no major updates to functional capabilities.
With limited ability to integrate with modern data governance tools and limited visibility into system processes, legacy platforms create a risk point in the organization that must be managed through alternative means to ensure policies are implemented on the data they store and process. The first stopgap is often a set of manual processes to review logs, create access control rules and remove data to comply with data governance policies. While this can meet most governance needs, it can be cumbersome and error-prone at scale.
Other effective methods for incorporating legacy platforms into modern data governance programs include:
Automation of Human Reviews and Checks – While comprehensive governance capabilities are lacking on many legacy platforms, their ability to automate simpler tasks such as log review, access verification and service availability checks provides wins for governance and visibility. Checks should include automated periodic reviews of who has access to what data, whether they still need that access, and whether any attempts to access data were denied or produced error conditions.
Modernization Programs – In addition to governance deficiencies, legacy platforms often present risk through lack of market-available skills for operations, lack of upgrade paths for added functionality and high costs for operations and support. While modernization of these systems does not occur overnight, long term sustained efforts to retire these systems will provide the largest benefits for governance programs by allowing the introduction of modern platforms and integration techniques.
Accept Risk, Decrease Obligations – Organizations will often accept added layers of risk for legacy systems by relaxing policies for data retention, data obfuscation and the intervals between access approvals. These exceptions should be temporary and used to minimize the operational burden of policy enforcement and verification, but they can also push other parts of the organization to speed up modernization activities to eliminate the added risk.
Modern Services Fronting Legacy Platforms – Many organizations will build microservices to sit in front of legacy platforms, minimizing the number of users and systems that access them directly. Fronting legacy systems with modern architectural patterns and technology can provide governance benefits and performance benefits, and it separates core functions so they can later be migrated away from legacy systems to state-of-the-art target platforms.
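As an illustration of the automated access checks described under "Automation of Human Reviews and Checks", the following Python sketch flags three conditions: approvals older than a review window, denied access attempts, and successful access with no recorded approval. It is a minimal sketch, not a reference implementation. The record types `AccessGrant` and `AccessEvent`, the function `review_access` and the 90-day window are all hypothetical, and it assumes grants and log events have already been exported from the legacy platform into simple records.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of an approved entitlement exported from the legacy platform."""
    user: str
    dataset: str
    approved_on: date


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical record parsed from a legacy access log."""
    user: str
    dataset: str
    allowed: bool


def review_access(grants, events, today, max_approval_age_days=90):
    """Return findings for a periodic access review.

    The 90-day default is an assumed policy value, not a recommendation.
    """
    cutoff = today - timedelta(days=max_approval_age_days)
    granted = {(g.user, g.dataset) for g in grants}

    # Approvals that have not been re-confirmed within the review window.
    stale = sorted((g.user, g.dataset) for g in grants if g.approved_on < cutoff)

    # Attempts the platform rejected; these may indicate misuse or missing access.
    denied = sorted({(e.user, e.dataset) for e in events if not e.allowed})

    # Successful access that has no matching approval on record.
    unapproved = sorted(
        {(e.user, e.dataset) for e in events
         if e.allowed and (e.user, e.dataset) not in granted}
    )

    return {
        "stale_approvals": stale,
        "denied_attempts": denied,
        "unapproved_access": unapproved,
    }
```

A job like this could run on a schedule against each new export, with non-empty findings routed to the data owner for follow-up in place of a manual log review.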
The optimal solution for most organizations will be a combination of the items above. The right combination will depend on data risk, platform capabilities, operational team sizes and the ability to absorb manual process overhead. The ultimate objective is to ensure compliance with policies and regulatory obligations at the time the data is read, transformed, stored or analyzed. Whatever the combination, it must therefore include the ability to report on consumption of data on these legacy platforms.
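One way to picture policy enforcement at read time combined with consumption reporting is a thin service layer in front of the legacy system. The Python sketch below is illustrative only. `LegacyFacade`, `AccessDenied` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever adapter an organization uses to reach its mainframe or UNIX platform.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


class AccessDenied(Exception):
    """Raised when a caller is not entitled to the requested dataset."""


@dataclass
class LegacyFacade:
    # Hypothetical adapter: (dataset, key) -> record from the legacy platform.
    fetch: Callable[[str, str], dict]
    # Entitlements as (user, dataset) pairs, assumed to come from the policy store.
    allowed: Set[Tuple[str, str]]
    audit_log: List[dict] = field(default_factory=list)

    def read(self, user: str, dataset: str, key: str) -> dict:
        permitted = (user, dataset) in self.allowed
        # Every attempt is recorded, whether or not it is permitted.
        self.audit_log.append(
            {"user": user, "dataset": dataset, "key": key, "allowed": permitted}
        )
        if not permitted:
            raise AccessDenied(f"{user} may not read {dataset}")
        return self.fetch(dataset, key)

    def consumption_report(self) -> Counter:
        """Count successful reads per (user, dataset) pair."""
        return Counter(
            (entry["user"], entry["dataset"])
            for entry in self.audit_log
            if entry["allowed"]
        )
```

Because every caller goes through `read`, the policy check and the audit record happen in one place, and the legacy platform itself needs no new governance features. A production service would persist the audit log rather than hold it in memory.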
In our next discussion we will explore data monetization and the necessary policies to govern data that is consumed by outside organizations. These data products take on added support needs, quality standards and rights of reuse that are key to defining our policies, architecting our systems and modeling our datasets.