Unifying a Tier-1 Financial Institution's Data Estate on the Databricks Lakehouse

Mar 5, 2026 1:47:32 PM

Pythian consolidated legacy warehouses, governed petabytes of regulated data, and deployed production AI

A global financial services firm with $50B+ in assets under management ran a fragmented data environment: a 12-year-old Teradata warehouse, siloed Oracle databases, and an ungoverned Databricks deployment.

Pythian unified the estate onto a production-grade Lakehouse, implemented enterprise-wide Unity Catalog governance, and delivered three production ML models—all within 11 months.

  • 38% reduction in total platform cost
  • $4.1M in annual savings
  • 5x faster regulatory reporting

Account profile

Industry: Financial Services

Organization scale: Global enterprise, $50B+ assets under management, 15,000+ employees across 30 countries

Tech stack: 

  • Databricks (Azure)
  • Delta Lake & Unity Catalog
  • Photon & MLflow
  • Power BI & Apache Airflow
  • Teradata (legacy — decommissioned)
  • Oracle (legacy — partially decommissioned)

Built for yesterday's compliance — not tomorrow's AI

The institution's data infrastructure had evolved over more than a decade, accumulating layers of technical debt that made modernization difficult and expensive. Three core friction points threatened the firm's competitive position and regulatory standing.

Regulatory reporting lag

Multiple jurisdictions demanded same-day risk reporting. The Teradata warehouse required 72+ hours for consolidated risk-exposure reports. The firm's own risk committee flagged this as a material operational risk.

Three platforms, zero governance

Data lived across a 12-year-old Teradata appliance, Oracle transactional databases, and an ungoverned Databricks deployment. Over 4,000 stored procedures in BTEQ and PL/SQL encoded business logic no single team fully understood.

$10.8M in costs, zero production AI

Combined platform costs hit $10.8M annually. Databricks spend climbed 22 percent quarter-over-quarter from idle clusters and unoptimized Spark jobs. The data science team's ML models stayed trapped in notebooks — $30M+ in projected fraud-detection savings sat unrealized.

From three disconnected platforms to one governed Lakehouse

Pythian treated the data estate as a single system — not three separate problems. The engagement combined legacy-platform fluency with Databricks-native engineering to deliver an architecture built for compliance, cost discipline, and production AI.

Phase 1: Discovery

Pythian assessed 80+ Databricks workspaces for utilization and governance gaps, cataloged 4,000+ stored procedures, and mapped ETL dependency chains across 14 DataStage jobs. The output: a prioritized roadmap that sequenced regulatory reporting workloads first for early stakeholder wins.

Strategic architecture

A production-grade medallion architecture (Bronze → Silver → Gold) on Azure Databricks:

  • Bronze: Ingested raw transactional data via Lakeflow Connect.
  • Silver: Applied regulatory data quality rules using Delta Live Tables.
  • Gold: Served audit-ready datasets to Power BI and ML pipelines.
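As a sketch, the Bronze → Silver → Gold flow could be expressed in Delta Live Tables SQL along these lines (table names, source paths, and quality rules here are illustrative assumptions, not taken from the engagement):

```sql
-- Bronze: land raw transactional data as-is (source path is illustrative)
CREATE OR REFRESH STREAMING TABLE bronze_transactions
AS SELECT * FROM STREAM read_files('/Volumes/finance/raw/transactions', format => 'json');

-- Silver: enforce regulatory data quality rules; violating rows are dropped
CREATE OR REFRESH STREAMING TABLE silver_transactions (
  CONSTRAINT valid_amount  EXPECT (amount IS NOT NULL AND amount >= 0) ON VIOLATION DROP ROW,
  CONSTRAINT valid_account EXPECT (account_id IS NOT NULL)             ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.bronze_transactions);

-- Gold: audit-ready aggregate served to Power BI and ML pipelines
CREATE OR REFRESH MATERIALIZED VIEW gold_daily_exposure
AS SELECT account_id, DATE(event_ts) AS report_date, SUM(amount) AS exposure
FROM LIVE.silver_transactions
GROUP BY account_id, DATE(event_ts);
```

Declaring quality rules as `EXPECT` constraints keeps the validation logic in the pipeline definition itself, which is what makes the Silver layer auditable rather than dependent on external scripts.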

Implementation roadmap

Phase 2: Migration and architecture build (months 3–7)

  • Deployed the medallion architecture on Azure Databricks.
  • Converted BTEQ and PL/SQL to Spark SQL and PySpark (65 percent automated via BladeBridge, 35 percent manual).
  • Replaced DataStage ETL with Delta Live Tables and Airflow pipelines.
  • Migrated 2.4 PB with dual validation and zero data loss.
  • Deployed Unity Catalog enterprise-wide and enabled Photon on priority workloads.
  • Initiated ML model training in parallel.
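To give a flavor of the conversion work, a Teradata-specific idiom such as `SEL` with `QUALIFY` maps to standard window-function SQL; the database, table, and column names below are illustrative, not from the client's codebase:

```sql
-- Teradata BTEQ (legacy): latest balance per account
SEL account_id, balance
FROM fin_db.balances
QUALIFY ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY load_dt DESC) = 1;

-- Spark SQL equivalent on Delta Lake (portable subquery form)
SELECT account_id, balance
FROM (
  SELECT account_id, balance,
         ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY load_dt DESC) AS rn
  FROM finance.silver.balances
)
WHERE rn = 1;
```

Databricks SQL also supports `QUALIFY` directly, but automated converters often emit the portable subquery form shown here.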

Phase 3: AI deployment and knowledge transfer (months 8–11)

  • Connected Power BI to the Gold layer, replacing the 72-hour batch cycle with same-day regulatory reporting.
  • Deployed three production ML models via MLflow: fraud detection, churn prediction, and credit-risk scoring.
  • Built MLOps infrastructure.
  • Decommissioned Teradata and consolidated Oracle workloads.
  • Transitioned to 24/7 managed services.

From legacy bottleneck to intelligence engine

The unified Databricks Lakehouse didn't just replace aging infrastructure — it fundamentally changed what the institution could do with its data. Regulatory reporting that once took days now completes in minutes. ML models that lived in notebooks now operate in production. The entire data estate is governed, auditable, and cost-controlled.

Three production AI systems in 11 months

Real-time fraud detection (sub-200 ms latency), churn prediction, and automated credit-risk scoring. The fraud model flagged $8.2M in anomalous activity within its first 90 days, validating the $30M+ opportunity identified in the challenge assessment.

100% governance coverage

Unity Catalog now governs every data asset with full lineage tracking and automated compliance reporting across SOC 2, GDPR, and local financial regulations.
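Governance of this kind is declared in SQL in Unity Catalog. A minimal sketch, assuming illustrative catalog, schema, table, and group names (none are from the engagement):

```sql
-- Group-based access to the Gold layer
GRANT USE CATALOG ON CATALOG finance TO `risk_analysts`;
GRANT USE SCHEMA  ON SCHEMA  finance.gold TO `risk_analysts`;
GRANT SELECT      ON TABLE   finance.gold.daily_exposure TO `risk_analysts`;

-- Column mask: only members of a PII group see raw account IDs
CREATE OR REPLACE FUNCTION finance.gold.mask_account(account_id STRING)
RETURNS STRING
RETURN CASE WHEN is_account_group_member('pii_readers') THEN account_id ELSE '***' END;

ALTER TABLE finance.gold.daily_exposure
  ALTER COLUMN account_id SET MASK finance.gold.mask_account;
```

Because grants and masks live in the catalog rather than in each consuming tool, the same policy applies whether data is read from Power BI, a notebook, or an ML pipeline.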


Ready to unlock value from your data?

With Pythian, you can accomplish your data transformation goals and more.