Unifying a Tier-1 Financial Institution's Data Estate on the Databricks Lakehouse
Pythian consolidated legacy warehouses, governed petabytes of regulated data, and deployed production AI
A global financial services firm with $50B+ in assets under management ran a fragmented data environment: a 12-year-old Teradata warehouse, siloed Oracle databases, and an ungoverned Databricks deployment.
Pythian unified the estate onto a production-grade Lakehouse, implemented enterprise-wide Unity Catalog governance, and delivered three production ML models—all within 11 months.
38% reduction in total platform cost
$4.1M annual savings
Same-day regulatory reporting
Account profile
Industry: Financial Services
Organization scale: Global enterprise, $50B+ assets under management, 15,000+ employees across 30 countries
Tech stack:
- Databricks (Azure)
- Delta Lake & Unity Catalog
- Photon & MLflow
- Power BI & Apache Airflow
- Teradata (legacy — decommissioned)
- Oracle (legacy — partially decommissioned)
Built for yesterday's compliance — not tomorrow's AI
The institution's data infrastructure had evolved over more than a decade, accumulating layers of technical debt that made modernization difficult and expensive. Three core friction points threatened the firm's competitive position and regulatory standing.
Regulatory reporting lag
Multiple jurisdictions demanded same-day risk reporting. The Teradata warehouse required 72+ hours for consolidated risk-exposure reports. The firm's own risk committee flagged this as a material operational risk.
Three platforms, zero governance
Data lived across a 12-year-old Teradata appliance, Oracle transactional databases, and an ungoverned Databricks deployment. Over 4,000 stored procedures in BTEQ and PL/SQL encoded business logic no single team fully understood.
$10.8M in costs, zero production AI
Combined platform costs hit $10.8M annually. Databricks spend climbed 22 percent quarter-over-quarter from idle clusters and unoptimized Spark jobs. The data science team's ML models stayed trapped in notebooks — $30M+ in projected fraud-detection savings sat unrealized.
From three disconnected platforms to one governed Lakehouse
Pythian treated the data estate as a single system — not three separate problems. The engagement combined legacy-platform fluency with Databricks-native engineering to deliver an architecture built for compliance, cost discipline, and production AI.
Discovery phase
Pythian assessed 80+ Databricks workspaces for utilization and governance gaps, cataloged 4,000+ stored procedures, and mapped ETL dependency chains across 14 DataStage jobs. The output: a prioritized roadmap that sequenced regulatory reporting workloads first for early stakeholder wins.
Strategic architecture
A production-grade medallion architecture (Bronze → Silver → Gold) on Azure Databricks: raw data lands in Bronze exactly as ingested, Silver applies cleansing, deduplication, and conformed schemas, and Gold serves business-ready tables for regulatory reporting and ML.
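A minimal sketch of that layer flow in Databricks SQL — all table, schema, and column names here are illustrative, not the institution's actual objects:

```sql
-- Bronze: raw transaction files ingested as-is from the landing zone
CREATE TABLE bronze.transactions_raw
USING DELTA
AS SELECT * FROM json.`/mnt/landing/transactions`;

-- Silver: cleansed, typed, and deduplicated
CREATE TABLE silver.transactions
USING DELTA
AS SELECT DISTINCT
  CAST(txn_id AS BIGINT)        AS txn_id,
  CAST(amount AS DECIMAL(18,2)) AS amount,
  to_date(txn_ts)               AS txn_date
FROM bronze.transactions_raw
WHERE txn_id IS NOT NULL;

-- Gold: aggregated risk exposure, ready for Power BI
CREATE TABLE gold.daily_risk_exposure
USING DELTA
AS SELECT txn_date, SUM(amount) AS total_exposure
FROM silver.transactions
GROUP BY txn_date;
```

In practice each layer is typically an incremental pipeline (e.g., Delta Live Tables or streaming MERGEs) rather than a one-shot CTAS, but the Bronze-to-Gold contract is the same.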
Implementation roadmap
Phase 1: Assessment and FinOps (months 1–2)
Environment profiling across Databricks, Teradata, and Oracle. Immediate wins — idle cluster elimination, compute right-sizing, autoscaling guardrails — cutting Databricks spend 30 percent before migration began. Migration readiness scoring for 4,000+ procedures and 14 ETL jobs.
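Autoscaling guardrails of this kind are commonly enforced as a Databricks cluster policy; the sketch below shows the policy-definition format with illustrative limits, not the engagement's actual values:

```json
{
  "autoscale.min_workers": { "type": "range", "maxValue": 2, "defaultValue": 1 },
  "autoscale.max_workers": { "type": "range", "maxValue": 10, "defaultValue": 4 },
  "autotermination_minutes": { "type": "range", "minValue": 10, "maxValue": 30, "defaultValue": 15 },
  "spark_version": { "type": "unlimited", "defaultValue": "auto:latest-lts" }
}
```

Attaching a policy like this to all team workspaces caps cluster size, forces idle clusters to terminate, and keeps runtimes current without per-cluster review.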
Phase 2: Migration and architecture build (months 3–7)
Deployed medallion architecture on Azure Databricks. Converted BTEQ and PL/SQL to Spark SQL and PySpark (65 percent automated via BladeBridge, 35 percent manual). Replaced DataStage ETL with Delta Live Tables and Airflow pipelines. Migrated 2.4 PB with dual-validation — zero data loss. Unity Catalog deployed enterprise-wide. Photon enabled on priority workloads. ML model training initiated in parallel.
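A representative conversion pattern, with hypothetical table names: a Teradata UPDATE-then-INSERT pair from a BTEQ script collapses into a single Delta Lake MERGE in Spark SQL.

```sql
-- BTEQ original (Teradata), shown for comparison:
--   UPDATE risk_positions SET exposure = s.exposure
--   FROM staging_positions s WHERE risk_positions.acct_id = s.acct_id;
--   INSERT INTO risk_positions
--   SELECT * FROM staging_positions s
--   WHERE NOT EXISTS (SELECT 1 FROM risk_positions r WHERE r.acct_id = s.acct_id);

-- Spark SQL / Delta Lake equivalent: one atomic upsert
MERGE INTO silver.risk_positions AS t
USING staging_positions AS s
  ON t.acct_id = s.acct_id
WHEN MATCHED THEN UPDATE SET t.exposure = s.exposure
WHEN NOT MATCHED THEN INSERT *;
```

The MERGE runs as a single ACID transaction on the Delta table, which is what makes dual-validation against the legacy output tractable.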
Phase 3: AI deployment and knowledge transfer (months 8–11)
Power BI connected to Gold layer — same-day regulatory reporting replaced the 72-hour batch cycle. Three production ML models deployed via MLflow: fraud detection, churn prediction, credit-risk scoring. MLOps infrastructure built. Teradata decommissioned, Oracle workloads consolidated. 24/7 managed services transition.
From legacy bottleneck to intelligence engine
The unified Databricks Lakehouse didn't just replace aging infrastructure — it fundamentally changed what the institution could do with its data. Regulatory reporting that once took days now completes in minutes. ML models that lived in notebooks now operate in production. The entire data estate is governed, auditable, and cost-controlled.
38% reduction in total platform cost
$2.8M from Teradata decommissioning and Oracle reduction. $1.3M from Databricks FinOps. Total cost fell 38 percent year-over-year despite higher data volume and concurrency.
Regulatory reporting: from 72 hours to 45 minutes
Risk-exposure reports that required 72+ hours now complete in under 45 minutes via Photon-enabled Databricks SQL. Data science query wait times dropped from hours to seconds, freeing 60 percent of team capacity for model development.
Three production AI systems in 11 months
Real-time fraud detection (sub-200ms), churn prediction, and automated credit-risk scoring. The fraud model flagged $8.2M in anomalous activity within 90 days — validating the $30M+ opportunity from the challenge assessment.
100% governance coverage
Unity Catalog now governs every data asset with full lineage tracking and automated compliance reporting across SOC 2, GDPR, and local financial regulations.
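Governance of this kind is expressed declaratively in Unity Catalog SQL. The catalog, schema, group, and mask-function names below are illustrative:

```sql
-- Group-based access, scoped from catalog down to schema
GRANT USE CATALOG ON CATALOG risk_prod TO `risk-analysts`;
GRANT USE SCHEMA  ON SCHEMA  risk_prod.gold TO `risk-analysts`;
GRANT SELECT      ON SCHEMA  risk_prod.gold TO `risk-analysts`;

-- Column masking for PII (GDPR): mask_pii is a hypothetical UDF
ALTER TABLE risk_prod.silver.customers
  ALTER COLUMN national_id SET MASK mask_pii;
```

Because every grant and mask lives in the catalog rather than in per-workspace ACLs, lineage and compliance reports can be generated from a single source of truth.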
Databricks consulting services
Ready to solve your data challenges?
More resources
Learn more about Pythian by reading the following blogs and articles.
- Modernizing Big Data Infrastructure from Hadoop for a Tier-1 Financial Institution
- Modernizing from Greenplum for Mission-Critical Analytics for a Global Financial Institution
- Modernizing a Legacy Oracle Exadata Data Warehouse for a Global Retailer
Ready to unlock value from your data?
With Pythian, you can accomplish your data transformation goals and more.