Oracle AI Data Platform (AIDP): No-Nonsense Platform Overview
Oracle’s AI Data Platform (AIDP) is essentially an attempt to fix the biggest annoyance ...

Oracle has recently released a new service within OCI called AI Data Platform Workbench, or AIDPW for short.

Navigating the complexities of data engineering releases can cause endless sleepless nights for even the most seasoned ...

Don't waste time importing simple JSON files into a database just to run a quick query. With Python's list ...
With the extensive adoption of Elasticsearch as a search and analytics engine, we increasingly build data pipelines that ...
Consider the following situation: You have a data ingestion pipeline where the data arrives in real time on weekdays and ...
Memoization is a powerful technique that allows you to improve the performance of repeated computations. Although it ...
Data warehouse or data lake? We break down the pros and cons of each. In the book Designing Cloud Data Platforms, ...
Some time ago there was a car ad with the slogan "Don't use it, abuse it", meaning that no matter what you do to the ...
The England and Wales Cricket Board governs every aspect of the sport in those two countries, and it holds massive ...
“The transformation of the superstructure, which takes place far more slowly than that of the substructure, has taken ...
Intro: In this blog post, I would like to share some options that you can consider to model your cloud DW for better ...
Although I haven't produced a lot of posts, those I have produced were always strictly related to the Oracle RDBMS as a ...

Apache Kafka and Apache Flink are popular data streaming application platforms. However, provisioning and managing ...

Our previous post focused on ‘lightweight governance’ – enabling engineering and product teams across an organization ...
Introduction: In our last blog article on data management and the DART methodology, we discussed the importance of ...

In this post, I’ll share a quick start guide on Google Cloud Platform’s (GCP) Cloud Data Fusion. We’ll first take a look at ...
In our previous discussion, we explored the role of data stewards and their vital function for data governance ...
Raw incoming data needs to go through a series of data preparation steps before it can be used for analysis. These ...

Here at Pythian, we love our data. Our code is no exception (pun sort of intended), so I’ll be covering dataclasses in ...

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines. It’s a ...
I recently encountered the above issue which prompted me to write this blog post so I can easily reference the solution ...

Apache Kafka and Apache Flink are popular platforms for data streaming applications. However, provisioning and managing ...
This post is part two of describing (near) real-time data processing for BigQuery. In this post, I will use Dataform to ...

Most corporations have huge amounts of data in an RDBMS (relational database management system). When considering an RDBMS ...
Apache Beam is an SDK (software development kit) available for Java, Python, and Go that allows for a streamlined ETL ...

Architecture diagram: For effective monitoring of ADF pipelines, we are going to use Log Analytics, Azure Monitor and ...

Secure data ingestion is an essential component of cloud modernization—and an essential next step in the broader effort ...

Bridging Analytical Models and Modern Governance: In our previous post, we discussed the governance requirements for ...
What, Snowflake? Yes, Snowflake. While my core skills are based on the Oracle database, lately I’ve been working more ...

What is Apache Airflow? Airflow is a platform to programmatically author (designing pipelines, creating workflows), ...