Global Analytics with Azure Cosmos Db and Synapse Analytics - SQL On The Edge Episode 21
Global Analytics
There are two other pieces to complete the puzzle. First is a new Cosmos Db capability Microsoft has released called the "Cosmos Db Analytical Store". This feature takes the data coming live from Cosmos Db and stores it in Parquet format which is compressed, columnar, and optimized for analytics. This is a simple change that only requires enabling the option (for now just for new containers but this will likely change once the preview phase has completed). After this is enabled, the process and the storage are all automatically managed by Azure and you can simply focus on building your business solution. The final piece is a new Synapse capability called Synapse Link. Cosmos Db is the first data store to support Synapse Link but it is planned to have other stores like Azure SQL Db or Azure PostgreSQL/MySQL to support it as well. Synapse Link allows transparent querying of an external database from a Synapse workspace without the need to code any sort of ETL. In the case of Cosmos Db, because it uses the "Analytical Store" it does not impact the performance of the operational Cosmos DB databases at all, the compute is all spent on the Synapse side. Implementation details for other data stores like Azure SQL Db are not known at this time but it would be great if Microsoft stays with this pattern of not impacting the operational stores resources. So now with all the pieces, let's think about Global Analytics. I can deploy a Cosmos Db database in West US, another one in Germany, and another one in Japan. I can set them to be Multi-Master so my local sites in those three geographies will have the lowest latency for their local data as well as transparent access to the data from the other sites. Then, I create a Synapse Analytics workspace, say in West US (if that's where HQ might be located) and the Cosmos DB in West US has the Analytical Store activated. Now, this is where the magic happens. Because I have Multi-Master enabled between West US, Germany, and Japan this means that the Analytical Store in my West US copy will contain the data from all 3 global sites! And then my Synapse workspace simply connects to this Analytical Store and can query this data transparently without any dependency or worry of where in the world the data actually originated! And this was all achieved through service configuration, no actual coding.Architecture
Here is a screenshot of how this can be architected in Azure:
Cosmos Db databases here are used for operational data from sites around the globe and Spark on Synapse is used for analytics in West US. Note that I only put one copy of Synapse to illustrate the point but there is no technical limitation here. If I wanted to I could add Synapse in all 3 sites and provide the lowest latency experience where all my users are. Cost is the limiting factor here. And once the data is available as a Spark table then you can use whatever you want to query it or visualize it. Power BI, Tableau, Looker, etc. As you can see, all of this is Platform as a Service, no piece of actual infrastructure has to be managed by the client. This not only means the barrier of entry for this type of solution is low but it also means that this type of capability is achievable and affordable by all kinds of small and medium businesses. Let's just think of 5 years ago and the thought of Global Analytics being widely available would have been unheard of! In case you can't tell already, I'm incredibly excited by how this technology has evolved.
Demo
Alright, with the main solution covered, let's check out the demo where I will walk through how easy it is to perform Global Analytics with the power of these two Azure services. Cheers! SQL On The Edge Episode 21Share this
You May Also Like
These Related Stories
Infographic: five ways to make analytics for everyone
Infographic: five ways to make analytics for everyone
Apr 17, 2018 12:00:00 AM
1
min read
Data and analytics make the world go round

Data and analytics make the world go round
Jun 21, 2022 12:00:00 AM
2
min read
The Four Stages of Analytics Maturity: Part Three
The Four Stages of Analytics Maturity: Part Three
Jun 29, 2021 12:00:00 AM
8
min read
No Comments Yet
Let us know what you think