Increase your data visualization/reporting velocity and performance with proper semantic layers
We have all seen it before. You finally get access to new data sources and want to extract insight from that data. You open your favorite reporting or data visualization tool, connect and start updating the tool’s semantic layer. The most common items that get done there are:
- Organizing attributes and measures into folders
- Renaming attributes and measures in readable business terms
- Creating hierarchies
- Defining number format and default aggregations for measures
- Creating computed fields for missing measures and attributes. For example:
- Profit := Revenue - Cost
- Region: = if State in (NY,...) then “East” elseif State in (CA,...) then “West” else “Unknown”
- Development velocity: It’s much faster to create your data insights by dragging attributes and measures versus defining them. Getting the formulas just right can be very challenging if you don’t understand the nuances in your data.
- Performance: It’s much faster to have all the measures and attributes pre-calculated and stored in the data engine. There is no math involved and retrieving data from modern data stores is very cost effective these days. Combinations of filters, joins and complex formulas can generate poor-performing queries.
- Accuracy/Consistency: No matter which tool is used, the formulas are consistent and built by the people who understand the nuances of the underlying data. Attribute data is consistent across reporting/data visualization tools. There is nothing worse than having two people for on data and presenting numbers that don’t add up with each other. You end up spending lots of time figuring out which one is correct rather than making business decisions.
- Scalability: Multiple people using multiple tools don’t have to recreate computed fields over and over again. A change to a formula doesn’t necessitate updates to every tool’s semantic layer. Having the atomic math done as part of the data pipeline allows multiple tools to be used more efficiently and is easier to govern.
Share this
You May Also Like
These Related Stories
Investigating IO performance on Amazon EC2
Investigating IO performance on Amazon EC2
Dec 30, 2016
7
min read
Data encryption at rest in Oracle MySQL 5.7
Data encryption at rest in Oracle MySQL 5.7
Apr 20, 2016
5
min read
Explaining Azure Data Factory Mapping Data Flows – SQL On The Edge Episode 20
Explaining Azure Data Factory Mapping Data Flows – SQL On The Edge Episode 20
Jun 18, 2019
2
min read
No Comments Yet
Let us know what you think