Defining “real-time” or “streaming” analytics and its use cases
This is the first post in a series on best practices for real-time, or streaming, data analytics, covering terminology, system design, and examples of real-world success. When launching a multi-part series like this, it’s best to make sure everyone is on the same page, so in this initial post we’ll focus on definitions and possible use cases. Terms such as “real-time” and “streaming” may seem obvious, but they can mean different things to different people in the context of a cloud data platform. Adding to the potential for confusion, these terms are relevant in two different areas of a layered data platform – the ingestion layer and the processing layer:
- Real-time or streaming ingestion takes place via pipelines that stream data, one message at a time, from a source into data storage, the data warehouse, or both.
- Real-time or streaming processing typically refers to data transformations applied to streaming data in flight, ranging from straightforward conversions, such as changing a date field from one format to another, to more involved data cleanup, such as enforcing a consistent address field format.
- Real-time or streaming data analytics is usually reserved for the application of complex computations on streaming data, such as calculating the probability of a certain event happening based on previous events.
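To make the distinction between streaming processing and streaming analytics concrete, here is a minimal Python sketch of a per-message transformation of the kind described above: each message is handled one at a time as it arrives, and a date field is converted from one format to another. The message shape and field names (`order_id`, `order_date`) are illustrative assumptions, not part of any particular platform’s API.

```python
from datetime import datetime

def transform(message: dict) -> dict:
    """Per-message streaming transformation: normalize a date field
    from MM/DD/YYYY to ISO 8601 (YYYY-MM-DD).

    The field names here are hypothetical, chosen for illustration.
    """
    out = dict(message)  # avoid mutating the incoming message
    out["order_date"] = (
        datetime.strptime(message["order_date"], "%m/%d/%Y")
        .strftime("%Y-%m-%d")
    )
    return out

# Simulate a stream: apply the transformation to each message
# individually as it would arrive from an ingestion pipeline.
stream = [
    {"order_id": 1, "order_date": "03/14/2024"},
    {"order_id": 2, "order_date": "12/01/2023"},
]
transformed = [transform(m) for m in stream]
```

The key property is that each message is transformed independently, with no need to wait for the rest of the stream. Streaming analytics, by contrast, typically maintains state across messages, for example to estimate the probability of an event from the events seen so far.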