Data Journeys
There are two core concepts in DataKitchen's DataOps Observability: the data estate and the data journey.
Data journey
A data journey represents a collection of components in your data estate that together produce a data analytic deliverable. The components that make up a data journey are often only loosely related. They can be spread across different orchestration, storage, or infrastructure tools, and they may be managed by different teams or team members. A journey might not exist as a concrete object in practice; it may exist only as an abstraction over components that have implied relationships.
Observability lets you create a holistic, end-to-end view of the relationships and dependencies that make up each journey in your data estate.
Data journey example
Every week, a department creates Deliverable A. Deliverable A is produced by a sequence of components.
An Ingestion Team is responsible for collecting source data from different vendors and processing it for the downstream team to use. The downstream team works with two pipeline tools: the first pipeline is fed by a continuous stream of incoming records, while the second runs as finite batches.
In this case, the data journey would include a dataset, streaming pipeline, batch pipeline, and most likely a server to facilitate the work.
With Observability, you can:
- Recreate this journey in its entirety, including the dependencies that link each component, so you can see everything in one place.
- Monitor alerts and statuses generated as Observability verifies the events in your data estate.
- Define rules so Observability can notify you of events you mark important.
- Ensure Deliverable A reaches stakeholders successfully every time.
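To make the idea of a journey as linked components more concrete, here is a minimal sketch in plain Python. It is not the Observability API: the `Component` and `Journey` classes, the component keys, and the specific dependencies (dataset feeds the streaming pipeline, which feeds the batch pipeline, with both pipelines relying on a server) are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Component:
    """A hypothetical journey component (not an Observability type)."""
    key: str
    kind: str  # e.g. "dataset", "streaming_pipeline", "batch_pipeline", "server"


@dataclass
class Journey:
    """A hypothetical journey: components plus the dependencies linking them."""
    name: str
    components: dict = field(default_factory=dict)
    depends_on: dict = field(default_factory=dict)  # key -> set of upstream keys

    def add(self, component, depends_on=()):
        # Refuse dependencies on components that were never registered.
        for upstream in depends_on:
            if upstream not in self.components:
                raise KeyError(f"unknown upstream component: {upstream}")
        self.components[component.key] = component
        self.depends_on[component.key] = set(depends_on)

    def upstream_first_order(self):
        # Every component appears after all of the components it depends on.
        return list(TopologicalSorter(self.depends_on).static_order())


# Assumed layout for the Deliverable A example.
journey = Journey(name="Deliverable A")
journey.add(Component("vendor_dataset", "dataset"))
journey.add(Component("server", "server"))
journey.add(
    Component("streaming_pipeline", "streaming_pipeline"),
    depends_on=["vendor_dataset", "server"],
)
journey.add(
    Component("batch_pipeline", "batch_pipeline"),
    depends_on=["streaming_pipeline", "server"],
)

order = journey.upstream_first_order()
print(order)
```

Storing dependencies as a mapping from each component to its upstream components keeps the relationships explicit, which is what allows the whole journey to be viewed in one place even when the components live in different tools.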
Instances
When a single batch pipeline completes an operation from start to finish, that's considered one run of the batch pipeline.
When a journey completes the end-to-end process of creating a data analytic asset, that's considered to be one instance of that journey. An instance represents a runtime execution of a journey.
In the example above, each time Deliverable A is created, one instance of the journey has completed.
Because you design journeys to reflect your data estate, you also have the ability to define the frame of reference for journey instances. For example, you can define instances based on a schedule or certain batch pipeline events.
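As a rough illustration of event-based instance boundaries, the sketch below groups run events into instances. It assumes an instance opens when one designated batch pipeline starts running and closes when another designated batch pipeline completes. The `RunEvent` and `Instance` classes, the status strings, and the pipeline names `ingest_batch` and `publish_batch` are all hypothetical; none of them come from Observability itself.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class RunEvent:
    """A hypothetical status event reported by a component."""
    component: str
    status: str  # assumed values: "RUNNING" or "COMPLETED"
    at: datetime


@dataclass
class Instance:
    """A hypothetical journey instance: one end-to-end execution."""
    started_at: datetime
    ended_at: Optional[datetime] = None  # None while the instance is still open
    events: list = field(default_factory=list)


def group_into_instances(events, start_component, end_component):
    """Split events into instances using batch pipeline events as boundaries."""
    instances = []
    current = None
    for event in sorted(events, key=lambda e: e.at):
        if current is None:
            # Only a RUNNING event from the start component opens an instance.
            if event.component == start_component and event.status == "RUNNING":
                current = Instance(started_at=event.at)
            else:
                continue  # events outside any instance are ignored in this sketch
        current.events.append(event)
        # A COMPLETED event from the end component closes the instance.
        if event.component == end_component and event.status == "COMPLETED":
            current.ended_at = event.at
            instances.append(current)
            current = None
    if current is not None:
        instances.append(current)  # keep the instance that is still in progress
    return instances


events = [
    RunEvent("ingest_batch", "RUNNING", datetime(2024, 1, 1, 8, 0)),
    RunEvent("ingest_batch", "COMPLETED", datetime(2024, 1, 1, 8, 30)),
    RunEvent("publish_batch", "RUNNING", datetime(2024, 1, 1, 9, 0)),
    RunEvent("publish_batch", "COMPLETED", datetime(2024, 1, 1, 9, 45)),
    RunEvent("ingest_batch", "RUNNING", datetime(2024, 1, 8, 8, 0)),
    RunEvent("ingest_batch", "COMPLETED", datetime(2024, 1, 8, 8, 30)),
]

instances = group_into_instances(events, "ingest_batch", "publish_batch")
for instance in instances:
    print(instance.started_at, instance.ended_at, len(instance.events))
```

Here each batch pipeline contributes its own runs, while an instance spans both pipelines. The second week's instance stays open because `publish_batch` has not completed yet. A schedule-based definition would swap the boundary check for a time window instead of specific events.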
Observability instances help you capture real-world outcomes, such as the weekly creation of Deliverable A, that span the individual pieces of work being done across your data estate.