Batch Pipeline
A batch pipeline is a type of Observability component that represents a series of finite processes running in a data estate.
Observability pipelines group analytic processes, define runs for monitoring and evaluation, and typically represent the tasks expected to execute when delivering a data asset.
Pipeline types
In Observability, there are two types of pipeline components:
- Batch pipeline: this component represents a finite batch process with recurring runs. For example, an Airflow DAG or a DataKitchen DataOps Automation recipe variation.
- Streaming pipeline: this component represents an infinitely running, event-based workflow. For example, a real-time Apache Kafka process.
Batch pipeline configuration
You define both the scope (the extent of activity that represents a single end-to-end run) and the individual tasks that make up each batch pipeline in your data estate.
Batch pipeline configurations:
- Always include events, which are moments of interest sent from your data estate and captured by Observability.
- Can include tasks, which represent the steps expected for each run of that batch pipeline.
- May have rules set up to notify you of changing situations. Rules can be defined when the batch pipeline is used as a component in a journey.
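The configuration above can be sketched as a simple data shape. This is a minimal illustration only; the class and field names here are assumptions for the example, not the actual Observability configuration schema.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """An expected step in each run of a batch pipeline (illustrative)."""
    key: str                 # stable identifier for the step
    name: str = ""           # optional display name

@dataclass
class BatchPipeline:
    """A batch pipeline component (illustrative shape, not the real schema)."""
    key: str                                   # matches the pipeline_key sent on events
    tasks: list = field(default_factory=list)  # steps expected for each run; may be empty

# A pipeline whose runs are expected to execute three tasks in order.
pipeline = BatchPipeline(
    key="nightly_etl",
    tasks=[Task("extract"), Task("transform"), Task("load")],
)
```

Events always arrive for the pipeline as a whole; the optional `tasks` list simply declares which steps each run is expected to report.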
Note
Batch pipeline versus run: While a batch pipeline reflects what you expect to happen in a process, a run is what actually happens during the execution of the component. See Runs for more information.
Create a new batch pipeline
Batch pipelines are automatically generated based on event information received by the Event Ingestion API.
Observability creates a new batch pipeline any time it receives an event whose pipeline_key does not match the key of an existing batch pipeline in the project.
You can also manually create batch pipelines in the UI. Follow the instructions outlined in Create Components.