Batch Pipeline

A batch pipeline is a type of Observability component that represents a finite, recurring process running in a data estate.

Observability pipelines help to group analytic processes, define runs for monitoring and evaluation, and typically represent tasks that are expected to execute when delivering a data asset.

Pipeline types

In Observability, there are two types of pipeline components:

  • Batch pipeline: this component represents a finite batch process with recurring runs. For example, an Airflow DAG or a DataKitchen DataOps Automation recipe variation.
  • Streaming pipeline: this component represents an infinitely running, event-based workflow. For example, a real-time Apache Kafka process.

Batch pipeline configuration

For each batch pipeline in your data estate, you define both its scope (the extent of activity that represents a single end-to-end run) and the individual tasks that make it up.

Batch pipeline configurations:

  • Always include events, which are moments of interest sent from your data estate and captured by Observability.
  • Can include tasks, which represent the steps expected for each run of that batch pipeline.
  • May have rules set up to notify you of changing situations. Rules can be defined when the batch pipeline is used as a component in a journey.
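The configuration elements above can be sketched as a simple data model. This is an illustrative sketch only, not Observability's actual schema; the class and field names (`Event`, `BatchPipelineConfig`, `pipeline_key`, `task_key`) are assumptions chosen for clarity.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Event:
    """A moment of interest sent from the data estate (illustrative shape)."""
    event_type: str                  # e.g. a run-status or log event
    pipeline_key: str                # identifies the batch pipeline the event belongs to
    task_key: Optional[str] = None   # set when the event belongs to a specific task

@dataclass
class BatchPipelineConfig:
    """Hypothetical model of a batch pipeline configuration."""
    pipeline_key: str
    tasks: list = field(default_factory=list)   # expected steps for each run (optional)
    rules: list = field(default_factory=list)   # notification rules, defined via a journey

# Events are always present; tasks and rules are optional additions.
config = BatchPipelineConfig(
    pipeline_key="nightly-sales-load",
    tasks=["extract", "transform", "load"],
)
```

Here, a pipeline with no `tasks` or `rules` is still valid: events alone are enough to define runs for monitoring.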

Note

Batch pipeline versus run: While a batch pipeline reflects what you expect to happen in a process, a run is what actually happens during the execution of the component. See Runs for more information.

Create a new batch pipeline

Batch pipelines are automatically generated based on event information received by the Event Ingestion API.

Observability creates a new batch pipeline any time the system receives an event whose pipeline_key does not match the key of any existing batch pipeline in the project.
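The auto-creation decision amounts to a key-membership check. The following is a minimal sketch of that logic, not the actual Event Ingestion API implementation; the helper name `should_create_pipeline` and the dict-shaped event are assumptions for illustration.

```python
def should_create_pipeline(event: dict, existing_keys: set) -> bool:
    """Return True when an incoming event's pipeline_key does not match
    any existing batch pipeline key in the project (hypothetical helper)."""
    return event.get("pipeline_key") not in existing_keys

# Keys of batch pipelines already known to the project (example data).
existing = {"nightly-sales-load", "daily-refresh"}

# A known key: the event attaches to the existing pipeline.
known = should_create_pipeline({"pipeline_key": "daily-refresh"}, existing)

# An unseen key: Observability would create a new batch pipeline.
unseen = should_create_pipeline({"pipeline_key": "hourly-metrics"}, existing)
```

This is why consistent pipeline_key values matter: a typo in the key sent to the Event Ingestion API produces a new, unintended pipeline rather than a run of the existing one.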

You can also manually create batch pipelines in the UI. Follow the instructions outlined in Create Components.