Batch Pipeline

A batch pipeline is a type of Observability component that represents a finite, recurring process running in a data estate.

Observability pipelines help to group analytic processes, define runs for monitoring and evaluation, and typically represent tasks that are expected to execute when delivering a data asset.

Pipeline types

In Observability, there are two types of pipeline components:

  • Batch pipeline: this component represents a finite batch process with recurring runs. For example, an Airflow DAG or a DataKitchen DataOps Automation recipe variation.
  • Streaming pipeline: this component represents an infinitely running, event-based workflow. For example, a real-time Apache Kafka process.

Batch pipeline configuration

For each batch pipeline in your data estate, you define both its scope (the extent of activity that represents a single end-to-end run) and the individual tasks that make it up.

Batch pipeline configurations:

  • Always include events, which are moments of interest sent from your data estate and captured by Observability.
  • Can include tasks, which represent the steps expected for each run of that batch pipeline.
  • May have rules set up to notify you of changing situations. Rules can be defined when the batch pipeline is used as a component in a journey.
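The configuration elements above can be sketched as a simple data model. This is an illustrative sketch only, not Observability's actual schema; the class and field names (`Event`, `BatchPipelineConfig`, `pipeline_key`, `task_key`) are assumptions chosen for clarity.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Event:
    """A moment of interest sent from the data estate (illustrative shape)."""
    event_type: str                  # e.g. a run-status or log event
    pipeline_key: str                # identifies the batch pipeline the event belongs to
    task_key: Optional[str] = None   # set when the event belongs to a specific task

@dataclass
class BatchPipelineConfig:
    """Hypothetical model of a batch pipeline configuration."""
    pipeline_key: str
    tasks: list = field(default_factory=list)   # expected steps for each run (optional)
    rules: list = field(default_factory=list)   # notification rules, defined via a journey

# Events are always present; tasks and rules are optional additions.
config = BatchPipelineConfig(
    pipeline_key="nightly-sales-load",
    tasks=["extract", "transform", "load"],
)
```

Here, a pipeline with no `tasks` or `rules` is still valid: events alone are enough to define runs for monitoring.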

Note

Batch pipeline versus run: While a batch pipeline reflects what you expect to happen in a process, a run is what actually happens during the execution of the component. See Runs for more information.

Create a new batch pipeline

Batch pipelines are automatically generated based on event information received by the Event Ingestion API.

Observability creates a new batch pipeline any time the system receives an event whose pipeline_key does not match the key of any existing batch pipeline in the project.
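The auto-creation decision amounts to a key-membership check. The following is a minimal sketch of that logic, not the actual Event Ingestion API implementation; the helper name `should_create_pipeline` and the dict-shaped event are assumptions for illustration.

```python
def should_create_pipeline(event: dict, existing_keys: set) -> bool:
    """Return True when an incoming event's pipeline_key does not match
    any existing batch pipeline key in the project (hypothetical helper)."""
    return event.get("pipeline_key") not in existing_keys

# Keys of batch pipelines already known to the project (example data).
existing = {"nightly-sales-load", "daily-refresh"}

# A known key: the event attaches to the existing pipeline.
known = should_create_pipeline({"pipeline_key": "daily-refresh"}, existing)

# An unseen key: Observability would create a new batch pipeline.
unseen = should_create_pipeline({"pipeline_key": "hourly-metrics"}, existing)
```

This is why consistent pipeline_key values matter: a typo in the key sent to the Event Ingestion API produces a new, unintended pipeline rather than a run of the existing one.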

You can also manually create batch pipelines in the UI. Follow the instructions outlined in Create Components.