Dataset
A dataset is a type of Observability component that represents a collection of data in a data estate.
Observability datasets describe read or write operations on a specified dataset, for example, a database table, a data file, or a data lake. Datasets can represent source data from an ingestion point or processed data needed by downstream teams, and they can create connections between pipelines.
Dataset configuration
You define the component activity to align with the datasets in your data estate.
Dataset configurations always include events, which are moments of interest sent by your data estate and captured by Observability.
Dataset Operation event
Observability expects to receive Dataset Operation events for dataset components. Each event records that a read or write action occurred on the dataset.
- Read operation: an asset has been read from the dataset.
- Write operation: an asset has been written to the dataset.
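As an illustration, the sketch below assembles a Dataset Operation event as a plain dictionary. Only dataset_key comes from this page; the other field names (operation, path, event_timestamp), the READ and WRITE values, and the helper name build_dataset_operation_event are assumptions made for the example and may not match the actual Event Ingestion API schema. Check the API reference before sending real events.

```python
import json
from datetime import datetime, timezone

# Assumed operation values; the page only says "read" and "write".
VALID_OPERATIONS = {"READ", "WRITE"}


def build_dataset_operation_event(dataset_key, operation, path=None):
    """Build a hypothetical Dataset Operation event payload.

    dataset_key identifies the dataset component (named in this page).
    All other field names are illustrative assumptions.
    """
    op = operation.upper()
    if op not in VALID_OPERATIONS:
        raise ValueError(f"operation must be READ or WRITE, got {operation!r}")

    event = {
        "dataset_key": dataset_key,
        "operation": op,
        # Timezone-aware UTC timestamp in ISO 8601 format.
        "event_timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if path is not None:
        # Optional location of the asset that was read or written (assumed field).
        event["path"] = path
    return event


if __name__ == "__main__":
    payload = build_dataset_operation_event("orders_table", "write", path="sales/orders")
    print(json.dumps(payload, indent=2))
```

Validating the operation before building the payload keeps malformed events from being sent in the first place.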
Create a new dataset
Datasets are automatically generated based on event information received by the Event Ingestion API.
Observability creates a new dataset whenever it receives an event whose dataset_key does not match the key of any existing dataset in the project.
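The matching rule can be pictured with a small in-memory model. This is only a sketch of the behavior described above, not how Observability is implemented: the Project class and its ingest method are hypothetical names, and the event is reduced to a dictionary carrying a dataset_key.

```python
class Project:
    """Toy model of a project that auto-creates datasets from events."""

    def __init__(self):
        # Maps dataset_key -> list of events received for that dataset.
        self.datasets = {}

    def ingest(self, event):
        """Record an event; return True if it caused a new dataset to be created."""
        key = event["dataset_key"]
        created = key not in self.datasets
        if created:
            # No existing dataset has this key, so a new one is created.
            self.datasets[key] = []
        self.datasets[key].append(event)
        return created


if __name__ == "__main__":
    project = Project()
    print(project.ingest({"dataset_key": "orders_table", "operation": "WRITE"}))
    print(project.ingest({"dataset_key": "orders_table", "operation": "READ"}))
    print(sorted(project.datasets))
```

In this model, the first event for a key creates the dataset and later events with the same key attach to it, so a consistent dataset_key is what keeps events grouped under one component.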
You can also manually create datasets in the UI. Follow the instructions outlined in Create Components.