
Data Estate

There are two core concepts in DataKitchen's DataOps Observability: the data estate and the data journey.

Data estate

A data estate is the collection of all the data and infrastructure within an organization, where infrastructure means the organizational structures set up to operate the tools and data assets. To scale well and follow good management practices, an organization should establish methods to supervise its data estate.

Observability can help you monitor, drill down, and better understand your data estate. To do this, events from your local tools and data assets are ingested by the system, associated with a particular component, and displayed for you in the UI. You can then use the information in Observability to understand the realities of your data estate; supervise the activity of the batch pipelines, streaming pipelines, datasets, and servers; and stay alert to events as they happen.

Events

An event is any moment of interest that occurs in your data estate. Events are sent from the tools in your data estate and received by Observability. The system uses the metadata sent with each event to categorize, organize, and display it in the UI.

Observability recognizes five types of events, each describing a common category of work that occurs in data analytics: Run Status, Test Outcomes, Metric Log, Dataset Operation, and Message Log.

Each event carries required and optional properties that provide more context and detail. Of these properties, keys are the most important: they are how Observability associates an event with the correct project and component.
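As a rough illustration of how keys tie an event to a project and component, the following Python sketch models an event as a small data structure. The five event type names come from the list above; everything else (the class names, the field names project_key and component_key, the route function, and the sample values) is hypothetical and is not the Observability API or its actual payload schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class EventType(Enum):
    # The five event types named in this document.
    RUN_STATUS = "Run Status"
    TEST_OUTCOMES = "Test Outcomes"
    METRIC_LOG = "Metric Log"
    DATASET_OPERATION = "Dataset Operation"
    MESSAGE_LOG = "Message Log"


@dataclass
class Event:
    # Field names are illustrative assumptions, not the real schema.
    event_type: EventType
    project_key: str    # identifies the project the event belongs to
    component_key: str  # identifies the component the event belongs to
    properties: dict[str, Any] = field(default_factory=dict)  # optional context


def route(event: Event) -> tuple[str, str]:
    """Return the (project, component) pair an event would be filed under."""
    if not event.project_key or not event.component_key:
        # Without both keys the event cannot be associated with anything.
        raise ValueError("event is missing a project or component key")
    return (event.project_key, event.component_key)


example = Event(
    EventType.METRIC_LOG,
    project_key="demo-project",
    component_key="nightly-load",
    properties={"metric_value": 42},
)
destination = route(example)
```

The point of the sketch is only that the keys, rather than the optional properties, decide where an event ends up.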

Components

A component represents a resource or tool in your data estate, such as a batch pipeline, a dataset, or a server. Events are always associated with a component and, if that component is a batch pipeline, with a run as well.

Example

The Run Status event describes the behavior of a batch pipeline run. While a run is in progress, other event types may be received for the batch pipeline, such as a Metric Log event or Test Outcomes event. Observability aggregates all these events per run and displays them under the Batch Runs tabs.
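The per-run aggregation described here can be pictured as a simple grouping step. The sketch below is a minimal Python illustration under stated assumptions: the dictionary fields pipeline_key, run_key, and type, the group_by_run function, and the sample events are all hypothetical, and the code does not reflect how Observability implements aggregation internally.

```python
from collections import defaultdict


def group_by_run(events):
    """Group event types by (pipeline, run), preserving arrival order."""
    runs = defaultdict(list)
    for event in events:
        # Hypothetical fields: each event names its pipeline and its run.
        runs[(event["pipeline_key"], event["run_key"])].append(event["type"])
    return dict(runs)


# Sample events for two runs of the same hypothetical batch pipeline.
incoming = [
    {"pipeline_key": "nightly-load", "run_key": "run-1", "type": "Run Status"},
    {"pipeline_key": "nightly-load", "run_key": "run-1", "type": "Metric Log"},
    {"pipeline_key": "nightly-load", "run_key": "run-2", "type": "Run Status"},
    {"pipeline_key": "nightly-load", "run_key": "run-1", "type": "Test Outcomes"},
]

grouped = group_by_run(incoming)
for (pipeline, run), types in grouped.items():
    print(pipeline, run, types)
```

Each group corresponds to what a single batch run entry would collect: its Run Status event plus any Metric Log or Test Outcomes events received while the run was in progress.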