Completeness Tests

Completeness tests verify that your data has no unexpected gaps — whether at the record level (missing values within rows), the table level (fewer rows than expected), or the temporal level (missing time periods in date-based data). Failures in completeness often point to ingestion problems, upstream processing errors, or source system outages.

Required Entry

Tests that a non-null value is present in each record for the column, consistent with baseline data.

Auto-generated when profiling finds that every value in the column is non-null (in tables with more than 10 rows).


Scope	Column
Measures	Count of missing values
Threshold	Maximum acceptable count of missing values (default: 0)
On failure	Not every record for the column is filled with non-null values.
Default severity	Fail

How it works: Computes the difference between the total row count and the count of non-null values in the column. The test fails when the number of missing values exceeds the threshold.
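
The calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes the column is available as an in-memory list with None standing in for null; the function names are hypothetical and are not part of the product.

```python
def missing_value_count(values):
    """Total row count minus the count of non-null values."""
    total = len(values)
    non_null = sum(1 for v in values if v is not None)
    return total - non_null


def required_entry_passes(values, threshold=0):
    """Pass unless the number of missing values exceeds the threshold."""
    return missing_value_count(values) <= threshold


customer_ids = ["A-1", None, "A-3", "A-4"]
print(missing_value_count(customer_ids))       # 1
print(required_entry_passes(customer_ids))     # False (default threshold is 0)
print(required_entry_passes(customer_ids, 1))  # True
```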

When to use: This test is appropriate for columns that should always contain a value, such as primary keys, foreign keys, or other business-critical fields. If the column had no nulls at baseline, any missing value that appears later likely indicates a data quality issue upstream. If a small number of nulls is acceptable, increase the threshold accordingly.

Percent Missing

Tests for a statistically significant shift in the percentage of missing values compared to baseline data.

Auto-generated when profiling finds that the column contains some missing values.


Scope	Column
Measures	Cohen's H Difference (0.20 = small, 0.50 = moderate, 0.80 = large, 1.20 = very large)
Threshold	Maximum acceptable Cohen's H Difference (default: 2)
On failure	Significant shift in percent of missing values compared to baseline.
Default severity	Warning

How it works: Compares the current ratio of non-null values to total values against the baseline ratio using the Cohen's H statistic — a standardized measure of the difference between two proportions. The test fails when the absolute difference exceeds the threshold.
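
As an illustration, the sketch below uses the standard textbook definition of Cohen's h, the absolute difference between the arcsine-transformed proportions. It assumes the product follows this definition; the function name and the sample ratios are hypothetical.

```python
import math


def cohens_h(p_current, p_baseline):
    """Absolute Cohen's h between two proportions in [0, 1]."""
    phi_current = 2 * math.asin(math.sqrt(p_current))
    phi_baseline = 2 * math.asin(math.sqrt(p_baseline))
    return abs(phi_current - phi_baseline)


# Hypothetical example: 98% of values were non-null at baseline, 90% are now.
h = cohens_h(0.90, 0.98)
print(round(h, 2))  # about 0.36, between "small" and "moderate" on the scale above
```

Because the arcsine transform is symmetric around 0.5, the absolute value is the same whether you pass the non-null ratios (0.90 and 0.98) or the missing ratios (0.10 and 0.02).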

When to use: Use this test to detect whether the rate of missing data has shifted meaningfully since baseline. A small uptick in missing data may indicate a collection issue at the source, while a large jump may point to a processing failure. A drop in missing data can also be significant if downstream analytics assume a certain level of nullability. You can refine the threshold as you observe legitimate variation over time.

Row Count

Tests that the count of records has not decreased from a baseline count.

This test is not auto-generated. You can create it manually from the Test Definitions page.


Scope	Table
Measures	Row count
Threshold	Expected minimum row count
On failure	Row count less than baseline count.
Default severity	Fail

How it works: Counts the total rows in the table and fails when the count falls below the threshold.

When to use: This test is appropriate for any dataset where you need to verify that a minimum number of rows is present. Because it tests against a fixed threshold, it works best for tables whose size is relatively stable. For datasets where the row count varies significantly between refreshes, consider Row Range instead.

Row Range

Tests that the count of records is within a percentage above or below a baseline count.

This test is not auto-generated. You can create it manually from the Test Definitions page.


Scope	Table
Measures	Percent of baseline row count
Threshold	Maximum acceptable percentage deviation above or below baseline
On failure	Row count is outside of threshold percent of baseline count.
Default severity	Fail

How it works: Computes the absolute percentage difference between the current row count and the baseline count. The test fails when this percentage exceeds the threshold.
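
The comparison can be sketched as follows. The function names are hypothetical, and the sketch assumes the deviation is expressed as a percentage of the baseline count, which matches the "Percent of baseline row count" measure above.

```python
def percent_deviation(current_count, baseline_count):
    """Absolute difference from baseline, as a percentage of baseline."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return abs(current_count - baseline_count) * 100.0 / baseline_count


def row_range_passes(current_count, baseline_count, threshold_pct):
    """Pass unless the deviation exceeds the threshold percentage."""
    return percent_deviation(current_count, baseline_count) <= threshold_pct


print(percent_deviation(900, 1000))     # 10.0
print(row_range_passes(950, 1000, 10))  # True
print(row_range_passes(800, 1000, 10))  # False
```

The deviation is symmetric: a table that grows from 1,000 to 1,100 rows deviates by the same 10 percent as one that shrinks to 900.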

When to use: This test is better suited than Row Count for incremental or windowed datasets where you expect the row count to fluctuate within a predictable range. Set the threshold to the maximum percentage deviation you consider acceptable.

Daily Records

Tests for the presence of at least one record for every calendar day within the minimum and maximum date range of the column.

Auto-generated when profiling finds a date column spanning more than 21 days with no gaps between dates.


Scope	Column
Measures	Count of missing calendar days
Threshold	Maximum acceptable count of missing calendar days (default: 0)
On failure	Not every date value between minimum and maximum dates is present.
Default severity	Warning

How it works: Calculates the total number of calendar days in the range from the minimum to maximum date value, then subtracts the count of distinct dates present. The test fails when the number of missing days exceeds the threshold.
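
A minimal sketch of this arithmetic, assuming the column values are available as Python date objects. The function name is hypothetical.

```python
from datetime import date


def missing_day_count(dates):
    """Calendar days in the min-to-max range minus distinct dates present."""
    distinct = set(dates)
    if not distinct:
        return 0
    span_days = (max(distinct) - min(distinct)).days + 1  # inclusive range
    return span_days - len(distinct)


order_dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 2), date(2024, 1, 5)]
print(missing_day_count(order_dates))  # 2 (January 3 and 4 have no records)
```

Note that the range is bounded by the data itself, so a gap at the very start or end of the expected period is not counted.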

When to use: This test is relevant for transactional data where you expect at least one record every day. A failure suggests missing records for the identified days. If certain days legitimately have no records (such as holidays), adjust the threshold to accommodate the expected gap count.

Weekly Records

Tests for the presence of at least one record for every calendar week within the minimum and maximum date range of the column.

Auto-generated when profiling identifies a transactional date column in a cumulative table spanning more than 3 weeks with no weekly gaps.


Scope	Column
Measures	Count of missing calendar weeks
Threshold	Maximum acceptable count of missing calendar weeks (default: 0)
On failure	Not every week between minimum and maximum date range has at least one date present.
Default severity	Fail

How it works: Calculates the total number of calendar weeks in the range from the minimum to maximum date value, then subtracts the count of distinct weeks that contain at least one date. The test fails when the number of missing weeks exceeds the threshold.
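
The same idea at week granularity can be sketched as below. The sketch assumes calendar weeks start on Monday; the source does not state which week boundary the product uses, and the function names are hypothetical.

```python
from datetime import date, timedelta


def week_start(d):
    """Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


def missing_week_count(dates):
    """Calendar weeks in the min-to-max range minus weeks with at least one date."""
    weeks = {week_start(d) for d in dates}
    if not weeks:
        return 0
    total_weeks = (max(weeks) - min(weeks)).days // 7 + 1
    return total_weeks - len(weeks)


# January 1 and January 22, 2024 are both Mondays, three weeks apart.
print(missing_week_count([date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 22)]))  # 2
```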

When to use: This test is relevant for transactional data where you expect at least one record each week. A failure suggests missing records for the identified weeks.

Monthly Records

Tests for the presence of at least one record for every calendar month within the minimum and maximum date range of the column.

Auto-generated when profiling identifies a transactional date column in a cumulative table spanning more than 2 months with no monthly gaps.


Scope	Column
Measures	Count of missing calendar months
Threshold	Maximum acceptable count of missing calendar months (default: 0)
On failure	Not every month between minimum and maximum date range has at least one date present.
Default severity	Fail

How it works: Calculates the total number of calendar months in the range from the minimum to maximum date value, then subtracts the count of distinct months that contain at least one date. The test fails when the number of missing months exceeds the threshold.
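
For months, one way to sketch the count is to map each date to a running month index so that ranges crossing a year boundary are handled correctly. The function names are hypothetical.

```python
from datetime import date


def month_index(d):
    """Running month number, so consecutive months differ by exactly 1."""
    return d.year * 12 + (d.month - 1)


def missing_month_count(dates):
    """Calendar months in the min-to-max range minus months with at least one date."""
    months = {month_index(d) for d in dates}
    if not months:
        return 0
    total_months = max(months) - min(months) + 1
    return total_months - len(months)


# November 2023 through March 2024 spans five months; January and February are absent.
print(missing_month_count([date(2023, 11, 15), date(2023, 12, 1), date(2024, 3, 3)]))  # 2
```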

When to use: This test is relevant for transactional data where you expect at least one record each month. A failure suggests missing records for the identified months.

Volume

Tests that the row count of all or a subset of records in a table is within a derived tolerance range.

Auto-generated by the monitoring system. Can also be created manually.


Scope	Table
Measures	Row count
Threshold	Expected row count range (derived from historical data or manually configured bounds)
On failure	Row count is outside expected range.
Default severity	Fail

How it works: Counts the rows in the table (optionally filtered by a subset condition) and compares the result against a tolerance range. The range can be derived automatically from historical row counts or set manually via lower and upper bounds.
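
The source does not describe how the tolerance range is derived from history, so the sketch below substitutes a simple assumption: the mean of past row counts plus or minus k sample standard deviations. Treat derive_bounds, its k parameter, and the other names as hypothetical illustrations, not the product's algorithm. Manually configured bounds are simply passed in directly.

```python
from statistics import mean, stdev


def count_rows(rows, condition=None):
    """Row count, optionally restricted to rows matching a subset condition."""
    if condition is None:
        return len(rows)
    return sum(1 for row in rows if condition(row))


def derive_bounds(history, k=3.0):
    """ASSUMPTION: tolerance range = mean +/- k sample standard deviations."""
    center = mean(history)
    spread = stdev(history)  # requires at least two historical counts
    return center - k * spread, center + k * spread


def volume_passes(row_count, lower, upper):
    """Pass when the row count lies inside the tolerance range (inclusive)."""
    return lower <= row_count <= upper


history = [100, 102, 98, 101, 99]
lower, upper = derive_bounds(history)
print(volume_passes(103, lower, upper))  # True
print(volume_passes(110, lower, upper))  # False

# Subset condition: count only rows for one hypothetical region.
rows = [{"region": "EU"}, {"region": "US"}, {"region": "EU"}]
print(count_rows(rows, lambda r: r["region"] == "EU"))  # 2
```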

When to use: This test compares the row count against a dynamically derived range rather than a fixed threshold, making it better suited to datasets whose volume changes over time. It is also used by the Volume monitor for automated anomaly detection. You can optionally define a subset condition to monitor a specific segment of the table.