Validity Tests¶
Validity tests verify that data values conform to expected formats, ranges, and domain rules. A value can be present and non-null but still invalid — for example, a zip code with too few digits, an email with no domain, or a date that predates the dataset's known history. These tests catch values that fail to meet structural or semantic expectations for their column.
Alpha Truncation¶
Tests that the maximum count of characters in column values has not dropped compared to baseline data.
Auto-generated when profiling finds a text column with consistent or near-consistent value lengths.
| Scope | Column |
| Measures | Maximum character length found |
| Threshold | Minimum expected maximum length (default: 95% of longest value at baseline) |
| On failure | Maximum length of values has dropped compared to baseline. |
| Default severity | Fail |
How it works: Computes MAX(LENGTH(column)) and fails when the result drops below the threshold. A decrease in maximum length in a cumulative dataset suggests that previously existing longer values were altered or removed.
When to use: This test is appropriate for cumulative datasets where prior values should still exist unchanged. A failure suggests that some process changed data you would expect to remain at its original length. It would not be appropriate for an incremental or windowed dataset.
Email Format¶
Tests that non-blank email values match the standard email address format.
Auto-generated when profiling identifies the column as an email pattern.
| Scope | Column |
| Measures | Count of invalid email addresses |
| Threshold | Maximum acceptable count of invalid emails (default: 0) |
| On failure | Invalid email address formats found. |
| Default severity | Fail |
How it works: Counts the number of non-blank values that do not match the standard email address format (e.g., user@domain.tld). The test fails when the count exceeds the threshold.
When to use: Apply this test to any column identified as containing email addresses. It validates structural format only — it does not verify that the address exists or can receive mail.
Minimum Date¶
Tests that the earliest date referenced in a column is no earlier than the baseline minimum.
Auto-generated when profiling finds a date column with a non-null minimum date and more than one distinct value.
| Scope | Column |
| Measures | Count of dates before the baseline minimum |
| Threshold | Maximum acceptable count of dates before the baseline minimum (default: 0) |
| On failure | Earliest date value found is before earliest value at baseline. |
| Default severity | Fail |
How it works: Counts the number of date values that fall before the baseline minimum date. The test fails when the count exceeds the threshold.
When to use: This test is appropriate for cumulative datasets where old dates should not change. It works well where new records are added with more recent dates but old dates remain stable.
Minimum Value¶
Tests that the minimum numeric value in a column is no lower than the baseline minimum.
Auto-generated when profiling finds a numeric measurement column with a non-null minimum value and at least two distinct values.
| Scope | Column |
| Measures | Count of values below the baseline minimum |
| Threshold | Maximum acceptable count of values below the baseline minimum (default: 0) |
| On failure | Minimum column value less than baseline. |
| Default severity | Fail |
How it works: Counts the number of values that fall below the baseline minimum. The test fails when the count exceeds the threshold.
When to use: This test is appropriate for cumulative datasets where all prior values are still present, or for any measure with an absolute minimum that makes sense for valid data.
Value Count¶
Tests that the count of distinct values in a column has not dropped compared to baseline.
Auto-generated when profiling finds a Code, Category, Attribute, or Description column with 2 to 10 distinct values and more than 50 non-null records.
| Scope | Column |
| Measures | Count of unique values |
| Threshold | Minimum expected distinct value count (default: count at baseline) |
| On failure | Count of unique values in column has changed from baseline. |
| Default severity | Fail |
How it works: Counts the distinct values in the column and compares the result to the threshold. The test fails when the current count falls below the expected minimum.
When to use: This test is relevant for cumulative datasets where old records are retained, or for any dataset where you expect a set number of distinct values to be present. A failure indicates missing records or a change in categories or value assignment.
Pattern Match¶
Tests that all values in a column match the alphanumeric pattern identified in baseline data.
Auto-generated when profiling finds a single consistent alphanumeric pattern in an attribute, ID, phone, period, or datetime stamp column with more than 10 distinct values.
| Scope | Column |
| Measures | Count of pattern mismatches |
| Threshold | Maximum acceptable count of pattern mismatches (default: 0) |
| On failure | Alpha values do not match consistent pattern in baseline. |
| Default severity | Fail |
How it works: Compares each value in the column against the baseline pattern. Counts how many values do not conform. The test fails when the count exceeds the threshold.
When to use: This test is appropriate for character fields expected to appear in a consistent format, such as IDs, phone numbers, or date stamps stored as text. The expected threshold is the number of records that fail to match the defined pattern.
Street Address¶
Tests that a sufficient percentage of values match a standard US street address pattern.
Auto-generated when profiling identifies the column as a street address pattern.
| Scope | Column |
| Measures | Percentage of values matching the address pattern |
| Threshold | Minimum acceptable match percentage (default: 75%) |
| On failure | Percent of values matching standard street address format is under expected threshold. |
| Default severity | Fail |
How it works: Evaluates each non-blank value against a standard US street address pattern and calculates the percentage that match. The test fails when the match percentage drops below the threshold.
When to use: The street address pattern should match the vast majority of US addresses. Adjust the threshold based on the results you observe — you may want to tighten it to make the test more sensitive to invalid entries.
US State¶
Tests that values in a column are valid two-letter US state codes.
Auto-generated when profiling finds a 2-character text column whose name contains "state" with fewer than 70 distinct values.
| Scope | Column |
| Measures | Count of values that are not US state abbreviations |
| Threshold | Maximum acceptable count of non-matching values (default: 0) |
| On failure | Column values found that are not valid US states. |
| Default severity | Fail |
How it works: Validates each value against a fixed list of two-character US state and territory codes (including Armed Forces designations). Counts the number of values that do not match. The test fails when the count exceeds the threshold.
When to use: Apply this test to columns that should contain US state abbreviations. It validates against the standard set of state, territory, and Armed Forces codes.
Valid Characters¶
Tests for the presence of non-printing characters, leading spaces, or surrounding quotes in text values.
This test is not auto-generated. You can create it manually from the Test Definitions page.
| Scope | Column |
| Measures | Count of values with invalid characters |
| Threshold | Maximum acceptable count of invalid values (default: 0) |
| On failure | Invalid characters, such as non-printing characters, leading spaces, or surrounding quotes, were found. |
| Default severity | Warning |
How it works: Scans each text value for non-printing ASCII characters, leading spaces, and enclosing quotes. Counts the number of records containing any of these issues. The test fails when the count exceeds the threshold.
When to use: This test looks for artifacts of data conversion or ingestion that may be difficult to process or analyze downstream. Non-printing characters, leading spaces, and embedded quotes can cause silent failures in joins, lookups, and string comparisons.
Valid Month¶
Tests that values are valid representations of calendar months consistent with the format at baseline.
This test is not auto-generated. You can create it manually from the Test Definitions page.
| Scope | Column |
| Measures | Count of invalid month values |
| Threshold | Maximum acceptable count of invalid month values (default: 0) |
| On failure | Column values are not valid representations of calendar months. |
| Default severity | Fail |
How it works: Validates each value against a list of acceptable month representations derived from baseline data. The format is determined automatically — full names, abbreviations, or numeric codes — matching the style observed during profiling. The test fails when the count of non-matching values exceeds the threshold.
When to use: Apply this test to columns that store month values as text or numeric codes (not as part of a full date). The test adapts to whatever month format was present at baseline, including full names, three-letter abbreviations, and zero-padded or unpadded numbers.
Valid US Zip¶
Tests that postal codes match the 5-digit or 9-digit standard US formats.
Auto-generated when profiling identifies the column as a zip code.
| Scope | Column |
| Measures | Count of invalid zip codes |
| Threshold | Maximum acceptable count of invalid zip codes (default: 0) |
| On failure | Invalid US zip code formats found. |
| Default severity | Warning |
How it works: Checks each value against the standard US zip code formats: 5-digit (12345) and 9-digit with hyphen (12345-6789). Counts values that do not match either format. The test fails when the count exceeds the threshold.
When to use: Apply this test to columns that should contain US postal codes. It validates format only — it does not verify that the zip code corresponds to a real location.
Valid US Zip-3¶
Tests that postal codes match the 3-digit format of a US regional prefix.
Auto-generated when profiling identifies the column as a Zip-3 prefix.
| Scope | Column |
| Measures | Count of invalid Zip-3 prefixes |
| Threshold | Maximum acceptable count of invalid Zip-3 codes (default: 0) |
| On failure | Invalid 3-digit US zip code regional prefix formats found. |
| Default severity | Warning |
How it works: Checks each value against the 3-digit numeric format expected for US zip code regional prefixes. Counts values that do not match. The test fails when the count exceeds the threshold.
When to use: Zip-3 prefixes are often used to roll up zip code data to a regional level and may be used to anonymize detailed data. This test ensures the column contains only the expected 3-digit prefix format.
Metric Trend¶
Tests that a user-defined aggregate metric for a table is within a derived tolerance range.
This test is not auto-generated. You can create it manually from the Test Definitions page. Also used by the Metric monitor.
| Scope | Table |
| Measures | Aggregate metric value |
| Threshold | Expected metric range (derived from historical data or manually configured bounds) |
| On failure | Aggregate metric is outside expected range. |
| Default severity | Fail |
How it works: Evaluates a user-defined SQL aggregate expression (such as AVG(price), COUNT(DISTINCT customer_id), or SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END)) and compares the result against a tolerance range. The range can be derived automatically from historical values or set manually via lower and upper bounds.
When to use: This test tracks any aggregate measure that matters to your data quality — null counts, averages, ratios, conditional counts, or any expression your database supports. It is also used by the Metric monitor for automated anomaly detection. Define the metric expression and optionally a subset condition to scope the calculation.