Data Quality Testing¶
Use data quality testing to verify that your data meets expected standards before it reaches downstream consumers. TestGen executes tests against your database and produces results you can review, confirm, or dismiss — giving you an ongoing record of data quality over time.
What testing checks¶
TestGen includes 48 test types spanning six data quality dimensions: accuracy, completeness, consistency, timeliness, uniqueness, and validity. Tests range from simple checks (are required values present? do zip codes match a valid format?) to statistical comparisons (has the average shifted significantly from baseline? has the distribution changed?).
Each test targets a specific column or table and evaluates it against thresholds derived from profiling results. For the full catalog of test types, see TestGen Test Types.
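As a concrete illustration of a threshold-based test, the sketch below compares a current column statistic against a band derived from a profiled baseline. This is a hypothetical helper to show the idea, not part of the TestGen API; the function name, tolerance parameter, and values are invented for the example.

```python
def evaluate_threshold_test(current_value, baseline, tolerance):
    """Illustrative threshold check (not TestGen's implementation):
    pass when the current statistic stays within +/- tolerance of the
    baseline established during profiling."""
    lower = baseline * (1 - tolerance)
    upper = baseline * (1 + tolerance)
    return lower <= current_value <= upper

# Example: an average profiled at 100.0 with a 10% tolerance band
print(evaluate_threshold_test(104.5, 100.0, 0.10))  # within the band
print(evaluate_threshold_test(123.0, 100.0, 0.10))  # shifted beyond the band
```

A statistical test like "has the average shifted significantly?" reduces to the same shape: a profiled baseline, a tolerance, and a pass/fail comparison.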
How testing works¶
Testing is organized around test suites, test definitions, and test runs:
- Test suite — A named collection of tests associated with a table group. You generate tests into a suite, and each execution of the suite produces a test run.
- Test definition — An individual test within a suite, specifying the test type, target table and column, thresholds, and parameters. Definitions can be auto-generated from profiling results or created manually.
- Test run — A single execution of all active tests in a suite. Each run produces a set of results with one of five statuses:
    - Passed — The data meets the test criteria.
    - Warning — The data does not meet the test criteria, but the test severity is set to warn rather than fail.
    - Failed — The data does not meet the test criteria.
    - Error — The test could not execute (for example, a missing table or permission issue).
    - Log — An informational result recorded for reference.
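The statuses above can be sketched as a simple resolution rule, where a test's configured severity decides whether unmet criteria surface as a warning or a failure. This is an illustrative model only; the enum values, severity names, and function signature are assumptions, not TestGen internals.

```python
from enum import Enum

class Status(Enum):
    PASSED = "Passed"
    WARNING = "Warning"
    FAILED = "Failed"
    ERROR = "Error"
    LOG = "Log"

def resolve_status(criteria_met, severity, execution_error=False):
    """Illustrative mapping from a test outcome to a result status.
    severity is 'Warn', 'Fail', or 'Log' (names assumed for the sketch)."""
    if execution_error:
        return Status.ERROR        # the test could not run at all
    if severity == "Log":
        return Status.LOG          # informational, recorded for reference
    if criteria_met:
        return Status.PASSED
    # Criteria not met: severity decides between warning and failure
    return Status.WARNING if severity == "Warn" else Status.FAILED
```

The key point the sketch captures is that Warning and Failed describe the same underlying condition (criteria not met); only the test's severity setting differs.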
How testing supports your data quality workflow¶
Testing connects to several other features in TestGen:
- Profiling — Profiling discovers the characteristics of your data, and test generation uses those characteristics to create tests with appropriate thresholds.
- Quality scores — Test results contribute to the data quality scores visible on the Quality Dashboard, giving you a high-level view of data quality across your project.
- Notifications — Set up email alerts to be notified when test runs complete or when failures are detected. See Notifications for configuration details.
- Observability — Export test results to DataKitchen's DataOps Observability for centralized monitoring. See Connect to Observability.
When to run tests¶
Best practice is to run tests on a regular schedule — for example, weekly or after each data pipeline refresh — so that issues are detected close to when they occur. You can schedule test runs to execute automatically during off-hours, with results ready for you to review during working hours.
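To make the off-hours pattern concrete, a scheduler only needs to find the next occurrence of the configured run time. The helper below is a minimal sketch of that calculation; the 02:00 run time is an assumed example, not a TestGen default, and the function is not part of any TestGen interface.

```python
from datetime import datetime, timedelta

def next_off_hours_run(now, run_hour=2):
    """Return the next occurrence of the configured off-hours run time.
    (Hypothetical helper; 02:00 is an assumed example, not a default.)"""
    candidate = now.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; run tomorrow
    return candidate

# A run checked at 18:30 is scheduled for 02:00 the following day
print(next_off_hours_run(datetime(2024, 5, 6, 18, 30)))
```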
Regenerate tests periodically (for example, every few weeks, or whenever your data has changed significantly) so that test criteria stay aligned with current data characteristics.
Getting started¶
To start testing, you need a test suite with generated or manually defined tests. See Run Tests for execution steps, and Investigate Test Results to review your results and take action on findings.