What is TestGen?¶
DataKitchen's DataOps Data Quality TestGen is a data quality tool that profiles your databases, generates tests, and monitors your tables, enabling you to confidently assess the health of your data from start to finish.
You want to know that the data in your database is consistent and meets expectations as work gets done. Unfortunately, you don't have the time or team capacity to create and implement the test logic. Rather than let your data suffer, use TestGen to algorithmically generate data quality checks that test for inconsistencies and errors for you.

What is it? Robust AI-driven data quality software¶
- One-Button Data Quality Tests – Instantly generate automated tests without deep data expertise. Start fast, scale effortlessly.
- Data Profiling – Uncover column-level insights and understand problematic rows.
- Blazing-Fast In-Database Execution – Minimize data movement by pushing queries directly into your database for speed & security.
- 120+ AI-Driven Data Quality Checks – Automatic, complete coverage of data integrity, hygiene, and quality.
- Anomaly Detection with Monitors – Stay ahead of data issues with automated alerts on freshness, volume, schema, and data drift.
- Data Catalog – A full 360° view of metadata, hygiene issues, PII risks, test results, and Critical Data Elements.
- Customizable Quality Scoring & Dashboards – Automated scorecards with drill-down reports track and improve data quality.
- Shareable Issue Reports – No time to write up a data quality issue report? Generate a polished report with a single click and share it to drive action on data quality.
- Seamless Integration with DataOps Observability – Checking and testing data is only part of the quality challenge; you also need to monitor all the tools acting upon your data.
Install TestGen¶
- Recommended Docker install
- Python package install
Seven TestGen touchpoints¶
TestGen algorithmically generates data quality checks that test for inconsistencies and errors in your database—so you don't have to devote development cycles or team resources. A short video demonstration is also available.
A TestGen workflow follows these seven touchpoints: Profile, Investigate, Generate, Execute, Analyze, Monitor, and Data Quality Score.
Step 1: Profile a table group¶
Data profiling is the periodic or routine investigation of tables in a schema. TestGen scans specific tables and columns (i.e., a table group) during a profiling run to gather information about the data types, column contents, and patterns.
Profiling a table group provides context about your database. The system uses the current profiling results and hygiene issue details both for review and analysis and to derive downstream test logic, while previous results are retained for historical cross-reference.
To start profiling, create a table group that defines which tables to scan. See Run Profiling for details. Best practice is to run profiling on a biweekly or monthly schedule—it provides the foundation for testing and monitoring but doesn't need to run frequently.
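To make the idea concrete, the sketch below shows the kind of column-level statistics a profiling run gathers (row counts, null counts, distinct values). The table, column, and function names are invented for illustration; TestGen's actual profiling runs in-database against your configured table group.

```python
import sqlite3

# Build a tiny in-memory table to profile (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "b@example.com"), (4, "b@example.com")],
)

def profile_column(conn, table, column):
    """Gather basic profile stats for one column: total rows, nulls, distinct values."""
    total, non_null, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return {
        "total_rows": total,
        "null_count": total - non_null,
        "distinct_count": distinct,
    }

stats = profile_column(conn, "customers", "email")
print(stats)  # {'total_rows': 4, 'null_count': 1, 'distinct_count': 2}
```

Note that the statistics are computed by a single query pushed into the database, mirroring TestGen's in-database execution approach: only aggregates leave the database, not the data itself.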

Step 2: Investigate profiling and Data Catalog¶
You can access all profiling runs in a project from the Profiling Runs page, or explore your data through the Data Catalog. The Data Catalog brings together metadata, hygiene issues, PII risks, test results, and Critical Data Elements across all your tables and columns.
For details on what you can learn with profiling, continue with Investigate Data Profiling.

TestGen automatically detects 32 types of hygiene issues from profiling data—potential data quality improvements that help you identify errors and improve data usability.

Step 3: Generate tests¶
TestGen includes 48 test types that verify the health of your data. These tests consider six data quality dimensions: validity, consistency, completeness, timeliness, accuracy, and uniqueness.
From the 48 available test types, the system assesses the current table group and profiling run to algorithmically generate a suite of tests specific to your data. If, for example, the current profile does not include emails, then the system will not generate the Email Format test.
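The selection logic described above can be sketched as a simple mapping from profiling results to applicable test types. This is an illustrative simplification, not TestGen's actual generation algorithm, and the test and field names are invented for the example.

```python
def generate_tests(profile):
    """Suggest test types for a column based on its profiling results."""
    tests = []
    if profile["null_count"] == 0:
        tests.append("Null Check")       # column has always been non-null
    if profile["distinct_count"] == profile["total_rows"]:
        tests.append("Unique Check")     # column has been unique so far
    if profile.get("looks_like_email"):
        tests.append("Email Format")     # only generated when emails are present
    return tests

# A column whose profile shows unique, non-null, email-like values.
profile = {"total_rows": 100, "null_count": 0,
           "distinct_count": 100, "looks_like_email": True}
print(generate_tests(profile))  # ['Null Check', 'Unique Check', 'Email Format']
```

A column whose profile shows no email-like values would simply never receive the Email Format test, matching the behavior described above.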
You can run—and rerun—this test suite to detect variability in your data over time.
You can regenerate tests for a test suite as needed to refresh test selection, criteria, and thresholds; best practice is to do so every few weeks, or whenever your data has meaningfully evolved.
Test generation is done from the Test Suites page in the UI. See Generate Tests for details.
You can review and manually refine tests before executing them. Tests you've customized can be locked to prevent the definition from being refreshed in future runs. You can also define your own custom tests to cover business-specific rules and validation logic.

Step 4: Execute tests¶
Once you have a profile and test suite, run the tests against your data.
Best practice is to execute tests regularly—for example, once a week in the evening or every weekend. This can be scheduled and automated so results are ready for you to analyze during working hours. Continue with Run Tests for details.
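Conceptually, executing a test suite reduces to measuring each test's metric and comparing it against its threshold. The structure below is an illustrative sketch, not TestGen's internal representation; the test names and thresholds are invented.

```python
def run_test(measured, operator, threshold):
    """Compare a measured value against a threshold and return a verdict."""
    ops = {"<=": lambda m, t: m <= t, ">=": lambda m, t: m >= t}
    return "Passed" if ops[operator](measured, threshold) else "Failed"

# A hypothetical two-test suite with already-measured values.
suite = [
    {"name": "Null Rate", "measured": 0.02, "operator": "<=", "threshold": 0.05},
    {"name": "Row Count", "measured": 900,  "operator": ">=", "threshold": 1000},
]
results = {t["name"]: run_test(t["measured"], t["operator"], t["threshold"])
           for t in suite}
print(results)  # {'Null Rate': 'Passed', 'Row Count': 'Failed'}
```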

Step 5: Analyze test results¶
When you execute the tests in a test suite, the runs and results display under Test Runs. From there, you can filter and drill down into the results for each run and test.
For an overview of how to analyze your TestGen outcomes, see Investigate Test Results.

Test results can be sent automatically to DataOps Observability, or you can download an issue report to share directly. You can also configure notifications to alert your team when issues are detected.
Step 6: Monitor your data¶
Monitors provide proactive anomaly detection for your data tables. While tests check your data against specific thresholds, monitors learn your data's patterns over time and alert you when values deviate from what's expected.
TestGen supports four types of monitors:
- Freshness – Detects when tables are not updated on their expected schedule
- Volume – Tracks row count changes and alerts on unexpected spikes or drops
- Schema – Detects column additions, deletions, or type changes
- Metric – Tracks user-defined metrics for anomalies
Monitors use prediction models to automatically calculate expected ranges based on historical data. As they collect more data, their thresholds become increasingly accurate.
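As a rough intuition for how an expected range might be derived, the sketch below bounds a volume monitor's row counts with a mean-plus-or-minus-k-standard-deviations band over history. This is a deliberately simplified stand-in; TestGen's actual prediction models are more sophisticated.

```python
import statistics

def expected_range(history, k=3.0):
    """Return (low, high) bounds: mean +/- k sample standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return mean - k * stdev, mean + k * stdev

# Hypothetical daily row counts observed by a volume monitor.
daily_row_counts = [10_000, 10_200, 9_900, 10_100, 10_050]
low, high = expected_range(daily_row_counts)

todays_count = 4_000  # a sudden, unexpected drop
print("anomaly" if not (low <= todays_count <= high) else "normal")  # anomaly
```

As the history grows, the band tightens around the data's true behavior, which is why monitor thresholds become more accurate over time.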
See Monitor Tables for details on setting up and managing monitors.

Step 7: Data quality scoring¶
TestGen automatically creates data quality scores for every table group based on hygiene issues and test results. These scores are displayed on the Quality Dashboard and can be refined through the Score Explorer.
The Quality Dashboard provides an at-a-glance view of data quality across all your table groups, with drill-down capabilities to investigate specific issues.
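As a minimal sketch of the roll-up idea, a score can be computed as the share of passing checks per table group. The weighting here is invented for illustration; TestGen's actual scoring combines hygiene issues and test results.

```python
def quality_score(passed, total):
    """Percentage of passed checks, as a 0-100 score."""
    return round(100 * passed / total, 1)

# Hypothetical (passed, total) check counts per table group.
table_groups = {"sales": (95, 100), "customers": (42, 50)}
scores = {name: quality_score(p, t) for name, (p, t) in table_groups.items()}
print(scores)  # {'sales': 95.0, 'customers': 84.0}
```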

System architecture¶
TestGen is a self-hosted application deployed within your network. It connects to your database with read-only access — no data is extracted or copied. For details on TestGen's architecture, deployment options, and security posture, see System Architecture and Security.
Supported databases¶
The following databases are compatible with TestGen:
- Amazon Aurora PostgreSQL
- Amazon Redshift
- Azure SQL Database
- Azure Synapse Analytics
- Databricks SQL
- Google BigQuery
- Microsoft SQL Server
- PostgreSQL
- Snowflake
TestGen also supports structured data stored in Apache Iceberg tables and file formats like Parquet, Avro, ORC, CSV, and JSON that are compatible with database external tables.
Get started with TestGen¶
See Get Started for step-by-step instructions on setting up and beginning to use TestGen.
Additional resources¶
- Monitor Tables – Set up proactive anomaly detection
- Quality Scores – Understand your data quality scores
- Data Catalog – Explore your data assets
- Notifications – Configure alerts for your team
- Scheduling – Automate profiling and test execution