Open Source · Free To Start · No SQL Required

You Have 400 Tables.
Tests for 4 of Them.
TestGen Fixes That.

Point TestGen at your database and it profiles every table, flags hygiene issues automatically, and generates thousands of data quality tests with no SQL, no YAML, and no coding required. Full coverage in minutes. Not months. First profiling run in under 5 minutes.

TestGen Quality Dashboard showing scorecards with Total Score, CDE Score, and quality dimension breakdown
51 Profiling Characteristics
27 Hygiene Detectors
1000s of Tests Per Profiling Run
<5 min First Profiling Run
$0 To Start

Bad data has always been expensive. AI makes it exponentially worse.

A bad row that once broke one dashboard now corrupts thousands of downstream reports, model predictions, and automated decisions. The blast radius of a single data quality failure keeps growing.

The teams that stay trusted are the ones who got test coverage before they needed it. Not after the incident report.


Download Free Guide: AI & Data Quality →

From the community: r/dataengineering

"We just eyeball row counts and pray." When there's no time to write tests, this is the actual quality strategy at most data teams. The monitoring is vibes-based.

From the community: dbt Community Forum

"The data changes faster than I can keep the tests up to date." Handwritten tests become technical debt overnight. A schema change upstream breaks them all.

From the community: Hacker News

"Nobody gives us the time to write tests. It's always the next feature, never quality." This is the #1 reason data engineers do not test, confirmed across 849 community comments.

Real Customer
"DataKitchen enabled us to deliver over 10,000 data quality validation tests that run every release. Executives don't trust our analytics? Now, they trust us."
Manager, Data Quality | Enterprise Software Company

Research · 849 Community Voices

We Asked The Community Why Data Engineers Don't Test.

We read 849 comments across 18 threads on Reddit, Hacker News, Stack Overflow, and the dbt Community Forum. The answers were honest, funny, and occasionally brutal. They are exactly what TestGen was built to solve.

Read the full breakdown →
#1 Barrier: Nobody gives us time to write tests
#2 Barrier: Data changes faster than tests can keep up
#3 Barrier: No domain knowledge to write meaningful tests
#4 Barrier: Too many false positives, alerts nobody trusts
TestGen addresses all four. See how →

Everything You Need.
None of What You Don't.

Built for data engineers who need coverage fast. Not another platform to configure for six months before it delivers value.

Data Profiling

51 column-level characteristics captured in a single run: types, patterns, nulls, value distributions, percentiles, and PII signals. Every table. No SQL written.

Hygiene Detection

27 types of data problems flagged automatically after profiling, before you write a single test. Invalid formats, mixed types, blank value variants, stale tables, and more.

Auto Test Generation

One profiling run applies 32 test types across every column, generating thousands of individual test instances automatically. TestGen infers bounds, patterns, and expected distributions from your data with no configuration required.
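
To make "infers bounds" concrete, here is a minimal sketch of the idea: take the minimum and maximum seen during profiling, widen them by a tolerance, and flag later values that fall outside. The function name `infer_bounds_test` and the `tolerance` parameter are hypothetical and chosen for illustration; this is not TestGen's actual inference logic.

```python
def infer_bounds_test(profiled_values, tolerance=0.1):
    """Build a min/max bounds check from values seen during profiling.

    Hypothetical helper: the tolerance widens the observed range by a
    fraction of its width so small drifts do not trigger failures.
    """
    lo, hi = min(profiled_values), max(profiled_values)
    margin = (hi - lo) * tolerance
    lower, upper = lo - margin, hi + margin

    def check(new_values):
        # Return every value that falls outside the inferred range.
        return [v for v in new_values if not lower <= v <= upper]

    return check


# Profile once, then test a later batch against the inferred bounds.
check_amount = infer_bounds_test([10, 12, 15, 20], tolerance=0.1)
violations = check_amount([11, 25, 9.5, 8])
print(violations)
```

With the sample data the inferred range is 9.0 to 21.0, so only the values outside it are reported.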

Data Catalog

360-degree column-level view: semantic type, value distribution, hygiene flags, PII risk, test results, and Critical Data Element tagging. All derived from profiling with no manual entry.

Quality Scoring

Automated scorecards roll up profiling and test results per table, domain, or pipeline zone. Drill to the column pulling the score down. Share a 1-click issue report.
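
As a rough picture of a roll-up, the sketch below scores each table by the share of its tests that passed. The scoring formula, the tuple layout, and the name `table_scores` are assumptions made for this example; TestGen's real scoring may weight results differently.

```python
from collections import defaultdict


def table_scores(results):
    """Roll up (table, column, passed) test results into a 0-100 score per table.

    Assumed formula for illustration: percentage of tests that passed.
    """
    totals = defaultdict(lambda: [0, 0])  # table -> [passed, total]
    for table, _column, passed in results:
        totals[table][0] += int(passed)
        totals[table][1] += 1
    return {t: round(100 * p / n, 1) for t, (p, n) in totals.items()}


results = [
    ("orders", "id", True),
    ("orders", "amount", True),
    ("orders", "zip", False),
    ("customers", "email", True),
]
scores = table_scores(results)
print(scores)
```

Keeping the column in each result row is what makes a drill-down possible: the failing rows for a low-scoring table point at the column responsible.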

Table Monitors

ML-driven anomaly detection on freshness, volume, schema drift, and metric drift. TestGen learns your data's normal behavior and alerts when it deviates. No thresholds to configure manually.

Business Rule Tests

10 configurable test types for rules that cannot be inferred from data automatically: Data Match, Prior Match, Aggregate Match, and more. Configure them in the UI with no custom SQL required.

CLI & CI/CD Integration

Run tests in any orchestrator: Airflow, dbt, Azure Data Factory, GitHub Actions. Non-zero exit codes stop the pipeline before bad data reaches production. Works at every Medallion layer: Bronze ingestion, Silver transformation, and Gold delivery.
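
The exit-code mechanism is generic and can be sketched without any TestGen-specific flags. In the example below the "test run" is a stand-in child process that simply exits with 0 or 1; in a real pipeline you would substitute the actual test command from the CLI reference docs. The function name `quality_gate` is hypothetical.

```python
import subprocess
import sys


def quality_gate(command):
    """Run a command and report whether it exited with code 0."""
    result = subprocess.run(command)
    return result.returncode == 0


# Stand-ins for a real test-run command: child processes that exit 0 or 1.
passing_check = [sys.executable, "-c", "import sys; sys.exit(0)"]
failing_check = [sys.executable, "-c", "import sys; sys.exit(1)"]

gate_results = [quality_gate(passing_check), quality_gate(failing_check)]
print(gate_results)
```

An orchestrator task would typically call `sys.exit(1)` when `quality_gate` returns `False`, which marks the task as failed and keeps downstream steps from running.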

Observability Integration

TestGen is the data quality layer. DataOps Observability is the pipeline layer. Together they cover every point where data can fail: from a bad source column to a broken pipeline step to a wrong number in a dashboard.

TestGen results export directly into the DataOps Observability timeline. One view. Every failure. Source to customer.

Learn about DataOps Observability →

Three Teams. One Tool.

Data Engineers

Get test coverage across all your tables, not just the four important ones that got manual tests.

No YAML. No SQL. No weeks of test-writing. Profile your tables, generate tests automatically, and integrate into your pipeline with a single CLI command. Works with Airflow, dbt, ADF, and any CI/CD system.

Data Quality Teams

Build scorecards, track quality over time, and create evidence that moves upstream teams to fix their data.

Quality scores by table, domain, or pipeline zone. Drill down to the exact column pulling the score down. One-click shareable issue reports give you something concrete to bring to the source team conversation.

Data Governance Teams

Automated PII detection. CDE tagging per column. Audit-ready issue reports. No manual catalog curation required.

Catalog your data assets, flag PII risks, and tag Critical Data Elements with evidence of quality at every layer. Quality scoring by business domain and stakeholder group. Everything is derived from profiling runs, not hand-entered metadata.

What TestGen Actually Catches

Before you install anything, here is exactly what TestGen finds.

Hygiene Issues Found After Profiling (12 of the 27 shown)

  • Non-standard blank values: empty string, N/A, 0 used as null, and similar variants
  • Invalid zip code format in string column
  • Leading or trailing spaces in text fields
  • Mostly dates stored in string column
  • Multiple data types within same column name
  • No column values present at all
  • Mostly not-null but sporadic empty values
  • Recency issue: no records within the last year
  • Duplicate values in a column expected to be unique
  • PII risk: email, phone, or SSN pattern detected
  • Quoted values found in string column
  • Similar values match when standardized
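
Two of the checks above are simple enough to sketch in a few lines. This is an illustration of the idea only: the list of blank variants and the name `profile_hygiene` are assumptions, not TestGen's detectors.

```python
# Assumed list of placeholder strings that stand in for a real NULL.
BLANK_VARIANTS = {"", "n/a", "na", "null", "none", "-"}


def profile_hygiene(values):
    """Count two hygiene problems in a list of string-or-None values."""
    non_null = [v for v in values if v is not None]
    return {
        # Placeholder strings used instead of a real NULL
        "blank_variants": sum(v.strip().lower() in BLANK_VARIANTS for v in non_null),
        # Non-empty values with leading or trailing spaces
        "padded": sum(v != v.strip() and v.strip() != "" for v in non_null),
    }


report = profile_hygiene(["Boston", " Boston", "N/A", "", None, "Austin "])
print(report)
```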

Auto-Generated Test Examples

  • Alpha truncation: values cut at consistent length
  • Average shift: mean deviates from historical baseline
  • Constant value present: column stopped changing
  • Daily record count: row count outside expected range
  • Distinct value change: new or missing categorical values
  • Future date: timestamp values beyond today
  • Incremental average shift: trend is breaking
  • Value present in list-of-values: referential integrity check
  • Minimum and maximum value bounds exceeded
  • Percent unique: uniqueness ratio outside tolerance
  • Pattern match: format regex violated
  • Required entry: nulls found in a non-null column
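
For a sense of what two of these generated tests check, here is a minimal sketch of "Future date" and "Required entry". The function names are hypothetical, and `today` is passed in explicitly so the example is deterministic.

```python
from datetime import date


def future_dates(values, today):
    """Return date values that lie after `today` (nulls are ignored)."""
    return [v for v in values if v is not None and v > today]


def missing_required(values):
    """Count nulls in a column that is expected to be fully populated."""
    return sum(v is None for v in values)


ship_dates = [date(2024, 1, 5), None, date(2031, 3, 1)]
today = date(2025, 1, 1)

late = future_dates(ship_dates, today)
nulls = missing_required(ship_dates)
print(late, nulls)
```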

Custom Business Rule Tests

  • Data Match: value matches a reference table lookup
  • Prior Match: value unchanged from the previous run
  • Aggregate Match No Drops: sum consistent across joins
  • Row count match between source and target
  • Cross-column consistency rules
  • Date sequence validation
  • Referential integrity across tables
  • Business-defined range or ratio checks
  • Configurable in the UI with no SQL required
  • Shareable with business users for review
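
Source-to-target rules like these come down to comparing aggregates across two tables. The sketch below uses an in-memory SQLite database with made-up tables `source_orders` and `target_orders`. Reading "no drops" as "the target total must not be lower than the source total" is an assumption for this example. In TestGen these rules are configured in the UI rather than written by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (id INTEGER, amount REAL);
    CREATE TABLE target_orders (id INTEGER, amount REAL);
    INSERT INTO source_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO target_orders VALUES (1, 10.0), (2, 20.0);
""")


def table_stats(connection, table):
    """Return (row count, sum of amount) for a table.

    The table name is a fixed literal from this example, never user input.
    """
    query = f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}"
    return connection.execute(query).fetchone()


source = table_stats(conn, "source_orders")
target = table_stats(conn, "target_orders")

row_count_match = source[0] == target[0]  # row count match between source and target
no_drops = target[1] >= source[1]         # assumed meaning of "no drops"
print(row_count_match, no_drops)
```

Both checks fail here because one order never reached the target table, which is the kind of silent loss these rules are meant to surface.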

All the Checkmarks.
None of the Typical Cost Burden.

No usage-based surprises. No VC-driven price resets. No 6-month sales cycles. Just a number, published on the page. No per-table tax. Monitor every asset without costs that balloon as your data grows. Vendors like Monte Carlo and Bigeye charge per monitored asset. We do not.

Feature | TestGen Open Source | TestGen Enterprise | Typical Observability Vendor (e.g., Monte Carlo, Bigeye, Anomalo)
Price | $0, free forever | $100 per user / per connection | $50K–200K+ per year, negotiated
Data Profiling (51 characteristics) | | | Partial
Auto Test Generation | | |
Hygiene Issue Detection (27 types) | | |
Quality Scoring & Dashboards | | | Partial
Table Monitors (ML anomaly detection) | | |
SSO / Multi-project / RBAC | | |
Pricing Transparency | | |
VC-Backed Pricing Risk | None | None | High
Why our pricing is public: DataKitchen is profitable and investor-free. We will not suddenly pivot pricing models or sunset features after a funding round, because there are no funding rounds. The enterprise version is $100 per user per connection. That's the number. It doesn't change in your renewal conversation.
The Data Quality Tax: The anomaly detection algorithms inside most $100K/year observability platforms are open source: Z-score calculations, time-series variance, and record count comparisons. You are paying an enterprise markup on commodity math. TestGen gives you the same detection capabilities, transparently priced, with auto-generated tests that those platforms don't offer at all. Read the full analysis →
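
To show how small the "commodity math" is, here is a minimal Z-score check on daily record counts using only the Python standard library. The threshold of 3 standard deviations and the function names are assumptions for this sketch. It is not a description of how TestGen or any vendor implements its monitors.

```python
from statistics import mean, stdev


def zscore(history, latest):
    """Standard score of `latest` against a history of at least two values."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is infinitely unusual.
        return 0.0 if latest == mu else float("inf")
    return (latest - mu) / sigma


def is_anomalous(history, latest, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(zscore(history, latest)) > threshold


daily_counts = [1000, 1020, 980, 1010, 990]
print(is_anomalous(daily_counts, 400))
print(is_anomalous(daily_counts, 1005))
```

A drop to 400 rows is flagged, while 1005 rows sits well inside normal variation.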

Built by Practitioners.
Not by a Sales Team.


DataKitchen has spent a decade building DataOps tooling for data engineering teams. We've written three books on DataOps. We built the open source DataOps Observability platform. We've spoken at data conferences worldwide.

TestGen is the data quality testing layer we always wished existed. It is purpose-built for the data engineer who needs coverage across hundreds of tables, not just the four most important ones that got manual tests.

We are profitable. We are independent. We are not going to raise a Series B and tell you your annual contract is going up 3x. We charge $100 per user per connection for the enterprise version. That is the number. It's on the page. It doesn't change.

Free Certification

Not ready to install yet? Get certified in Data Observability for free. Learn the concepts, prove the skills, and take it at your own pace.

Get certified free →
3 Books Published on DataOps: including the DataOps Cookbook and The DataOps Way to Data Quality
10+ Years Building DataOps Tools: across orchestration, observability, and data quality
$0 Venture Capital Raised: profitable since year one. No investors, no pivot risk, no pricing surprises.
2 Open Source Products: TestGen for data quality and DataOps Observability for pipeline monitoring

Your First Profiling Run
is Two Commands Away.

Runs in Docker. No cloud account required. No credit card. No sales call.

Two commands. That is it.

$ pip install testgen-tooling
$ tg launch

A browser-based UI opens at localhost. Connect your database, run your first profile, and see hygiene issues and auto-generated tests within minutes.

▸ Works with: Snowflake  ·  Databricks  ·  PostgreSQL  ·  AWS Redshift  ·  Azure Synapse  ·  Azure SQL

Already running pipelines? TestGen's CLI integrates directly into Airflow, dbt, and Azure Data Factory as well as any CI/CD system.
Run tests on every pipeline execution. Fail the pipeline job when data quality fails.

→ CLI reference docs   ·   → Full documentation   ·   → Product tour (3 min)