Get Started with TestGen

DataKitchen's DataOps Data Quality TestGen provides database profiling, hygiene issue detection, data quality test generation, test execution, anomaly monitoring, and data quality scoring—so you can trust that your data is correct in production without devoting development cycles or team resources.

Installation and quick start

Install the open-source version of TestGen and follow the quick start guide to explore its features and capabilities.

  1. Complete the prerequisites, installation, and demo setup steps described in Install on Windows, Install on Mac/Linux, or Python pip install.
  2. Follow the Quick Start demo tutorial for a walk-through of the software.

Enterprise

For our enterprise TestGen software, see Install Enterprise.

Use TestGen on your data

After exploring the demo, connect your own database and follow the TestGen workflow to start assessing your data quality.

  1. Connect your database and create a table group for the tables you want to assess.
  2. Run profiling to gather baseline statistics and detect hygiene issues.
  3. Generate tests to create a test suite tailored to your data.
  4. Run tests and review the results.
  5. Configure monitors to detect anomalies in freshness, volume, schema, and custom metrics.
  6. Schedule automated runs so results are ready when you need them.
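
To give a feel for what the profiling step gathers, here is a minimal Python sketch of column-level baseline statistics. It is an illustration only and not TestGen's implementation: the function names, the statistics chosen, and the "padded value" hygiene check are all assumptions made for this example.

```python
import re
from collections import Counter


def pattern_of(value: str) -> str:
    """Reduce a value to its shape: letters become A, digits become N."""
    # Letters are replaced first so the N placeholders are not rewritten to A.
    value = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "N", value)


def profile_column(values):
    """Return hypothetical baseline statistics for one column of string values."""
    non_null = [v for v in values if v is not None and v.strip() != ""]
    patterns = Counter(pattern_of(v.strip()) for v in non_null)
    return {
        "record_count": len(values),
        "null_or_blank_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "top_patterns": patterns.most_common(2),
        # Illustrative hygiene check: values with leading or trailing whitespace.
        "padded_count": sum(1 for v in non_null if v != v.strip()),
    }


zip_codes = ["02139", "10001", " 94105", None, "", "1000A", "02139"]
profile = profile_column(zip_codes)
print(profile)
```

In this sample, the dominant pattern is five digits, so the one value containing a letter and the one value with a leading space stand out as candidates for review.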

To extend your setup further:

Terms to know

Here are some TestGen terms to know when getting started.

  • Data profiling: Profiling is the periodic or routine investigation of tables in a schema to gather information about the data types, column contents, and data patterns.
  • Table group: A table group represents a selection of database tables in a schema that will be profiled, tested, and monitored.
  • Tests: A test applies to a specific column in the table group and verifies the health of your data in one of six data quality dimensions: validity, consistency, completeness, timeliness, accuracy, and uniqueness.
  • Test suite: A test suite is a collection of tests that can be run—and rerun—to detect variability in your data over time.
  • Monitor: A monitor tracks a specific aspect of a table—freshness, volume, schema, or a custom metric—and alerts you when anomalies are detected. Monitors learn expected patterns over time and automatically calculate thresholds.
  • Project: A project is the space that separates and distinguishes work. Depending on your organizational needs, projects might be set up by department, team, or stakeholder group.
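
The monitor definition above mentions thresholds calculated from learned patterns. This page does not describe how TestGen computes them, so the following Python sketch swaps in a simple stand-in technique: a band of the historical mean plus or minus k sample standard deviations, applied to daily row counts. The function names, the sample data, and the default k = 3 are assumptions for illustration.

```python
from statistics import mean, stdev


def volume_thresholds(history, k=3.0):
    """Return (low, high) bounds as mean +/- k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomaly(value, history, k=3.0):
    """Flag a new observation that falls outside the band learned from history."""
    low, high = volume_thresholds(history, k)
    return value < low or value > high


# Hypothetical daily row counts for one table.
daily_rows = [1000, 1020, 980, 1010, 990, 1005, 995]
low, high = volume_thresholds(daily_rows)

print(is_anomaly(1002, daily_rows))  # within the learned band
print(is_anomaly(400, daily_rows))   # far below the learned band
```

A fixed band like this ignores seasonality and trends, which is one reason a monitor that learns patterns over time is useful in practice.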

Tip

For a detailed walkthrough of the TestGen workflow, see What is TestGen?.