
System Architecture and Security

TestGen is designed to run entirely within your infrastructure, with read-only access to your data. This page describes TestGen's architecture, deployment model, and security posture.

Architecture overview

TestGen runs as a self-hosted application deployed within your network. Nothing is hosted by DataKitchen, and no data leaves your environment.

[Diagram: TestGen system architecture, showing the Core Engine, Target Database, Application Database, Scheduler, User Interfaces, Email Server, and Authentication Provider]

Core Engine

The Core Engine performs all data quality operations:

  • Data Profiling queries your tables to collect column-level statistics — data types, patterns, distributions, and value characteristics.
  • Test Generation uses profiling results to automatically create a suite of data quality tests tailored to your data.
  • Test Execution runs the generated tests against your database and records pass/fail results.
  • Data Monitoring tracks changes in freshness, volume, schema, and custom metrics over time, using profiling results to detect anomalies.
  • Quality Scores aggregate profiling and testing results into data quality scores for your table groups.
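As a rough illustration of how profiling, test generation, and test execution fit together, the sketch below uses Python's built-in sqlite3 module as a stand-in for a target database. The functions profile_column, generate_tests, and run_tests, the two test types (not_null and range), and the SQL they emit are hypothetical and chosen only for illustration; they are not TestGen's actual profiling queries or test definitions. Every query the three functions issue against the data is a SELECT.

```python
import sqlite3


def profile_column(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Collect basic column statistics with a single SELECT (hypothetical profile shape)."""
    # Identifiers are interpolated into the SQL, so only pass trusted table/column names.
    total, non_null, distinct, low, high = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}"), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    ).fetchone()
    return {"rows": total, "nulls": total - non_null, "distinct": distinct,
            "min": low, "max": high}


def generate_tests(profile: dict, column: str) -> dict:
    """Turn a profile into test definitions: test name -> SQL predicate matching violations."""
    tests = {}
    if profile["nulls"] == 0:
        # The column was fully populated when profiled, so expect it to stay that way.
        tests["not_null"] = f'"{column}" IS NULL'
    if isinstance(profile["min"], (int, float)) and isinstance(profile["max"], (int, float)):
        # Numeric column: flag values outside the range observed during profiling.
        tests["range"] = f'"{column}" < {profile["min"]} OR "{column}" > {profile["max"]}'
    return tests


def run_tests(conn: sqlite3.Connection, table: str, tests: dict) -> dict:
    """Run each test as SELECT COUNT(*) of violating rows; zero violations means pass."""
    results = {}
    for name, violation in tests.items():
        (failures,) = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE {violation}'
        ).fetchone()
        results[name] = failures == 0
    return results


# Demo setup: an in-memory table standing in for a table in your database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 7.25)])

profile = profile_column(conn, "orders", "amount")
tests = generate_tests(profile, "amount")
print(run_tests(conn, "orders", tests))  # {'not_null': True, 'range': True}

# Simulate a bad row arriving later (demo setup only; the checks themselves never write).
conn.execute("INSERT INTO orders VALUES (4, NULL)")
print(run_tests(conn, "orders", tests))  # {'not_null': False, 'range': True}
```

The NULL row fails not_null but still passes range, because SQL comparisons with NULL are never true. This is why each generated test checks one concern rather than folding several into a single predicate.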

Supporting components

  • The Application Database (PostgreSQL) stores all TestGen configuration, results, and schedules. It does not store your data.
  • The Scheduler automates profiling, test execution, and monitoring on the schedules you configure.
  • The Email Server delivers alert notifications when tests fail or monitors detect anomalies.
  • You interact with TestGen through a browser-based UI or a CLI. All tasks covered in this documentation can be performed in the UI. The enterprise version supports authentication through an external OpenID Connect provider.

How TestGen connects to your database

TestGen connects to your database with read-only access. All profiling, test, and monitoring queries run as SELECT statements directly in your database. No data is extracted, copied, or moved; your data stays where it is.

Results and metadata are written only to TestGen's own application database, never to your target database.

See "What is TestGen?" for the list of supported databases.

Deployment

Self-hosted, on-premises

TestGen is deployed entirely within your infrastructure. DataKitchen does not host any part of the application and has no access to your environment or data.

Docker Compose (default)

The default installation uses Docker Compose with two containers: one for the Core Engine (including UI, CLI, and Scheduler) and one for the PostgreSQL application database. This is the quickest way to get started and is suitable for evaluation and small-scale use. See Get Started for installation instructions.

Kubernetes (Helm)

For production deployments, TestGen provides Helm charts for Kubernetes. This is the recommended deployment model for teams that need high availability, easier upgrades, and integration with existing infrastructure. The Helm charts support connecting to a managed or self-hosted PostgreSQL server outside the cluster, so the application database does not need to run in a container.

Container image

The TestGen container image is built on Alpine Linux. The image runs as a dedicated non-root user and does not require privileged mode. Build toolchains are removed from the final image to minimize the attack surface, and builds include a software bill of materials (SBOM) and provenance attestations for supply chain verification.

Network architecture

TestGen makes only outbound connections. No inbound connections from outside your network are required.

| Connection | Direction | Purpose | Required |
|---|---|---|---|
| Target database | Outbound | Read-only queries for profiling, testing, and monitoring | Yes |
| Application database | Internal | Stores configuration and results (not exposed externally in the default configuration) | Yes |
| SMTP server | Outbound | Email notifications for test failures and monitor alerts | No |
| OpenID Connect provider | Outbound | Single sign-on authentication (enterprise only) | No |
| Mixpanel | Outbound | Anonymous usage analytics (can be disabled) | No |

Transport security (TLS)

TestGen supports HTTPS for the browser UI by mounting your TLS certificate and key files directly into the application container during installation. See Install on Mac/Linux or Install Enterprise for details.

You can also terminate TLS at a reverse proxy, load balancer, or ingress controller in front of TestGen, following your organization's standard practices.

Data residency and isolation

  • Your data is never extracted. TestGen executes queries directly in your database and stores only aggregate results (row counts, column statistics, pass/fail outcomes) in the application database.
  • Nothing is sent to DataKitchen. TestGen is self-hosted and does not communicate with DataKitchen servers. The one exception is optional anonymous usage analytics, sent through Mixpanel. No data content is collected, and you can disable analytics by setting TG_ANALYTICS=no.
  • Multi-project isolation. The enterprise version supports multiple projects with separate connections, so teams can work independently within a single TestGen installation.

Authentication and access control

The open-source version uses built-in username/password authentication. The enterprise version supports single sign-on (SSO) through any OpenID Connect provider, such as Okta, Microsoft Entra ID, Auth0, or Ping Identity. When SSO is enabled, TestGen delegates all authentication to your identity provider and does not store user passwords. See Configure Single Sign-On for setup details.

The enterprise version also includes role-based access control with five roles (Admin, Data Quality, Analyst, Business, and Catalog) that control access to features across the application. See User Access for role descriptions and the full permissions matrix.

Credentials and secrets

Target database credentials (passwords, private keys, and service account keys) are encrypted at rest in the application database using AES-256-CBC encryption with a PBKDF2-derived key. The encryption key material is provided through environment variables at deployment time. In Kubernetes deployments, the Helm chart auto-generates these secrets and stores them in a Kubernetes Secret resource.
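The key-derivation step can be sketched with Python's standard library, which provides PBKDF2 through hashlib.pbkdf2_hmac. The environment variable names (EXAMPLE_ENCRYPTION_PASSWORD, EXAMPLE_ENCRYPTION_SALT), the SHA-256 hash, and the iteration count below are hypothetical placeholders, not TestGen's actual settings. The AES-256-CBC encryption itself is not shown because the standard library has no AES implementation; a library such as cryptography would take the derived 32-byte key and a fresh 16-byte IV.

```python
import hashlib
import os
from typing import Mapping

KEY_BYTES = 32         # AES-256 requires a 256-bit (32-byte) key
IV_BYTES = 16          # a CBC initialization vector is one AES block (16 bytes)
ITERATIONS = 100_000   # assumption: placeholder work factor, not TestGen's setting


def derive_key(env: Mapping[str, str]) -> bytes:
    """Derive an AES-256 key from deployment-time secrets (hypothetical variable names)."""
    password = env["EXAMPLE_ENCRYPTION_PASSWORD"].encode("utf-8")
    salt = env["EXAMPLE_ENCRYPTION_SALT"].encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS, dklen=KEY_BYTES)


def new_iv() -> bytes:
    """Generate a random IV; CBC needs a new, unpredictable IV for every encryption."""
    return os.urandom(IV_BYTES)


# In a deployment this mapping would be os.environ; a literal dict keeps the demo self-contained.
demo_env = {"EXAMPLE_ENCRYPTION_PASSWORD": "change-me",
            "EXAMPLE_ENCRYPTION_SALT": "example-salt"}
key = derive_key(demo_env)
print(len(key), len(new_iv()))  # 32 16
```

Because the derivation is deterministic, the same secrets must be supplied on every restart, or previously stored credentials can no longer be decrypted.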

User passwords are hashed with bcrypt before storage. Session tokens are signed with HMAC-SHA256 using a dedicated signing key.
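Session-token signing with HMAC-SHA256 can be sketched with Python's standard hmac module. The token layout below (a base64url-encoded JSON payload, a dot, then a base64url-encoded signature) is an assumption loosely modeled on common signed-token schemes, not TestGen's documented format, and it omits expiry handling for brevity. Password hashing with bcrypt is not shown because it requires a third-party package.

```python
import base64
import hashlib
import hmac
import json
import secrets
from typing import Optional


def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text: str) -> bytes:
    # Restore the "=" padding stripped by _b64encode.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_session(payload: dict, key: bytes) -> str:
    """Serialize the payload and append an HMAC-SHA256 signature computed over it."""
    body = _b64encode(json.dumps(payload, sort_keys=True).encode("utf-8"))
    signature = hmac.new(key, body.encode("ascii"), hashlib.sha256).digest()
    return f"{body}.{_b64encode(signature)}"


def verify_session(token: str, key: bytes) -> Optional[dict]:
    """Return the payload if the signature matches, otherwise None."""
    body, _, signature = token.partition(".")
    try:
        expected = hmac.new(key, body.encode("ascii"), hashlib.sha256).digest()
        provided = _b64decode(signature)
    except ValueError:  # non-ASCII text or malformed base64
        return None
    # compare_digest runs in constant time, so timing does not reveal how many bytes matched.
    if not hmac.compare_digest(expected, provided):
        return None
    return json.loads(_b64decode(body))


signing_key = secrets.token_bytes(32)  # stands in for the dedicated signing key
token = sign_session({"user": "alice", "role": "Analyst"}, signing_key)
print(verify_session(token, signing_key))  # {'role': 'Analyst', 'user': 'alice'}

# Tampering: swap in a different payload but keep the original signature.
forged_body = _b64encode(json.dumps({"role": "Admin", "user": "alice"}).encode("utf-8"))
forged = forged_body + "." + token.partition(".")[2]
print(verify_session(forged, signing_key))  # None
```

A client can read the payload but cannot change it without the signing key, since any edit to the body invalidates the signature.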

Security testing and vulnerability management

DataKitchen's software development process includes automated static application security testing (SAST) and secret detection scans that run on every code change. The TestGen container image is scanned for vulnerabilities using Docker Scout as part of every release. Findings are remediated on defined timelines based on severity and impact.

DataKitchen engages an independent third party to conduct penetration testing of its software products approximately every two years. Testing follows the OWASP Application Security Verification Standard (ASVS) and includes both black-box and grey-box assessments covering the application, API, and network layers. Critical and high-priority findings are remediated before the final report is issued.

Detailed vulnerability scan reports, penetration test results, and DataKitchen's security policies are available upon request. Contact DataKitchen Support for more information.

Open-source transparency

TestGen's core engine is open-source under the Apache 2.0 license. The full source code is available on GitHub, allowing your security team to audit the codebase directly.