Monitor Tables¶
TestGen Monitors provide proactive, automated anomaly detection for your data tables. While tests verify specific data content rules (such as column formats, value ranges, and referential integrity), monitors focus on table-level patterns over time — tracking whether tables are being updated on schedule, whether row counts are changing unexpectedly, whether the schema has shifted, and whether custom metrics remain within expected bounds.
Monitors are designed to catch the kinds of data issues that emerge gradually or suddenly at the operational level: a pipeline that stops running, a table that doubles in size overnight, a column that gets dropped without notice. By learning from historical patterns, monitors can flag these anomalies automatically, often before downstream consumers notice something is wrong.
Monitors vs. tests¶
Monitors and tests serve complementary purposes in your data quality strategy:
| | Monitors | Tests |
|---|---|---|
| Focus | Table-level operational patterns (freshness, volume, schema, custom metrics) | Data content rules (formats, ranges, uniqueness, referential integrity) |
| Detection | Learns expected patterns over time; flags deviations automatically | Checks data against defined thresholds and rules |
| Setup | Mostly auto-generated; minimal configuration required | Generated from profiling results; can be customized per column |
| Frequency | Run at roughly twice the expected data change frequency | Run periodically (e.g., daily or weekly) for thorough data validation, or as part of your production orchestration |
| Scoring | Not included in data quality scores | Contribute to data quality scores |
Monitor types¶
TestGen provides four types of monitors. Three (Freshness, Volume, and Schema) are auto-generated when you configure monitors for a table group; the fourth (Metric) is created manually for custom tracking needs.
Freshness¶
Freshness monitors track when tables were last updated. They work by computing a fingerprint (hash) of key columns and comparing it to the previous fingerprint. When the fingerprint changes, the monitor records how many minutes have elapsed since the last detected change.
After collecting enough update history, the system automatically learns your table's update schedule — whether daily, weekly, or multiple times per day — and adjusts its expectations accordingly. If you configure excluded days (such as weekends or holidays), those are factored in as well.
Once thresholds have been established, Freshness monitors can detect three types of anomalies:
- Late — The table has not been updated, and the elapsed time exceeds the expected staleness threshold.
- Earlier than expected — The table was updated, but sooner than the learned pattern predicts.
- Later than expected — The table was updated, but later than the learned pattern predicts.
Freshness monitors are auto-generated after profiling runs. The profiling process identifies appropriate columns to use for fingerprint comparison.
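The fingerprint comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not TestGen's implementation: the `fingerprint` and `check_freshness` functions, the state dictionary, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timedelta


def fingerprint(rows):
    """Hash the values of the chosen key columns into a single digest."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def check_freshness(previous, rows, now):
    """Compare the current fingerprint with the previously stored one.

    `previous` is a hypothetical state dict with the keys "fingerprint"
    and "changed_at". Returns (updated, minutes_since_last_change, state).
    """
    current = fingerprint(rows)
    minutes = (now - previous["changed_at"]).total_seconds() / 60
    if current != previous["fingerprint"]:
        # The table changed: remember the new fingerprint and timestamp.
        return True, minutes, {"fingerprint": current, "changed_at": now}
    # No change detected: keep the old state so elapsed time keeps growing.
    return False, minutes, previous


t0 = datetime(2024, 1, 1, 6, 0)
state = {"fingerprint": fingerprint([(1, "a")]), "changed_at": t0}

# One day later a new row has arrived, so the fingerprint differs.
updated, minutes, state = check_freshness(
    state, [(1, "a"), (2, "b")], t0 + timedelta(hours=24)
)
```

In this sketch the elapsed minutes are measured from the last detected change, so an unchanged table accumulates staleness across consecutive runs.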
Volume¶
Volume monitors track changes in row count over time. Each time the monitor runs, it records the current row count and compares it against the expected range.
Volume anomalies indicate that the number of rows in a table has changed more than expected. For example, a table that normally grows by a few thousand rows per day might suddenly lose half its records, or a table that should be stable might double in size.
Volume monitors are auto-generated for every table in the table group during initial setup and whenever new tables are detected.
Schema¶
Schema monitors detect structural changes to your tables. They scan for:
- Table additions — New tables are added to the schema.
- Table drops — Existing tables are removed.
- Column additions — New columns are added to a table.
- Column drops — Existing columns are removed from a table.
- Column modifications — Column data types or properties change.
Unlike other monitor types, Schema does not use a prediction model. Any detected schema change is always reported as an anomaly, since structural changes to your data should always be reviewed.
Schema monitors are auto-generated during initial monitor setup for the table group.
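As a rough illustration of the five change types listed above, the following sketch compares two schema snapshots. The snapshot shape (a dict of table name to column name to data type) and the `diff_schema` function are hypothetical and chosen only for this example; how TestGen stores and compares schema metadata is not described here.

```python
def diff_schema(before, after):
    """Return a list of (change_type, table, column) tuples.

    `before` and `after` map table name -> {column name -> data type}.
    """
    changes = []
    for table in sorted(after.keys() - before.keys()):
        changes.append(("table_added", table, None))
    for table in sorted(before.keys() - after.keys()):
        changes.append(("table_dropped", table, None))
    for table in sorted(before.keys() & after.keys()):
        old, new = before[table], after[table]
        for column in sorted(new.keys() - old.keys()):
            changes.append(("column_added", table, column))
        for column in sorted(old.keys() - new.keys()):
            changes.append(("column_dropped", table, column))
        for column in sorted(old.keys() & new.keys()):
            if old[column] != new[column]:
                changes.append(("column_modified", table, column))
    return changes


before = {
    "orders": {"id": "int", "total": "numeric", "note": "text"},
    "legacy": {"id": "int"},
}
after = {
    "orders": {"id": "bigint", "total": "numeric", "status": "text"},
    "customers": {"id": "int"},
}

changes = diff_schema(before, after)
# Every entry in `changes` would be reported; there is no tolerance band.
```

Because no prediction model is involved, any non-empty result counts as an anomaly in this sketch.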
Metric¶
Metric monitors track user-defined numeric metrics over time. You provide a SQL expression that returns a single numeric value, and the monitor tracks that value across runs, using the prediction model to detect anomalies.
Common uses for Metric monitors include:
- Tracking the count of null values in a critical column
- Monitoring the average of a financial metric
- Watching the ratio of records that meet a specific condition
- Tracking any business-specific numeric indicator derived from your data
Metric monitors are not auto-generated. You create them manually through the Edit Table Monitors dialog for individual tables. You can define multiple Metric monitors per table. See Adding Metric monitors for steps.
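To make the idea of "a SQL expression that returns a single numeric value" concrete, the sketch below evaluates three such expressions against an in-memory SQLite table. The `orders` table, the metric names, and the way the expression is wrapped in a `SELECT` are assumptions for illustration; the exact query TestGen builds around your expression may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_email TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "a@example.com", 20.0),
        (2, None, 35.0),
        (3, None, 5.0),
        (4, "d@example.com", 40.0),
    ],
)

# Hypothetical metric expressions, each returning one numeric value.
METRICS = {
    "null_email_count": "SUM(CASE WHEN customer_email IS NULL THEN 1 ELSE 0 END)",
    "avg_amount": "AVG(amount)",
    "large_order_ratio": "AVG(CASE WHEN amount >= 30 THEN 1.0 ELSE 0.0 END)",
}


def evaluate_metric(connection, table, expression):
    """Run the expression as an aggregate over the table and return a float.

    `table` and `expression` are treated as trusted configuration here.
    """
    row = connection.execute(f"SELECT {expression} FROM {table}").fetchone()
    return float(row[0])


results = {
    name: evaluate_metric(conn, "orders", expression)
    for name, expression in METRICS.items()
}
```

Each value in `results` corresponds to one data point that a Metric monitor would record for a single run.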
How the prediction system works¶
Volume and Metric monitors use a machine learning time-series prediction model (SARIMAX) to learn expected patterns from your data and automatically determine which values are normal. Freshness monitors use a separate gap-based threshold system described below. Both approaches eliminate the need for you to manually set thresholds for most monitors.
Freshness thresholds¶
Rather than a time-series prediction model, Freshness monitors use gap-based thresholds derived from the percentile distribution of intervals between consecutive table updates. As enough history accumulates, the system also infers the table's update schedule — detecting daily, weekly, or sub-daily patterns — and derives tighter, schedule-aware thresholds. Weekends, holidays, and other inactive days are automatically excluded from threshold calculations when configured.
Training mode¶
The prediction model needs approximately 30 monitor runs before it can begin making predictions. During this training period:
- The monitor results display a training indicator on the dashboard.
- No anomalies are flagged — the model is still learning normal patterns.
Once enough training data has been collected, the model starts generating predictions. After each monitor run, the model is updated with the latest data and forecasts expected values for upcoming runs.
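The training gate and the update-then-forecast cycle can be sketched as follows. This is not the SARIMAX model: the forecast here is a deliberately naive stand-in (mean and standard deviation of the history), and the `MonitorHistory` class and the hard cutoff of exactly 30 runs are assumptions made to keep the example self-contained.

```python
from statistics import mean, stdev

# The text says "approximately 30"; an exact cutoff is assumed here.
MIN_TRAINING_RUNS = 30


class MonitorHistory:
    """Hypothetical per-monitor history of observed values."""

    def __init__(self):
        self.values = []

    def record_run(self, value):
        """Store the latest observation.

        Returns (status, forecast), where forecast is None during training
        and a (predicted, standard_deviation) pair afterwards.
        """
        self.values.append(value)
        if len(self.values) < MIN_TRAINING_RUNS:
            return "training", None
        # Stand-in forecast, NOT SARIMAX: summary statistics of the history.
        return "predicting", (mean(self.values), stdev(self.values))


history = MonitorHistory()
statuses = [history.record_run(1000 + (i % 5))[0] for i in range(29)]
status, forecast = history.record_run(1002)
```

The first 29 runs only accumulate data; from the 30th run onward each call refreshes the forecast used for the next run.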
Sensitivity¶
The sensitivity setting controls how easily monitors flag anomalies. It applies across all monitors in a table group, but the underlying mechanism differs by monitor type. You can adjust sensitivity at any time from the Monitor Settings dialog.
For Volume and Metric monitors, sensitivity controls the width of the tolerance band (in standard deviations) around the predicted value:
| Sensitivity | Band width | Behavior |
|---|---|---|
| Low | Widest (±3.0 SD) | Fewest alerts. Only the most extreme deviations trigger anomalies. |
| Medium (default) | Moderate (±2.5 SD) | Balanced detection. Good starting point for most use cases. |
| High | Narrowest (±2.0 SD) | Most alerts. Even small deviations trigger anomalies. |
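The band widths in the table translate directly into a simple check. The sketch below assumes the model supplies a predicted value and a standard deviation for the run; the function names and that interface are hypothetical.

```python
# Multipliers taken from the sensitivity table above.
SD_MULTIPLIER = {"low": 3.0, "medium": 2.5, "high": 2.0}


def tolerance_band(predicted, sd, sensitivity="medium"):
    """Return the (lower, upper) bounds around the predicted value."""
    k = SD_MULTIPLIER[sensitivity]
    return predicted - k * sd, predicted + k * sd


def is_anomaly(observed, predicted, sd, sensitivity="medium"):
    """Flag the observation if it falls outside the tolerance band."""
    lower, upper = tolerance_band(predicted, sd, sensitivity)
    return observed < lower or observed > upper


# Predicted row count 10,000 with a standard deviation of 200.
flags = {
    level: is_anomaly(10_550, predicted=10_000, sd=200, sensitivity=level)
    for level in ("low", "medium", "high")
}
```

With these numbers the observed count of 10,550 lies inside the Low band (9,400 to 10,600) but outside the Medium band (9,500 to 10,500) and the High band (9,600 to 10,400), so only Medium and High sensitivity would flag it.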
For Freshness monitors, sensitivity controls which percentile of historical update intervals is used as the threshold:
| Sensitivity | Interval tolerance | Behavior |
|---|---|---|
| Low | Widest | Fewest alerts. Only intervals that exceed the 99th percentile of historical gaps trigger anomalies. |
| Medium (default) | Moderate | Balanced detection. Intervals beyond the 95th percentile trigger anomalies. |
| High | Tightest | Most alerts. Intervals beyond the 80th percentile trigger anomalies. |
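A minimal sketch of the percentile-based threshold, using only the percentiles from the table above. It ignores schedule inference and excluded days, and the function names, the sample gaps, and the interpolation method are assumptions for illustration.

```python
from statistics import quantiles

# Percentiles taken from the sensitivity table above.
PERCENTILE = {"low": 99, "medium": 95, "high": 80}


def gap_threshold(gaps_minutes, sensitivity="medium"):
    """Return the chosen percentile of historical update gaps, in minutes."""
    # quantiles(n=100) yields 99 cut points; index p - 1 is the p-th percentile.
    cuts = quantiles(gaps_minutes, n=100, method="inclusive")
    return cuts[PERCENTILE[sensitivity] - 1]


def is_late(minutes_since_update, gaps_minutes, sensitivity="medium"):
    """Flag the table as late if the current gap exceeds the threshold."""
    return minutes_since_update > gap_threshold(gaps_minutes, sensitivity)


# Hypothetical history: a roughly daily table with a few slow days.
gaps = [1380, 1400, 1420, 1440, 1440, 1450, 1460, 1500, 1560, 1800]

late_at = {level: is_late(1600, gaps, level) for level in ("low", "medium", "high")}
```

For this sample history, a gap of 1,600 minutes exceeds the 80th percentile but not the 95th or 99th, so it would be flagged only at High sensitivity.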
Alternative: Static and historical thresholds¶
If the automatic threshold system is not appropriate for a particular monitor (for example, if your data does not follow a time-based pattern), you can override it on a per-monitor basis. Volume and Metric monitors support three threshold modes:
- Prediction Model (default) — Uses the SARIMAX time series model to automatically determine expected bounds.
- Historical Calculation — Calculates bounds from recent historical values using an aggregate function (Value, Minimum, Maximum, Sum, Average, or a custom Expression).
- Static Thresholds — You manually specify fixed upper and lower bounds.
Freshness monitors use gap-based thresholds by default. You can override them with Static Thresholds if needed. Historical Calculation is not available for Freshness monitors.
These threshold modes are configured per monitor through the Edit Table Monitors dialog.
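The two override modes can be contrasted with a small sketch. The details are assumptions: the text does not specify how Historical Calculation turns an aggregate into bounds, so this example simply uses one aggregate for the lower bound and another for the upper bound over a hypothetical lookback window. The `Value` and custom `Expression` options are left out.

```python
from statistics import mean

# Subset of the aggregate functions named above.
AGGREGATES = {"Minimum": min, "Maximum": max, "Sum": sum, "Average": mean}


def static_bounds(lower, upper):
    """Static Thresholds: fixed bounds supplied by the user."""
    return lower, upper


def historical_bounds(history, lookback, lower_fn="Minimum", upper_fn="Maximum"):
    """Historical Calculation (assumed form): aggregate the last `lookback` values."""
    recent = history[-lookback:]
    return AGGREGATES[lower_fn](recent), AGGREGATES[upper_fn](recent)


def out_of_bounds(value, bounds):
    """Return True if the value falls outside the (lower, upper) bounds."""
    lower, upper = bounds
    return value < lower or value > upper


history = [100, 120, 90, 110, 105]

hist = historical_bounds(history, lookback=3)   # based on [90, 110, 105]
fixed = static_bounds(50, 200)

flag_hist = out_of_bounds(130, hist)
flag_fixed = out_of_bounds(130, fixed)
```

Here the same observation is out of bounds under the historical bounds but acceptable under the wider static bounds, which shows why the mode is chosen per monitor.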
Getting started¶
To begin using monitors, you need a table group. Volume and Schema monitors can be set up immediately; Freshness monitors require at least one completed profiling run. See Configure Monitors for setup steps, and View Monitor Results for interpreting the dashboard and investigating anomalies.