Profiling Statistics
This page describes the profiling statistics that TestGen collects and displays for each column in the Data Catalog. The statistics shown depend on the column's general data type: Alpha (text), Numeric, Datetime, or Boolean.
All column types
Every column shows Row Count and Value Count. If profiling could not determine exact row counts, an approximate count based on server statistics is shown instead.
Alpha (text) columns
Text columns include the richest set of profiling statistics:
- Missing Values — Breakdown of actual values, nulls, zero-length strings, and dummy values.
- Duplicate Values — Distinct vs. duplicate value counts. If standardized values differ from raw values, a separate bar shows standardized duplicates.
- Case Distribution — Distribution across mixed case, lower case, upper case, and non-alpha values.
- Frequent Values — Most common values and their counts.
- Frequent Patterns — Most common value patterns and their counts.
- Percentage indicators — Digits, Numeric Values, Zero Values, Date Values, Quoted Values, Leading Spaces, Embedded Spaces.
- Length statistics — Minimum, maximum, and average length.
- Text range — Minimum and maximum text values.
- Standard Pattern Match — Whether values match a recognized pattern (e.g., Email, Phone, Street Address, SSN, Credit Card).
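To make a few of the text statistics above concrete, here is a minimal Python sketch of how a missing-value breakdown, case distribution, frequent patterns, and length statistics could be derived from a list of strings. It is an illustration only, not TestGen's implementation. The names `DUMMY_VALUES`, `to_pattern`, `case_of`, `is_dummy`, and `profile_text` are hypothetical, as are the dummy-value list and the pattern encoding (digits become `N`, uppercase letters `A`, other letters `a`).

```python
from collections import Counter

# Hypothetical placeholder ("dummy") strings; the list TestGen uses may differ.
DUMMY_VALUES = {"n/a", "na", "none", "null", "unknown", "-", "?"}


def to_pattern(value):
    """Replace digits with 'N', uppercase letters with 'A', other letters with 'a'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("N")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # punctuation and spaces are kept unchanged
    return "".join(out)


def case_of(value):
    """Classify a string as non_alpha, upper, lower, or mixed case."""
    if not any(ch.isalpha() for ch in value):
        return "non_alpha"
    if value.isupper():
        return "upper"
    if value.islower():
        return "lower"
    return "mixed"


def is_dummy(value):
    """True when the trimmed, lower-cased value is in the assumed dummy list."""
    return value.strip().lower() in DUMMY_VALUES


def profile_text(values):
    """Summarize a list of strings, where None stands for a database NULL."""
    present = [v for v in values if v is not None]
    actual = [v for v in present if v != "" and not is_dummy(v)]
    lengths = [len(v) for v in actual]
    return {
        "null_count": len(values) - len(present),
        "zero_length_count": sum(v == "" for v in present),
        "dummy_count": sum(is_dummy(v) for v in present),
        "actual_count": len(actual),
        "distinct_count": len(set(actual)),
        "case_distribution": Counter(case_of(v) for v in actual),
        "frequent_values": Counter(actual).most_common(3),
        "frequent_patterns": Counter(to_pattern(v) for v in actual).most_common(3),
        "min_length": min(lengths, default=None),
        "max_length": max(lengths, default=None),
        "avg_length": sum(lengths) / len(lengths) if lengths else None,
        "min_text": min(actual, default=None),
        "max_text": max(actual, default=None),
    }


if __name__ == "__main__":
    sample = ["Alice", "BOB", "bob", "N/A", "", None, "555-1234", "Alice"]
    for name, value in profile_text(sample).items():
        print(f"{name}: {value}")
```

In this sketch, nulls, zero-length strings, and dummy values are counted separately, and only the remaining "actual" values feed the case, pattern, length, and text-range figures. Whether TestGen excludes dummy values from those figures is an assumption here.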
Numeric columns
- Numeric Distribution — Non-zero values, zero values, and nulls.
- Statistical attributes — Distinct values, average, standard deviation, minimum, minimum > 0, maximum, 25th/50th/75th percentiles.
- Box plot — Visual representation of the value distribution showing quartiles, median, and average.
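The numeric statistics listed above can be approximated with Python's standard `statistics` module. The sketch below is illustrative and is not how TestGen computes them; `profile_numeric` is a hypothetical name. It reads "minimum > 0" as the smallest value strictly greater than zero, and it uses the "inclusive" quantile method, which may differ from the percentile method of the database being profiled.

```python
import statistics


def profile_numeric(values):
    """Summarize a list of numbers, where None stands for a database NULL.

    Needs at least two non-null values, because the sample standard
    deviation and the quartiles are undefined for fewer.
    """
    present = [v for v in values if v is not None]
    zero_count = sum(v == 0 for v in present)
    positives = [v for v in present if v > 0]
    # Quartile cut points; the middle one is the median (50th percentile).
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "null_count": len(values) - len(present),
        "zero_count": zero_count,
        "non_zero_count": len(present) - zero_count,
        "distinct_count": len(set(present)),
        "average": statistics.fmean(present),
        "stdev": statistics.stdev(present),
        "minimum": min(present),
        # Assumed meaning of "minimum > 0": smallest strictly positive value.
        "minimum_positive": min(positives, default=None),
        "maximum": max(present),
        "percentile_25": q1,
        "percentile_50": median,
        "percentile_75": q3,
    }


if __name__ == "__main__":
    for name, value in profile_numeric([0, 0, 2, 4, 6, 8, None]).items():
        print(f"{name}: {value}")
```

The three percentiles together with the minimum, maximum, and average are the values a box plot of the column would draw on.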
Datetime columns
- Values vs. Null — Proportion of non-null vs. null values.
- Date range indicators — Before 1 Year, Before 5 Years, Before 20 Years, Within 1 Year, Within 1 Month, and Future Dates.
- Date range — Minimum and maximum dates, and distinct value count.
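As a rough illustration of the date range indicators above, the following Python sketch buckets dates relative to a reference date. It rests on several assumptions that the source does not confirm: that the buckets are measured from a reference date such as the profiling run date (`as_of`), that the "Before" buckets are cumulative rather than mutually exclusive, and that a year and a month can be approximated as 365 and 30 days. `profile_dates` is a hypothetical name, not a TestGen function.

```python
from datetime import date, timedelta


def profile_dates(values, as_of):
    """Summarize a list of dates, where None stands for a database NULL."""
    present = [v for v in values if v is not None]

    def days_ago(days):
        # Cutoff date that lies `days` days before the reference date.
        return as_of - timedelta(days=days)

    return {
        "null_count": len(values) - len(present),
        "value_count": len(present),
        # Cumulative buckets: a date from 25 years ago counts in all three.
        "before_1_year": sum(v < days_ago(365) for v in present),
        "before_5_years": sum(v < days_ago(5 * 365) for v in present),
        "before_20_years": sum(v < days_ago(20 * 365) for v in present),
        "within_1_year": sum(days_ago(365) <= v <= as_of for v in present),
        "within_1_month": sum(days_ago(30) <= v <= as_of for v in present),
        "future_dates": sum(v > as_of for v in present),
        "min_date": min(present, default=None),
        "max_date": max(present, default=None),
        "distinct_count": len(set(present)),
    }


if __name__ == "__main__":
    sample = [
        date(2000, 1, 1),
        date(2018, 3, 15),
        date(2023, 12, 1),
        date(2024, 6, 20),
        date(2025, 1, 1),
        None,
        date(2024, 6, 20),
    ]
    for name, value in profile_dates(sample, as_of=date(2024, 6, 30)).items():
        print(f"{name}: {value}")
```

Passing the reference date explicitly, rather than calling `date.today()` inside the function, keeps the result reproducible for a given profiling run.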
Boolean columns
- Boolean Distribution — True, false, and null value counts.