Value and Pattern Consistency Issues¶
Values are inconsistent within a column, across tables, or with expected value sets.
For an overview of all hygiene issue categories, see Data Hygiene Issues.
Pattern inconsistency within column¶
Alphanumeric string data within this column conforms to 2 to 4 different patterns, with at least 95% of values matching the most common pattern.
| Likelihood | Likely |
| Quality dimension | Validity |
| How it's detected | Flagged when a text column containing alphanumeric patterns has 2-4 distinct patterns, with the dominant pattern covering at least 95% of values. |
This could indicate data errors in the remaining values that don't conform to the most common pattern.
Suggested action: Review the values for any data that doesn't conform to the most common pattern and correct any data errors.
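The detection rule above can be illustrated with a minimal sketch. The pattern notation (`A` for uppercase letters, `a` for lowercase letters, `N` for digits) and the function names `to_pattern` and `has_pattern_inconsistency` are assumptions made for illustration, not the tool's actual implementation.

```python
from collections import Counter

def to_pattern(value: str) -> str:
    """Generalize a string: uppercase -> 'A', lowercase -> 'a', digit -> 'N'.

    Any other character (punctuation, spaces) is kept as-is.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("N")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)

def has_pattern_inconsistency(values, min_share=0.95):
    """True when 2-4 distinct patterns exist and the dominant one covers >= min_share."""
    patterns = Counter(to_pattern(v) for v in values if v)
    if not 2 <= len(patterns) <= 4:
        return False
    top_count = patterns.most_common(1)[0][1]
    return top_count / sum(patterns.values()) >= min_share
```

For example, a column of 97 values shaped like `AB-123` plus three stragglers such as `ab-123`, `AB123`, and `AB-12` yields four patterns with a 97% dominant share, so this sketch would flag it.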
Pattern inconsistency across tables¶
Alphanumeric string data within this column matches a single pattern, but columns with the same name in other tables have data that matches a different single pattern.
| Likelihood | Likely |
| Quality dimension | Validity |
| How it's detected | Flagged when a column has a consistent single pattern but columns with the same name in other tables in the table group have a different consistent pattern. |
Inconsistent formatting may contradict user assumptions and cause downstream errors, extra steps, and inconsistent business logic.
Suggested action: Review the profiled patterns for the same column in other tables. You may want to add a step in your processing to make patterns consistent.
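As a rough sketch of the cross-table comparison, assume each profiled column has already been reduced to the set of patterns observed in it. The input shape, the pattern strings such as `NNNNN`, and the function name `cross_table_pattern_conflicts` are hypothetical.

```python
from collections import defaultdict

def cross_table_pattern_conflicts(column_patterns):
    """Find column names whose single consistent pattern differs between tables.

    column_patterns maps (table, column) -> set of patterns observed in that column.
    Returns {column: {table: pattern}} for each conflicting column name.
    """
    by_name = defaultdict(dict)
    for (table, column), patterns in column_patterns.items():
        if len(patterns) == 1:  # only consider columns with one consistent pattern
            by_name[column][table] = next(iter(patterns))
    return {
        column: per_table
        for column, per_table in by_name.items()
        if len(set(per_table.values())) > 1
    }
```

Given a `zip` column that is `NNNNN` in one table and `NNNNN-NNNN` in another, the result lists both tables under `zip`, which shows where a normalization step might be added.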
Multiple data types per column name (major)¶
Columns with the same name have broadly different types across tables.
| Likelihood | Likely |
| Quality dimension | Consistency |
| How it's detected | Flagged when columns with the same name in different tables have different general types (e.g., one is text and the other is numeric). |
Differences could be significant enough to cause errors in downstream analysis, extra steps resulting in divergent business logic, and inconsistencies in results.
Suggested action: Ideally, you should change the column data types to be fully consistent. If the data is meant to be different, you should change column names so downstream users aren't led astray.
Multiple data types per column name (minor)¶
Columns with the same name have the same general type across tables, but the types do not exactly match.
| Likelihood | Possible |
| Quality dimension | Consistency |
| How it's detected | Flagged when columns with the same name in different tables share the same general type but differ in specific type (e.g., VARCHAR(50) vs VARCHAR(100)). |
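Truncation may result if data from these columns is combined on the assumption that the formats are identical, for example when values from a VARCHAR(100) column are loaded into a VARCHAR(50) column.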
Suggested action: Consider changing the column data types to be fully consistent. This will tighten your standards at ingestion and ensure that data is consistent between tables.
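The difference between the major and minor forms of this issue can be sketched as a two-level comparison: first on the general type, then on the specific type. The `GENERAL_TYPES` mapping and both function names are assumptions for illustration; a real profiler would use its own type catalog.

```python
import re

# Hypothetical mapping from SQL base type names to general type classes.
GENERAL_TYPES = {
    "varchar": "text", "char": "text", "text": "text",
    "int": "numeric", "integer": "numeric", "bigint": "numeric",
    "decimal": "numeric", "numeric": "numeric", "float": "numeric",
    "date": "datetime", "timestamp": "datetime",
    "boolean": "boolean",
}

def general_type(sql_type: str) -> str:
    """Map a specific type such as 'VARCHAR(50)' to a general class such as 'text'."""
    base = re.match(r"[a-z]+", sql_type.strip().lower())
    return GENERAL_TYPES.get(base.group(0), "other") if base else "other"

def classify_type_mismatch(types_by_table):
    """Return 'major', 'minor', or None for one column name.

    types_by_table maps table name -> declared type of the same-named column.
    """
    specific = {t.strip().lower() for t in types_by_table.values()}
    if len(specific) <= 1:
        return None  # types match exactly
    general = {general_type(t) for t in specific}
    return "major" if len(general) > 1 else "minor"
```

Under these assumptions, `VARCHAR(50)` versus `VARCHAR(100)` is classified as minor, while `VARCHAR(50)` versus `INTEGER` is classified as major.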
Unexpected boolean values¶
This column appears to contain boolean (True/False) data, but unexpected values were found.
| Likelihood | Likely |
| Quality dimension | Validity |
| How it's detected | Flagged when the most frequent values include one boolean value (e.g., "true") but the expected counterpart (e.g., "false") is absent. |
This could indicate inconsistent coding for the same intended values, potentially leading to downstream errors or inconsistent business logic.
Suggested action: Review your source data and follow up with data owners to determine whether this data needs to be corrected.
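A minimal sketch of the counterpart check follows. The list of boolean pairs, the `top_n` cutoff for "most frequent values", and the choice to test absence against the whole column are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical pairs of boolean codings; each value expects its counterpart.
BOOLEAN_PAIRS = [("true", "false"), ("yes", "no"), ("y", "n"), ("t", "f"), ("1", "0")]

def missing_boolean_counterparts(values, top_n=10):
    """Return (present, missing) pairs where a frequent boolean value lacks its counterpart."""
    counts = Counter(str(v).strip().lower() for v in values if v is not None)
    frequent = {value for value, _ in counts.most_common(top_n)}
    missing = []
    for a, b in BOOLEAN_PAIRS:
        if a in frequent and b not in counts:
            missing.append((a, b))
        elif b in frequent and a not in counts:
            missing.append((b, a))
    return missing
```

A column holding mostly `true` and `no` would be reported twice by this sketch, once for the missing `false` and once for the missing `yes`, which points to two coding schemes mixed in one column.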
Variant codings for same values¶
This column contains more than one common variant that represents a single value or state.
| Likelihood | Definite |
| Quality dimension | Consistency |
| How it's detected | Flagged when a column with 20 or fewer distinct values contains recognized variant codings for the same concept (e.g., "USA", "US", "United States"). |
This can occur when data is integrated from multiple sources with different standards, or when free-text entry is permitted without validation. The variations can cause confusion and errors for downstream data users and lead to multiple versions of the truth.
Suggested action: Review your source data and ingestion process. Consider cleansing this data to standardize on a single set of definitive codes.
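One way to picture the check is a lookup from known variants to a canonical code, applied only to low-cardinality columns. The `VARIANTS` table and the function name `find_variant_codings` are hypothetical; the set of recognized variants in a real tool will differ.

```python
# Hypothetical lookup of known variants (lowercased) -> canonical code.
VARIANTS = {
    "usa": "US", "us": "US", "united states": "US",
    "uk": "GB", "gb": "GB", "united kingdom": "GB",
}

def find_variant_codings(values, max_distinct=20):
    """Return {canonical: variants found} where more than one variant appears.

    Columns with more than max_distinct distinct values are skipped.
    """
    distinct = {v.strip() for v in values if v and v.strip()}
    if len(distinct) > max_distinct:
        return {}
    groups = {}
    for v in distinct:
        canonical = VARIANTS.get(v.lower())
        if canonical:
            groups.setdefault(canonical, set()).add(v)
    return {code: found for code, found in groups.items() if len(found) > 1}
```

The same lookup could serve the cleansing step, by replacing each recognized variant with its canonical code at ingestion.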
Small percentage of divergent values¶
Under 3% of values in this column were found to be different from the most common value.
| Likelihood | Possible |
| Quality dimension | Validity |
| How it's detected | Flagged when the most frequent value accounts for over 97% but less than 100% of values. |
This could indicate data errors in the minority values. Because the column is nearly constant, the few divergent values may be incorrect entries.
Suggested action: Review your source data and follow up with data owners to determine whether this data needs to be corrected.
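The threshold in the detection rule can be expressed as a short check on the share of the most frequent value. The function names are illustrative, and the sketch assumes the column values are available as an in-memory list.

```python
from collections import Counter

def dominant_value_share(values):
    """Return the most frequent value and the fraction of rows that hold it."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)

def has_small_divergence(values, threshold=0.97):
    """True when the dominant value covers more than threshold but not all values."""
    if not values:
        return False
    _, share = dominant_value_share(values)
    return threshold < share < 1.0
```

A fully constant column is not flagged, since its share is exactly 100%; only the nearly constant case falls inside the range.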
Unexpected numeric values¶
Under 3% of values in this otherwise text column were found to be numeric.
| Likelihood | Likely |
| Quality dimension | Validity |
| How it's detected | Flagged when a text column has a small fraction (under 3%) of numeric values. |
A small number of numeric values in a text column could indicate data entry errors or mixed content.
Suggested action: Review your source data and follow up with data owners to determine whether numeric values are invalid entries here.
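A minimal sketch of this check, assuming "numeric" means a string that Python's `float()` can parse to a finite number. That definition, the handling of blank values, and the function names are assumptions; a profiler may define numeric values differently.

```python
import math

def is_numeric(text: str) -> bool:
    """True if the string parses as a finite number (assumed definition of numeric)."""
    try:
        return math.isfinite(float(text))
    except ValueError:
        return False

def numeric_fraction(values):
    """Fraction of non-blank values that are numeric."""
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return 0.0
    return sum(is_numeric(v) for v in non_empty) / len(non_empty)

def has_unexpected_numerics(values, max_share=0.03):
    """True when some, but fewer than max_share, of the values are numeric."""
    share = numeric_fraction(values)
    return 0 < share < max_share
```

Listing the values for which `is_numeric` returns true gives a concrete set of entries to take back to the data owners.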