Value and Pattern Consistency Issues

Values are inconsistent within a column, across tables, or with expected value sets.

For an overview of all hygiene issue categories, see Data Hygiene Issues.

Pattern inconsistency within column

Alphanumeric string data within this column conforms to 2 to 4 different patterns, with at least 95% of values matching the most common pattern.

Likelihood: Likely
Quality dimension: Validity
How it's detected: Flagged when a text column containing alphanumeric patterns has 2 to 4 distinct patterns, with the dominant pattern covering at least 95% of values.

This could indicate data errors in the remaining values that don't conform to the most common pattern.

Suggested action: Review the values for any data that doesn't conform to the most common pattern and correct any data errors.
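
The detection rule above can be sketched in a few lines. This is only an illustration, not the product's implementation: the shape encoding (uppercase to "A", lowercase to "a", digits to "N") and the names `to_pattern` and `pattern_inconsistency` are assumptions made for the example.

```python
from collections import Counter

def to_pattern(value: str) -> str:
    """Collapse a string to its shape (assumed encoding for illustration)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("N")                       # any digit
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")  # letter, case kept
        else:
            out.append(ch)                        # punctuation kept as-is
    return "".join(out)

def pattern_inconsistency(values, min_share=0.95):
    """Return (flagged, dominant_pattern, nonconforming_values)."""
    patterns = Counter(to_pattern(v) for v in values)
    if not patterns:
        return False, None, []
    dominant, count = patterns.most_common(1)[0]
    # Flag only when there are 2 to 4 patterns and one covers >= 95%.
    flagged = 2 <= len(patterns) <= 4 and count / len(values) >= min_share
    outliers = [v for v in values if to_pattern(v) != dominant]
    return flagged, dominant, outliers
```

The returned list of nonconforming values is the set to review against the dominant pattern.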


Pattern inconsistency across tables

Alphanumeric string data within this column matches a single pattern, but columns with the same name in other tables match a different single pattern.

Likelihood: Likely
Quality dimension: Validity
How it's detected: Flagged when a column has a consistent single pattern but columns with the same name in other tables in the table group have a different consistent pattern.

Inconsistent formatting may contradict user assumptions and cause downstream errors, extra steps, and inconsistent business logic.

Suggested action: Review the profiled patterns for the same column in other tables. You may want to add a step in your processing to make patterns consistent.
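
As a rough sketch of this comparison, the snippet below derives one shape per table for a shared column name and flags the column when those shapes differ. The `shape` encoding, the function names, and the input layout (a dict of table name to sample values) are assumptions for illustration only.

```python
import re

def shape(value: str) -> str:
    """Assumed shape encoding: uppercase -> A, lowercase -> a, digit -> N."""
    value = re.sub(r"[A-Z]", "A", value)
    value = re.sub(r"[a-z]", "a", value)
    return re.sub(r"[0-9]", "N", value)

def single_pattern(values):
    """Return the one shape shared by all values, or None if shapes vary."""
    shapes = {shape(v) for v in values}
    return shapes.pop() if len(shapes) == 1 else None

def cross_table_patterns(column_by_table):
    """Return (flagged, {table: pattern}) for tables with one consistent pattern."""
    consistent = {}
    for table, values in column_by_table.items():
        pattern = single_pattern(values)
        if pattern is not None:
            consistent[table] = pattern
    # Flag when consistent tables disagree with each other.
    flagged = len(set(consistent.values())) > 1
    return flagged, consistent
```

Tables whose column has more than one pattern are left out of the comparison, since this issue concerns columns that are each internally consistent.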


Multiple data types per column name (major)

Columns with the same name have broadly different types across tables.

Likelihood: Likely
Quality dimension: Consistency
How it's detected: Flagged when columns with the same name in different tables have different general types (e.g., one is text and the other is numeric).

Differences could be significant enough to cause errors in downstream analysis, extra steps resulting in divergent business logic, and inconsistencies in results.

Suggested action: Ideally, you should change the column data types to be fully consistent. If the data is meant to be different, you should change column names so downstream users aren't led astray.


Multiple data types per column name (minor)

Columns with the same name have the same general type across tables, but the types do not exactly match.

Likelihood: Possible
Quality dimension: Consistency
How it's detected: Flagged when columns with the same name in different tables share the same general type but differ in specific type (e.g., VARCHAR(50) vs VARCHAR(100)).

Values may be truncated if data from these columns is combined on the assumption that the formats match.

Suggested action: Consider changing the column data types to be fully consistent. This will tighten your standards at ingestion and ensure that data is consistent between tables.
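
The major and minor data type checks differ only in the level at which types are compared, so one sketch can cover both. The `GENERAL_TYPES` grouping and the function names below are hypothetical, chosen for illustration; the actual grouping of specific types into general types may differ.

```python
import re

# Assumed, illustrative mapping from base SQL type to a general type.
GENERAL_TYPES = {
    "varchar": "text", "char": "text", "text": "text",
    "int": "numeric", "integer": "numeric", "bigint": "numeric",
    "smallint": "numeric", "numeric": "numeric", "decimal": "numeric",
    "float": "numeric",
    "date": "datetime", "timestamp": "datetime",
    "boolean": "boolean",
}

def general_type(sql_type: str) -> str:
    """Strip length/precision and look up the general type."""
    base = re.match(r"[a-z]+", sql_type.strip().lower())
    return GENERAL_TYPES.get(base.group(0), "other") if base else "other"

def type_mismatch(types_by_table):
    """Return 'major', 'minor', or None for one column name across tables."""
    specific = {t.strip().upper() for t in types_by_table.values()}
    general = {general_type(t) for t in types_by_table.values()}
    if len(general) > 1:
        return "major"   # e.g. text in one table, numeric in another
    if len(specific) > 1:
        return "minor"   # same general type, different specific type
    return None
```

For example, `type_mismatch({"customers": "VARCHAR(50)", "orders": "VARCHAR(100)"})` yields `"minor"`, while pairing `VARCHAR(50)` with `INTEGER` yields `"major"`.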


Unexpected boolean values

This column appears to contain boolean (True/False) data, but unexpected values were found.

Likelihood: Likely
Quality dimension: Validity
How it's detected: Flagged when the most frequent values include one boolean value (e.g., "true") but the expected counterpart (e.g., "false") is absent.

This could indicate inconsistent coding for the same intended values, potentially leading to downstream errors or inconsistent business logic.

Suggested action: Review your source data and follow up with data owners to determine whether this data needs to be corrected.
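
A minimal sketch of this check follows. It assumes a fixed list of boolean pairs, looks only at the two most frequent values, and treats the counterpart as absent when it is missing from those top values; `BOOLEAN_PAIRS`, `top_n`, and the function name are illustrative assumptions rather than documented behavior.

```python
from collections import Counter

# Assumed boolean pairs; the real recognized set may differ.
BOOLEAN_PAIRS = [("true", "false"), ("yes", "no"), ("y", "n"), ("t", "f"), ("1", "0")]

def unexpected_boolean(values, top_n=2):
    """Return (flagged, present_value, missing_counterpart)."""
    counts = Counter(str(v).strip().lower() for v in values)
    top = {v for v, _ in counts.most_common(top_n)}
    for a, b in BOOLEAN_PAIRS:
        # Check each pair in both directions.
        for present, missing in ((a, b), (b, a)):
            if present in top and missing not in top:
                return True, present, missing
    return False, None, None
```

A column holding mostly "true" and "no" would be flagged because "true" appears without "false", which suggests two coding schemes were mixed.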


Variant codings for same values

This column contains more than one common variant that represents a single value or state.

Likelihood: Definite
Quality dimension: Consistency
How it's detected: Flagged when a column with 20 or fewer distinct values contains recognized variant codings for the same concept (e.g., "USA", "US", "United States").

This can occur when data is integrated from multiple sources with different standards, or when free-text entry is permitted without validation. The variations can confuse downstream data users, cause errors, and produce multiple versions of the truth.

Suggested action: Review your source data and ingestion process. Consider cleansing this data to standardize on a single set of definitive codes.
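
One way to picture this check is a lookup against groups of known variants, applied only to low-cardinality columns. The `VARIANT_GROUPS` table below is a small hypothetical sample, not the list of codings that is actually recognized.

```python
# Hypothetical variant groups, keyed by a canonical label.
VARIANT_GROUPS = {
    "united states": {"usa", "us", "u.s.", "united states"},
    "yes": {"y", "yes"},
    "no": {"n", "no"},
}

def variant_codings(values, max_distinct=20):
    """Return {canonical: [variants found]} for groups with 2+ variants present."""
    distinct = {str(v).strip() for v in values}
    if len(distinct) > max_distinct:
        return {}  # only low-cardinality columns are checked
    found = {}
    for canonical, variants in VARIANT_GROUPS.items():
        hits = sorted(v for v in distinct if v.lower() in variants)
        if len(hits) > 1:
            found[canonical] = hits
    return found
```

The result doubles as a starting point for cleansing: each key is a candidate standard code, and each list shows the values to map onto it.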


Small percentage of divergent values

Under 3% of values in this column were found to be different from the most common value.

Likelihood: Possible
Quality dimension: Validity
How it's detected: Flagged when the most frequent value accounts for over 97% but less than 100% of values.

This could indicate data errors in the minority values. The column is nearly constant, so the few divergent values may be incorrect entries.

Suggested action: Review your source data and follow up with data owners to determine whether this data needs to be corrected.
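
The threshold logic can be sketched as below. The function name and return shape are assumptions for illustration; the 97% threshold comes from the detection rule above.

```python
from collections import Counter

def divergent_values(values, threshold=0.97):
    """Return (flagged, most_common_value, {minority_value: count})."""
    counts = Counter(values)
    if not counts:
        return False, None, {}
    top, n = counts.most_common(1)[0]
    share = n / len(values)
    # Nearly constant: above the threshold but not fully constant.
    flagged = threshold < share < 1.0
    minority = {v: c for v, c in counts.items() if v != top}
    return flagged, top, minority
```

The minority dictionary lists exactly the values to raise with data owners.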


Unexpected numeric values

Under 3% of values in this otherwise text column were found to be numeric.

Likelihood: Likely
Quality dimension: Validity
How it's detected: Flagged when a text column has a small fraction (under 3%) of numeric values.

A small number of numeric values in a text column could indicate data entry errors or mixed content.

Suggested action: Review your source data and follow up with data owners to determine whether numeric values are invalid entries here.
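
A simple way to reproduce this check on a sample of values is shown below. The definition of "numeric" used here (an optionally signed integer or decimal) and the names `NUMERIC_RE` and `unexpected_numeric` are assumptions made for the sketch.

```python
import re

# Assumed definition of numeric: optional sign, digits, optional decimal part.
NUMERIC_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

def unexpected_numeric(values, max_share=0.03):
    """Return (flagged, numeric_values) for a text column."""
    non_null = [str(v).strip() for v in values if v is not None]
    numeric = [v for v in non_null if NUMERIC_RE.fullmatch(v)]
    share = len(numeric) / len(non_null) if non_null else 0.0
    # Flag only when some, but fewer than 3%, of the values are numeric.
    flagged = 0 < share < max_share
    return flagged, numeric
```

The returned numeric values are the entries to confirm with data owners as valid or invalid.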