Missing and Incomplete Data Issues

Expected values are absent, blank, non-standard, or potentially duplicated.

For an overview of all hygiene issue categories, see Data Hygiene Issues.

No column values present

This column is present in the table, but no values have been ingested or assigned in any records.

Likelihood Possible
Quality dimension Completeness
How it's detected Flagged when every record in the column is null, zero-length, or a dummy value.

This could indicate missing data or a processing error. Note that this check treats dummy values and zero-length values as missing data, so a column filled only with placeholders is flagged as well.

Suggested action: Review your source data, ingestion process, and any processing steps that update this column.
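
The detection rule above can be reproduced outside the tool for a quick spot check. The following is a minimal sketch in plain Python, not the tool's implementation: the `DUMMY_VALUES` set and both function names are hypothetical, and the actual list of dummy values may differ.

```python
# Hypothetical placeholder list (an assumption); the actual dummy-value
# rules used by the detection may differ.
DUMMY_VALUES = {"missing", "null", "na", "n/a", "none", "error", "unknown"}

def is_missing(value):
    """Return True for None, zero-length/whitespace strings, and assumed dummy values."""
    if value is None:
        return True
    if isinstance(value, str):
        text = value.strip()
        return text == "" or text.lower() in DUMMY_VALUES
    return False

def column_has_no_values(values):
    """Return True when every record in the column counts as missing."""
    return all(is_missing(v) for v in values)
```

For example, `column_has_no_values([None, "", "N/A"])` is true, while a column containing even one real value is not flagged. An empty list is also reported as having no values.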


Non-standard blank values

Values representing missing data may be unexpected or inconsistent.

Likelihood Definite
Quality dimension Completeness
How it's detected Flagged when empty strings (as opposed to nulls) or dummy placeholder values are found.

Non-standard values may include empty strings, dummy entries such as "MISSING" or repeated characters used to bypass entry requirements, processing artifacts such as "NULL", or spreadsheet artifacts such as "NA" or "ERROR".

Suggested action: Consider cleansing the column upon ingestion to replace all variants of missing data with a standard designation, like NULL.
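
One way to apply the suggested cleansing is to map every missing-data variant to a single marker during ingestion. The sketch below uses Python's `None` as that marker. The `PLACEHOLDERS` set, the repeated-character rule, and the `standardize_blank` name are assumptions chosen for illustration, not the tool's own rules.

```python
import re

# Assumed placeholder strings, compared case-insensitively.
PLACEHOLDERS = {"missing", "null", "na", "n/a", "error"}

# Assumed rule: one character repeated three or more times, e.g. "xxxx" or "----".
REPEATED_CHAR = re.compile(r"(.)\1{2,}")

def standardize_blank(value):
    """Map empty strings and assumed placeholder values to None; keep everything else."""
    if value is None:
        return None
    if isinstance(value, str):
        text = value.strip()
        if text == "" or text.lower() in PLACEHOLDERS or REPEATED_CHAR.fullmatch(text):
            return None
    return value

cleaned = [standardize_blank(v) for v in ["Alice", "", "NULL", "xxxx", "NA", "Bob"]]
```

The repeated-character rule is deliberately aggressive and would also blank legitimate values such as "000", so review it against the column's real contents before applying it.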


Small percentage of missing values found

Under 3% of values in this column are null, zero-length, or dummy values, so the column is almost, but not completely, populated.

Likelihood Possible
Quality dimension Completeness
How it's detected Flagged when over 97% of values are present but the column is not completely filled.

This could indicate unexpected missing values in a required column.

Suggested action: Review your source data and follow up with data owners to determine whether this data needs to be corrected, supplemented, or excluded.
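
The threshold described above can be checked by hand as a missing-value ratio. This is a minimal sketch under stated assumptions: the `DUMMY_VALUES` set and function names are hypothetical, and only the 3% cutoff comes from the description above.

```python
# Assumed dummy values; the real list may differ.
DUMMY_VALUES = {"missing", "null", "na", "n/a"}

def is_missing(value):
    """Treat None, blank strings, and assumed dummy values as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        text = value.strip()
        return text == "" or text.lower() in DUMMY_VALUES
    return False

def has_small_missing_share(values, threshold=0.03):
    """Return True when some, but fewer than `threshold`, of the values are missing."""
    if not values:
        return False
    missing = sum(1 for v in values if is_missing(v))
    return 0 < missing / len(values) < threshold
```

A column of 100 records with two blanks would be flagged, while a fully populated column, or one with 5% missing, would not.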


Potential duplicate values

This column is largely unique, but some duplicate values are present.

Likelihood Possible
Quality dimension Uniqueness
How it's detected Flagged when a column with over 1,000 distinct values has a most-frequent value that appears only 2 to 4 times.

This pattern is uncommon and could indicate inadvertent duplication. When a column is otherwise unique, the few values that repeat are worth reviewing individually, since they may point to a data quality issue.

Suggested action: Review your source data and follow up with data owners to determine whether this data needs to be corrected.
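
To see which values repeat, the rule above can be approximated with a frequency count. The sketch below uses the thresholds from the description (over 1,000 distinct values, most frequent value seen 2 to 4 times). The function name and the choice to ignore `None` are assumptions.

```python
from collections import Counter

def potential_duplicates(values, min_distinct=1000, max_repeats=4):
    """Return {value: count} for repeated values when the column is nearly unique.

    Returns an empty dict when the column does not match the pattern.
    """
    counts = Counter(v for v in values if v is not None)  # nulls ignored (assumption)
    if len(counts) <= min_distinct:
        return {}
    top_count = counts.most_common(1)[0][1]
    if not 2 <= top_count <= max_repeats:
        return {}
    return {value: n for value, n in counts.items() if n > 1}
```

Calling `potential_duplicates(list(range(1500)) + [7, 7, 42])` returns the two repeated values with their counts, which gives data owners a short list to review.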


Similar values match when standardized

When column values are standardized (removing spaces, single quotes, periods, and dashes), matching values are found in other records.

Likelihood Likely
Quality dimension Uniqueness
How it's detected Flagged when the count of distinct values drops after standardization (removing common formatting characters).

This may indicate that formats should be further standardized to allow consistent comparisons for merges, joins, and roll-ups. It could also indicate the presence of unintended duplicates.

Suggested action: Review standardized versus raw data values for all matches. Correct data if values should be consistent.
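
To review standardized versus raw values, it can help to group raw values by their standardized form. The following sketch strips the four characters named above (spaces, single quotes, periods, and dashes). The function names are hypothetical, and case is left unchanged because the description does not mention case folding.

```python
from collections import defaultdict

# Translation table that deletes spaces, single quotes, periods, and dashes.
STRIP_CHARS = str.maketrans("", "", " '.-")

def standardize(value):
    """Remove common formatting characters from a string value."""
    return value.translate(STRIP_CHARS)

def standardized_matches(values):
    """Return {standardized: [raw values]} for groups with more than one raw form."""
    groups = defaultdict(set)
    for value in values:
        if isinstance(value, str):
            groups[standardize(value)].add(value)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}

matches = standardized_matches(["O'Brien", "OBrien", "555-1234", "555 1234", "Smith"])
```

Here `matches` pairs "O'Brien" with "OBrien" and "555 1234" with "555-1234", while "Smith" is omitted because it has only one raw form.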