Data Catalog¶
Use the Data Catalog to investigate data quality issues, explore profiling results across tables and columns, and organize your data assets with metadata tags. The catalog brings together profiling statistics, hygiene issues, test results, PII detection, and quality scores for every table and column in a table group.

Prerequisites¶
- A table group configured for your database connection.
- At least one completed profiling run for that table group. Profiling discovers your tables and columns and populates the catalog with metadata and statistics.
The catalog automatically reflects the latest results as you run additional profiling and testing cycles. Run profiling regularly to keep your catalog current.
Explore your data¶
Select a Table Group from the dropdown at the top of the page. The left panel displays a hierarchical tree of tables and columns — click any item to view its details in the right panel.
Use Search to filter the tree by table or column name. For more targeted exploration, click the filter icon to narrow the tree to specific Critical Data Elements or metadata tags (such as Business Domain, Data Source, or Transform Level).
Investigate table and column quality¶
When you select a table or column, the detail panel shows its data quality scores (overall, profiling, and testing), along with any detected issues. Use this view to assess the health of specific data assets and decide where to focus remediation.
Spot issues¶
The detail panel surfaces three categories of issues:
- Potential PII — Columns where profiling detected patterns matching personally identifiable information, with PII type and risk level.
- Hygiene Issues — Data quality anomalies from profiling, categorized by likelihood (Definite, Likely, or Possible). See Data Hygiene Issues for details.
- Test Issues — Failures and warnings from the most recent test run for each associated test suite. Only confirmed results are shown; dismissed results are excluded.
Each issue category links to its full detail view for further investigation.
Review profiling results¶
For columns, the detail panel shows value distributions and profiling statistics tailored to the column's data type — including missing value breakdowns, frequency distributions, statistical summaries, and pattern analysis. For a full breakdown of what profiling collects for each column type, see Profiling Statistics.
You can also review a column's suggested data type when profiling detects that a more appropriate type exists (e.g., a varchar column that only contains integers).
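As a rough illustration of the idea behind a suggested data type, the sketch below checks whether every non-empty value in a text column parses as an integer. The function name `suggest_type`, the `bigint` result, and the matching rule are assumptions made for this example; they are not the product's actual profiling logic.

```python
import re

# Optional sign followed by digits only (hypothetical rule for this sketch).
_INT_RE = re.compile(r"[+-]?\d+")

def suggest_type(values, declared_type):
    """Return a suggested type name, or the declared type if nothing fits better."""
    # Ignore NULLs and blank strings; they say nothing about the type.
    non_null = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_null:
        return declared_type  # nothing to infer from
    if all(_INT_RE.fullmatch(v) for v in non_null):
        return "bigint"
    return declared_type
```

For example, `suggest_type(["1", "22", None], "varchar(255)")` returns `"bigint"`, while a column that contains even one non-numeric value keeps its declared type.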
Preview source data¶
Click Data Preview to retrieve up to 100 distinct sample rows directly from the source database without leaving the catalog. For tables, the preview shows all columns; for individual columns, it shows distinct values. This is useful for verifying what the actual data looks like when investigating an issue.
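If you want to reproduce a similar sample outside the catalog, the following sketch shows the kind of query involved, using Python's built-in `sqlite3` module as a stand-in for your source database. The helper names and the exact SQL are assumptions for illustration; the query the catalog actually issues may differ, and some databases use `TOP` or `FETCH FIRST` instead of `LIMIT`.

```python
import sqlite3

PREVIEW_LIMIT = 100  # row cap described above

def preview_table(conn, table):
    """Return up to 100 distinct rows with all columns (hypothetical helper)."""
    # Identifiers cannot be bound as parameters, so they are quoted here.
    # Validate table and column names before doing this with untrusted input.
    sql = f'SELECT DISTINCT * FROM "{table}" LIMIT {PREVIEW_LIMIT}'
    return conn.execute(sql).fetchall()

def preview_column(conn, table, column):
    """Return up to 100 distinct values of one column (hypothetical helper)."""
    sql = f'SELECT DISTINCT "{column}" FROM "{table}" LIMIT {PREVIEW_LIMIT}'
    return [row[0] for row in conn.execute(sql)]
```

Pass any open `sqlite3` connection, for example `preview_column(conn, "orders", "status")`.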
Track changes over time¶
Click History on a column's value distribution to compare profiling results across runs. Select any past profiling run to see the statistics for that point in time, making it easy to spot trends or regressions.
Generate a table CREATE script¶
For any table, you can generate a SQL CREATE TABLE statement that uses the suggested data types from profiling. Where the suggested type differs from the current database type, the script includes a comment noting the original type (e.g., -- WAS varchar(255)). This is useful for reviewing type optimization recommendations or generating migration scripts.
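The sketch below shows roughly what such a script looks like and how the `-- WAS` comment could be attached. The function `build_create_script` and its input shape are hypothetical, and the formatting of the real generated script may differ.

```python
def build_create_script(table, columns):
    """Build a CREATE TABLE statement.

    columns: list of (name, current_type, suggested_type or None) tuples.
    """
    lines = []
    for i, (name, current, suggested) in enumerate(columns):
        final_type = suggested or current
        sep = "," if i < len(columns) - 1 else ""
        line = f"    {name} {final_type}{sep}"
        # Note the original type only when the suggestion actually changes it.
        if suggested and suggested.lower() != current.lower():
            line += f"  -- WAS {current}"
        lines.append(line)
    body = "\n".join(lines)
    return f"CREATE TABLE {table} (\n{body}\n);"

print(build_create_script("orders", [
    ("id", "varchar(255)", "integer"),
    ("note", "varchar(255)", None),
]))
```

In this example only the `id` line carries a `-- WAS varchar(255)` comment, because `note` has no suggested type.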
Organize and classify your data¶
The Data Catalog lets you annotate tables and columns with metadata tags, descriptions, and a Critical Data Element flag. These annotations help you classify your data assets, filter the catalog, and influence how quality scores are weighted.
Tag tables and columns¶
Select a table or column to assign metadata tags across eight categories (see Metadata tags below). Tag values support autocomplete based on values already used across your catalog to help maintain consistency.
You can also add a free-text Description to provide business context or usage notes.
Tip
Tags set on a table group are inherited by its tables, and tags set on a table are inherited by its columns. You can override any inherited value at a lower level, so you can tag once at the table group level and override only where needed.
Mark critical data elements¶
The Critical Data Element (CDE) flag marks tables and columns as high-priority data assets. Critical data elements receive greater weight in quality score rollups and can be used to generate a separate CDE score on scorecards.
The CDE flag supports three states: Yes, No, or Inherit (from the parent table or table group).
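The inheritance rules for tags and the CDE flag can be pictured as a lookup that walks from the column up to the table and then the table group. This is a minimal sketch under the assumption that an unset or Inherit value is represented as `None`; the function name and data model are illustrative only.

```python
def resolve(column_value, table_value, group_value):
    """Return the first explicitly set value, walking column -> table -> table group.

    None stands for "Inherit" (or "not set") at that level.
    """
    for value in (column_value, table_value, group_value):
        if value is not None:
            return value
    return None  # nothing set at any level

# A tag set only on the table group flows down to the column.
print(resolve(None, None, "Finance"))

# An explicit No on the column overrides a Yes on the table.
print(resolve(False, True, None))
```

Checking `is not None` rather than truthiness matters here: an explicit No (`False`) is a real value and must not fall through to the parent.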
Edit tags in bulk¶
To update tags across multiple tables and columns at once:
1. Toggle on Edit multiple in the tree toolbar.
2. Select the tables and columns you want to update. Selecting a table automatically includes all its columns.
3. In the batch edit form, check the tags you want to change and enter the new values. Unchecked tags keep their current values.
4. Click Save.
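The effect of a bulk edit can be sketched as a merge: only the checked tags are overwritten, and everything else is left alone. The function names and the dictionary layout below are assumptions made for this illustration, not the product's internal model.

```python
def expand_selection(selected, tables):
    """Selecting a table also selects all of its columns.

    tables: dict mapping table name -> list of column names.
    """
    expanded = set(selected)
    for item in selected:
        for column in tables.get(item, []):
            expanded.add(f"{item}.{column}")
    return expanded

def apply_bulk_edit(assets, changes):
    """Overwrite only the checked tags; unchecked tags keep their values.

    assets: dict mapping asset name -> dict of tag values.
    changes: dict holding only the checked tags and their new values.
    """
    return {name: {**tags, **changes} for name, tags in assets.items()}

assets = {
    "orders": {"Business Domain": "Sales", "Data Source": "ERP"},
    "orders.id": {"Business Domain": "Sales", "Data Source": "CRM"},
}
updated = apply_bulk_edit(assets, {"Business Domain": "Finance"})
print(updated["orders.id"])
```

Here both assets end up with Business Domain set to Finance, while each keeps its own Data Source value.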
Export catalog data¶
Click Export to download an Excel report with profiling statistics, data types, value distributions, tag values, CDE status, and descriptions. You can scope the export to all columns, only the columns matching your current filters, or only selected columns (available when Edit multiple is toggled on).
Reference¶
Metadata tags¶
| Tag | Description |
|---|---|
| Data Source | Original source of the dataset |
| Source System | Enterprise system source for the dataset |
| Source Process | Process, program, or data flow that produced the dataset |
| Business Domain | Business division responsible for the dataset, e.g., Finance, Sales, Manufacturing |
| Stakeholder Group | Data owners or stakeholders responsible for the dataset |
| Transform Level | Data warehouse processing stage, e.g., Raw, Conformed, Processed, Reporting, or Medallion level (Bronze, Silver, Gold) |
| Aggregation Level | Data granularity of the dataset, e.g., Atomic, Historical, Snapshot, Aggregated, Time-Rollup, Rolling, Summary |
| Data Product | Data domain that comprises the dataset |