Mapping with Variables Example

  • Use Case: When you need to use the same configurations in multiple variations, consider using variables for source and sink connections and for file paths and filenames.
  • Example Recipe: In this example, the recipe uses a star schema modeling approach to data analysis. It takes the Global Superstore data set, offered by Tableau for training purposes, and splits it into dimensions such as product, orders, and customers. It then creates a fact table to store information for analysis, applies a machine learning model to predict seasonal ordering, and finally updates a data catalog.
  • DataMapper Role: The recipe begins with a DataMapper node that copies the data file from an SFTP server and places it in an Amazon S3 data lake.

A DataMapper node serves as the root node of this recipe to get the raw data for the workflow.

Web App Configuration

The following images show how the node is configured using the web app forms.

  • Connections to the data source and data sink
  • Mapping to the source file and the runtime variables to record
  • Mapping to the sink path and the runtime variables to record
  • Variables set at multiple levels to define common connection and path information
  • Test configurations that use the captured runtime variables

The data source and data sink connections configured in the web app.

The data source mapping configured in the web app for the DataMapper node.

The data sink mapping configured in the web app for the DataMapper node.

The variables defined for the recipe and kitchen for use in the DataMapper node settings.

Test configured to run against the source file to verify that the expected data is present.

Test configured to run against the source file to verify that the file meets the expected minimum size.

Test configured to run against the sink to validate that the file contents were not corrupted in transit.

File Contents

The platform records the web app configurations shown above in the corresponding node files: notebook.json, data_sources/source.json, and data_sinks/sink.json. The recipe variables are stored in the recipe's variables.json file.
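The node files refer to variables with double-brace placeholders such as {{sftpConfig.username}}. As a rough illustration of how such placeholders might resolve against variable values, here is a minimal Python sketch. It is not the platform's implementation: the `variables` dictionary, its values, and the `lookup` and `render` helpers are all hypothetical, and the actual contents of variables.json are not shown in this example.

```python
import re

# Hypothetical variable values for illustration only. Real values would
# come from variables.json or kitchen-level overrides.
variables = {
    "sftpConfig": {
        "username": "example_user",
        "hostname": "sftp.example.com",
        "key_file": "example_key.pem",
    },
    "global_superstore_sftp_path": "incoming/global_superstore.csv",
}

# Matches {{name}} or {{parent.child}}, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(dotted_name, scope):
    """Walk a dotted name such as 'sftpConfig.username' through nested dicts."""
    value = scope
    for part in dotted_name.split("."):
        value = value[part]
    return value


def render(text, scope):
    """Replace every {{name}} placeholder in text with its value from scope."""
    return PLACEHOLDER.sub(lambda m: str(lookup(m.group(1), scope)), text)


rendered_username = render("{{sftpConfig.username}}", variables)
rendered_path = render("{{global_superstore_sftp_path}}", variables)
```

Because the connection details and paths live in variables, the same node files can be reused across variations by changing only the variable values.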

notebook.json

{
    "mappings": {
        "global_superstore_data": {
            "source-name": "source",
            "source-key": "global_superstore_data_source",
            "sink-name": "sink",
            "sink-key": "global_superstore_data_sink"
        }
    }
}
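
The mapping above pairs a key in the source file with a key in the sink file by name. The following Python sketch shows one way to read that pairing. The `resolve_mapping` helper is hypothetical, and the dictionaries are trimmed copies of the node files listed in this section, so this illustrates the structure rather than the platform's own logic.

```python
# Trimmed copies of the node files, keeping only the fields the mapping uses.
notebook = {
    "mappings": {
        "global_superstore_data": {
            "source-name": "source",
            "source-key": "global_superstore_data_source",
            "sink-name": "sink",
            "sink-key": "global_superstore_data_sink",
        }
    }
}
sources = {
    "source": {
        "keys": {
            "global_superstore_data_source": {
                "file-key": "{{global_superstore_sftp_path}}"
            }
        }
    }
}
sinks = {
    "sink": {
        "keys": {
            "global_superstore_data_sink": {
                "file-key": "{{global_superstore_s3_path}}"
            }
        }
    }
}


def resolve_mapping(mapping, sources, sinks):
    """Return the (source file-key, sink file-key) pair a mapping points at."""
    src = sources[mapping["source-name"]]["keys"][mapping["source-key"]]
    dst = sinks[mapping["sink-name"]]["keys"][mapping["sink-key"]]
    return src["file-key"], dst["file-key"]


pairs = {
    name: resolve_mapping(mapping, sources, sinks)
    for name, mapping in notebook["mappings"].items()
}
```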
 

data_sources/source.json

{
    "name": "source",
    "type": "DKDataSource_SFTP",
    "config": {
        "username": "{{sftpConfig.username}}",
        "hostname": "{{sftpConfig.hostname}}",
        "pem_file": "{{sftpConfig.key_file}}"
    },
    "keys": {
        "global_superstore_data_source": {
            "file-key": "{{global_superstore_sftp_path}}",
            "use-only-file-key": true,
            "set-runtime-vars": {
                "size": "sftp_global_superstore_data_file_size",
                "row_count": "sftp_global_superstore_data_line_count",
                "md5": "sftp_global_superstore_data_md5_hash"
            }
        }
    },
    "tests": {
        "Ensure_SFTP_Min_Row_Count": {
            "action": "warning",
            "test-variable": "sftp_global_superstore_data_line_count",
            "type": "test-contents-as-integer",
            "test-logic": {
                "test-compare": "greater-than",
                "test-metric": 1500
            }
        },
        "Ensure_SFTP_Min_File_Size_in_Bytes": {
            "action": "warning",
            "test-variable": "sftp_global_superstore_data_file_size",
            "type": "test-contents-as-integer",
            "test-logic": "sftp_global_superstore_data_file_size > 1024*512"
        }
    }
}
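
To make the runtime variables and the two source tests more concrete, the Python sketch below computes a size, row count, and MD5 hash for some sample bytes and then applies the same thresholds. It rests on assumptions: how the platform actually counts rows or evaluates test logic is not described here, and `file_metrics`, `run_test`, and the sample data are hypothetical.

```python
import hashlib


def file_metrics(data: bytes) -> dict:
    """Compute stand-ins for the size, row_count, and md5 runtime variables."""
    return {
        "size": len(data),                     # bytes
        "row_count": len(data.splitlines()),   # assumed: one row per line
        "md5": hashlib.md5(data).hexdigest(),
    }


def run_test(value, compare, metric):
    """Apply a named comparison, mirroring the test-compare/test-metric pair."""
    comparisons = {
        "greater-than": lambda a, b: a > b,
        "equal-to": lambda a, b: a == b,
    }
    return comparisons[compare](value, metric)


# Hypothetical sample: one header line plus 2000 short data lines.
sample = b"order_id,product,customer\n" + b"1,chair,alice\n" * 2000
metrics = file_metrics(sample)

# Ensure_SFTP_Min_Row_Count: row count must be greater than 1500.
row_count_ok = run_test(metrics["row_count"], "greater-than", 1500)

# Ensure_SFTP_Min_File_Size_in_Bytes: size must exceed 1024*512 bytes.
size_ok = metrics["size"] > 1024 * 512
```

With this small sample, the row count check passes and the size check does not. Since both tests use the "warning" action, a failure like this would be reported without stopping the run.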
 

data_sinks/sink.json

{
    "name": "sink",
    "type": "DKDataSink_S3",
    "config": {
        "access-key": "{{s3Config.access_key}}",
        "secret-key": "{{s3Config.secret_key}}",
        "bucket": "{{s3Config.bucket}}"
    },
    "keys": {
        "global_superstore_data_sink": {
            "file-key": "{{global_superstore_s3_path}}",
            "use-only-file-key": true,
            "set-runtime-vars": {
                "size": "s3_global_superstore_data_file_size",
                "row_count": "s3_global_superstore_data_line_count",
                "md5": "s3_global_superstore_data_md5_hash"
            }
        }
    },
    "tests": {
        "Validate_MD5_Hash": {
            "action": "stop-on-error",
            "test-variable": "sftp_global_superstore_data_md5_hash",
            "type": "test-contents-as-string",
            "test-logic": {
                "test-compare": "equal-to",
                "test-metric": "s3_global_superstore_data_md5_hash"
            }
        }
    }
}
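
The Validate_MD5_Hash test compares the hash recorded at the source with the hash recorded at the sink. A minimal Python sketch of that idea follows, assuming that "stop-on-error" halts processing when the hashes differ. The `HashMismatch` exception, the `validate_md5` helper, and the sample payload are hypothetical.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when source and sink hashes differ (models stop-on-error)."""


def validate_md5(source_bytes: bytes, sink_bytes: bytes) -> str:
    """Return the shared MD5 hex digest, or raise if the two digests differ."""
    source_hash = hashlib.md5(source_bytes).hexdigest()
    sink_hash = hashlib.md5(sink_bytes).hexdigest()
    if source_hash != sink_hash:
        raise HashMismatch(f"source {source_hash} != sink {sink_hash}")
    return source_hash


# Hypothetical file contents for illustration.
payload = b"order_id,product,customer\n1,chair,alice\n"

# An intact copy produces the same hash on both sides.
matching_hash = validate_md5(payload, payload)

# A copy that lost its final byte is detected as corrupted.
try:
    validate_md5(payload, payload[:-1])
    corruption_detected = False
except HashMismatch:
    corruption_detected = True
```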