DataMapper Nodes

This node type moves data from a data source to a data sink by "mapping" the data between the two locations. DataMapper nodes can map a single file between a source and a sink, or they can map multiple files using wildcards.

Tip

DataMapper nodes are most efficient when working with less than 5 GB of data, because moving data involves the additional step of mapping. For workflows that need to transform data during a transfer, consider using a container node instead, as container nodes can both move and alter data.

description.json

Note

Node type value: the value for the DataMapper node type is DKNode_DataMapper.

This system file sets the node type and provides a description field for documenting the node's purpose. The file serves no other purpose; any other entries are ignored.

{
   "type" : "DKNode_DataMapper",
   "description": "[YOUR DESCRIPTION HERE]"
}
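To illustrate that only the type and description entries matter, here is a minimal Python sketch that parses a description.json payload, checks the node type, and discards any other entries. The read_description function and its validation behavior are hypothetical illustrations, not part of the platform.

```python
import json

# Node type value documented for DataMapper nodes.
EXPECTED_TYPE = "DKNode_DataMapper"


def read_description(text: str) -> dict:
    """Hypothetical helper: parse description.json and keep only the used entries.

    Raises ValueError if the node type is not DKNode_DataMapper. Any entries
    other than "type" and "description" are dropped, mirroring the rule that
    the platform ignores them.
    """
    data = json.loads(text)
    node_type = data.get("type")
    if node_type != EXPECTED_TYPE:
        raise ValueError(f"expected type {EXPECTED_TYPE!r}, got {node_type!r}")
    return {"type": node_type, "description": data.get("description", "")}


if __name__ == "__main__":
    sample = '{"type": "DKNode_DataMapper", "description": "Copy CSVs to S3", "owner": "ignored"}'
    print(read_description(sample))
    # {'type': 'DKNode_DataMapper', 'description': 'Copy CSVs to S3'}
```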

Map single files

Every DataMapper node contains a list of mappings between the data source and data sink configured for the node. These mappings reference the source and sink names and the specific keys or files to be mapped between them.

A single-file mapping tells the platform to get a file identified by a source key within the mapping definition and put it in the target file path defined by a sink key. The mapping is defined in the notebook.json system file, and the JSON files in the data_sources and data_sinks directories supply the details.

notebook.json

{
  "mappings": {
    "mapping1": {
      "source-name": "source",
      "source-key": "mapping1_sourcefile",
      "sink-name": "sink",
      "sink-key": "mapping1_sinkfile"
    }
  }
}

data_sources/source.json

{
  "name": "source",
  "type": "DKDataSource_SFTP",
  "config-ref": "sftpConfig",
  "keys": {
    "mapping1_sourcefile": {
      "file-key": "input-folder/file.csv",
      "use-only-file-key": true
    }
  }
}

data_sinks/sink.json

{
  "name": "sink",
  "type": "DKDataSink_S3",
  "config-ref": "s3Config",
  "keys": {
    "mapping1_sinkfile": {
      "file-key": "output-folder/file.csv",
      "use-only-file-key": true
    }
  }
}
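The way a mapping ties the three files together can be traced by hand. The following Python sketch is a hypothetical resolver, not a platform API: it follows each entry in notebook.json to the named source and sink, then looks up the file-key declared under the referenced key. It deliberately ignores "use-only-file-key" and connection settings such as "config-ref".

```python
def resolve_mappings(notebook: dict, sources: dict, sinks: dict) -> list:
    """Return (mapping name, source file-key, sink file-key) tuples.

    `sources` and `sinks` map each data source/sink "name" to its parsed
    JSON definition (for example, the contents of data_sources/source.json).
    """
    resolved = []
    for name, mapping in notebook["mappings"].items():
        # Find the source and sink definitions referenced by name.
        source = sources[mapping["source-name"]]
        sink = sinks[mapping["sink-name"]]
        # Follow the source-key and sink-key to their file paths.
        src_path = source["keys"][mapping["source-key"]]["file-key"]
        dst_path = sink["keys"][mapping["sink-key"]]["file-key"]
        resolved.append((name, src_path, dst_path))
    return resolved


if __name__ == "__main__":
    notebook = {"mappings": {"mapping1": {
        "source-name": "source", "source-key": "mapping1_sourcefile",
        "sink-name": "sink", "sink-key": "mapping1_sinkfile"}}}
    source = {"name": "source",
              "keys": {"mapping1_sourcefile": {"file-key": "input-folder/file.csv"}}}
    sink = {"name": "sink",
            "keys": {"mapping1_sinkfile": {"file-key": "output-folder/file.csv"}}}
    print(resolve_mappings(notebook, {"source": source}, {"sink": sink}))
    # [('mapping1', 'input-folder/file.csv', 'output-folder/file.csv')]
```

A typo in any of the four names (source-name, source-key, sink-name, sink-key) surfaces here as a KeyError, which is a quick way to check that the files agree with each other.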

Note

Source and sink names: if the DataMapper node is created in the web app, the source and sink are automatically named "source" and "sink", and all mappings use those default names. If the DataMapper node is created with different names using the File Editor or CLI, the web app uses those names in the mappings.

Map multiple files with wildcards

A DataMapper node can select a list of files using a wildcard defined in the data_sources JSON file. The following example shows how a wildcard can be used to get a variable number of CSV files from an SFTP source and put them in an S3 sink.

Source and sink folder paths may also include wildcards, but only in the final segment of the path. See File-Based Wildcards for more information.
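The final-segment rule can be checked before a node is saved. The Python sketch below is a hypothetical validator, not a platform feature, and it assumes that "*", "?", and "[" are the wildcard characters (fnmatch-style); consult File-Based Wildcards for the characters the platform actually recognizes.

```python
# ASSUMPTION: fnmatch-style wildcard characters; the platform's set may differ.
WILDCARD_CHARS = set("*?[")


def wildcard_in_final_segment_only(path: str) -> bool:
    """Return True if no path segment except the last contains a wildcard."""
    segments = path.strip("/").split("/")
    # Every segment before the last must be a literal folder name.
    return not any(WILDCARD_CHARS & set(segment) for segment in segments[:-1])


if __name__ == "__main__":
    print(wildcard_in_final_segment_only("input-folder/2024-*"))  # True
    print(wildcard_in_final_segment_only("input/*/raw"))          # False
```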

notebook.json

{
  "wildcard-will-automatically-create-mappings":  [
    {
      "data-source": "source",
      "data-sink": "sink"
    }
  ]
}

data_sources/source.json

{
  "name": "source",
  "type": "DKDataSource_SFTP",
  "config-ref": "sftpConfig",
  "wildcard": "*.csv",
  "wildcard-key-prefix": "input-wildcard-folder"
}

data_sinks/sink.json

{
  "name": "sink",
  "type": "DKDataSink_S3",
  "config-ref": "s3Config",
  "wildcard-key-prefix": "output-wildcard-folder"
}
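Conceptually, the wildcard replaces hand-written mappings with one mapping per matching file. The Python sketch below models that expansion under stated assumptions: matching is fnmatch-style and case-sensitive, only files directly inside the source's wildcard-key-prefix folder are considered, and each sink path reuses the source file name under the sink's wildcard-key-prefix. The expand_wildcard function is hypothetical, and the platform's actual naming and matching behavior may differ.

```python
import fnmatch
import posixpath


def expand_wildcard(files, pattern, source_prefix, sink_prefix):
    """Return (source path, sink path) pairs for files matching `pattern`.

    ASSUMPTIONS: non-recursive, case-sensitive fnmatch matching, and sink
    paths built as <sink_prefix>/<source file name>.
    """
    mappings = []
    for path in sorted(files):
        folder, name = posixpath.split(path)
        # Skip files outside the source prefix folder (no recursion).
        if folder != source_prefix.rstrip("/"):
            continue
        if fnmatch.fnmatchcase(name, pattern):
            mappings.append((path, posixpath.join(sink_prefix, name)))
    return mappings


if __name__ == "__main__":
    listing = [
        "input-wildcard-folder/a.csv",
        "input-wildcard-folder/b.txt",
        "input-wildcard-folder/c.csv",
        "other/d.csv",
    ]
    for pair in expand_wildcard(listing, "*.csv",
                                "input-wildcard-folder", "output-wildcard-folder"):
        print(pair)
    # ('input-wildcard-folder/a.csv', 'output-wildcard-folder/a.csv')
    # ('input-wildcard-folder/c.csv', 'output-wildcard-folder/c.csv')
```

Because the number of matches is only known when the source is listed at run time, this approach suits folders where new CSV files arrive between runs.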

Visual examples in Automation

Users can add mappings or wildcards to a DataMapper node using the Node Editor.

(DataMapper Node tab)

A DataMapper mapping defines the source and sink file paths.

(DataMapper Node fields)

A DataMapper wildcard mapping defines the paths and filename pattern.

(DataMapper Node with wildcard)