Dictionary¶
Dictionary data sources and sinks enable recipes to use in-memory data as runtime variables or employ Jinja functions to manage or transform files for downstream inputs and testing. A Dictionary is essentially a JSON dictionary that can be shared across any sources and sinks with the same data store (for example, an Amazon S3 bucket).
This connector provides flexibility in recipe building beyond the standard runtime variables available for sources and sinks in node configurations.
- Use a Dictionary data source to send data via a runtime variable to a data sink with a DataMapper node.
- Use a Dictionary to move data from a source to a runtime variable for use in tests or as ingredient inputs.
- Use a Dictionary data source in a container node to copy a shared recipe file into the container.
- Use a Dictionary data sink to put data from a different source into a runtime variable and use in downstream nodes to process files.
Warning
Dictionaries support small datasets. Dictionary data sources and sinks have a data storage limit of 1 MB and are not intended for use with large datasets.
Dictionary data sources and sinks are in the system category of I/O connectors.
See Dictionary Setup Examples for a guided tour of a dictionary data sink configuration.
Connector type values¶
The "type" value to use in the source or sink JSON files.
| Connector type | Value |
|---|---|
| Data source | DKDataSource_Dictionary |
| Data sink | DKDataSink_Dictionary |
Connection properties¶
Dictionary data sources and sinks do not require connection configuration; they use the Automation user account credentials from your current session.
System source and sink properties¶
| Field | Type | Required? | Description |
|---|---|---|---|
| set-runtime-vars | dictionary | no | Used to declare runtime variables set equal to built-in variables. See File-Based Source and Sink Variables for more information. |
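As a sketch, a `set-runtime-vars` entry in a sink's JSON might look like the following. The runtime variable name `sink_row_count` and the built-in variable name `rowCount` are hypothetical placeholders, not confirmed identifiers; see File-Based Source and Sink Variables for the built-in variables actually available.

```json
{
  "name": "my_dict_sink",
  "type": "DKDataSink_Dictionary",
  "config": {
    "set-runtime-vars": {
      "sink_row_count": "rowCount"
    }
  }
}
```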
Source and sink properties¶
| Field | Type | Required? | Description |
|---|---|---|---|
| bucket-name | alphanumeric, underscore (_) | no | Specifies an in-memory location where the dictionary is stored. Different data sources and sinks with the same bucket-name share access to the dictionary and its data. When bucket-name is not defined, dictionary access is only available to data sources and sinks that share the same name. |
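For illustration, two nodes that name the same bucket share one in-memory dictionary. The node and bucket names below are hypothetical:

```json
{
  "name": "upstream_dict_sink",
  "type": "DKDataSink_Dictionary",
  "config": {
    "bucket-name": "shared_bucket"
  }
}
```

A data source in a later node configured with the same `"bucket-name": "shared_bucket"` would read the data this sink stores.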
Step (key) properties¶
| Field | Type | Required? | Description |
|---|---|---|---|
| variable | alphanumeric; see runtime variable in Naming Conventions | no | Supported in data sinks only. Specifies the name of a runtime variable that stores the data sent to this key. |
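As a sketch, a sink key that routes its data into a runtime variable might look like the following. The nesting of `variable` under the key is inferred from the table above, and the names are placeholders:

```json
{
  "name": "my_dict_sink",
  "type": "DKDataSink_Dictionary",
  "keys": {
    "mapping1": {
      "variable": "output_target_var"
    }
  }
}
```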
Data source logic¶
The system follows this process for each dictionary source input:
- First, check for a value defined as an input mapping. If a value exists, map it into the specified container file.
  - An input mapping is stored in the node's `data_sources/*.json` file, where the key name is the Mapping Name entry and the value is the source's JSON Value entry.
- If there is no value defined as an input mapping, check the bucket defined in Source Connections for a value. If a value exists, map it into the specified container file.
  - The default bucket is an Automation internal resource and shares the name of the source, but users may designate another bucket name in the Connections tab or in `data_sources/*.json`. Any other sources and sinks with the same bucket name share the data stored there.
Note
Dictionary source priority and syntax: Input mapping key values take precedence over data already stored in a shared bucket by a dictionary sink from a previous node. If the dictionary is intended to retrieve data from a previous sink, the Target Variable field in the output mapping must have an empty value, such as `""`, `null`, `[]`, `0`, or `{}`.
Data source examples¶
Example 1¶
This example shows three potential key values for moving data into a container.
```json
{
  "name": "my_dict_datasource",
  "type": "DKDataSource_Dictionary",
  "config": {
    "bucket-name": "source"
  },
  "keys": {
    "mapping1": { ".... some arbitrary JSON data ... " },
    "mapping2": "{{the_value_of_a_variable_you_want_to_send_to_a_sink}}",
    "mapping3": "{{load_text('some_file_you_put_in_resources_to_be_sent.csv')}}"
  }
}
```
Example 2¶
This example shows how a dictionary data source is used to copy a Python script into the container. It uses the load_text Jinja function to get the file from the recipe's resources directory, where it can be shared among multiple container nodes. This particular script is used to either start or stop an instance of a SQL Server Analysis Service.
Composite mappings in the Inputs tab of the Node Editor
- Mapping Name: manage_ssas
- Source JSON Value: "{{load_text('manage_ssas_instance.py')}}"
- Container Target File Path: manage_ssas_instance.py
data_sources/dict_datasource.json
```json
{
  "name": "dict_datasource",
  "type": "DKDataSource_Dictionary",
  "config": {
    "bucket-name": "source"
  },
  "keys": {
    "manage_ssas": "{{load_text('manage_ssas_instance.py')}}"
  }
}
```
notebook.json
```json
{
  "image-repo": "{{dockerhubConfig.image_repo.general_purpose}}",
  "image-tag": "{{dockerhubConfig.image_tag.general_purpose}}",
  "dockerhub-namespace": "{{dockerhubConfig.namespace.general_purpose}}",
  "container-input-file-keys": [
    {
      "filename": "manage_ssas_instance.py",
      "key": "dict_datasource.manage_ssas"
    }
  ],
  "tests": {
    "test_success": {
      "action": "stop-on-error",
      "test-variable": "success",
      "type": "test-contents-as-boolean",
      "test-logic": {
        "test-compare": "equal-to",
        "test-metric": "True"
      }
    }
  }
}
```
Data sink logic¶
The system follows this process for each dictionary sink output:
- First, check for a value defined as an output mapping Target Variable. If a value exists, set it on the target runtime variable.
- If there is no target variable defined, check the bucket defined in Sink Connections. If a value exists, set it on the key in the bucket.
  - The default bucket is an Automation internal resource and shares the name of the sink, but users may designate another shared bucket name in the Connections tab or in `data_sinks/*.json`. Any other sources and sinks with the same bucket name share the data stored there.
Data sink example¶
In this example, there are two keys showing the two possible paths for storing sink values.
- mapping1 will send the data to a runtime variable called `output_target_var`.
- mapping2 is null, a placeholder value that triggers the system to send the data to the default shared bucket.
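The two paths above might be combined in a sink JSON like the following sketch. The exact key structure is inferred from the Step (key) properties table, and the node name is a placeholder; treat this as illustrative rather than a confirmed configuration:

```json
{
  "name": "my_dict_datasink",
  "type": "DKDataSink_Dictionary",
  "config": {
    "bucket-name": "sink"
  },
  "keys": {
    "mapping1": { "variable": "output_target_var" },
    "mapping2": null
  }
}
```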