GPC docker-share Directory¶
This required directory stores the scripts, and any supporting files, that are executed in the node.
Additional scripts and files can be added to the directory as needed. To maintain expected behaviors, each script must be referenced in the config.json file.
When a GPC conditional node runs, the system places any files defined in the docker-share folder of the node's Configuration tab into the docker-share folder of the GPC itself, allowing the recipe files stored in Git to become runtime files at execution. For more information, see Create a Conditional Node and GPC Best Practices.
Supported script types¶
- .sh (bash/shell scripts)
- .py (Python 3 scripts)
- .ipynb (Jupyter Notebook scripts)
For other languages and tools, DataKitchen offers other container images, or you can build your own images to support more script types.
docker-share/config.json¶
The config.json system file defines the container runtime execution. It is located within the docker-share directory.
This file identifies apps to be installed in the container, scripts to be executed in the container, variables to use in the scripts, and values to export for testing.
Default config.json file format¶
{
  "apt-dependencies": [],
  "dependencies": [],
  "keys": {
    "run_script": {
      "script": "value",
      "environment": {},
      "parameters": {},
      "export": []
    }
  }
}
config.json properties and descriptions¶
The following tables provide descriptions of the config.json properties and expected values.
Top-level config properties¶
| Top-level property | Description and comments | Formatting examples |
|---|---|---|
| `apt-dependencies` | The list of software packages to be installed in the container at runtime via `apt-get` (Ubuntu). This list can be empty. | One package: `"apt-dependencies": ["rsync"]`<br>With version: `"apt-dependencies": ["datamash=1.4-1"]`<br>Multiple: `"apt-dependencies": ["datamash=1.4-1", "rsync"]` |
| `dependencies` | The list of software packages to be installed in the container via `pip` (Python). This list can be empty. These strings support requirement specifiers. | One package: `"dependencies": ["psutil"]`<br>With version: `"dependencies": ["pycryptodome==3.17"]`<br>Multiple: `"dependencies": ["pycryptodome==3.17", "psutil"]` |
| `keys` | The dictionary of scripts to be executed in the container. If there are multiple scripts, they run in the order listed in this config file. For example, an initial script could fix any dependency issues, and a subsequent script could then run a data transformation. | See the following table for the `"keys"` properties. |
"keys" properties¶
| "keys" property | Description and comments | Formatting example |
|---|---|---|
| `run_script` | The field containing the script to be executed in the container node, along with its associated parameters and variables. The default field name is `run_script`. You can include several of these fields in `keys` to run multiple scripts. | See the complete config.json example below. |
| `script` | The name of the script (`.sh`, `.py`, `.ipynb`) to be executed against existing assets. This field is required and cannot be empty. | `"script": "script_example.py"` |
| `environment` | The dictionary of environment variables to be set for shell scripts, Python scripts, and Jupyter Notebooks. This field is optional.<br>**Scripts fail on secrets.** Your script will fail if it includes vault expressions or Jinja variable expressions that resolve to vault secrets. Instead, add the secret to a parameter or environment variable. This validation is a security measure to protect your toolchain credentials. | `"environment": { "BRANCH": "{{branch}}" }` |
| `parameters` | The dictionary of parameters to be set in the container. This field is where you can define parameters and the vault paths for toolchain credentials or secrets that need to be injected in a Python script or Jupyter Notebook. Another common use of parameters is to reference runtime variables generated by upstream nodes in the recipe graph for use in executing the container node. This field is optional. | Accepts three entry types: a name/value string, a vault secret, and a variable.<br>`"parameters": { "vault_parameter": "#{vault://dockerhub/secret_name}" }` |
| `export` | The list of variable names to export from the container for use in tests. When exported, these variables are available to other nodes. This field is optional. | `"export": [ "success" ]` |
config.json example¶
The following is a complete example that incorporates the examples shown in the table above.
{
  "apt-dependencies": [
    "datamash=1.4-1",
    "rsync"
  ],
  "dependencies": [
    "pycryptodome==3.17",
    "psutil"
  ],
  "keys": {
    "run_script": {
      "script": "script_example.py",
      "environment": {
        "BRANCH": "{{branch}}"
      },
      "parameters": {
        "vault_parameter": "#{vault://dockerhub/secret_name}"
      },
      "export": [
        "success"
      ]
    }
  }
}
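For illustration, a minimal sketch of what a `script_example.py` compatible with the config above might look like. It reads the `BRANCH` value injected through the `environment` block; how `parameters` and `export` values are delivered to and collected from the script is platform-specific and not shown here, and the `"unknown"` fallback is only an assumption for running the script outside the platform:

```python
import os


def get_branch() -> str:
    """Return the BRANCH value set by the "environment" entry in
    config.json, falling back to "unknown" for local test runs."""
    return os.environ.get("BRANCH", "unknown")


if __name__ == "__main__":
    print(f"Running against branch: {get_branch()}")
```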
Inputs and outputs¶
If the container node has a source, the system places any files from the Inputs mappings defined in the node into the docker-share folder before the scripts run. If the container node has a sink, any files matching the Outputs mappings defined in the node are retrieved from the docker-share folder after the scripts run.
The setup for mappings in general container nodes applies to GPC container nodes as well. See Configure Container Nodes and Container Notebook Properties for details on input and output mappings.
Troubleshooting: Pinned Package Dependencies