GPC docker-share Directory

This required directory stores any scripts or files to be executed in the node.

Additional scripts and files can be added to the directory as needed. To maintain expected behaviors, scripts must be referenced in the config.json file.

When a GPC conditional node runs, the system places any files defined in the docker-share folder of the node's Configuration tab into the docker-share folder of the GPC itself, allowing the recipe files stored in Git to become runtime files at execution. For more information, see Create a Conditional Node and GPC Best Practices.

Supported script types

  • .sh (bash/shell scripts)
  • .py (Python3 scripts)
  • .ipynb (Jupyter Notebook scripts)

For other languages and tools, DataKitchen offers other container images, or you can build your own images to support more script types.

docker-share/config.json

The config.json system file, located within the docker-share directory, defines how the container executes at runtime.

This file identifies apps to be installed in the container, scripts to be executed in the container, variables to use in the scripts, and values to export for testing.

Default config.json file format

{
    "apt-dependencies": [ ],
    "dependencies": [ ],
    "keys": {
        "run_script": {
            "script": "value",
            "environment": {},
            "parameters": {},
            "export": [ ]
        }
    }
}

config.json properties and descriptions

The following tables provide descriptions of the config.json properties and expected values.

Top-level config properties

For each top-level property below, the description and comments are followed by formatting examples.
apt-dependencies

The list of software packages to be installed in the container at runtime via apt-get (Ubuntu).

This list can be empty.

One package: ["datamash"]

With version: ["datamash=1.4-1"]

Multiple:

[
  "datamash=1.4-1",
  "rsync"
]

dependencies

The list of software packages to be installed in the container via pip (Python).

This list can be empty.

These strings support requirement specifiers.

One package: ["psutil"]

With version: ["pycryptodome==3.17"]

Multiple:

[
  "pycryptodome==3.17",
  "psutil"
]
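Because the pip entries support requirement specifiers, version ranges and extras can also be expressed. A brief sketch (the packages shown are illustrative):

```json
[
    "requests>=2.28,<3",
    "pandas[excel]~=2.0"
]
```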

keys

The dictionary of scripts to be executed in the container.

If there are multiple scripts, they will be run in the order listed in this config file.

For example, an initial script could fix any dependency issues, then a subsequent script could run for a data transformation.

Consult the following table for the "keys" properties.

"keys" properties

For each "keys" property below, the description and comments are followed by a formatting example.
run_script

The field containing the script to be executed in the container node, along with its associated parameters and variables.

The default field name is run_script, but it can be named anything. DataKitchen recommends naming this field for the script it's running.

You can include several of these fields in config.json, one for each script you want to run in the container node. The field names must be unique.

"run_script": {
  "script": "value",
  "environment": { },
  "parameters": { },
  "export": ["value"]
}
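For instance, a keys dictionary that runs a dependency fix-up script before a data transformation script could look like the following sketch (the field and script names are illustrative):

```json
"keys": {
    "fix_dependencies": {
        "script": "fix_dependencies.sh",
        "environment": { }
    },
    "run_transform": {
        "script": "transform.py",
        "parameters": { },
        "export": [ ]
    }
}
```

The scripts run in the order listed, so fix_dependencies executes before run_transform.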
script

The name of the script (.sh, .py, .ipynb) to be executed against existing assets.

This field is required and cannot be empty.

"script": "script_example.py"
environment

The dictionary of environment variables to be set for shell scripts, Python scripts, and Jupyter Notebooks.

This field is optional.

  • The environment field is used to assign variables and vault secrets for injection into shell scripts, since the parameters field is not supported for shell scripts.
  • This field is not as important for Python scripts, as all variables can be assigned using parameters.

Scripts fail on secrets: your script will fail if it includes vault expressions or Jinja variable expressions that resolve to vault secrets. Instead, add the secret to a parameter or an environment variable. This validation is a security measure that protects your toolchain credentials.

"environment": {
  "BRANCH": "{{branch}}"
}
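Inside the shell script, the injected value is read as an ordinary environment variable. A minimal sketch (the default value and echo message are illustrative; in the container, BRANCH is set from the environment field above):

```shell
#!/bin/sh
# BRANCH is injected by the container runtime from the "environment"
# field in config.json. Default to "unknown" so the sketch also runs
# standalone, outside the container.
BRANCH="${BRANCH:-unknown}"

echo "Running against branch: ${BRANCH}"
```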
parameters

The dictionary of parameters to be set in the container.

This field is where you can define parameters and the vault paths for toolchain credentials or secrets that need to be injected in a Python script or Jupyter Notebook. Another common use of parameters is to reference runtime variables generated by upstream nodes in the recipe graph for use in executing the container node.

This field is optional.

  • Parameters do not apply to shell scripts. Use environment variables instead.

Accepts three entry types: a name/value string, a vault secret, and a variable.

"parameters": {
  "my_param": "value",
  "vault_param":
    "#{vault://path/secret}",
  "jinja_param":
    "{{my_variable}}"
}
export

The list of variable names to export from the container for use in tests. When exported, these variables are available to other nodes.

This field is optional.

  • Export variables are restricted to a 1 MB maximum size.
  • The export field is not used in shell scripts.
  • To export values from a shell script, use echo commands in the script to write the values to a file in the docker-share directory, assign a variable in notebook.json to the contents of that file, and then configure tests using that assigned variable. See Shell Script Exports for more information.
  • Any variable listed here should be declared as a global variable with a defined value within the Python script.
"export": [ "success" ]
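In the Python script, each exported name must exist as a module-level (global) variable with a defined value by the time the script finishes. A minimal sketch of a script matching the example above (the transformation logic is illustrative):

```python
# script_example.py
# Any name listed in "export" must be a module-level (global) variable
# with a defined value when the script completes.

def run_transformation():
    # Illustrative placeholder for the node's real work.
    return True

# Exported via "export": ["success"]; it must be global,
# not local to a function.
success = run_transformation()
```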

config.json example

The following is a complete example that incorporates the examples shown in the tables above.

{
    "apt-dependencies": [
        "datamash=1.4-1",
        "rsync"
    ],
    "dependencies": [
        "pycryptodome==3.17",
        "psutil"
    ],
    "keys": {
        "run_script": {
            "script": "script_example.py",
            "environment": {
              "BRANCH": "{{branch}}"
            },
            "parameters": {
                "vault_parameter": "#{vault://dockerhub/secret_name}"
            },
            "export": [
                "success"
            ]
        }
    }
}

Inputs and outputs

If the container node has a source, any files from the Inputs mappings defined in the node are placed in the docker-share folder by the system. If the container node has a sink, any files for the Outputs mappings defined in the node are retrieved from the docker-share folder.
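A script can therefore read Inputs-mapped files from, and write Outputs-mapped files to, its docker-share folder. A minimal sketch, assuming the script runs with docker-share as its working directory (the file names and transformation are illustrative; real names come from the node's mappings):

```python
from pathlib import Path

# Assumption: the script executes with the docker-share directory as the
# working directory, so Inputs-mapped files appear alongside the script.
SHARE = Path(".")

def process(input_name="input_data.csv", output_name="results.csv"):
    """Illustrative transformation: copy non-empty lines from an
    Inputs-mapped file to a file the Outputs mapping will retrieve."""
    input_file = SHARE / input_name
    output_file = SHARE / output_name
    rows = input_file.read_text().splitlines()
    kept = [row for row in rows if row.strip()]
    output_file.write_text("\n".join(kept))
    return len(kept)

# In the node, call process() at the top level of the script.
```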

The setup for mappings in general container nodes applies to GPC container nodes as well. See Configure Container Nodes and Container Notebook Properties for details on input and output mappings.

Troubleshooting: Pinned Package Dependencies