GPC Best Practices

This document provides best practices that can help you get the most out of a General Purpose Container (GPC) node.

Configure the image tag

When configuring a GPC, the Docker container image version is set by a tag. The tag can be specified in the Connections tab of the Node Editor and is stored in the notebook.json file.

You can find a list of the GPC image tags here: https://hub.docker.com/r/datakitchenprod/dk_general_purpose_container/tags

Best practice: Use a fixed version

DataKitchen recommends setting the Tag to a specific version, for example, v0.167.

Using a fixed version of the GPC means that:

  • Future GPC updates and version releases do not automatically apply to your recipe nodes.
  • You can update your nodes to newer versions on a schedule that suits your workflow.

Tip

Try to update your GPC tag so you're never more than two versions behind.

docker-share directory

The docker-share directory, or docker-share folder, is the runtime file system of a GPC. It should be the only location that stores the files and scripts a GPC executes at runtime.

Don't rely on files located outside of the docker-share folder or on the internal file structure of the GPC itself. The system can behave unpredictably if a file or script used by a GPC is located outside of docker-share.

Best practice: Referencing files

If a file saved in the docker-share directory is needed by a script at runtime, that file can be referenced in the script:

  1. With a Python import statement, such as from <file_name> import <function_name>, if the file being referenced is a .py file.
  2. As docker-share/<filename>, for any file type.
  3. Using os.path.join(os.path.dirname(__file__), "<filename>"), for any file type.
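The __file__-relative pattern in item 3 resolves a sibling file regardless of the process's current working directory. The following standalone sketch (not GPC-specific; the temporary directory and file names are illustrative) simulates a script directory to show how the lookup works:

```python
import os
import tempfile

# Simulate a script directory containing a sibling data file.
script_dir = tempfile.mkdtemp()
data_path = os.path.join(script_dir, "file.json")
with open(data_path, "w") as f:
    f.write('{"value": "xyz"}')

# In a real script, __file__ is the script's own path; here we fake it.
fake_file = os.path.join(script_dir, "example.py")
sibling = os.path.join(os.path.dirname(fake_file), "file.json")

# The sibling file is found without relying on the current working directory.
with open(sibling) as f:
    contents = f.read()
print(contents)
```

Because the path is anchored to the script's own location, the reference keeps working even if the container launches the script from a different working directory.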

Example

The following example shows a primary script, example.py (recorded correctly in the config.json GPC file), referencing three additional files. All five files are saved in the docker-share folder.

config.json

{
    "apt-dependencies": [],
    "dependencies": [],
    "keys": {
        "run_script": {
            "script": "example.py",
            "environment": {},
            "parameters": {},
            "export": []
        }
    }
}

example.py

import os
from utilities import handle_json_file, handle_text_file

# Build an absolute path relative to this script's location in docker-share.
json_path = os.path.join(os.path.dirname(__file__), "file.json")
with open(json_path) as file:
    handle_json_file(file)

# Reference the file through its docker-share path.
text_path = "docker-share/data.txt"
with open(text_path) as file:
    handle_text_file(file)

utilities.py

import json

def handle_json_file(file):
    print(json.load(file))

def handle_text_file(file):
    print(file.read())

file.json

{
    "value": "xyz",
    "metric": 34
}

data.txt

lorem ipsum

config.json

This system file identifies apps to be installed in the container, scripts to be executed in the container, variables to use in the scripts, and values to export for testing.

Best practice: Additional packages

Use the "dependencies" property to specify a list of software packages to install in the container.

If you install additional Python packages, pin each one to a specific version using the requirement specifier syntax package==version. This ensures that containers will not attempt to download the latest package at order runtime. If different order runs install different versions of a Python package, you may see inconsistent results.
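For example, a config.json with pinned dependencies might look like the following (the package names and versions shown are illustrative):

```json
{
    "apt-dependencies": [],
    "dependencies": [
        "pandas==1.5.3",
        "requests==2.28.2"
    ],
    "keys": {
        "run_script": {
            "script": "example.py",
            "environment": {},
            "parameters": {},
            "export": []
        }
    }
}
```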

Note

Assigning fixed versions may cause dependency discrepancies as new packages are released, so set up a schedule to routinely update package versions. See Pinned Package Dependencies for more information.