GPC Best Practices¶
This document provides best practices that can help you get the most out of a General Purpose Container (GPC) node.
Configure the image tag¶
When configuring a GPC, the docker container image version is set by a tag. The tag can be specified in the Connections tab of the Node Editor and is stored in the notebook.json file.
You can find a list of the GPC image tags here: https://hub.docker.com/r/datakitchenprod/dk_general_purpose_container/tags
Best practice: Use a fixed version¶
DataKitchen recommends setting the Tag to a specific version, for example, v0.167.
Using a fixed version of the GPC means that:
- Future GPC updates and version releases do not automatically apply to your recipe nodes.
- You can update your nodes to newer versions on a schedule that suits your workflow.
Tip
Try to update your GPC tag so you're never more than two versions behind.
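As an illustration, pinning the image tag in notebook.json might look like the following sketch (the exact schema of your notebook.json may differ; the surrounding keys shown here are assumptions):

```json
{
    "image-repo": "dk_general_purpose_container",
    "image-tag": "v0.167"
}
```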
docker-share directory¶
The docker-share directory is the runtime file system of a GPC. For processes that run in a GPC, this designated directory should be the only location that stores files and scripts to execute at runtime.
Don't rely on files located outside of the docker-share folder or on the internal file structure of the GPC itself. The system can behave unpredictably if a file or script used by a GPC is located outside of docker-share.
Best practice: referencing files¶
If a file saved in the docker-share directory is needed by a script at runtime, the script can reference that file:
- With the Python import statement from <file_name> import <function_name>, if the file is a .py module.
- As docker-share/<filename>, for any file type.
- Using os.path.join(os.path.dirname(__file__), "<filename>"), for any file type.
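To illustrate why the third option is robust, here is a minimal sketch (the resolve helper is hypothetical, not part of the GPC API): joining against os.path.dirname(__file__) yields a path anchored to the script itself, so it works regardless of the container's working directory.

```python
import os

def resolve(filename):
    # __file__ is the path of this script; dirname(...) is its folder.
    # Joining the filename against that folder produces an absolute
    # path that does not depend on the current working directory.
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), filename)

print(resolve("file.json"))
```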
Example¶
The following example shows a primary script, example.py (registered in the config.json GPC file), referencing three additional files. All five files are saved in the docker-share folder.
config.json
{
    "apt-dependencies": [],
    "dependencies": [],
    "keys": {
        "run_script": {
            "script": "example.py",
            "environment": {},
            "parameters": {},
            "export": []
        }
    }
}
example.py
import os
from utilities import handle_json_file, handle_text_file

# Resolve file.json relative to this script's own location.
json_path = os.path.join(os.path.dirname(__file__), "file.json")
with open(json_path) as file:
    handle_json_file(file)

# Reference data.txt by its docker-share path.
text_path = "docker-share/data.txt"
with open(text_path) as file:
    handle_text_file(file)
utilities.py
import json

def handle_json_file(file):
    print(json.load(file))

def handle_text_file(file):
    print(file.read())
file.json
data.txt
config.json¶
This system file identifies apps to be installed in the container, scripts to be executed in the container, variables to use in the scripts, and values to export for testing.
Best practice: additional packages¶
Use the "dependencies" property to specify a list of software packages to install in the container.
If you install additional Python packages, pin each one to a specific version using the requirement specifier syntax package==version. This ensures that containers do not attempt to download the latest package at order runtime. If your order runs use different versions of Python packages, you may see inconsistent results.
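For example, pinned dependencies in config.json might look like the following sketch (the package names and versions here are illustrative, not requirements of the GPC):

```json
{
    "apt-dependencies": [],
    "dependencies": ["requests==2.31.0", "pandas==1.5.3"]
}
```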
Note
Pinned versions can drift out of date as new package releases appear, which may lead to dependency discrepancies. Set up a schedule to routinely update package versions. See Pinned Package Dependencies for more information.