General Purpose Container¶
Automation container nodes work by referencing container images that become Docker containers at runtime, allowing you to build recipes with processes that are containerized and portable.
DataKitchen provides a pre-configured container node called the General Purpose Container (GPC). The GPC is available to use as a base container or to reference when building your own custom containers.
The GPC:
- Sources existing assets, configurations, and settings.
- Uses Python 3 exclusively.
  - Versions of the GPC specific to Python 3.9 and Python 3.10 are available, noted as v0.###_python3.9 and v0.###_python3.10.
- Is set up to handle many basic use cases, such as running Python or shell scripts, provisioning with Ansible, and generating data visualizations in Tableau.
- Supports running .sh, .py, and .ipynb scripts located in the node's docker-share directory.
- Has pre-installed tools, such as pandas and NumPy, to perform common data analysis actions or work with cloud computing services.
- Supports lists of parameters passed into the container through the config.json file.
- Defines a standard structure for installing dependencies, passing parameters and variables, retrieving files, and logging.
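As a rough illustration of the parameter-passing idea in the list above, the following Python sketch parses a JSON payload and checks a script name against the supported file types. The key names ("script", "parameters"), the example values, and both helper functions are assumptions made for this sketch; they are not the documented config.json schema, which is described in GPC File Structure and Configuration.

```python
import json

# Hypothetical payload: the key names and values are assumptions for
# illustration only, not the documented GPC config.json schema.
EXAMPLE_CONFIG = """
{
    "script": "report.py",
    "parameters": {"region": "emea", "row_limit": 100}
}
"""


def load_parameters(config_text):
    """Parse a JSON config string and return (script name, parameters dict)."""
    config = json.loads(config_text)
    return config["script"], config.get("parameters", {})


def is_supported_script(name):
    """Check the file extension against the script types the GPC runs."""
    return name.endswith((".sh", ".py", ".ipynb"))


if __name__ == "__main__":
    script, params = load_parameters(EXAMPLE_CONFIG)
    print(script, is_supported_script(script), params)
```

The sketch keeps parsing and validation in small functions so that the same checks could be reused for any script placed in the docker-share directory.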
Tip
Given its configuration, the GPC lets users get started quickly in DataOps and use Docker containers without prior expertise. Users don't have to write custom code or make complex setup decisions.
Get the GPC¶
The GPC is publicly available as a container image on Docker Hub.
Pre-installed packages¶
The GPC comes with several pre-installed packages. For the full list, see GPC Pre-Installed Packages.
You can reference any of the pre-installed packages in your scripts. Additionally, the GPC configuration lets you specify other apt-get and Python packages to be installed at order runtime, so you can iterate quickly without having to build new containers.
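One way to picture runtime package installation is a configuration that carries two lists, one for apt-get packages and one for Python packages. The minimal sketch below builds such a structure in Python and prints it as JSON. The key names "apt-dependencies" and "dependencies", the build_config helper, and the example package names are assumptions for illustration; confirm the actual field names in GPC File Structure and Configuration before using them.

```python
import json


def build_config(apt_packages, python_packages):
    """Return a config dict listing extra packages to install at runtime.

    The "apt-dependencies" and "dependencies" key names are assumptions
    made for this sketch; confirm them against the GPC documentation.
    """
    return {
        # Deduplicate and sort so the generated file is stable across runs.
        "apt-dependencies": sorted(set(apt_packages)),
        "dependencies": sorted(set(python_packages)),
    }


if __name__ == "__main__":
    config = build_config(["curl"], ["requests", "requests"])
    print(json.dumps(config, indent=4))
```

Generating the lists from code rather than editing them by hand makes it easier to add or remove a package between order runs without rebuilding the container image.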
Container configuration¶
When you add any container node to a recipe graph—either in the user interface or the command line interface—you must configure the connection to use the correct GPC parameters.
For information on how to configure a GPC node, see GPC File Structure and Configuration.