Recipe Nodes
Recipe nodes are isolated steps within a variation graph. Each node is a modular mini-process that performs work during an order run. To view all nodes in a recipe, select the Nodes tab. To see only the nodes in a particular variation, open that variation and view its graph.
Recipe nodes and their containers are defined and separated by the specific functions they perform. Multiple tools can be used within a single recipe node container. For example, one recipe node can include all of the tools, settings, data, code, and variables needed to collect a dataset, create a database schema, create a database table, and populate the new table with the collected data for downstream querying.
Tip
DataOps best practice: Reuse and containerize. Containerizing recipe graph nodes lets you use your favorite tools while working with Automation. Because recipe node processing occurs within a container, the node runs its tools in isolation from other nodes, and the node runs the same way anywhere it is deployed.
Node Editor
Set up and configure nodes from the Node Editor. To open the Node Editor for any node, select a specific kitchen > recipe > variation, then do one of the following:
- Select a node in the graph, then click Edit Node in the Node Details.
- Double-click a node in the graph.
In the Node Editor, the configuration and settings are organized into several tabs.
Note
The Node Editor is not available for nodes that use advanced Jinja. It cannot parse a node whose JSON files contain advanced Jinja that breaks standard JSON format. In that case, the File Editor opens instead so that the node can still be edited.
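One rough way to anticipate which editor will open is to check whether a node's JSON file still parses as standard JSON. The sketch below assumes that a plain `json.loads` check approximates what the Node Editor can parse; that equivalence is an assumption, not documented behavior. The function name, the keys, and the sample strings are hypothetical.

```python
import json


def parses_as_standard_json(text):
    """Return True if text is valid standard JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical example: Jinja inside a string value keeps the file valid JSON.
simple = '{"image-tag": "{{ image_tag }}"}'

# Hypothetical example: a Jinja block outside any string value breaks standard JSON.
advanced = '{"keys": {% if use_all %}["a", "b"]{% else %}["a"]{% endif %}}'

print(parses_as_standard_json(simple))    # True
print(parses_as_standard_json(advanced))  # False
```

A file that fails this check is a likely candidate for editing in the File Editor rather than the Node Editor.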
Node types
Automation provides six types of nodes, each designed for a distinct kind of work. Each node type is identified by the type property in its description.json file. Synchronize and Conditional nodes share the value DKNode_NoOp.
| Node type | Type property value | Notebook? | Data sources/sinks? | Description |
|---|---|---|---|---|
| Synchronize | DKNode_NoOp | file required; content optional | —/— | A node without any objects (key/value pairs). These nodes run without configured data sources or sinks. They are often used as a convergence node for graphs with parallel nodes upstream. See Synchronize Nodes for configuration information. |
| DataMapper | DKNode_DataMapper | optional | required/required | Maps data between data sources and data sinks. Source and target files can be mapped by explicitly configuring filenames, and sets of files can be mapped using wildcards. See Configure DataMapper Nodes for more information. |
| Action | DKNode_Action | required | required/— | Runs data sources to connect to infrastructure for use cases where data sinks are not needed. For example, connecting to a database to perform administrative operations. The /data-sources directory is named /actions for action nodes. See Configure Action Nodes for more information. |
| Container | DKNode_Container | required | optional/optional | Runs a Docker container based on parameterizable image names and tags. Code (using scripts and calling various tools) can be embedded and run within these nodes. DataKitchen provides a number of container images with useful features that may be leveraged directly or customized. The standard image is the General Purpose Container. See Create a Container Node and related topics for more information. |
| Ingredient | DKNode_Ingredient | required | —/— | Runs a recipe variation that has been declared as an ingredient. A distinct order run is created for the ingredient node, which is run in an auto-generated child kitchen. See Configure Ingredient Nodes for more information. |
| Conditional | DKNode_NoOp | file required; content optional | —/— | Serves as an operational decision point that determines the flow of graph processing. See Create a Conditional Node and Conditional Node Processing for more information. |
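As a minimal sketch of how the type property values in the table could be used, the following Python reads the text of a description.json file and reports which node types it may correspond to. The helper name `node_types_for` and the sample JSON are hypothetical and illustrate only the mapping in the table; they are not part of Automation. Because Synchronize and Conditional nodes share DKNode_NoOp, the type property alone cannot distinguish them.

```python
import json

# Type property values taken from the node types table.
# DKNode_NoOp is shared by Synchronize and Conditional nodes.
NODE_TYPES = {
    "DKNode_NoOp": ["Synchronize", "Conditional"],
    "DKNode_DataMapper": ["DataMapper"],
    "DKNode_Action": ["Action"],
    "DKNode_Container": ["Container"],
    "DKNode_Ingredient": ["Ingredient"],
}


def node_types_for(description_text):
    """Return the candidate node types for the text of a description.json file."""
    description = json.loads(description_text)
    # Unknown or missing type values yield an empty list.
    return NODE_TYPES.get(description.get("type"), [])


# Hypothetical description.json content; only the "type" key comes from the docs.
sample = '{"type": "DKNode_Container", "description": "example node"}'
print(node_types_for(sample))
```

Returning a list rather than a single name keeps the ambiguity of DKNode_NoOp explicit instead of silently picking one of the two node types.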