Azure Blob Storage¶
Azure Blob Storage data sources and sinks are in the file-based category of I/O connectors.
Tool documentation¶
Connector type values¶
The "type" value to use in the source or sink JSON file.
| Connector type | Value |
|---|---|
| Data Source | DKDataSource_AzureBlob |
| Data Sink | DKDataSink_AzureBlob |
Connection properties¶
The properties to use when connecting to an Azure Blob instance from Automation.
| Field | Scope | Type | Required? | Description |
|---|---|---|---|---|
| connection_string | source/sink | string | yes | Secret access string used to identify Azure account and permissions. |
| container | source/sink | string | yes | Name of container to operate on. |
Connections¶
See Connection Properties for more details on connection configurations.
Defined in kitchen-level variables¶
azureblobConfig in Kitchen Overrides
{
    "azureblobConfig": {
        "connection_string": "#{vault://azure/connection-string}",
        "container": "datakitchen-staging"
    }
}
The Connection tag in a Node Editor¶

Expanded connection syntax¶
For a data source¶
azureblob_datasource.json
{
    "type": "DKDataSource_AzureBlob",
    "name": "azureblob_datasource",
    "config": {
        "connection_string": "{{azureblobConfig.connection_string}}",
        "container": "{{azureblobConfig.container}}"
    },
    "keys": {
        "blob_source": {
            "file-key": "test_upload.json",
            "use-only-file-key": true,
            "set-runtime-vars": {
                "md5": "pre_upload_md5"
            }
        }
    }
}
For a data sink¶
azureblob_datasink.json
{
    "type": "DKDataSink_AzureBlob",
    "name": "azureblob_datasink",
    "config": {
        "connection_string": "{{azureblobConfig.connection_string}}",
        "container": "{{azureblobConfig.container}}"
    },
    "keys": {
        "blob_sink": {
            "file-key": "test_upload.json",
            "use-only-file-key": true,
            "overwrite-blob": true,
            "set-runtime-vars": {
                "md5": "post_download_md5"
            }
        }
    }
}
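The {{azureblobConfig.container}} style references in the expanded syntax are filled in from kitchen-level variables, so the name after the dot has to match a key of the azureblobConfig object exactly. The Python sketch below is only a stand-in for that substitution, not the platform's templating engine: the resolve function, its regular expression, and the placeholder connection string are all hypothetical.

```python
import re

# Hypothetical stand-in for variable substitution; the real engine is not
# described on this page.
_REFERENCE = re.compile(r"\{\{\s*([\w.\-]+)\s*\}\}")


def resolve(template: str, variables: dict) -> str:
    """Replace each {{a.b}} reference with the value at variables["a"]["b"]."""

    def lookup(match):
        node = variables
        for part in match.group(1).split("."):
            node = node[part]  # raises KeyError when a name does not match exactly
        return str(node)

    return _REFERENCE.sub(lookup, template)


# Mirrors the shape of the azureblobConfig override, with a placeholder secret.
overrides = {
    "azureblobConfig": {
        "connection_string": "<connection string>",
        "container": "datakitchen-staging",
    }
}

print(resolve("{{azureblobConfig.container}}", overrides))
```

In this sketch a reference whose key is spelled differently from the override (for example a hyphen in place of an underscore) fails with a KeyError instead of resolving.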
Condensed connection syntax¶
For a data source¶
azureblob_datasource.json
{
    "type": "DKDataSource_AzureBlob",
    "name": "azureblob_datasource",
    "config-ref": "azureblobConfig",
    "keys": {},
    "tests": {}
}
For a data sink¶
azureblob_datasink.json
{
    "type": "DKDataSink_AzureBlob",
    "name": "azureblob_datasink",
    "config-ref": "azureblobConfig",
    "keys": {},
    "tests": {}
}
Local connections¶
You can find access keys and connection strings in your Azure Storage account settings. Access keys are the credentials for your storage account, and a connection string contains the information Automation needs to connect to the account and access its data.
See Microsoft instructions to view and copy a connection string.
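Before storing a copied connection string in a vault, it can help to confirm that it has the expected shape. The Python sketch below assumes the account-key form of an Azure Storage connection string, which is a semicolon-separated list of Name=Value pairs that includes AccountName and AccountKey. The helper names and the example values are hypothetical, and a connection string based on a shared access signature would carry different fields.

```python
# Fields expected in the account-key form of a connection string (assumption:
# shared access signature strings use other fields and would fail this check).
REQUIRED_FIELDS = ("AccountName", "AccountKey")


def parse_connection_string(conn_str: str) -> dict:
    """Split 'Name=Value;Name=Value' into a dict, keeping '=' inside values."""
    fields = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        name, separator, value = segment.partition("=")
        if not separator or not name:
            raise ValueError(f"malformed segment: {segment!r}")
        fields[name] = value
    return fields


def missing_fields(conn_str: str) -> list:
    """Return the required field names that are absent from the string."""
    fields = parse_connection_string(conn_str)
    return [name for name in REQUIRED_FIELDS if name not in fields]


# Made-up example; the key is not a real credential.
example = (
    "DefaultEndpointsProtocol=https;AccountName=exampleaccount;"
    "AccountKey=ZmFrZWtleQ==;EndpointSuffix=core.windows.net"
)
print(parse_connection_string(example)["AccountName"], missing_fields(example))
```

The check only inspects the string locally; it does not contact Azure or confirm that the key is valid.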
Other configuration properties¶
See the following topics for common properties, wildcards, and runtime variables:
Additional Azure Blob Storage step (key) properties¶
| Field | Type | Required? | Description |
|---|---|---|---|
| overwrite-blob | Boolean | no | Supported for Azure Blob Storage data sinks only. Determines if the upload to a data sink should overwrite any existing blobs. Default value is false. |
File encoding requirements¶
Files used with data sources and data sinks must be encoded in UTF-8. Non-Unicode characters can cause problems when sinking data to database tables and errors when running related tests.
For CSV and other delimited files, use Save As in the program that produced the file and select UTF-8 encoding, or use a text editor with encoding options.
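When many files are involved, the same check and conversion can be scripted. The following is a minimal Python sketch, assuming you already know the file's current encoding; the function names and file names are hypothetical, and the helpers read each file fully into memory, so they suit small and medium files.

```python
from pathlib import Path


def is_utf8(path) -> bool:
    """Return True when the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(src, dst, source_encoding: str) -> None:
    """Decode src using its known encoding and write the text to dst as UTF-8."""
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, a file exported as Latin-1 could be converted with reencode_to_utf8("export.csv", "export_utf8.csv", "latin-1") and then confirmed with is_utf8("export_utf8.csv").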
Data source example¶
The AzureBlob data source below loads all JSON blob files present in the wildcard/ directory with a wildcard key.
It also loads the specific test_upload.json blob with a file key. The azureblobConfig
variable defines the source account and container for these files.
The source, when finished loading the file, stores the file’s md5 hash in the post_download_md5 runtime variable.
As a file integrity test, the source then compares post_download_md5 to a predefined
pre_upload_md5 variable.
source.json
{
    "name": "source",
    "type": "DKDataSource_AzureBlob",
    "config-ref": "azureblobConfig",
    "wildcard": "*.json",
    "wildcard-key-prefix": "wildcard/",
    "keys": {
        "azure_source": {
            "file-key": "test_upload.json",
            "use-only-file-key": true,
            "set-runtime-vars": {
                "md5": "post_download_md5"
            }
        }
    },
    "tests": {
        "verify_data": {
            "action": "stop-on-error",
            "test-variable": "pre_upload_md5",
            "type": "test-contents-as-string",
            "test-logic": "pre_upload_md5 == {{post_download_md5}}"
        }
    }
}
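To make the wildcard settings in this example concrete, the Python sketch below shows which blob names a "*.json" pattern under the "wildcard/" prefix could select. It rests on assumptions that this page does not confirm: glob-style matching as implemented by Python's fnmatch, the prefix being removed before matching, and blobs in nested paths being excluded. The select_blobs function and the blob names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_blobs(blob_names, prefix, pattern):
    """Return the names under prefix whose remainder matches the glob pattern."""
    selected = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        # Assumption: only blobs directly under the prefix count, so nested
        # paths are skipped.
        if "/" in remainder:
            continue
        if fnmatchcase(remainder, pattern):
            selected.append(name)
    return selected


names = [
    "wildcard/a.json",
    "wildcard/b.csv",
    "wildcard/nested/c.json",
    "test_upload.json",
]
print(select_blobs(names, "wildcard/", "*.json"))
```

In the source above, test_upload.json is picked up separately through its file key, not through the wildcard.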
You could also run a test to compare the pre_upload and post_download runtime variables. See Tests for more information and examples.
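The integrity check depends on having an md5 value to compare against. The Python sketch below computes one for a local file, assuming the md5 runtime variable holds a hex-encoded digest of the file contents. This page does not state the encoding, so verify it against a real run before relying on the comparison. The function name is hypothetical.

```python
import hashlib


def file_md5_hex(path, chunk_size=1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_md5_hex("test_upload.json") on the local copy gives a value that could be stored as pre_upload_md5. Here MD5 serves only as a checksum for detecting accidental corruption, not as a security control.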
Data sink example¶
The AzureBlob data sink below uploads a single file named test_upload.json to a blob on the Azure account
and container defined by the azureblobConfig variable. If a blob exists at the specified location, it is
overwritten, as indicated by the value of the overwrite-blob attribute. After uploading, the sink stores
the md5 hash of the file in the pre_upload_md5 runtime variable for later use.
sink.json