
Amazon S3

Amazon S3 data sources and sinks are in the file-based category of I/O connectors.

Tool documentation

Connector type values

The value to use for the "type" field in source or sink JSON files.

| Connector type | Value |
| --- | --- |
| Data Source | DKDataSource_S3 |
| Data Sink | DKDataSink_S3 |

Connection properties

The properties to use when connecting to Amazon S3 from Automation.

| Field | Scope | Type | Required? | Description |
| --- | --- | --- | --- | --- |
| bucket | source/sink | string | yes | Bucket name. |
| public-bucket | source/sink | Boolean | no | When the bucket is public, access-key and secret-key are not required. |
| access-key | source/sink | string | yes, if public-bucket is false | AWS access key. |
| secret-key | source/sink | string | yes, if public-bucket is false | AWS secret key. |
| aws-session-token | source/sink | string | no | AWS session token, optionally passed with the access and secret keys to assume an IAM (AWS Identity and Access Management) role. |
| region | source/sink | string | no | AWS region name. |
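For illustration, a kitchen-level s3config entry for a private bucket that assumes an IAM role via a session token might combine these properties as follows; the bucket name, region, and vault paths here are placeholders, not values from this document:

```json
{
    "s3config": {
        "bucket": "example-private-bucket",
        "access-key": "#{vault://s3/access_key}",
        "secret-key": "#{vault://s3/secret_key}",
        "aws-session-token": "#{vault://s3/session_token}",
        "region": "us-east-1"
    }
}
```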

Connections

See Connection Properties for more details on connection configurations.

Defined in kitchen-level variables

s3config in kitchen overrides

{
    "s3config": {
        "secret-key": "#{vault://s3/secret_key}",
        "access-key": "#{vault://s3/access_key}",
        "bucket": "#{vault://s3/bucket}"
    }
}

The Connection tab in a Node Editor

(Screenshot of the Connection tab)

Expanded connection syntax

For a data source

s3_datasource.json

{
    "type": "DKDataSource_S3",
    "name": "s3_datasource",
    "config": {
        "access-key": "{{s3config['access-key']}}",
        "secret-key": "{{s3config['secret-key']}}",
        "bucket": "{{s3config['bucket']}}"
    },
    "keys": {},
    "tests": {}
}

For a data sink

s3_datasink.json

{
    "type": "DKDataSink_S3",
    "name": "s3_datasink",
    "config": {
        "access-key": "{{s3config['access-key']}}",
        "secret-key": "{{s3config['secret-key']}}",
        "bucket": "{{s3config['bucket']}}"
    },
    "keys": {},
    "tests": {}
}

Condensed connection syntax

Note

Do not use quotes for condensed connection configuration variables.

For a data source

s3_datasource.json

{
    "type": "DKDataSource_S3",
    "name": "s3_datasource",
    "config-ref": "s3config",
    "keys": {},
    "tests": {}
}

For a data sink

s3_datasink.json

{
    "type": "DKDataSink_S3",
    "name": "s3_datasink",
    "config-ref": "s3config",
    "keys": {},
    "tests": {}
}

Local connections

S3 bucket contents can be viewed locally by configuring connections with file-transfer applications like Transmit.

Other configuration properties

See the related topics for common properties, wildcards, and runtime variables.

File encoding requirements

Files used with data sources and data sinks must be encoded in UTF-8; non-Unicode characters can cause problems when sinking data to database tables and errors when running related tests.

For CSV and other delimited files, use Save As in the originating program and select UTF-8 encoding, or use a text editor with encoding options.
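As an illustration (not part of the product), a file saved in another encoding can be re-encoded to UTF-8 programmatically before a data source reads it. The source encoding below ("latin-1") is an assumption; substitute the actual encoding of your file.

```python
def reencode_to_utf8(src_path: str, dst_path: str, src_encoding: str = "latin-1") -> None:
    """Read a text file in its original encoding and write a UTF-8 copy.

    The default source encoding is an assumption for illustration;
    pass the real encoding of the input file.
    """
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

Running this once over each delimited file before it enters the recipe avoids the non-Unicode issues described above.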

Data source examples

Example source 1

s3_datasource.json

{
    "type": "DKDataSource_S3",
    "name": "s3_datasource",
    "config": {{s3config}},
    "keys": {
        "example-key": {
            "file-key": "",
            "decrypt-key": "",
            "decrypt-passphrase": ""
        }
    },
    "tests": {}
}

Example source 2

s3_datasource.json

{
    "type": "DKDataSource_S3",
    "name": "s3_datasource",
    "public-bucket": true,
    "bucket": "datakitchen-public",
    "wildcard": "*.csv",
    "wildcard-key-prefix": "dk-public/",
    "set-runtime-vars": {
        "key_count": "total_csv",
        "size": "real_size"
    },
    "tests": {
        "test-key-count": {
            "test-logic": {
                "test-compare": "equal-to",
                "test-metric": 3
            },
            "action": "stop-on-error",
            "type": "test-contents-as-integer",
            "test-variable": "total_csv",
            "keep-history": false
        }
    }
}

Example source 3

To concatenate all files in a given path, create a key with the name of the path and set the value of file-key to CONCATENATE_ALL_FILES.

s3_datasource.json

{
    "type": "DKDataSource_S3",
    "name": "s3_datasource",
    "public-bucket": true,
    "bucket": "datakitchen-public",
    "wildcard": "",
    "set-runtime-vars": {
        "key_count": "total_csv_concat"
    },
    "keys": {
        "dk-public/concat_file_test/": {
            "file-key": "CONCATENATE_ALL_FILES"
        }
    },
    "tests": {
        "test-key-count-concat": {
            "test-logic": {
                "test-compare": "equal-to",
                "test-metric": 1
            },
            "action": "stop-on-error",
            "type": "test-file-count",
            "test-variable": "total_csv_concat",
            "keep-history": false
        }
    }
}

Data sink examples

Example sink 1

Push all files (matching the * wildcard) within the vendor_name/ directory to a public S3 bucket.

s3_datasink.json

{
    "type": "DKDataSink_S3",
    "name": "s3_datasink",
    "public-bucket": true,
    "bucket": "my-public-bucket",
    "wildcard": "*",
    "wildcard-key-prefix": "vendor_name/",
    "keys": {}
}

Example sink 2

s3_datasink.json

{
    "type": "DKDataSink_S3",
    "name": "s3_datasink",
    "config": {{s3config}},
    "keys": {
        "raw/vendor_name/{{vendor_name_date}}/accounts": {
            "file-key": "",
            "encrypt-key": "#{vault://secret-key}"
        }
    },
    "tests": {}
}