Amazon S3 Sink

Learn more about the Amazon S3 Sink connector

The Amazon S3 Sink connector allows users to stream Kafka data into S3 buckets.

Configuration example

A simple configuration example for the Amazon S3 Sink connector.

The following is a simple configuration example for the Amazon S3 Sink connector. Short descriptions of the properties set in this example are also provided. For a full properties reference, see the Amazon S3 Sink properties reference.

{
    "aws.s3.bucket": "bring-me-the-bucket",
    "aws.s3.service_endpoint": "http://myendpoint:9090/",
    "aws.access_key_id": "EXAMPLEID",
    "aws.secret_access_key": “EXAMPLEKEY",
    "connector.class": "com.cloudera.dim.kafka.connect.s3.S3SinkConnector",
    "tasks.max": 1,
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
    "value.converter.passthrough.enabled": true,
    "value.converter.schema.registry.url": "http://schema-registry:9090/api/v1",
    "topics": "avro_topic",
    "output.storage": "com.cloudera.dim.kafka.connect.s3.S3PartitionStorage",
    "output.writer": "com.cloudera.dim.kafka.connect.partition.writers.avro.AvroPartitionWriter",
    "output.avro.passthrough.enabled": true
  }

aws.s3.bucket: Target S3 bucket name.
aws.s3.service_endpoint: Target S3 host and port.
aws.access_key_id: The AWS secret key ID used for authentication.
aws.secret_access_key: The AWS secret access key used for authentication.
connector.class: Class name of the Amazon S3 Sink connector.
tasks.max: Maximum number of tasks.
key.converter: The converter capable of understanding the data format of the key of each record on this topic.
value.converter: The converter capable of understanding the data format of the value of each record on this topic.
note
When the AvroConverter is used, you can specify Schema Registry properties to be used by the AvroConverter’s Schema Registry client. This is done by adding the required Schema Registry property as a suffix to the value.converter property. For example, value.converter.schema.registry.url. Properties defined this way are passed on to the Schema Registry client used by the AvroConverter.
value.converter.passthrough.enabled: This property controls whether or not data is converted into the Kafka Connect intermediate data format before writing into an output file. Because in this example the input and output format is the same, the property is set to true, that is, data is not converted.
value.converter.schema.registry.url: The URL to Schema Registry. This is a mandatory property if the topic has records encoded in Avro format.
topics: List of topics to consume data from.
output.storage: The S3 storage implementation class.
output.writer: Determines the output file format. Because in this example the output format is Avro, AvroPartitionWriter is used.
output.avro.passthrough.enabled: This property has to match the configuration of the value.converter.passthrough.enabled property because both the input and output formats are Avro.

Amazon S3 Sink properties reference

Amazon S3 Sink connector properties reference.

The following table collects connector properties that are specific for the Amazon S3 Sink Connector. For properties common to all sink connectors, see the upstream Apache Kafka documentation.


Property Name	Description	Type	Default Value	Accepted Values	Recommended Value
aws.s3.bucket	The target S3 bucket name.	String	none	Any valid S3 bucket name.
aws.s3.service_endpoint	The target S3 host and port.	String	none	Any valid S3 endpoint.
aws.access_key_id	The AWS secret key ID to authenticate.	String	none	Any valid secret key issued by AWS.
aws.secret_access_key	The AWS secret access key to authenticate.	String	none	Any valid access key issued by AWS.
value.converter	Value conversion class.	String	none		com.cloudera.dim.kafka.connect.converts.AvroConverter
value.converter.passthrough.enabled	Configures whether the AvroConverter translates an Avro record into Kafka Connect Data or transparently passes the Avro encoded bytes as payload.	Boolean	true	true, false	True if input and output are both Avro.
value.converter.schema.registry.url	The URL to the Schema Registry server.	String	none
output.storage	The S3 storage implementation class.	String	none		com.cloudera.dim.kafka.connect.s3.S3PartitionStorage
output.writer	The output file writer which determines the type of file to be written. The value of this property should be the FQCN of a class that implements the `PartitionWriter` interface.	String	none	com.cloudera.dim.kafka.connect.partition.writers.avro.AvroPartitionWriter com.cloudera.dim.kafka.connect.partition.writers.json.JsonPartitionWriter com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter com.cloudera.dim.kafka.connect.partition.writers.txt.TxtPartitionWriter	com.cloudera.dim.kafka.connect.partition.writers.avro.AvroPartitionWriter
output.avro.passthrough.enabled	Configures Whether the output writer expects an Avro encoded Kafka Connect data record. Must match the configuration of`value.converter.passthrough.enabled`.	Boolean	none	true, false	True if input and output are both Avro.