HTTP Sink connector
The HTTP Sink connector is a Stateless NiFi dataflow developed by Cloudera that runs in the Kafka Connect framework. Learn about the connector, its properties, and its configuration.
The HTTP Sink connector obtains messages from a Kafka topic and transfers their content in HTTP POST requests to a specified endpoint. The topic the connector receives messages from is determined by the value of the topics parameter in the configuration. The connector can forward the data it reads from Kafka as is (raw data) or can be configured to perform record processing. When record processing is enabled, the connector expects the incoming data in Avro or JSON format. In the case of Avro, the connector can either read the record schema from the Avro file it receives (provided that the schema is embedded) or fetch the schema from Schema Registry. In the case of JSON, it can either infer the schema or fetch it from Schema Registry. The strategy used to retrieve the schema is determined by the Schema Access Strategy property.
If Schema Registry is used, and it is on a Kerberized cluster, the krb5.file property must point to the krb5.conf file that provides access to the cluster on which Schema Registry is present. This means that the krb5.conf file must be on the same cluster node that the connector runs on. The connections to Schema Registry and the HTTP server can be secured by TLS. The keystore and truststore files necessary for securing these connections must also be on the same cluster node that the connector runs on.
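For example, if record processing fetches schemas from a Schema Registry on a Kerberized cluster, the configuration might include an excerpt like the following. This is an illustrative sketch only; the bracketed values are placeholders you must replace:
"krb5.file": "[***PATH TO KRB5.CONF FILE***]",
"parameter.HTTP Sink Parameters:Kerberos Principal for Schema Registry": "[***PRINCIPAL***]",
"parameter.HTTP Sink Parameters:Kerberos Keytab for Schema Registry": "[***PATH TO KEYTAB***]"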
Properties and configuration
Configuration is passed to the connector in a JSON file during creation. The properties of the connector can be categorized into three groups. These are as follows:
- Common connector properties
- These are the properties of the Kafka Connect framework that are accepted by all connectors. For a comprehensive list of these properties, see the Apache Kafka documentation.
- Stateless NiFi Sink properties
- These are the properties that are specific to the Stateless NiFi Sink connector. All Stateless NiFi Sink connectors share and accept these properties. For a comprehensive list of these properties, see the Stateless NiFi Sink properties reference.
- Connector/dataflow-specific properties
- These properties are unique to this specific connector or, to be more precise, to the dataflow running within the connector. These properties use the following prefix:
parameter.[***CONNECTOR NAME***] Parameters:
For a comprehensive list of these properties, see the HTTP Sink properties reference.
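For example, the Configuration example later in this document sets the connector's Remote URL dataflow parameter with the following entry (the bracketed values are placeholders):
"parameter.HTTP Sink Parameters:Remote URL": "http://[***SERVER HOSTNAME***]:[***PORT***]/[***PATH***]"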
Notes and limitations
- Required properties must be assigned a valid value even if they are not used in the particular configuration. If a required property is not used, either leave its default value, or completely remove the property from the configuration JSON.
- If a property that has a default value is completely removed from the configuration JSON, the system uses the default value.
- Properties not marked as required must be completely removed from the configuration JSON if not set.
- Schema Branch and Schema Version cannot be specified at the same time.
- The value of the Schema Access Strategy property is not independent of the value of the Kafka Message Data Format property. As a result, you must exercise caution when configuring Schema Access Strategy. A configuration excerpt showing a valid combination is provided after this list.
- If the value of Kafka Message Data Format is AVRO, the possible values for Schema Access Strategy are Schema Registry, Embedded Schema, or HWX Content-Encoded Schema Reference.
- If the value of Kafka Message Data Format is JSON, the possible values for Schema Access Strategy are Schema Registry or Infer Schema.
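For example, the following excerpt is a valid combination for Avro messages that carry an embedded schema. The prefix assumes the connector was deployed with the default HTTP Sink flow name used in the Configuration example below:
"parameter.HTTP Sink Parameters:Forward Raw Data": "false",
"parameter.HTTP Sink Parameters:Kafka Message Data Format": "AVRO",
"parameter.HTTP Sink Parameters:Schema Access Strategy": "Embedded Schema"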
Configuration example
In this example, the connector receives data in any format and forwards the raw data as the content of an HTTP POST request.
{
"connector.class": "org.apache.nifi.kafka.connect.StatelessNiFiSinkConnector",
"meta.smm.predefined.flow.name": "HTTP Sink",
"meta.smm.predefined.flow.version": "1.0.0",
"key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"tasks.max": "1",
"nexus.url": "https://repository.cloudera.com/artifactory/repo",
"extensions.directory": "/tmp/nifi-stateless-extensions",
"working.directory": "/tmp/nifi-stateless-working",
"input.port": "Input from Kafka",
"failure.ports": "Failure",
"topics": "[***KAFKA TOPIC NAME***]",
"parameter.HTTP Sink Parameters:Forward Raw Data": "true",
"parameter.HTTP Sink Parameters:Remote URL": "http://[***SERVER HOSTNAME***]:[***PORT***]/[***PATH***]"
}
topics
- The name of the Kafka topic the connector fetches messages from.
Forward Raw Data
- Specifies whether messages from Kafka should be forwarded as is or converted to JSON. In this example, the property is set to true, meaning that the connector does not process any records. It forwards incoming data as is.
Remote URL
- Identifies the HTTP endpoint that receives the messages sent by this connector. In this example, SSL is not used. As a result, the URL starts with http, not https. For example, http://my.http-server.com:22000/contentListener.
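For comparison, the following sketch shows how the same connector could be configured to process records instead of forwarding raw data. In this illustrative variant, the schema of incoming JSON messages is inferred and the output is grouped into a JSON array before being posted. All property names are documented in the HTTP Sink properties reference; the bracketed values are placeholders:
{
"connector.class": "org.apache.nifi.kafka.connect.StatelessNiFiSinkConnector",
"meta.smm.predefined.flow.name": "HTTP Sink",
"meta.smm.predefined.flow.version": "1.0.0",
"key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"tasks.max": "1",
"nexus.url": "https://repository.cloudera.com/artifactory/repo",
"extensions.directory": "/tmp/nifi-stateless-extensions",
"working.directory": "/tmp/nifi-stateless-working",
"input.port": "Input from Kafka",
"failure.ports": "Failure",
"topics": "[***KAFKA TOPIC NAME***]",
"parameter.HTTP Sink Parameters:Forward Raw Data": "false",
"parameter.HTTP Sink Parameters:Kafka Message Data Format": "JSON",
"parameter.HTTP Sink Parameters:Schema Access Strategy": "Infer Schema",
"parameter.HTTP Sink Parameters:Output Grouping for JSON": "output-array",
"parameter.HTTP Sink Parameters:Remote URL": "http://[***SERVER HOSTNAME***]:[***PORT***]/[***PATH***]"
}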
Stateless NiFi Sink properties reference
Review the following reference for a comprehensive list of the connector properties that are specific to the Stateless NiFi Sink connector.
In addition to the properties listed here, Stateless NiFi connectors also accept the properties of the Kafka Connect framework. For a comprehensive list of these properties, see the Apache Kafka documentation.
attribute.prefix
- Description
- The prefix to add to the key of each header that matches the regular expression specified in headers.as.attributes.regex. For example, if the header key is MyHeader, its value is MyValue, headers.as.attributes.regex is set to My.*, and this property is set to kafka, the flowfile that is created for the Kafka message will have an attribute named kafka.MyHeader with a value of MyValue.
. - Default Value
- Accepted Values
- Required
- false
dataflow.timeout
- Description
- Specifies the maximum amount of time to wait for the dataflow to complete. If the dataflow does not complete before this timeout, the thread is interrupted and the dataflow is considered a failure. The session is rolled back and the connector retriggers the flow. Defaults to 60 seconds if not specified.
- Default Value
- 60 seconds
- Accepted Values
- Required
- false
extensions.directory
- Description
- Specifies the directory that stores downloaded extensions. Extensions are the NAR (NiFi Archive) files containing the processors and controller services a flow might use. Since Stateless NiFi is only the NiFi engine, it does not contain any of the processors and controller services you might use in your flow. When deploying the connector with a custom flow, the system needs to download the specific extensions that your flow uses from Nexus (unless they are already present in this directory). These extensions are stored in this directory. Because the default directory might not be writable, and to aid in upgrade scenarios, Cloudera recommends that you always specify an extensions directory.
- Default Value
- /tmp/nifi-stateless-extensions
- Accepted Values
- Required
- true
failure.ports
- Description
- A comma-separated list of output ports that are considered failure conditions. If any flowfile is routed to an output port specified in this property, the dataflow is considered a failure and the session is rolled back. After a set amount of time, the dataflow reattempts to process the Kafka record. Any data transferred to an output port that is not in the list of failure ports is discarded.
Because of how Stateless NiFi Sink connectors behave, even if a single flowfile ends up in an output port that is marked as failure, the entire session is rolled back with all messages in the batch. Furthermore, if a flowfile ends up in a failure port in each subsequent iteration, the result is an endless loop. With some sink connectors (for example, MQTT Sink) this is the desired behavior. For more information regarding this behavior, see Dataflow execution and scheduling.
- Default Value
- Accepted Values
- Required
- false
flow.snapshot
- Description
- Specifies the dataflow to run. When using Streams Messaging Manager to deploy a connector, the value you set in this property must be a JSON object. URLs, file paths, or escaped JSON strings are not supported when using Streams Messaging Manager. Alternatively, if using the Kafka Connect REST API to deploy a connector, this can be a file containing the dataflow, a URL that points to a dataflow, or a string containing the entire dataflow as an escaped JSON. Cloudera, however, does not recommend using the Kafka Connect REST API to interact with this connector or Kafka Connect.
- Default Value
- Accepted Values
- Required
- true
headers.as.attributes.regex
- Description
- A Java regular expression that is evaluated against all Kafka record headers. Headers are added to the flowfile as an attribute if the header key matches the regular expression. The header key is used as the attribute name. The header value is used as the attribute value. Additionally, the name of the attribute can also contain an optional prefix which is defined by the attribute.prefix property.
property. - Default Value
- Accepted Values
- Required
- false
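For example, the following excerpt (with illustrative values) forwards all headers whose keys start with My and prefixes the resulting attribute names with kafka:
"headers.as.attributes.regex": "My.*",
"attribute.prefix": "kafka"
With this configuration, a header named MyHeader becomes a flowfile attribute named kafka.MyHeader, as described in the attribute.prefix entry above.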
input.port
- Description
- The name of the input port in the NiFi dataflow that Kafka records are sent to. If the dataflow contains exactly one input port, this property is optional and can be omitted. However, if the dataflow contains multiple input ports, this property must be specified.
- Default Value
- Accepted Values
- Required
- false
krb5.file
- Description
- Specifies the krb5.conf file to use if the dataflow interacts with any services that are secured using Kerberos. Defaults to /etc/krb5.conf if not specified.
if not specified. - Default Value
- /etc/krb5.conf
- Accepted Values
- Required
- false
name
- Description
- The name of the connector. On the Streams Messaging Manager UI, the connector names are specified using the Enter Name field. The name that you enter in the Enter Name field is automatically set as the value of the name property when the connector is deployed. Because of this, the name property is omitted from the configuration template provided in Streams Messaging Manager. If you manually add the name property to the configuration in Streams Messaging Manager, ensure that the value you set matches the connector name specified in the Enter Name field. Otherwise, the connector fails to deploy.
- Default Value
- Accepted Values
- Required
- true
nexus.url
- Description
- Specifies the Base URL of the Nexus instance to source extensions from. If configuring a Nexus instance that has multiple repositories, include the name of the repository in the URL. For example, https://nexus-private.myorganization.org/nexus/repository/my-repository/. If the property is not specified, the necessary extensions (the ones used by the flow) must be provided in the extensions directory before deploying the connector.
- Accepted Values
- Required
- true
parameter.[***FLOW PARAMETER NAME***]
- Description
- Specifies a parameter to use in the dataflow. For example, assume that you have the following entry in your connector configuration: "parameter.Directory": "/mydir". In a case like this, any parameter context in the dataflow that has a parameter named Directory gets the specified value (/mydir). If the dataflow has child process groups, and those child process groups have their own parameter contexts, the value is used for all parameter contexts that contain a parameter named Directory. Parameters can also be applied to specific parameter contexts only. This can be done by prefixing the parameter name (Directory) with the name of the parameter context followed by a colon. For example, parameter.My Context:Directory only applies the specified value for the Directory parameter in the parameter context named My Context.
- Accepted Values
- Required
- false
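For example, the following two entries set a Directory parameter globally and for a single parameter context. Both the parameter context name (My Context) and the values are hypothetical:
"parameter.Directory": "/mydir",
"parameter.My Context:Directory": "/mydir-for-my-context"
The first entry applies to every parameter context that defines Directory; the second applies only to the parameter context named My Context.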
working.directory
- Description
- Specifies a directory on the Connect server that NiFi should use for unpacking extensions that it needs to perform the dataflow. The contents of extensions.directory are unpacked here. Defaults to /tmp/nifi-stateless-working if not specified.
if not specified. - Default Value
- /tmp/nifi-stateless-working
- Accepted Values
- Required
- false
HTTP Sink properties reference
Review the following reference for a comprehensive list of the connector properties that are specific to the HTTP Sink connector.
The properties listed in this reference must be added to the connector configuration with the following prefix:
parameter.[***CONNECTOR NAME***] Parameters:
In addition to the properties listed here, this connector also accepts certain properties of the Kafka Connect framework as well as the properties of the NiFi Stateless Sink connector. When creating a new connector using the Streams Messaging Manager UI, all valid properties are presented in the default configuration template. You can view the configuration template to get a full list of valid properties. In addition, for more information regarding the accepted properties not listed here, you can review the Apache Kafka documentation and the Stateless NiFi Sink properties reference.
Basic Authentication Password
- Description
- The password to be used for authentication when connecting to the URL specified in the Remote URL property. If an authentication method other than Basic Authentication is used, this property must be completely removed from the configuration JSON.
- Accepted Values
- Required
- false
Basic Authentication Username
- Description
- The username used for authentication when connecting to the URL specified in the Remote URL property. Cannot include control characters (0-31), ':', or DEL (127). If an authentication method other than Basic Authentication is used, this property must be completely removed from the configuration JSON.
- Accepted Values
- Required
- false
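For example, to enable Basic Authentication, the configuration might include an excerpt like the following (the bracketed values are placeholders):
"parameter.HTTP Sink Parameters:Basic Authentication Username": "[***USERNAME***]",
"parameter.HTTP Sink Parameters:Basic Authentication Password": "[***PASSWORD***]"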
Content-Type
- Description
- The Content-Type to specify in the HTTP header of the POST request sent by this connector.
- Default Value
- application/octet-stream
- Accepted Values
- Required
- true
Date Format
- Description
- Specifies the format to use when reading date fields from JSON. If Forward Raw Data is set to false, the format defined here also applies to the date fields in the output JSON message.
- yyyy-MM-dd
- Accepted Values
- Required
- true
Forward Raw Data
- Description
- Specifies whether messages from Kafka should be forwarded as is or converted to JSON. If set to false, the Kafka Message Data Format parameter must be specified.
- true
- Accepted Values
- true, false
- Required
- true
Kafka Message Data Format
- Description
- Specifies the format of the messages the connector receives from Kafka. If the Forward Raw Data property is set to true, this property is ignored. However, even in a case like this, this property must be assigned a valid value.
- AVRO
- Accepted Values
- AVRO, JSON
- Required
- true
Kerberos Keytab for Schema Registry
- Description
- The fully-qualified filename of the Kerberos keytab associated with the principal for accessing Schema Registry.
- Default Value
- The location of the default keytab which is empty and can only be used for unsecure connections.
- Accepted Values
- Required
- true
Kerberos Principal for Schema Registry
- Description
- The Kerberos principal used for authenticating to Schema Registry.
- Default Value
- default
- Accepted Values
- Required
- true
Keystore Filename for Secure HTTP
- Description
- The fully-qualified filename of a keystore. This keystore is used to establish a
secure connection with the HTTP server using mutual TLS.
If the HTTP server does not require client certificate authentication, this property must be completely removed from the configuration JSON.
- Default Value
- Accepted Values
- Required
- false
Keystore Key Password for Secure HTTP
- Description
- The password used to access the key stored in the keystore file configured in the Keystore Filename for Secure HTTP property. If the HTTP server does not require client certificate authentication, this property must be completely removed from the configuration JSON.
- Default Value
- Accepted Values
- Required
- false
Keystore Password for Secure HTTP
- Description
- The password used to access the contents of the keystore configured in the Keystore Filename for Secure HTTP property. If the HTTP server does not require client certificate authentication, this property must be completely removed from the configuration JSON.
- Default Value
- Accepted Values
- Required
- false
Keystore Type for Secure HTTP
- Description
- The type of the keystore configured in the Keystore Filename for Secure HTTP property. If the HTTP server does not require client certificate authentication, this property must be completely removed from the configuration JSON.
- Default Value
- Accepted Values
- BCFKS, PKCS12, JKS
- Required
- false
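For example, if the HTTP server requires client certificate authentication (mutual TLS), the configuration might include an excerpt like the following. The file path and passwords are placeholders, and PKCS12 is just one of the accepted keystore types:
"parameter.HTTP Sink Parameters:Keystore Filename for Secure HTTP": "[***PATH TO KEYSTORE***]",
"parameter.HTTP Sink Parameters:Keystore Password for Secure HTTP": "[***KEYSTORE PASSWORD***]",
"parameter.HTTP Sink Parameters:Keystore Key Password for Secure HTTP": "[***KEY PASSWORD***]",
"parameter.HTTP Sink Parameters:Keystore Type for Secure HTTP": "PKCS12"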
Output Grouping for JSON
- Description
- Specifies how JSON objects are grouped in the connector output.
If set to output-array, the output consists of an array of JSON objects.
If set to output-oneline, each line of the output data becomes one JSON object. That is, each JSON object occupies one line in the output.
This property is only used if forwarding raw data is disabled (Forward Raw Data is set to false). This is because when forwarding raw data is disabled, the data coming from Kafka gets converted to JSON. However, this property must still be set even if forwarding raw data is enabled.
- output-oneline
- Accepted Values
- output-array, output-oneline
- Required
- true
Remote URL
- Description
- The remote URL to connect to. Must include scheme, host, port, and path.
- Default Value
- https://localhost:22000/contentListener
- Accepted Values
- Required
- true
Schema Access Strategy
- Description
- Specifies the strategy used for determining the schema of the Kafka record. The value you set here depends on the data format set in Kafka Message Data Format.
- If set to Schema Registry, the schema is read from Schema Registry. This setting can be used with both Avro and JSON formats.
- If set to Infer Schema, the schema is inferred based on the input file. This setting can only be used if Kafka Message Data Format is JSON.
- If set to Embedded Schema, the schema embedded in the input is used. This setting can only be used if Kafka Message Data Format is AVRO.
- If set to HWX Content-Encoded Schema Reference, the schema is read from Schema Registry. This setting can only be used if Kafka Message Data Format is AVRO. In this case, the Avro messages are expected to have a reference to the schema in Schema Registry encoded within the message content.
This property is ignored if raw data forwarding is enabled (Forward Raw Data is set to true), but it must still be assigned a valid value.
- Default Value
- Schema Registry
- Accepted Values
- Schema Registry, Infer Schema, Embedded Schema, HWX Content-Encoded Schema Reference
- Required
- true
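For example, a connector that processes JSON records and fetches their schema from an unsecured Schema Registry might include an excerpt like the following. The host and schema name are placeholders; the port and path follow the default value of the Schema Registry URL property:
"parameter.HTTP Sink Parameters:Forward Raw Data": "false",
"parameter.HTTP Sink Parameters:Kafka Message Data Format": "JSON",
"parameter.HTTP Sink Parameters:Schema Access Strategy": "Schema Registry",
"parameter.HTTP Sink Parameters:Schema Registry URL": "http://[***SCHEMA REGISTRY HOST***]:7788/api/v1",
"parameter.HTTP Sink Parameters:Schema Name": "[***SCHEMA NAME***]"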
Schema Branch
- Description
- The name of the branch to use when looking up the schema in Schema Registry. Schema Branch and Schema Version cannot be specified at the same time. If one is specified, the other needs to be removed from the configuration. If Schema Registry is not used, this property must be completely removed from the configuration.
- Accepted Values
- Required
- false
Schema Name
- Description
- The schema name to look up in Schema Registry. If the Schema Access Strategy property is set to Schema Registry, this property must contain a valid schema name. If Schema Registry is not used, this property must be completely removed from the configuration JSON.
- Accepted Values
- Required
- false
Schema Registry URL
- Description
- The URL of the Schema Registry server. If Schema Registry is not used, use the default value.
- Default Value
- http://localhost:7788/api/v1
- Accepted Values
- Required
- true
Schema Version
- Description
- The version of the schema to look up in Schema Registry. If Schema Registry is used and a schema version is not specified, the latest version of the schema is retrieved. Schema Branch and Schema Version cannot be specified at the same time. If one is specified, the other needs to be removed from the configuration. If Schema Registry is not used, this property must be completely removed from the configuration.
- Accepted Values
- Required
- false
Time Format
- Description
- Specifies the format to use when reading time fields from JSON. If Forward Raw Data is set to false, the format defined here also applies to the time fields in the output JSON message.
- HH:mm:ss
- Accepted Values
- Required
- true
Timestamp Format
- Description
- Specifies the format to use when reading timestamp fields from JSON. If Forward Raw Data is set to false, the format defined here also applies to the timestamp fields in the output JSON message.
- HH:mm:ss.SSS
- Accepted Values
- Required
- true
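For example, to read records whose JSON fields use day-first dates and millisecond-precision timestamps, the format parameters could be set as follows. The patterns shown are illustrative Java-style date and time patterns, not required values:
"parameter.HTTP Sink Parameters:Date Format": "dd/MM/yyyy",
"parameter.HTTP Sink Parameters:Time Format": "HH:mm:ss",
"parameter.HTTP Sink Parameters:Timestamp Format": "dd/MM/yyyy HH:mm:ss.SSS"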
Truststore Filename for Schema Registry
- Description
- The fully-qualified filename of a truststore. This truststore is used to establish a secure connection with Schema Registry using TLS.
- Default Value
- The location of the default truststore which is empty and can only be used for unsecure connections.
- Accepted Values
- Required
- true
Truststore Filename for Secure HTTP
- Description
- The fully-qualified filename of a truststore. This truststore is used to establish a secure connection with the HTTP server using TLS.
- Default Value
- The location of the default truststore which is empty and can only be used for unsecure connections.
- Accepted Values
- Required
- true
Truststore Password for Schema Registry
- Description
- The password used to access the contents of the truststore configured in the
Truststore Filename for Schema Registry
property. - Default Value
- password
- Accepted Values
- Required
- true
Truststore Password for Secure HTTP
- Description
- The password used to access the contents of the truststore configured in the
Truststore Filename for Secure HTTP
property. - Default Value
- password
- Accepted Values
- Required
- true
Truststore Type for Schema Registry
- Description
- The type of the truststore configured in the
Truststore Filename for Schema Registry
property. - Default Value
- JKS
- Accepted Values
- BCFKS, PKCS12, JKS
- Required
- true
Truststore Type for Secure HTTP
- Description
- The type of the truststore configured in the
Truststore Filename for Secure HTTP
property. - Default Value
- JKS
- Accepted Values
- BCFKS, PKCS12, JKS
- Required
- true
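For example, to post to an HTTPS endpoint whose certificate is signed by a private certificate authority, the configuration might include an excerpt like the following. The file path and password are placeholders, and JKS is the default of the accepted truststore types:
"parameter.HTTP Sink Parameters:Remote URL": "https://[***SERVER HOSTNAME***]:[***PORT***]/[***PATH***]",
"parameter.HTTP Sink Parameters:Truststore Filename for Secure HTTP": "[***PATH TO TRUSTSTORE***]",
"parameter.HTTP Sink Parameters:Truststore Password for Secure HTTP": "[***TRUSTSTORE PASSWORD***]",
"parameter.HTTP Sink Parameters:Truststore Type for Secure HTTP": "JKS"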