InfluxDB Sink connector

The InfluxDB Sink connector is a Stateless NiFi dataflow developed by Cloudera that runs in the Kafka Connect framework. Learn about the connector, its properties, and configuration.

The InfluxDB Sink connector fetches messages from Kafka and loads them into InfluxDB. The topic this connector receives messages from is determined by the value of the topics property in the configuration. The messages the connector receives from Kafka can be in either Avro or JSON format.

If the connector input is in Avro format, then it can either read the schema from the Avro file it receives (provided that the schema is embedded) or it can fetch the schema from Schema Registry. If the connector’s input is in JSON format, then it can either infer the schema or fetch it from Schema Registry. The strategy that is used to retrieve the schema is determined by the Schema Access Strategy property.

The connector can authenticate to InfluxDB using username and password.

Properties and configuration

Configuration is passed to the connector during creation. The properties of the connector can be categorized into the following three groups:

Common connector properties

These are the properties of the Kafka Connect framework that are accepted by all connectors. For a comprehensive list of these properties, see the Apache Kafka documentation.

Stateless NiFi Sink properties

These are the properties that are specific to the Stateless NiFi Sink connector. All Stateless NiFi Sink connectors share and accept these properties. For a comprehensive list of these properties, see the Stateless NiFi Sink properties reference.

Connector/dataflow-specific properties

These properties are unique to this connector, or more precisely, to the dataflow running within the connector. These properties use the following prefix:

parameter.[***CONNECTOR NAME***] Parameters:

For a comprehensive list of these properties, see the InfluxDB Sink properties reference.

Notes and limitations

If a property that has a default value is completely removed from the configuration, the system uses the default value.
Properties not marked as required must be completely removed from the configuration if not set.
Schema Branch and Schema Version cannot be specified at the same time.
The value of the Schema Access Strategy property depends on the value of the Kafka Message Data Format property. As a result, you must exercise caution when configuring Schema Access Strategy.
- If the value of Kafka Message Data Format is AVRO, the possible values for Schema Access Strategy are Schema Registry, Embedded Schema, or HWX Content-Encoded Schema Reference.
- If the value of Kafka Message Data Format is JSON, the possible values for Schema Access Strategy are Schema Registry or Infer Schema.
The Line Protocol Query parameter contains a query expression which creates a string from input record fields. The query depends on the record schema. The result of the query must comply with the line protocol format.
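As an illustration of the rules above, the following fragment pairs the AVRO data format with the Embedded Schema strategy, one of the combinations listed as valid for Avro input. It is a sketch to merge into a full connector configuration, not a complete configuration. Because Schema Registry is not used here, Schema Branch and Schema Version are left out of the configuration entirely.

```json
{
 "parameter.Kafka Message Data Format": "AVRO",
 "parameter.Schema Access Strategy": "Embedded Schema"
}
```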

Configuration example

In this example, the connector receives data in JSON format, transforms the data to line protocol format, and inserts it into InfluxDB. Schema Registry is not used in the example. As a result, this example does not include Schema Registry key and truststore configurations.

{
 "connector.class": "org.apache.nifi.kafka.connect.StatelessNiFiSinkConnector",
 "meta.smm.predefined.flow.name": "InfluxDB Sink",
 "meta.smm.predefined.flow.version": "1.0.0",
 "key.converter": "org.apache.kafka.connect.storage.StringConverter",
 "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
 "tasks.max": "1",
 "extensions.directory": "/tmp/stateless-extensions",
 "working.directory": "/tmp/nifi-stateless-working",
 "nexus.url": "https://repository.cloudera.com/artifactory/repo/",
 "failure.ports": "Retry from PutInfluxDB",
 "topics": "[***TOPIC NAME***]",
 "parameter.InfluxDB Connection URL": "http://[**SERVER  HOSTNAME**]:[**PORT**]",
 "parameter.InfluxDB Database Name": "[***DATABASE NAME***]",
 "parameter.InfluxDB User Name": "[***USER NAME***]",
 "parameter.InfluxDB User Password": "[***USER PASSWORD***]",
 "parameter.Line Protocol Query": "select 'temp,device_id=' || device_id || ',device_state=' || device_state || ',part_id=' || part_id || ',code=' || code || ',result=' || result_code || ' temperature=' || temperature || ' ' || ts || '000000' as payload from FLOWFILE",
 "parameter.Schema Access Strategy":"Infer Schema"
}
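To make the Line Protocol Query in the configuration example easier to follow, consider a hypothetical input record such as the one below. The field names come from the query; the values are invented for illustration, and ts is assumed to be an epoch timestamp in milliseconds.

```json
{
 "device_id": "dev-01",
 "device_state": "RUNNING",
 "part_id": "p-117",
 "code": "C4",
 "result_code": "OK",
 "temperature": 72,
 "ts": 1700000000000
}
```

Assuming each value is rendered in its plain string form, the query concatenates these fields into a single payload line similar to temp,device_id=dev-01,device_state=RUNNING,part_id=p-117,code=C4,result=OK temperature=72 1700000000000000000. Here temp is the measurement, the comma-separated key=value pairs after it are tags, temperature is a field, and appending 000000 to a millisecond timestamp converts it to nanoseconds, which is InfluxDB's default timestamp precision for line protocol.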

The following list describes the properties in the configuration example that you must customize for your use case.

topics: The name of the Kafka topic the connector fetches messages from.
InfluxDB Connection URL: The URL used to connect to InfluxDB.
InfluxDB Database Name: The name of the InfluxDB database.
InfluxDB User Name: The username used to connect to InfluxDB.
InfluxDB User Password: The password used to connect to InfluxDB.
Line Protocol Query: Specifies a query, based on the schema of the records and the measurement used, that converts each record to line protocol format.
Schema Access Strategy: Specifies the strategy used for determining the schema of the Kafka record. In this example, the property is set to Infer Schema. This means that the schema is determined (inferred) from the JSON data.

Stateless NiFi Sink properties reference

Review the following reference for a comprehensive list of the connector properties that are specific to the Stateless NiFi Sink connector.

In addition to the properties listed here, Stateless NiFi connectors also accept the properties of the Kafka Connect framework. For a comprehensive list of these properties, see the Apache Kafka documentation.

attribute.prefix

Description: The prefix to add to the key of each header that matches the regular expression specified in headers.as.attributes.regex. For example, assume that the header key is MyHeader, its value is MyValue, headers.as.attributes.regex is set to My.*, and this property is set to kafka. In this case, the flowfile that is created for the Kafka message has an attribute named kafka.MyHeader with a value of MyValue.
Default Value
Accepted Values
Required: false

dataflow.timeout

Description: Specifies the maximum amount of time to wait for the dataflow to complete. If the dataflow does not complete before this timeout, the thread is interrupted and the dataflow is considered a failure. The session is rolled back and the connector retriggers the flow. Defaults to 60 seconds if not specified.
Default Value: 60 seconds
Accepted Values
Required: false

extensions.directory

Description: Specifies the directory that stores downloaded extensions. Extensions are the NAR (NiFi Archive) files containing the processors and controller services a flow might use. Since Stateless NiFi is only the NiFi engine, it does not contain any of the processors and controller services you might use in your flow. When deploying the connector with the custom flow, the system needs to download the specific extensions that your flow uses from Nexus (unless they are already present in this directory). These extensions are stored in this directory. Because the default directory might not be writable, and to aid in upgrade scenarios, Cloudera recommends that you always specify an extensions directory.
Default Value: /tmp/nifi-stateless-extensions
Accepted Values
Required: true

failure.ports

Description: A comma-separated list of output ports that are considered failure conditions. If any flowfile is routed to an output port specified in this property, the dataflow is considered a failure and the session is rolled back. After a set amount of time, the dataflow reattempts to process the Kafka record. Any data transferred to an output port that is not in the list of failure ports is discarded.
Because of how Stateless NiFi Sink connectors behave, even if a single flowfile ends up in an output port that is marked as failure, the entire session is rolled back with all messages in the batch. Furthermore, if a flowfile ends up in a failure port in each subsequent iteration, the result is an endless loop. With some sink connectors (for example, MQTT Sink) this is the desired behavior. For more information regarding this behavior, see Dataflow execution and scheduling.
Default Value
Accepted Values
Required: false

flow.snapshot

Description: Specifies the dataflow to run. When using Streams Messaging Manager to deploy a connector, the value you set in this property must be a JSON object. URLs, file paths, or escaped JSON strings are not supported when using Streams Messaging Manager. Alternatively, if you use the Kafka Connect REST API to deploy a connector, this can be a file containing the dataflow, a URL that points to a dataflow, or a string containing the entire dataflow as an escaped JSON. However, Cloudera does not recommend using the Kafka Connect REST API to interact with this connector or Kafka Connect.
Default Value
Accepted Values
Required: true

headers.as.attributes.regex

Description: A Java regular expression that is evaluated against all Kafka record headers. If a header key matches the regular expression, the header is added to the flowfile as an attribute. The header key is used as the attribute name, and the header value is used as the attribute value. The attribute name can also include an optional prefix, which is defined by the attribute.prefix property.
Default Value
Accepted Values
Required: false
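The following fragment sketches the header scenario described for attribute.prefix: headers whose keys match My.* are copied to flowfile attributes, and the prefix is added to their names. The regular expression and prefix are the illustrative values from the property descriptions, not recommended settings.

```json
{
 "headers.as.attributes.regex": "My.*",
 "attribute.prefix": "kafka"
}
```

According to the attribute.prefix description, a Kafka header with key MyHeader and value MyValue then becomes the flowfile attribute kafka.MyHeader with value MyValue, while a header whose key does not match My.* is not added as an attribute.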

input.port

Description: The name of the input port in the NiFi dataflow that Kafka records are sent to. If the dataflow contains exactly one input port, this property is optional and can be omitted. However, if the dataflow contains multiple input ports, this property must be specified.
Default Value
Accepted Values
Required: false

krb5.file

Description: Specifies the krb5.conf file to use if the dataflow interacts with any services that are secured using Kerberos. Defaults to /etc/krb5.conf if not specified.
Default Value: /etc/krb5.conf
Accepted Values
Required: false

name

Description: The name of the connector. On the Streams Messaging Manager UI, the connector names are specified using the Enter Name field. The name that you enter in the Enter Name field is automatically set as the value of the name property when the connector is deployed. Because of this, the name property is omitted from the configuration template provided in Streams Messaging Manager. If you manually add the name property to the configuration in Streams Messaging Manager, ensure that the value you set matches the connector name specified in the Enter Name field. Otherwise, the connector fails to deploy.
Default Value
Accepted Values
Required: true

nexus.url

Description: Specifies the base URL of the Nexus instance to source extensions from. If configuring a Nexus instance that has multiple repositories, include the name of the repository in the URL. For example, https://nexus-private.myorganization.org/nexus/repository/my-repository/. If the property is not specified, the necessary extensions (the ones used by the flow) must be provided in the extensions directory before deploying the connector.
Default Value
Accepted Values
Required: true
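For a Nexus instance with multiple repositories, the repository name goes at the end of the base URL. The fragment below reuses the example URL from the description together with the default extensions directory; the host and repository name are placeholders to replace with your own values.

```json
{
 "nexus.url": "https://nexus-private.myorganization.org/nexus/repository/my-repository/",
 "extensions.directory": "/tmp/nifi-stateless-extensions"
}
```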

parameter.`[FLOW PARAMETER NAME]`

Description: Specifies a parameter to use in the dataflow. For example, assume that you have the following entry in your connector configuration "parameter.Directory": "/mydir". In a case like this, any parameter context in the dataflow that has a parameter named Directory gets the specified value (/mydir). If the dataflow has child process groups, and those child process groups have their own parameter contexts, the value is used for all parameter contexts that contain a parameter named Directory. Parameters can also be applied to specific parameter contexts only. This can be done by prefixing the parameter name (Directory) with the name of the parameter context followed by a colon. For example, parameter.My Context:Directory only applies the specified value for the Directory parameter in the Parameter Context named My Context.
Default Value
Accepted Values
Required: false
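For example, the following fragment sets Directory only in the parameter context named My Context, using the names from the description above. The unscoped form, parameter.Directory, would instead set the value in every parameter context that defines a parameter named Directory.

```json
{
 "parameter.My Context:Directory": "/mydir"
}
```

The InfluxDB Sink configuration example uses the unscoped form, for instance parameter.InfluxDB Database Name.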

working.directory

Description: Specifies a directory on the Connect server that NiFi should use for unpacking extensions that it needs to perform the dataflow. The contents of extensions.directory are unpacked here. Defaults to /tmp/nifi-stateless-working if not specified.
Default Value: /tmp/nifi-stateless-working
Accepted Values
Required: false

InfluxDB Sink properties reference

Review the following reference for a comprehensive list of the connector properties that are specific to the InfluxDB Sink connector.

The properties listed in this reference must be added to the connector configuration with the following prefix:

parameter.[***CONNECTOR NAME***] Parameters:

In addition to the properties listed here, this connector also accepts certain properties of the Kafka Connect framework as well as the properties of the Stateless NiFi Sink connector. When creating a new connector using the Streams Messaging Manager UI, all valid properties are presented in the default configuration template. You can view the configuration template to get a full list of valid properties. For more information regarding the accepted properties not listed here, you can review the Apache Kafka documentation and the Stateless NiFi Sink properties reference.

Batch Size

Description: The maximum size of the records that are merged together and written to InfluxDB in one batch. Example values: 100 MB, 1 GB
Default Value: 1 MB
Accepted Values
Required: true

Consistency Level

Description: Specifies the InfluxDB consistency level.
Default Value: ONE
Accepted Values: ONE, ANY, ALL, QUORUM
Required: true

Date Format

Description: Specifies the format to use when reading date fields from JSON. If Forward Raw Data is set to false, the format defined here also applies to the date fields in the output JSON message.
Default Value: yyyy-MM-dd
Accepted Values
Required: true

InfluxDB Connection URL

Description: Specifies the InfluxDB URL to connect to.
Default Value: http://localhost:8086
Accepted Values
Required: true

InfluxDB Database Name

Description: The name of the InfluxDB database to connect to.
Default Value
Accepted Values
Required: true

InfluxDB User Name

Description: The username for accessing InfluxDB.
Default Value
Accepted Values
Required: false

InfluxDB User Password

Description: The password for the InfluxDB user.
Default Value
Accepted Values
Required: false

Kafka Message Data Format

Description: Specifies the format of the messages the connector receives from Kafka. If the Forward Raw Data property is set to true, this property is ignored. However, even in this case, this property must be assigned a valid value.
Default Value: AVRO
Accepted Values: AVRO, JSON
Required: true

Kerberos Keytab for Schema Registry

Description: The fully-qualified filename of the Kerberos keytab associated with the principal for accessing Schema Registry.
Default Value: The location of the default keytab, which is empty and can only be used for unsecured connections.
Accepted Values
Required: true

Kerberos Principal for Schema Registry

Description: The Kerberos principal used for authenticating to Schema Registry.
Default Value: default
Accepted Values
Required: true

Line Protocol Query

Description: A query, based on the record schema, that builds a line protocol string from the record fields.
Default Value
Accepted Values
Required: true

Retention Policy

Description: Specifies the retention policy for saving records in InfluxDB.
Default Value: autogen
Accepted Values
Required: true
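As a sketch, the fragment below overrides the write-related InfluxDB properties with explicit values taken from the example or accepted values listed above. The specific choices are illustrative only. The retention policy you name must already exist in the target InfluxDB database; autogen is the default value of this property.

```json
{
 "parameter.Batch Size": "100 MB",
 "parameter.Consistency Level": "QUORUM",
 "parameter.Retention Policy": "autogen"
}
```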

Schema Access Strategy

Description: Specifies the strategy used for determining the schema of the Kafka record. The value you set here depends on the data format set in Kafka Message Data Format.
If set to Schema Registry, the schema is read from Schema Registry. This setting can be used with both Avro and JSON formats.
If set to Infer Schema, the schema is inferred based on the input file. This setting can only be used if Kafka Message Data Format is JSON.
If set to Embedded Schema, the schema embedded in the input is used. This setting can only be used if Kafka Message Data Format is Avro.
If set to HWX Content-Encoded Schema Reference, the schema is read from Schema Registry. This setting can only be used if Kafka Message Data Format is Avro. In this case, the Avro messages are expected to have a reference to the schema in Schema Registry encoded within the message content.
Default Value: Schema Registry
Accepted Values: Schema Registry, Infer Schema, Embedded Schema, HWX Content-Encoded Schema Reference
Required: true

Schema Branch

Description: The name of the branch to use when looking up the schema in Schema Registry. Schema Branch and Schema Version cannot be specified at the same time. If one is specified, the other needs to be removed from the configuration. If Schema Registry is not used, this property must be completely removed from the configuration.
Default Value
Accepted Values
Required: false

Schema Name