Defining and adding clusters for replication

Before you can start replicating data with Streams Replication Manager, you must define the clusters that take part in the replication process and add them to Streams Replication Manager’s configuration. Defining and adding the clusters provides Streams Replication Manager with the necessary connection and security information it needs to replicate data.

In order for Streams Replication Manager to replicate data between clusters, you have to define these clusters in Cloudera Manager and add them to the Streams Replication Manager service’s configuration. In order to correctly configure your clusters, you need to complete multiple configuration tasks. The configuration workflow is the following:

1 Defining external clusters

In any given scenario, you start with defining your external clusters. External clusters are defined by creating and configuring Kafka credentials. A Kafka credential is an item that contains the connection properties required by Streams Replication Manager to establish a connection with a cluster. You can think of a Kafka credential as the definition of a single cluster. It contains the name (alias), address (bootstrap servers), and credentials that Streams Replication Manager can use to access a specific cluster.

Kafka credentials are created and configured in Cloudera Manager on the Administration > External Accounts > Kafka Credentials page. How you configure each Kafka credential depends on the security configuration of the cluster that it defines. Once a Kafka credential is created, the properties needed to access that cluster become available to Streams Replication Manager.

2 Defining co-located clusters

Co-located Kafka clusters can be defined in either of two ways. You can define these clusters using a service dependency or by using Kafka credentials. Cloudera recommends that you follow the recommendations provided and choose the method based on the security configuration of the co-located Kafka cluster. Additionally, Cloudera recommends that you use the service dependency method whenever possible. The method you choose also impacts how you configure the srm-control tool.

2a Defining co-located clusters using a service dependency

The service dependency method works by enabling a service dependency between the Streams Replication Manager service and the co-located Kafka cluster, specifying an alias for the co-located cluster, and configuring security related properties.

Cloudera recommends that you use this method if:

The co-located Kafka cluster is not secured.
The co-located Kafka cluster uses Kerberos for authentication.

2b Defining co-located clusters using Kafka credentials

Kafka credentials can be used to define co-located clusters. The steps to define a co-located cluster with a Kafka credential are identical to the steps you take when defining external clusters.

Cloudera recommends that you use this method if the co-located Kafka cluster uses an authentication mechanism that is not Kerberos, for example PLAIN.

3 Adding clusters to Streams Replication Manager’s configuration: Once both external and co-located clusters are defined, you must add all defined clusters to Streams Replication Manager’s configuration. This is done by configuring various configuration properties in Cloudera Manager.

Continue reading for step-by-step instructions on how to complete each of the configuration tasks.