Migrate Kafka Using Streams Replication Manager

Learn about the different options you have when migrating Kafka from HDF to Cloudera Private Cloud Base using Streams Replication Manager (SRM).

Kafka data is migrated from HDF to Cloudera Private Cloud Base using SRM. SRM can replicate data in various ways. How the data is replicated, and in this case migrated, is determined by the replication policy that is in use.

There are three replication policies that you can use when migrating data: the DefaultReplicationPolicy, the IdentityReplicationPolicy, and the MigratingReplicationPolicy. The following sections give an overview of each policy and recommend which policy to use in different scenarios. Review them and choose the policy that best suits your requirements.

DefaultReplicationPolicy

The DefaultReplicationPolicy is the default and Cloudera-recommended replication policy. This policy prefixes the names of remote (replicated) topics with the name (alias) of the source cluster. For example, the topic1 topic from the us-west source cluster creates the us-west.topic1 remote topic on the target cluster. Use this policy if renaming topics during the migration is acceptable for your deployment.

Additional notes:
  • Remote topics will have different names in the target cluster. As a result, you must reconfigure existing Kafka clients to use the remote topic names.
  • If you decide to, you can repurpose the SRM service you set up for migration and continue using it for replication.
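
The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the renaming, not SRM code: the function names and the orders topic are hypothetical, and the "." separator is an assumption taken from the us-west.topic1 example.

```python
from typing import Dict, List


def default_remote_topic(source_alias: str, topic: str, separator: str = ".") -> str:
    """Build a remote topic name as <source alias><separator><topic>.

    The "." separator is an assumption based on the us-west.topic1 example.
    """
    return f"{source_alias}{separator}{topic}"


def remap_client_topics(source_alias: str, topics: List[str]) -> Dict[str, str]:
    """Map each original topic name to the remote name clients would switch to."""
    return {topic: default_remote_topic(source_alias, topic) for topic in topics}


# "orders" is a hypothetical topic used only for illustration.
mapping = remap_client_topics("us-west", ["topic1", "orders"])
for original, remote in mapping.items():
    print(f"{original} -> {remote}")
```

A mapping like this can serve as a checklist of the topic names that existing Kafka clients need to be reconfigured with after migration.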

IdentityReplicationPolicy

The IdentityReplicationPolicy does not change the names of remote topics. When this policy is in use, topics retain the same name on both source and target clusters. For example, the topic1 topic from the us-west source cluster creates the topic1 remote topic on the target cluster. Use this policy if you are on Cloudera Runtime 7.1.8 or higher and do not want remote topics to get renamed during migration.

Additional notes:
  • In Cloudera Runtime 7.1.8, replication monitoring with this policy is not supported. This means that you cannot validate or monitor replications during the migration process. Replication monitoring with this policy is supported in Cloudera Runtime 7.1.9 or higher.
  • If you decide to, you can repurpose the SRM service you set up for migration and continue using it for replication.
  • If you want to continue using SRM after migration, review the limitations of this policy in the SRM Known Issues of the appropriate Cloudera Runtime version. Different limitations might apply depending on the Cloudera Runtime version.

MigratingReplicationPolicy

The MigratingReplicationPolicy is a custom replication policy. Cloudera provides the code for it, but unlike the IdentityReplicationPolicy and the DefaultReplicationPolicy, it is not shipped with SRM. As a result, you must implement, compile, and package it as a JAR yourself.

This policy behaves similarly to the IdentityReplicationPolicy and does not rename replicated topics on target clusters. However, unlike the IdentityReplicationPolicy, this policy is only supported in data migration scenarios. Use this policy if you are using Cloudera Runtime 7.1.7 or lower and you do not want replicated topics to get renamed.

Additional notes:
  • If you are using Cloudera Runtime 7.1.8 or later, Cloudera recommends that you use the IdentityReplicationPolicy instead.
  • In addition to implementing, compiling, and packaging the policy, you must also carry out advanced configuration steps to use it.
  • Replication monitoring with this policy is not supported. This means that you will not be able to validate or monitor replications during the migration process.
  • This replication policy is only supported with a unidirectional data replication setup where replication happens from a single source cluster to a single target cluster. Configuring additional hops or bidirectional replication is not supported and can lead to severe replication issues.
  • Using an SRM service configured with this policy for any scenario other than data migration is not supported. Once migration is complete, you must reconfigure the SRM instance you set up to use the IdentityReplicationPolicy or the DefaultReplicationPolicy. Alternatively, you can delete SRM from the cluster.
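
The recommendations in the three sections above can be condensed into a small decision helper. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the Cloudera Runtime version is given as a plain dotted number such as 7.1.8 (suffixes such as service pack labels are not handled).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "7.1.8" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def recommend_policy(runtime_version: str, renaming_acceptable: bool) -> str:
    """Pick a replication policy following the guidance in this overview."""
    if renaming_acceptable:
        # Default and Cloudera-recommended policy; remote topics get prefixed.
        return "DefaultReplicationPolicy"
    if parse_version(runtime_version) >= (7, 1, 8):
        # Topic names are kept; recommended over MigratingReplicationPolicy.
        return "IdentityReplicationPolicy"
    # Runtime 7.1.7 or lower and topic names must be kept.
    return "MigratingReplicationPolicy"


print(recommend_policy("7.1.9", renaming_acceptable=False))
print(recommend_policy("7.1.7", renaming_acceptable=False))
```

The helper checks whether renaming is acceptable first because the DefaultReplicationPolicy is the recommended choice whenever renamed topics are acceptable. The version only matters when topic names must be preserved.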