Replication Manager on
Replication Manager can replicate HDFS directories, Hive external tables, Impala data, Hive ACID tables, Iceberg tables, Ranger policies and roles for HDFS, Hive, and HBase services, and data in Ozone buckets.
- The Replicate from CDH and Cloudera Private Cloud Base source clusters section lists the cluster and runtime versions required to:
- replicate data from CDH source clusters
- replicate data between clusters that use the same storage
- replicate data between clusters that use different storage
- Replicate HDFS and Hive data to cloud storage
- Replicate from HDP 2 and HDP 3 source clusters
- Kerberos
- Replication Manager supports the following replication scenarios when Kerberos
authentication is used on a cluster:
- Secure source to a secure destination.
- Insecure source to an insecure destination.
- Insecure source to a secure destination. The following requirements must be met
for this scenario:
- When a destination cluster has multiple source clusters, the source clusters must be either all secure or all insecure. Replication Manager does not support a mix of secure and insecure source clusters.
- The destination cluster must run 7.x or higher.
- The source cluster must run a compatible version.
- This replication scenario requires additional configuration. For more information, see Replicating from unsecure to secure clusters.
- Transport Layer Security (TLS)
- You can use TLS with Replication Manager. Additionally, Replication Manager supports replication scenarios where TLS is enabled for non-Hadoop services (such as Hive and Impala) and disabled for Hadoop services (such as HDFS, YARN, and MapReduce).
- Apache Knox
- When Knox is configured and the source and target clusters are Knox-SSO enabled, you must include the port in the peer URL when you add the source and target clusters as peers.
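
The Kerberos scenarios listed above can be read as a small decision rule. The following Python sketch is only an illustration of that rule: the `Cluster` class and the `kerberos_scenario_supported` function are hypothetical names, not part of Replication Manager, and the check for a "compatible" source version is left out because the text does not define it. The sketch also treats any scenario that is not listed (such as a secure source to an insecure destination) as unsupported, which is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical cluster description used only for this illustration."""
    name: str
    secure: bool        # True when Kerberos authentication is enabled
    major_version: int  # major version of the cluster, for example 7


def kerberos_scenario_supported(sources: List[Cluster], destination: Cluster) -> bool:
    """Return True when the pairing matches one of the listed scenarios."""
    if not sources:
        return False

    # A mix of secure and insecure source clusters is not supported.
    if len({source.secure for source in sources}) > 1:
        return False

    source_secure = sources[0].secure

    # Secure source to a secure destination.
    if source_secure and destination.secure:
        return True

    # Insecure source to an insecure destination.
    if not source_secure and not destination.secure:
        return True

    # Insecure source to a secure destination: the destination must run 7.x or higher.
    if not source_secure and destination.secure:
        return destination.major_version >= 7

    # Assumption: scenarios that are not listed are treated as unsupported.
    return False
```

The insecure-to-secure case still requires the additional configuration mentioned above; the function only reports whether the combination appears in the list.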
Replicate from CDH and Cloudera Private Cloud Base source clusters
The following tables list the source and destination clusters, the lowest supported versions, and the services that are available for each supported cloud provider for CDH and Cloudera Private Cloud Base source clusters. Ensure that the target database name is the same as the source database name; otherwise, issues appear during or after data replication.
Source cluster | Lowest supported source version | Lowest supported source Cloudera Runtime version | Lowest supported destination cluster version | Supported services on Replication Manager |
---|---|---|---|---|
CDH 5, CDH 6 | 6.3.0 | 5.10 | 7.0.3 | HDFS, Sentry to Ranger*, Hive external tables |
*To perform Sentry to Ranger replication using HDFS and Hive external table replication policies, the source cluster must run version 6.3.1 or higher and the target cluster must run version 7.1.1 or higher.
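
As an illustration of how the minimums in the CDH table above might be checked before creating a policy, here is a small Python sketch. The function names and the dictionary keys are hypothetical; the version numbers are copied from the table and its footnote, and versions are assumed to be plain dotted numbers without suffixes.

```python
from typing import Tuple

# Lowest supported versions, copied from the CDH row of the table above.
CDH_MINIMUMS = {"source": "6.3.0", "source_runtime": "5.10", "destination": "7.0.3"}

# Additional minimums for Sentry to Ranger replication, from the footnote.
SENTRY_TO_RANGER_MINIMUMS = {"source": "6.3.1", "destination": "7.1.1"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Convert a dotted version such as '6.3.1' into a tuple padded to three parts."""
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (3 - len(parts))


def at_least(version: str, minimum: str) -> bool:
    """Return True when version is equal to or higher than minimum."""
    return parse_version(version) >= parse_version(minimum)


def meets_cdh_minimums(source: str, source_runtime: str, destination: str) -> bool:
    """Check the three version columns of the CDH row."""
    return (
        at_least(source, CDH_MINIMUMS["source"])
        and at_least(source_runtime, CDH_MINIMUMS["source_runtime"])
        and at_least(destination, CDH_MINIMUMS["destination"])
    )


def supports_sentry_to_ranger(source: str, destination: str) -> bool:
    """Check the stricter footnote requirement for Sentry to Ranger replication."""
    return (
        at_least(source, SENTRY_TO_RANGER_MINIMUMS["source"])
        and at_least(destination, SENTRY_TO_RANGER_MINIMUMS["destination"])
    )
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "5.9" would sort after "5.10".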
Source cluster | Lowest supported source version | Lowest supported source Cloudera Runtime version | Destination cluster | Supported services on Replication Manager |
---|---|---|---|---|
| | 7.1.1 | 7.1.1 | | |
| | 7.7.1 | 7.1.8 | | |
| | 7.7.1 CHF4 | 7.1.8 | | Ozone buckets |
| | 7.11.3 | 7.1.9 | | |
| | 7.11.3 CHF7 | 7.1.9 SP1 | | Atlas replication policies*** |
Source cluster | Lowest supported source version | Lowest supported source Cloudera Runtime version | Destination cluster | Supported services on Replication Manager |
---|---|---|---|---|
| | 7.11.3 CHF1 | 7.1.9 | | Replicate the data and metadata for Hive external tables from: |
| | 7.11.3 CHF2 | 7.1.9 | | Replicate Hive ACID tables and Iceberg tables from: |
| | 7.11.3 CHF7 | 7.1.9 SP1 | | Replicate only the metadata for Ozone storage-backed Hive external tables using Hive external table replication policies. You must replicate the data using Ozone replication policies. |
Replicate HDFS and Hive data to cloud storage
- Replicate to and from Amazon S3 from CDH 5.14+ and version 5.13+.
Replication Manager does not support S3 as a source or destination when S3 is configured to use SSE-KMS.
- Replicate to and from Microsoft ADLS Gen1 from CDH 5.13+ and 5.15, 5.16, 6.1+.
- Replicate to Microsoft ADLS Gen2 (ABFS) from CDH 5.13+ and 6.1+.
- Supports snapshots from CDH 5.15+ and 5.15+.
- Replicate HDFS and Hive external tables from 7.1.9 CHF3 and higher clusters using Dell EMC Isilon storage to Cloudera Public Cloud clusters on AWS, Azure, and GCP.
- Replicate HDFS and Hive external tables from 7.1.9 SP1 and higher to Cloudera Public Cloud clusters on GCP.
Starting in 6.1.0, Replication Manager ignores Hive tables backed by Kudu during replication. The change does not affect functionality since Replication Manager does not support tables backed by Kudu. This change was made to guard against data loss due to how the Hive Metastore, Impala, and Kudu interact.
Replicate from HDP 2 and HDP 3 source clusters
Replication Manager does not support replication between HDP clusters and 7.x clusters. However, you can replicate data using other methods. The following table lists the lowest supported source versions and the alternate replication method for each service:
Lowest supported source version | Services that require alternate replication methods |
---|---|
HDP 2.6.5 | HDFS. Use DistCp to replicate data. |
HDP 3.1.1 | HDFS. Use DistCp to replicate data. |
HDP 3.1.1 | |
HDP 3.1.5 | Hive ACID tables to Cloudera 7.1.6 and higher clusters. Use REPL commands to replicate data. |
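
For the HDFS rows above, the DistCp invocation can be assembled programmatically. The Python sketch below only builds the argument list for `hadoop distcp`; the function name, the NameNode hosts, and the paths are hypothetical placeholders, and any additional options your environment needs (for example, for Kerberos or cross-version copies) are not covered here.

```python
from typing import List


def build_distcp_command(source_uri: str, target_uri: str, update: bool = False) -> List[str]:
    """Build the argument list for a basic 'hadoop distcp' copy.

    source_uri and target_uri are full HDFS URIs. When update is True,
    the -update option is added so that only missing or changed files are copied.
    """
    command = ["hadoop", "distcp"]
    if update:
        command.append("-update")
    command.extend([source_uri, target_uri])
    return command


# Hypothetical hosts and paths, shown only to illustrate the shape of the command.
example = build_distcp_command(
    "hdfs://hdp-source-nn:8020/data/sales",
    "hdfs://target-nn:8020/data/sales",
    update=True,
)
```

The resulting list can be passed to `subprocess.run(example, check=True)` on a host where the `hadoop` client is installed and configured for both clusters.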