HDFS replication policy considerations
Before you create an HDFS replication policy, you must understand how replication is
affected when source data is added or deleted while a job runs, how network latency
affects replication jobs, the performance and scalability limitations, the guidelines for
snapshot diff-based replication, and how to bypass Sentry ACLs during replication.

How HDFS replication policy works
Replication Manager replicates HDFS data depending on the "Source Path" and "Destination Path" you specify in the "Create HDFS Replication Policy" wizard. Additionally, you must follow a few guidelines to maintain the source data for successful data replication.

Improve network latency during replication job run
High latency among clusters can cause replication jobs to run more slowly, but does not cause them to fail.

Performance and scalability limitations to consider for replication policies
Before you create an HDFS replication policy, you must consider a few performance and scalability limitations.

Guidelines to use snapshot diff-based replication
By default, Replication Manager uses snapshot differences ("diff") to improve performance by comparing HDFS snapshots and only replicating the files that are changed in the source directory. While Hive metadata requires a full replication, the data stored in Hive tables can take advantage of snapshot diff-based replication.

HDFS replication in Sentry-enabled clusters
When you run an HDFS replication policy on a Sentry-enabled source cluster, the replication policy copies files and tables along with their permissions. Cloudera Manager version 6.3.1 and above is required to run HDFS replication policies on a Sentry-enabled source cluster.

Specifying hosts to improve HDFS replication policy performance
If your cluster has clients installed on hosts with limited resources, HDFS replication may use these hosts to run commands for the replication, which can cause performance degradation. You can limit HDFS replication to run only on selected DataNodes by specifying a "whitelist" of DataNode hosts.
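
The snapshot diff idea summarized above can be illustrated with a small, self-contained Python sketch. It is a conceptual model only, not how Replication Manager is implemented: the `FileMeta` type, the `snapshot_diff` and `files_to_copy` functions, and the sample paths are all hypothetical names introduced here. Each snapshot is modeled as a mapping from file path to size and modification time, and only files that were created or modified between the two snapshots are selected for copying.

```python
from typing import Dict, List, NamedTuple


class FileMeta(NamedTuple):
    """Hypothetical per-file metadata recorded in a snapshot."""
    size: int
    mtime: int


# A snapshot is modeled as path -> metadata (an assumption of this sketch).
Snapshot = Dict[str, FileMeta]


def snapshot_diff(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify paths as created, modified, or deleted between two snapshots."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return {"created": created, "modified": modified, "deleted": deleted}


def files_to_copy(diff: Dict[str, List[str]]) -> List[str]:
    """Only created and modified files need to be sent to the destination."""
    return sorted(diff["created"] + diff["modified"])


if __name__ == "__main__":
    # Hypothetical sample data: one unchanged file, one modified, one deleted,
    # and one newly created file.
    previous = {
        "/data/a.csv": FileMeta(size=100, mtime=1),
        "/data/b.csv": FileMeta(size=200, mtime=1),
        "/data/c.csv": FileMeta(size=300, mtime=1),
    }
    current = {
        "/data/a.csv": FileMeta(size=100, mtime=1),
        "/data/b.csv": FileMeta(size=250, mtime=2),
        "/data/d.csv": FileMeta(size=400, mtime=2),
    }
    diff = snapshot_diff(previous, current)
    print("diff:", diff)
    print("to copy:", files_to_copy(diff))
```

In this model an unchanged file is never copied again, which is the source of the performance benefit. On a real cluster, HDFS reports the differences between two snapshots of a snapshottable directory with the `hdfs snapshotDiff` command, so no such comparison has to be written by hand.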