How HDFS replication policy works
Replication Manager replicates HDFS data based on the "Source Path" and "Destination Path" you specify in the "Create HDFS Replication Policy" wizard. For replication to succeed, you must also follow a few guidelines about changing the source data while a replication job runs.
How Replication Manager handles HDFS replication
The following scenarios explain how Replication Manager handles HDFS replication:
- The base directory (the last component) of the source path and of the destination (target) path is the same. In this scenario, the content of the source path is copied into the specified target path.
Example: The source path is /source/base, the target path is /target/base, and the source files to be replicated are /source/base/file-1.txt and /source/base/subdir-1/file-11.txt. Replication Manager replicates the files to /target/base/file-1.txt and /target/base/subdir-1/file-11.txt.
- The base directory of the source path differs from that of the target path. In this scenario, a new folder named after the source base directory is created under the specified target path, and the content of the source path is copied into the new folder.
Example: The source path is /source/base-src, the target path is /target/base-tgt, and the source files to be replicated are /source/base-src/file-1.txt and /source/base-src/subdir-1/file-11.txt. Replication Manager replicates the files to /target/base-tgt/base-src/file-1.txt and /target/base-tgt/base-src/subdir-1/file-11.txt.
- The source path contains a glob pattern, such as a trailing /*. In this scenario, the content of the source path is copied directly under the target path.
Example: The source path is /source/base-src/*, the target path is /target/base-tgt, and the source files to be replicated are /source/base-src/file-1.txt and /source/base-src/subdir-1/file-11.txt. Replication Manager replicates the files to /target/base-tgt/file-1.txt and /target/base-tgt/subdir-1/file-11.txt.
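The three scenarios above can be summarized as a small path-mapping rule. The following Python sketch is illustrative only and is not part of Replication Manager: the function name `map_destination` is hypothetical, "base directory" is taken to mean the last path component (as the examples suggest), and only a trailing /* glob is handled.

```python
import posixpath


def map_destination(source_path, target_path, source_file):
    """Predict where a source file lands on the target (hypothetical helper).

    Assumptions: "base directory" means the last path component, and the
    only glob form handled is a trailing "/*".
    """
    globbed = source_path.endswith("/*")
    # Drop the trailing "/*" (or a trailing slash) to get the source root.
    source_root = source_path[:-2] if globbed else source_path.rstrip("/")
    target_root = target_path.rstrip("/")

    # Path of the file relative to the source root, e.g. "subdir-1/file-11.txt".
    relative = posixpath.relpath(source_file, source_root)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{source_file} is not under {source_root}")

    same_base = posixpath.basename(source_root) == posixpath.basename(target_root)
    if globbed or same_base:
        # Scenarios 1 and 3: content is copied directly under the target path.
        return posixpath.join(target_root, relative)
    # Scenario 2: a folder named after the source base directory is created.
    return posixpath.join(target_root, posixpath.basename(source_root), relative)


if __name__ == "__main__":
    print(map_destination("/source/base", "/target/base",
                          "/source/base/file-1.txt"))
    print(map_destination("/source/base-src", "/target/base-tgt",
                          "/source/base-src/subdir-1/file-11.txt"))
    print(map_destination("/source/base-src/*", "/target/base-tgt",
                          "/source/base-src/file-1.txt"))
```

A helper like this can be useful for checking, before you create a policy, whether the chosen paths will nest the data under an extra folder on the target.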
Guidelines for adding or deleting data during a replication job run
To ensure that replication succeeds, follow these guidelines while a replication job is running:
- Do not modify the source directory. A file that is added to the source directory during the replication job run is not replicated, and deleting a file during replication causes the replication job to fail.
- Ensure that all files in the source directory are closed. The replication job fails if any source file is open.
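One way to check the second guideline before a job starts is to ask HDFS which files are open for write. The sketch below is a suggestion rather than a Replication Manager feature. It assumes the `hdfs` CLI is on the PATH and that `hdfs fsck <path> -files -openforwrite` marks open files with the token OPENFORWRITE. The helper names are hypothetical, and the parsing is deliberately simple, so verify it against the output of your Hadoop version.

```python
import subprocess


def fsck_open_files_command(path):
    """Build the fsck command that lists files open for write under path."""
    return ["hdfs", "fsck", path, "-files", "-openforwrite"]


def open_files_from_fsck(report):
    """Extract paths flagged OPENFORWRITE from fsck output (hypothetical parser).

    Assumption: each per-file line starts with the absolute path followed by
    a space. Paths that contain spaces are not handled by this simple split.
    """
    open_files = []
    for line in report.splitlines():
        if line.startswith("/") and "OPENFORWRITE" in line:
            open_files.append(line.split(" ", 1)[0])
    return open_files


def find_open_files(path):
    """Run fsck on the source path and return the files that are still open."""
    result = subprocess.run(
        fsck_open_files_command(path),
        capture_output=True,
        text=True,
    )
    return open_files_from_fsck(result.stdout)
```

If `find_open_files("/source/base")` returns a non-empty list, consider delaying the replication job until the writers have closed those files.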