Moving data from HDFS to Ozone using the distcp command
Use the hadoop distcp command to move content from the source HDFS cluster to Ozone.
Consider the following when you run the distcp command:
- Execute the distcp command from the destination cluster.
- Ensure that the distcp user can run a MapReduce job on YARN. Otherwise, you must adjust the following configurations to enable the distcp user:
  - allowed.system.users
  - banned.users
  - min.user.id
- If the source directories have a high file count, you can create a manual copy listing as specified in the following example.
> hdfs dfs -ls hdfs://<hdfs-nameservice>/user/john.doe/application1/* > src_files
You can then read the copy listing output file and submit its entries one by one as input to a distcp job.
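The one-by-one submission described above can be sketched as a small shell loop. This is a minimal sketch under stated assumptions, not a tested migration script: the function names extract_paths and submit_copy_listing and the DISTCP_CMD override are hypothetical, and the sketch assumes that the path is the eighth whitespace-separated column of each hdfs dfs -ls line and that paths contain no spaces.

```shell
#!/usr/bin/env bash

# Print the path column of a saved `hdfs dfs -ls` listing file.
# Assumption: each entry has 8 whitespace-separated fields with the path last;
# "Found N items" header lines are skipped.
extract_paths() {
  awk '$1 != "Found" && NF >= 8 { print $8 }' "$1"
}

# Submit each listed path to its own distcp job.
# DISTCP_CMD is a hypothetical override; set it to "echo hadoop distcp"
# to print the commands instead of running them.
submit_copy_listing() {
  local listing="$1" dest="$2"
  extract_paths "$listing" | while IFS= read -r src; do
    # Left unquoted on purpose so "hadoop distcp" splits into two words.
    # stdin is redirected so the job cannot consume the rest of the listing.
    ${DISTCP_CMD:-hadoop distcp} -direct "$src" "$dest" < /dev/null
  done
}
```

For a dry run, setting DISTCP_CMD="echo hadoop distcp" before calling submit_copy_listing src_files "ofs://<ozone.service.id>/user/john.doe/" prints one distcp command per listed path, which lets you review the jobs before submitting them.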
For example, to transfer the data of a user john.doe from the /user/john.doe/application1/ directory to Ozone, run the distcp command as follows.
> hadoop distcp -direct hdfs://<hdfs-nameservice>/user/john.doe/application1 ofs://<ozone.service.id>/user/john.doe/
hadoop distcp \
-Ddfs.checksum.combine.mode=COMPOSITE_CRC \
-Dozone.client.checksum.type=CRC32C \
-Dozone.om.kerberos.principal.pattern='*' \
hdfs://ns1/tmp/ \
ofs://ozone1707264383/v1/b1/dst/
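
The YARN settings named in the considerations above (allowed.system.users, banned.users, and min.user.id) are keys of the YARN LinuxContainerExecutor configuration file, container-executor.cfg. The fragment below is an illustrative sketch only: the account name distcp_user and all values are assumptions, and on a managed cluster these settings are normally changed through the cluster manager rather than by editing the file directly.

```ini
# container-executor.cfg (illustrative values, not recommendations)

# Users that are never allowed to launch containers.
# The distcp user must not appear in this list.
banned.users=hdfs,yarn,mapred,bin

# Users with a UID below this value are rejected unless explicitly allowed.
min.user.id=1000

# Users permitted to launch containers even if their UID is below min.user.id.
# distcp_user is a hypothetical account name.
allowed.system.users=distcp_user
```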