Apache Tez processing of Hive jobs
If you ran Hive on HDP or Cloudera, your Hive queries were already executed by the Apache Tez execution engine. Hive in Cloudera Data Warehouse Private Cloud also uses Tez to run queries and exposes a HiveServer2 endpoint, just as it does in HDP and Cloudera. Learn how Tez processes Hive jobs in Cloudera and Cloudera Data Warehouse, and understand the tasks that you need to perform after migrating your workloads to Cloudera Data Warehouse.
Hive is fundamentally the same technology in HDP, Cloudera Private Cloud Base, and Cloudera Data Warehouse Private Cloud, and Hive syntax and semantics remain essentially unchanged after upgrading from HDP to Cloudera Private Cloud Base or to Cloudera Data Warehouse Private Cloud. What differs is how Tez executes your queries. Tez runs Hive queries in one of two modes:
- Container mode — Every time you run a Hive query, Tez requests a container from YARN.
- LLAP mode — Every time you run a Hive query, Tez asks the LLAP daemon for a free thread, and starts running a fragment.
In Cloudera Data Warehouse, the Hive execution mode is LLAP. In Cloudera Data Hub on Cloudera Public Cloud and in Cloudera Private Cloud Base, the Hive execution mode is container, and LLAP mode is not supported. When Apache Tez runs Hive in container mode, it has traditionally been called Hive on Tez.
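As an illustration only, the following minimal Java sketch shows how a JDBC client can check which execution engine and mode a HiveServer2 endpoint reports, by running the SET command for the hive.execution.engine and hive.execution.mode properties. The endpoint URL and credentials are placeholders, and the sketch assumes the Hive JDBC driver is on the client classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CheckHiveExecution {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; replace with the JDBC URL of your HiveServer2 instance.
        String url = "jdbc:hive2://hs2.example.com:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hiveUser", "hivePassword");
             Statement stmt = conn.createStatement()) {
            // "SET <property>" with no value echoes the current setting as key=value.
            for (String property : new String[] {"hive.execution.engine", "hive.execution.mode"}) {
                try (ResultSet rs = stmt.executeQuery("SET " + property)) {
                    while (rs.next()) {
                        // Example output: hive.execution.engine=tez
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}
```

Based on the modes described above, a Cloudera Data Warehouse Virtual Warehouse should report LLAP mode, while Cloudera Private Cloud Base should report container mode.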
Considerations
There are certain differences between Hive on Tez and LLAP that you need to be aware of before migrating to Cloudera Data Warehouse Private Cloud.
- In Cloudera Data Warehouse, the HiveServer2 endpoints authenticate clients using LDAP instead of Kerberos.
- Your old Hive JDBC drivers need to be replaced with the latest drivers.
- If you have Hive user-defined functions (UDFs) in Cloudera Private Cloud Base, the UDF JARs must be added to the Cloudera Data Warehouse Hive classpath and the functions must be registered again (see the sketch after this list).
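A minimal sketch of re-registering a UDF over JDBC, assuming its JAR is already on the Cloudera Data Warehouse Hive classpath. The Virtual Warehouse URL, LDAP credentials, function name, and implementation class are all hypothetical placeholders; use the actual JDBC URL of your Virtual Warehouse.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RegisterUdf {
    public static void main(String[] args) throws Exception {
        // Placeholder Virtual Warehouse HiveServer2 URL; replace with your Virtual Warehouse JDBC URL.
        String url = "jdbc:hive2://hs2-myvw.example.com:443/default;"
                + "transportMode=http;httpPath=cliservice;ssl=true";

        // LDAP user name and password instead of a Kerberos principal.
        try (Connection conn = DriverManager.getConnection(url, "ldapUser", "ldapPassword");
             Statement stmt = conn.createStatement()) {
            // The implementation class must already be on the Hive classpath, for example
            // through the CDW_HIVE_AUX_JARS_PATH environment variable described below.
            // Both the function name and the class are hypothetical examples.
            stmt.execute("CREATE FUNCTION default.my_upper AS 'com.example.udf.MyUpper'");
        }
    }
}
```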
Post-migration tasks
After migrating to Cloudera Data Warehouse Private Cloud, perform the following tasks:
- Download the latest Hive JDBC driver from the Hive JDBC driver download page and follow the installation instructions provided there.
- Update the JDBC client connection URL to point to the Virtual Warehouse instance of HiveServer2.
- If your previous connection in Cloudera Private Cloud Base used Kerberos for authentication, modify the connection URL to authenticate with LDAP credentials instead, as shown in the sketch after this list.
- Ensure that the UDF JARs are added to the CDW_HIVE_AUX_JARS_PATH environment variable.
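For illustration, the following sketch shows what an updated client connection could look like after migration, assuming the latest Hive JDBC driver is on the classpath. The old Kerberos-style URL is shown only for contrast in a comment; the new URL points to the Virtual Warehouse HiveServer2 endpoint and passes LDAP credentials. Every host name, path, and credential here is a placeholder; use the actual JDBC URL of your Virtual Warehouse.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectToVirtualWarehouse {
    public static void main(String[] args) throws Exception {
        // Old Cloudera Private Cloud Base style URL with a Kerberos principal (for contrast):
        // jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM

        // New URL pointing to the Virtual Warehouse HiveServer2 endpoint (placeholder values).
        String url = "jdbc:hive2://hs2-myvw.example.com:443/default;"
                + "transportMode=http;httpPath=cliservice;ssl=true";

        // LDAP user name and password replace the Kerberos principal.
        try (Connection conn = DriverManager.getConnection(url, "ldapUser", "ldapPassword");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT current_database()")) {
            while (rs.next()) {
                System.out.println("Connected to database: " + rs.getString(1));
            }
        }
    }
}
```

A successful run confirms that the client reaches the Virtual Warehouse endpoint with the updated URL and LDAP credentials.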