Hive

You can review the list of reported issues and their fixes for Hive in 7.3.1.200.

CDPD-78342/CDPD-72605: Optimized partition authorization in Hive Metastore to reduce overhead
The add_partitions() API in Hive Metastore was authorizing both new and existing partitions, leading to unnecessary processing and increased load on the authorization service.
The issue was addressed by modifying the add_partitions() API to authorize only new partitions, improving performance and reducing authorization overhead.
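
As a rough illustration of this change, the following Python sketch narrows a request down to the partitions that do not already exist before any authorization call is made. The function name and the use of partition name strings are assumptions made for this example; this is not the Metastore's actual Java implementation.

```python
def partitions_to_authorize(requested, existing):
    """Return only the requested partitions that do not exist yet.

    Hypothetical helper: both arguments are lists of partition names
    such as "ds=2024-01-01". Order of the request is preserved.
    """
    existing_set = set(existing)
    return [name for name in requested if name not in existing_set]
```

For a request of ds=1, ds=2, and ds=3 where ds=1 already exists, only ds=2 and ds=3 would be passed to the authorization service.
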
CDPD-77990: Upgraded MySQL Connector/J to 8.2.0 to fix CVE-2023-22102
The existing MySQL Connector/J version was vulnerable to CVE-2023-22102.
The issue was addressed by upgrading mysql-connector-j to version 8.2.0 in packaging/src/docker/Dockerfile.
CDPD-62654/CDPD-77985: Hive Metastore now sends a single AlterPartitionEvent for bulk partition updates
Hive Metastore previously sent an individual AlterPartitionEvent for each altered partition, leading to inefficiencies and pressure on the backend database.
The issue was addressed by modifying Hive Metastore to send a single AlterPartitionEvent containing a list of partitions for bulk updates. Set hive.metastore.alterPartitions.notification.v2.enabled to turn on this feature.

Apache Jira: HIVE-27746

CDPD-73669: Secondary pool connection starvation caused by updatePartitionColumnStatisticsInBatch API
Hive queries intermittently failed with Connection is not available, request timed out errors. The issue occurred because the updatePartitionColumnStatisticsInBatch method in ObjectStore used connections from the secondary pool, which had a pool size of only two, leading to connection starvation.
The fix ensures that the updatePartitionColumnStatisticsInBatch API now requests connections from the primary connection pool, preventing connection starvation in the secondary pool.

Apache Jira: HIVE-28456
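
The starvation described in this entry can be sketched with a small bounded pool in Python. The secondary pool size of two comes from the description above; the Pool class, the primary pool size of 10, and the short timeout are assumptions chosen for the example and do not reflect the actual Metastore connection pool code.

```python
import threading


class Pool:
    """Hypothetical fixed-size connection pool backed by a semaphore."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def acquire(self, timeout):
        # Fail with a timeout when every connection is already in use.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("Connection is not available, request timed out")

    def release(self):
        self._slots.release()


def count_timeouts(pool, concurrent_requests, timeout=0.01):
    """Hold one connection per request at the same time; count the failures."""
    held = 0
    timed_out = 0
    for _ in range(concurrent_requests):
        try:
            pool.acquire(timeout)
            held += 1
        except TimeoutError:
            timed_out += 1
    for _ in range(held):
        pool.release()
    return timed_out
```

With three slow requests in flight, a pool of two leaves one request without a connection, while a larger pool serves all three.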

CDPD-61676/CDPD-78341: Drop renamed external table fails due to missing update in PART_COL_STATS
When hive.metastore.try.direct.sql.ddl is set to false, dropping an external partitioned table after renaming it fails due to a foreign key constraint error in the PART_COL_STATS table. The table name in PART_COL_STATS is not updated during the rename, causing issues during deletion.
The issue was addressed by ensuring that the PART_COL_STATS table is updated during the rename operation, making partition column statistics usable after the rename and allowing the table to be dropped successfully.

Apache Jira: HIVE-27539

CDPD-79469: Selecting data from a bucketed table with a decimal column throws NPE
When hive.tez.bucket.pruning is enabled, selecting data from a bucketed table with a decimal column type fails with a NullPointerException. The issue occurs due to a mismatch in decimal precision and scale while determining the bucket number, causing an overflow and returning null.
The issue was addressed by ensuring that the correct decimal type information is used from the actual field object inspector instead of the default type info, preventing the overflow and NullPointerException.

Apache Jira: HIVE-28076
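
The effect of a precision and scale mismatch can be sketched in Python with the decimal module. The helper below is hypothetical and only mimics the general idea of fitting a value into a declared decimal type; the types decimal(10,0) and decimal(20,2) used in the usage note are assumptions for illustration, not values taken from the Hive fix.

```python
from decimal import Decimal


def enforce_precision_scale(value, precision, scale):
    """Round value to `scale` fractional digits.

    Returns None when the integer part needs more than
    `precision - scale` digits, which models the overflow case.
    """
    quantized = value.quantize(Decimal(1).scaleb(-scale))
    _sign, digits, exponent = quantized.as_tuple()
    integer_digits = max(len(digits) + exponent, 0)
    if integer_digits > precision - scale:
        return None  # overflow: the value does not fit the declared type
    return quantized
```

Fitting 12345678901.5 into an assumed decimal(10,0) returns None, which would then fail in any later step that expects a value, whereas fitting it into the column's assumed real type decimal(20,2) succeeds.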

CDPD-74095: Connection timeout while inserting Hive partitions due to secondary connection pool limitation
Since HIVE-26419, Hive uses a secondary connection pool (size 2) for schema and value generation. However, this pool also handles nontransactional connections, causing the updatePartitionColumnStatisticsInBatch request to fail with a Connection is not available, request timed out error when the pool reaches its limit during slow insert or update operations.
The issue was addressed by ensuring that time-consuming API requests use the primary connection pool instead of the secondary pool, preventing connection exhaustion.

Apache Jira: HIVE-28456

CDPD-78331: HPLSQL built-in functions fail in insert statement
After the HIVE-27492 fix, some HPLSQL built-in functions like trim and lower stopped working in INSERT statements. This happened because UDFs already present in Hive were removed to avoid duplication, but HPLSQL's local and offline modes still required them.
The issue was addressed by restoring the removed UDFs in HPLSQL and fixing related function issues to ensure compatibility in all execution modes.

Apache Jira: HIVE-28143

CDPD-78343: Syntax error in HPL/SQL error handling
In HPL/SQL, setting hplsql.onerror using the SET command resulted in a syntax error because the grammar file (Hplsql.g4) only allowed identifiers without dots (.).
The issue was addressed by updating the grammar to support qualified identifiers, allowing the SET command to accept dot (.) notation.

Example: EXECUTE 'SET hive.merge.split.update=true';

Apache Jira: HIVE-28253
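
The difference between a plain identifier and a qualified identifier can be sketched with two regular expressions in Python. These patterns are assumptions written for this example and are not the actual rules in Hplsql.g4.

```python
import re

# Hypothetical patterns: a simple identifier, and one or more simple
# identifiers joined by dots.
SIMPLE_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
QUALIFIED_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def is_simple_ident(name):
    """True when the whole name is an identifier without dots."""
    return SIMPLE_IDENT.fullmatch(name) is not None


def is_qualified_ident(name):
    """True when the whole name is a dot-separated chain of identifiers."""
    return QUALIFIED_IDENT.fullmatch(name) is not None
```

Under these patterns, hplsql.onerror is rejected as a simple identifier but accepted as a qualified one.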

CDPD-78330: HPL/SQL built-in functions like sysdate not working
HPL/SQL built-in functions that are not available in Hive, such as sysdate, were failing with a SemanticException when used in queries. Only functions present in both HPL/SQL and Hive were working.
The issue was addressed by modifying the query parsing logic. Now, HPL/SQL built-in functions are executed directly, and only functions also available in Hive are forwarded to Hive for execution.

Apache Jira: HIVE-27492
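
The routing rule in this fix can be sketched in Python as follows. The two function sets are small assumed samples, and the function name execution_target is hypothetical; the real parsing logic lives in the HPL/SQL Java code.

```python
# Assumed sample sets for illustration only.
HPLSQL_BUILTINS = {"sysdate", "trim", "lower"}
HIVE_FUNCTIONS = {"trim", "lower", "upper"}


def execution_target(function_name):
    """Decide where a function call runs.

    Built-ins that exist only in HPL/SQL run locally; everything else,
    including functions present in both, is forwarded to Hive.
    """
    name = function_name.lower()
    if name in HPLSQL_BUILTINS and name not in HIVE_FUNCTIONS:
        return "hplsql"
    return "hive"
```

Here sysdate is executed by HPL/SQL itself, while trim and lower are forwarded to Hive.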

CDPD-78345: Signalling CONDITION HANDLER is not working in HPLSQL
User-defined CONDITION HANDLERs in HPLSQL were not triggered as expected. Instead of running the handlers, the system only logged the conditions, so the handlers were unavailable when needed.
The issue was addressed by ensuring that user-defined condition handlers are properly registered and invoked when a SIGNAL statement raises a corresponding condition.

Apache Jira: HIVE-28215
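
A minimal Python sketch of the register-and-invoke behavior described in this fix is shown below. The ConditionHandlers class and its method names are hypothetical and only model the idea; they do not mirror the HPL/SQL interpreter's classes.

```python
class ConditionHandlers:
    """Hypothetical registry mapping condition names to handlers."""

    def __init__(self):
        self._handlers = {}

    def declare(self, condition, handler):
        # Registering the handler makes it available to later signals.
        self._handlers[condition] = handler

    def signal(self, condition):
        # Invoke the matching handler instead of only logging the condition.
        handler = self._handlers.get(condition)
        if handler is None:
            raise RuntimeError(f"Unhandled condition: {condition}")
        return handler()
```

After declaring a handler for a condition, signalling that condition runs the handler and returns its result; signalling an undeclared condition raises an error.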

CDPD-78333: EXECUTE IMMEDIATE throwing ClassCastException in HPL/SQL
When EXECUTE IMMEDIATE runs a select count(*) query, the query returns a long value, but HPLSQL expects a string. This mismatch causes the following error:
Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.String
   at org.apache.hive.service.cli.operation.hplsql.HplSqlQueryExecutor$OperationRowResult.get
The issue was addressed by converting the result to a string when the expected type is a string.

Apache Jira: HIVE-28215
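
The conversion in this fix can be sketched in Python, with a Python int standing in for the Java long returned by count(*). The function name coerce_result is an assumption for this example and is not part of HplSqlQueryExecutor.

```python
def coerce_result(value, expected_type):
    """Convert a result value to a string only when a string is expected.

    Values that already match, or that are requested as another type,
    are returned unchanged.
    """
    if expected_type is str and not isinstance(value, str):
        return str(value)
    return value
```

A count of 42 requested as a string is returned as "42" instead of failing on a direct cast.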

CDPD-79844: EXECUTE IMMEDIATE displaying error despite successful data load
Running EXECUTE IMMEDIATE 'LOAD DATA INPATH ''/tmp/test.txt'' OVERWRITE INTO TABLE test_table' displayed an error on the console, even though the data was successfully loaded into the table. This occurred because HPL/SQL attempted to check the result set metadata after execution, but LOAD DATA queries do not return a result set, leading to a NullPointerException.
The issue was addressed by ensuring that result set metadata is accessed only when a result set is present.

Apache Jira: HIVE-28766

CDPD-67033: HWC for Spark 3 compatibility with Spark 3.5
Spark 3.5, based on Cloudera on cloud 7.2.18 libraries, caused the HWC for Spark 3 build to fail. Canary builds indicated that the Spark 3.5 changes broke compatibility.
The issue was addressed by updating HWC for Spark 3 to align with Spark 3.5 changes and ensuring compatibility with Cloudera on cloud 7.2.18 dependencies.
CDPD-80097: Datahub recreation fails due to Hive Metastore schema validation error
Datahub recreation on Azure fails because Hive Metastore schema validation cannot retrieve the schema version due to insufficient permissions on the VERSION table.
This issue is now fixed.