Fixed Issues in Hive

Review the Hive issues fixed in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.

Cloudera Runtime 7.3.2

Cloudera Runtime 7.3.2 resolves Hive issues and incorporates fixes from the service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For a comprehensive record of all fixes in Cloudera Runtime 7.3.1.x, see Fixed Issues.

CDPD-97246: Avro schema literal logging at INFO level

7.3.2

Previously, the Avro deserializer logged the schema literal at the INFO level.

This issue is now resolved by changing the log level to DEBUG.

Apache Jira: HIVE-22606

CDPD-96649: Incorrect aggregate statistics when direct SQL batch retrieval is enabled

7.3.2

Previously, when the hive.metastore.direct.sql.batch.size property was set to a value greater than 0, the system failed to merge column statistics correctly if the number of partitions or columns exceeded the batch size.

This issue is now resolved by ensuring the statistics are properly merged during the retrieval process, preventing redundant entries.
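
The merge step described above can be sketched in Python as follows. This is a minimal illustration only: the ColStats fields, the function names, and the sample data are hypothetical and do not reflect the metastore's actual schema or code. It shows that per-batch partial aggregates must be merged per column, which yields one entry per column regardless of batch size.

```python
from dataclasses import dataclass

@dataclass
class ColStats:
    """Hypothetical per-column aggregate; not the metastore's real schema."""
    low: int
    high: int
    num_nulls: int

def merge(a, b):
    # Combine two partial aggregates that describe the same column.
    return ColStats(min(a.low, b.low), max(a.high, b.high), a.num_nulls + b.num_nulls)

def aggregate_batch(batch):
    # Aggregate one batch of partitions into a single entry per column.
    out = {}
    for per_partition in batch:
        for column, stats in per_partition.items():
            out[column] = merge(out[column], stats) if column in out else stats
    return out

def aggregate_in_batches(partition_stats, batch_size):
    """partition_stats: list of {column_name: ColStats}, one dict per partition."""
    merged = {}
    for start in range(0, len(partition_stats), batch_size):
        partial = aggregate_batch(partition_stats[start:start + batch_size])
        for column, stats in partial.items():
            # Merge each batch's partial result; appending it instead
            # would leave one redundant entry per batch.
            merged[column] = merge(merged[column], stats) if column in merged else stats
    return merged

partitions = [
    {"id": ColStats(1, 10, 0)},
    {"id": ColStats(5, 40, 2)},
    {"id": ColStats(3, 25, 1)},
]
result = aggregate_in_batches(partitions, batch_size=2)
print(result)  # {'id': ColStats(low=1, high=40, num_nulls=3)}
```

With three partitions and a batch size of 2, the result is the same as aggregating all partitions in a single pass.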

Apache Jira: HIVE-29203

CDPD-95834: Hive Metastore backend database schema out of date

7.3.2

Previously, the Hive Metastore (HMS) backend database schema files required updates for compatibility with the latest version.

This issue is now resolved by updating the schema files to version 7.3.2.0 to ensure a successful upgrade.

CDPD-95681: Hive Beeline connection failure due to SSL certificate hostname mismatch

7.3.2

Previously, the Beeline client failed to establish a connection to HiveServer2 in ZooKeeper High Availability (HA) setups.

This issue is now resolved by ensuring that the SSL certificates correctly match the DNS hostnames, allowing the secure connection to be verified and established.

CDPD-93766: Missing results during anti-join conversion

7.3.2

Previously, queries involving specific join conditions returned empty results instead of the expected data when anti-join conversion was enabled.

This issue is now resolved by preventing anti-join conversion in these specific scenarios to ensure query result accuracy.

Apache Jira: HIVE-29175

CDPD-93756: SHOW COMPACTIONS output filtering

7.3.2

Previously, the SHOW COMPACTIONS output included all historical information, which caused the display to become unwieldy when many partitions or history lines existed.

This issue is resolved by adding the ability to filter the command output by database, table, partition, and compaction type or state.

Apache Jira: HIVE-13353

CDPD-93617: Duplicate records during minor compaction

7.3.2

Previously, when the Hive Metastore (HMS) crashed, active compaction jobs were incorrectly reset.

This issue is now fixed by updating the compactor cleaner to address duplicate directories.

Apache Jira: HIVE-29210

CDPD-93432: Incorrect results for anti-join queries

7.3.2

Previously, the HiveAntiJoin rule incorrectly replaced IS NULL filters on nullable columns, which resulted in missing records in the query output.

This issue is now fixed by improving the logic within the HiveAntiSemiJoinRule to ensure accurate query plans.

Apache Jira: HIVE-29176

CDPD-93431: Incorrect results for n-way joins

7.3.2

Previously, queries produced incorrect results when an n-way join contained a combination of both anti and outer joins.

This issue is now resolved by extending the CommonJoinOperator to properly support anti joins when they are used alongside outer joins in n-way join operations.

Apache Jira: HIVE-29290

CDPD-93428: Manifest files appearing in read queries

7.3.2

Previously, temporary directories containing direct insert manifest files used a prefix that allowed them to be included in concurrent read queries.

This issue is now resolved by hiding the direct insert manifest directory from read queries, ensuring only valid data files are processed.

Apache Jira: HIVE-29297

CDPD-93425: Data loss during minor compaction

7.3.2

Previously, query-based minor compaction incorrectly used the minimum open write ID as a lower bound for selecting data files.

This issue is now fixed by removing the incorrect check against the minimum open write ID, allowing the high-watermark to correctly define the compaction range.

Apache Jira: HIVE-29272

CDPD-92923 / CDPD-92586: Memory leak in Hive Metastore REST Catalog

7.3.2

Previously, the Hive Metastore (HMS) REST Catalog leaked memory during ALTER TABLE authorization checks. Each check created new catalog instances and handler pools with JMX enabled, which prevented the system from reclaiming memory.

This issue is now fixed by reusing existing catalog instances and disabling JMX for the handler pool to allow the system to recover resources.

CDPD-92499: Hive LDAP authentication and ZooKeeper connections

7.3.2

Previously, Hive Lightweight Directory Access Protocol (LDAP) authentication failed when connecting to a SASL-enforced ZooKeeper instance.

This issue is now resolved by updating the service components to support simultaneous authentication configurations and ensuring correct credential handling during service discovery.

Apache Jira: HIVE-29138

CDPD-92478: Timestamp processing in MetaStoreUtils

7.3.2

Previously, Hive Metastore utilities used local time zone settings to convert between timestamps and strings.

This issue is now fixed by using UTC time zone and the java.time.Instant class to process timestamps, which ensures that time points are represented accurately regardless of local time zone rules.
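
As a rough Python analogue of this approach (the actual fix uses java.time.Instant; the function names and the timestamp format below are assumptions made for illustration), converting through UTC gives the same string on every host, whatever its local time zone:

```python
from datetime import datetime, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, for illustration only

def epoch_to_utc_string(epoch_seconds):
    # Interpret the instant in UTC so the result does not depend on the host time zone.
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(FORMAT)

def utc_string_to_epoch(text):
    # Parse the string as UTC wall-clock time and return seconds since the epoch.
    parsed = datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

print(epoch_to_utc_string(0))  # 1970-01-01 00:00:00
```

Because both directions use UTC, a value converted to a string and back yields the original instant, with no daylight saving gaps or overlaps.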

Apache Jira: HIVE-28337

CDPD-91415: WebHCat and Python script compatibility with Python 3

7.3.2

Previously, hcat.py and various Python scripts used in q files contained syntax that was incompatible with Python 3.

This issue is now resolved by updating the Python scripts to use Python 3-compatible syntax.

Apache Jira: HIVE-25817

CDPD-90670: Incorrect results for queries with multiple lateral view operations

7.3.2

Previously, queries that used two or more LATERAL VIEW explode operations along with a WHERE clause returned incorrect results when Cost-Based Optimization (CBO) was enabled.

This issue is now resolved by updating the logic to correctly identify separate table aliases for lateral view columns, ensuring that filters are applied accurately.

Apache Jira: HIVE-29084

CDPD-90303: Incorrect results from a CASE expression

7.3.2

A query that used a CASE expression to conditionally return values produced an incorrect result. The query plan incorrectly folded the CASE statement into a COALESCE function, which led to a logic error that filtered out some of the expected results.

This issue is addressed by adding a stricter check when converting CASE expressions into COALESCE during query optimization.

Apache Jira: HIVE-24902

CDPD-89462: Performance degradation for wide tables in DirectSqlUpdatePart

7.3.2

Previously, updating or inserting partition statistics for tables with a high number of columns and partitions was slow.

This issue is now resolved by improving the hashing logic to ensure faster data retrieval and insertion, even for tables with thousands of columns and partitions.

Apache Jira: HIVE-29165

CDPD-88987: Improved performance for adding columns to partitioned tables

7.3.2

Previously, metadata operations for adding columns to tables with a high number of partitions and columns were slow because the system utilized a less optimized implementation for partition updates.

This issue is addressed by implementing a more efficient batch processing method for partition updates, which utilizes optimized metadata queries to improve performance for tables with many partitions.

Apache Jira: HIVE-28956

CDPD-88981: Performance degradation during column addition with cascade

7.3.2

Previously, adding columns to a table using the CASCADE command resulted in slower performance after optimizations for metadata storage were enabled.

This issue is now resolved. The fix includes an optimized method to reuse column descriptors across partitions, which restores performance levels during the column addition process.

Apache Jira: HIVE-29042

CDPD-88166: Query failure during JDBC filter optimization

7.3.2

Previously, certain queries failed with a class-cast error during the optimization phase.

This issue is now resolved by updating the query optimization rules to ensure that relational operators are correctly identified and processed.

Apache Jira: HIVE-25356

CDPD-87266: Query failure during Tez execution

7.3.2

Previously, Hive queries failed during execution after an upgrade. This resulted in a vertex failure and prevented queries from completing successfully.

This issue is now resolved by updating the execution engine to correctly instantiate internal query split generators during the initialization process.

CDPD-84149: MariaDB connector recognition failure

7.3.2

Previously, Hive failed to recognize the MariaDB connector even when the driver was present.

This issue is now fixed.

CDPD-83461: Query failure when using stack function with union operations

7.3.2

Previously, queries utilizing the STACK function in combination with UNION operations failed with an internal error during the compilation phase.

This issue is now resolved by updating the query optimization logic to correctly handle the stack function during union operations, preventing the internal processing error.

Apache Jira: HIVE-29029

CDPD-83334: Improving performance for alter partition operations

7.3.2

When altering partitions, the system used Java Data Objects (JDO) updates, which required fetching all fields of old partitions and produced redundant queries, leading to slower performance.

This issue is resolved by implementing direct SQL for altering partitions.

Apache Jira: HIVE-27530

Direct SQL failure during partition alterations

7.3.2

Previously, direct SQL for partition alterations failed in certain database environments due to Character Large Object (CLOB) casting errors and missing boolean type conversion checks.

This issue is resolved by updating the direct SQL logic to handle CLOB types correctly and ensuring proper boolean type conversions during batch updates.

Apache Jira: HIVE-28271

CDPD-80146: Group by alias query failures

7.3.2

Previously, the hive.runtime.dialect.enable property was enabled by default, which caused the hive.groupby.position.alias property to be ignored.

This issue is resolved by setting the hive.runtime.dialect.enable property to false by default. Hive now correctly respects the hive.groupby.position.alias configuration.

CDPD-79144: Incorrect schema version in Hive schema initialization script for MySQL

7.3.2

Previously, cluster creation with 7.3.1 failed due to an incorrect schema version in the CDH_VERSION table.

The issue was addressed by correcting the schema version in the Hive schema initialization script, ensuring successful cluster creation.

CDPD-78337: Merge task not invoked for external CTAS queries on object stores

7.3.2

Previously, the merge task was not invoked for external Create Table As Select (CTAS) queries when using S3 or other object stores.

This issue is now resolved by ensuring the merge task is correctly invoked after optimization for external CTAS queries.

Apache Jira: HIVE-27536

CDPD-78329: HiveServer2 runs out of memory with multiple parallel queries with fetch task

7.3.2

Previously, HiveServer2 could run out of memory when multiple parallel queries used fetch task caching (hive.fetch.task.caching=true). This caused queries to fail and HiveServer2 to crash.

The issue was addressed by reducing the default value of hive.fetch.task.conversion.threshold from 1 GB to 200 MB, preventing excessive memory usage and improving stability.

CDPD-77869: Iceberg table data written to HDFS instead of S3 in RAZ-enabled clusters

7.3.2

Previously, when you configured a cluster with Ranger Authorization Service (RAZ) and updated configurations to use S3, data for Iceberg tables was unexpectedly written to the HDFS external table location. This occurred because Iceberg tables, which are treated as external tables, defaulted to the database LOCATION property, which still pointed to HDFS if the database was created before the switch to S3.

This issue is now fixed by ensuring that external tables correctly align with the intended S3 storage paths. You can now update the database location to point to the S3 bucket to ensure all future external tables default to the cloud storage.

CDPD-75665: Importing a table generates a DDL with an incorrect location

7.3.2

Previously, when you created a table using the IMPORT command, the table's partitions could point to an incorrect location. For example, an imported external table could have its partitions located under the managed warehouse directory. This violated the separation between metastore.warehouse.external.dir and metastore.warehouse.dir, which are intended to host different types of tables.

Apache Jira: HIVE-28580

Hive query execution failure due to AM container exit on lost node with Exit code -100

7.3.2

Hive query failed when ApplicationMaster container was lost

Previously, when running a Hive query, a failed ApplicationMaster (AM) container did not trigger a DAG retry and caused the query execution to fail if the failure message included diagnostic information with a line break.

This issue is now resolved by automatically re-executing the DAG if the AM fails.
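
The internals of the fix are not described here. As a general illustration of how a line break can defeat message matching, the following Python sketch uses a hypothetical diagnostic message and a hypothetical pattern; neither is taken from Hive or Tez.

```python
import re

# Hypothetical diagnostic text; real messages differ.
message = "AM container exited with exitCode: -100\nDiagnostics: container released on a lost node"

# Hypothetical pattern used to recognize the failure.
pattern = r".*exitCode: -100.*"

# By default '.' does not match a newline, so a full match stops at the line break.
print(bool(re.fullmatch(pattern, message)))             # False
# With DOTALL, '.' also matches newlines and the multi-line message is recognized.
print(bool(re.fullmatch(pattern, message, re.DOTALL)))  # True
```

Any retry logic keyed on such a pattern would skip the retry for the multi-line message unless the matching accounts for line breaks.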

Apache Jira: HIVE-28093

CDPD-74539: MariaDB falls back to MySQL in Hive

7.3.2

Hive downstream had errors in supporting MariaDB.

The issue was addressed by making MariaDB automatically fall back to MySQL.

CDPD-66731: Hive Metastore query failure during Zero Downtime Upgrade

7.3.2

Previously, during a Zero Downtime Upgrade (ZDU) from version 7.2.17 to 7.2.18, long-running queries such as INSERT INTO statements failed with a MetaException.

This issue is now fixed by ensuring that transaction blocks are correctly managed during the statistics update task, preventing the "current transaction is aborted" error.

CDPD-60770: Passwords with special characters fail to connect with Beeline

7.3.2

When you used a password containing special characters like #, ^, or ; in a JDBC URL for a Beeline connection, the connection failed with a 401 error. This happened because Beeline did not correctly interpret these special characters in the password.

This issue is resolved by introducing a new method to reparse the password from the original JDBC URL, allowing Beeline to correctly handle and authenticate passwords containing special characters.

Apache Jira: HIVE-28805

CDPD-58428: Bucket Map Join hangs when source vertex parallelism changes

7.3.2

Previously, a Bucket Map Join could hang if the parallelism of a source vertex was modified by automated reducer parallelism.

This issue is now fixed by disabling automated reducer parallelism for vertices that serve as a source for a Bucket Map Join.

Apache Jira: HIVE-27078

CDPD-58428: Incorrect results in map-side Sort-Merge Bucket Join with different bucket sizes

7.3.2

Previously, map-side Sort-Merge Bucket (SMB) Joins returned incorrect results when joining two tables with different bucket counts (for example, joining a table with two buckets to a table with three buckets).

This issue is now fixed by implementing a new routing algorithm that ensures bucket N of a small table is correctly mapped to bucket M of a large table based on the greatest common divisor of their bucket sizes.
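
The role of the greatest common divisor can be sketched as follows. This is not Hive's routing code: it assumes that a row lands in bucket hash % bucket_count, and the function name is hypothetical. Under that assumption, a small-table bucket and a large-table bucket can hold matching keys only if their indexes agree modulo the greatest common divisor of the two bucket counts.

```python
from math import gcd

def candidate_buckets(small_bucket, small_count, large_count):
    """Large-table buckets that can share a join key with one small-table bucket."""
    g = gcd(small_count, large_count)
    # hash % small_count and hash % large_count agree modulo g, so only
    # buckets with the same residue modulo g can contain matching keys.
    return [m for m in range(large_count) if m % g == small_bucket % g]

print(candidate_buckets(0, 2, 3))  # [0, 1, 2]
print(candidate_buckets(1, 4, 6))  # [1, 3, 5]
```

For the two-bucket and three-bucket case mentioned above, the greatest common divisor is 1, so every bucket of one table must be considered for every bucket of the other.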

Apache Jira: HIVE-27357

CDPD-50060: Configurable filter for partition metadata properties in Hive Metastore

7.3.2

Previously, Hive Metastore (HMS) API calls failed with a TTransportException (MaxMessageSize reached) when processing tables with large partition metadata.

This issue is now fixed by providing a configurable filter that excludes unnecessary properties from listPartitions API responses. This change reduces the metadata payload size, prevents connection timeouts, and improves the performance of metadata operations.
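
The effect of such a filter can be sketched in Python as follows. The function name, the regular-expression form of the setting, and the sample property names are assumptions made for illustration; they are not the metastore's actual configuration keys or API.

```python
import re

def filter_partition_params(params, exclude_pattern):
    """Drop properties whose key matches exclude_pattern (a hypothetical regex setting)."""
    excluded = re.compile(exclude_pattern)
    return {key: value for key, value in params.items() if not excluded.fullmatch(key)}

# Sample partition properties; the large value stands in for bulky metadata.
params = {
    "numRows": "1200",
    "totalSize": "52428800",
    "impala_intermediate_stats_chunk0": "x" * 4000,
}
slim = filter_partition_params(params, r"impala_intermediate_stats_.*")
print(sorted(slim))  # ['numRows', 'totalSize']
```

Applied to every partition in a response, dropping a few bulky properties can shrink the serialized message considerably.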

Apache Jira: HIVE-27114

CDPD-44551: Avro table import or download fails with ODBC driver due to missing property

7.3.2

Previously, the absence of the metastore.storage.schema.reader.impl property caused Avro table import or download failures in Cloudera Runtime 7.1.7 when using the ODBC driver.

The issue was addressed by setting metastore.storage.schema.reader.impl to org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader by default.

Apache Jira: HIVE-26952

CDPD-92208: Query failure when selecting data from views with bracketed definitions

7.3.2

Previously, a SELECT query against a view failed with a SemanticException if the view was created with specific column names and a definition enclosed in brackets.

This issue is now fixed by ensuring that the compiler does not add extra brackets if the view definition is already enclosed.

Apache Jira: HIVE-26493

CDPD-83530: Task commits were allowed despite an exception being thrown in the Tez processor

7.3.2

Previously, a communication failure between the coordinator and executor caused a running task to terminate, which resulted in ReduceRecordProcessor.init() throwing a java.lang.InterruptedException. Despite this exception, the task was still allowed to commit and generated a commit manifest.

This issue has now been resolved. The fix ensures that outputs are not committed if an exception is thrown in the Tez processor.

Apache Jira: HIVE-28962

CDPD-89414: Incorrect results for window functions with IGNORE NULLS

7.3.2

When you used the FIRST_VALUE and LAST_VALUE window functions with the IGNORE NULLS clause while vectorization was enabled, the results were incorrect. This occurred because the vectorized execution engine did not properly handle the IGNORE NULLS setting for these functions.

This issue is addressed by modifying the vectorized processing for FIRST_VALUE and LAST_VALUE to correctly respect the IGNORE NULLS clause, ensuring the same results are produced whether vectorization is enabled or disabled.

Apache Jira: HIVE-29122

CDPD-85600: Select queries with ORDER BY fail due to compression error

7.3.2

When you ran a Hive SELECT query with an ORDER BY clause, it failed with a java.io.IOException and java.lang.UnsatisfiedLinkError related to the zlib decompressor.

The issue was addressed by ensuring the zlib native library is correctly loaded.

Apache Jira: HIVE-28805

CDPD-90301: Stack overflow error from queries with OR and MIN filters

7.3.2

Previously, queries caused a stack overflow error when they contained multiple OR conditions on the same expression, such as:

MINUTE(date_) = 2 OR MINUTE(date_) = 10

This issue is addressed by modifying the HivePointLookupOptimizerRule to keep the original order of expressions and to check if a merge can be performed before creating a new expression.
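
The idea of preserving order and merging only when there is something to merge can be sketched with a toy, string-based Python function. This is not the optimizer rule itself, which operates on query plan expressions; the function name and the input representation are hypothetical.

```python
def merge_disjuncts(disjuncts):
    """disjuncts: list of (expression, constant) pairs joined by OR."""
    groups = {}  # dicts preserve insertion order, so first-seen order is kept
    for expression, constant in disjuncts:
        values = groups.setdefault(expression, [])
        if constant not in values:
            values.append(constant)
    merged = []
    for expression, values in groups.items():
        if len(values) > 1:
            # Several constants on the same expression collapse into one IN predicate.
            merged.append(f"{expression} IN ({', '.join(map(str, values))})")
        else:
            # Nothing to merge: keep the plain equality instead of building a new expression.
            merged.append(f"{expression} = {values[0]}")
    return " OR ".join(merged)

print(merge_disjuncts([("MINUTE(date_)", 2), ("MINUTE(date_)", 10)]))
# MINUTE(date_) IN (2, 10)
```

Because a predicate with nothing to merge is returned unchanged, applying the function again to its own output has no further effect, which is the property that keeps a rewrite from repeating indefinitely.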

Apache Jira: HIVE-29208

DWX-20754: Invalid column reference in lateral view queries

7.3.2

Previously, queries that used lateral views could not reference the virtual column BLOCK__OFFSET__INSIDE__FILE and failed with the following error:

FAILED: SemanticException Line 0:-1 Invalid column reference 'BLOCK__OFFSET__INSIDE__FILE'

This issue is now resolved.

Apache Jira: HIVE-28938