Behavioral changes in Apache Hive

Learn about the change in certain functionality of Hive that has resulted in a change in behavior from the previously released version to this version of Cloudera Runtime.

Summary:
Change in the way dates are parsed from string by ignoring trailing invalid characters
Previous behavior:
HIVE-20007 introduced changes in the way dates were parsed from strings. SQL functions or date operations involving invalid dates returned "null".
New behavior:
HIVE-27586 extracts and returns a valid date from a string value if there is a valid date prefix in the string. This fix partially restores the behavior changes introduced as part of HIVE-20007 and also makes the current behavior of handling trailing invalid characters more consistent.
The following table illustrates the behavior changes before and after the fix:
Strong value Behavior (before HIVE-20007) Previous behavior (after HIVE-20007) Current behavior (after HIVE-27586)
2023-08-03_16:02:00 2023-08-03 null 2023-08-03
2023-08-03-16:02:00 2023-08-03 null 2023-08-03
2023-08-0316:02:00 2024-06-11 null 2023-08-03
03-08-2023 0009-02-12 null 0003-08-20
2023-08-03 GARBAGE 2023-08-03 2023-08-03 2023-08-03
2023-08-03TGARBAGE 2023-08-03 2023-08-03 2023-08-03
2023-08-03_GARBAGE 2023-08-03 null 2023-08-03

This change affects various Hive SQL functions and operators that accept dates from string values, such as CAST (V AS DATE), CAST (V AS TIMESTAMP), TO_DATE, DATE_ADD, DATE_DIFF, WEEKOFYEAR, DAYOFWEEK, and TRUNC.

Summary:
Handling invalid date formats in to_date function.

The behavior of the to_date function has changed between CDP Private Cloud Base versions 7.1.7 SP2 and 7.1.9 when handling invalid date formats.

Previous behavior:
Valid dates (e.g., YYYY-MM-DD): Returned correct results.

Invalid dates: Returned random, unexpected dates instead of NULL.

New behavior:
Following the changes introduced inHIVE-28483:

Valid dates: Continue to return correct results.

Invalid dates (e.g., DD-MM-YYYY): Now return NULL.

Summary:
Change in the way date and timestamp values are parsed.
Previous behavior:
Some of the Hive date and timestamp functions, such as unix_timestamp(), from_unixtime(), date_format(), and cast() use the DateTimeFormatter class for printing and parsing date and timestamp objects. Prior to the CDP Private Cloud Base 7.1.7 SP2 version, these functions used the SimpleDateFormat class.
New behavior:
Starting from CDP Private Cloud Base 7.1.9 version Cumulative hotfix 1, a new configurable hive.datetime.formatter property is introduced through HIVE-25576 that enables you to choose between SimpleDateFormat and DateTimeFormatter for the unix_timestamp and from_unixtime SQL functions.

Although the DateTimeFormatter class is an improvement over SimpleDateFormat, some users may want to retain the old behavior to ensure compatibility after migration, therefore, making it necessary for introducing this property.

The possible values for the hive.datetime.formatter property are 'DATETIME' and 'SIMPLE' representing DateTimeFormatter and SimpleDateFormat respectively. The default value is set to 'DATETIME'.

Summary:
Change in default value of the hive.driver.parallel.compilation.global.limit property
Previous behavior:
The default value for the hive.driver.parallel.compilation.global.limit property is set to "3".
New behavior:
The default value for the hive.driver.parallel.compilation.global.limit property is changed to "5", which helps in preventing queries from getting stuck because of a limit on the number of queries that can be compiled in parallel on a HiveServer (HS2) instance.
Summary:
Change in default value of the hive.server2.tez.initialize.default.sessions property
Previous behavior:
The default value for the hive.server2.tez.initialize.default.sessions property is set to "true"
New behavior:
The default value for the hive.server2.tez.initialize.default.sessions property is changed to "false" to prevent queries from waiting on the same Tez AM pool and thereby improving query performance.

If there are multiple queries running, you might notice that the queries are taking longer to complete because the default value for hive.server2.tez.sessions.per.default.queue is 1, which means only one query can run at a time. Therefore, depending on your resource availability and query concurrency/load on the server, you can set hive.server2.tez.initialize.default.sessions to "true" and increase the value of hive.server2.tez.sessions.per.default.queue.

Summary:
HIVE-23100 introduces stricter type checking for comparison operations between columns and constants, affecting type combinations governed by the hive.strict.checks.type.safety property. This change ensures safer comparisons by enforcing stricter rules before any implicit type conversion occurs.
The table below outlines the behavior changes before and after HIVE-23100 for expressions of the form E1 op E2 or E2 op E1, where op represents comparison operators such as <, <=, =, >, >=, !=, E1 is a column reference, and E2 is a constant.
Case E1 Type E2 Type Before After Snippet
A BIGINT STRING OK ERROR/WARN c_bigint = '9223372036854775807'
B DOUBLE BIGINT OK WARN c_double = 9223372036854775807
Previous behavior:
Hive implicitly converted the constant (E2) to the column's type (E1) before enforcing strict type checking, allowing queries to complete without errors or warnings. However, this approach carried risks of unintended results due to unsafe type conversions.
New behavior:
Strict type checking is performed before any implicit type conversion, and the outcome depends on the hive.strict.checks.type.safety property setting:
  • If true, the query will throw a compilation error.
  • If false, the query will execute but will generate a warning in the logs.
The error or warning will highlight that such comparisons are generally unsafe, with the message:
Unsafe compares between different types are disabled for safety reasons.