Learn about the change in certain functionality of Hive that has resulted in a change in
behavior from the previously released version to this version of Cloudera Runtime.
Summary:
Change in the way dates are parsed from string by ignoring
trailing invalid characters
Previous behavior:
HIVE-20007 introduced changes in the way dates
were parsed from strings. SQL functions or date operations involving invalid dates
returned "null".
New behavior:
HIVE-27586 extracts and returns a valid date from
a string value if there is a valid date prefix in the string. This fix partially restores
the behavior changes introduced as part of HIVE-20007 and also makes the current behavior
of handling trailing invalid characters more consistent.
The following table illustrates
the behavior changes before and after the fix:
Strong value
Behavior (before HIVE-20007)
Previous behavior (after HIVE-20007)
Current behavior (after HIVE-27586)
2023-08-03_16:02:00
2023-08-03
null
2023-08-03
2023-08-03-16:02:00
2023-08-03
null
2023-08-03
2023-08-0316:02:00
2024-06-11
null
2023-08-03
03-08-2023
0009-02-12
null
0003-08-20
2023-08-03 GARBAGE
2023-08-03
2023-08-03
2023-08-03
2023-08-03TGARBAGE
2023-08-03
2023-08-03
2023-08-03
2023-08-03_GARBAGE
2023-08-03
null
2023-08-03
This change affects various Hive SQL functions and operators that
accept dates from string values, such as CAST (V AS DATE), CAST (V AS TIMESTAMP),
TO_DATE, DATE_ADD, DATE_DIFF, WEEKOFYEAR, DAYOFWEEK, and TRUNC.
Summary:
Handling invalid date formats in to_date
function.
The behavior of the to_date function has changed between
CDP Private Cloud Base versions 7.1.7 SP2 and
7.1.9 when handling invalid date formats.
Previous behavior:
Valid dates (e.g., YYYY-MM-DD): Returned correct
results.
Invalid dates: Returned random, unexpected dates instead of NULL.
Invalid dates (e.g., DD-MM-YYYY): Now return NULL.
Summary:
Change in the way date and timestamp values are parsed.
Previous behavior:
Some of the Hive date and timestamp functions, such as
unix_timestamp(), from_unixtime(),
date_format(), and cast() use the
DateTimeFormatter class for printing and parsing date and timestamp
objects. Prior to the CDP Private Cloud Base 7.1.7 SP2 version, these functions used the
SimpleDateFormat class.
New behavior:
Starting from CDP Private Cloud Base
7.1.9 version Cumulative hotfix 1, a new configurable
hive.datetime.formatter property is introduced through HIVE-25576 that enables you to choose between SimpleDateFormat
and DateTimeFormatter for the unix_timestamp and
from_unixtime SQL functions.
Although the
DateTimeFormatter class is an improvement over
SimpleDateFormat, some users may want to retain the old behavior to
ensure compatibility after migration, therefore, making it necessary for introducing
this property.
The possible values for the
hive.datetime.formatter property are 'DATETIME' and 'SIMPLE'
representing DateTimeFormatter and SimpleDateFormat
respectively. The default value is set to 'DATETIME'.
Summary:
Change in default value of the
hive.driver.parallel.compilation.global.limit property
Previous behavior:
The default value for the
hive.driver.parallel.compilation.global.limit property is set to "3".
New behavior:
The default value for the
hive.driver.parallel.compilation.global.limit property is changed to
"5", which helps in preventing queries from getting stuck because of a limit on the number
of queries that can be compiled in parallel on a HiveServer (HS2) instance.
Summary:
Change in default value of the
hive.server2.tez.initialize.default.sessions property
Previous behavior:
The default value for the
hive.server2.tez.initialize.default.sessions property is set to
"true"
New behavior:
The default value for the
hive.server2.tez.initialize.default.sessions property is changed to
"false" to prevent queries from waiting on the same Tez AM pool and thereby improving
query performance.
If there are multiple queries running, you might notice that the
queries are taking longer to complete because the default value for
hive.server2.tez.sessions.per.default.queue is 1, which means only
one query can run at a time. Therefore, depending on your resource availability and
query concurrency/load on the server, you can set
hive.server2.tez.initialize.default.sessions to "true" and increase
the value of hive.server2.tez.sessions.per.default.queue.
Summary:
HIVE-23100 introduces stricter type checking for
comparison operations between columns and constants, affecting type combinations governed
by the hive.strict.checks.type.safety property. This change ensures safer
comparisons by enforcing stricter rules before any implicit type conversion occurs.
The
table below outlines the behavior changes before and after HIVE-23100 for expressions of
the form E1 op E2 or E2 op E1, where op represents comparison operators such as <,
<=, =, >, >=, !=, E1 is a column reference, and E2 is a constant.
Case
E1 Type
E2 Type
Before
After
Snippet
A
BIGINT
STRING
OK
ERROR/WARN
c_bigint = '9223372036854775807'
B
DOUBLE
BIGINT
OK
WARN
c_double = 9223372036854775807
Previous behavior:
Hive implicitly converted the constant (E2) to the column's type
(E1) before enforcing strict type checking, allowing queries to complete without errors or
warnings. However, this approach carried risks of unintended results due to unsafe type
conversions.
New behavior:
Strict type checking is performed before any implicit type
conversion, and the outcome depends on the hive.strict.checks.type.safety
property setting:
If true, the query will throw a compilation error.
If false, the query will execute but will generate a warning in the logs.
The error or warning will highlight that such comparisons are generally unsafe,
with the
message:
Unsafe compares between different types are disabled for safety reasons.