The following Impala issues are fixed in Cloudera Runtime 7.3.2, its
service packs, and cumulative hotfixes.
Cloudera Runtime 7.3.2
Cloudera Runtime 7.3.2 resolves Impala issues and incorporates fixes from
the service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For a
comprehensive record of all fixes in Cloudera Runtime 7.3.1.x, see Fixed Issues.
- CDPD-98207: Impala crashing on the Web UI for failed queries
- 7.3.2
- Previously, Impala crashed when you accessed the query summary or JSON plan through the Web UI for queries that failed before execution. This occurred during scenarios such as a Create Table As Select (CTAS) failure or when admission control rejected a query.
- This issue is resolved by ensuring that Impala correctly handles missing execution summaries.
Apache Jira:
IMPALA-14791
- CDPD-97786: Excessive partition events during table-level operations
- 7.3.2
- Previously, certain table-level operations, such as dropping incremental statistics or setting/unsetting cached properties, triggered an individual ALTER PARTITION event for every partition in a table.
- This issue is addressed by implementing bulk updates for partitions.
Apache Jira:
IMPALA-13599
- CDPD-97187: Deprecation warnings in impala-shell with Python 3.11
- 7.3.2
- Previously, when you ran impala-shell with Python 3.11 or newer, the command output displayed DeprecationWarning messages related to ssl.PROTOCOL_TLS and ssl.match_hostname(). These warnings were triggered by underlying library dependencies.
- This issue is resolved by updating the SSL protocol handling and validation logic to be compatible with newer Python versions, which eliminates these warning messages from the shell output.
Apache Jira:
IMPALA-12219
- CDPD-91994: Stale query IDs in catalog logs
- 7.3.2
- Previously, catalog logs for certain metadata requests (getPartialCatalogObject) displayed incorrect query IDs.
- This issue is addressed by ensuring that each request is associated with its correct query ID. The system now automatically clears the query ID after the request finishes to prevent stale information from appearing in later logs.
Apache Jira:
IMPALA-14494
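The set-then-clear pattern described for CDPD-91994 can be sketched in a few lines. This is a minimal illustration, not Catalogd's actual code: the thread-local slot and the names `query_id_scope`, `log_prefix`, and `handle_request` are all hypothetical.

```python
import threading
from contextlib import contextmanager

# Hypothetical per-thread slot holding the query ID used in log lines.
_request_ctx = threading.local()


@contextmanager
def query_id_scope(query_id):
    """Attach query_id to the current thread for the duration of one request."""
    _request_ctx.query_id = query_id
    try:
        yield
    finally:
        # Always clear, even on error, so the next request served by this
        # thread cannot inherit a stale ID.
        _request_ctx.query_id = None


def log_prefix():
    """Return the prefix a log line would carry on this thread."""
    qid = getattr(_request_ctx, "query_id", None)
    return f"[query_id={qid}]" if qid else "[no query]"


def handle_request(query_id):
    """Serve one request and return the log prefix seen while serving it."""
    with query_id_scope(query_id):
        return log_prefix()
```

Clearing in a `finally` block is the design point: a request that fails midway still leaves the thread without an ID.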
- CDPD-82673: RSASSA-PSS certificate signature scheme is now supported for server certificates
- 7.3.2
- Previously, if you used a certificate with the RSASSA-PSS signature algorithm for kRPC communication, the connection failed.
- The fix includes an updated OpenSSL function that correctly identifies the hash algorithm for RSASSA-PSS certificates.
Apache Jira:
IMPALA-14038
- CDPD-89852: Crash when casting timestamp strings with timezone offsets to DATE
- 7.3.2
- Previously, attempting to cast a timestamp string that included a timezone offset (such as "+08:00" in "2025-08-31 06:23:24.9392129 +08:00") to the DATE data type caused a crash.
- This issue is addressed by adding a check to ensure that the timestamp string length does not exceed the maximum length of the default date-time format. Longer strings now use a lazily created format, which prevents the crash.
Apache Jira:
IMPALA-14383
- CDPD-89730: Impala daemon crashed during scans with high logging levels
- 7.3.2
- Previously, the Impala daemon experienced a null pointer dereference in the BaseSequenceScanner component when the logging level was set to 2 or higher, leading to crashes in release builds.
- This issue is resolved by correcting the pointer handling in the sequence scanner to ensure safe memory access when high-level logging is active.
Apache Jira:
IMPALA-14382
- CDPD-89346: Enhanced join strategy selection for large clusters
- 7.3.2
- Previously, the query planner's cost model for broadcast joins could be skewed by the number of nodes in a cluster. This led to suboptimal join strategy choices, especially in large clusters with skewed data, where a partitioned join was chosen over a more efficient broadcast join.
- This issue is resolved by introducing the broadcast_cost_scale_factor query option, which provides a tuning option in addition to query hints for overriding the query planner's decision.
Apache Jira:
IMPALA-14263
- CDPD-89132: Tables incorrectly dropped by stale HMS events after global metadata invalidation
- 7.3.2
- Previously, a stale event such as DropTable or AlterTableRename that was processed after a global INVALIDATE METADATA command could cause tables to be unintentionally dropped.
- This issue is resolved by recording the current HMS event ID as the createEventId for all tables during a global reset.
Apache Jira:
IMPALA-14330
- CDPD-79111: Authentication failure in impala-shell with 76-character LDAP passwords
- 7.3.2
- Previously, when you used impala-shell with the HS2-HTTP protocol and a 76-character LDAP password, the connection failed with a value error.
- This issue is resolved by an updated encoding method that handles long password strings without inserting line breaks, ensuring that the authorization header remains valid for the server.
Apache Jira:
IMPALA-13746
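The line-break behavior behind CDPD-79111 can be reproduced with the Python standard library. The sketch below is a general illustration, not the impala-shell source: `basic_auth_header` is a hypothetical helper and the credentials are made up. It contrasts `base64.encodebytes`, which wraps output MIME-style, with `base64.b64encode`, which never inserts line breaks.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value on a single line."""
    credentials = f"{user}:{password}".encode("utf-8")
    # b64encode never inserts line breaks, regardless of input length.
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


# Illustrative credentials only.
credentials = b"alice:" + b"x" * 76

# encodebytes wraps its output after every 76 characters, so a long input
# still contains an embedded newline after the trailing one is stripped.
wrapped = base64.encodebytes(credentials).strip()
flat = base64.b64encode(credentials)
```

Here `wrapped` contains an embedded newline and `flat` does not; apart from that line break, the two encodings are identical.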
- CDPD-92001: Metadata loading performed sequentially in local catalog mode
- 7.3.2
- Previously, when a query accessed multiple unloaded tables in local catalog mode, Impala triggered metadata loading for those tables sequentially.
- This issue is resolved by parallelizing table loading during query compilation. A new startup flag, max_stmt_metadata_loader_threads, is introduced to control the number of threads used for loading metadata, with a default value of 8 threads per query. If only one table requires loading or if the thread pool is unavailable, the system automatically falls back to sequential loading.
Apache Jira:
IMPALA-14447
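The parallel-with-fallback behavior described for CDPD-92001 can be sketched as follows. This is an illustration under stated assumptions, not Impala's implementation: `load_tables` and the caller-supplied `load_one` function are hypothetical, and a `max_threads` value of 1 or less stands in for "thread pool unavailable".

```python
from concurrent.futures import ThreadPoolExecutor


def load_tables(table_names, load_one, max_threads=8):
    """Load metadata for each table and return {name: metadata}.

    load_one is a caller-supplied function (hypothetical here) that loads a
    single table. max_threads mirrors the documented default of 8.
    """
    names = list(table_names)
    # Fall back to sequential loading when parallelism cannot help:
    # zero or one table, or no usable pool (modeled as max_threads <= 1).
    if len(names) <= 1 or max_threads <= 1:
        return {name: load_one(name) for name in names}

    workers = min(max_threads, len(names))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results pair up with names.
        results = list(pool.map(load_one, names))
    return dict(zip(names, results))
```

Capping the worker count at the number of tables avoids creating idle threads for small queries.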
- CDPD-79241: Incorrect query results for Iceberg V2 tables
- 7.3.2
- Previously, when you ran complex queries involving multiple subqueries on Iceberg V2 tables, the system sometimes returned incorrect results.
- This issue is now resolved. The fix includes a new internal mechanism to track and apply count optimizations.
- Cookie-Based authentication support for JWT tokens
- 7.3.2
- Previously, when JWT tokens were used for authentication, every HTTP request within a session required token verification. If the tokens had a short lifespan, this could lead to authentication failures and disrupt session continuity.
- This issue is resolved by using authentication cookies, which generally have a longer lifespan (configured through the max_cookie_lifetime_s flagfile option) and can remain valid for the duration of the session. Subsequent requests can then rely on cookies rather than repeatedly verifying the JWT token.
Apache Jira: IMPALA-13813
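The cookie flow described above can be illustrated with a small sketch. The cookie layout, the signing key, and the function names `issue_cookie` and `check_cookie` are assumptions made for this example and do not describe Impala's actual cookie format. Only the idea of a lifetime limit comes from the max_cookie_lifetime_s option.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"        # hypothetical server-side signing key
MAX_COOKIE_LIFETIME_S = 86400  # stand-in for the max_cookie_lifetime_s setting


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_cookie(username, now=None):
    """Create a signed cookie once the JWT has been verified."""
    issued = int(time.time() if now is None else now)
    payload = f"{username}&{issued}"
    return f"{payload}&{_sign(payload)}"


def check_cookie(cookie, now=None):
    """Return the username if the cookie is authentic and unexpired, else None."""
    try:
        username, issued_text, sig = cookie.rsplit("&", 2)
        issued = int(issued_text)
    except ValueError:
        return None  # malformed cookie
    if not hmac.compare_digest(sig, _sign(f"{username}&{issued_text}")):
        return None  # signature mismatch
    current = time.time() if now is None else now
    if current - issued > MAX_COOKIE_LIFETIME_S:
        return None  # expired; the client must present a JWT again
    return username
```

In this sketch, a request that carries a valid cookie skips JWT verification entirely, so a short-lived token no longer interrupts an established session.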
- CDPD-80798: Stable Catalogd initialization in HA mode
- 7.3.2
- Previously, Catalogd initialization could time out in high availability mode. This happened because metadata operations started prematurely, blocking Catalogd from becoming active.
- This issue is resolved by ensuring Catalogd determines HA state
before starting metadata operations in HA mode. This prevents blocking issues and
ensures a stable startup.
Apache Jira: IMPALA-13850
- CDPD-83059: Optimized Impala Catalog cache warmup
- 7.3.2
- Impala's Catalogd previously started with an empty cache. This
led to slow query startup for important tables and affected high availability
failovers.
- This issue is resolved by adding new settings to pre-load
specific tables into the Catalogd cache in the background. This ensures faster query
startup and smoother high availability failovers.
Apache Jira: IMPALA-14074
- CDPD-87222: Consistent TRUNCATE operations for external tables
- 7.3.2
- Previously, Impala's TRUNCATE operations on external tables did not consistently delete files in subdirectories, even when recursive listing was enabled.
- This issue is resolved by ensuring Impala uses the HMS API for TRUNCATE operations by default.
Apache Jira: IMPALA-14189, IMPALA-14224
- DWX-21855: Impala executors fail to shut down gracefully
- 7.3.2
- During a graceful shutdown, Impala executors wait for running queries to finish, up to the graceful shutdown deadline (--shutdown_deadline_s). Previously, the istio-proxy container on the Impala executor pod was terminated immediately during a graceful shutdown. As a result, the executors became unreachable and were removed from the Impala cluster membership, which caused running queries to be canceled.
- This issue is resolved by ensuring that the istio-proxy container's lifecycle does not affect the executor's cluster membership.
- IMPALA-14263: Enhanced join strategy for large clusters
- 7.3.2
- Previously, the query planner's cost model for broadcast joins could be skewed by the number of nodes in a cluster. This could lead to suboptimal join strategy choices, especially in large clusters with skewed data, where a partitioned join was chosen over a more efficient broadcast join.
- This issue is resolved by introducing the broadcast_cost_scale_factor query option, which provides a tuning option in addition to query hints for overriding the query planner's decision. To set it cluster-wide for all queries, add the following key-value pair to the default_query_options startup option: broadcast_cost_scale_factor=<less than 1.0>
Apache Jira: IMPALA-14263
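To show why a scale factor below 1.0 tilts the planner toward broadcast joins on large clusters, here is a toy cost comparison. It uses a simplified textbook model, not Impala's real cost formulas: the function `choose_join` and its byte-count inputs are hypothetical, and only the option name broadcast_cost_scale_factor comes from the release note.

```python
def choose_join(build_bytes, probe_bytes, num_nodes, broadcast_cost_scale_factor=1.0):
    """Pick a join distribution under a simplified cost model.

    Assumptions (not Impala's real formulas): a broadcast join ships the
    build side to every node, and a partitioned join ships both sides once.
    """
    broadcast_cost = build_bytes * num_nodes * broadcast_cost_scale_factor
    partitioned_cost = build_bytes + probe_bytes
    return "BROADCAST" if broadcast_cost <= partitioned_cost else "PARTITIONED"
```

With a 10-byte build side, a 500-byte probe side, and 100 nodes, this model picks PARTITIONED at the default factor (1000 versus 510) and BROADCAST at an illustrative factor of 0.5 (500 versus 510). The node count alone drives the difference, which is the skew the option is meant to offset.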
- IMPALA-11402: Fetching metadata for tables with huge numbers of files no longer fails with OutOfMemoryError
- 7.3.2
- Previously, when the Impala coordinator tried to fetch file metadata for extremely large tables (those with millions of files or partitions), the Impala Catalog service attempted to return all the file details at once. This often exceeded the Java memory limits, causing the service to crash with an OutOfMemoryError.
- This issue is addressed by configuring the Catalog service to
limit the number of file descriptors included in a single
getPartialCatalogObject response. A new configuration flag,
catalog_partial_fetch_max_files, is introduced to define the maximum
number of file descriptors allowed per response (with a default of 1,000,000
files).
- If a request exceeds this limit, the Catalog service will
truncate the response and return metadata for only a subset of the requested partitions.
The coordinator is now designed to detect this truncated response and automatically send
new batch requests to fetch the remaining partitions until all required metadata is
retrieved. This change ensures that the coordinator can successfully fetch and process
the metadata for extremely large tables without crashing due to memory limits.
Apache Jira: IMPALA-11402
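The truncate-and-retry exchange described for IMPALA-11402 can be sketched as a pair of functions. Both are hypothetical stand-ins, not the real RPC code: `catalog_fetch` plays the Catalog service with a file cap and `fetch_all` plays the coordinator. The cap corresponds to the idea behind catalog_partial_fetch_max_files, and returning at least one partition per response is an assumption made here so the loop always progresses.

```python
def catalog_fetch(files_by_partition, requested, max_files):
    """Hypothetical server side: return whole partitions until the cap is hit.

    Always returns at least one partition so the caller makes progress.
    """
    response, total = {}, 0
    for part in requested:
        files = files_by_partition[part]
        if response and total + len(files) > max_files:
            break  # truncate: remaining partitions wait for a later request
        response[part] = files
        total += len(files)
    return response


def fetch_all(files_by_partition, partitions, max_files):
    """Hypothetical coordinator side: re-request whatever was left out."""
    result, pending, rounds = {}, list(partitions), 0
    while pending:
        batch = catalog_fetch(files_by_partition, pending, max_files)
        result.update(batch)
        pending = [p for p in pending if p not in batch]
        rounds += 1
    return result, rounds
```

Each response stays bounded in size, and the coordinator still ends up with metadata for every requested partition after enough rounds.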
- CDPD-77261: Impala can now read Parquet integer data as DECIMAL after schema changes
- 7.3.2
- Previously, if you changed a column type from an integer (INT or BIGINT) to a DECIMAL using ALTER TABLE, Impala could fail to read the original Parquet data files. This happened because the files lacked the specific metadata (logical types) that Impala expected for decimals, resulting in an error.
- Impala is now more flexible when reading Parquet files following
schema evolution. If Impala encounters an integer type but the schema expects a
DECIMAL, it automatically assumes a suitable decimal precision and
scale, allowing you to successfully query the updated table:
INT32 is read as DECIMAL(9, 0).
INT64 is read as DECIMAL(18, 0).
This change supports common schema evolution practices by allowing you to update
column types without manually rewriting old data files.
Apache Jira: IMPALA-13625
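The release note does not give a rationale for the precisions 9 and 18. One way to see why they are natural choices is that they are the largest digit counts for which every value is guaranteed to fit in a signed 32-bit or 64-bit integer. The helper below, `max_safe_decimal_precision`, is a hypothetical name used only to check that arithmetic.

```python
def max_safe_decimal_precision(bits):
    """Largest precision p such that every p-digit value fits a signed int."""
    max_value = 2 ** (bits - 1) - 1
    precision = 0
    # 10 ** (p + 1) - 1 is the largest value with p + 1 decimal digits.
    while 10 ** (precision + 1) - 1 <= max_value:
        precision += 1
    return precision
```

The largest signed 32-bit value, 2147483647, has 10 digits, but 9999999999 does not fit, so 9 is the largest fully covered precision. The same reasoning gives 18 for 64 bits.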
- IMPALA-12927: Impala can now correctly read BINARY columns in JSON tables
- 7.3.2
- Previously, Impala couldn't correctly read BINARY columns in JSON tables, often resulting in errors or incorrect data. This happened because Impala assumed the data was always Base64 encoded, which wasn't true for files written by older Hive versions.
- Impala now supports a new table property, 'json.binary.format' (BASE64 or RAWSTRING), and a query option, JSON_BINARY_FORMAT, to explicitly define the binary encoding. This ensures Impala reads the data correctly. If no format is specified, Impala now returns an error instead of risking silent data corruption.
Apache Jira: IMPALA-12927
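The difference between the two binary formats can be seen by decoding the same JSON value both ways. This sketch is an illustration only, not Impala's reader: `read_binary_field` and the sample record are hypothetical, and only the format names BASE64 and RAWSTRING come from the release note.

```python
import base64
import json


def read_binary_field(json_line, field, binary_format):
    """Decode a BINARY field from one JSON record under an explicit format."""
    value = json.loads(json_line)[field]
    if binary_format == "BASE64":
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(value, validate=True)
    if binary_format == "RAWSTRING":
        return value.encode("utf-8")
    # Mirror the described behavior: no guessing when the format is unset.
    raise ValueError("binary format must be BASE64 or RAWSTRING")


row = '{"payload": "aGVsbG8="}'
as_base64 = read_binary_field(row, "payload", "BASE64")
as_raw = read_binary_field(row, "payload", "RAWSTRING")
```

The same stored string yields `b"hello"` under BASE64 and `b"aGVsbG8="` under RAWSTRING, which is why guessing the format can silently corrupt data.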