Fixed issues in Cloudera Data Warehouse on premises 1.5.5 SP3

Fixed issues for Cloudera Data Warehouse 1.5.5 Service Pack 3 provide resolutions for identified bugs across the Cloudera Data Warehouse components.

Fixed issues in Cloudera Data Warehouse on premises

DWX-22414: Log router reloader crashes on large clusters
Previously, the log router's configuration reloader loaded all configmaps across the entire cluster into memory to discover relevant namespaces for log routing. In large deployments, this caused the component to exceed its 512 MB memory limit and crash, leading to unstable log routing. This occurred because the third-party operator mistakenly cached all configmaps in the cluster, including unrelated ones from namespaces such as istio-system.

This issue is now fixed. The third-party reloader is replaced with a custom Cloudera Data Warehouse reloader that only targets relevant namespaces, eliminating the global memory overhead. Log routing configurations are now generated directly within the log router, and a startup migration automatically updates existing private cloud namespaces during upgrade to ensure smooth, stable log routing.

DWX-23068: Kubernetes liveness probe failure in Hive Virtual Warehouse with active-passive HiveServer2 HA

Previously, in a Hive Virtual Warehouse configured with active-passive HiveServer2 high availability (HA), the Kubernetes liveness probe would fail. This triggered a warning stating that the health check script or directory could not be found, resulting in a stat /healthz-ha.sh | grep -q "IS_LEADER": no such file or directory error message. This issue was caused by invalid syntax in the probe commands specified within the HiveServer2 StatefulSet deployment.

This issue is now resolved. The corrected syntax for the liveness and readiness probes within the HiveServer2 StatefulSet now properly runs the health check script.
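These notes do not reproduce the probe definitions, so the following is only a hedged illustration of how an error of this shape can arise. The Python sketch contrasts passing an entire pipeline as a single executable name, which the operating system cannot find, with handing the same pipeline to a shell. The /healthz-ha.sh path is taken from the error message above; the echo stand-in and the function names are assumptions made for illustration.

```python
import subprocess


def run_without_shell(pipeline: str) -> str:
    """Treat the whole pipeline string as one executable name (the broken form)."""
    try:
        subprocess.run([pipeline], check=False)
    except FileNotFoundError:
        # The OS looks for a file literally named like the full pipeline string.
        return "no such file or directory"
    return "ran"


def run_with_shell(pipeline: str) -> int:
    """Hand the pipeline to a shell so that the pipe character is interpreted."""
    return subprocess.run(["sh", "-c", pipeline], check=False).returncode


broken = run_without_shell('/healthz-ha.sh | grep -q "IS_LEADER"')
# Stand-in for the health script: echo prints the marker that grep looks for.
working = run_with_shell('echo IS_LEADER | grep -q "IS_LEADER"')
```

In this sketch, the first call fails before any script runs, while the second returns exit code 0 because the shell executes both sides of the pipe.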

DWX-22464: CDP CLI validation error when updating Virtual Warehouse t-shirt size to waa in Cloudera Data Warehouse
Previously, the CDP CLI incorrectly rejected updates to an Impala Virtual Warehouse that included waa (Workload Aware Autoscaling) as the t-shirt size, even when the Virtual Warehouse was already configured with waa. The CLI validation only allowed xsmall, small, medium, and large as valid values, causing updates to fail unnecessarily with the following error:
Invalid value for parameter tShirtSize, value: waa, valid values: xsmall, small, medium, large.

This issue is resolved by ensuring that CDP CLI updates to Virtual Warehouses already configured with waa as the t-shirt size are no longer rejected.
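The corrected behavior can be sketched roughly as follows. This is not the CDP CLI source: the function name, the VALID_SIZES set, and the choice to keep rejecting waa for a Virtual Warehouse that is not already configured with it are assumptions made for illustration.

```python
# Hypothetical validation helper; not the actual CDP CLI implementation.
VALID_SIZES = {"xsmall", "small", "medium", "large"}


def validate_tshirt_size(requested: str, current: str) -> None:
    """Raise ValueError unless the requested size is acceptable for an update."""
    if requested in VALID_SIZES:
        return
    # Assumption: waa is accepted only when the warehouse already uses waa.
    if requested == "waa" and current == "waa":
        return
    raise ValueError(
        f"Invalid value for parameter tShirtSize, value: {requested}, "
        f"valid values: {', '.join(sorted(VALID_SIZES))}."
    )


def is_accepted(requested: str, current: str) -> bool:
    """Convenience wrapper that reports acceptance as a boolean."""
    try:
        validate_tshirt_size(requested, current)
    except ValueError:
        return False
    return True
```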

DWX-22609: HiveServer2 authentication fails with SSL handshake error on legacy LDAP servers
Previously, authentication to HiveServer2 (HS2) pods would fail with an SSL handshake error when connecting to an LDAP server that uses legacy TLS_RSA_* cipher suites. This occurred because a base image update introduced an upstream OpenJDK change that disabled these older cipher suites by default, breaking connectivity for environments still relying on them.
This issue is now resolved. An update to the java.security configuration reenables TLS_RSA_* cipher suites in the JVM, restoring backward compatibility with those LDAP servers.
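The notes do not state which java.security property was changed. In OpenJDK, disabled TLS algorithms are typically listed in the jdk.tls.disabledAlgorithms security property, so the following Python sketch assumes that property and shows how removing a TLS_RSA_* entry from its value might look. The sample value is illustrative and is not the configuration shipped in the image.

```python
def reenable_tls_rsa(disabled_algorithms: str) -> str:
    """Return the property value with the TLS_RSA_* entry removed."""
    entries = [entry.strip() for entry in disabled_algorithms.split(",")]
    kept = [entry for entry in entries if entry and entry != "TLS_RSA_*"]
    return ", ".join(kept)


# Illustrative value only; a real java.security file may list other entries.
sample = "SSLv3, TLSv1, TLSv1.1, RC4, DES, MD5withRSA, TLS_RSA_*"
result = reenable_tls_rsa(sample)
```

Reenabling these cipher suites trades security hardening for compatibility, so it is only relevant for LDAP servers that cannot negotiate newer suites.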

Fixed issues in Cloudera Data Explorer (Hue) on Cloudera Data Warehouse on premises

CDPD-88276: Missing Data Explorer SAML logout landing page
Previously, Data Explorer did not include a dedicated landing page for SAML logout sessions. This resulted in a lack of clarity regarding the session status after you logged out. This issue is now resolved. Data Explorer now includes a SAML logout landing page that confirms your session ended and provides an option to log in again.
CDPD-81753: Missing configurable flag to optionally reenable data preview on database views in Data Explorer
Previously, data preview for database views was disabled by default due to resource strains caused by complex or long-running views, impacting data validation and analysis.
This issue is now resolved by adding the new allow_sample_data_from_views flag, with the default value set to false. Setting this flag to true enables Data Explorer to fetch sample data for database views and thus restore the data preview functionality in the SQL assist panel. You can enable the flag by performing the following steps:
  1. Go to Virtual warehouse > Data Explorer (Hue) > Configuration.
  2. In the hue_safety_valve field, specify the following parameter:
    [metastore]
    allow_sample_data_from_views=true
  3. Click Apply Changes.
DWX-12703: Uneven Impala coordinator workload distribution in active-active configurations
Previously, the Data Explorer client used only one coordinator when Cloudera Data Warehouse was configured in active-active mode. This occurred because Data Explorer did not include support for cookie-based sticky sessions. As a result, the system failed to distribute the workload across multiple coordinators, which impacted production stability and workload balancing.
This issue is now resolved. Data Explorer now supports cookie-based sticky sessions, which allows for effective workload distribution across all available coordinators in active-active Cloudera Data Warehouse environments.
CDPD-101746: Performance degradation when listing many files or folders in S3
Previously, the S3 file browser became slow or unresponsive when you listed a large number of files or folders. The directory listing process triggered redundant API requests for each file, causing significant delays. This issue is now resolved by optimizing the directory listing process to retrieve file information more efficiently.
CDPD-45130: Truncating excessive length queries to prevent database indexing errors
Previously, large or complex SQL statements, such as lengthy INSERT queries, were indexed by the Query Processor. This resulted in increased load times for the Job Browser. You can now configure query truncation by using the hue.query-processor.query.max-length property in the Query Processor configuration in the dasConf section. By default, no truncation is performed to ensure backward compatibility.
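The effect of a maximum-length setting can be sketched as follows. This is a minimal illustration, not the Query Processor implementation: the function name is hypothetical, and None stands in for the default of no truncation.

```python
from typing import Optional


def truncate_query(query: str, max_length: Optional[int] = None) -> str:
    """Truncate the query text only when a maximum length is configured."""
    if max_length is None or len(query) <= max_length:
        # Default behavior: keep the full text for backward compatibility.
        return query
    return query[:max_length]


full = truncate_query("INSERT INTO t VALUES (1), (2), (3)")
short = truncate_query("INSERT INTO t VALUES (1), (2), (3)", max_length=13)
```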
CDPD-90906: Database connection errors in metrics module
Previously, the metrics module encountered the cx_Oracle.DatabaseError: DPI-1010: disconnect error when the Cloudera Manager metrics API queried active users. This caused database connections to be lost. This issue is now fixed. The resolution improves connection stability and robustness against database disconnect errors.
CDPD-58142: Query pre-population fails in the SQL editor
Previously, when you clicked Re-Execute to rerun a query from the Job Browser > Queries > Query Details page, the query was not displayed in the SQL editor as expected. This issue occurred because the editor loaded multiple times, overwriting the query text. Additionally, the page change was not reflected in the URL. This issue is now fixed. The SQL editor now correctly populates and retains the query text when you navigate from the Job Browser.
CDPD-92857: Export all failures in secured multi-tenant clusters
Previously, the Export All feature in Data Explorer failed in secured multi-tenant clusters that used tenant colocation policies. This resulted in access errors during the export process. This issue is now resolved.
CDPD-100326: Data Explorer service authorization failure with Trino
Previously, when you used the Shared Data Explorer service with Trino, an authorization error occurred. This happened because the service misconfigured the authentication settings and attempted to use an external ingress URL with incomplete credentials instead of the required in-cluster endpoint. This issue is now resolved.

Fixed issues in Hive on Cloudera Data Warehouse on premises

CDPD-104153: Hive CTAS cross join query failures
Previously, running a Hive Create Table As Select query that included a cross join failed with a transaction aborted exception in Cloudera Private Cloud Data Services. The transaction manager prematurely aborted the transaction during execution, which caused query compilation and heartbeat tasks to fail.
This issue is now resolved by ensuring that the transaction state is correctly maintained and monitored throughout cross join compilation and execution.
CDPD-98818: Incorrect column reference in subqueries throws misleading error
Previously, when a query contained a column reference within a subquery that did not exist, Hive threw a misleading error message stating that only one subquery expression is supported.
FAILED: SemanticException Line 0:-1 Unsupported SubQuery Expression 'col1': Only 1 SubQuery expression is supported.
This issue is now resolved by improving error handling so that Hive returns an accurate semantic exception when an invalid column reference is used in a subquery.
CDPD-95083: Concurrency issues when updating partition column statistics
Previously, the process for updating partition column statistics involved fetching, modifying, and then updating the statistics. This method caused concurrency issues when multiple clients attempted to update statistics for the same partition at the same time, potentially leading to inaccurate metadata.
This issue is now resolved.

Apache Jira: HIVE-29316

CDPD-95082: Direct SQL failures during partition operations cause stale database entries
Previously, if a failure occurred during direct SQL processing for operations, such as adding or dropping partitions, partial database modifications failed to roll back before the system fell back to Java Data Objects (JDO) processing. This leftover state led to data inconsistencies in the underlying database.
This issue is now resolved by setting a savepoint in the transaction before direct SQL processing begins. If a failure occurs, the system now rolls back to that savepoint before initiating the JDO fallback, ensuring database consistency.

Apache Jira: HIVE-26976
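
The savepoint pattern described in this fix can be sketched as follows. The sketch uses Python with an in-memory SQLite database as a stand-in for the metastore backing database; the table, the function name, and the simulated failure are assumptions made for illustration and do not reflect the Hive Metastore code.

```python
import sqlite3


def add_partitions(conn, names, fail_direct_sql=False):
    """Try a fast path first; on failure, roll back to the savepoint and fall back."""
    conn.execute("BEGIN")
    conn.execute("SAVEPOINT before_direct_sql")
    try:
        for index, name in enumerate(names):
            conn.execute(
                "INSERT INTO partitions (name, via) VALUES (?, 'direct')", (name,)
            )
            if fail_direct_sql and index == 0:
                # Simulate a failure after a partial modification.
                raise RuntimeError("simulated direct SQL failure")
    except RuntimeError:
        # Undo the partial work before the fallback path runs.
        conn.execute("ROLLBACK TO SAVEPOINT before_direct_sql")
        for name in names:
            conn.execute(
                "INSERT INTO partitions (name, via) VALUES (?, 'fallback')", (name,)
            )
    conn.execute("RELEASE SAVEPOINT before_direct_sql")
    conn.execute("COMMIT")


# isolation_level=None lets the sketch issue BEGIN and COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE partitions (name TEXT, via TEXT)")
add_partitions(conn, ["p=1", "p=2"], fail_direct_sql=True)
rows = conn.execute("SELECT name, via FROM partitions ORDER BY name").fetchall()
```

Without the rollback to the savepoint, the row written before the simulated failure would remain alongside the fallback rows, which is the kind of stale entry this fix addresses.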

CDPD-94765: Slow performance for adding columns to tables with many partitions
Previously, adding a column to a table using the CASCADE command could be slow for tables containing a high number of partitions or columns due to unoptimized metadata processing.
This issue is now resolved by optimizing the underlying metadata operations and implementing batching. These changes improve performance and efficiency when you run the ALTER TABLE ADD COLUMN CASCADE command on large tables.

Apache Jira: HIVE-28956

CDPD-94764: Performance degradation when adding columns with the cascade option
Previously, after you enabled directSQL, performance degraded when adding columns to a table by using the CASCADE command.
This issue is resolved by implementing a mechanism to reuse new column descriptors between storage descriptors that share the same original column descriptor ID.

Apache Jira: HIVE-29042

CDPD-93175: Incorrect results for n-way joins containing both anti and outer joins
Previously, when you enabled Cost-Based Optimization (CBO) and n-way joins, Hive returned incorrect results for queries that combined anti-joins with outer joins.
This issue is resolved by extending the CommonJoinOperator to support the combination of anti-joins and outer joins within n-way joins.

Apache Jira: HIVE-29290

CDPD-93166: Incorrect results when an anti-join replaces an IS NULL filter on a nullable column
Previously, when you enabled the automatic conversion of joins with an IS NULL filter to anti-joins, Hive returned incorrect results for certain queries. Specifically, when the HiveAntiSemiJoinRule optimizer replaced an IS NULL filter on a nullable column with an anti-join, it generated an incorrect query plan. This resulted in missing rows in the final output compared to when the optimization was disabled.
This issue is resolved by improving the logic within the HiveAntiSemiJoinRule optimizer. The rule correctly handles nullable columns during the transformation process, ensuring accurate query plans and correct data retrieval regardless of whether anti-join conversion is active.

Apache Jira: HIVE-29176
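
The failing queries are not shown in these notes. As a generic illustration of why this rewrite is unsafe on a nullable column, the following Python sketch uses an in-memory SQLite database with hypothetical tables a and b. A matching row whose column value is NULL passes the IS NULL filter but is excluded by an anti-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, val TEXT);  -- val is nullable
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, NULL);         -- a matching row whose val is NULL
    """
)

# Outer join with an IS NULL filter on the nullable column.
filtered = conn.execute(
    "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id "
    "WHERE b.val IS NULL ORDER BY a.id"
).fetchall()

# Anti-join form: keeps only rows of a that have no match in b.
anti = conn.execute(
    "SELECT a.id FROM a WHERE NOT EXISTS "
    "(SELECT 1 FROM b WHERE b.id = a.id) ORDER BY a.id"
).fetchall()
```

In this sketch, the filtered query returns ids 1 and 2, while the anti-join returns only id 2, so a naive rewrite would drop a row.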

CDPD-79799: High metadata overhead during migration due to table and partition statistics
Previously, during migration or replication, copying statistics for tables with a large number of partitions and columns took a significant amount of time.
This issue is now resolved by implementing changes in the Hive Metastore to support dropping statistics for tables, partitions, and columns.

Apache Jira: HIVE-28655

CDPD-55133: Query failure on views containing grouping sets and grouping functions
Previously, queries run against a view failed with a RuntimeException error if the view definition included GROUPING functions and GROUPING SETS.
This issue is resolved. Column names within the GROUPING function are correctly expanded to include their associated table and schema aliases during view creation. This allows the parser to successfully match the function arguments with the GROUP BY clause when you select data from the view.

Apache Jira: HIVE-27280

Fixed issues in Iceberg on Cloudera Data Warehouse on premises

CDPD-79241, CDPD-72383: COUNT(*) optimization returns incorrect results in UNION queries on Iceberg V2 tables
Previously, a faulty COUNT(*) query optimization caused incorrect results in UNION queries on Iceberg V2 tables that contained delete files, leading to inconsistent query outputs.

This issue is now fixed. The optimization logic now returns accurate results for UNION queries, including after data changes.

Apache Jira: IMPALA-13756, IMPALA-13249

CDPD-72845: Incorrect partition count for Iceberg tables
Previously, query profiles reported the number of partitions as 1 for Iceberg tables, which was misleading because it did not reflect the actual partition count.

This issue is now fixed. The reporting logic accurately counts and displays the number of scanned and total partitions in query profiles.

Apache Jira: IMPALA-13267

CDPD-88668: Iceberg partitioning is case sensitive for column names
Previously, Impala only accepted lowercase column names in partition specifications for Iceberg tables, causing CREATE TABLE and ALTER TABLE operations with uppercase column names to fail.

This issue is now fixed. Column names are handled properly in partition specifications, allowing partition operations to succeed regardless of case.

Apache Jira: IMPALA-14290

CDPD-100780, CDPD-97613: Materialized view rebuild fails for Iceberg tables
Previously, a materialized view rebuild failed with an error when the base table was an Iceberg table and the materialized view was implicitly created as an ACID table because metastore.create.as.acid was set to true. During the rebuild, no transaction was started, but an entry was created in the MATERIALIZATION_REBUILD_LOCKS table and never cleared. Subsequent rebuild attempts then failed because of the existing lock entry.

This issue is now fixed by setting the metastore.create.as.acid property value to false.

This issue also occurs when the materialized view is an Iceberg table. That case is tracked upstream as part of HIVE-29436.

CDPD-101214: Semantic error in MERGE statements when using backticks with restricted keywords as column names
Previously, Hive failed to run MERGE statements when you used backticks with restricted keywords as column names. The query returned a SemanticException error during parsing, which prevented the MERGE operation from running.

This issue is now fixed by avoiding using restricted keywords as column names, even when enclosed in backticks.

Fixed issues in Impala on Cloudera Data Warehouse on premises

CDPD-103485: Catalogd deadlock during startup
Previously, a deadlock condition occurred in catalogd during startup when the initial global metadata reset operation took place under a heavy workload. The thread performing the initial global reset periodically released the write lock to allow metadata operations on loaded databases to proceed.
This issue is now resolved by reversing the order of operations. The getOrLoadTable functionality now waits for the initial database metadata to load for the specified database before it acquires the version read lock.

Apache Jira: IMPALA-14949

CDPD-101743: Impala query execution failure due to missing SASL plugin
Previously, running queries in an Impala virtual warehouse through Data Explorer failed with a client connection negotiation error. The system could not find the PLAIN Simple Authentication and Security Layer plugin when establishing a client connection to the local loopback address on port 27000.
This issue is now fixed by ensuring the required authentication plugins are correctly initialized within the runtime environment.
CDPD-94500: Impala query failure when reading Parquet collections with late materialization
Previously, when an Impala query selected the last row containing a collection value in a row group, the readahead state was not reset. This caused subsequent query failures.
This issue is now resolved by resetting the readahead flag in the column reader whenever it advances to a new row group.

Apache Jira: IMPALA-14619

IMPALA-14605: Memory leak in global admissiond for cancelled queued queries
Previously, a memory leak occurred in the global admissiond when queries in the admission queue were cancelled due to backpressure. The system identified the cancellation but did not remove the query from the admission state map.
This issue is now fixed by introducing an asynchronous cleanup mechanism. Cancelled queued queries are now added to a queue for a background process to safely clear them from the admission state map.

Apache Jira: IMPALA-14605
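
The cleanup approach described in this fix can be sketched as follows. This is a simplified Python illustration, not the admissiond code: the dictionary, the queue, and the function names are hypothetical stand-ins for the admission state map and the background process.

```python
import queue
import threading

admission_state = {}           # query_id -> state (hypothetical stand-in)
state_lock = threading.Lock()
cleanup_queue = queue.Queue()  # cancelled query ids awaiting cleanup


def cancel_queued_query(query_id):
    """Record the cancellation; the actual removal happens in the background."""
    cleanup_queue.put(query_id)


def cleanup_worker():
    """Drain the queue and remove cancelled queries from the state map."""
    while True:
        query_id = cleanup_queue.get()
        if query_id is None:  # sentinel used to stop the worker
            cleanup_queue.task_done()
            break
        with state_lock:
            admission_state.pop(query_id, None)
        cleanup_queue.task_done()


admission_state.update({"q1": "queued", "q2": "queued"})
worker = threading.Thread(target=cleanup_worker, daemon=True)
worker.start()
cancel_queued_query("q1")
cleanup_queue.join()     # wait until the cancelled entry has been processed
cleanup_queue.put(None)  # stop the worker
worker.join()
remaining = sorted(admission_state)
```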

CDPD-91155: CatalogD reports misleading TableNotFoundException for workload management tables
Previously, when you used workload management for the first time, CatalogD generated an org.apache.impala.catalog.TableNotFoundException error for the sys.impala_query_log and sys.impala_query_live tables. This occurred because the system attempted to refresh metadata for these tables to check their schema version before they were created.
This issue is now resolved by checking for the existence of the workload management tables before initiating a metadata reset. This change ensures that error messages are no longer generated during a successful initialization process.

Apache Jira: IMPALA-14468

IMPALA-14383: Crash when casting timestamp strings with timezone offsets to DATE
Previously, attempting to cast a timestamp string that included a timezone offset, such as "+08:00" in "2025-08-31 06:23:24.9392129 +08:00", to the DATE data type would cause a crash.
This issue is now resolved by adding a check to ensure that the timestamp string length does not exceed the maximum length of the default date-time format. Longer strings now use a lazily-created format, which prevents the crash.

Apache Jira: IMPALA-14383

IMPALA-14791: Impala crashes when viewing failed query plans
Previously, Impala crashed when you used the Web UI to view the plan of a query that failed before execution started.
This issue is now resolved by updating the function to handle missing execution summaries. This fix ensures that the Web UI remains stable even when query summaries are unavailable.

Apache Jira: IMPALA-14791

CDPD-99070: CVE-2025-15467 false positive alerts in Impala images
Previously, the Impala Docker image included the openssl-config package when built on top of Chainguard images. This package contained outdated signatures that triggered false positive alerts in CVE scanners, even though the package itself was effectively empty.
This issue is now resolved by removing the openssl-config package from the Impala Docker image build process. This removal eliminates the false positive scanner hits without affecting functionality, as the necessary basic configuration is provided by the openssl package.

IMPALA-14447: Parallel metadata loading in local catalog mode
Previously, when a query accessed multiple unloaded tables in local catalog mode, Impala loaded the metadata for those tables one after another. This sequential process caused significant latency and performance regressions compared to the legacy catalog mode.
This issue is now resolved by parallelizing the table loading process. The fix allows Impala to load and gather metadata for multiple tables simultaneously. You can control the maximum number of threads used for this process by using the new max_stmt_metadata_loader_threads flag, which defaults to 8 threads per query compilation.

Apache Jira: IMPALA-14447
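
The idea of bounded parallel loading can be sketched as follows. This Python illustration is not Impala code: load_table_metadata is a hypothetical placeholder for a catalog fetch, and the constant only mirrors the default of 8 threads mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_LOADER_THREADS = 8  # mirrors the default of max_stmt_metadata_loader_threads


def load_table_metadata(table_name):
    """Hypothetical placeholder for fetching the metadata of one table."""
    time.sleep(0.05)  # simulate catalog latency
    return {"table": table_name, "loaded": True}


def load_all(table_names, max_threads=MAX_LOADER_THREADS):
    """Load metadata for all tables concurrently, bounded by max_threads."""
    workers = max(1, min(max_threads, len(table_names)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as table_names.
        return list(pool.map(load_table_metadata, table_names))


tables = [f"db.t{i}" for i in range(4)]
metadata = load_all(tables)
```

With sequential loading, the simulated latency would add up per table; with a bounded pool, the tables in this sketch load concurrently.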

Fixed issues in Trino on Cloudera Data Warehouse on premises

DWX-22514: Trino entity lock held for more than 1 minute after manual Start/Update operation
Previously, after performing a manual operation such as Start or Update on a Trino Virtual Warehouse, the entity lease remained locked for an additional minute. If you attempted another manual operation during this window, you encountered the following error: Compute entity is currently 'leased' by another internal operation. This occurred due to an intentional but excessive one-minute delay in the background scaling logic.

This issue is now resolved. The unnecessary delay is removed, which reduces the time that the entity lock is held after a manual operation completes.

DWX-23269: Creation of schema in Teradata through Trino fails with a syntax error
Previously, when attempting to execute DDL statements such as CREATE SCHEMA or DML statements such as INSERT or UPDATE against a Teradata connector in Trino, the operation failed with a JDBC_ERROR and the following message:
Syntax error, expected something like a 'METHOD' keyword between the 'CREATE' keyword and the 'SCHEMA' keyword.

This issue is resolved. The Trino Teradata connector supports read operations (SELECT) only. DDL operations (CREATE, ALTER, and DROP) and DML operations (INSERT, UPDATE, and DELETE) are not supported in the current version.

DWX-22607: Absence of a simplified method to create a Trino JMX metrics catalog
Previously, the interface lacked a direct method to initialize a JMX catalog. This prevented the use of SQL to query critical JMX metrics, such as file system caching data, which are essential for monitoring Trino performance.

This issue is now resolved. A configuration flow is now supported that allows for the creation of a JMX catalog through the federation connectors interface, enabling full SQL access to JMX metrics within the Virtual Warehouse.

DWX-22599: Unsupported HDFS connector shown in Trino connector list
Previously, the Trino connector list in the Cloudera Data Warehouse UI incorrectly included an unsupported HDFS connector. This connector does not exist and is non-functional.

This issue is now resolved and the invalid HDFS connector entry is removed from the UI.

DWX-22634: Lack of direct access and inconsistent terminology for Trino monitoring
Previously, the Trino Monitoring interface was not easily accessible within the Virtual Warehouse list. Furthermore, the interface lacked a consistent identity, as it was referred to as both Trino Monitoring and Trino Web UI in different parts of the application.

This issue is now resolved and all Trino Monitoring references are standardized to Trino Web UI.

DWX-22618: Unable to delete Trino configuration keys from the user interface
Previously, once a Trino configuration key-value pair was added through the UI, it could not be deleted without direct backend intervention due to API limitations.

This issue is now resolved. The UI now includes a Delete button on each row of the Trino configuration table, allowing you to remove individual configuration keys.