What's new in Cloudera Data Warehouse on premises 1.5.5 SP3

Cloudera Data Warehouse on premises 1.5.5 Service Pack 3 introduces new features and improvements for Cloudera Data Warehouse, Cloudera Data Explorer (Hue), Hive, Iceberg, Impala, and Trino.

Cloudera Data Warehouse on premises

Configuration change detection
Cloudera Data Warehouse introduces a new configuration change detection feature to significantly enhance cluster stability, manageability, and security tracking. Administrators can now proactively monitor and identify unexpected or impactful adjustments across essential services. By alerting administrators to these modifications, the feature reduces the risk of errors, streamlines troubleshooting, and strengthens compliance and operational security. For more information, see Configuration change detection.
Automated Apache Ozone storage integration for Virtual Warehouses
Cloudera Data Warehouse introduces automated integration with Apache Ozone storage through the S3A protocol, eliminating the need for manual configuration in Cloudera Data Warehouse. Credentials and endpoints are now managed centrally through the Cloudera Management Console UI and automatically propagated to all relevant Virtual Warehouse components and query engines, including Hive, Impala, and Trino. This enhancement simplifies storage administration and secures credential management through automated JCEKS and vault delivery. For more information, see Automated Apache Ozone storage integration.
Rollback for Virtual Warehouse upgrades
You can now roll back a failed Hive or Impala Virtual Warehouse upgrade to restore it to its last known working state. If an upgrade fails and the Backup Virtual Warehouse namespaces before an upgrade server setting is enabled in Advanced Configuration, a Rollback Upgrade option is displayed in the Virtual Warehouse actions menu. After you confirm the rollback, the Virtual Warehouse enters a RollingBack state and Cloudera Data Warehouse automatically reverts it to the pre-upgrade configuration.

This helps reduce downtime and manual intervention when Virtual Warehouse upgrades encounter issues, giving administrators a reliable recovery path directly from the Cloudera Data Warehouse UI. For more information, see Rolling back of Virtual Warehouses.

Enhanced JVM crash and heap dump collection for Virtual Warehouses
Previously, Cloudera Data Warehouse collected Out-Of-Memory (OOM) heap dumps for Hive and Impala, but lacked coverage for Java Virtual Machine (JVM) crash dumps and did not collect JVM diagnostic files for Trino workloads.

This release introduces a unified solution that standardizes and expands this diagnostic coverage. It automatically captures both OOM heap dumps and JVM crash dumps and is enabled across all Hive, Impala, and Trino Virtual Warehouses. Collected files are automatically forwarded to the environment's diagnostic storage location and included in generated diagnostic bundles. As a result, vital diagnostic information, such as stack traces, crash analysis, and memory state, is available without manual pod inspection, which reduces the troubleshooting time for JVM-related failures. For more information, see JVM crash and heap dump collection.

Migration from NGINX to Istio in Cloudera Data Warehouse
In Cloudera Data Services on premises 1.5.5 SP3 and higher releases, Cloudera Data Warehouse on the RKE platform transitions from the NGINX Ingress Controller to the Kubernetes Gateway API using an Istio gateway controller. This change currently only affects the RKE platform; OpenShift Container Platform (OCP) environments are not affected.

For new and upgraded workloads (Hive, Impala, or Trino) on 1.5.5 SP3, the ingress layer is managed by Istio, and these workloads utilize gateway resources (HTTPRoute and XListenerSet) instead of traditional ingress resources. Consequently, Cloudera Data Warehouse workloads from 1.5.5 SP2 or earlier releases, which were designed to work only with NGINX Ingress resources, cannot be deployed in these updated environments.

The NGINX Ingress controller continues to manage ingress traffic for existing workloads until they are upgraded. Upgrading the workload automatically upgrades the underlying networking stack to the Istio Gateway API as well. This change ensures long-term support, security, and better memory stability in data-service-heavy environments, as the ingress-nginx project has reached its end of support.

Removal of Unified Analytics and migration of stored Hive queries
The legacy Unified Analytics framework, including the Impala Virtual Warehouse implementation, is fully removed from Cloudera Data Warehouse. All remaining Unified Analytics components, configuration paths, and UI flows are cleaned up, and Virtual Warehouses automatically revert to the standard Impala Virtual Warehouse architecture.

As a result of this change, saved queries in Data Explorer originally created as a Hive snippet type under Unified Analytics will fail with a configuration error. New migration steps are available to update and resolve these stored queries. For more information, see Migrating stored Hive queries after Unified Analytics removal.

Renaming of Edit option to Details in the Cloudera Data Warehouse action menu
The Edit option in the action menu for Environments, Virtual Warehouses, and Database Catalogs is renamed to Details. This change is a UI label update and has no functional impact.
Cloudera Data Visualization updated to version 8.1.2

The Cloudera Data Visualization component bundled with Cloudera Data Warehouse on premises is updated to version 8.1.2-b41. This update brings the latest improvements, bug fixes, and enhancements from the Cloudera Data Visualization 8.1.2 release to Cloudera Data Warehouse.

What's new in Cloudera Data Explorer (Hue) on Cloudera Data Warehouse on premises

Product Branding Update
The product component previously known as Data Explorer is now renamed to Cloudera Data Explorer (Hue). This change reflects an updated UI and branding initiative and is being rolled out in phases.
As part of this release, you will notice:
  • A new logo displayed in the UI
  • The service name updated to Data Explorer in the UI
  • The new product name reflected in documentation
Some UI references might still display the previous name as the branding update is completed incrementally in future releases. This change has no functional impact. All existing configurations, workflows, and integrations continue to work as before.
Facts support in SQL AI Assistant
You can now define custom system instructions to guide the SQL AI Assistant in generating more accurate queries based on your specific business logic. This enhancement supports complex, cross-database workflows by allowing you to persist organizational context in the Assistant settings.

For more information, see Facts support for SQL query.

Data Explorer support for the boto3 SDK
Data Explorer now supports the boto3 SDK for accessing AWS S3. This update replaces the legacy connector framework to provide improved performance and compatibility with AWS services.
To ensure a smooth transition, the system automatically converts your existing configurations to the new connector system. This feature is enabled by default, but you can manually disable the feature flag if necessary.
For more information, see Enabling the S3 File Browser for Cloudera Data Explorer (Hue) in Cloudera Data Warehouse with RAZ and Enabling the S3 File Browser for Cloudera Data Explorer (Hue) in Cloudera Data Warehouse without RAZ.

What's new in Hive on Cloudera Data Warehouse on premises

Small file warnings in console
The MSCK and ANALYZE commands now display a warning in the console if the average file size for a table or partition is below the threshold. This helps you identify small files that might affect performance.

For more information, see Statistics generation and viewing commands in Cloudera Data Warehouse.
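The check described above can be pictured as a comparison between the average file size and a threshold. The following is a minimal sketch of that idea only, not the Hive implementation; the function names, the warning text, and the 16 MiB threshold are hypothetical and chosen for illustration.

```python
def average_file_size(file_sizes_bytes):
    """Return the mean file size in bytes, or None when there are no files."""
    if not file_sizes_bytes:
        return None
    return sum(file_sizes_bytes) / len(file_sizes_bytes)


def small_file_warning(name, file_sizes_bytes, threshold_bytes):
    """Return a warning string when the average file size is below the threshold."""
    avg = average_file_size(file_sizes_bytes)
    if avg is None or avg >= threshold_bytes:
        return None
    return (
        f"WARNING: {name} has an average file size of {avg:.0f} bytes, "
        f"below the {threshold_bytes}-byte threshold"
    )


# Hypothetical threshold for illustration only (16 MiB); the real value is
# defined by the product configuration, not by this sketch.
THRESHOLD_BYTES = 16 * 1024 * 1024

# A partition made of three small files triggers the warning.
print(small_file_warning("sales/dt=2024-01-01", [1_000_000, 2_000_000, 3_000_000], THRESHOLD_BYTES))
```

A table or partition with no files produces no warning in this sketch, because an average cannot be computed for it.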

Performance improvement for column changes
The ALTER CHANGE COLUMN command is now faster for tables that have many partitions. This change prevents the command from performing a separate Metastore service call to update column statistics for every partition, which previously caused long execution times and timeouts. For large partitioned tables, the execution time is reduced from hours to minutes.

Apache Jira: HIVE-28346

What's new in Iceberg on Cloudera Data Warehouse on premises

Table repair feature support for Iceberg tables
Impala introduces the repair_metadata() function for Iceberg tables. This function provides a self-service recovery path to recover Iceberg tables that are inaccessible due to missing data files after manual file deletions in the underlying storage. For more information, see Table repair feature.
Support for SHOW FILES IN table PARTITION for Iceberg
Impala now supports the SHOW FILES IN command with the PARTITION clause to list data files for specific partitions in Iceberg tables. This enhancement extends metadata capabilities by enabling inspection of partition-level physical data directly from Impala. For more information, see Describe table metadata feature.
Support for additional partition transform functions for Iceberg tables
Iceberg now supports additional partition transform functions such as BUCKET, TRUNCATE, IDENTITY, and VOID. These transformations extend partitioning capabilities by enabling hashing, value truncation, direct partitioning, and handling of null partitions. For more information, see Partition transform feature.
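To make the transform semantics concrete, the sketch below models IDENTITY, TRUNCATE, and VOID as plain functions, following the definitions in the Apache Iceberg table specification. It assumes the engines follow that specification and is not taken from the Impala or Hive source. BUCKET is left out because it depends on a 32-bit Murmur3 hash, which is too long to reproduce here.

```python
def identity(value):
    """IDENTITY: the partition value is the source value, unchanged."""
    return value


def truncate(width, value):
    """TRUNCATE: round integers down to a multiple of width, or keep the
    first `width` characters of a string."""
    if value is None:
        return None
    if isinstance(value, str):
        return value[:width]
    if isinstance(value, int):
        # Python's % is non-negative for a positive width, so negative
        # inputs round toward negative infinity, as the Iceberg spec requires.
        return value - (value % width)
    raise TypeError(f"unsupported type for truncate: {type(value).__name__}")


def void(value):
    """VOID: every source value maps to null."""
    return None


print(truncate(10, 1))          # 0
print(truncate(10, -1))         # -10
print(truncate(3, "iceberg"))   # ice
print(void(42))                 # None
```

Rows that share a transformed value land in the same partition, so `truncate(10, ...)` places the values 0 through 9 together.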
Support for partition columns in WHERE clause predicates
Hive Iceberg compaction now supports WHERE clause predicates on partition columns. This enhancement allows you to selectively compact data by filtering partition columns, improving efficiency and control over compaction operations. For more information, see Data compaction.

What's new in Impala on Cloudera Data Warehouse on premises

Caching intermediate query results
Impala now supports caching intermediate results to improve query performance and resource efficiency for repetitive workloads. By storing results at various locations within the SQL plan tree, the system can reuse computation for similar queries even when they are not identical, provided the underlying data and settings remain unchanged. For more information and instructions on enabling this feature, see Caching intermediate results.
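One way to picture reuse across similar but non-identical queries is a cache keyed by the plan subtree, the data version, and the relevant settings. The sketch below is a conceptual illustration under that assumption and does not reflect how Impala builds its keys; `cache_key`, `IntermediateResultCache`, and the snapshot identifiers are hypothetical names.

```python
import hashlib
import json


def cache_key(plan_fragment, data_version, settings):
    """Fingerprint a plan subtree together with the data and settings it depends on."""
    payload = json.dumps(
        {"plan": plan_fragment, "data": data_version, "settings": settings},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IntermediateResultCache:
    """Store results of plan subtrees so later queries can reuse them."""

    def __init__(self):
        self._entries = {}
        self.hits = 0

    def get_or_compute(self, key, compute):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        result = compute()
        self._entries[key] = result
        return result


cache = IntermediateResultCache()
# Two different queries that share the same scan-and-filter subtree.
shared_subtree = {"op": "scan", "table": "sales", "filter": "region = 'EU'"}
key = cache_key(shared_subtree, data_version="snapshot-1", settings={"mt_dop": 4})

rows_for_query_a = cache.get_or_compute(key, lambda: [("EU", 10), ("EU", 20)])
rows_for_query_b = cache.get_or_compute(key, lambda: [("EU", 10), ("EU", 20)])
print(cache.hits)  # 1: the second query reused the cached subtree result
```

Because the data version and settings are part of the key, a change to either produces a different key and the stale entry is not reused.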
User role management
You can now grant and revoke roles directly to and from individual users in Impala, providing more granular control over security management. This feature includes support for the GRANT ROLE, REVOKE ROLE, and SHOW ROLE GRANT USER statements, aligning Impala with Apache Hive's role-related functionality.

For more information, see impala role, impala grant role, impala show roles, and impala revoke role.

Apache Jira: IMPALA-14085.

Native geospatial query acceleration
Cloudera Data Warehouse now introduces native implementations for specific geospatial functions to accelerate simple queries. This feature reduces processing overhead by avoiding transitions to the Java Virtual Machine and optimizing file-level filtering for Parquet and Iceberg tables. For more information, see Impala Geospatial query acceleration.
OpenTelemetry integration for Impala
Cloudera Data Warehouse now provides OpenTelemetry (OTel) support to help you monitor query performance and troubleshoot issues. This feature collects and exports query telemetry data as OpenTelemetry traces to a central OpenTelemetry-compatible collector. The integration is designed to have a minimal impact on performance because it uses data already being collected and handles the export in a separate process. For more information, see OpenTelemetry support for Impala.

Apache Jira: IMPALA-13234

Filtering SHOW PARTITIONS output
You can now use the WHERE clause with the SHOW PARTITIONS statement to filter results based on partition column values. This enhancement helps you manage tables with a large number of partitions by narrowing down the output using comparison operators, IN lists, BETWEEN clauses, IS NULL predicates, and logical expressions. For more information, see SHOW PARTITIONS statement.

Apache Jira: IMPALA-14065

Parallelizing JDBC External Table queries
You can now run queries on JDBC tables in parallel to improve performance for joins and aggregations. Impala estimates the number of rows in a JDBC table by running a COUNT query during query preparation. This estimation allows the planner to assign multiple scanner threads, introduce exchange nodes, and produce more efficient join orders. You can also use the --min_jdbc_scan_cardinality backend flag to set a lower bound for these estimates. For more information, see Parallelizing JDBC External Table queries.
Recreating tables with statistics
You can use the WITH STATS clause in the SHOW CREATE TABLE statement to generate the SQL required to recreate a table along with its column statistics and partition metadata. For more information, see SHOW CREATE TABLE WITH STATS statement.

Apache Jira: IMPALA-13066

Quoting reserved words in column names
You can now explicitly quote all column names projected in SQL queries generated for JDBC external tables. Column names are wrapped with the following quote characters based on the JDBC driver being used:
  • Backticks (`) for Cloudera Runtime Hive, Impala, and MySQL
  • Double quotes (") for all other databases
This supports the use of case-sensitive or reserved column names. For more information, see Quoting reserved words in column names.
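The quoting rule above can be sketched as a small helper that picks the quote character from the target database. This is an illustration only, not Impala's implementation; the dialect names, the function names, and the doubling of embedded quote characters are assumptions made for the example.

```python
# Dialects that use backticks, per the list above. The string labels are
# hypothetical identifiers chosen for this sketch.
BACKTICK_DIALECTS = {"hive", "impala", "mysql"}


def quote_column(name, dialect):
    """Wrap a column name in the quote character used by the target database."""
    quote = "`" if dialect.lower() in BACKTICK_DIALECTS else '"'
    # Assumption: an embedded quote character is escaped by doubling it.
    escaped = name.replace(quote, quote * 2)
    return f"{quote}{escaped}{quote}"


def build_select(columns, table, dialect):
    """Build a projection with every column name quoted."""
    projected = ", ".join(quote_column(column, dialect) for column in columns)
    return f"SELECT {projected} FROM {table}"


print(build_select(["select", "id"], "orders", "mysql"))
print(build_select(["Order", "id"], "orders", "postgresql"))
```

Quoting every projected column, rather than only known reserved words, avoids having to maintain a reserved-word list for each database.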

Apache Jira: IMPALA-13066

New catalogd flag to disable HMS synchronization by default
You can now use the disable_hms_sync_by_default catalogd startup flag to set a global default for the impala.disableHmsSync property. This feature allows you to skip event processing for all databases and tables by default while opting in specific elements as needed.

For more information, see Catalogd Daemon startup flag.
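The opt-in behavior can be pictured as a lookup in which the most specific setting wins. The sketch below assumes a precedence of table property, then database property, then the global flag; that ordering and the function name are assumptions for illustration and are not confirmed by this release note.

```python
PROPERTY = "impala.disableHmsSync"


def is_hms_sync_disabled(table_props, db_props, global_default):
    """Resolve whether event processing is skipped for a table.

    Assumed precedence: table property, then database property, then the
    value of the disable_hms_sync_by_default startup flag.
    """
    for props in (table_props, db_props):
        value = props.get(PROPERTY)
        if value is not None:
            return value.strip().lower() == "true"
    return global_default


# Sync is disabled globally, and one table opts back in.
print(is_hms_sync_disabled({PROPERTY: "false"}, {}, global_default=True))  # False
print(is_hms_sync_disabled({}, {}, global_default=True))                   # True
```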

Apache Jira: IMPALA-14131

Specifying compression levels for LZ4, ZLIB, and ZSTD
You can now specify compression levels for the LZ4, ZLIB, GZIP, and ZSTD codecs to achieve higher compression ratios. This includes support for high compression modes in LZ4 (levels 3–12) and negative compression levels for ZSTD. These levels are supported by using the compression_codec query option.

For more information, see compression_codec query option.
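As a rough illustration, the helper below builds a SET statement in the codec:level form that the compression_codec query option accepts. The helper name is hypothetical, the only range it checks is the LZ4 high compression range quoted above (3 to 12), and the exact form for negative ZSTD levels is an assumption; see the linked documentation for the authoritative syntax.

```python
def compression_codec_option(codec, level=None):
    """Build a SET statement for the compression_codec query option."""
    codec = codec.upper()
    if level is None:
        return f"SET compression_codec={codec};"
    # Range taken from the release note: LZ4 high compression uses levels 3-12.
    if codec == "LZ4" and not 3 <= level <= 12:
        raise ValueError("LZ4 high compression levels range from 3 to 12")
    return f"SET compression_codec={codec}:{level};"


print(compression_codec_option("zstd", 12))
print(compression_codec_option("lz4", 9))
# Assumed form for a negative ZSTD level.
print(compression_codec_option("zstd", -5))
```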

Apache Jira: IMPALA-10630, IMPALA-14082

Batch processing for reload events
Catalogd now supports batch processing of RELOAD events on the same table. This enhancement allows you to load partitions simultaneously and reduces duplicate reloads. By minimizing the number of times a table lock is acquired and reducing table version changes, this feature improves the performance of coordinators in local-catalog mode and reduces query planning retries.

Apache Jira: IMPALA-14082
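The batching idea can be sketched as collapsing a run of RELOAD events on the same table into one unit of work. The example below is a conceptual model, not Catalogd code; the event dictionaries, the field names, and the choice to batch only consecutive events are assumptions made for illustration.

```python
from itertools import groupby


def batch_reload_events(events):
    """Merge consecutive RELOAD events on the same table into one batch.

    Each event is a dict with hypothetical keys: type, table, partition.
    """
    batches = []
    runs = groupby(events, key=lambda event: (event["type"], event["table"]))
    for (event_type, table), run in runs:
        run = list(run)
        if event_type == "RELOAD":
            partitions = []
            for event in run:
                # Skip duplicates so each partition is reloaded once per batch.
                if event["partition"] not in partitions:
                    partitions.append(event["partition"])
            batches.append({"type": "RELOAD", "table": table, "partitions": partitions})
        else:
            # Other event types pass through one by one.
            for event in run:
                batches.append(
                    {"type": event["type"], "table": table, "partitions": [event["partition"]]}
                )
    return batches


events = [
    {"type": "RELOAD", "table": "sales", "partition": "dt=2024-01-01"},
    {"type": "RELOAD", "table": "sales", "partition": "dt=2024-01-02"},
    {"type": "RELOAD", "table": "sales", "partition": "dt=2024-01-01"},
    {"type": "RELOAD", "table": "users", "partition": "country=DE"},
]
for batch in batch_reload_events(events):
    print(batch)
```

In this model the three events on `sales` become one batch with two distinct partitions, so the table would be locked once instead of three times.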

Consolidated event processing for partition changes
Catalogd now supports the ALTER_PARTITIONS event type, which consolidates multiple partition changes into a single event. By processing one batch event instead of numerous individual ALTER_PARTITION events, the event processor can synchronize metadata more quickly and reduce the processing load on the Catalogd cache.

Apache Jira: IMPALA-13593

What's new in Trino on Cloudera Data Warehouse on premises

Improved Trino connector UI labels
The Trino connector UI is updated. Connectors previously labeled as Optimized are now labeled as Certified across the UI. On the connector list page, the Data source column label is replaced with Catalog Name.
New federation connectors
Hive Engine and Impala Engine are added to the federation connectors list, providing dedicated JDBC-based connectivity options. For more information, see Trino federation connector.
Teradata general availability
The Trino-Teradata connector, previously available as a technical preview, is now generally available in Cloudera Data Warehouse. The connector supports read-only SELECT operations on Teradata sources and operates in ANSI Mode.
Trino runtime upgraded
The Trino runtime is upgraded from version 476 to 479. This upgrade introduces stability improvements and performance optimizations.