What's new in Cloudera Data Warehouse on premises 1.5.5 SP3
Cloudera Data Warehouse on premises 1.5.5 Service Pack 3 introduces new features and improvements for Cloudera Data Warehouse, Cloudera Data Explorer (Hue), Hive, Iceberg, Impala, and Trino.
Cloudera Data Warehouse on premises
- Configuration change detection
- Cloudera Data Warehouse introduces a new configuration change detection feature to significantly enhance cluster stability, manageability, and security tracking. Administrators can now proactively monitor and identify unexpected or impactful adjustments across essential services. By alerting administrators to these modifications, the feature reduces the risk of errors, streamlines troubleshooting, and strengthens compliance and operational security. For more information, see Configuration change detection.
- Automated Apache Ozone storage integration for Virtual Warehouses
- Cloudera Data Warehouse introduces automated integration with Apache Ozone storage through the S3A protocol, eliminating the need for manual configuration in Cloudera Data Warehouse. Credentials and endpoints are now managed centrally through the Cloudera Management Console UI and automatically propagated to all relevant Virtual Warehouse components and query engines, including Hive, Impala, and Trino. This enhancement simplifies storage administration and secures credential management through automated JCEKS and vault delivery. For more information, see Automated Apache Ozone storage integration.
- Rollback for Virtual Warehouse upgrades
- You can now roll back a failed Hive or Impala Virtual Warehouse upgrade to restore it to
its last known working state. If an upgrade fails and the Backup Virtual Warehouse
namespaces before an upgrade server setting is enabled in Advanced
Configuration, a Rollback Upgrade option is displayed in
the Virtual Warehouse actions menu. After you confirm the rollback, the Virtual Warehouse
enters a RollingBack state and Cloudera Data Warehouse
automatically reverts it to the pre-upgrade configuration.
This helps reduce downtime and manual intervention when Virtual Warehouse upgrades encounter issues, giving administrators a reliable recovery path directly from the Cloudera Data Warehouse UI. For more information, see Rolling back of Virtual Warehouses.
- Enhanced JVM crash and heap dump collection for Virtual Warehouses
- Previously, Cloudera Data Warehouse collected Out-Of-Memory (OOM) heap dumps
for Hive and Impala, but lacked coverage for Java Virtual Machine (JVM) crash dumps and did
not collect JVM diagnostic files for Trino workloads.
This release introduces a unified solution that standardizes and significantly expands this diagnostic coverage. It now automatically captures both OOM heap dumps and JVM crash dumps and is enabled across all Hive, Impala, and Trino Virtual Warehouses. Collected files are automatically forwarded to the environment's diagnostic storage location and included in generated diagnostic bundles. This ensures vital diagnostic information, such as stack traces, crash analysis, and memory state, is immediately available without manual pod inspection. Consequently, this reduces the troubleshooting time for Java Virtual Machine-related failures. For more information, see JVM crash and heap dump collection.
- Migration from NGINX to Istio in Cloudera Data Warehouse
- In Cloudera Data Services on premises 1.5.5 SP3 and higher releases, Cloudera Data Warehouse on the RKE
platform transitions from the NGINX Ingress Controller to the Kubernetes Gateway API using an
Istio gateway controller. This change currently only affects the RKE platform; OpenShift
Container Platform (OCP) environments are not affected.
For new and upgraded workloads (Hive, Impala, or Trino) on 1.5.5 SP3, the ingress layer is managed by Istio, and these workloads utilize gateway resources (HTTPRoute and XListenerSet) instead of traditional ingress resources. Consequently, Cloudera Data Warehouse workloads from 1.5.5 SP2 or earlier releases, which were designed to work only with NGINX Ingress resources, cannot be deployed in these updated environments.
The NGINX Ingress controller continues to manage ingress traffic for existing workloads until they are upgraded. Upgrading the workload automatically upgrades the underlying networking stack to the Istio Gateway API as well. This change ensures long-term support, security, and better memory stability in data-service-heavy environments, as the ingress-nginx project has reached its end of support.
- Removal of Unified Analytics and migration of stored Hive queries
- The legacy Unified Analytics framework, including the Impala Virtual Warehouse
implementation, is fully removed from Cloudera Data Warehouse. All remaining
Unified Analytics components, configuration paths, and UI flows are cleaned up, and Virtual
Warehouses automatically revert to the standard Impala virtual warehouse architecture.
As a result of this change, saved queries in Data Explorer originally created as a Hive snippet type under Unified Analytics will fail with a configuration error. New migration steps are available to update and resolve these stored queries. For more information, see Migrating stored Hive queries after Unified Analytics removal.
- Renaming of Edit option to Details in the Cloudera Data Warehouse action menu
- The Edit option in the action menu for Environments, Virtual Warehouses, and Database Catalogs is renamed to Details. This change is a UI label update and has no functional impact.
- Cloudera Data Visualization updated to version 8.1.2
- The Cloudera Data Visualization component bundled with Cloudera Data Warehouse on premises is updated to version 8.1.2-b41. This update brings the latest improvements, bug fixes, and enhancements from the Cloudera Data Visualization 8.1.2 release to Cloudera Data Warehouse.
What's new in Cloudera Data Explorer (Hue) on Cloudera Data Warehouse on premises
- Product Branding Update
- The product component previously known as Data Explorer is now named Cloudera Data Explorer (Hue). This change reflects an updated UI and branding initiative and is rolled out in phases.
- Facts support in SQL AI Assistant
- You can now define custom system instructions to guide the SQL AI Assistant in generating
more accurate queries based on your specific business logic. This enhancement supports
complex, cross-database workflows by allowing you to persist organizational context in the
Assistant settings.
For more information, see Facts support for SQL query.
- Data Explorer support for the boto3 SDK
- Data Explorer now supports the boto3 SDK for accessing AWS S3. This update replaces the legacy connector framework to provide improved performance and compatibility with AWS services.
What's new in Hive on Cloudera Data Warehouse on premises
- Small file warnings in console
- The MSCK and ANALYZE commands now display a warning
in the console if the average file size for a table or partition is below the threshold. This
helps you identify small files that might affect performance.
For more information, see Statistics generation and viewing commands in Cloudera Data Warehouse.
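The check behind this warning can be pictured as a simple average compared against a threshold. The following is a minimal sketch only, not Hive's implementation: the function name, the message text, and the 16 MiB threshold are all hypothetical.

```python
def small_file_warning(file_sizes_bytes, threshold_bytes):
    """Return a warning string if the average file size is below the threshold.

    Hypothetical helper: the real threshold and message come from Hive.
    """
    if not file_sizes_bytes:
        return None  # no files, so there is nothing to average
    average = sum(file_sizes_bytes) / len(file_sizes_bytes)
    if average < threshold_bytes:
        return (
            f"WARNING: average file size {average:.0f} B "
            f"is below threshold {threshold_bytes} B"
        )
    return None


# Assumed threshold for illustration only.
THRESHOLD = 16 * 1024 * 1024

print(small_file_warning([1024, 2048, 4096], THRESHOLD))
print(small_file_warning([64 * 1024 * 1024], THRESHOLD))
```

The first call reports a warning because three kilobyte-sized files average far below the assumed threshold; the second returns `None`.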
- Performance improvement for column changes
- The ALTER CHANGE COLUMN command is now faster for tables that have many
partitions. This change prevents the command from performing a separate Metastore service call
to update column statistics for every partition, which previously caused long execution times
and timeouts. For large partitioned tables, the execution time is reduced from hours to
minutes.
Apache Jira: HIVE-28346
What's new in Iceberg on Cloudera Data Warehouse on premises
- Table repair feature support for Iceberg tables
- Impala introduces the repair_metadata() function for Iceberg tables. This function provides a self-service recovery path for Iceberg tables that are inaccessible due to missing data files after manual file deletions in the underlying storage. For more information, see Table repair feature.
- Support for SHOW FILES IN table PARTITION for Iceberg
- Impala now supports the SHOW FILES IN command with the PARTITION clause to list data files for specific partitions in Iceberg tables. This enhancement extends metadata capabilities by enabling inspection of partition-level physical data directly from Impala. For more information, see Describe table metadata feature.
- Support for additional partition transform functions for Iceberg tables
- Iceberg now supports additional partition transform functions such as BUCKET, TRUNCATE, IDENTITY, and VOID. These transformations extend partitioning capabilities by enabling hashing, value truncation, direct partitioning, and handling of null partitions. For more information, see Partition transform feature.
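To make the four transforms concrete, here is a rough Python sketch of what each one computes for a single column value. It is an illustration under assumptions, not the engine's code: the Iceberg specification defines bucketing with a 32-bit Murmur3 hash, and `zlib.crc32` is swapped in below only to keep the example dependency-free, so the bucket numbers will not match real Iceberg output.

```python
import zlib


def identity(value):
    """IDENTITY: the partition value is the column value itself."""
    return value


def void(value):
    """VOID: the partition value is always null, whatever the input."""
    return None


def truncate(width, value):
    """TRUNCATE: keep a string prefix, or round an integer down to a multiple of width."""
    if value is None:
        return None
    if isinstance(value, str):
        return value[:width]
    # Python's % is non-negative for a positive width, so negatives round down.
    return value - (value % width)


def bucket(num_buckets, value):
    """BUCKET: hash the value into one of num_buckets buckets.

    Assumption: crc32 is a stand-in for the Murmur3 hash that Iceberg specifies.
    """
    if value is None:
        return None
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


print(truncate(10, 27))        # 20
print(truncate(10, -3))        # -10
print(truncate(3, "iceberg"))  # ice
print(void("anything"))        # None
```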
- Support for partition columns in WHERE clause predicates
- Hive Iceberg compaction now supports WHERE clause predicates on partition columns. This enhancement allows you to selectively compact data by filtering partition columns, improving efficiency and control over compaction operations. For more information, see Data compaction.
What's new in Impala on Cloudera Data Warehouse on premises
- Caching intermediate query results
- Impala now supports caching intermediate results to improve query performance and resource efficiency for repetitive workloads. By storing results at various locations within the SQL plan tree, the system can reuse computation for similar queries even when they are not identical, provided the underlying data and settings remain unchanged. For more information and instructions on enabling this feature, see Caching intermediate results.
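The reuse condition described above can be modelled as a lookup keyed on three things: the plan subtree, the version of the underlying data, and the settings in effect. The sketch below is a conceptual toy, not Impala's implementation: the `PlanCache` class, the tuple-based plan representation, and the setting name are all hypothetical.

```python
class PlanCache:
    """Toy cache for intermediate results of plan subtrees (hypothetical)."""

    def __init__(self):
        self.entries = {}
        self.computed = 0  # how many times a subtree was actually executed

    def evaluate(self, node, data_version, settings, compute):
        # Reuse is valid only if subtree, data, and settings all match.
        key = (node, data_version, tuple(sorted(settings.items())))
        if key not in self.entries:
            self.computed += 1
            self.entries[key] = compute(node)
        return self.entries[key]


ROWS = [
    {"year": 2024, "amount": 10},
    {"year": 2023, "amount": 5},
    {"year": 2024, "amount": 7},
]

# A subtree shared by two different queries: scan plus filter.
shared_subtree = ("filter", "year = 2024", ("scan", "sales"))


def run_subtree(node):
    """Stand-in executor for the shared subtree."""
    return [row for row in ROWS if row["year"] == 2024]


cache = PlanCache()
settings = {"example_option": "value"}  # hypothetical setting

# Query A aggregates with SUM, query B with COUNT; both reuse the subtree.
total = sum(r["amount"] for r in cache.evaluate(shared_subtree, 1, settings, run_subtree))
count = len(cache.evaluate(shared_subtree, 1, settings, run_subtree))

# A new data version invalidates the reuse and forces a recomputation.
cache.evaluate(shared_subtree, 2, settings, run_subtree)
print(total, count, cache.computed)  # 17 2 2
```

The two queries are not identical, yet the filtered scan runs only once for them; it runs a second time only after the data version changes.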
- User role management
- You can now grant and revoke roles directly to and from individual users in Impala,
providing more granular control over security management. This feature includes support for
the GRANT ROLE, REVOKE ROLE, and SHOW ROLE
GRANT USER statements, aligning Impala with Apache Hive's role-related
functionality.
For more information, see impala role, impala grant role, impala show roles, and impala revoke role.
Apache Jira: IMPALA-14085.
- Native geospatial query acceleration
- Cloudera Data Warehouse now introduces native implementations for specific geospatial functions to accelerate simple queries. This feature reduces processing overhead by avoiding transitions to the Java Virtual Machine and optimizing file-level filtering for Parquet and Iceberg tables. For more information, see Impala Geospatial query acceleration.
- OpenTelemetry integration for Impala
- Cloudera Data Warehouse now provides OpenTelemetry (OTel) support to help you
monitor query performance and troubleshoot issues. This new feature collects and exports
query telemetry data as OpenTelemetry traces to a central OpenTelemetry-compatible collector.
The integration is designed to have a minimal impact on performance because it uses data
already being collected and handles the export in a separate process. For more information,
see OpenTelemetry support for Impala.
Apache Jira: IMPALA-13234
- Filtering SHOW PARTITIONS output
- You can now use the WHERE clause with the SHOW PARTITIONS statement to filter results based on partition column values. This enhancement helps you manage tables with a large number of partitions by narrowing down the output using comparison operators, IN lists, BETWEEN clauses, IS NULL predicates, and logical expressions. For more information, see SHOW PARTITIONS statement.
Apache Jira: IMPALA-14065
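The effect of each predicate type on the partition list can be modelled in a few lines of Python. This is a conceptual sketch only: the partition columns (`year`, `region`) are hypothetical, and the SQL fragments in the comments show only the predicate part, since the exact statement grammar is covered in the linked SHOW PARTITIONS documentation.

```python
# Hypothetical partitions of a table partitioned by (year, region).
partitions = [
    {"year": 2023, "region": "eu"},
    {"year": 2024, "region": "us"},
    {"year": 2025, "region": None},
]


def filter_partitions(parts, predicate):
    """Keep only the partitions whose column values satisfy the predicate."""
    return [p for p in parts if predicate(p)]


# WHERE year BETWEEN 2024 AND 2025
between = filter_partitions(partitions, lambda p: 2024 <= p["year"] <= 2025)

# WHERE region IN ('eu', 'us')
in_list = filter_partitions(partitions, lambda p: p["region"] in ("eu", "us"))

# WHERE region IS NULL
is_null = filter_partitions(partitions, lambda p: p["region"] is None)

# WHERE year > 2023 AND region IS NOT NULL
combined = filter_partitions(
    partitions, lambda p: p["year"] > 2023 and p["region"] is not None
)

print(len(between), len(in_list), len(is_null), len(combined))  # 2 2 1 1
```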
- Parallelizing JDBC External Table queries
- You can now run queries on JDBC tables simultaneously to improve performance for joins and
aggregations. Impala now estimates the number of rows in a JDBC table by running a COUNT query during query preparation. This estimation allows the planner to assign multiple scanner threads, introduce exchange nodes, and produce more efficient join orders. You can also use the --min_jdbc_scan_cardinality backend flag to set a lower bound for these estimates. For more information, see Parallelizing JDBC External Table queries.
- Recreating tables with statistics
- You can use the WITH STATS clause in the SHOW CREATE TABLE statement to generate the SQL required to recreate a table along with its column statistics and partition metadata. For more information, see SHOW CREATE TABLE WITH STATS statement.
Apache Jira: IMPALA-13066
- Quoting reserved words in column names
- You can now explicitly quote all column names projected in SQL queries generated for JDBC
external tables. Column names are wrapped with the following quote characters based on the
JDBC driver being used:
- Backticks (`) for Cloudera Runtime Hive, Impala, and MySQL
- Double quotes (") for all other databases
Apache Jira: IMPALA-13066
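The quoting rule above amounts to a two-way choice of quote character. The sketch below is illustrative Python, not Impala's code: the driver identifiers and function names are hypothetical, and escaping of quote characters embedded in a column name is deliberately left out.

```python
# Hypothetical driver identifiers for the databases that use backticks.
BACKTICK_DRIVERS = {"hive", "impala", "mysql"}


def quote_column(name, driver):
    """Wrap a column name in the quote character used for the given driver."""
    quote = "`" if driver.lower() in BACKTICK_DRIVERS else '"'
    return f"{quote}{name}{quote}"


def build_projection(columns, driver):
    """Build the projected column list of a generated SELECT statement."""
    return ", ".join(quote_column(column, driver) for column in columns)


# Reserved words such as "order" and "select" become safe to project.
print(build_projection(["id", "order", "select"], "mysql"))
print(build_projection(["id", "order", "select"], "postgresql"))
```

The first call prints the columns wrapped in backticks, the second wraps them in double quotes.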
- New catalogd flag to disable HMS synchronization by default
- You can now use the disable_hms_sync_by_default
catalogd startup flag to set a global default for the
impala.disableHmsSync property. This feature allows you to skip event
processing for all databases and tables by default while opting in specific elements as
needed.
For more information, see Catalogd Daemon startup flag.
Apache Jira: IMPALA-14131
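One way to picture the opt-in behavior is as a lookup that falls back to the global default. The following Python sketch rests on an assumption not stated above: that a table-level impala.disableHmsSync value overrides a database-level value, which in turn overrides the flag. The function name is hypothetical.

```python
PROPERTY = "impala.disableHmsSync"


def is_hms_sync_disabled(global_default, db_properties, table_properties):
    """Resolve whether event processing is skipped for one table.

    Assumed precedence: table property, then database property, then the
    catalogd-wide default set by disable_hms_sync_by_default.
    """
    for properties in (table_properties, db_properties):
        if PROPERTY in properties:
            return properties[PROPERTY].lower() == "true"
    return global_default


# Sync is disabled everywhere by default...
print(is_hms_sync_disabled(True, {}, {}))  # True
# ...but one table opts back in to event processing.
print(is_hms_sync_disabled(True, {}, {PROPERTY: "false"}))  # False
```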
- Specifying compression levels for LZ4, ZLIB, and ZSTD
- You can now specify compression levels for the LZ4, ZLIB, GZIP, and ZSTD codecs to achieve
higher compression ratios. This includes support for high compression modes in LZ4 (levels
3–12) and negative compression levels for ZSTD. These levels are supported by using the
compression_codec query option.
For more information, see compression_codec query option.
Apache Jira: IMPALA-10630, IMPALA-14082
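The general trade-off behind compression levels can be tried outside Impala. The sketch below uses Python's standard `zlib` module purely as a stand-in: it does not call Impala or set the compression_codec option, and the sample data is made up. Higher levels typically spend more CPU time in exchange for smaller output.

```python
import zlib


def compressed_sizes(data, levels):
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Made-up, highly repetitive sample data.
sample = b"cloudera data warehouse on premises " * 4096

sizes = compressed_sizes(sample, [1, 6, 9])
for level, size in sizes.items():
    print(f"level {level}: {size} bytes (raw: {len(sample)} bytes)")

# Compression is lossless at every level.
roundtrip = zlib.decompress(zlib.compress(sample, 9))
```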
- Batch processing for reload events
- Catalogd now supports batch processing of RELOAD events on the same table. This enhancement allows you to load partitions simultaneously and reduces duplicate reloads. By minimizing the number of times a table lock is acquired and reducing table version changes, this feature improves the performance of coordinators in local-catalog mode and reduces query planning retries.
Apache Jira: IMPALA-14082
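The batching idea can be sketched as collapsing a run of RELOAD events on one table into a single unit of work with duplicates removed. This is a conceptual Python model, not catalogd's code: the event tuples are hypothetical, and batching only consecutive events is an assumption made for the example.

```python
from itertools import groupby


def batch_reload_events(events):
    """Collapse consecutive RELOAD events on the same table into one batch.

    events: list of (event_type, table, partition) tuples (hypothetical shape).
    Returns a list of (event_type, table, [partitions]) tuples.
    """
    batches = []
    for (event_type, table), run in groupby(events, key=lambda e: (e[0], e[1])):
        run = list(run)
        if event_type == "RELOAD":
            partitions = []
            for _, _, partition in run:
                if partition not in partitions:  # drop duplicate reloads
                    partitions.append(partition)
            batches.append((event_type, table, partitions))
        else:
            # Other event types are passed through one by one.
            batches.extend((t, tbl, [p]) for t, tbl, p in run)
    return batches


events = [
    ("RELOAD", "sales", "day=1"),
    ("RELOAD", "sales", "day=2"),
    ("RELOAD", "sales", "day=1"),
    ("ALTER_TABLE", "sales", None),
    ("RELOAD", "orders", "day=9"),
]
for batch in batch_reload_events(events):
    print(batch)
```

In this model, the three RELOAD events on `sales` become one batch covering two partitions, so the table would be locked once instead of three times.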
- Consolidated event processing for partition changes
- Catalogd now supports the ALTER_PARTITIONS event type, which consolidates multiple partition changes into a single event. By processing one batch event instead of numerous individual ALTER_PARTITION events, the event processor can synchronize metadata more quickly and reduce the processing load on the Catalogd cache.
Apache Jira: IMPALA-13593
What's new in Trino on Cloudera Data Warehouse on premises
- Improved Trino connector UI labels
- The Trino connector UI is updated. Connectors previously labeled as Optimized are now labeled as Certified across the UI. On the connector list page, the Data source column label is replaced with Catalog Name.
- New federation connectors
- Hive Engine and Impala Engine are added to the federation connectors list, providing dedicated JDBC-based connectivity options. For more information, see Trino federation connector.
- Teradata general availability
- The Trino-Teradata connector, previously available as a technical preview, is now generally available in Cloudera Data Warehouse. The connector supports read-only SELECT operations on Teradata sources and operates in ANSI Mode.
- Trino runtime upgraded
- The Trino runtime is upgraded from version 476 to 479. This upgrade introduces stability improvements and performance optimizations.
