Behavior changes
This release of the Cloudera Data Warehouse service on Cloudera on cloud has the following behavior changes:
Summary: Changes to Unified Analytics availability in Cloudera Data Warehouse
Before this release: When creating a new Impala virtual warehouse, you could select the Enable Unified Analytics option.
After this release: The Enable Unified Analytics option is no longer available when creating a new Impala virtual warehouse. This option is disabled in the user interface. However, you can continue to create and manage Impala Unified Analytics Virtual Warehouses using the CDP CLI.
Summary: Increased Batch Sizes for COMPUTE STATS
Before this release: The COMPUTE STATS query failed on tables containing more than 5000 columns. This issue was specific to wide tables and could not be resolved by dropping and rerunning the query.
After this release: To resolve this, batch retrieval or insertion of object metadata is now enabled by default. The default value of the hive.metastore.direct.sql.batch.size property is changed from 0 to 1000, and the default value of the metastore.rawstore.batch.size property is changed from -1 to 500. With this change, COMPUTE STATS queries run successfully on tables with more than 5000 columns.
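The effect of a batch size on a wide table can be illustrated with a minimal Python sketch. This is not the metastore's implementation. The batched helper and the col_N column names are hypothetical, and treating a non-positive size as "no batching" is a simplifying assumption made for this illustration only.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items.

    Assumption for this sketch: a non-positive batch_size means
    "no batching", so everything is sent in a single request.
    """
    if batch_size <= 0:
        yield list(items)
        return
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# A hypothetical wide table with 5200 columns.
columns = [f"col_{i}" for i in range(5200)]

# With a batch size of 1000, the metadata is handled in six smaller requests.
batch_sizes = [len(batch) for batch in batched(columns, 1000)]

# With batching turned off, all 5200 columns land in one request.
unbatched_sizes = [len(batch) for batch in batched(columns, 0)]
```

In the sketch, the batched run produces five requests of 1000 columns and one of 200, while the unbatched run produces a single request of 5200 columns.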
Summary: Parquet late materialization behavior has changed
The Parquet late materialization feature is enabled by default for all types, including collections.
Before this release: The Parquet late materialization feature was disabled by default. You used the parquet_late_materialization_threshold query option to set the minimum number of consecutive filtered rows required to trigger late materialization. The default value was -1. The feature was not supported for collection columns.
After this release: The Parquet late materialization feature is enabled by default. The parquet_late_materialization_threshold is now set to 1 if the query option is greater than or equal to 0 and there is a collection value that can be skipped. Otherwise, the value is the same as the query option, which defaults to 20.
Apache Jira: IMPALA-3841
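The threshold rule described in this entry can be written as a small decision function. The following Python sketch only restates that rule. The function name effective_threshold and the has_skippable_collection parameter are hypothetical and do not correspond to names in the Impala code base.

```python
DEFAULT_THRESHOLD = 20  # default of the query option after this release


def effective_threshold(query_option: int, has_skippable_collection: bool) -> int:
    """Return the threshold that applies to a scan.

    The value is 1 when the query option is non-negative and a collection
    value can be skipped; otherwise the query option is used unchanged.
    """
    if query_option >= 0 and has_skippable_collection:
        return 1
    return query_option


with_collection = effective_threshold(DEFAULT_THRESHOLD, True)
without_collection = effective_threshold(DEFAULT_THRESHOLD, False)
negative_option = effective_threshold(-1, True)
```

A negative query option is passed through unchanged in both cases, because the rule only overrides values greater than or equal to 0.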
Summary: TCP Keepalive is now enabled by default for client connections
Before this release: TCP keepalive was disabled by default for client connections. Idle connections dropped by load balancers remained active in Impala, consuming service threads (fe_service_threads).
After this release: TCP keepalive is now enabled by default for all client connections, enhancing stability and availability. Impala is configured to check idle connections aggressively, every 10 minutes.
Apache Jira: IMPALA-14031
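For readers unfamiliar with TCP keepalive, the following Python sketch shows how the operating system mechanism is typically enabled on a socket. It does not show how Impala configures its own connections. The enable_keepalive helper is hypothetical, and mapping the 10-minute interval to the idle time before the first probe (600 seconds) is an assumption. The idle-time option is applied only on platforms where Python exposes socket.TCP_KEEPIDLE.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_seconds: int = 600) -> None:
    """Turn on TCP keepalive for sock.

    idle_seconds is how long a connection may stay idle before the
    first probe is sent (assumed here to be 10 minutes).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is platform-specific, so guard its use.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# getsockopt returns a non-zero value when the option is set.
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With keepalive enabled, a peer that has silently disappeared stops answering probes, so the connection can be closed instead of holding a service thread indefinitely.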
Summary: Support for load-based routing in impala-proxy
Before this release: The impala-proxy used a random selection policy to choose a coordinator. This approach did not consider the current load on each coordinator, which led to an uneven distribution of connections and potential performance bottlenecks.
After this release: The impala-proxy now uses load-based routing to decide which coordinator should handle a new session request.
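The difference between the two policies can be sketched in Python. This is an illustration under stated assumptions, not the impala-proxy implementation: the release note does not say which load metric the proxy uses, so the sketch assumes the number of active sessions per coordinator. The coordinator names and all function names are hypothetical.

```python
import random
from typing import Dict


def pick_random(loads: Dict[str, int], rng: random.Random) -> str:
    """Previous policy: choose any coordinator, ignoring its load."""
    return rng.choice(sorted(loads))


def pick_least_loaded(loads: Dict[str, int]) -> str:
    """Load-based policy: choose the coordinator with the fewest sessions.

    Ties are broken by name so the result is deterministic.
    """
    return min(sorted(loads), key=lambda name: loads[name])


def route_sessions(count: int, loads: Dict[str, int]) -> Dict[str, int]:
    """Route count new sessions with the load-based policy."""
    result = dict(loads)
    for _ in range(count):
        result[pick_least_loaded(result)] += 1
    return result


# Assumed metric: active sessions per coordinator.
loads = {"coordinator-0": 12, "coordinator-1": 3, "coordinator-2": 7}
chosen = pick_least_loaded(loads)

# Starting from idle coordinators, 30 sessions spread evenly.
balanced = route_sessions(30, {"coordinator-0": 0, "coordinator-1": 0, "coordinator-2": 0})
```

In this sketch the load-based policy always sends the next session to the least busy coordinator, so an initially uneven distribution evens out as new sessions arrive, whereas random selection can keep adding sessions to a coordinator that is already busy.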
