Fixed issues in 7.1.9 SP1 CHF 11

Know more about the cumulative hotfix 11 for 7.1.9 SP1.

The following is the list of fixes shipped in CDP Private Cloud Base version 7.1.9-1.cdh7.1.9.p1059.70393529.

CDPD-73189: Not all tables are loading in the left assist panel
Previously, in the Hue left assist panel, only the initial 5,000 tables were loaded because the default value of max_catalog_sql_entries was set to 5,000. Increasing this configuration in Hue Safety Valve only allowed loading up to 10,000 tables due to the max_rows parameter being hardcoded to 10,000.
This issue is now resolved by removing the hardcoded limit. You can now load more than 10,000 tables in the left assist panel.
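For example, a minimal safety-valve sketch for raising the limit is shown below; the [beeswax] section is an assumption about where the option lives in hue_safety_valve.ini and the value is illustrative only, so verify both for your deployment:

    # The section name and value below are assumptions; verify them for your Hue release.
    [beeswax]
    max_catalog_sql_entries=25000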
CDPD-84277: Query Processor cleanup failures on MySQL due to DELETE syntax errors
Previously, the cleanup DELETE operations in Query Processor failed on MySQL versions 8.0.25 to 10.3.32 because of the unsupported use of the USING clause in DELETE statements. The legacy DELETE ... USING syntax caused syntax errors and prevented the cleanup tasks from running. This issue is now resolved by replacing the USING clause with JOIN syntax in the DELETE queries.
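As a simplified illustration (the table and column names below are hypothetical, not the actual Query Processor schema), the change replaces the PostgreSQL-style DELETE ... USING form with a MySQL-compatible multi-table DELETE that uses a JOIN:

    -- PostgreSQL-style syntax that fails on the affected MySQL versions (illustrative schema):
    DELETE FROM query_facts USING query_details
    WHERE query_details.query_id = query_facts.id
      AND query_details.created_at < NOW() - INTERVAL 30 DAY;

    -- Equivalent JOIN-based DELETE that MySQL accepts:
    DELETE query_facts FROM query_facts
    JOIN query_details ON query_details.query_id = query_facts.id
    WHERE query_details.created_at < NOW() - INTERVAL 30 DAY;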
CDPD-81753: Added a configurable flag to optionally re-enable data preview on database views in Hue
Previously, in the Hue 7.1.9 SP1 CHF 4 release, data preview for database views was disabled by default because complex or long-running views caused resource strain, which impacted data validation and analysis.

This behavior is now improved by the new allow_sample_data_from_views flag, whose default value is false. Setting this flag to true enables Hue to fetch sample data for database views, restoring the data preview functionality in the SQL assist panel and Table browser without affecting default system performance. You can enable the flag by performing the following steps:

  1. Navigate to Cloudera Manager > Clusters > Hue service > Configuration.
  2. In the Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini field, specify the following parameter:
    [metastore]
    allow_sample_data_from_views=true
    
  3. Click Save Changes.
CDPD-55789: Memory leak in NSSummary in Recon
Previously, memory leaked in the NSSummary map when multiple files and directories were deleted: orphan directory links were unlinked from the NSSummary tree, but the orphan entries could remain in the map indefinitely. This fix cleans up the entries for deleted files and directories, as well as the orphan directory links, from both the NSSummary map and the NSSummary tree after the directories and files are completely deleted from Ozone, including from the deletedDirTable table.

Apache Jira: HDDS-8565

CDPD-87864: Retrigger of Recon NSSummary tree build is not controlled with central flag
Previously, the rebuild of the Recon NSSummary tree was triggered from multiple places in the source code and was not controlled by a common flag. This issue is now fixed by controlling the retriggering with a central flag.

Apache Jira: HDDS-13443

CDPD-89346/CDPD-89159: Enhanced join strategy selection for large clusters
The query planner cost model for broadcast joins can be skewed by the number of nodes in a cluster. Previously, this could lead to suboptimal join strategy choices, especially in large clusters with skewed data where a partitioned join was chosen over a more efficient broadcast join.
This issue is now resolved by introducing the broadcast_cost_scale_factor query option as an additional tuning knob, alongside query hints, to override query planner decisions.
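For example, you could adjust the option at the session level before running the affected query; the value shown below is purely illustrative, and query hints remain available as before:

    -- Illustrative value only; check the query option documentation for its valid range.
    SET broadcast_cost_scale_factor=0.5;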

Apache Jira: IMPALA-14263

Stable catalog cache after INVALIDATE METADATA
Previously, the INVALIDATE METADATA <table> command on tables created outside Impala caused tables to be temporarily dropped from the catalog cache.
This issue is now resolved by ensuring that the INVALIDATE METADATA command fetches the latest valid event ID from the Hive Metastore and assigns it to the table.
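For example, after the fix, invalidating a table created outside Impala (the name below is hypothetical) refreshes its metadata without the table temporarily disappearing from the catalog cache:

    INVALIDATE METADATA sales_db.orders_external;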

Apache Jira: IMPALA-12712

CDPD-88652: Catalogd HA Warm Failover
In a Catalogd high availability (HA) setup, a failover caused service delays because the standby Catalogd had a cold cache and needed to reload metadata, which took time. This was particularly noticeable when metadata for a large number of tables had been loaded in the active Catalogd.
This issue is now resolved by allowing the standby Catalogd to keep its metadata cache ready, including a new mechanism, added in IMPALA-14074, to preload metadata of critical tables and keep those tables up to date. As a result, when a failover happens, the new active Catalogd can serve requests on critical tables immediately.
The fix includes the following JIRA changes:
    • IMPALA-14074 – Adds the following flags to enable metadata cache warmup in Catalogd for critical tables, ensuring that the standby is ready for failover:
      • --warmup_tables_config_file – Specifies a configuration file with the tables whose metadata is to be preloaded. The file can be on local storage or remote storage, such as HDFS. Each line can list a fully qualified table or a wildcard for all tables under a database, for example, tpch.*. Tables are loaded in the order listed in the file (a sample file is shown after this list).
        Example:
        --warmup_tables_config_file=file:///opt/impala/warmup_table_list.txt
        --warmup_tables_config_file=hdfs:///tmp/warmup_table_list.txt
      • --keeps_warmup_tables_loaded – Controls whether the listed tables are automatically reloaded after invalidation. By default, this is set to false.
    • --catalogd_ha_reset_metadata_on_failover – Must be set to false to ensure that the standby Catalogd maintains a warm metadata cache for a faster switch during failover.
    • IMPALA-14227 – Adds a waiting time during HA failover to ensure that the new active Catalogd applies all pending HMS events before serving requests.
    • IMPALA-12876 – Exposes the Catalog version and loaded timestamps in query profiles to help debug stale metadata issues.
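The following is a minimal sketch of the warmup_table_list.txt file referenced above, using hypothetical database and table names; each line names a fully qualified table or a database wildcard, and tables are preloaded in the order listed:

    sales_db.orders
    sales_db.customers
    tpch.*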

Apache Jira: IMPALA-14074, IMPALA-14227, IMPALA-12876

New configuration to disable block location retrieval
Impala is designed to optimize reads by retrieving file block locations from storage systems such as HDFS and Ozone, and scheduling read operations on the same nodes where the data resides. In on-premises environments, Impala nodes are typically colocated with storage nodes. However, when they are not colocated, retrieving block location information is a resource-intensive operation.
This issue is now fixed by introducing new Hadoop configuration properties that allow you to disable the retrieval of block location information during table loading. You can disable the retrieval in the following ways (a sample core-site.xml snippet follows this list):
  • Globally, using the Hadoop Safety Valve for core-site.xml advanced configuration snippet.
    'impala.preload-block-locations-for-scheduling': 'false'
  • By filesystem scheme, to disable it for a specific type of storage.
    'impala.preload-block-locations-for-scheduling.scheme.hdfs': 'false'
  • By authority, to disable block location retrieval for a specific endpoint when multiple storage systems are configured with the same scheme.
    'impala.preload-block-locations-for-scheduling.authority.mycluster': 'false'

    This disables block location retrieval only for URIs such as hdfs://mycluster/warehouse/tablespace/....
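As a sketch of the global option, assuming you apply it through the core-site.xml advanced configuration snippet (Safety Valve), the property translates to standard Hadoop XML as follows:

    <property>
      <name>impala.preload-block-locations-for-scheduling</name>
      <value>false</value>
    </property>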

Apache Jira: IMPALA-14138

Fix performance regression on Catalog operations when Catalogd HA is enabled
Previously, when Catalogd HA was enabled, a performance regression occurred because a lock contention issue blocked new requests. This was caused by an additional check in each request that unnecessarily acquired an exclusive lock, impacting concurrent performance.
This issue is now resolved by optimizing the check to not depend on the exclusive lock. This change improves the concurrent performance of Catalogd in HA environments.

Apache Jira: IMPALA-14220

CDPD-87722: Merge/Update jobs fail with vertex errors after upgrading to 7.1.9
In CDP 7.1.9, improvements for update and delete operations were introduced to write data directly to table locations instead of using a staging directory. However, after upgrading to CDP 7.1.9, merge, update, and delete queries might occasionally fail with the following error:
java.lang.RuntimeException: Hive Runtime Error while closing operators: Index 17 out of bounds for length 17
            at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:298)
            at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:252)
            at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) 
            
            .....
            
            Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators: Index 17 out of bounds for length 17
            at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:407)
            
            .....
            
            Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 17 out of bounds for length 17
            at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:258)
            at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1483)
            at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:731)
            at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:755)
            at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
          
This issue is now fixed.
CDPD-88850/CDPD-89412: Compaction cleaner prematurely cleans up deltas
Previously, queries that ran for a long time failed after a compaction because the compaction cleaner prematurely removed delta files. This was a regression introduced by a change in HIVE-23107, which removed a mechanism used to prevent the deletion of files still needed by running queries.
This issue is now resolved by ensuring the compaction cleaner waits for all previous transactions to commit before it begins cleaning delta files.

Apache Jira: HIVE-24291, HIVE-23107

CDPD-88851: Backward compatibility for Hive Metastore schema
An update from HIVE-23107 caused a problem when you used different versions of Hive Metastore services in the same environment. This was because the newer version removed a database table called min_history_level that was required for older versions to function correctly.
This issue is now resolved by a new change that keeps the min_history_level table.

Apache Jira: HIVE-24403

CDPD-88487: Queries with subqueries failing compilation
After upgrading to CDP 7.1.9 SP1 CHF 10, some queries that included subqueries failed during compilation with an UnsupportedOperationException error.
This issue is now resolved by correcting the compilation failure that occurred in the SharedWorkOptimizer.
CDPD-89289: Backport RATIS-1884 to 7.1.9 SP1
Previously, a regression in 7.1.9 SP1 CHF 9 and 7.1.9 SP1 CHF 10 caused the Ozone Manager log to report the Ratis error message RUNNING retry cache entry of leader should be pending.
This issue is now resolved.

Apache Jira: RATIS-1884

CDPD-23034: Port COMPX-17261 to 7.1.9.1000
A new optional configuration property, yarn.nodemanager.disk-health-checker.min-free-space-per-disk-watermark-high-mb, is now available. It specifies the minimum space, in megabytes, that must be available on a bad disk for the disk to be marked as good again. This value must not be less than the yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb value; if it is lower or not set, the value of yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb is used instead. This setting applies to both the yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs directories.
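As a minimal sketch, assuming both properties are set through the yarn-site.xml advanced configuration snippet (Safety Valve), they might look as follows; the megabyte values are illustrative only:

    <property>
      <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
      <value>1024</value>
    </property>
    <property>
      <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-watermark-high-mb</name>
      <value>2048</value>
    </property>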
CDPD-22902: 7.1.9.1000: RM UI2 cannot show logs for non-MapReduce jobs with MR ACLs enabled
Previously, container log ACL checks failed for non-MapReduce jobs.
This issue is now resolved by applying MapReduce ACLs only if the container belongs to a MapReduce job.