Fixed Issues in YARN and YARN Queue Manager

This section lists the fixed issues and resolved maintenance items for YARN and YARN Queue Manager in Cloudera Runtime 7.3.2 and its associated service packs.

Cloudera Runtime 7.3.2

Cloudera Runtime 7.3.2 resolves YARN and YARN Queue Manager issues and incorporates fixes from the service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.700. For a comprehensive record of all fixes in Cloudera Runtime 7.3.1.x, see Fixed Issues.

CDPD-49702: Error in NodeManager when executing /var/lib/yarn-ce/bin/container-executor
7.3.2
Previously, a job failed when the NodeManager returned a No such file or directory error while attempting to run the /var/lib/yarn-ce/bin/container-executor program. This issue is now resolved: the NodeManager is marked as unhealthy and shut down if it is unable to run the program.
Apache Jira: YARN-11709
COMPX-13401: CapacityScheduler UI queue filter does not work as expected when submitting applications with a leaf queue name
7.3.2
Previously, when submitting applications to YARN using only a leaf queue name, for example, default or custom, instead of the full queue path, for example, root.default, the ResourceManager (RM) and CapacityScheduler UI inconsistently displayed or filtered applications. This led to confusion, as the same queue could be displayed under different names, and applications were not visible under the expected queue filter in the UI.

This issue is now resolved and RM now returns the full queue path regardless of whether the application was submitted with a leaf queue name or a full queue path.

Apache Jira: YARN-11538
COMPX-14637: Missing permissions on NodeManagers local directories on startup
7.3.2

Previously, the NodeManager created its required local directories on startup with the correct 755 permissions only if they did not already exist.

If an administrator created these directories with incorrect permissions, or if the permissions were altered after the NodeManager started, the NodeManager failed to reset them. This lack of permission enforcement caused container failures. This issue is now resolved.

Apache Jira: YARN-11703
COMPX-17261: NodeManager disk health status oscillation with MB-based limits
7.3.2
Previously, the NodeManager disk health status could oscillate rapidly between the good and full states when the yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb property was used, because disk health relied on a single threshold. This issue is now fixed. To prevent this oscillation, a new optional configuration property, yarn.nodemanager.disk-health-checker.min-free-space-per-disk-watermark-high-mb, is now available. It specifies the minimum free space, in megabytes, required for a previously bad disk to be marked as good again. If this value is not set, or is lower than the yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb value, it defaults to the same value as the existing minimum free space configuration. This update applies to both the yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs directories, providing more granular control over disk health checks.
Apache Jira: YARN-9914
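
The two thresholds described above are set in yarn-site.xml. The following fragment is an illustrative sketch; the values are examples only, not recommended defaults:

```xml
<!-- A disk is marked full when its free space drops below 1024 MB ... -->
<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <value>1024</value>
</property>
<!-- ... and is marked good again only once free space recovers above 2048 MB,
     leaving a 1 GB band that prevents rapid good/full oscillation -->
<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-watermark-high-mb</name>
  <value>2048</value>
</property>
```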
COMPX-18004: Metrics missing in the RM UI2
7.3.2
The ResourceManager (RM) UI2 did not display metrics with the same granularity as RM UI1. This made analyzing and debugging scheduler behavior difficult, often requiring information to be retrieved from UI1 instead. This issue is now resolved and the missing metrics are now available in RM UI2.
Apache Jira: YARN-11755
COMPX-18545: Setting maximum-application-lifetime using AQCv2 templates does not apply on the first submitted application
7.3.2
Setting the maximum-application-lifetime property using the AQC v2 templates did not apply to the first submitted application but was applied to the subsequent ones. This issue is now resolved.
Apache Jira: YARN-11708
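
For context, AQCv2 queue properties such as maximum-application-lifetime are typically set through template entries in capacity-scheduler.xml. The fragment below is a hypothetical sketch following the upstream AQCv2 template naming convention (root.parent and the 3600-second value are assumptions for illustration; verify the exact property names against your release):

```xml
<!-- Hypothetical example: apply a 1-hour maximum application lifetime
     to dynamically created leaf queues under root.parent -->
<property>
  <name>yarn.scheduler.capacity.root.parent.auto-queue-creation-v2.leaf-template.maximum-application-lifetime</name>
  <value>3600</value>
</property>
```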
COMPX-18909: NodeManager marked as unhealthy if an application is terminated
7.3.2
By design, NodeManagers are marked unhealthy if an unrecoverable configuration error occurs, for example, when the container-executor script is missing. Previously, a false positive marking occurred if an application was terminated just before one of its containers tried to access the localizer syslog file. This caused an IOException, and the NodeManager was incorrectly marked unhealthy. This issue is now resolved. The error checking is now more specific, preventing these false positive unhealthy markings.
Apache Jira: YARN-11753
COMPX-21537: Upgraded the Jersey version
7.3.2
Upgraded the Jersey framework from version 1.19 to 2.46 to fix CVE-2017-1000028.
COMPX-23191: Null Pointer Exception in Delegation Token Renewer causes all subsequent applications to fail
7.3.2
Previously, any uncaught exception in DelegationTokenRenewer.RenewalTimerTask#run caused all subsequent YARN applications to fail with a java.lang.IllegalStateException: Timer already cancelled exception. This issue is now resolved, and such failures are prevented.
Apache Jira: YARN-11384
COMPX-24259: Incorrect permissions on NodeManager local directories cause container failures
7.3.2
Previously, the NodeManager created the configured local directories with 755 permissions on startup only if the directories did not exist. If these permissions were changed after startup, or if an administrator created the directories with incorrect permissions before starting YARN, the NodeManager did not reset them, resulting in container failures. This issue is now resolved.
Apache Jira: YARN-11703
COMPX-21461: The queuemanager_includedCipherSuites property fails when using comma as a separator
7.3.2
Previously, the queuemanager_includedCipherSuites property in Queue Manager supported only the colon (:) as a separator for cipher suites. Configurations that used the comma (,) as a separator failed during property parsing.

This issue is now resolved by updating the property parsing logic to accept both colon and comma as valid separators.
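
The separator handling described above can be illustrated with a short sketch. This is not the actual Queue Manager implementation; the class and method names are hypothetical, and it only demonstrates the idea of treating colon and comma as equivalent separators:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: split a cipher suite list on either ':' or ','.
public class CipherSuiteParser {
    public static List<String> parse(String configured) {
        // Split on either separator, trim whitespace, and drop empty entries
        return Arrays.stream(configured.split("[:,]"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Both separator styles yield the same list
        System.out.println(parse("TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384"));
        System.out.println(parse("TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384"));
    }
}
```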

COMPX-22209: Missing centralized Apache HttpComponents libraries
7.3.2
Previously, the cpx component did not use the centralized Apache HttpComponents libraries (httpcore, httpclient, httpcore5, httpclient5, and httpcore5-h2). This issue is now resolved. The component now uses these centralized libraries to align with internal standards and incorporate the latest fixes.
COMPX-22213: Missing centralized Bouncy Castle (org.bouncycastle) libraries
7.3.2
Previously, the cpx component did not use the centralized Bouncy Castle (org.bouncycastle) library versions defined in CDPD (bcprov-jdk18on, bcpkix-jdk18on, and bcutil-jdk18on updated from 1.78 to 1.78.1). This issue is now resolved. The component now uses these centralized libraries to align with internal dependency standards and incorporate the latest security and bug fixes.
COMPX-23423: Apache Commons Lang upgraded to 3.18.0
7.3.2
The Apache Commons Lang package is now upgraded to version 3.18.0 in Queue Manager.
CDPD-91280: Tez is unable to start on 7.3.2.0 FIPS clusters
7.3.2
Previously, Tez startup failed on 7.3.2.0 FIPS clusters, caused by a conflict between Cloudera Manager, which set hadoop.security.secret-manager.key-generator.algorithm to HmacSHA256, and Tez's older, hardcoded use of the HmacSHA1 algorithm. This issue is now resolved by upgrading Tez to version 0.10.5 and updating its secret managers to dynamically respect the algorithm configured by Hadoop at runtime. This change prevents the recurring DIGEST-MD5: digest response format violation errors.
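
The Cloudera Manager setting mentioned above corresponds to a Hadoop property in core-site.xml. An illustrative fragment, with the value as described in this fix:

```xml
<!-- Algorithm used by Hadoop secret managers to generate keys; on FIPS
     clusters this is set to HmacSHA256, which Tez 0.10.5 now respects
     at runtime instead of hardcoding HmacSHA1 -->
<property>
  <name>hadoop.security.secret-manager.key-generator.algorithm</name>
  <value>HmacSHA256</value>
</property>
```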
Fixed CVE-2025-48924
7.3.2
A few Hadoop API endpoints have been removed from Cloudera Runtime 7.3.2. This change is a result of the fix for CVE-2025-48924 and YARN's migration to Commons Lang 3. For more information, see https://docs.cloudera.com/cdp-private-cloud-base/7.3.2/private-release-notes/topics/rt-pvc-api-compat-changes-hadoop.html

Apache Jira: YARN-10772