Known Issues

You must be aware of the known issues and limitations, the areas of impact, and the workarounds in Cloudera Manager 7.13.2 and its cumulative hotfixes.

Known issues identified in Cloudera Manager 7.13.2

OPSAPS-77052: Ozone DataNode decommission command stuck for more than 4 hours
7.13.2
Ozone DataNode decommissioning can appear stuck in Cloudera Manager while the actual decommissioning is successful. This occurs due to a bug in the monitoring script, where a loose grep expression causes the script to wait indefinitely.
Administrators can manually monitor the DataNode decommission state using the Storage Container Manager (SCM) Web UI or CLI. Once all desired DataNodes are confirmed as decommissioned, the decommission command in Cloudera Manager can be safely aborted.
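The "loose grep expression" failure mode can be illustrated with a small sketch. The status lines below are made up (real `ozone admin datanode list` output differs); the point is that a pattern matching the substring DECOMMISSION also matches the terminal DECOMMISSIONED state, so a "nodes still decommissioning" count never reaches zero:

```shell
# Made-up status lines; real SCM CLI output differs.
states='dn1 DECOMMISSIONED
dn2 DECOMMISSIONED'

# Loose check: the substring DECOMMISSION also matches DECOMMISSIONED,
# so this count stays non-zero even after every node has finished.
loose=$(printf '%s\n' "$states" | grep -c 'DECOMMISSION')

# Stricter check: match the in-progress state as a whole word.
strict=$(printf '%s\n' "$states" | grep -cw 'DECOMMISSIONING' || true)

echo "loose=$loose strict=$strict"
```

With both nodes finished, the loose count is still 2 while the strict count is 0, which is why the monitoring script waits indefinitely.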
OPSAPS-76455: The Stop command failed on Ozone S3 Gateway service
7.13.2
When a restart is performed on the Ozone S3 Gateway, the java.lang.IllegalStateException: Singleton not set for STATIC_INSTANCE exception occurs. This exception originates from the JBoss Weld bootstrap process when the CDI registry fails to reinitialize the static singleton provider correctly during the restart cycle.
Avoid the restart operation. Instead, perform a manual stop followed by a start.
OPSAPS-76062: Ozone Replication Configuration Override
7.13.2
When the ozone.replication property is exposed in Cloudera Manager Ozone configurations and assigned a default value, it unintentionally overrides the bucket-level replication configuration as a client-side setting, even if the user does not intend to set client-side configurations.
None
OPSAPS-76845: Last Page button on the Bucket Browser tab is not functional
7.13.2
In Cloudera Manager UI, the Last Page button in the Bucket Browser tab does not function as expected. When users click the Last Page button, the UI refreshes the current page instead of navigating to the last page of buckets. The Next button continues to work as intended, allowing users to move forward one page at a time. This issue is particularly noticeable when there are a large number of buckets, as users must navigate page by page to reach the end.
There is no direct workaround to enable the Last Page button. However, to reduce the number of navigation steps, users can increase the number of buckets displayed per page in the UI settings. This adjustment allows more buckets to be viewed at once, minimizing the number of pages to navigate.
OPSAPS-74844: Service Monitor fails to connect to multiple clusters with distinct custom Kerberos principals
7.13.2
The Cloudera Manager Service Monitor does not support unique Kerberos principal configurations across multiple clusters. The following limitations apply to the Service Monitor:
  • You cannot apply different custom Kerberos settings to different clusters managed by the same Service Monitor.
  • You cannot connect to multiple clusters simultaneously if those clusters require distinct custom Kerberos principals.
None
OPSAPS-76330: Missing request/process context ID in Atlas logs after migration from log4j2 to logback using CM default pattern
7.13.2
After upgrading Apache Atlas logging from log4j2 to logback, and using the default logback pattern provided by Cloudera Manager, Atlas logs no longer consistently include the request/process context ID (for example, etp<timestamp>-<pid> - <uuid>). This results in a regression compared to earlier behavior with log4j2, where the context ID was consistently present for request-scoped operations. The missing context information makes troubleshooting and tracing individual requests more difficult.
None
OPSAPS-75366: The Knox Gateway gateway.log.gz file in the support bundle is corrupt
7.13.2

When you collect diagnostic data, the Knox Gateway gateway.log.gz file under logs/[***HOST NAME***]/ in the downloaded bundle might have 0-byte length. The file does not contain the Knox Gateway logs even when gateway.log on the host has content.

In the diagnostic bundle, open the service-diagnostics/[***CLUSTER NAME***]/[***KNOX SERVICE NAME***]/ folder. The Knox Gateway role diagnostics archive there includes the gateway logs (for example gateway.log and gateway-audit.log).

OPSAPS-75684: Spark fails due to Zookeeper Custom Kerberos Principal issue
Incorrect Zookeeper principal configuration and missing JVM property setup leads to SASL authentication failures.
When a custom Zookeeper principal is used, add the -Dzookeeper.sasl.client.username=[***USERNAME***] JVM argument to spark.*.defaultJavaOptions or spark.*.extraJavaOptions in spark-defaults.conf.
OPSAPS-75443: Hive Metastore Server fails to start after memory reallocation
7.13.2
After executing the /api/v57/hosts/reallocateMemory API, the Hive Metastore Server (HMS) might fail to start with a "Not enough space" memory error. This issue occurs even after the heap size is set to 8GB and typically appears following an Atlas Server memory failure. The HMS service remains in a stopped state because it cannot allocate the required memory resources.
None
OPSAPS-76683: Hive system database creation
7.13.2
Creation of the Hive system database blocks the upgrade process: the upgrade is interrupted or blocked while the Hive system database is being created.
None
OPSAPS-73421: Hive Metastore performance logging
7.13.2
The performance logger does not function as expected in the Hive Metastore. Performance logging (Perflogger) fails to record entries in the Hive Metastore (HMS) logs, even when the "Enable Performance Logging" flag is enabled in the Hive service configurations. The required logger is not correctly added to the loggers list in the logging properties.
None
OPSAPS-73237: Hive default heap sizes on Data Hubs
7.13.2
The default Java heap sizes for Hive Metastore (HMS) and HiveServer2 (HS2) are too large for certain Data Hub configurations. On Data Hub clusters, such as those with a 64 GB environment, the default Java heap size for Hive Metastore and HiveServer2 is automatically configured to 16 GB. This high allocation can lead to memory overcommitment and leave insufficient memory for other cluster processes.
You can manually reduce the Java heap size for Hive services to 8 GB or lower in the Cloudera Manager configuration.
OPSAPS-75673: Wrong enablement of Ranger RMS Database Full Sync command
7.1.8, 7.1.9, 7.1.9 SP1, 7.2.18, 7.3.1, 7.3.2.0
The Ranger RMS Database Full Sync command should be enabled only when all RMS server instances are stopped. This is required to ensure that the RMS database synchronizes correctly without introducing conflicts or data corruption. However, when HA (High Availability) is enabled on the cluster, the command becomes available from Cloudera Manager > Ranger RMS > Actions drop-down, even though only one Ranger RMS instance is stopped while the others are still running.
None.
OPSAPS-73684: Service startup failures during High Availability deployment
7.13.2
During High Availability (HA) cluster deployments, Impala services can fail to start due to dependent services not having fully started.
Retrying the HA deployment might succeed in some cases.
OPSAPS-76959: Stale alternatives temporary files cause client configuration deployment failure
7.13.2

An interrupted Cloudera Manager upgrade or Agent restart can leave stale temporary files in the /var/lib/alternatives/ directory. These leftover files prevent subsequent Deploy Client Configuration tasks from completing, as the update-alternatives command fails when it encounters an existing .new state file.

During a Cloudera Manager upgrade or configuration activation, a race condition or a forced service restart (such as a SIGTERM/Exit Code -15) can interrupt the update-alternatives process. This interruption leaves behind a stale temporary file, typically named /var/lib/alternatives/<alt_name>.new.

When you later attempt to Deploy Client Configuration, the Agent tries to run update-alternatives --install. The command fails because the operating system detects the pre-existing .new file and returns a non-zero exit code (usually Exit Code 2). Cloudera Manager then reports a command failure similar to: "client configuration ... exited with 2 and expected 0"

This issue is typically host-specific rather than cluster-wide.

If a deployment fails due to a stale alternatives state, manually clear the temporary files on the affected host and retry the deployment from the Cloudera Manager UI.

  1. Verify no active alternatives processes: SSH into the affected host and ensure no other instances of the alternatives tool are currently running:
    pgrep -af 'update-alternatives|alternatives'

    If the command returns no active processes, proceed to Step 2.

  2. Identify the stale temporary files: List the temporary files to confirm which entries are blocked:
    ls -l /var/lib/alternatives/*.new
    To check a specific service (for example, HDFS or Hive), use:
    ls -l /var/lib/alternatives/<alt_name>*
  3. Back up and remove the stale .new files: Create a backup of the stale file in a temporary directory before deleting it:
    sudo cp -a /var/lib/alternatives/<alt_name>.new /var/tmp/<alt_name>.new.$(date +%s).bak
    sudo rm -f /var/lib/alternatives/<alt_name>.new
  4. Verify the alternatives state: Confirm the current status of the link:
    /usr/sbin/update-alternatives --display <alt_name>
  5. Retry the operation: Return to the Cloudera Manager UI and re-run Deploy Client Configuration for the affected service.
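Steps 2 and 3 can be wrapped in a small helper. This is a sketch, not Cloudera tooling: the function name is hypothetical, and the directory arguments default to the paths above but are parameterized so you can rehearse the logic on a scratch directory first.

```shell
# Hypothetical helper covering steps 2-3: back up each stale *.new file to a
# temporary directory, then remove it. Run only after the step 1 check
# confirms no alternatives process is active.
clean_stale_alternatives() {
  alt_dir="${1:-/var/lib/alternatives}"
  backup_dir="${2:-/var/tmp}"
  for f in "$alt_dir"/*.new; do
    [ -e "$f" ] || continue   # glob did not match: nothing stale
    cp -a "$f" "$backup_dir/$(basename "$f").$(date +%s).bak"
    rm -f "$f"
    echo "removed $f"
  done
}
```

On a real host, run it as root with no arguments so the defaults point at /var/lib/alternatives and /var/tmp.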

OPSAPS-76960: Missing symbolic links due to Agent restart race condition during Cloudera Manager upgrade
7.13.2

During a Cloudera Manager upgrade, a race condition might prevent the creation of critical parcel symbolic links (symlinks). If the cloudera-scm-agent restarts while an update-alternatives process is active, the system terminates the task (Exit Code -15). This leaves symlinks unconfigured and causes dependent services, such as Solr, to fail.

When you upgrade Cloudera Manager, the cloudera-manager-agent package update triggers the cloudera-scm-agent service to execute parcel activation tasks. If an automated script or a user issues a systemctl restart cloudera-scm-agent command while update-alternatives is running, the operating system sends a SIGTERM (Signal 15) to the Agent and its child processes.

This forceful termination interrupts the creation of essential symbolic links, including:

  • /var/lib/hadoop-hdfs/ozone-filesystem-hadoop3.jar

  • /etc/alternatives/ozone-filesystem-hadoop3.jar

The Agent logs this failure as Exit Code: -15 in /var/log/cloudera-scm-agent/cloudera-scm-agent.log. Because these links are missing, dependent services cannot locate necessary libraries and fail to start.

If services fail to start after an upgrade due to missing alternatives or symbolic links (symlinks), manually complete the interrupted activation steps on the affected host or use the Cloudera Manager UI to reconcile the state.

Option 1: Manual fix through CLI
  1. Verify the missing symlink: SSH into the affected host machine and check the status of the failing JAR or symlink. For example:

    /usr/sbin/update-alternatives --display ozone-filesystem-hadoop3.jar

    If the output shows the link is missing or broken, proceed to Step 2.

  2. Manually run the interrupted command:

    Search the Agent log at /var/log/cloudera-scm-agent/cloudera-scm-agent.log for Exit Code: -15 to locate the failed update-alternatives command immediately preceding that error. Copy the full path to the parcel library from that log entry.

    For the ozone-filesystem-hadoop3.jar failure, run the following installation command manually as the root user:

    sudo /usr/sbin/update-alternatives --install /var/lib/hadoop-hdfs/ozone-filesystem-hadoop3.jar ozone-filesystem-hadoop3.jar /opt/cloudera/parcels/<CDH-VERSION-PATH>/lib/hadoop-ozone/ozone-filesystem-hadoop3.jar 5
  3. Verify the fix: Run the display command again to ensure the link now correctly points to the new parcel directory:
    /usr/sbin/update-alternatives --display ozone-filesystem-hadoop3.jar
  4. Restart Services: After you verify the fix, restart the failing service (such as Solr) through the Cloudera Manager UI.
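The log search in step 2 of Option 1 can be scripted. A hedged sketch (the function name is hypothetical, and the exact Agent log line format can vary between releases): it prints the update-alternatives line that most recently precedes each Exit Code: -15 entry.

```shell
# Hypothetical helper: scan an Agent log (or a copy of it) and print the
# update-alternatives command preceding each "Exit Code: -15" entry.
find_interrupted_alternatives() {
  awk '/update-alternatives/ { prev = $0; next }
       /Exit Code: -15/ && prev != "" { print prev; prev = "" }' "$1"
}
```

Example invocation on a host: `find_interrupted_alternatives /var/log/cloudera-scm-agent/cloudera-scm-agent.log`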

Option 2: Cloudera Manager UI (Automatic Fix)

Instead of manual CLI intervention, you can force Cloudera Manager to recreate all symbolic links:

  1. Navigate to the Hosts > Parcels page in the Cloudera Manager UI.

  2. Select the affected parcel and click Activate again. This process identifies and recreates any missing symbolic links across all hosts in the cluster.

CDPD-99248: Ozone upgrade finalization might fail in Cloudera Manager
7.13.2

When finalizing an Ozone upgrade for the first time through Cloudera Manager, the finalization command might report a failure in the standard error (stderr) log, even though the process continues to run on the Storage Container Manager (SCM).

During the initial Ozone upgrade finalization, Cloudera Manager might return the following error message:
Invalid response from Storage Container Manager. 
Current finalization status is: FINALIZATION_IN_PROGRESS

This error occurs because Cloudera Manager fails to parse the interim status response from the SCM. Despite the "failed" status in the Cloudera Manager interface, the SCM continues the finalization process in the background.

If you encounter this error, do not restart the finalization command. Instead, manually verify the progress directly on the SCM by following these steps:

  1. Check the Status: Run the following command from the command line to monitor the actual SCM finalization state:
    ozone admin scm finalizationstatus
  2. Wait for Completion: Monitor the output until it indicates that finalization is complete.

  3. Verify in Cloudera Manager: Once the CLI command confirms a successful finalization, you can safely ignore the previous failure message in Cloudera Manager and proceed with your post-upgrade tasks.
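The wait in step 2 can be sketched as a polling loop. The terminal status strings (FINALIZATION_DONE, ALREADY_FINALIZED) are assumptions based on common Ozone finalization states; verify them against the actual output of ozone admin scm finalizationstatus on your release.

```shell
# Assumed terminal states; confirm against your Ozone release before use.
finalization_complete() {
  case "$1" in
    *FINALIZATION_DONE*|*ALREADY_FINALIZED*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a cluster host, the wait loop would look like (not run here):
#   until finalization_complete "$(ozone admin scm finalizationstatus)"; do
#     sleep 30
#   done
```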

OPSAPS-76363: Knox gateway database connection properties are not populated automatically when Oracle is the cluster database
7.13.2

When you run the Knox gateway in high availability (HA) and use JWT token features, only MySQL and PostgreSQL are supported for the Knox gateway token database; Oracle is not supported. Cloudera Manager populates the Knox gateway database connection properties automatically only when you use MySQL or PostgreSQL as the Knox gateway database. If you use Oracle instead, these properties are not populated automatically.

Without those settings, JWT tokens might not behave consistently across Knox gateway instances.

Use MySQL or PostgreSQL for the Knox gateway database when you run Knox in HA with JWT token features.

OPSAPS-76116: Knox gateway database configuration might not be retained after upgrading a Cloudera Base on premises cluster
7.13.2

Knox database configuration properties knox_gateway_database_name, knox_gateway_database_host, knox_gateway_database_user, and knox_gateway_database_password might not be retained after you upgrade a Cloudera Base on premises cluster.

None
OPSAPS-76528: Ozone services enforce IPv4 in DUAL_STACK configuration
7.13.2
In Cloudera Base on premises 7.3.2.0, Ozone is expected to operate correctly in a DUAL_STACK environment, but you cannot control whether Ozone services communicate in IPv4-only mode or in dual-stack mode.
None
OPSAPS-75735: Systemd disables Cgroup v2 Controllers written by Cloudera Manager Agent

When you enable Cgroup v2, Cloudera Manager services might fail to start because specific controllers (such as cpu, memory, or pids) are missing or not enabled at the root.

Although the Cloudera Manager Agent writes controllers to the root subtree during startup, systemd (the cgroup manager for most modern Linux distributions) frequently disables controllers from the delegation tree if a service does not actively use them. On specific Linux distributions, systemd's strict enforcement of this cleanup prevents the Cloudera Manager Agent from maintaining the necessary environment for service sub-processes.

If you encounter an error stating that a controller is "missing" or "not enabled at root," follow these steps to restore and persist the controllers.

  1. Restart the Cloudera Manager Agent on the affected hosts to force it to rewrite the controllers to the root subtree.
    sudo systemctl restart cloudera-scm-agent

    Try starting the affected services again. If the issue persists, proceed to the next step.

  2. Configure Systemd Delegation by performing the following steps:

    Explicitly instruct systemd to delegate cgroup controllers to the Cloudera Manager Agent process. This ensures the controllers remain available at the root level regardless of systemd's cleanup policies.

    1. Open the unit file: Use a text editor to open the Cloudera Manager Agent service unit file: /usr/lib/systemd/system/cloudera-scm-agent.service
    2. Add the Delegate parameter: Locate the [Service] section. Add Delegate=yes to the configuration.
      Before:
      [Service]
      Type=simple
      
      TasksMax=infinity
      After:
      [Service]
      Type=simple
      Delegate=yes
      TasksMax=infinity
  3. Save the changes to the file.
  4. Reload and restart: Run the following commands in order to apply the new configuration:
    sudo systemctl daemon-reload
    sudo systemctl restart cloudera-scm-agent
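To confirm whether the required controllers are present after the restart, you can inspect the root subtree_control file. A sketch (the helper name is hypothetical; the file path is a parameter so the logic can be exercised against a copy):

```shell
# Check that the cpu, memory, and pids controllers appear in a cgroup v2
# subtree_control file. On a live host, pass
# /sys/fs/cgroup/cgroup.subtree_control as the argument.
check_cgroup_controllers() {
  enabled=" $(cat "$1") "
  missing=""
  for c in cpu memory pids; do
    case "$enabled" in
      *" $c "*) ;;
      *) missing="$missing $c" ;;
    esac
  done
  if [ -n "$missing" ]; then
    echo "missing controllers:$missing"
    return 1
  fi
  echo "all controllers enabled"
}
```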
OPSAPS-76314: Cloudera Management Service restart fails on large clusters due to Cloudera Manager Descriptor Fetch Timeout
On large-scale deployments, the Cloudera Management Service might fail to start or restart correctly with the following error:
2026-01-03 04:43:05,600 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
2026-01-03 04:43:16,074 WARN com.cloudera.cmf.BasicScmProxy: Timed out while fetching the SCM descriptor. This can happen on large clusters. Timeout can be increased by configuring Descriptor Fetch Timeout under Administration > Settings.
2026-01-03 04:43:16,075 WARN com.cloudera.cmf.eventcatcher.server.EventCatcherService: No descriptor fetched from https://ip-10-129-36-226.iopscloud.cloudera.com:7183 on after 1 tries, sleeping for 2 secs.

This occurs because the Cloudera Manager Descriptor Fetch Timeout defaults to 10 seconds, which is often insufficient for the Cloudera Manager Server to generate and transmit the full cluster descriptor to Cloudera Management Service roles like the Event Server or Host Monitor in high-scale environments.

Reaching this timeout causes the service to log a warning and fail initialization. Consequently, the Event Catcher enters a loop, unable to retrieve the necessary configuration.

If you encounter service startup failures on large clusters, manually increase the fetch timeout through the Cloudera Manager Admin Console:

  1. Log in to the Cloudera Manager Admin Console.

  2. Navigate to Administration > Settings.

  3. Search for the parameter: Cloudera Manager Descriptor Fetch Timeout.

  4. Increase the value from the default 10 seconds to 60 seconds.

  5. Click Save Changes.

  6. Restart the Cloudera Management Service.

OPSAPS-75899: HDFS directory creation fails on JDK 11 or higher in LDAP or Active Directory integrated clusters that use the hadoop.security.group.mapping property
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500, and 7.13.1.600
Due to this issue, many critical Cloudera Manager operations fail to complete, such as:
  • Install Oozie ShareLib (Oozie > Actions > Install Oozie ShareLib)
  • Install YARN MapReduce Framework JARs (YARN > Actions > Install YARN MapReduce Framework JARs)
You must perform the following workaround steps to manually add specific Java modules to the HDFS service script on all nodes in the cluster:
  1. Navigate to the directory /opt/cloudera/cm-agent/service/hdfs/.

  2. Open the hdfs.sh file for editing.

  3. Locate the JAVA17_ADDITIONAL_JVM_ARGS variable.
  4. Append the flags --add-exports=java.naming/com.sun.jndi.ldap=ALL-UNNAMED and --add-opens=java.naming/com.sun.jndi.ldap=ALL-UNNAMED to the end of the existing JAVA17_ADDITIONAL_JVM_ARGS list:
    Change this:
    JAVA17_ADDITIONAL_JVM_ARGS="--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-exports=java.base/sun.net.dns=ALL-UNNAMED --add-exports=java.base/sun.net.util=ALL-UNNAMED"
    To this:
    JAVA17_ADDITIONAL_JVM_ARGS="--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-exports=java.base/sun.net.dns=ALL-UNNAMED --add-exports=java.base/sun.net.util=ALL-UNNAMED --add-exports=java.naming/com.sun.jndi.ldap=ALL-UNNAMED --add-opens=java.naming/com.sun.jndi.ldap=ALL-UNNAMED"
OPSAPS-74066, OPSAPS-74547: DataHub high memory consumption on Hiveserver load for JDK 17
7.13.2

In upgraded DataHub deployments, HiveServer might fail to start due to memory overallocation. This occurs because Cloudera Manager does not account for memory already assigned to Management Service roles when allocating memory for cluster roles. This issue is fixed in fresh installations of Cloudera Manager 7.13.1.500. The updated algorithm now correctly reallocates memory across all roles during cluster setup.

To resolve this issue, use the following API to manually trigger the Cloudera Manager memory allocation algorithm on the host where both HiveServer and management roles are running, and then restart the cluster to apply the updated memory configuration:
API Endpoint: POST /api/v57/hosts/reallocateMemory

Include the host name (the host where HiveServer and management roles run) in the API request body. This ensures that memory assignments are recomputed correctly, taking all roles on the host into account.
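A hedged sketch of the call: the endpoint comes from this issue, but the host names, port, credentials, and the hostNames request-body field are assumptions; confirm the body schema in the Cloudera Manager REST API reference before using it.

```shell
# Hypothetical values; replace with your Cloudera Manager host and the host
# running HiveServer2 plus the management roles.
cm_host="cm.example.com"
target_host="hs2-host.example.com"

reallocate_url="https://${cm_host}:7183/api/v57/hosts/reallocateMemory"
echo "$reallocate_url"

# The request itself (not run here); the JSON body field name is an assumption:
#   curl -u admin:'***' -X POST -H 'Content-Type: application/json' \
#        -d "{\"hostNames\": [\"${target_host}\"]}" "$reallocate_url"
```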

OPSAPS-74668: ozone.snapshot.deep.cleaning.enabled and ozone.snapshot.ordered.deletion.enabled configs are missing with Cloudera 7.1.9 SP1 CHF and Cloudera Manager 7.13.1
7.13.2
Two Ozone Manager configs are missing while using Cloudera 7.1.9 SP1 CHF after upgrading Cloudera Manager version from 7.11.3 to 7.13.1.400.
If you are using Cloudera 7.1.9 SP1 CHF, before upgrading Cloudera Manager version from 7.11.3 to 7.13.1.400, add the following configs to Ozone Manager Advanced Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml so that Ozone Manager does not miss important config after the Cloudera Manager upgrade:
<property>
<name>ozone.snapshot.deep.cleaning.enabled</name>
<value>false</value>
</property>

<property>
<name>ozone.snapshot.ordered.deletion.enabled</name>
<value>true</value>
</property>
OPSAPS-73038: False-positive port conflict error message displayed in Cloudera Manager
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500
Cloudera Manager might display a false-positive error message: Port conflict detected: 8443 (Gateway Health HTTP Port) is also used by: Knox Gateway during cluster installations. The warning does not cause actual installation failures.
None
OPSAPS-74950: Ozone replication policies fail for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1.400
7.13.1.500, 7.13.2.0
Ozone replication policies for Ozone linked buckets fail when the Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters use Cloudera Manager 7.13.1.400.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500.
OPSAPS-72439: HDFS and Hive external tables replication policies fail when using custom “krb5.conf” files for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500, 7.13.2.0
The issue appears when the custom krb5.conf file is not propagated to the required files, and you are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with Cloudera Manager 7.13.1.400.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500, and complete the instructions in step 13 in Using a custom Kerberos configuration path.
OPSAPS-71459: Commands continue to run after Cloudera Manager restart
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with Cloudera Manager 7.13.1.400, remote replication commands continue to run endlessly even after a Cloudera Manager restart operation.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500.
OPSAPS-73158, OPSAPS-74206: HDFS replication policies fail when the policies prefetch the expired Kerberos ticket from the 'sourceTicketCache' file for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with Cloudera Manager 7.13.1.400, Replication Manager pre-fetches the Kerberos ticket from the sourceTicketCache file for the replication policies. Issues appear when the file contains an expired Kerberos ticket.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500.
OPSAPS-73405, OPSAPS-71565, OPSAPS-72860, OPSAPS-72859: Replication policies fail even after the source or target cluster becomes available after it recovers from temporary node failures for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with Cloudera Manager 7.13.1.400, Hive replication policies and HBase replication policies fail even after the source or target cluster recovers from a temporary node failure.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500.
OPSAPS-73655, OPSAPS-73737: Cloud replication fails even after the delegation token is issued for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with Cloudera Manager 7.13.1.400, the replication policies fail during an incremental replication run if you chose the Advanced Setting > Delete Policy > Delete permanently option during the replication policy creation process.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500.
OPSAPS-74040, OPSAPS-74058: Ozone OBS replication fails due to pre-filelisting check failure for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with Cloudera Manager 7.13.1.400 and the source bucket is a linked bucket, then the replication fails during the Run Pre-Filelisting Check step for OBS-to-OBS Ozone replication, and the error message Source bucket is a linked bucket, however the bucket it points to is also a link appears. This issue appears even when the source bucket is directly linked to a regular, non-linked bucket.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500.
OPSAPS-73602, OPSAPS-74353: HDFS replication policies to cloud fails with HTTP 400 error for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with Cloudera Manager 7.13.1.400, the HDFS replication policies to cloud fail after you edit the replication policies in the Cloudera Manager > Replication Manager UI.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500.
OPSAPS-73645, OPSAPS-73847: Ozone bucket browser does not show the volume buckets for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with Cloudera Manager 7.13.1.400, the volume buckets do not appear when the number of volumes exceeds 26, after you click Next Page on the Cloudera Manager > Clusters > Ozone service > Bucket Browser page and then click a volume name.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters with Cloudera Manager 7.13.1.500.
RELENG-27000: Proper link for bigtop-detect-javahome is missing when using CDP Private Cloud Base 7.1.9 SP1 CHF5 with Cloudera Manager 7.13.1 CHF3.
Using Cloudera Manager 7.13.1 CHF3 with CDP Private Cloud Base 7.1.9 SP1 CHF5 results in an incorrect bigtop-detect-javahome link.
Create a link under /opt/cloudera/parcels/CDH/bin/bigtop-detect-javahome that points to /opt/cloudera/parcels/CDH/lib/bigtop-utils/bigtop-detect-javahome. For example:
ln -s /opt/cloudera/parcels/CDH/lib/bigtop-utils/bigtop-detect-javahome /opt/cloudera/parcels/CDH/bin/bigtop-detect-javahome
CDPD-79725: Hive fails to start after Datahub restart due to high memory usage
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500

After restarting the Cloudera Data Hub, the services appear to be down in the Cloudera Manager UI. The Cloudera Management Console reports a node failure error for the master node.

The issue is caused by high memory usage due to the G1 garbage collector on Java 17, leading to insufficient memory issues and thereby moving the Cloudera clusters to an error state.

Starting with Cloudera 7.3.1.0, Java 17 is the default runtime instead of Java 8, and its memory management increases memory usage, potentially affecting system performance. Clusters might report error states, and logs might show insufficient memory exceptions.

To mitigate this issue and prevent startup failures after a Datahub restart, you can perform either of the following actions, or both:

  • Reduce the Java heap size for affected services to prevent nodes from exceeding the available memory.
  • Increase physical memory for the cloud or on-premises instances running the affected services.
OPSAPS-74370: Knox's Save Alias - IDBroker command fails due to missing variable declaration
7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
Users trying to create IDBroker aliases through the Cloudera Manager UI face issues in Cloudera Manager 7.13.1 using CDP 7.1.9.
The alias(es) can be created using the Knox CLI:
  1. ssh to Knox host.
  2. export KNOX_GATEWAY_DATA_DIR="/var/lib/knox/idbroker/data"; export KNOX_GATEWAY_CONF_DIR="/var/lib/knox/idbroker/conf"
  3. /opt/cloudera/parcels/CDH/lib/knox/bin/knoxcli.sh create-alias <ALIAS_NAME> --cluster <CLUSTER_NAME> --value <ALIAS_VALUE>
  4. Verify the addition using /opt/cloudera/parcels/CDH/lib/knox/bin/knoxcli.sh list-alias --cluster <CLUSTER_NAME>

For HA deployments, users must perform these steps on every Knox host (whereas the Save Alias command applies the change to all hosts automatically).

OPSAPS-71669: The Continue option is disabled on the Static Service Pools Review page, affecting the functionality of Static Service Pools
7.13.1
7.13.1.100

The minimum and maximum I/O weight values for Cgroup v2 were incorrectly set to 100 and 1000, respectively, in Cloudera Manager 7.13.1.0. According to official Cgroup v2 documentation, the valid range should be 1 to 10,000. Due to this incorrect configuration range, the Continue option on the Static Service Pools Review page was disabled, preventing users from proceeding with pool configuration.

This issue might occur on clusters running Cloudera Manager 7.13.1.0 with Cgroup v2 resource management when configuring or reviewing Static Service Pools. After upgrading to Cloudera Manager 7.13.1.100 CHF-1, this issue no longer occurs.

None
OPSAPS-75290, OPSAPS-74994: The yarn_enable_container_usage_aggregation job is failing with “Null real user” error on Service Monitor.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, and 7.13.1.500
The yarn_enable_container_usage_aggregation job fails with a "Null real user" error on Service Monitor when the YARN service is running on a compute cluster with Stub DFS, and when the PowerScale service is running in the cluster with the PowerScale DFS provider instead of HDFS.
None.
OPSAPS-71581: Cloudera Manager Agent's append_properties function fails with the realpath: invalid option -- 'u' error when executed from service control scripts.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, and 7.13.1.500
Errors appear on the standard error (stderr) log of Cloudera Data Platform (CDP) services when you are attempting to trigger the cloudera-config.sh script. The error log contains the following message: realpath: invalid option -- 'u'. This is caused by an incorrectly placed command-line flag in the script, which prevents some service configurations from loading correctly.
To resolve this issue temporarily, you must perform the following workaround steps on each agent node in the base cluster:
  1. Navigate to the directory /opt/cloudera/cm-agent/service/common/.

  2. Open the cloudera-config.sh file for editing.

  3. Locate the two lines that execute the python scripts such as append_properties.py and get_property.py.

  4. In both lines, either remove the -u flag or move it from its position after python to the end of the line:
    Change this:
    value=$(python -u "${GET_PROPERTY_PY_DIR}"/get_property.py "${1}" "${2}")
    To this:
    value=$(python "${GET_PROPERTY_PY_DIR}"/get_property.py "${1}" "${2}" -u)
    Change this:
    python -u "${APPEND_PROPERTIES_PY_DIR}"/append_properties.py "${1}" "${2}"
    To this:
    python "${APPEND_PROPERTIES_PY_DIR}"/append_properties.py "${1}" "${2}" -u
  5. After saving the changes on all agent nodes, restart the entire cluster for the new configuration to take effect.

  6. Verify the fix by checking the stderr.log on a few service instances to ensure the realpath: invalid option -- 'u' error no longer appears.
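The "remove the -u flag" variant of step 4 can be scripted with sed so the same edit is applied consistently on every agent node. This is a sketch, not an official utility: the file path comes from step 1, and you should diff the result against the backup before restarting the cluster.

```shell
#!/bin/sh
# Sketch: remove the misplaced -u flag that follows "python" in cloudera-config.sh.
# Keeps a backup copy next to the original so the edit can be reverted.
strip_python_u_flag() {
  cfg="$1"
  cp "$cfg" "$cfg.bak"
  sed -i 's/python -u /python /' "$cfg"
}

# On an agent node (as root):
#   strip_python_u_flag /opt/cloudera/cm-agent/service/common/cloudera-config.sh
#   diff /opt/cloudera/cm-agent/service/common/cloudera-config.sh.bak \
#        /opt/cloudera/cm-agent/service/common/cloudera-config.sh
```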

OPSAPS-71878: Ozone fails to restart during cluster restart and displays the error message: Service has only 0 Storage Container Manager roles running instead of minimum required 1.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, and 7.13.1.500
  1. Open Cloudera Manager in a second browser window and restart the Ozone service separately.
  2. After the Ozone service restarts, resume the cluster restart from the first browser window.
ENGESC-30503, OPSAPS-74868: Cloudera Manager limited support for custom external repository requiring basic authentication
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, and 7.13.1.400
Cloudera Manager does not currently support a custom external repository with basic authentication (the Cloudera Manager wizard supports either HTTP (non-secured) repositories or the Cloudera https://archive.cloudera.com repository only). Customers who want to use a custom external repository with basic authentication might get errors.

The assumption is that you can access the external custom repository (such as Nexus, JFrog, or others) using LDAP credentials. If an applicative user is used to fetch the external content (as is done in Data Services with the Docker image repository), ensure that this applicative user is located under the user base search path from which real users are retrieved during the LDAP authentication check, so that the external repository can find it and allow it to fetch the files.

Once done, you can use the existing custom URL fields in the Cloudera Manager wizard and enter the URL for the RPMs, parcels, or other files in the format https://USERNAME:PASSWORD@server.example.com/XX.

For the password, use only characters from the printable ASCII range (excluding space). A special character (one that is not a letter or number) can be replaced with its HEX value. For example, you can replace Aa1234$ with Aa1234%24, as '%24' is translated into the $ sign.
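The percent-encoding can be produced with Python's standard urllib module (python3 is generally available on Cloudera Manager hosts); this is an illustrative one-liner, not a Cloudera tool:

```shell
# Percent-encode a repository password for embedding in the URL.
# safe="" forces every reserved character (including $) to be encoded.
python3 -c 'from urllib.parse import quote; print(quote("Aa1234$", safe=""))'
# prints: Aa1234%24
```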

OPSAPS-72164: Proxy Settings and Telemetry Publisher in Cloudera Manager
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, and 7.13.1.400
In Cloudera Manager 7.13.1, the PROXY settings for the Telemetry Publisher (TP) are not functioning as expected. This may impact the Telemetry Publisher's ability to communicate through a configured proxy.
You must upgrade to Cloudera Manager 7.13.1 CHF5 (7.13.1.500) or higher.
OPSAPS-60726: Newly saved parcel URLs are not showing up in the parcels page in the Cloudera Manager HA cluster.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
To safely manage parcels in a Cloudera Manager HA environment, follow these steps:
  1. Shut down the Passive Cloudera Manager Server.
  2. Add and manage the parcel as usual, as described in Install Parcels.
  3. Restart the Passive Cloudera Manager server after parcel operations are complete.
OPSAPS-74341: NodeManagers might fail to start during the cluster restart after the Cloudera Manager 7.13.1.x upgrade
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500

Cgroup v2 support is enabled in CDP 7.1.9 SP1 CHF5 and higher versions. However, if the user upgrades from Cloudera Manager 7.11.3.x to Cloudera Manager 7.13.1.x, and the environment is using cgroup v2, the NodeManagers might fail to start during the cluster restart after the Cloudera Manager 7.13.1.x upgrade.

To resolve this issue temporarily, you must perform the following steps:
  1. Go to the YARN service page on the Cloudera Manager UI.

  2. Navigate to the Configuration tab.

  3. Search for NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml.

  4. Add the following entry:
    yarn.nodemanager.linux-container-executor.cgroups.v2.enabled=true

  5. Restart the NodeManagers. The NodeManagers now restart successfully.
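If the safety valve field expects the standard yarn-site.xml XML property format rather than a key=value line, the entry from step 4 would look like the following (same property name and value; only the XML wrapper is added):

```xml
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.v2.enabled</name>
  <value>true</value>
</property>
```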

OPSAPS-73546: Service Monitor fails to perform Canary tests on HMS / HBASE / ZooKeeper due to missing dependencies
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400

Due to a missing dependency caused by an incomplete build and packaging in certain OS releases, the HMS (Hive Metastore) Canary health test fails, logging a ClassNotFoundException in the Service Monitor log. This problem relates to all deliveries using runtime cluster version 7.1.x or 7.2.x, while the Cloudera Manager version is 7.13.1.x and the OS is NOT RHEL8.

If your OS is RHEL 9, SLES 15, Ubuntu 20.04, or Ubuntu 22.04 and you install a Cloudera Manager 7.13.1.x version, create a symbolic link, using root user privileges, on the node that hosts the Service Monitor service (cloudera-scm-firehose) at /opt/cloudera/cm/lib/cdh71/cdh71-hive-client-7.13.1-shaded.jar, pointing to /opt/cloudera/cm/lib/cdh7/cdh7-hive-client-7.13.1-shaded.jar.

Restart the Service Monitor service after the change. This allows the Service Monitor to perform Canary testing correctly on the HMS (Hive Metastore) service.
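The symlink step can be sketched as a small shell function (run as root on the Service Monitor host; the jar paths are the ones given above and should be verified against your installation):

```shell
#!/bin/sh
# Sketch: point the missing cdh71 shaded jar at the existing cdh7 jar.
link_hive_client_jar() {
  target="$1"   # existing jar, e.g. .../cdh7/cdh7-hive-client-7.13.1-shaded.jar
  link="$2"     # expected path, e.g. .../cdh71/cdh71-hive-client-7.13.1-shaded.jar
  mkdir -p "$(dirname "$link")"
  ln -sf "$target" "$link"
}

# On the Service Monitor host (as root), then restart Service Monitor:
#   link_hive_client_jar /opt/cloudera/cm/lib/cdh7/cdh7-hive-client-7.13.1-shaded.jar \
#                        /opt/cloudera/cm/lib/cdh71/cdh71-hive-client-7.13.1-shaded.jar
```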

OPSAPS-72706, OPSAPS-73188: Hive queries fail after upgrading Cloudera Manager from 7.11.2 to 7.11.3 or later
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500
Upgrading Cloudera Manager from version 7.11.2 or earlier to 7.11.3 or later causes Hive queries to fail due to JDK17 restrictions. Some JDK8 options are deprecated, leading to inaccessible classes and exceptions:
java.lang.reflect.InaccessibleObjectException: Unable to make field private volatile java.lang.String java.net.URI.string accessible
To resolve this issue:
  1. In Cloudera Manager, go to Tez > Configuration
  2. Append the following values to both tez.am.launch.cmd-opts and tez.task.launch.cmd-opts:
    
    --add-opens=java.base/java.net=ALL-UNNAMED
    --add-opens=java.base/java.util=ALL-UNNAMED
    --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
    --add-opens=java.base/java.util.regex=ALL-UNNAMED
    --add-opens=java.base/java.lang=ALL-UNNAMED
    --add-opens=java.base/java.time=ALL-UNNAMED
    --add-opens=java.base/java.io=ALL-UNNAMED
    --add-opens=java.base/java.nio=ALL-UNNAMED
  3. Save and restart
OPSAPS-72998: Missing charts for HMS event APIs
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
Charts for HMS event APIs (get_next_notification, get_current_notificationEventId, and fire_listener_event) are missing in Cloudera Manager > Hive > Hive Metastore Instance > Charts Library > API
Monitor HMS event activity using Hive Metastore logs.
OPSAPS-72270: Start ECS command fails on uncordon nodes step
7.13.1, 7.13.1.100, 7.13.1.200
7.13.1.300

In an ECS HA cluster, the server node restarts during startup. This may cause the uncordon step to fail.

To resolve this issue temporarily, you must perform the following steps:
  1. Run the following command on the same node to verify whether the kube-apiserver is ready:
    kubectl get pods -n kube-system | grep kube-apiserver
  2. Resume the command from the Cloudera Manager UI.
OPSAPS-73225: Cloudera Manager Agent reporting inactive/failed processes in Heartbeat request
7.13.1, 7.13.1.100, 7.13.1.200
7.13.1.300

As part of introducing Cloudera Manager 7.13.x, changes were made to Cloudera Manager logging that cause the Cloudera Manager Agent to report inactive/stale processes in Heartbeat requests.

As a result, the Cloudera Manager server logs fill rapidly with these notifications, although they have no impact on the service.

In addition, with the added support for the Cloudera Observability feature, some additional messages were added to the server logging. However, if the customer did not purchase the Cloudera Observability feature, or if telemetry monitoring is not being used, these messages (which appear as "TELEMETRY_ALTUS_ACCOUNT is not configured for Otelcol") fill the server logs and prevent proper follow-up on server activities.

This will be fixed in a later release by moving these log notifications to DEBUG level so they don't appear on the Cloudera Manager server logs. Until that fix, perform the following workaround to filter out these messages.

On each of the Cloudera Manager servers, edit the /etc/cloudera-scm-server/log4j.properties file with root credentials and add the following lines at the end of the file:
# === Custom Appender with Filters ===
log4j.appender.filteredlog=org.apache.log4j.ConsoleAppender
log4j.appender.filteredlog.layout=org.apache.log4j.PatternLayout
log4j.appender.filteredlog.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# === Filter #1: Drop warning ===
log4j.appender.filteredlog.filter.1=org.apache.log4j.varia.StringMatchFilter
log4j.appender.filteredlog.filter.1.StringToMatch=Received Process Heartbeat for unknown (or duplicate) process.
log4j.appender.filteredlog.filter.1.AcceptOnMatch=false
# === Filter #2: Drop telemetry config warning ===
log4j.appender.filteredlog.filter.2=org.apache.log4j.varia.StringMatchFilter
log4j.appender.filteredlog.filter.2.StringToMatch=TELEMETRY_ALTUS_ACCOUNT is not configured for Otelcol
log4j.appender.filteredlog.filter.2.AcceptOnMatch=false
# === Accept all other messages ===
log4j.appender.filteredlog.filter.3=org.apache.log4j.varia.AcceptAllFilter
# === Specific logger for AgentProtocolImpl ===
log4j.logger.com.cloudera.server.cmf.AgentProtocolImpl=WARN, filteredlog
log4j.additivity.com.cloudera.server.cmf.AgentProtocolImpl=false
# === Specific logger for BaseMonitorConfigsEvaluator === 
log4j.logger.com.cloudera.cmf.service.config.BaseMonitorConfigsEvaluator=WARN, filteredlog
log4j.additivity.com.cloudera.cmf.service.config.BaseMonitorConfigsEvaluator=false

Once done, restart the Cloudera Manager server(s) for the updated configuration to be picked up.

OPSAPS-73211: Cloudera Manager 7.13.1 does not clean up Python Path impacting Hue to start
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500

When you upgrade from Cloudera Manager 7.7.1 or lower versions to Cloudera Manager 7.13.1 or higher versions with CDP Private Cloud Base 7.1.7.x, Hue does not start because Cloudera Manager forces Hue to start with Python 3.8, while Hue needs Python 2.7.

This issue occurs because Cloudera Manager does not clean up the Python path at any time, so when Hue tries to start, the Python path points to 3.8, which Hue does not support on CDP Private Cloud Base 7.1.7.x.

To resolve this issue temporarily, you must perform the following steps:

  1. Locate the hue.sh in /opt/cloudera/cm-agent/service/hue/.
  2. Add the following line after export HADOOP_CONF_DIR=$CONF_DIR/hadoop-conf:
    export PYTHONPATH=/opt/cloudera/parcels/CDH/lib/hue/build/env/lib64/python2.7/site-packages
OPSAPS-73011: Wrong parameter in the /etc/default/cloudera-scm-server file
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400
If Cloudera Manager needs to be installed in High Availability mode (two nodes or more), the CMF_SERVER_ARGS parameter in the /etc/default/cloudera-scm-server file is missing the word "export" before it (the file contains only CMF_SERVER_ARGS= and not export CMF_SERVER_ARGS=), so the parameter cannot be utilized correctly.
Edit the /etc/default/cloudera-scm-server file with root credentials and add the word "export" before the parameter CMF_SERVER_ARGS=.
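The export keyword matters because a plain assignment stays local to the shell and is never inherited by the child cloudera-scm-server process. A minimal demonstration (the variable name is reused here for illustration only):

```shell
# Without export, a child process does not see the variable:
CMF_DEMO_ARGS="-Dfoo=bar"
sh -c 'echo "child sees: [$CMF_DEMO_ARGS]"'    # prints: child sees: []

# With export, the child inherits it:
export CMF_DEMO_ARGS
sh -c 'echo "child sees: [$CMF_DEMO_ARGS]"'    # prints: child sees: [-Dfoo=bar]
```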
OPSAPS-60346: Upgrading Cloudera Manager Agent triggers cert rotation in Auto-TLS use case 1
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500

Upgrading Cloudera Manager Agent nodes from the Cloudera Manager UI wizard as part of a Cloudera Manager upgrade causes the host to get new certificates, which becomes disruptive.

The issue happens with use case 1 when the CMCA is stored in the Cloudera Manager database, because Cloudera Manager always regenerates the host certificate as part of the host install or host upgrade step. With use case 3, however, Cloudera Manager does not regenerate the certificate because it comes from the user.

Currently, there are three possible workarounds:
  • Rotate all CMCA certs again using the generateCmca API command, and using the "location" argument to specify a directory on disk. This will revert to the old behavior of storing the certs on disk instead of the DB.
  • Switch to Auto-TLS Use Case 3 (Customer CA-signed Certificates).
  • Manual upgrade of Cloudera Manager Agents, instead of upgrading from Cloudera Manager GUI.
OPSAPS-72447, CDPD-76705: Ozone incremental replication fails to copy renamed directory
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400, 7.13.2.0
Ozone incremental replication using Ozone replication policies succeeds overall but might fail to sync nested renames for FSO buckets.
When a directory and its contents are renamed between replication runs, the outer-level rename is synced but the contents under the previous name are not.
None
OPSAPS-72756: The runOzoneCommand API endpoint fails during the Ozone replication policy run
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400, 7.13.2.0
The /clusters/{clusterName}/runOzoneCommand Cloudera Manager API endpoint fails when the API is called with the getOzoneBucketInfo command. In this scenario, the Ozone replication policy runs also fail if the following conditions are true:
  • The source Cloudera Manager version is 7.11.3 CHF11 or 7.11.3 CHF12.
  • The target Cloudera Manager is version 7.11.3 through 7.11.3 CHF10 or 7.13.0.0 or later where the feature flag API_OZONE_REPLICATION_USING_PROXY_USER is disabled.
Choose one of the following methods as a workaround:
  • Upgrade the target Cloudera Manager before you upgrade the source Cloudera Manager for 7.11.3 CHF12 version only.
  • Pause all replication policies, upgrade source Cloudera Manager, upgrade destination Cloudera Manager, and unpause the replication policies.
  • Upgrade source Cloudera Manager, upgrade target Cloudera Manager, and rerun the failed Ozone replication policies between the source and target clusters.
OPSAPS-65377: Cloudera Manager Host Inspector does not find Psycopg2 on Ubuntu 20 or Red Hat 8.x when Psycopg2 version 2.9.3 is installed.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400

Host Inspector fails with a Psycopg2 version error while upgrading to Cloudera Manager 7.13.1.x versions. When you run the Host Inspector, you get a "Not finding Psycopg2" error, even though Psycopg2 is installed on all hosts.

None
OPSAPS-68340: Zeppelin paragraph execution fails with the User not allowed to impersonate error.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500

Starting from Cloudera Manager 7.11.3, Cloudera Manager auto-configures the livy_admin_users configuration when Livy is run for the first time. If you add Zeppelin or Knox services later to the existing cluster and do not manually update the service user, the User not allowed to impersonate error is displayed.

If you add Zeppelin or Knox services later to the existing cluster, you must manually add the respective service user to the livy_admin_users configuration in the Livy configuration page.

OPSAPS-72804: For recurring replication policies, the interval is overwritten to 1 after the replication policy is edited
7.13.1
7.13.1.100, 7.13.2.0
When you edit an Atlas, Iceberg, Ozone, or a Ranger replication policy that has a recurring schedule on the Replication Manager UI, the Edit Replication Policy modal window appears as expected. However, the frequency of the policy is reset to run at “1” unit where the unit depends on what you have set in the replication policy. For example, if you have set the replication policy to run every four hours, it is reset to one hour when you edit the replication policy.
After you edit the replication policy as required, you must ensure that you manually set the frequency to the original scheduled frequency, and then save the replication policy.
OPSAPS-69342: Access issues identified in MariaDB 10.6 were causing discrepancies in High Availability (HA) mode
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500

MariaDB 10.6, by default, includes the property require_secure_transport=ON in the configuration file (/etc/my.cnf), which is absent in MariaDB 10.4. This setting prohibits non-TLS connections, leading to access issues. This problem is observed in High Availability (HA) mode, where certain operations may not be using the same connection.

To resolve the issue temporarily, you can either comment out or disable the line require_secure_transport in the configuration file located at /etc/my.cnf.

CDPD-53160: Incorrect job run status appears for subsequent Hive ACID replication policy runs after the replication policy fails
7.13.1, 7.13.1.100, 7.13.1.200
7.13.1.300, 7.13.2.0
When a Hive ACID replication policy run fails with the FAILED_ADMIN status, the subsequent Hive ACID replication policy runs incorrectly show the SKIPPED status instead of FAILED_ADMIN on the Cloudera Manager > Replication Manager > Replication Policies > Actions > Show History page. It is recommended that you check the Hive ACID replication policy runs if multiple subsequent policy runs show the SKIPPED status.
None
CDPQE-36126: Iceberg replication fails when source and target clusters use different nameservice names
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
When you run an Iceberg replication policy between clusters where the source and target clusters use different nameservice names, the replication policy fails.
Perform the following steps to mitigate the issue. In these steps, the source cluster nameservice is assumed to be ns1 and the target cluster nameservice ns2:
  1. Go to the Cloudera Manager > Replication Manager > Replication Policies page.
  2. Click Actions > Edit for the required Iceberg replication policy.
  3. Go to the Advanced tab on the Edit Iceberg Replication Policy modal window.
  4. Enter the mapreduce.job.hdfs-servers.token-renewal.exclude = ns1, ns2 key value pair for Advanced Configuration Snippet (Safety Valve) for source hdfs-site.xml and Advanced Configuration Snippet (Safety Valve) for destination hdfs-site.xml fields.
  5. Save the changes.
  6. Click Actions > Run Now to run the replication policy.
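If the safety valve fields accept the XML property format, the key-value pair from step 4 would be entered as follows (ns1 and ns2 are the assumed nameservice names from the steps above):

```xml
<property>
  <name>mapreduce.job.hdfs-servers.token-renewal.exclude</name>
  <value>ns1,ns2</value>
</property>
```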
CDPD-53185: Clear REPL_TXN_MAP table on target cluster when deleting a Hive ACID replication policy
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400, 7.13.2.0
The entry in REPL_TXN_MAP table on the target cluster is retained when the following conditions are true:
  1. A Hive ACID replication policy is replicating a transaction that requires multiple replication cycles to complete.
  2. The replication policy and databases used in it get deleted on the source and target cluster even before the transaction is completely replicated.

In this scenario, if you create a database using the same name as the deleted database on the source cluster, and then use the same name for the new Hive ACID replication policy to replicate the database, the replicated database on the target cluster is tagged as ‘database incompatible’. This happens after the housekeeper thread process (that runs every 11 days for an entry) deletes the retained entry.

Create another Hive ACID replication policy with a different name for the new database.
DMX-3973: Ozone replication policy with linked bucket as destination fails intermittently
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
When you create an Ozone replication policy using a linked/non-linked source cluster bucket and a linked target bucket, the replication policy fails during the "Trigger a OZONE replication job on one of the available OZONE roles" step.
None
OPSAPS-68143: Ozone replication policy fails for empty source OBS bucket
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
An Ozone incremental replication policy for an OBS bucket fails during the “Run File Listing on Peer cluster” step when the source bucket is empty.
None
OPSAPS-71592: Replication Manager does not read the default value of “ozone_replication_core_site_safety_valve” during Ozone replication policy run
7.13.1
7.13.1.100, 7.13.2
During the Ozone replication policy run, Replication Manager does not read the value in the ozone_replication_core_site_safety_valve advanced configuration snippet if it is configured with the default value.
To mitigate this issue, you can use one of the following methods:
  • Remove some or all the properties in ozone_replication_core_site_safety_valve, and move them to ozone-conf/ozone-site.xml_service_safety_valve.
  • Add a dummy property with no value in ozone_replication_core_site_safety_valve. For example, add <property><name>dummy_property</name><value></value></property>, save the changes, and run the Ozone replication policy.
OPSAPS-71897: Finalize Upgrade command fails after upgrading the cluster with CustomKerberos setup causing INTERNAL_ERROR with EC writes.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
After the UI Finalize Upgrade command fails, you must manually run the finalize commands through the Ozone Admin CLI:
  1. kinit with the SCM custom Kerberos principal.
  2. ozone admin scm finalizeupgrade
  3. ozone admin scm finalizationstatus
OPSAPS-72204: HMS compaction configuration not updated through Cloudera Manager UI
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
The hive.compactor.initiator.on checkbox in Cloudera Manager UI for Hive Metastore (HMS) does not reflect the actual configuration value in cloud deployments. The default value is false, causing the compactor to not run.
To update the hive.compactor.initiator.on value:
  1. In the Cloudera Manager, go to Hive > Configuration
  2. Set hive.compactor.initiator.on to true in the "Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml".
  3. Save the changes and restart the service.
Once applied, the compaction process will run as expected.
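In the hive-site.xml safety valve, the entry takes the standard property form:

```xml
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
```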
OPSAPS-70702: Ranger replication policies fail if the clusters do not use AutoTLS
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
Ranger replication policies fail during the Exporting services, policies and roles from Ranger remote step.
  • Log in to the Ranger Admin host(s) on the source cluster.
  • Identify the Cloudera Manager agent PEM file using the # cat /etc/cloudera-scm-agent/config.ini | grep -i client_cert_file command. For example, the file might reside in client_cert_file=/myTLSpath/cm_server-cert.pem location.
  • Create the path for the new PEM file using the # mkdir -p /var/lib/cloudera-scm-agent/agent-cert/ command.
  • Copy the client_cert_file from config.ini as cm-auto-global_cacerts.pem file using the # cp /myTLSpath/cm_server-cert.pem /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem command.
  • Change the file permissions to 644 using the # chmod 644 /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem command.
  • Resume the Ranger replication policy in Replication Manager.
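The bullet steps above can be sketched as one small script (run as root on each source Ranger Admin host; the config.ini location is the standard agent path, and the PEM path is read from it rather than hard-coded, since it differs per deployment):

```shell
#!/bin/sh
# Sketch: copy the agent client certificate to the path Ranger replication expects.
copy_agent_cert() {
  ini="$1"       # e.g. /etc/cloudera-scm-agent/config.ini
  destdir="$2"   # e.g. /var/lib/cloudera-scm-agent/agent-cert
  pem=$(grep -i '^client_cert_file' "$ini" | cut -d= -f2)
  mkdir -p "$destdir"
  cp "$pem" "$destdir/cm-auto-global_cacerts.pem"
  chmod 644 "$destdir/cm-auto-global_cacerts.pem"
}

# On each source Ranger Admin host, then resume the policy in Replication Manager:
#   copy_agent_cert /etc/cloudera-scm-agent/config.ini /var/lib/cloudera-scm-agent/agent-cert
```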

OPSAPS-71424: The configuration sanity check step ignores the replication advanced configuration snippet values during the Ozone replication policy job run
7.13.1
7.13.1.100, 7.13.2.0
The OBS-to-OBS Ozone replication policy jobs fail if the S3 property values for fs.s3a.endpoint, fs.s3a.secret.key, and fs.s3a.access.key are empty in Ozone Service Advanced Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml even though you defined the properties in Ozone Replication Advanced Configuration Snippet (Safety Valve) for core-site.xml.
Ensure that the S3 property values for fs.s3a.endpoint, fs.s3a.secret.key, and fs.s3a.access.key contain at least a dummy value in Ozone Service Advanced Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml.

Additionally, you must ensure that you do not update the property values in Ozone Replication Advanced Configuration Snippet (Safety Valve) for core-site.xml for Ozone replication jobs. This is because the values in this advanced configuration snippet override the property values in core-site.xml and not the ozone-site.xml file.

Different property values in Ozone Service Advanced Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml and Ozone Replication Advanced Configuration Snippet (Safety Valve) for core-site.xml result in nondeterministic behavior where the replication job picks up either value during the job run, which leads to incorrect results or replication job failure.
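Placeholder entries in the Ozone service safety valve might look like the following (the values shown are dummies, since the workaround only requires the properties to be non-empty there; the real values belong in your replication configuration):

```xml
<property>
  <name>fs.s3a.endpoint</name>
  <value>dummy</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>dummy</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>dummy</value>
</property>
```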

OPSAPS-71403: Ozone replication policy creation wizard shows "Listing Type" field in source Cloudera Private Cloud Base versions lower than 7.1.9
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
When the source Cloudera Private Cloud Base cluster version is lower than 7.1.9 and the Cloudera Manager version is 7.11.3, the Ozone replication policy creation wizard shows Listing Type and its options. These options are not available in Cloudera Private Cloud Base 7.1.8.x versions.
OPSAPS-71659: Ranger replication policy fails because of incorrect source to destination service name mapping
7.13.1
7.13.1.100, 7.13.2.0
Ranger replication policy fails because of incorrect source to destination service name mapping format during the transform step.
If the service names are different in the source and target, then you can perform the following steps to resolve the issue:
  1. SSH to the host on which the Ranger Admin role is running.
  2. Find the ranger-replication.sh file.
  3. Create a backup copy of the file.
  4. Locate substituteEnv SOURCE_DESTINATION_RANGER_SERVICE_NAME_MAPPING ${RANGER_REPL_SERVICE_NAME_MAPPING} in the file.
  5. Modify it to substituteEnv SOURCE_DESTINATION_RANGER_SERVICE_NAME_MAPPING "'${RANGER_REPL_SERVICE_NAME_MAPPING//\"}'"
  6. Save the file.
  7. Rerun the Ranger replication policy.
OPSAPS-69782: HBase COD-COD replication from 7.3.1 to 7.2.18 fails during the "create adhoc snapshot" step
7.13.1
7.13.1.100, 7.13.2.0
An HBase replication policy replicating from a 7.3.1 COD cluster to a 7.2.18 COD cluster with "Perform Initial Snapshot" enabled fails during the snapshot creation step in Cloudera Replication Manager.
OPSAPS-71414: Permission denied for Ozone replication policy jobs if the source and target bucket names are identical
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
The OBS-to-OBS Ozone replication policy job fails with the com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden or Permission denied error when the bucket names on the source and target clusters are identical and the job uses S3 delegation tokens. Note that the Ozone replication jobs use the delegation tokens when the S3 connector service is enabled in the cluster.
You can use one of the following workarounds to mitigate the issue:
  • Use different bucket names on the source and target clusters.
  • Set the fs.s3a.delegation.token.binding property to an empty value in ozone_replication_core_site_safety_valve to disable the delegation tokens for Ozone replication policy jobs.
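The second workaround sets the property to an empty value; in XML property form it would be:

```xml
<property>
  <name>fs.s3a.delegation.token.binding</name>
  <value></value>
</property>
```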
OPSAPS-71256: The “Create Ranger replication policy” action shows 'TypeError' if no peer exists
7.13.1
7.13.1.100, 7.13.2.0
When you click target Cloudera Manager > Replication Manager > Replication Policies > Create Replication Policy > Ranger replication policy, the TypeError: Cannot read properties of undefined error appears.
OPSAPS-71067: Wrong interval sent from the Replication Manager UI after Ozone replication policy submit or edit process.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.2.0
When you edit the existing Ozone replication policies, the schedule frequency changes unexpectedly.
OPSAPS-70848: Hive external table replication policies fail if the source cluster is using Dell EMC Isilon storage
7.13.1
7.13.1.100, 7.13.2.0
During the Hive external table replication policy run, the replication policy fails at the Hive Replication Export step.
OPSAPS-71005: RemoteCmdWork uses a singlethreaded executor
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
Replication Manager runs the remote commands for a replication policy through a single-thread executor.
OPSAPS-59553: SMM's bootstrap server config should be updated based on Kafka's listeners
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
SMM does not show any metrics for Kafka or Kafka Connect when multiple listeners are set in Kafka.
Workaround: SMM cannot identify multiple listeners and still points to the bootstrap server using the default broker port (9093 for SASL_SSL). You need to override the bootstrap server URL by performing the following steps:
  1. In Cloudera Manager, go to SMM > Configuration > Streams Messaging Manager Rest Admin Server Advanced Configuration Snippet (Safety Valve)
  2. Override bootstrap server URL (hostname:port as set in the listeners for broker) for streams-messaging-manager.yaml.
  3. Save your changes.
  4. Restart SMM.
OPSAPS-69317: Kafka Connect Rolling Restart Check fails if SSL Client authentication is required
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
The rolling restart action does not work in Kafka Connect when the ssl.client.auth option is set to required. The health check fails with a timeout which blocks restarting the subsequent Kafka Connect instances.
You can set ssl.client.auth to requested instead of required and initiate a rolling restart again. Alternatively, you can perform the rolling restart manually by restarting the Kafka Connect instances one-by-one and checking periodically whether the service endpoint is available before starting the next one.
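For the manual rolling restart, the availability check between instances can be sketched as a small polling function (the REST URL is a placeholder; the Kafka Connect REST port depends on your configuration):

```shell
#!/bin/sh
# Sketch: poll a service endpoint until it responds, or give up after N tries.
wait_for_endpoint() {
  url="$1"
  tries="${2:-30}"
  i=0
  until curl -sf "$url" >/dev/null 2>&1; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep 2
  done
}

# After restarting one Kafka Connect instance (placeholder host/port):
#   wait_for_endpoint "https://connect-host.example.com:28085/connectors" || echo "not up yet"
```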
OPSAPS-70971: Schema Registry does not have permissions to use Atlas after an upgrade
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
Following an upgrade, Schema Registry might not have the required permissions in Ranger to access Atlas. As a result, Schema Registry's integration with Atlas might not function in secure clusters where Ranger authorization is enabled.
  1. Access the Ranger Console (Ranger Admin web UI).
  2. Click the cm_atlas resource-based service.
  3. Add the schemaregistry user to the all - * policies.
  4. Click Manage Service > Edit Service.
  5. Add the schemaregistry user to the default.policy.users property.
OPSAPS-59597: SMM UI logs are not supported by Cloudera Manager
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
Cloudera Manager does not display a Log Files menu for the SMM UI role (and SMM UI logs cannot be displayed in the Cloudera Manager UI) because the logging type used by SMM UI is not supported by Cloudera Manager.
View the SMM UI logs on the host.
OPSAPS-72298: Impala metadata replication is mandatory and UDF functions parameters are not mapped to the destination
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
Impala metadata replication is enabled by default, but the legacy Impala C/C++ UDFs (user-defined functions) are not replicated as expected during the Hive external table replication policy run.
Edit the location of the UDF functions after the replication run is complete. To accomplish this task, you can edit the path of each UDF function to map it to the new cluster address, or you can use a script.
OPSAPS-70713: Error appears when running Atlas replication policy if source or target clusters use Dell EMC Isilon storage
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
You cannot create an Atlas replication policy between clusters if one or both the clusters use Dell EMC Isilon storage.
None
OPSAPS-72468: Subsequent Ozone OBS-to-OBS replication policy runs do not skip already replicated files during replication
7.13.1
7.13.1.100
The first Ozone replication policy run is a bootstrap run. Sometimes, the subsequent runs might also be bootstrap jobs if the incremental replication fails and the job runs fall back to bootstrap replication. In this scenario, the bootstrap replication jobs might replicate the files that were already replicated because the modification time is different for a file on the source and the target cluster.
None
OPSAPS-72470: Hive ACID replication policies fail when target cluster uses Dell EMC Isilon storage and supports JDK17
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.2.0
Hive ACID replication policies fail if the target cluster is deployed with Dell EMC Isilon storage and also supports JDK17.
None
OPSAPS-73138, OPSAPS-72435: Ozone OBS-to-OBS replication policies create directories in the target cluster even when no such directories exist on the source cluster
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
Ozone OBS-to-OBS replication uses the Hadoop S3A connector to access data on the OBS buckets. Depending on the runtime version and settings in the clusters:
  • directory marker keys (associated with the parent directories) appear in the destination bucket even when they are not present in the source bucket.
  • delete requests for non-existent keys are submitted to the destination storage, which results in `Key delete failed` messages in the Ozone Manager log.

OBS buckets are flat namespaces with independent keys, and the character '/' has no special significance in key names. In FSO buckets, by contrast, each bucket is a hierarchical namespace with filesystem-like semantics, where the '/'-separated components become the path in the hierarchy. The S3A connector provides filesystem-like semantics over object stores by mimicking directory behavior: it creates, and optionally deletes, "empty directory markers". These markers are created when the S3A connector creates an empty directory. Depending on the runtime (S3A connector) version and settings, the markers may or may not be deleted when a descendant path is created.

Empty directory marker creation is inherent to the S3A connector. The marker deletion behavior can be adjusted by setting fs.s3a.directory.marker.retention to keep or delete. For information about configuring this property, see Controlling the S3A Directory Marker Behavior.
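The marker behavior can be illustrated with a toy flat key store. This is not the real S3A code, only a model of the idea: creating an "empty directory" writes a marker key ending in '/', and under a delete retention policy the marker is removed once a descendant key is written.

```python
class ToyFlatStore:
    """Toy model of S3A-style directory markers over a flat keyspace.
    Illustrative only; the real connector's logic is more involved."""
    def __init__(self, marker_retention="keep"):
        self.keys = set()
        self.marker_retention = marker_retention

    def mkdir(self, path):
        # An "empty directory" is represented by a marker key ending in '/'.
        self.keys.add(path.rstrip("/") + "/")

    def put(self, key):
        self.keys.add(key)
        if self.marker_retention == "delete":
            # Remove the markers of all ancestor "directories" of this key.
            parts = key.split("/")[:-1]
            for i in range(1, len(parts) + 1):
                self.keys.discard("/".join(parts[:i]) + "/")

keep = ToyFlatStore("keep")
keep.mkdir("a/b")
keep.put("a/b/file")      # marker 'a/b/' survives under the keep policy

delete = ToyFlatStore("delete")
delete.mkdir("a/b")
delete.put("a/b/file")    # marker 'a/b/' is removed once a descendant exists
```

The "keep" store ends up with the extra `a/b/` key that a replication of the same data into a marker-deleting destination would not have, which is the source of the mismatch described above.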
OPSAPS-73655: Cloud replication fails after the delegation token is issued
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400
7.13.1.500, 7.13.2.0
HDFS and Hive external table replication policies from an on-premises cluster to the cloud fail when the following conditions are true:
  1. You choose the Advanced Options > Delete Policy > Delete Permanently option during the replication policy creation process.
  2. Incremental replication is in progress, that is, the source paths of the replication are snapshottable directories and the bootstrap replication run is complete.
None
OPSAPS-75090: Ozone replication policies fail without source proxy user
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, 7.13.1.400, 7.13.1.500
An Ozone replication policy with an empty Run on Peer as Username field (the default value for this field is empty) fails with the following error message: java.io.IOException: Error acquiring writer for listing file "ofs://<service id>/user/om/.cm/distcp-staging/<timestamp>/fileList.seq": bucket name 'om' is too short, valid length is 3-63 characters.
If you do not have a source proxy user name to specify in the Run on Peer as Username field, you can enter om as the default user for the replication on the source cluster.
OPSAPS-75994: Intermittent HBase replication failure because of missing result file
7.13.2
HBase replication policies fail intermittently during the Check if source tables exist step with the java.lang.IllegalArgumentException: argument "src" is null error message.
Delete and recreate the failed HBase replication policy.
OPSAPS-72125: The arguments field size exceeds the limit for Hive external table replications
7.13.2
When replicating a large number of Hive external tables using table filters to target clusters that use PostgreSQL for the Cloudera Manager database, the arguments field of the Hive Data Replication command might exceed the column limit. By default, the arguments column limit in the COMMANDS table is 1,048,676 characters. If the command exceeds the limit, Cloudera Manager cannot persist the command to the database.
Perform the following steps to mitigate this issue:
  1. When using table filters, split the policy into multiple chunks so that the Hive Data Replication command created by the chunked policies can be persisted.
  2. Increase the arguments column size of the COMMANDS table in the target Cloudera Manager database using the ALTER TABLE COMMANDS ALTER COLUMN arguments TYPE character varying(10485760); command. The maximum varchar column size is 10,485,760.
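The chunking in step 1 can be sketched as a greedy split of the table filter list so that each chunk's estimated serialized size stays under a character budget. This is an illustrative sketch only; the real serialized size of a Hive Data Replication command depends on more than the filter strings, so the per-item overhead here is a rough, hypothetical estimate.

```python
def chunk_filters(filters, max_chars, overhead_per_item=16):
    """Greedily group table filters so that each chunk's estimated
    serialized size stays under max_chars.
    overhead_per_item is a rough allowance for quoting/separators."""
    chunks, current, size = [], [], 0
    for f in filters:
        cost = len(f) + overhead_per_item
        if current and size + cost > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(f)
        size += cost
    if current:
        chunks.append(current)
    return chunks

# Example: 1,000 table filters split against a 5,000-character budget.
filters = [f"db1.table_{i:05d}" for i in range(1000)]
chunks = chunk_filters(filters, max_chars=5000)
```

Each resulting chunk would then back its own replication policy, keeping every command's arguments field within the column limit.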
OPSAPS-73362: Temporary Ozone snapshots are not deleted automatically
7.13.2
Temporary snapshots used by Ozone incremental data replication for checking the target side changes are not deleted automatically in some error modes.
Currently, the temporary snapshots are generated in the cm-tmp-[***RANDOM_UUID***] target bucket and are normally deleted immediately after a snapshot-diff calculation. You can delete leftover snapshots manually, but only when no replication policy involving this bucket is actively running.
OPSAPS-73254, OPSAPS-73252: Editing a replication policy can set the user name to an empty string
7.13.2
On an insecure (non-Kerberos) cluster, creating or editing a replication policy with an empty Run as Username or Run on Peer as Username field might cause the replication jobs to fail.
Use the Cloudera Manager API to update the fields to contain a null value instead of an empty string.
DMX-4681: Iceberg replication synchronization step fails for the database created at a custom location without an Ozone key
7.13.2
The synchronization step of the Iceberg replication command fails during bootstrap replication if you created the database in an Ozone bucket without providing the Ozone key name. The policy fails even if you have configured the Location Mapping field to map the correct Ozone buckets.
  • For existing databases or tables that you created without keys, enter the location mapping of the source and target om service IDs in the Location Mapping field in the Iceberg replication policy. For example, ofs://srcomid, ofs://tgtomid.
  • For new databases and tables, ensure that you provide a key when you create the database. For example, CREATE DATABASE db1 LOCATION 'ofs://omid/volume1/bucket1/db1.db'; and CREATE EXTERNAL TABLE tb1 (id int) STORED BY ICEBERG LOCATION 'ofs://omid/volume1/bucket1/tb1';
OPSAPS-76854: Cannot edit existing Iceberg replication policies after upgrade
7.13.2
You cannot edit the existing Iceberg replication policies in Replication Manager UI after you upgrade from Cloudera Manager 7.11.3 or 7.13.1 to 7.13.2.0.
You can use the Cloudera Manager API to view the policy details. To edit the replication policy, use the Cloudera Manager API, or delete and recreate the policy.
CDPD-63922, CDPD-95711: Atlas replication policies fail when the number of databases and tables exceed 100,000
7.13.2
When a composite replication policy targets more than 100,000 entities, for example, 100 databases containing 1,000 tables each, the following issues occur:
  • An Iceberg replication policy with Atlas metadata migration: the replication policy fails for both bootstrap and incremental jobs. However, the Cloudera Manager > Replication > Replication Policies page displays the replication policy Status as Successful, and the Atlas UI does not show the expected entities.
  • A Hive external table replication policy with Atlas metadata migration: the replication policy fails for 400 GB (40,000 entities), the Replication Policies page displays the replication policy Status as Failed, and the Atlas UI becomes unresponsive.
OPSAPS-74398: Ozone and HDFS replication policies might fail when you use different destination proxy user and source proxy user
7.13.2
HDFS on-premises to on-premises replication fails when the following conditions are true:
  • You configure different Run As Username and Run on Peer as Username during the replication policy creation process.
  • The user configured in Run As Username does not have the permission to access the source path on the source HDFS.

Ozone replication fails when the following conditions are true:

  • You use an FSO-to-FSO replication, or an OBS-to-OBS replication with the Incremental with fallback to full file listing or Incremental only replication type.
  • You configured different Run As Username and Run on Peer as Username during the replication policy creation process.
  • The user configured in Run As Username does not have the permission to access the source bucket on the source Ozone.
Grant the user configured in Run As Username the same permissions on the source cluster as the user configured in Run on Peer as Username.
OPSAPS-75361: Multiple policies do not start simultaneously
7.13.2
When multiple Atlas replication policies are scheduled to start at the same time, some policies might fail to initiate. For example, if you schedule seven Atlas replication policies to run simultaneously, only three might start successfully. The remaining policies are not triggered, remain in a None state, and do not recur, which results in incomplete replication. The Replication Policies page displays None for these policies.
Do not schedule multiple Atlas replication policies to start at the same time. To help avoid this issue, Replication Manager also enforces a two-minute gap between the creation of two Atlas replication policies.
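Staggering the start times, as the workaround suggests, can be sketched as generating schedule times spaced at least two minutes apart. A minimal sketch; the gap value mirrors the two-minute spacing mentioned above, and applying these times to actual policies is left to your scheduling tooling.

```python
from datetime import datetime, timedelta

def staggered_starts(first_start, count, gap_minutes=2):
    """Return `count` start times spaced gap_minutes apart,
    beginning at first_start (simple illustrative sketch)."""
    return [first_start + timedelta(minutes=gap_minutes * i)
            for i in range(count)]

# Example: seven Atlas replication policies starting from 02:00.
starts = staggered_starts(datetime(2025, 1, 1, 2, 0), count=7)
```

With a two-minute gap, the seventh policy starts twelve minutes after the first, instead of all seven contending at once.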
OPSAPS-76832, OPSAPS-70771: Running replication policies must not allow you to download the performance reports
7.13.2
During a replication policy run, the "A server error has occurred. See Cloudera Manager server log for details" error message appears on the Replication Manager UI, and the Cloudera Manager log shows java.lang.IllegalStateException: Command has no result data when you click:
  • Performance Reports > Performance Summary or Performance Reports > Performance Full on the Replication Policies page.
  • Download CSV on the Replication History page to download any report.

This occurs because the Replication Manager UI incorrectly shows the performance report links as enabled and clickable. You can download the reports only after the replication job run is complete.

None
OPSAPS-76099: Incremental Iceberg replication time exceeds the bootstrap duration
7.13.2
Incremental Iceberg replication takes longer to complete than bootstrap replication for Iceberg replication policies.
None
OPSAPS-75848: Composite Iceberg and Atlas replication takes 10x to 15x longer than standalone Atlas replication
7.13.2
Atlas replication takes up to 15x longer when run as part of a composite Iceberg replication policy than as a standalone Atlas replication, though the Iceberg data is replicated in the expected time.
None
OPSAPS-75853: The history entries display "Partial success" for successful composite replication for Iceberg and Atlas
7.13.2
The Cloudera Manager > Replication > Replication History page for a composite Iceberg replication policy incorrectly displays Partial success even when both the Atlas and Iceberg replications were successful.
None