Known issues for Cloudera Data Services on premises 1.5.5 SP1
Learn about the known issues for Cloudera Data Services on premises, the impact or changes to the functionality, and the workaround.
Known issues identified in 1.5.5 SP1
- OPSAPS-70612: Invalid URL error while installing Cloudera Data Services on premises 1.5.5 SP1 from Cloudera Manager 7.13.1.501
- When you install Cloudera Data Services on premises 1.5.5 SP1 from a Cloudera Manager 7.13.1.501 hotfix, the installation might fail.
In this scenario, the installer does not process the repository URL correctly and reports an "Invalid URL" error, which blocks the installation.
- OPSX-6794 - Upgrading from Cloudera Data Services on premises 1.5.4 to 1.5.5 Service Pack 1 fails midway through the upgrade process with an Upgrade embedded DB error message.
- When you upgrade from Cloudera Data Services on premises 1.5.4 to 1.5.5 Service Pack 1, the upgrade process might fail.
This happens if you have performed a manual Expand Volume operation on the cdp-embedded-db-backend Persistent Volume Claim (PVC) of the cdp-embedded-db service in the cdp namespace. The upgrade workflow attempts to apply the original volume size that was used during the initial cluster setup. Because Longhorn does not support reducing a volume's size, the upgrade process fails.
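The failure comes down to a size comparison: the upgrade requests the original PVC size, and Longhorn cannot honor a request smaller than the current size. The following Python sketch illustrates that check using Kubernetes-style binary quantities such as `200Gi`. The function names and sample sizes are hypothetical and are not part of any Cloudera tool; the parser handles only plain byte counts and the `Ki`, `Mi`, `Gi`, and `Ti` suffixes.

```python
# Hypothetical helper: parses a small subset of Kubernetes binary quantities.
_BINARY_SUFFIXES = {
    "Ki": 1024,
    "Mi": 1024 ** 2,
    "Gi": 1024 ** 3,
    "Ti": 1024 ** 4,
}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as '200Gi' to bytes (binary suffixes only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No recognized suffix: treat the value as a plain byte count.
    return int(quantity)


def upgrade_would_shrink(current_size: str, requested_size: str) -> bool:
    """Return True if applying requested_size would shrink the volume.

    Longhorn cannot reduce a volume's size, so True means the resize
    requested by the upgrade cannot succeed.
    """
    return parse_quantity(requested_size) < parse_quantity(current_size)


if __name__ == "__main__":
    # Hypothetical sizes: the PVC was manually expanded from 200Gi to 300Gi.
    print(upgrade_would_shrink(current_size="300Gi", requested_size="200Gi"))
```

With these hypothetical sizes the check returns `True`, which corresponds to the failing upgrade described above.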
- OPSX-6797 - A possible upgrade failure resulting from expanding the cdp-embedded-db volume from the Longhorn UI after the initial installation of your Data Services on premises cluster
- If you have expanded the cdp-embedded-db volume from the Longhorn UI after the initial installation of your Data Services on premises cluster, you must complete the workaround steps before planning your upgrade to Cloudera Data Services on premises 1.5.5 Service Pack 1 to avoid a potential upgrade failure.
- COMPX-20437 - DB connection failures causing RPM and CAM pods to enter CrashLoopBackOff
- During an upgrade from version 1.5.5 to any 1.5.5 hotfix release, the cluster-access-manager (CAM) and resource-pool-manager (RPM) pods can enter a CrashLoopBackOff state if they are not automatically restarted during the upgrade.
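To spot affected pods, you can inspect the JSON that `kubectl get pods -n <namespace> -o json` returns: a crash-looping container reports `CrashLoopBackOff` as its `state.waiting.reason`. The Python sketch below assumes that input; the pod-name prefixes are hypothetical and should be checked against the actual CAM and RPM pod names in your cluster, and `<namespace>` is left as a placeholder.

```python
from typing import Iterable

# Hypothetical pod-name prefixes for the CAM and RPM pods.
TARGET_PREFIXES = ("cluster-access-manager", "resource-pool-manager")


def crashlooping_pods(pod_list: dict, prefixes: Iterable[str] = TARGET_PREFIXES) -> list:
    """Return names of matching pods that have a container in CrashLoopBackOff.

    pod_list is the parsed JSON from `kubectl get pods -n <namespace> -o json`.
    """
    prefixes = tuple(prefixes)
    names = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if not name.startswith(prefixes):
            continue
        for status in pod.get("status", {}).get("containerStatuses", []):
            # A crash-looping container is reported in the "waiting" state.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                names.append(name)
                break
    return names


if __name__ == "__main__":
    # Hypothetical, trimmed-down pod list for illustration.
    sample = {
        "items": [
            {
                "metadata": {"name": "cluster-access-manager-abc12"},
                "status": {"containerStatuses": [
                    {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}
                ]},
            },
            {
                "metadata": {"name": "resource-pool-manager-def34"},
                "status": {"containerStatuses": [{"state": {"running": {}}}]},
            },
        ]
    }
    print(crashlooping_pods(sample))
```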
- OBS-9491 - Prometheus configuration exceeds size limit in large environments
- In environments with a large number of namespaces (approximately 300 or more per environment), the Prometheus configuration for Cloudera Monitoring might exceed the 1 MiB Kubernetes Secret size limit. The total size depends on factors such as the number of namespaces, the length of namespace names, their variability, and the size of the certificate store. If it exceeds 1 MiB, the new Prometheus configuration is not applied and new namespaces are not monitored. As a result, telemetry data is not collected from those namespaces and does not appear on the corresponding Grafana charts.
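The sketch below illustrates why the limit depends on both namespace count and name length. It compares the UTF-8 size of a configuration string against the 1 MiB Secret limit. The per-namespace fragment is hypothetical and much smaller than the real Cloudera Monitoring configuration, which also embeds certificates; the real Secret additionally carries metadata, so treat this raw size as a lower bound.

```python
# Kubernetes limits an individual Secret to 1 MiB.
SECRET_SIZE_LIMIT_BYTES = 1024 * 1024

# Hypothetical per-namespace scrape job; the real configuration differs.
FRAGMENT = (
    "- job_name: ns-{ns}\n"
    "  kubernetes_sd_configs:\n"
    "  - role: pod\n"
    "    namespaces:\n"
    "      names: [{ns}]\n"
)


def config_size_bytes(config_text: str) -> int:
    """Size of the configuration text encoded as UTF-8."""
    return len(config_text.encode("utf-8"))


def exceeds_secret_limit(config_text: str, limit: int = SECRET_SIZE_LIMIT_BYTES) -> bool:
    """True if the raw configuration alone is larger than the Secret limit."""
    return config_size_bytes(config_text) > limit


def build_config(namespaces: list) -> str:
    """Concatenate one hypothetical fragment per namespace."""
    return "".join(FRAGMENT.format(ns=ns) for ns in namespaces)


if __name__ == "__main__":
    # The size grows with both the namespace count and the name length.
    for count, name_len in [(100, 10), (300, 10), (300, 40)]:
        names = [f"ns{i:0{name_len - 2}d}" for i in range(count)]
        text = build_config(names)
        print(count, name_len, config_size_bytes(text), exceeds_secret_limit(text))
```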
- OPSX-6618 - During a Cloudera Embedded Container Service upgrade, not all volumes are upgraded to the latest Longhorn version
- While the Cloudera Embedded Container Service cluster restarts during an upgrade from 1.5.5 to 1.5.5 SP1, the upgrade can fail because of Longhorn health issues caused by a degraded volume.
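Before restarting, you can look for volumes that Longhorn does not report as healthy. Longhorn Volume resources expose a `status.robustness` field (for example `healthy` or `degraded`). The Python sketch below assumes the parsed JSON from `kubectl get volumes.longhorn.io -n longhorn-system -o json`, that is, Longhorn running in its default `longhorn-system` namespace. Detached volumes may report `unknown`, so the sketch flags them as well; filter them out if that is not what you want.

```python
def unhealthy_volumes(volume_list: dict) -> dict:
    """Map volume name -> robustness for every volume not reported healthy.

    volume_list is the parsed JSON from
    `kubectl get volumes.longhorn.io -n longhorn-system -o json`.
    """
    result = {}
    for volume in volume_list.get("items", []):
        # Missing status is treated as "unknown" rather than healthy.
        robustness = volume.get("status", {}).get("robustness", "unknown")
        if robustness != "healthy":
            result[volume["metadata"]["name"]] = robustness
    return result


if __name__ == "__main__":
    # Hypothetical volume list for illustration.
    sample = {
        "items": [
            {"metadata": {"name": "pvc-1111"}, "status": {"robustness": "healthy"}},
            {"metadata": {"name": "pvc-2222"}, "status": {"robustness": "degraded"}},
        ]
    }
    print(unhealthy_volumes(sample))
```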
- COMPX-23842 / COMPX-24130 / COMPX-23319 - Pod status shown as OutOfcpu, OutOfmemory, or Pending after a cluster restart
- During a cluster restart, whether for an upgrade or for routine maintenance, all nodes in the cluster are restarted. While this happens, the cluster operates with reduced resource capacity, so pod placement can be rejected and some pods enter the OutOfcpu or OutOfmemory state.
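To list the affected pods after a restart, you can filter the JSON returned by `kubectl get pods -A -o json`. A pod rejected at admission typically has phase `Failed` with `status.reason` set to `OutOfcpu` or `OutOfmemory`, while an unscheduled pod has phase `Pending`. The following Python sketch assumes that input shape; it only identifies pods and does not remediate them.

```python
# Reasons the kubelet typically reports when it rejects a pod at admission.
REJECTED_REASONS = {"OutOfcpu", "OutOfmemory"}


def affected_pods(pod_list: dict) -> list:
    """Return (namespace, name, status) for rejected or Pending pods.

    pod_list is the parsed JSON from `kubectl get pods -A -o json`.
    """
    affected = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        status = pod.get("status", {})
        reason = status.get("reason")
        phase = status.get("phase")
        if reason in REJECTED_REASONS or phase == "Pending":
            # Prefer the specific rejection reason; fall back to the phase.
            affected.append((meta.get("namespace", ""), meta["name"], reason or phase))
    return affected


if __name__ == "__main__":
    # Hypothetical pod list for illustration.
    sample = {
        "items": [
            {"metadata": {"namespace": "ns1", "name": "a"},
             "status": {"phase": "Failed", "reason": "OutOfcpu"}},
            {"metadata": {"namespace": "ns2", "name": "b"},
             "status": {"phase": "Pending"}},
            {"metadata": {"namespace": "ns3", "name": "c"},
             "status": {"phase": "Running"}},
        ]
    }
    for entry in affected_pods(sample):
        print(entry)
```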
- OPSX-6566 - Cloudera Embedded Container Service restart fails with etcd connectivity issues
- Restarting the Cloudera Embedded Container Service server fails with the etcd error "error reading from server".
- OPSX-6401 - Istio ingress-default-cert is not created in the upgrade scenario
- After upgrading to 1.5.5 SP1, the Secret ingress-default-cert is not created in the istio-ingress namespace. Because components expect this certificate, its absence causes provisioning of components such as CAII and MR to fail.
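A quick way to confirm whether you are affected is to check for the Secret by name. The Python sketch below assumes the parsed JSON from `kubectl get secrets -n istio-ingress -o json`; the function name is hypothetical.

```python
def secret_exists(secret_list: dict, name: str = "ingress-default-cert") -> bool:
    """True if a Secret with the given name appears in secret_list.

    secret_list is the parsed JSON from
    `kubectl get secrets -n istio-ingress -o json`.
    """
    return any(item["metadata"]["name"] == name for item in secret_list.get("items", []))


if __name__ == "__main__":
    # Hypothetical Secret list in which ingress-default-cert is missing.
    sample = {"items": [{"metadata": {"name": "some-other-secret"}}]}
    print(secret_exists(sample))
```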
- OPSX-6767 - Cloudera Embedded Container Service cluster has stale configuration after a Cloudera Manager upgrade from 7.11.3.24 to 7.13.1.501-b2
- After upgrading Cloudera Manager to version 7.13.1.501, the Cloudera Embedded Container Service shows a staleness indicator. This occurs because of the following configuration changes applied by the upgrade:
  - worker-shutdown-timeout: reduced from 24 hours (86,400 s) to 15 minutes (900 s).
  - smon_host: a new monitoring configuration added.
  - smon_port: a new monitoring port configuration (9997).
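A staleness indicator means the configuration Cloudera Manager would now deploy differs from the configuration the service is running with. The Python sketch below illustrates that comparison using the three changes listed above. It is not how Cloudera Manager computes staleness internally, and the `smon_host` value is a hypothetical placeholder because the actual value is not documented here.

```python
# Configuration before the upgrade: 24 hours = 24 * 3600 = 86,400 seconds.
OLD_CONFIG = {"worker-shutdown-timeout": 86400}

# Configuration after the upgrade: 15 minutes = 15 * 60 = 900 seconds.
# "<smon-host>" is a hypothetical placeholder value.
NEW_CONFIG = {
    "worker-shutdown-timeout": 900,
    "smon_host": "<smon-host>",
    "smon_port": 9997,
}


def config_changes(old: dict, new: dict) -> list:
    """List added, removed, and changed keys in sorted key order."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(f"added {key}={new[key]!r}")
        elif key not in new:
            changes.append(f"removed {key}")
        elif old[key] != new[key]:
            changes.append(f"changed {key}: {old[key]!r} -> {new[key]!r}")
    return changes


if __name__ == "__main__":
    for change in config_changes(OLD_CONFIG, NEW_CONFIG):
        print(change)
```

Any non-empty result corresponds to a stale configuration in this simplified model.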
- OPSX-6638 - After a rolling restart, many pods are stuck in the Pending state
- Pods remain in Pending state and fail to schedule with etcd performance warnings.
