Known Issues in Cloudera Data Services on premises 1.5.4 SP2
The following are the known issues in the 1.5.4 SP2 release of Cloudera Data Services on premises.
- DWX-20809: Cloudera Data Services on premises installations on RHEL 8.9 or lower versions may encounter issues
- You may notice issues when installing Cloudera Data Services on premises on Cloudera Embedded Container Service clusters running on RHEL 8.9 or lower versions. Pods enter a crash loop with the following error:
Warning FailedCreatePodSandBox 1s (x2 over 4s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524: unknown
The issue is due to a memory leak with 'seccomp' (Secure Computing Mode) in the Linux kernel. If your kernel version is not 6.2 or higher, or if it is not part of the list of versions mentioned here, you may face issues during installation.
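The 6.2 kernel threshold described above can be checked on a host before installation. The following is a minimal sketch, not a Cloudera-provided tool: `kernel_at_least` is a hypothetical helper name, and the script only tests the 6.2 threshold. It does not know the separate list of fixed kernel versions, which still has to be checked by hand.

```shell
#!/bin/sh
# Sketch: compare a kernel release string against a MAJOR.MINOR threshold.
# kernel_at_least is a hypothetical helper, not part of any Cloudera tooling.
kernel_at_least() {
  _release="$1"; _want_major="$2"; _want_minor="$3"
  _major="${_release%%.*}"      # text before the first dot
  _rest="${_release#*.}"        # text after the first dot
  _minor="${_rest%%[!0-9]*}"    # leading digits of the remainder
  [ "$_major" -gt "$_want_major" ] && return 0
  [ "$_major" -eq "$_want_major" ] && [ "$_minor" -ge "$_want_minor" ] && return 0
  return 1
}

# Report on the kernel of the host this script runs on.
if kernel_at_least "$(uname -r)" 6 2; then
  echo "Kernel $(uname -r) is 6.2 or higher."
else
  echo "Kernel $(uname -r) is below 6.2; check it against the list of fixed versions."
fi
```

RHEL 8 kernels report a 4.18-based release, so on those hosts the script takes the second branch and the list of fixed versions is the relevant check.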
- OPSX-5776 and OPSX-5747: ECS - Some of the rke2-canal DaemonSet pods in the kube-system namespace are stuck in Init state causing longhorn volume attach issues
- In a few cases, after upgrading from Cloudera Data Services on premises 1.5.3 or 1.5.3-CHF1 to 1.5.4 SP2, a pod that belongs to the rke2-canal DaemonSet is stuck in Init status. This causes some pods in the kube-system and longhorn-system namespaces to be in Init or CrashLoopBackOff status. This manifests as a volume attach failure in the embedded-db-0 pod in the CDP namespace, and causes some pods in the CDP namespace to be in CrashLoopBackOff state.
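One way to spot the symptom described above is to list pods in the affected namespaces and keep only those in Init or CrashLoopBackOff status. This is a rough sketch under stated assumptions: `stuck_pods` is a hypothetical helper, it expects the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...), and it only covers the two namespaces named literally in the description.

```shell
#!/bin/sh
# Sketch: filter `kubectl get pods --no-headers` output read from stdin.
# Prints NAME and STATUS for pods whose STATUS starts with "Init"
# or is exactly "CrashLoopBackOff". stuck_pods is a hypothetical helper.
stuck_pods() {
  awk '$3 ~ /^Init/ || $3 == "CrashLoopBackOff" { print $1, $3 }'
}

# Run against the cluster only when kubectl is available on this host.
if command -v kubectl >/dev/null 2>&1; then
  for ns in kube-system longhorn-system; do
    kubectl get pods -n "$ns" --no-headers 2>/dev/null | stuck_pods
  done
fi
```

An rke2-canal pod appearing in this output after the upgrade would match the issue; an empty result means no pod in those namespaces is in either status.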
- OPSX-5903: 1.5.3 to 1.5.4 SP2 ECS - The upgrade fails when the rke2-ingress-nginx-controller system exceeds its progress deadline.
- The upgrade fails while running the following command:
kubectl rollout status deployment/rke2-ingress-nginx-controller -n kube-system --timeout=5m
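To see whether the deployment really hit its progress deadline, the `Progressing` condition on the Deployment can be read directly. The sketch below is illustrative only: `explain_progress_reason` is a hypothetical helper, and the reason strings it recognizes are the standard Kubernetes Deployment condition reasons, not anything specific to ECS.

```shell
#!/bin/sh
# Sketch: translate the reason of a Deployment's "Progressing" condition.
# explain_progress_reason is a hypothetical helper for illustration.
explain_progress_reason() {
  case "$1" in
    ProgressDeadlineExceeded) echo "rollout exceeded its progress deadline" ;;
    NewReplicaSetAvailable)   echo "rollout completed" ;;
    ReplicaSetUpdated)        echo "rollout still progressing" ;;
    *)                        echo "unrecognized or empty reason: $1" ;;
  esac
}

# Query the cluster only when kubectl is available on this host.
if command -v kubectl >/dev/null 2>&1; then
  reason=$(kubectl get deployment rke2-ingress-nginx-controller -n kube-system \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}' 2>/dev/null)
  explain_progress_reason "$reason"
fi
```

A `ProgressDeadlineExceeded` reason is consistent with the failure described in this issue.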
- OPSAPS-72270: Start ECS command fails on uncordon nodes step
- In an ECS HA cluster, the server node sometimes restarts during startup. This causes the uncordon step to fail.
- OPSAPS-72964 and OPSAPS-72769: Unseal Vault command fails after restarting the ECS service
- The Unseal Vault command sometimes fails after restarting the ECS service.
- OPSX-5986: ECS fresh install failing with helm-install-rke2-ingress-nginx pod failing to come into Completed state
- ECS fresh install fails at the "Execute command Reapply All Settings to Cluster on service ECS" step due to a timeout waiting for helm-install. To confirm the issue, run the following kubectl command on the ECS server host to check if the pod is stuck in a running state:
kubectl get pods -n kube-system | grep helm-install-rke2-ingress-nginx
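The check above can also be scripted so that only the pod name and its STATUS column are printed. This is a minimal sketch, assuming the default column layout of `kubectl get pods --no-headers`; `helm_install_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print NAME and STATUS of helm-install-rke2-ingress-nginx pods.
# Reads `kubectl get pods --no-headers` output from stdin.
# helm_install_status is a hypothetical helper for illustration.
helm_install_status() {
  awk '$1 ~ /^helm-install-rke2-ingress-nginx/ { print $1, $3 }'
}

# Query the cluster only when kubectl is available on this host.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -n kube-system --no-headers 2>/dev/null | helm_install_status
fi
```

A status of `Running` that never changes to `Completed` matches the issue described here.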