Cloudera Data Services on premises Software Requirements
This release ships with Cloudera Manager 7.11.3 CHF 11. If you have an existing Cloudera Base on premises cluster set up using an earlier version of Cloudera Manager, you must first upgrade Cloudera Manager to version 7.11.3 CHF 11.
For more information about specific software requirements, see the Software Support Matrix for Cloudera Embedded Container Service.
Additionally, you must meet the following requirements:
- For Cloudera AI, you must install nfs-utils to mount volumes provisioned by longhorn-nfs. The nfs-utils package is required on every node of the Cloudera Embedded Container Service cluster. To install it, run: yum install nfs-utils
- If you have nodes with GPUs, ensure that the GPU hosts have NVIDIA drivers and nvidia-container-runtime installed. Confirm that the drivers are properly loaded on the host by running the nvidia-smi command. You must also install the nvidia-container-toolkit package.
- You must have a minimum of one agent node for Cloudera Embedded Container Service.
- Set up Kerberos on these clusters using Active Directory.
- Enable TLS on the Cloudera Manager cluster for communication with components and services.
- If you do not have entitlements, contact your Cloudera account team to get the necessary entitlements.
- The default Docker service uses the /docker folder. Whether you retain /docker or override it with another folder, that folder must have a minimum of 300 GiB of free space.
- Create the folder before starting the installation. For example: mkdir /ecs/docker
- Ensure that all of the hosts in the Cloudera Embedded Container Service cluster have more than 300 GiB of free space in the /var/lib directory at the time of installation.
- The cluster generates multiple hosts, and host-based routing is used in the cluster to route requests to the right service. You must decide on a domain for the services; by default, Cloudera Manager points the domain to one of the host names on the cluster. During the installation, check the default domain and override it, only if necessary, with the domain you plan to use. To override, create an A record with a wildcard. For example: *.apps.APPDOMAIN
- You must install nvidia-container-toolkit (nvidia-container-runtime has migrated to nvidia-container-toolkit; see Migration Notice). The steps are shown in the NVIDIA Installation Guide. If you are using Red Hat Enterprise Linux (RHEL), use dnf to install the package. For an example with RHEL 8.7, see Installing the NVIDIA Container Toolkit.
- Python 3.8 is required for Cloudera Manager version 7.11.3.0 and higher versions. Cloudera Manager agents will not start unless Python 3.8 is installed on the cluster nodes.
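Several of the requirements above (the 300 GiB free-space thresholds for the Docker folder and /var/lib, and the presence of commands such as nvidia-smi) can be spot-checked on each host before installation. The following is a minimal sketch, not a Cloudera-provided tool: the function names free_gib, has_min_free_gib, and missing_cmds are hypothetical, and the sketch assumes a POSIX shell with df and awk available and filesystem names without spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in whole GiB, of the
# filesystem that holds the path given as $1.
free_gib() {
    # df -P forces one line per filesystem; -k reports 1024-byte blocks.
    # Column 4 is the available space; 1048576 KiB = 1 GiB.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Hypothetical helper: succeed if the path $1 has at least $2 GiB free.
has_min_free_gib() {
    avail=$(free_gib "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Hypothetical helper: print every command name from the arguments
# that is not found in PATH. Prints nothing if all are present.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example usage on a host (paths and command taken from the list above):
#   has_min_free_gib /docker 300  || echo "/docker: less than 300 GiB free"
#   has_min_free_gib /var/lib 301 || echo "/var/lib: not more than 300 GiB free"
#   missing_cmds nvidia-smi       # prints nvidia-smi if it is not installed
```

The sketch only reads system state. It does not install packages or create folders, so it can be run repeatedly on every node.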
Modifying Access Control Lists (ACLs) for any Rancher or Kubernetes-related directories is strictly prohibited as it can cause permission issues, service failures, or security vulnerabilities. Unauthorized ACL changes may lead to:
- Failure of Rancher services to start properly.
- Kubernetes components encountering permission errors.
- Issues with upgrades, backups, or cluster operations.
Affected Directories
Below are the key Rancher and Kubernetes directories that must not have their ACLs modified:
- /var/lib/rancher/ – Contains Rancher cluster data, configurations, and metadata.
- /etc/rancher/ – Stores Rancher configuration files, certificates, and settings.
- /var/log/rancher/ – Logs generated by Rancher services.
- /var/lib/kubelet/ – Stores node-level Kubernetes configurations and data.
- /etc/kubernetes/ – Holds Kubernetes API server, controller manager, and scheduler configurations.
- /var/lib/etcd/ – Contains the etcd database, critical for cluster state management.
- /var/log/pods/ – Stores logs for Kubernetes pods.
- /var/run/secrets/kubernetes.io/ – Used for service account authentication and tokens.
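To audit the directories listed above without changing anything, one option is to look for the "+" that GNU ls appends to the mode string when a file carries an ACL beyond its basic permission bits. The following is a minimal read-only sketch under that assumption (GNU coreutils ls); the function names mode_has_acl and report_acl_dirs are hypothetical and are not part of Rancher, Kubernetes, or Cloudera tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an `ls -l` mode string ends with "+",
# which GNU ls prints when the file has an ACL beyond the basic mode bits.
mode_has_acl() {
    case "$1" in
        *+) return 0 ;;
        *)  return 1 ;;
    esac
}

# Hypothetical helper: print each existing directory from the arguments
# whose mode string carries the ACL marker. Read-only; modifies nothing.
report_acl_dirs() {
    for dir in "$@"; do
        # Skip paths that do not exist on this host.
        [ -d "$dir" ] || continue
        mode=$(ls -ld "$dir" | awk '{ print $1 }')
        if mode_has_acl "$mode"; then
            echo "$dir"
        fi
    done
}

# Example usage with the directories listed above:
#   report_acl_dirs /var/lib/rancher /etc/rancher /var/log/rancher \
#       /var/lib/kubelet /etc/kubernetes /var/lib/etcd /var/log/pods \
#       /var/run/secrets/kubernetes.io
```

Any directory the sketch prints deserves a closer look. Directories that are absent on a given host are skipped silently.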
Best Practices
- Ensure that these directories maintain default ownership and permissions as configured by Rancher/Kubernetes.
- For troubleshooting, rely on logs and built-in diagnostics rather than altering file permissions.
By following these guidelines, you can avoid unexpected permission issues and maintain a stable and secure Rancher/Kubernetes environment.