Installing Cloudera Data Services on premises using Cloudera Embedded Container Service

Install Cloudera Data Services on premises using the Cloudera Embedded Container Service.

  • In the Cloudera Manager WebUI, the remote_repo_override_user and remote_repo_override_password parameters must contain valid credentials for archive.cloudera.com before any upgrades or installation of Cloudera Embedded Container Service.
  • When deploying a Cloudera Embedded Container Service cluster, you can add fewer than 50 Cloudera Embedded Container Service agent nodes to the cluster in a single batch. If you need to deploy a Cloudera Embedded Container Service cluster with more than 50 nodes, Cloudera recommends starting the initial deployment with fewer than 50 nodes and incrementally adding nodes to the cluster after the first installation succeeds.
  • Before configuring Cluster IP Range (cluster-cidr) and Service IP Range (service-cidr), you must review the best practices on the SUSE website. After your cluster is deployed, these values cannot be changed. Correcting a misconfiguration requires decommissioning the cluster and redeploying it.
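The batch-size guidance above can be sketched as a small shell helper. This is only an illustration: it assumes a hypothetical plain-text file with one agent host FQDN per line, and the `batch_hosts` function and the file names are not part of Cloudera Manager. It splits the list into batches of at most 49 hosts so that each batch stays under the limit of 50.

```shell
#!/usr/bin/env bash
# Sketch: split a list of ECS agent hosts into batches of fewer than 50.
# Assumption: the hosts file holds one FQDN per line (hypothetical format).

batch_hosts() {
  local hosts_file=$1 out_dir=$2 max_per_batch=${3:-49}
  mkdir -p "$out_dir"
  # Drop blank lines, then write batch-00, batch-01, ... with at most
  # max_per_batch hosts in each file (GNU split, numeric suffixes).
  grep -v '^[[:space:]]*$' "$hosts_file" |
    split -l "$max_per_batch" -d - "$out_dir/batch-"
}
```

For example, `batch_hosts hosts.txt batches` writes `batches/batch-00`, `batches/batch-01`, and so on. The first file could be used for the initial deployment and the remaining files for later add-host operations.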
  1. If you are installing Cloudera Embedded Container Service on RHEL 8 or RHEL 9, perform the following steps:
    1. Run the following command to check if the nm-cloud-setup.service and nm-cloud-setup.timer services are enabled:
      systemctl status nm-cloud-setup.service nm-cloud-setup.timer
    2. If the nm-cloud-setup.service and nm-cloud-setup.timer services are enabled, disable them by running the following command on each host you added:
      systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
      For more information, see Known issues and limitations.
    3. Reboot the added hosts.
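The RHEL check-and-disable sub-steps above could be combined into one helper that is run as root on each added host. This is a minimal sketch, not a Cloudera-provided script; the `disable_nm_cloud_setup` function name is hypothetical, and it relies only on the standard `systemctl is-enabled` and `systemctl disable` commands.

```shell
#!/usr/bin/env bash
# Sketch: disable the nm-cloud-setup units only when they are enabled.
# Run as root on each added RHEL 8 or RHEL 9 host, then reboot the host.

disable_nm_cloud_setup() {
  local unit needs_reboot=0
  for unit in nm-cloud-setup.service nm-cloud-setup.timer; do
    # "systemctl is-enabled" exits non-zero when the unit is disabled
    # or not installed, so those units are skipped.
    if systemctl is-enabled --quiet "$unit" 2>/dev/null; then
      systemctl disable "$unit"
      needs_reboot=1
    fi
  done
  if [ "$needs_reboot" -eq 1 ]; then
    echo "reboot required"
  else
    echo "nothing to disable"
  fi
}
```

Calling `disable_nm_cloud_setup` prints `reboot required` when at least one unit was disabled, which corresponds to the reboot in the last sub-step.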
  2. In the Cloudera Manager UI, click Data Services in the left navigation pane.
    Figure 1. Cloudera Manager Home page
  3. The Add Cloudera on Premises Containerized Cluster page is displayed. Click Continue.
    Figure 2. Add Cloudera on Premises Containerized Cluster page
  4. On the Getting Started page of the installation wizard, select Internet or Air Gapped as the Install Method.
    • Select the Internet installation method.
      • Click Custom Repository to use a custom repository link provided by Cloudera.
      Figure 3. Getting Started page with the Internet Install Method selected
      Figure 4. Getting Started page with the Air-gapped Install Method selected
    • Select the Air Gapped install option.
      1. Download everything under https://archive.cloudera.com/p/cdp-pvc-ds/latest.

        wget -l 0 --recursive --no-parent -e robots=off -nH --cut-dirs=2 --reject="index.html*" -t 10 https://<username>:<password>@archive.cloudera.com/p/cdp-pvc-ds/latest/
        To download only the required images, use the following script:
        DSPATH=https://<username>:<password>@archive.cloudera.com/p/cdp-pvc-ds/latest
        IMGPATH=${DSPATH}/images/
        wget ${DSPATH}/manifest.json
        wget ${DSPATH}/cdp-private*.tgz
        wget -r -nc --no-parent -A * ${DSPATH}/parcels
        wget -r -nc --no-parent -A a*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A u*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A v*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A w*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A y*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A c*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A e*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A f*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A i*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A j*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A k*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A o*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A q*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A w*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A z*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A b*.tar.gz -R boltz2*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A d*.tar.gz -R deepseek*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A g*.tar.gz -R gpt-oss*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A h*.tar.gz -R hugging*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A l*.tar.gz -R llama*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A m*.tar.gz -R mix*.tar.gz,mis*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A n*.tar.gz -R nemo*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A p*.tar.gz -R paddle*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A r*.tar.gz -R riva-asr*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A s*.tar.gz -R starcoder2*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A t*.tar.gz -R triton*.tar.gz $IMGPATH
        wget -r -nc --no-parent -A *.tar.gz.asc $IMGPATH 
      2. Edit the manifest.json file in the downloaded directory. Change "http_url": "..." to "http_url": "http://your_local_repo/cdp-pvc-ds/latest".

      3. Mirror the downloaded directory to your local HTTP server, for example, to http://your_local_repo/cdp-pvc-ds/latest.

      4. Click Custom Repository and add http://your_local_repo/cdp-pvc-ds/latest as a custom repository.

      5. From the Select Repository drop-down menu, select http://your_local_repo/cdp-pvc-ds/latest.
      Figure 5. Getting Started page with the Air Gapped Install Method selected

    Click Continue.
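The manifest.json edit in the air-gapped steps above can be scripted. The following is a minimal sketch that assumes each `"http_url": "..."` entry sits on a single line of the file; the `set_manifest_url` function name is hypothetical, and the mirror URL is the placeholder used in this topic.

```shell
#!/usr/bin/env bash
# Sketch: point manifest.json at a local mirror for an air-gapped install.
# Assumption: each "http_url": "<value>" entry is on a single line.

set_manifest_url() {
  local manifest=$1 new_url=$2
  # Keep a backup copy (<manifest>.bak) and rewrite only the http_url value.
  # The URL must not contain "|" or "&", which are special to this sed call.
  sed -i.bak -E \
    "s|(\"http_url\"[[:space:]]*:[[:space:]]*)\"[^\"]*\"|\1\"${new_url}\"|" \
    "$manifest"
}
```

For example, `set_manifest_url manifest.json http://your_local_repo/cdp-pvc-ds/latest` rewrites the value in place and leaves the original content in `manifest.json.bak` for comparison.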

  5. On the Cluster Basics page, type a name in the Cluster Name field for the Cloudera on premises cluster that you want to create. From the Base Cluster drop-down list, select the cluster that has the storage and SDX services that you want this new Cloudera Data Services on premises instance to connect with. Click Continue.
    Figure 6. Cluster Basics page
  6. The Specify Hosts page displays hosts that are already added to Cloudera Manager. You can select one or more of these hosts to add to the Cloudera Embedded Container Service cluster.
    Figure 7. Specify Hosts page


    On the New Hosts tab you can specify one or more new hosts to add to Cloudera Manager. Enter a Fully Qualified Domain Name in the Hostname field, then click Search.

    After you finish specifying the Cloudera Embedded Container Service hosts, click Continue.

  7. On the Select JDK page, select one of the following options:
    1. Manually manage JDK
    2. Install a Cloudera-provided version of OpenJDK
    3. Install a system-provided version of OpenJDK
      Figure 8. Select JDK page
  8. On the Enter Login Credentials page, the All hosts accept the same password authentication method is selected by default. Enter the username in the SSH Username field, and the password in the Password and Confirm Password fields. Alternatively, select the All hosts accept the same private key authentication method and provide the private key and passphrase.
    Figure 9. Enter Login Credentials page with the All hosts accept the same password option selected


  9. The Install Agents page displays a progress indicator as the agent packages are installed. When the installation completes, click Continue.
    Figure 10. Install Agents progress window


  10. On the Assign Roles page, customize the role assignments for your new Cloudera on premises containerized cluster, then click Continue.
    Figure 11. Assign Roles page

    Single-node Cloudera Embedded Container Service installation is supported, but is intended only to enable CDSW to Cloudera AI migration. If you are installing Cloudera Embedded Container Service on a single node, only the Docker and Cloudera Embedded Container Service Server roles are assigned. The Cloudera Embedded Container Service Agent role is not required for a single-node installation.

  11. Configure a Docker Repository.
    On the Configure Docker Repository page, select one of the following options:
    • Use an embedded Docker Repository

      If you select the Internet Install Method option on the Getting Started page, images are copied over the internet from the Cloudera repository.

      If you select the Air Gapped option, images are copied from a local http mirror you have set up in your environment.

      Select Default to deploy all of the default Docker images to the repository, or select Select the Optional Images to choose which images to deploy. If you will be deploying , toggle the switch on to copy the images for .

    • Use a custom Docker Repository

      You must enter the following options:
      • Custom Docker Repository – Enter the URL for your Docker Repository.
      • Docker Username – Enter the username for the Docker Repository.
      • Docker Password – Enter the password for the Docker Repository.
      • Docker Certificate – Click the Choose File button to upload a TLS certificate to secure communications with the Docker Repository.
      Click the Generate the copy-docker script button to generate and download a script that copies the Docker images from Cloudera, or, for air-gapped installation, from a local http mirror in your network.
      Run the script from a machine that is running Docker locally and has access to the Docker images using the following commands:
      docker login [***URL for Docker Repository***] -u [***username of user with write access***]
      
      bash copy-docker.txt

      The copying operation might take 4-5 hours.

      Several options exist for configuring a Docker Repository. For more information about these options, see Docker repository access.

      The following tables show the ports that must be opened and allowed regardless of which Docker repository option you choose. For more information on the ports, see RKE2 Documentation.

      • Ports required for / agent (port 5000 is required for ):
        Protocol Port
        TCP 7180-7192
        TCP 19001
        TCP 5000
        TCP 9000
      • Inbound rules for Server nodes (Kubernetes/RKE2):
        Protocol Port
        TCP 9345
        TCP 6443
        UDP 8472
        TCP 9099
        UDP 51820
        UDP 51821
        TCP 10250
        TCP 2379
        TCP 2380
        TCP 2381
        TCP 30000-32767
      • Inbound Rules for the Agent (Kubernetes/RKE2):
        Protocol Port
        UDP 4789
        TCP 179
    • Cloudera default Docker Repository

      This option requires that cluster hosts have access to the internet and that you have selected Internet as the install method.

      In contrast, the custom Docker Repository option requires that you set up a Docker Repository in your environment and that all cluster hosts have connectivity to the repository.

  12. On the Configure Data Services page, modify configuration settings such as the data storage directory, or number of replicas. If multiple disks are mounted on each host with different characteristics (HDD and SSD), then Local Path Storage Directory must point to the path belonging to the optimal storage.
    1. Review your changes. If you want to specify a custom certificate, place the certificate and the private key in a specific location on the Cloudera Manager server host and specify the paths in the Ingress Controller TLS/SSL Server Certificate and Ingress Controller TLS/SSL Server Private Key File fields. This certificate will be copied to the Cloudera Control Plane during the installation process.

      The Ingress Controller TLS/SSL Server Certificate File (PEM Format) configuration value must only contain -----BEGIN CERTIFICATE----- through -----END CERTIFICATE-----, inclusive, for the server certificates. The value cannot include any preamble text and must not include a private key.

      The Ingress Controller TLS/SSL Server Private Key File (PEM Format) configuration value must only contain the unencrypted key, and only the header through the footer, with no preamble text.

      Both of these files must be readable by the cloudera-scm account.

      For information on the required entries that must be present in DNS and TLS certificates when not using wildcards, see No Wildcard DNS/TLS Setup.

    2. Configure the default Control Plane administrator login credentials, which are used for the first login after the installation completes.
    3. Click Continue.
    Figure 12. Configure Data Services page
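The PEM requirements for the custom Ingress Controller certificate and private key described in this step can be pre-checked before the paths are entered in the wizard. The sketch below is an illustration under stated assumptions, not a Cloudera tool: the function names are hypothetical, the files are assumed to use Unix line endings, and a trailing blank line is treated as a failure.

```shell
#!/usr/bin/env bash
# Sketch: pre-check PEM files before entering their paths in the wizard.

# Succeeds when the file starts with a BEGIN CERTIFICATE line, ends with
# an END CERTIFICATE line, and contains no private key.
check_cert_pem() {
  local f=$1
  [ "$(head -n 1 "$f")" = "-----BEGIN CERTIFICATE-----" ] || return 1
  [ "$(tail -n 1 "$f")" = "-----END CERTIFICATE-----" ] || return 1
  ! grep -q "PRIVATE KEY-----" "$f"
}

# Succeeds when the file starts with a private key header, ends with a
# private key footer, and shows no sign of encryption.
check_key_pem() {
  local f=$1
  head -n 1 "$f" | grep -Eq '^-----BEGIN ([A-Z]+ )?PRIVATE KEY-----$' || return 1
  tail -n 1 "$f" | grep -Eq '^-----END ([A-Z]+ )?PRIVATE KEY-----$' || return 1
  ! grep -q "ENCRYPTED" "$f"
}
```

Both functions return a non-zero status on failure, so they can be chained, for example `check_cert_pem server.pem && check_key_pem server.key`. Readability by the cloudera-scm account can be checked separately on the Cloudera Manager server host, for example with `sudo -u cloudera-scm test -r` followed by the file path.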
  13. On the Configure Databases page, click Continue.
    Figure 13. Configure Databases page


  14. On the Install Parcels page, the selected parcel is downloaded to the Cloudera Manager server host, distributed, unpacked, and activated on the Cloudera Embedded Container Service cluster hosts. Click Continue.
  15. If the hosts do not meet the prerequisites, the Check Prerequisites page displays the applicable issues. You cannot continue installation with failures. Correct the issues, then click Run Again. After all of the issues are resolved, click Continue.
    Table 1. Host prerequisite inspection reference
    Inspection name Description
    StorageInspection Checks for a minimum of 300 GiB of space in the /var/lib directory and 350 GiB of space in the docker data directories. Checks if the /var/lib/longhorn directory or its parent directories are symlinked. If they are, this inspection fails.
    CPUInspection Checks to ensure that the hosts have 16 virtual cores.
    PortsInspection Checks for the availability of ports 443 and 80.
    ecsLonghornDedicatedDisk Checks if the Longhorn storage directory is on a dedicated disk and not the root filesystem.
    EcsHostDnsInspection Checks to ensure that fewer than three nameserver entries are in the /etc/resolv.conf file, and checks the connections to the Cloudera Manager cluster and the Cloudera Data Platform console. It also checks whether a ping to vault.localhost.localdomain can be resolved. If not, the host /etc/nsswitch.conf file might be misconfigured.

    If this inspection fails, perform the following steps:

    • Check the /etc/resolv.conf and /etc/nsswitch.conf files and ensure that the /etc/resolv.conf file does not contain three or more nameservers, and that the /etc/nsswitch.conf file contains the myhostname entry in the hosts field.
    • Check to see if the connections were resolved correctly. If connection to the Cloudera Data Platform console fails, check to see if your DNS wildcard is configured properly.
    VersionInspection Checks that Java is installed and consistent among all Cloudera Embedded Container Service hosts.
    IPTablesInspection Checks that if the iptables command exists, rules are cleared. If the iptables command does not exist, iptables are installed during FirstRun so this inspection passes.

    If iptables are installed and the rules are not cleared, this inspection fails.

    For information on installing iptables, see Installing iptables on the new Cloudera Embedded Container Service control plane nodes.

    EcsCleanUpHostInspection Checks to ensure that the /var/lib/rancher and docker data directories do not contain any files.
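The file checks described for EcsHostDnsInspection in the table above can be tried locally before running the wizard. This is a rough sketch that covers only the /etc/resolv.conf and /etc/nsswitch.conf parts of the inspection, not the connection checks; the function names are hypothetical, and the file paths are parameters so that the checks can be run against copies.

```shell
#!/usr/bin/env bash
# Sketch: local pre-checks based on the EcsHostDnsInspection description.

# Prints the number of nameserver entries in a resolv.conf-style file.
count_nameservers() {
  grep -Ec '^[[:space:]]*nameserver[[:space:]]' "${1:-/etc/resolv.conf}"
}

# Succeeds when the hosts line of an nsswitch.conf-style file lists
# myhostname.
has_myhostname() {
  grep -Eq '^[[:space:]]*hosts:.*[[:space:]]myhostname([[:space:]]|$)' \
    "${1:-/etc/nsswitch.conf}"
}

# Reports each failed check and returns non-zero if any check failed.
check_dns_prereqs() {
  local resolv=${1:-/etc/resolv.conf} nsswitch=${2:-/etc/nsswitch.conf} rc=0
  if [ "$(count_nameservers "$resolv")" -ge 3 ]; then
    echo "FAIL: $resolv has three or more nameserver entries"
    rc=1
  fi
  if ! has_myhostname "$nsswitch"; then
    echo "FAIL: hosts line in $nsswitch does not list myhostname"
    rc=1
  fi
  return "$rc"
}
```

Running `check_dns_prereqs` without arguments inspects the live files on the host and prints one line per failed check.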

    The EcsSystemConfigInspection check is part of the Host Prerequisites Inspections section of install or upgrade. This check must be fixed and cannot be bypassed to continue the installation or upgrade.

    To fix this issue temporarily, perform the following steps:
    1. Log in to the affected host.
    2. Enter the following command:
      sysctl fs.inotify.max_user_instances=256
    To fix this issue permanently, perform the following steps:
    1. Log in to each affected host.
    2. Verify that the current configuration is 128 by running the following command:
      cat /proc/sys/fs/inotify/max_user_instances
    3. Open the /etc/sysctl.conf file in a text editor such as vi.
    4. Add the following contents to the end of the file and save:
      fs.inotify.max_user_instances=256
    5. Reload the configuration by using the following command:
      sudo sysctl -p
    6. Verify that the configuration is updated by running the following command, which is expected to return 256:
      cat /proc/sys/fs/inotify/max_user_instances
    Figure 14. Check Prerequisites page with no detected issues
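The permanent fs.inotify.max_user_instances change described in this step can be made repeatable, so that running it twice does not add duplicate lines. The sketch below is one possible approach, not a Cloudera-provided script; the `ensure_sysctl_setting` function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: apply a sysctl setting to a file without creating duplicates.

# Ensures "key=value" is present in a sysctl-style file: an existing
# uncommented setting for the key is rewritten, otherwise a line is
# appended to the end of the file.
ensure_sysctl_setting() {
  local file=$1 key=$2 value=$3 key_re
  # Escape the dots in the key so they match literally in the regex.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$file"; then
    sed -i.bak -E \
      "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# On a real host (as root), following the steps in this topic:
#   ensure_sysctl_setting /etc/sysctl.conf fs.inotify.max_user_instances 256
#   sysctl -p
#   cat /proc/sys/fs/inotify/max_user_instances
```

When an existing setting is rewritten, the previous file content is kept in a `.bak` copy next to the file.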
  16. On the Inspect Cluster page, click Inspect Cluster and Inspect Network Performance to inspect your hosts and network performance. If the inspection tool displays any issues, fix them and click Run Again to rerun the inspections. After all of the issues are resolved, click Continue.
    Figure 15. Inspect Cluster page with the I understand the risks of not running the inspections or the detected issues, let me continue with cluster setup checkbox selected
  17. The Install Data Services page displays the Data Services installation progress. When the installation is complete, click Continue.
    Figure 16. Install Data Services page with Finished status
  18. When the installation is complete, the Summary page is displayed. Click Launch Cloudera on premises. You can also click Finish and then access the Data Services cluster from Cloudera Manager.
  19. Access your Cloudera Data Services on premises instance from Cloudera Manager. Click Data Services, then click Open Cloudera on premises for the applicable Data Services cluster.

If the installation fails, and you see the following error message in the stderr output during the Install Longhorn UI step, retry the installation by clicking the Resume button:

++ openssl passwd -stdin -apr1
+ echo 'cm-longhorn:$apr1$gp2nrbtq$1KYPGI0QNlFJ2lo5sV62l0'
+ kubectl -n longhorn-system create secret generic basic-auth --from-file=auth
+ rm -f auth
+ kubectl -n longhorn-system apply -f /opt/cloudera/cm-agent/service/ecs/longhorn-ingress.yaml
Error from server (InternalError): error when creating "/opt/cloudera/cm-agent/service/ecs/longhorn-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://rke2-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=10s": x509: certificate signed by unknown authority

In Cloudera Data Services on premises deployments using RKE2, container life cycle events, such as container mounts, start, or stop, are logged to the /var/log/messages file using systemd.

Because the Cloudera Data Services on premises installation is not OS-integrated, no systemd, rsyslogd, or logrotate configurations are delivered. As a result, these verbose messages flood the /var/log/messages file and create disk space pressure on the /var directory, which might lead to stability or availability issues.

To avoid the flooding of container events in the /var/log/messages file, consider the following logging strategies:
  1. Configure the following OS-level logging options for Cloudera Data Services on premises, especially for Cloudera Embedded Container Service on RKE2:
    • A sample rsyslog.d configuration or journald filters to redirect container lifecycle logs to a separate file. For example, to /var/log/rancher-container-events.log.

    • Safety valve-based injection method, if feasible, or post-install script guidance for these OS-level logging configurations.

  2. Alternatively, filter or rate-limit verbose lifecycle logs at the container runtime layer, if possible.
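As an illustration of the rsyslog.d strategy listed above, the following sketch generates a drop-in file that redirects matching messages to /var/log/rancher-container-events.log. It is not a tested Cloudera configuration: the `write_rsyslog_dropin` function name is hypothetical, and the match text is an assumption that must be replaced with a substring that actually appears in the container lifecycle messages on your hosts.

```shell
#!/usr/bin/env bash
# Sketch: generate an rsyslog drop-in that redirects matching messages to
# a separate file and stops further processing of those messages.
# ASSUMPTION: the match text is supplied by the caller; this sketch does
# not know what the lifecycle messages look like on a given host.

write_rsyslog_dropin() {
  local out_file=$1 match_text=$2
  local target=${3:-/var/log/rancher-container-events.log}
  cat > "$out_file" <<EOF
# Redirect messages containing "${match_text}" away from /var/log/messages.
:msg, contains, "${match_text}" ${target}
& stop
EOF
}
```

A generated file would typically be placed under /etc/rsyslog.d/ and picked up after rsyslog is restarted. Because no logrotate configuration is delivered, the new log file also needs its own rotation rule.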