Operating System Requirements

This topic describes the operating system requirements for CDP Private Cloud Base. Azul OpenJDK, OpenJDK 8, OpenJDK 11, and OpenJDK 17 are TCK certified for Cloudera Data Platform.

CDP Private Cloud Base Supported Operating Systems

See the Cloudera Data Platform Support Matrix for detailed information about supported operating systems.

Operating System support for the CDP Private Cloud Base Trial Installer

SLES 15 SP4 is supported when using the Trial Installer (cloudera-manager-installer.bin) to install Cloudera Manager.

Important information about Cloudera Runtime and Cloudera Manager Supported Operating Systems

Cloudera Runtime provides parcels for select versions of RHEL-compatible operating systems.

CDP Private Cloud Base supported operating systems

Operating System Version
IBM PowerPC on RHEL
The following components are not supported:
  • Impala
  • Kudu
  • Ozone
  • Navigator Encrypt

Operating System and IBM PowerPC support matrix

This matrix describes the operating systems supported on IBM PowerPC. There are two core configurations of CDP Private Cloud Base across the different PowerPC versions:
  1. IBM PowerPC only and CDP Private Cloud Base
  2. IBM PowerPC CPU, IBM Spectrum Scale Storage, and CDP Private Cloud Base. This is a subset of what is supported generally on IBM PowerPC.
    IBM PowerPC | Support Documentation
    PowerPC 8 and 9, generally without Spectrum Scale Storage | https://www.ibm.com/docs/en/linux-on-systems?topic=lpo-supported-linux-distributions-virtualization-options-power8-power9-linux-power-systems
    PowerPC 10, generally without Spectrum Scale Storage | https://www.ibm.com/docs/en/linux-on-systems?topic=lpo-supported-linux-distributions-virtualization-options-power10-linux-power-servers
    IBM Spectrum Scale Storage with CDP Private Cloud Base on x86 and PowerPC combinations | https://www.ibm.com/docs/en/spectrum-scale-bda?topic=requirements-support-matrix

Software Dependencies

  • Python - The Python dependencies for the different Cloudera Data Platform components are listed below:
    Cloudera Manager
    You must install Python 3.8 or 3.9 for RHEL 8 on all hosts before upgrading to Cloudera Manager 7.13.1.
    You must install Python 3.9 for RHEL 9 on all hosts before upgrading to Cloudera Manager 7.13.1.
    You must install Python 3.8 for Ubuntu 20 on all hosts before upgrading to Cloudera Manager 7.13.1.
    You must install Python 3.10 for SLES 15 or Ubuntu 22 on all hosts before upgrading to Cloudera Manager 7.13.1.
    Hue
    Hue supports Python 3.8 only on Ubuntu 20 in Cloudera Runtime 7.3.1.
    Hue supports Python 3.9 only on RHEL 8 and RHEL 9 in Cloudera Runtime 7.3.1.
    Hue supports Python 3.10 only on SLES 15 and Ubuntu 22 in Cloudera Runtime 7.3.1.
    Spark
    Spark 2.4 supports Python 2.7 and 3.4-3.7.
    Spark 3.0 supports Python 2.7 and 3.4 and higher, although support for Python 2 and 3.4 to 3.5 is deprecated.
    Spark 3.1 supports Python 3.6 and higher.
    CDS (Cloudera Data Platform Distribution of Spark) 3.3 supports Python 3.7 and higher.
    If the right level of Python is not picked up by default, set the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables to point to the correct Python executable before running the pyspark command.
  • Perl - Cloudera Manager requires perl.
  • python-psycopg2 - Cloudera Manager 7 has a dependency on the package python-psycopg2. PostgreSQL-backed Hue in Runtime 7 requires a higher version of psycopg2 than is required by the Cloudera Manager dependency. For more information, see Installing the psycopg2 Python Package.
  • iproute package - CDP Private Cloud Base has a dependency on the iproute package. Any host that runs the Cloudera Manager Agent requires the package. The package name varies depending on the operating system:
    Table 1. iproute package
    Operating System | iproute package
    RHEL | iproute
    Ubuntu | iproute2
    SLES | iproute2
  • rpcbind package - CDP Private Cloud Base has a dependency on the rpcinfo command, which is usually found in the rpcbind package. Any host that runs the Cloudera Manager Agent requires this package. The required version varies depending on the operating system.
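
The dependencies listed above can be spot-checked on a host before installation. The following is a minimal sketch, not an official Cloudera tool: missing_commands and python_minor are hypothetical helper names, the interpreter is assumed to be available as python3, and the /usr/bin/python3 path in the export lines is an assumption to replace with the interpreter that matches your Spark version.

```shell
# Hypothetical helper: print each named command that is missing from PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Hypothetical helper: reduce a string such as "Python 3.9.18" to "3.9".
python_minor() {
    printf '%s\n' "$1" | sed -n 's/^Python \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# perl, ip (shipped in iproute/iproute2), and rpcinfo (usually in rpcbind)
# correspond to the dependencies named in the list above.
missing=$(missing_commands perl ip rpcinfo python3)
if [ -n "$missing" ]; then
    echo "Missing commands:" $missing
else
    echo "Python minor version: $(python_minor "$(python3 --version 2>&1)")"
fi

# Assumed interpreter path: point both variables at the Python that Spark
# should use before running the pyspark command.
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
```

Compare the reported minor version with the version required for the host's operating system. The check only confirms that the commands exist on PATH; it does not verify package versions.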

Filesystem Requirements

The Hadoop Distributed File System (HDFS) is designed to run on top of an underlying filesystem in an operating system.

Supported Filesystems

Cloudera Data Platform recommends that you use one of the following filesystems tested on the supported operating systems:

  • ext3: This is the most tested underlying filesystem for HDFS.
  • ext4: This scalable extension of ext3 is supported in more recent Linux releases.
  • XFS: This is the default filesystem in RHEL.
  • S3: Amazon Simple Storage Service

Kudu Filesystem Requirements - Kudu is supported on ext4 and XFS. Kudu requires a kernel version and filesystem that supports hole punching. Hole punching is the use of the fallocate(2) system call with the FALLOC_FL_PUNCH_HOLE option set.
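
One way to probe for hole punching support is to write a small file and ask the kernel to deallocate it with the fallocate utility from util-linux. This is a rough sketch under assumptions: supports_hole_punch is a hypothetical helper name, and the KUDU_DIR variable with its /tmp default is a placeholder for the directory you intend to use for Kudu data.

```shell
# Hypothetical helper: succeed if the filesystem holding directory $1
# accepts a hole-punching request.
supports_hole_punch() {
    probe=$(mktemp "$1/holepunch.XXXXXX") || return 2
    # Write 1 MiB of data so that there are allocated blocks to release.
    dd if=/dev/zero of="$probe" bs=1M count=1 conv=fsync 2>/dev/null
    # --punch-hole issues fallocate(2) with FALLOC_FL_PUNCH_HOLE.
    fallocate --punch-hole --offset 0 --length 1MiB "$probe" 2>/dev/null
    rc=$?
    rm -f "$probe"
    return $rc
}

# Assumed location: replace with the planned Kudu data directory.
dir=${KUDU_DIR:-/tmp}
if supports_hole_punch "$dir"; then
    echo "$dir: hole punching supported"
else
    echo "$dir: hole punching not supported, or the probe could not run"
fi
```

A failure can also mean that the fallocate command is not installed or that the directory is not writable, so treat a negative result as a prompt to investigate rather than a definitive answer.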

File Access Time

Linux filesystems keep metadata that records when each file was last accessed, so even a read results in a write to the disk. To speed up file reads, Cloudera Data Platform recommends that you disable this behaviour, called atime, by using the noatime mount option in /etc/fstab:

/dev/sdb1 /data1 ext4 defaults,noatime 0

Apply the change without rebooting:

mount -o remount /data1
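
To confirm that an entry carries the option, the options field can be read back from /etc/fstab. The sketch below assumes the /data1 mount point from the example above; fstab_options and has_noatime are hypothetical helper names.

```shell
# Hypothetical helper: print the options field for mount point $2
# in the fstab-style file $1, skipping comment lines.
fstab_options() {
    awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { print $4 }' "$1"
}

# Hypothetical helper: succeed if a comma-separated option list has noatime.
has_noatime() {
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r /etc/fstab ]; then
    opts=$(fstab_options /etc/fstab /data1)
    if has_noatime "$opts"; then
        echo "/data1 is configured with noatime"
    else
        echo "/data1 is missing noatime (options: ${opts:-none found})"
    fi
fi
```

This only inspects the configuration file. The options of the live mount can be checked separately, for example with findmnt -n -o OPTIONS /data1.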

Filesystem Mount Options

The filesystem mount options have a sync option that allows you to write synchronously.

Using the sync filesystem mount option reduces performance for services that write data to disks, such as HDFS, YARN, Kafka and Kudu. In Cloudera Data Platform, most writes are already replicated. Therefore, synchronous writes to disk are unnecessary, expensive, and do not measurably improve stability.

NFS and NAS options are not supported for use as DataNode Data Directory mounts, even when using Hierarchical Storage features.

Cloudera Data Platform supports mounting /tmp with the noexec option. Mounting /tmp as a filesystem with the noexec option is sometimes done as an enhanced security measure to prevent the execution of files stored there.
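
To see which mounted filesystems currently use the sync or noexec options, the options column of /proc/mounts can be scanned. This is an illustrative sketch; mounts_with_option is a hypothetical helper name.

```shell
# Hypothetical helper: print the mount points in the mounts-style file $1
# whose option list contains exactly the option $2.
mounts_with_option() {
    awk -v opt="$2" '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == opt) { print $2; break }
        }
    }' "$1"
}

if [ -r /proc/mounts ]; then
    echo "Mounts using sync:"
    mounts_with_option /proc/mounts sync
    echo "Mounts using noexec:"
    mounts_with_option /proc/mounts noexec
fi
```

Options are compared as whole words, so nosuid or async are not mistaken for sync.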

Resource Limit Requirements

You can control resource allocation for Cloudera Manager and Cloudera Runtime services (nproc, nofile, and so on) from /etc/security/limits.conf, and through init scripts on traditional SysV Init systems. However, on systems using systemd, the limits must be set in the service's unit file, in /etc/systemd/system.conf, or in files under /etc/systemd/system.conf.d/. This is because systemd does not use PAM login sessions (pam_limits.so) for daemon services and therefore ignores the limits defined in /etc/security/limits.conf. Both Cloudera Manager Agent and Supervisord (responsible for starting Cloudera Runtime services) are daemonised during system initialisation.

You can perform either of the following steps to modify the resource limit:
  1. For a system-wide change, uncomment the process properties in /etc/systemd/system.conf, or create an override.conf under /etc/systemd/system.conf.d/. This requires a system reboot for the changes to take effect. For more information, see Limits.conf.
  2. To apply custom limits on Cloudera Runtime services, add the required process properties to the [Service] section in /usr/lib/systemd/system/cloudera-scm-supervisord.service.
    For instance, to customise the number of child processes a process can fork, set the property as follows:
    LimitNPROC=<value>
    Then reload the configuration by running the following command for the limits to be applied in the subsequent service restarts:
    # systemctl daemon-reload

    For the list of available process properties, see the systemd documentation (systemd.exec).
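
After a restart, it can be useful to confirm which limits a process actually received. The sketch below reads a /proc/<pid>/limits file and asks systemd for the configured value of the unit. soft_limit is a hypothetical helper name, and /proc/self/limits stands in for the limits file of the service process you want to inspect.

```shell
# Hypothetical helper: print the soft value of the limit named $2
# (for example "Max processes") from the /proc/<pid>/limits-style file $1.
soft_limit() {
    awk -v name="$2" 'index($0, name) == 1 {
        rest = substr($0, length(name) + 1)
        split(rest, fields, " ")
        print fields[1]
    }' "$1"
}

# Value configured for the unit, as systemd sees it (needs a running systemd).
if command -v systemctl >/dev/null 2>&1; then
    systemctl show cloudera-scm-supervisord.service -p LimitNPROC || true
fi

# Limits in effect for a process; replace "self" with the PID of interest.
if [ -r /proc/self/limits ]; then
    echo "Max processes (soft): $(soft_limit /proc/self/limits 'Max processes')"
    echo "Max open files (soft): $(soft_limit /proc/self/limits 'Max open files')"
fi
```

Reading /proc reports what the kernel enforces for that process, which makes it a useful cross-check when a value set in limits.conf appears to have no effect.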

nscd for Kudu

Although not a strict requirement, it's highly recommended that you use nscd to cache both DNS name resolution and static name resolution for Kudu.

Configuring system-level operating system settings

Cloudera Data Platform recommends that you set up the following configurations:
  • Disabling Transparent Hugepages (THP)
  • vm.swappiness Linux Kernel Parameter

To set these configurations, see Disabling Transparent Hugepages (THP) and Setting the vm.swappiness Linux Kernel Parameter.
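
Before changing either setting, the current values can be read from the kernel's standard interfaces. This is a read-only sketch; thp_active_mode is a hypothetical helper name, and the recommended target values are in the topics referenced above, not in this example.

```shell
# Hypothetical helper: extract the bracketed (active) mode from a THP
# status line such as "always madvise [never]".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP mode: $(thp_active_mode "$(cat "$thp_file")")"
fi

if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness: $(cat /proc/sys/vm/swappiness)"
fi
```

The kernel marks the active THP mode with square brackets, which is why the helper extracts the bracketed word instead of returning the whole line.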