Operating System Requirements
This topic describes the operating system requirements for Cloudera Private Cloud Base. Azul OpenJDK, OpenJDK 8, OpenJDK 11, and OpenJDK 17 are TCK certified for CDP.
Cloudera Private Cloud Base Supported Operating Systems
Please see the Cloudera Support Matrix for detailed information about supported operating systems.
Operating System support for the Cloudera Private Cloud Base Trial Installer
The trial installer (cloudera-manager-installer.bin) installs Cloudera Manager.
Important information about Runtime and Cloudera Manager Supported Operating Systems
Runtime provides parcels for select versions of RHEL-compatible operating systems.
CDP Private Cloud Base supported operating systems
Operating System | Version |
---|---|
IBM PowerPC on RHEL | The following components are not supported: |
Operating System and IBM PowerPC support matrix
- IBM PowerPC only and CDP Private Cloud Base
- IBM PowerPC CPU, IBM Spectrum Scale Storage, and CDP Private Cloud Base. This is a
subset of what is supported generally on IBM PowerPC.
IBM PowerPC Support | Documentation |
---|---|
PowerPC 8 and 9 generally without Spectrum Scale Storage | https://www.ibm.com/docs/en/linux-on-systems?topic=lpo-supported-linux-distributions-virtualization-options-power8-power9-linux-power-systems |
PowerPC 10 generally without Spectrum Scale Storage | https://www.ibm.com/docs/en/linux-on-systems?topic=lpo-supported-linux-distributions-virtualization-options-power10-linux-power-servers |
IBM Spectrum Scale Storage with CDP Private Cloud Base on x86 and PowerPC combinations | https://www.ibm.com/docs/en/spectrum-scale-bda?topic=requirements-support-matrix |
Software Dependencies
- Python - The Python dependencies for the different CDP components are listed below:
  - Cloudera Manager
    - You must install Python 3.8 or 3.9 for RHEL 8 on all hosts before upgrading to Cloudera Manager 7.13.1.
  - Hue
    - Hue supports Python 3.8 only on Ubuntu 20 in Cloudera Runtime 7.3.1.
  - Spark
    - Spark 2.4 supports Python 2.7 and 3.4-3.7.
- Perl - Cloudera Manager requires Perl.
- python-psycopg2 - Cloudera Manager 7 has a dependency on the package python-psycopg2. PostgreSQL-backed Hue in Runtime 7 requires a higher version of psycopg2 than is required by the Cloudera Manager dependency. For more information, see Installing the psycopg2 Python Package.
- iproute package - Cloudera Private Cloud Base has a dependency on the iproute package. Any host that runs the Cloudera Manager Agent requires the package. The required version varies depending on the operating system:

  Table 1. iproute package

  Operating System | iproute version |
  ---|---|
  RHEL | iproute |
  Ubuntu | iproute2 |
  SLES | iproute2 |

- rpcbind package - CDP Private Cloud Base has a dependency on the rpcinfo command, which is usually found in the rpcbind package. Any host that runs the Cloudera Manager Agent requires this package. The required version varies depending on the operating system.
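The command-line dependencies listed above can be spot-checked on a host before installation. The following is a minimal sketch, not a Cloudera-provided tool: `missing_commands` is a hypothetical helper name, and the assumption is that the iproute package provides the `ip` command and that Perl is invoked as `perl`.

```shell
# Print the names of any commands from the argument list that are
# not found on PATH, separated by spaces (prints an empty line if none).
missing_commands() {
    missing=""
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            # Append with a separating space only when the list is non-empty.
            missing="${missing}${missing:+ }${cmd}"
        fi
    done
    printf '%s\n' "$missing"
}

# Example (assumed command names): missing_commands perl ip rpcinfo
```

An empty result suggests the commands are present; it does not confirm that the installed package versions meet the requirements for your operating system.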
Filesystem Requirements
Supported Filesystems
The Hadoop Distributed File System (HDFS) is designed to run on top of an underlying filesystem in an operating system. Cloudera recommends that you use one of the following filesystems tested on the supported operating systems:
- ext3: This is the most tested underlying filesystem for HDFS.
- ext4: This scalable extension of ext3 is supported in more recent Linux releases.
- XFS: This is the default filesystem in RHEL 7.
- S3: Amazon Simple Storage Service
Kudu Filesystem Requirements - Kudu is supported on ext4 and XFS. Kudu requires a kernel version and filesystem that supports hole punching. Hole punching is the use of the fallocate(2) system call with the FALLOC_FL_PUNCH_HOLE option set.
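Hole-punching support can be probed on a candidate data directory by attempting the operation on a scratch file. This is a rough sketch under stated assumptions: it relies on the `fallocate` utility from util-linux (whose `--punch-hole` option issues the system call described above), and `check_hole_punching` is a hypothetical helper name.

```shell
# Probe whether the filesystem holding directory $1 accepts a punch-hole
# request. Prints "supported", "unsupported", or "unknown" (when the
# fallocate utility or a scratch file is unavailable).
check_hole_punching() {
    dir="$1"
    command -v fallocate >/dev/null 2>&1 || { echo "unknown"; return 0; }
    tmp=$(mktemp "$dir/punchtest.XXXXXX") || { echo "unknown"; return 0; }
    # Write two 4 KiB blocks so there is allocated data to punch.
    dd if=/dev/zero of="$tmp" bs=4096 count=2 2>/dev/null
    if fallocate --punch-hole --offset 0 --length 4096 "$tmp" 2>/dev/null; then
        echo "supported"
    else
        echo "unsupported"
    fi
    rm -f "$tmp"
}

# Example (hypothetical mount point): check_hole_punching /data1
```

Run it against the directory that would hold the Kudu data, since the result depends on the filesystem and kernel backing that path.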
File Access Time
Linux filesystems keep metadata that record when each file was accessed. This means that even reads result in a write to the disk. To speed up file reads, Cloudera recommends that you disable this option, called atime, using the noatime mount option in /etc/fstab:
/dev/sdb1 /data1 ext4 defaults,noatime 0
Apply the change without rebooting:
mount -o remount /data1
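After remounting, the active options can be confirmed from the kernel's mount table. The sketch below parses a file in the `/proc/mounts` format (device, mount point, type, comma-separated options); `mount_has_option` is a hypothetical helper, and `/data1` is the example mount point used above.

```shell
# Report whether a mount point carries a given option.
# $1: mounts-format file (for example /proc/mounts)
# $2: mount point, $3: option name
# Prints "yes", "no", or "not-mounted". If the mount point appears more
# than once, the last entry wins, since later mounts shadow earlier ones.
mount_has_option() {
    mounts_file="$1"; mount_point="$2"; option="$3"
    awk -v mp="$mount_point" -v opt="$option" '
        $2 == mp {
            n = split($4, opts, ",")
            found = "no"
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = "yes"
            result = found
        }
        END { print (result == "" ? "not-mounted" : result) }
    ' "$mounts_file"
}

# Example: mount_has_option /proc/mounts /data1 noatime
```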
Filesystem Mount Options
The filesystem mount options have a sync option that allows you to write synchronously.
Using the sync filesystem mount option reduces performance for services that write data to disks, such as HDFS, YARN, Kafka and Kudu. In CDP, most writes are already replicated. Therefore, synchronous writes to disk are unnecessary, expensive, and do not measurably improve stability.
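To audit a host for filesystems mounted synchronously, the mount table can be scanned for the exact `sync` token. This is an illustrative sketch; `list_sync_mounts` is a hypothetical helper. Matching whole tokens matters here, because `async` and `dirsync` are different options that merely contain the same substring.

```shell
# Print every mount point whose option list contains the exact token "sync".
# $1: mounts-format file; defaults to /proc/mounts when omitted.
list_sync_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "sync") { print $2; break }
        }
    }' "${1:-/proc/mounts}"
}

# Example: list_sync_mounts
```

Any data directory that appears in the output is a candidate for remounting without the sync option.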
NFS and NAS options are not supported for use as DataNode Data Directory mounts, even when using Hierarchical Storage features.
Cloudera supports mounting /tmp with the noexec option. Mounting /tmp as a filesystem with the noexec option is sometimes done as an enhanced security measure to prevent the execution of files stored there.
Filesystem Requirements
You can control resource allocation for Cloudera Manager and CDP Runtime services (nproc, nofile, etc.) from /etc/security/limits.conf, and through init scripts on traditional SysV Init systems. However, on systems using systemd, the limits must be set either in the service's unit file, in /etc/systemd/system.conf, or in files under /etc/systemd/system.conf.d/*. This is due to a known limitation of systemd: it does not use PAM login sessions (pam_limits.so) for daemon services, and therefore ignores the limits defined in /etc/security/limits.conf. Both Cloudera Manager Agent and Supervisord (responsible for starting CDP Runtime services) are daemonised during system initialisation.
- For a system-wide change, uncomment the process properties in /etc/systemd/system.conf, or create an override.conf under /etc/systemd/system.conf.d/. This requires a system reboot for the changes to take effect. For more information, see Limits.conf.
- To apply custom limits on CDP Runtime services, add the required process properties to the [Service] section in /usr/lib/systemd/system/cloudera-scm-supervisord.service. For instance, to customise the number of child processes a process can fork, set the property as follows:
LimitNPROC=<value>
Then reload the configuration by running the following command, so that the limits are applied on subsequent service restarts:
# systemctl daemon-reload
Here is the list of available process properties.
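Once the service has been restarted, one way to confirm that a limit took effect is to read the kernel's per-process view in `/proc/<pid>/limits`. The following is a minimal sketch: `soft_limit` is a hypothetical helper, and it assumes the standard Linux layout of that file, in which the row labelled "Max processes" corresponds to the limit that LimitNPROC controls.

```shell
# Print the soft limit for a named row of a /proc/<pid>/limits file.
# $1: path to the limits file, $2: row label, e.g. "Max processes"
soft_limit() {
    limits_file="$1"; name="$2"
    awk -v name="$name" 'index($0, name) == 1 {
        # Drop the label; the first remaining field is the soft limit.
        rest = substr($0, length(name) + 1)
        split(rest, f, " ")
        print f[1]
    }' "$limits_file"
}

# Example: soft_limit "/proc/<pid>/limits" "Max processes"
```

Replace `<pid>` with the process ID of the running service whose limits you want to inspect.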
nscd for Kudu
Although not a strict requirement, it's highly recommended that you use nscd to cache both DNS name resolution and static name resolution for Kudu.
Configuring operating system level settings
- Disabling Transparent Hugepages (THP)
- vm.swappiness Linux Kernel Parameter
For setting these configurations, see Disabling Transparent Hugepages (THP) and Setting the vm.swappiness Linux Kernel Parameter.