Cloudera Data Engineering hardware requirements
Review the requirements needed to get started with the Cloudera Data Engineering service on Red Hat OpenShift.
Requirements
- Cloudera Data Engineering assumes it has cluster-admin privileges on the OpenShift cluster.
- The OpenShift cluster must be configured with the route admission policy set to
namespaceOwnership: InterNamespaceAllowed. This allows the OpenShift cluster
to run applications in multiple namespaces with the same domain name. The
following is an example command:
oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission": {"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge
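After applying the patch, you can check the current value with a standard oc query such as the following sketch (it assumes the default ingress controller, as in the patch command above); it should print InterNamespaceAllowed:
oc -n openshift-ingress-operator get ingresscontroller/default -o jsonpath='{.spec.routeAdmission.namespaceOwnership}'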
- Cloudera Data Engineering Service requirements: Overall, a Cloudera Data Engineering service requires 110 GB of Block PV or NFS PV, 9 CPU cores,
and 18 GB of memory. The following are the Cloudera Data Engineering Service
requirements:

Table 1. Cloudera Data Engineering Service requirements

Component                | vCPU   | Memory   | Block PV or NFS PV | Number of replicas
Embedded DB              | 4      | 8 GB     | 100 GB             | 1
Admission Controller     | 250 m  | 512 MB   | --                 | 1
Config Manager           | 500 m  | 1 GB     | --                 | 2
Dex Downloads            | 250 m  | 512 MB   | --                 | 1
Knox                     | 250 m  | 1 GB     | --                 | 1
Management API           | 1      | 2 GB     | --                 | 1
NGINX Ingress Controller | 100 m  | 90 MB    | --                 | 1
Tgt Generator            | 350 m  | 630 MB   | --                 | 1
FluentD Forwarder        | 250 m  | 512 MB   | --                 | 1 to 5
Grafana                  | 250 m  | 512 MB   | 10 GB              | 1
Keytab Management        | 250 m  | 512 MB   | --                 | 1
Data Connector           | 250 m  | 512 MB   | --                 | 1
Total                    | 8600 m | 17.71 GB | 110 GB             |

- Cloudera Data Engineering Virtual Cluster requirements:
- For Spark 3: Overall, 400 GB of Block PV or Shared Storage PV, 5.35 CPU cores, and 15.6 GB of memory per virtual cluster.
- For Spark 2: If you are using Spark 2, you need an additional 500 m CPU, 4.5 GB of memory, and 100 GB of storage; that is, overall, 500 GB of Block PV or Shared Storage PV, 5.85 CPU cores, and 20.1 GB of memory per virtual cluster.
Table 2. Cloudera Data Engineering Virtual Cluster requirements for Spark 3

Component         | vCPU   | Memory  | Block PV or NFS PV | Number of replicas
Airflow API       | 350 m  | 612 MB  | 100 GB             | 1
Airflow Scheduler | 1      | 1 GB    | 100 GB             | 1
Airflow Web       | 250 m  | 512 MB  | --                 | 1
Runtime API       | 250 m  | 512 MB  | 100 GB             | 1
Livy              | 3      | 12 GB   | 100 GB             | 1
SHS               | 250 m  | 1 GB    | --                 | 1
Pipelines         | 250 m  | 512 MB  | --                 | 1
Total             | 5350 m | 16.1 GB | 400 GB             |

- Workloads: Depending upon the workload, you must configure resources accordingly (see the sizing sketch after this list).
- The Spark Driver container uses resources based on the configured driver cores and driver memory, plus an additional 40% memory overhead.
- In addition to this, the Spark Driver uses 110 m CPU and 232 MB of memory for the sidecar container.
- The Spark Executor container uses resources based on the configured executor cores and executor memory, plus an additional 40% memory overhead.
- In addition to this, the Spark Executor uses 10 m CPU and 32 MB of memory for the sidecar container.
- Minimal Airflow jobs need 100 m CPU and 200 MB of memory per Airflow worker.
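As a rough illustration of the sizing rules above, the following shell sketch computes per-pod requests for one Spark Driver and one Spark Executor. The driver and executor cores and memory values are example inputs only; the 40% memory overhead and the sidecar figures are the ones listed on this page.

# Example inputs (placeholders, adjust to your Spark configuration)
DRIVER_CORES=2            # configured driver cores
DRIVER_MEMORY_MB=4096     # configured driver memory, in MB
EXECUTOR_CORES=4          # configured executor cores
EXECUTOR_MEMORY_MB=8192   # configured executor memory, in MB

# Driver pod: configured resources + 40% memory overhead + 110 m / 232 MB sidecar
DRIVER_POD_CPU_M=$(( DRIVER_CORES * 1000 + 110 ))
DRIVER_POD_MEM_MB=$(( DRIVER_MEMORY_MB * 140 / 100 + 232 ))

# Executor pod: configured resources + 40% memory overhead + 10 m / 32 MB sidecar
EXECUTOR_POD_CPU_M=$(( EXECUTOR_CORES * 1000 + 10 ))
EXECUTOR_POD_MEM_MB=$(( EXECUTOR_MEMORY_MB * 140 / 100 + 32 ))

echo "Driver pod request:   ${DRIVER_POD_CPU_M} m CPU, ${DRIVER_POD_MEM_MB} MB memory"
echo "Executor pod request: ${EXECUTOR_POD_CPU_M} m CPU, ${EXECUTOR_POD_MEM_MB} MB memory"

With the example values above, the driver pod works out to roughly 2110 m CPU and about 5966 MB of memory, and the executor pod to roughly 4010 m CPU and about 11500 MB of memory; multiply by the number of concurrent drivers and executors when planning workload capacity.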