Cloudera Data Engineering hardware requirements
Review the requirements needed to get started with the Cloudera Data Engineering service on Red Hat OpenShift.
Requirements
- Cloudera Data Engineering assumes it has cluster-admin privileges on the OpenShift cluster.
- The OpenShift cluster must be configured with the route admission policy set to
namespaceOwnership: InterNamespaceAllowed. This allows the OpenShift cluster
to run applications in multiple namespaces with the same domain name. The
following is an example command:
oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission": {"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge
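After applying the patch, you can check the current value with a standard oc query such as the following sketch (it assumes the default ingress controller, as in the patch command above); it should print InterNamespaceAllowed:
oc -n openshift-ingress-operator get ingresscontroller/default -o jsonpath='{.spec.routeAdmission.namespaceOwnership}'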
- Cloudera Data Engineering Service requirements: Overall, a Cloudera Data Engineering service requires 110 GB of Block PV or NFS PV, 9 CPU cores,
and 18 GB of memory. The following are the Cloudera Data Engineering Service
requirements:

Table 1. Cloudera Data Engineering Service requirements

Component                | vCPU   | Memory   | Block PV or NFS PV | Number of replicas
Embedded DB              | 4      | 8 GB     | 100 GB             | 1
Admission Controller     | 250 m  | 512 MB   | --                 | 1
Config Manager           | 500 m  | 1 GB     | --                 | 2
Dex Downloads            | 250 m  | 512 MB   | --                 | 1
Knox                     | 250 m  | 1 GB     | --                 | 1
Management API           | 1      | 2 GB     | --                 | 1
NGINX Ingress Controller | 100 m  | 90 MB    | --                 | 1
Tgt Generator            | 350 m  | 630 MB   | --                 | 1
FluentD Forwarder        | 250 m  | 512 MB   | --                 | 1 to 5
Grafana                  | 250 m  | 512 MB   | 10 GB              | 1
Keytab Management        | 250 m  | 512 MB   | --                 | 1
Data Connector           | 250 m  | 512 MB   | --                 | 1
Total                    | 8600 m | 17.71 GB | 110 GB             |

- Cloudera Data Engineering Virtual Cluster requirements:
- For Spark 3: Overall, 400 GB of Block PV or Shared Storage PV, 5.35 CPU cores, and 15.6 GB of memory per virtual cluster.
- For Spark 2: If you are using Spark 2, you need an additional 500 m CPU, 4.5 GB of memory, and 100 GB of storage; that is, overall, 500 GB of Block PV or Shared Storage PV, 5.85 CPU cores, and 20.1 GB of memory per virtual cluster.
Table 2. Cloudera Data Engineering Virtual Cluster requirements for Spark 3

Component         | vCPU   | Memory  | Block PV or NFS PV | Number of replicas
Airflow API       | 350 m  | 612 MB  | 100 GB             | 1
Airflow Scheduler | 1      | 1 GB    | 100 GB             | 1
Airflow Web       | 250 m  | 512 MB  | --                 | 1
Runtime API       | 250 m  | 512 MB  | 100 GB             | 1
Livy              | 3      | 12 GB   | 100 GB             | 1
SHS               | 250 m  | 1 GB    | --                 | 1
Pipelines         | 250 m  | 512 MB  | --                 | 1
Total             | 5350 m | 16.1 GB | 400 GB             |

- Workloads: Depending upon the workload, you must configure resources accordingly (see the sizing sketch after this list).
- The Spark Driver container uses resources based on the configured driver cores and driver memory, plus an additional 40% memory overhead.
- In addition to this, the Spark Driver uses 110 m CPU and 232 MB of memory for the sidecar container.
- The Spark Executor container uses resources based on the configured executor cores and executor memory, plus an additional 40% memory overhead.
- In addition to this, the Spark Executor uses 10 m CPU and 32 MB of memory for the sidecar container.
- Minimal Airflow jobs need 100 m CPU and 200 MB of memory per Airflow worker.
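As a rough illustration of the sizing rules above, the following shell sketch computes per-pod requests for one Spark Driver and one Spark Executor. The driver and executor cores and memory values are example inputs only; the 40% memory overhead and the sidecar figures are the ones listed on this page.

# Example inputs (placeholders, adjust to your Spark configuration)
DRIVER_CORES=2            # configured driver cores
DRIVER_MEMORY_MB=4096     # configured driver memory, in MB
EXECUTOR_CORES=4          # configured executor cores
EXECUTOR_MEMORY_MB=8192   # configured executor memory, in MB

# Driver pod: configured resources + 40% memory overhead + 110 m / 232 MB sidecar
DRIVER_POD_CPU_M=$(( DRIVER_CORES * 1000 + 110 ))
DRIVER_POD_MEM_MB=$(( DRIVER_MEMORY_MB * 140 / 100 + 232 ))

# Executor pod: configured resources + 40% memory overhead + 10 m / 32 MB sidecar
EXECUTOR_POD_CPU_M=$(( EXECUTOR_CORES * 1000 + 10 ))
EXECUTOR_POD_MEM_MB=$(( EXECUTOR_MEMORY_MB * 140 / 100 + 32 ))

echo "Driver pod request:   ${DRIVER_POD_CPU_M} m CPU, ${DRIVER_POD_MEM_MB} MB memory"
echo "Executor pod request: ${EXECUTOR_POD_CPU_M} m CPU, ${EXECUTOR_POD_MEM_MB} MB memory"

With the example values above, the driver pod works out to roughly 2110 m CPU and about 5966 MB of memory, and the executor pod to roughly 4010 m CPU and about 11500 MB of memory; multiply by the number of concurrent drivers and executors when planning workload capacity.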