Additional resource requirements for Cloudera Data Engineering
For standalone Cloudera Data Engineering, Cloudera recommends three nodes (one master and two workers) with the following minimum memory, storage, and hardware requirements for each node:
| Component | Minimum | Recommended |
|---|---|---|
| Node Count | 2 | 4 |
| CPU | 24 cores for the CDE workspace (base and virtual cluster) and 12 cores for workloads | 24 cores for the CDE workspace (base and virtual cluster) and 32 cores for workloads (extendable based on workload size) |
| Memory | 64 GB for the CDE workspace (base and virtual cluster) and 32 GB for workloads (extendable based on workload size) | 64 GB for the CDE workspace (base and virtual cluster) and 64 GB for workloads (extendable based on workload size) |
| Storage | 700 GB block storage | 700 GB block storage |
| Network Bandwidth | 1 GB/s to all nodes and the base cluster | 10 GB/s to all nodes and the base cluster |
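For a quick capacity estimate, the short sketch below (illustrative only, not a Cloudera tool) totals the Recommended column across a node count; it assumes the table lists per-node figures, as the introduction above states.

```python
# Illustrative capacity check: totals the per-node "Recommended" figures
# from the table above for a given node count. Adjust the workload cores
# and memory for your own sizing; these are the documented starting points.

NODES = 4                    # recommended node count from the table
CPU_PER_NODE = 24 + 32       # CDE workspace cores + workload cores
MEM_PER_NODE_GB = 64 + 64    # CDE workspace memory + workload memory (GB)
STORAGE_PER_NODE_GB = 700    # block storage per node (GB)

print(f"Total vCPU:    {NODES * CPU_PER_NODE} cores")
print(f"Total memory:  {NODES * MEM_PER_NODE_GB} GB")
print(f"Total storage: {NODES * STORAGE_PER_NODE_GB} GB")
```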
Cloudera Data Engineering Service and Virtual Cluster requirements
- Cloudera Data Engineering Service requirements: Overall, a Cloudera Data Engineering service requires 110 GB Block PV or NFS PV, 10 CPU cores, and 30 GB memory.
Table 1. The following are the Cloudera Data Engineering Service requirements:

| Component | vCPU | Memory | Block PV or NFS PV | Number of replicas |
|---|---|---|---|---|
| Embedded DB | 4100 m | 9 GB | 100 GB | 1 |
| Admission Controller | 250 m | 512 MB | -- | 1 |
| Config Manager | 500 m | 1 GB | -- | 2 |
| Authz | 1100 m | 2 GB | -- | 1 |
| Dex Downloads | 350 m | 1.5 GB | -- | 1 |
| Knox | 350 m | 2 GB | -- | 1 |
| Management API | 1100 m | 3 GB | -- | 1 |
| NGINX Ingress Controller | 200 m | 1114 MB | -- | 1 |
| Tgt Generator | 100 m | 1 GB | -- | 1 |
| FluentD Forwarder | 250 m | 512 MB | -- | 1 to 5 |
| Grafana | 350 m | 1.5 GB | 10 GB | 1 |
| Keytab Management | 350 m | 512 MB | -- | 1 |
| Data Connector | 350 m | 1.5 GB | -- | 1 |
| Total | 9350 m | 28.71 GB | 110 GB | |
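As a planning aid, here is a minimal Python sketch (not part of Cloudera tooling) that tallies the Table 1 requests, scaling each component by its replica count; the FluentD Forwarder replica count (1 to 5) is a parameter. The printed totals are estimates and may differ slightly from the Total row above, which is the authoritative figure.

```python
# Illustrative sketch: estimates the cluster resources the CDE service pods
# request, by scaling each component's per-replica request from Table 1 by
# its replica count. PV sizes are summed once per component.

fluentd_replicas = 1  # FluentD Forwarder runs 1 to 5 replicas

# (component, vCPU millicores, memory MB, block/NFS PV GB, replicas)
components = [
    ("Embedded DB",              4100, 9 * 1024, 100, 1),
    ("Admission Controller",      250, 512,        0, 1),
    ("Config Manager",            500, 1024,       0, 2),
    ("Authz",                    1100, 2 * 1024,   0, 1),
    ("Dex Downloads",             350, 1536,       0, 1),
    ("Knox",                      350, 2 * 1024,   0, 1),
    ("Management API",           1100, 3 * 1024,   0, 1),
    ("NGINX Ingress Controller",  200, 1114,       0, 1),
    ("Tgt Generator",             100, 1024,       0, 1),
    ("FluentD Forwarder",         250, 512,        0, fluentd_replicas),
    ("Grafana",                   350, 1536,      10, 1),
    ("Keytab Management",         350, 512,        0, 1),
    ("Data Connector",            350, 1536,       0, 1),
]

cpu_m  = sum(c * r for _, c, _, _, r in components)
mem_mb = sum(m * r for _, _, m, _, r in components)
pv_gb  = sum(p for _, _, _, p, _ in components)

print(f"vCPU:   {cpu_m} m (~{cpu_m / 1000:.1f} cores)")
print(f"Memory: {mem_mb / 1024:.1f} GB")
print(f"PV:     {pv_gb} GB")
```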
- Cloudera Data Engineering Virtual Cluster requirements:
  - For Spark 3: overall, 400 GB Block PV or Shared Storage PV, 7 CPU cores, and 26 GB memory per virtual cluster.
  - For Spark 2: an additional 600 m CPU, 5.5 GB memory, and 100 GB storage are required; that is, overall, 500 GB Block PV or Shared Storage PV, 8 CPU cores, and 32 GB memory per virtual cluster.
Table 2. The following are the Cloudera Data Engineering Virtual Cluster requirements for Spark 3:

| Component | vCPU | Memory | Block PV or NFS PV | Number of replicas |
|---|---|---|---|---|
| Airflow API | 450 m | 1636 MB | 100 GB | 1 |
| Airflow Scheduler | 1100 m | 2560 MB | 100 GB | 1 |
| Airflow Web | 350 m | 1.5 GB | -- | 1 |
| Runtime API | 750 m | 1.5 GB | 100 GB | 1 |
| Livy | 3100 m | 14 GB | 100 GB | 1 |
| SHS | 350 m | 1.5 GB | -- | 1 |
| Pipelines | 350 m | 1.5 GB | -- | 1 |
| Total | 6450 m | 25.1 GB | 400 GB | |

- Workloads: Configure resources according to the workload size (see the sizing sketch after this list):
  - The Spark Driver container uses resources based on the configured driver cores and driver memory, plus an additional 40% memory overhead.
  - In addition, the Spark Driver uses 110 m CPU and 232 MB memory for its sidecar container.
  - The Spark Executor container uses resources based on the configured executor cores and executor memory, plus an additional 40% memory overhead.
  - In addition, the Spark Executor uses 10 m CPU and 32 MB memory for its sidecar container.
  - A minimal Airflow job needs 200 m CPU and 328 MB memory per Airflow worker.
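To make this arithmetic concrete, the minimal Python sketch below (illustrative, not a Cloudera tool) estimates a Spark job's total resource request from its configured driver and executor settings, applying the 40% memory overhead and sidecar figures listed above. The example core and memory values are hypothetical and stand in for the standard Spark properties spark.driver.cores/spark.driver.memory and spark.executor.cores/spark.executor.memory.

```python
# Illustrative sizing sketch based on the figures above: configured
# cores/memory plus 40% memory overhead, plus the fixed sidecar containers.

DRIVER_SIDECAR   = (110, 232)   # (vCPU millicores, memory MB) from the list above
EXECUTOR_SIDECAR = (10, 32)
MEMORY_OVERHEAD  = 0.40         # 40% on top of configured memory

def pod_footprint(cores, memory_mb, sidecar):
    """Return (vCPU millicores, memory MB) for one driver or executor pod."""
    side_cpu, side_mem = sidecar
    return (cores * 1000 + side_cpu,
            memory_mb * (1 + MEMORY_OVERHEAD) + side_mem)

# Hypothetical example: a driver with 2 cores / 4 GB and four executors
# with 4 cores / 8 GB each.
drv_cpu, drv_mem = pod_footprint(2, 4096, DRIVER_SIDECAR)
exe_cpu, exe_mem = pod_footprint(4, 8192, EXECUTOR_SIDECAR)
executors = 4

total_cpu = drv_cpu + executors * exe_cpu
total_mem = drv_mem + executors * exe_mem
print(f"Driver pod:   {drv_cpu} m, {drv_mem:.0f} MB")
print(f"Executor pod: {exe_cpu} m, {exe_mem:.0f} MB")
print(f"Job total:    {total_cpu} m, {total_mem / 1024:.1f} GB")
```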