How to use the Cloudera Private Cloud Data Services sizing spreadsheet
You can use the sizing spreadsheet to model the hardware requirements for a Cloudera Private Cloud Data Services deployment.
Overview
The Cloudera Private Cloud Data Services Sizing spreadsheet lets you model the quantity and specifications of the worker hosts required in a Cloudera Private Cloud Data Services deployment.
The spreadsheet uses information about the workloads you plan to run and the hardware specifications of your worker nodes to arrive at an approximate number of worker nodes required for your deployment. Because estimating workloads is complex, Cloudera recommends that you review any sizing or purchasing decisions with Cloudera Professional Services before committing to them.
How to access the spreadsheet
You can access the spreadsheet here: Cloudera Private Cloud Data Services Sizing. The file is in Microsoft Excel format. You can open the file in Excel, or upload it to Google Sheets.
There are three tabs in the spreadsheet. Enter your inputs only on the Worker Node Totals tab. Do not modify the following tabs, which contain data used to calculate values in the spreadsheet:
- Component Lookup
- K8s Resources
Workload inputs
The spreadsheet calculates the total amount of vcores, RAM, and storage required based on the information you enter about the combined workloads you intend to deploy. Then, based on the hardware specifications you enter, it calculates the number of worker nodes required, which is displayed in cell E24.
The following sections describe the values you must enter into the spreadsheet: values for each Data Service you intend to deploy, and values for the hardware specifications of your worker nodes.
Cloudera Control Plane monitoring
Label | Cell | Description |
---|---|---|
Cloudera Control Plane Monitoring | B3 | Increment this number by one for each environment. |
Cloudera Data Warehouse
If you will deploy Cloudera Data Warehouse, on the Worker Node Totals tab, enter the following information:
Label | Cell | Description |
---|---|---|
CDW Data Catalog (min 1 per env) | B5 | Enter the number of Data Catalogs you will need in your deployment. You must have at least one Data Catalog. |
CDW LLAP warehouses | B6 | Enter the number of LLAP warehouses you will need for each Virtual Warehouse in your deployment. |
-- LLAP Executors | B7 | Enter the total number of LLAP Executors you will need in your deployment. |
CDW Impala warehouses | B8 | Enter the number of CDW Impala warehouses you will need for each Virtual Warehouse in your deployment. |
-- Impala Coordinators (2 x for HA) | B9 | Enter the number of Impala Warehouses you will need in your deployment. If you have enabled high availability, enter twice the number of Warehouses. |
-- Impala Executors | B10 | Enter the number of Impala Executors you will need in your deployment. |
CDW Cache | B11 | Enter the amount of CDW Cache space for each coordinator and executor (default: 600). |
Data Viz - small instances | B12 | Enter the size selected when creating a Data Visualization instance. |
Data Viz - medium instances | B13 | |
Data Viz - large instances | B14 | |
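The rule for the Impala Coordinators value (cell B9) is simple arithmetic: one coordinator per Impala warehouse, doubled when high availability is enabled. A minimal sketch of that rule follows; the function name and parameters are hypothetical and are not part of the spreadsheet.

```python
def impala_coordinators(impala_warehouses: int, high_availability: bool) -> int:
    """Value to enter in cell B9, following the rule in the table above.

    Hypothetical helper: one coordinator per Impala warehouse,
    doubled when high availability is enabled.
    """
    if impala_warehouses < 0:
        raise ValueError("impala_warehouses must not be negative")
    return impala_warehouses * 2 if high_availability else impala_warehouses


# Example: 3 Impala warehouses with high availability enabled
print(impala_coordinators(3, high_availability=True))
```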
Cloudera AI
Sizing for a Cloudera AI deployment depends on the number of concurrent jobs you expect to run and the number of Workspaces you provision.
Label | Cell | Description |
---|---|---|
CML Workspace (min of 1 ) | B16 | Enter the number of workspaces you need in your deployment. |
-- CML Small concurrent sessions | B17 | Enter the number of concurrent small-sized sessions you intend to run. |
-- CML Average concurrent sessions | B18 | Enter the number of concurrent average-sized sessions you intend to run. |
For more information about sizing the Cloudera AI service, see the following topics:
- Additional resource requirements for Cloudera AI
- (OCP) Cloudera AI requirements
- (Cloudera Embedded Container Service) Cloudera AI requirements
Cloudera Data Engineering
Label | Cell | Description |
---|---|---|
CDE Service (min/max 1 per cluster) | B20 | Enter the number of Cloudera Data Engineering clusters you will need in your deployment. |
CDE Virtual Cluster | B21 | Enter the number of Cloudera Data Engineering Virtual Clusters you will need in your deployment. |
-- CDE Small concurrent jobs | B22 | Enter the number of concurrent small-sized jobs you intend to run. |
-- CDE Average concurrent jobs | B23 | Enter the number of concurrent average-sized jobs you intend to run. |
For more information about sizing the Cloudera Data Engineering service, see Additional resource requirements for Cloudera Data Engineering.
Worker node hardware specifications
Based on the inputs you supplied for your workloads, the spreadsheet totals the vcores, RAM, and storage required for the cluster in cells C20-C26. It then divides each total by the corresponding worker node hardware specification you enter in cells B26-B29 to arrive at the number of nodes required for vcores, RAM, and storage, shown in cells D5-D29. The final number, in cell E27, is the highest of these values.
You may notice that the calculated values in cells D26 and D27 are different. This indicates that some nodes are oversubscribed for RAM or vcores. Adjust the hardware specifications for CPU and RAM until the two cells are closer together in value. Changing these values may also change the calculated number of worker nodes.
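The calculation described above can be sketched in a few lines of Python. This is an illustration of the approach only, not the spreadsheet's actual formulas: the function name, the workload totals, and the per-node storage figure are hypothetical, and rounding each result up to a whole node is an assumption. The 80 vcores and 415 GB RAM per node are the recommended values from the table labels in this section.

```python
import math


def nodes_required(total_vcores, total_ram_gb, total_storage_gb,
                   node_vcores, node_ram_gb, node_storage_gb):
    """Return (per-resource node counts, overall node count).

    Each total is divided by the matching per-node specification and
    rounded up to a whole node (rounding is an assumption). The overall
    count is the highest of the per-resource counts.
    """
    per_resource = {
        "vcores": math.ceil(total_vcores / node_vcores),
        "ram": math.ceil(total_ram_gb / node_ram_gb),
        "storage": math.ceil(total_storage_gb / node_storage_gb),
    }
    return per_resource, max(per_resource.values())


# Hypothetical workload totals; node specs use the recommended
# 80 vcores and 415 GB RAM, plus an assumed 4000 GB of storage per node.
per_resource, nodes = nodes_required(
    total_vcores=1000, total_ram_gb=6000, total_storage_gb=20000,
    node_vcores=80, node_ram_gb=415, node_storage_gb=4000,
)
print(per_resource)  # {'vcores': 13, 'ram': 15, 'storage': 5}
print(nodes)         # 15
```

In this example RAM drives the node count (15) while vcores alone would need 13 nodes. A gap of this kind is what the paragraph above suggests narrowing by adjusting the CPU and RAM specifications.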
Label | Cell | Description |
---|---|---|
CPU recommend 40+ cores (80 vcores) | B27 | Enter the number of vcores for each worker node. |
RAM (GB) recommend 415 GB RAM | B28 | Enter the amount of RAM, in gigabytes, for each worker node. |
Disk (GB) Block (OCP CSI block, Cloudera Embedded Container Service Longhorn) | B29 | Enter the number of gigabytes of Block storage required for the block provider (OpenShift Container Platform: CSI block; Cloudera Embedded Container Service: Longhorn). |
Disk (GB) Fast Cache for Cloudera Data Warehouse (nvme,ssd) | B30 | Enter the number of gigabytes of Fast Cache used in Cloudera Data Warehouse. |
Cloudera Control Plane Block Overhead per host (300 to 1024) | B31 | Enter the Control Plane block overhead. |
NFS (GB) (choose 1 from below) | B33 | Enter the required storage in either cell B34 or cell B35. |
-- Embedded nfs - (subtract from Block provider) non-prod | B34 | Enter the number of gigabytes storage for an embedded NFS. |
-- External nfs | B35 | Enter the number of gigabytes of storage for an External NFS. |
Cloudera Embedded Container Service Master Node (requires 1 for non-HA, 3 for HA)
If you are using the Cloudera Embedded Container Service, you will also need to provision a host for the Cloudera Embedded Container Service Master Node (a node running the ECS Server component). The following values are Cloudera’s recommended specifications for the Cloudera Embedded Container Service Master Node.
Cell | Description |
---|---|
B38 | Minimum: 16 vcores. Recommended: 32 vcores. |
B39 | Minimum: 32 GB RAM. Recommended: 64 GB RAM. |
B40 | Minimum: 300 GB HDD (adequate for a proof-of-concept cluster). Recommended: 1 TB HDD. |
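As a quick way to compare a candidate master host against the minimum and recommended values for cells B38 to B40, a check along the following lines could be used. This is a sketch only: the function, dictionary keys, and result labels are hypothetical, and 1 TB is taken as 1000 GB as an assumption.

```python
# Minimum and recommended values for cells B38-B40 as listed above.
MINIMUM = {"vcores": 16, "ram_gb": 32, "disk_gb": 300}
RECOMMENDED = {"vcores": 32, "ram_gb": 64, "disk_gb": 1000}  # 1 TB taken as 1000 GB (assumption)


def classify_master_host(spec):
    """Classify a host spec dict with keys vcores, ram_gb, and disk_gb.

    Returns "below minimum" if any value is under the minimum,
    "recommended" if every value meets the recommendation,
    and "minimum" otherwise.
    """
    if any(spec[key] < MINIMUM[key] for key in MINIMUM):
        return "below minimum"
    if all(spec[key] >= RECOMMENDED[key] for key in RECOMMENDED):
        return "recommended"
    return "minimum"


# Example: meets the RAM recommendation but only the minimum vcores and disk
print(classify_master_host({"vcores": 16, "ram_gb": 64, "disk_gb": 300}))
```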