Deploy Cloudera using Terraform

This guide demonstrates how to deploy Cloudera on AWS, Azure, or GCP by using one of the Cloudera deployment templates.

The templates use Terraform, an open source Infrastructure as Code (IaC) tool for defining and managing cloud or data center infrastructure. You interact with the templates through a simple configuration file that resides in a GitHub repository.

For an overview of best practices for deploying Cloudera, refer to Creating and managing Cloudera deployments.

Prerequisites

Before deploying Cloudera, you should make sure that your cloud account meets the basic requirements and that you've installed a few prerequisites.

To meet these requirements and install the prerequisites, refer to the prerequisites documentation for your cloud provider.

You should also familiarize yourself with the background information about Cloudera deployment patterns and deployment pattern definitions described in Creating and managing Cloudera deployments.
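As a quick sanity check before you continue, the following minimal sketch reports which command-line tools are missing from your PATH. The function name check_tools is hypothetical, and the tool list in the example is an assumption based on the commands used later in this guide (git and terraform); your cloud provider may need additional tools.

```shell
# Report which of the named command-line tools are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
# Note: check_tools is an illustrative helper, not part of the quickstarts.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumed tool list): check_tools git terraform
```

The function only confirms that each command can be found. It does not check versions or cloud account permissions.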

Next, you can follow the instructions below for deploying Cloudera.

Deploy Cloudera

Setting up a Cloudera deployment involves cloning a GitHub repository, editing the configuration, and running Terraform commands to launch the deployment.

Step 1: Clone the repository

The cdp-tf-quickstarts repository contains Terraform resource files to quickly deploy Cloudera in the cloud, together with the prerequisite cloud resources. It uses the Cloudera Terraform Modules to do this.

Clone this repository and navigate to the local directory with the cloned repository:

git clone https://github.com/cloudera-labs/cdp-tf-quickstarts.git
cd cdp-tf-quickstarts

Step 2: Edit the configuration file for the required cloud provider

In the cloned repository, change to the directory for the required cloud provider (aws, azure, or gcp), rename the configuration template to terraform.tfvars, and edit the input variables as required. Run only the commands for your cloud provider.

For AWS:

cd aws
mv terraform.tfvars.template terraform.tfvars
vi terraform.tfvars

For Azure:

cd azure
mv terraform.tfvars.template terraform.tfvars
vi terraform.tfvars

For GCP:

cd gcp
mv terraform.tfvars.template terraform.tfvars
vi terraform.tfvars
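If you prefer to keep the original template for reference, one possible variation is to copy it instead of renaming it. The sketch below assumes the directory layout described above. The function name prepare_tfvars is hypothetical and not part of the repository.

```shell
# Copy the template to terraform.tfvars inside the given provider directory
# (aws, azure, or gcp). Refuses to overwrite an existing terraform.tfvars.
# Note: prepare_tfvars is an illustrative helper, not part of the quickstarts.
prepare_tfvars() {
  dir="$1"
  if [ ! -f "$dir/terraform.tfvars.template" ]; then
    echo "no template found in $dir" >&2
    return 1
  fi
  if [ -e "$dir/terraform.tfvars" ]; then
    echo "$dir/terraform.tfvars already exists" >&2
    return 1
  fi
  cp "$dir/terraform.tfvars.template" "$dir/terraform.tfvars"
}

# Example, run from the root of the cloned repository: prepare_tfvars aws
```

Copying rather than moving leaves the template in place, so you can compare your edited file against it later.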

The following sample configuration files, one per cloud provider, indicate the values to be changed. The variables are explained after the samples. Review and update all the variables.

For AWS:

# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and Cloudera resources, e.g. cldr1

# ------- Cloud Settings -------
aws_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eu-west-1

# ------- Cloudera Environment Deployment -------
deployment_template = "<ENTER_VALUE>"  # Specify the deployment pattern below. Options are public, semi-private or private

For Azure:

# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and Cloudera resources, e.g. cldr1

# ------- Cloud Settings -------
azure_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eastus

# ------- Cloudera Environment Deployment -------
deployment_template = "<ENTER_VALUE>"  # Specify the deployment pattern below. Options are public, semi-private or private

For GCP:

# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and Cloudera resources, e.g. cldr1

# ------- Cloud Settings -------
gcp_project = "<ENTER_VALUE>" # Change this to specify the GCP Project ID
gcp_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. europe-west2

# ------- Cloudera Environment Deployment -------
deployment_template = "<ENTER_VALUE>"  # Specify the deployment pattern below. Options are public, semi-private or private

As a result of this step, your configuration file should look similar to the following:

For AWS:

# ------- Global settings -------
env_prefix = "test-env" # Required name prefix for cloud and Cloudera resources, e.g. cldr1

# ------- Cloud Settings -------
aws_region = "eu-west-1" # Change this to specify Cloud Provider region, e.g. eu-west-1

# ------- Cloudera Environment Deployment -------
deployment_template = "public"  # Specify the deployment pattern below. Options are public, semi-private or private

For Azure:

# ------- Global settings -------
env_prefix = "test-env" # Required name prefix for cloud and Cloudera resources, e.g. cldr1

# ------- Cloud Settings -------
azure_region = "westeurope" # Change this to specify Cloud Provider region, e.g. eastus

# ------- Cloudera Environment Deployment -------
deployment_template = "public"  # Specify the deployment pattern below. Options are public, semi-private or private

For GCP:

# ------- Global settings -------
env_prefix = "cdp-demo" # Required name prefix for cloud and Cloudera resources, e.g. cldr1

# ------- Cloud Settings -------
gcp_project = "my-gcp-project-id" # Change this to specify the GCP Project ID
gcp_region = "us-central1" # Change this to specify Cloud Provider region, e.g. europe-west2

# ------- Cloudera Environment Deployment -------
deployment_template = "public"  # Specify the deployment pattern below. Options are public, semi-private or private
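Before moving on, it can help to confirm that no <ENTER_VALUE> placeholder is left in the edited file. The following is a minimal sketch of such a check. The function name find_placeholders is hypothetical, and the check looks only for the literal placeholder string shown in the sample above.

```shell
# Print every line of the given file that still contains the <ENTER_VALUE>
# placeholder, prefixed with its line number.
# Returns 0 if none are found, 1 if any remain, 2 if the file is missing.
# Note: find_placeholders is an illustrative helper, not part of the quickstarts.
find_placeholders() {
  if [ ! -f "$1" ]; then
    echo "file not found: $1" >&2
    return 2
  fi
  if grep -nF '<ENTER_VALUE>' "$1"; then
    return 1
  fi
  return 0
}

# Example: find_placeholders terraform.tfvars
```

A clean run prints nothing and returns 0. Any printed line still needs a real value.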

The following tables explain the mandatory inputs that need to be provided in the configuration file.

Table 1: Mandatory inputs

For AWS:

| Input | Description | Default value |
|---|---|---|
| env_prefix | A string prefix that will be used to name the cloud provider and Cloudera resources created. | Not set |
| aws_region | The AWS region in which the cloud prerequisites and Cloudera will be deployed. For example, eu-west-1. For a list of supported AWS regions, see Supported AWS regions. | Not set |
| deployment_template | The selected deployment pattern. Allowed values: private, semi-private, and public. | public |

For Azure:

| Input | Description | Default value |
|---|---|---|
| env_prefix | A string prefix that will be used to name the cloud provider and Cloudera resources created. | Not set |
| azure_region | The Azure region in which the cloud prerequisites and Cloudera will be deployed. For example, eastus. For a list of supported Azure regions, see Supported Azure regions. | Not set |
| deployment_template | The selected deployment pattern. Allowed values: private, semi-private, and public. | public |

For GCP:

| Input | Description | Default value |
|---|---|---|
| gcp_region | The GCP region in which the cloud prerequisites and Cloudera will be deployed. For example, europe-west2. For a list of supported GCP regions, see Supported GCP regions. | Not set |
| gcp_project | The GCP project ID that will be used for Cloudera. | Not set |
| env_prefix | A string prefix that is used to name the created cloud provider and Cloudera resources. | Not set |
| deployment_template | The selected deployment pattern. Allowed values: private, semi-private, and public. | public |
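Because deployment_template accepts only the three values listed above, a small pre-flight check can catch typos before Terraform runs. The sketch below is one way to do it. Both function names are hypothetical, the check assumes the value is written in double quotes on a single line as in the sample configuration, and it treats the comparison as case-sensitive, which is an assumption.

```shell
# Print the value assigned to deployment_template in the given tfvars file.
# Assumes the form: deployment_template = "value"  # optional comment
template_from_tfvars() {
  sed -n 's/^deployment_template[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Return 0 if the argument is one of the allowed deployment patterns.
is_valid_template() {
  case "$1" in
    public|semi-private|private) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   is_valid_template "$(template_from_tfvars terraform.tfvars)" \
#     || echo "unexpected deployment_template value"
```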

The following tables explain the optional inputs that can be added to the configuration file. While the mandatory input attributes are included in the configuration file and only their values need to be entered, optional attributes and values must be added manually.

Table 2: Optional inputs

For AWS:

| Input | Description | Default value |
|---|---|---|
| aws_key_pair | The name of an AWS key pair that exists in your account in the selected region. | Not set |
| ingress_extra_cidrs_and_ports | Inbound access to the UI and API endpoints of your deployment will be allowed from the CIDRs (IP ranges) and ports specified here. Enter your machine’s public IP here, with ports 443 and 22. If you are unsure, look up your public IP address first. | CIDRs are not set. Ports are set to 443 and 22 by default. |
| create_vpc | Flag to specify if the VPC should be created. | true |
| cdp_vpc_id | VPC ID for the Cloudera environment. Required if create_vpc is false. | Empty string |
| cdp_public_subnet_ids | List of public subnet IDs. Required if create_vpc is false. Can be an empty list depending on deployment_template. | Empty list |
| cdp_private_subnet_ids | List of private subnet IDs. Required if create_vpc is false. | Empty list |
| private_network_extensions | Enable creation of resources for connectivity to the Cloudera Control Plane (public subnet and NAT Gateway) for a private deployment. Only relevant for the private deployment template. | true |
| env_tags | Environment-level tags for your resources, such as owner, project, and end date. For more information about custom tags, see the Defining custom tags documentation. | Not set |

Using the owner, project, and end date example, define the environment-level tags as follows:

env_tags = {
  owner   = "<ENTER_VALUE>"
  project = "<ENTER_VALUE>"
  enddate = "<ENTER_VALUE>"
}

For Azure:

| Input | Description | Default value |
|---|---|---|
| public_key_text | An SSH public key string to be used for the nodes of the Cloudera environment. | Not set |
| ingress_extra_cidrs_and_ports | Inbound access to the UI and API endpoints of your deployment will be allowed from the CIDRs (IP ranges) and ports specified here. Enter your machine’s public IP here, with ports 443 and 22. If you are unsure, look up your public IP address first. | CIDRs are not set. Ports are set to 443 and 22 by default. |
| create_vnet | Flag to specify if the VNet should be created. | true |
| cdp_resourcegroup_name | Preexisting Azure resource group for the Cloudera environment. Required if create_vnet is false. | Empty string |
| cdp_vnet_name | Preexisting VNet name for the Cloudera environment. Required if create_vnet is false. | Empty string |
| cdp_subnet_names | List of preexisting subnet names for Cloudera resources. Required if create_vnet is false. | Empty list |
| cdp_gw_subnet_names | List of preexisting subnet names for the Cloudera Gateway. Required if create_vnet is false. Can be an empty list depending on deployment_template. | Empty list |
| cdp_delegated_subnet_names | List of preexisting subnet names for Postgres flexible servers. Can be an empty list depending on deployment_template. | Empty list |
| env_tags | Environment-level tags for your resources, such as owner, project, and end date. For more information about custom tags, see the Defining custom tags documentation. | Not set |

Using the owner, project, and end date example, define the environment-level tags as follows:

env_tags = {
  owner   = "<ENTER_VALUE>"
  project = "<ENTER_VALUE>"
  enddate = "<ENTER_VALUE>"
}

For GCP:

| Input | Description | Default value |
|---|---|---|
| create_vpc | Flag to specify if the VPC should be created. | true |
| cdp_vpc_name | VPC name for the Cloudera environment. Required if create_vpc is false. | Empty string |
| cdp_subnet_names | List of subnet names for Cloudera resources. Required if create_vpc is false. | Empty list |
| public_key_text | An SSH public key string used for the Cloudera environment nodes. If not specified, an SSH key pair is generated as part of the code. | Not set |
| ingress_extra_cidrs_and_ports | Inbound access to the UI and API endpoints of your deployment will be allowed from the CIDRs (IP ranges) and ports specified here. Enter your machine’s public IP here, with ports 443 and 22. If you are unsure, look up your public IP address first. If not specified, the public IP of the machine where the Terraform code is executed is looked up. | CIDRs are not set. Ports are set to 443 and 22 by default. |
| env_tags | Environment-level tags for your resources, such as owner, project, and end date. For more information about custom tags, see the Defining custom tags documentation. | Not set |

Using the owner, project, and end date example, define the environment-level tags as follows:

env_tags = {
  owner   = "<ENTER_VALUE>"
  project = "<ENTER_VALUE>"
  enddate = "<ENTER_VALUE>"
}
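Because optional attributes are not present in the template, they have to be appended to terraform.tfvars by hand or by script. As an illustration, the following sketch appends an env_tags block in the format shown above. The function name append_env_tags and the example values are hypothetical, and the date format in the example is an assumption.

```shell
# Append an env_tags block to the given tfvars file.
# Arguments: file, owner, project, end date.
# Note: append_env_tags is an illustrative helper, not part of the quickstarts.
append_env_tags() {
  cat >> "$1" <<EOF

env_tags = {
  owner   = "$2"
  project = "$3"
  enddate = "$4"
}
EOF
}

# Example (hypothetical values):
#   append_env_tags terraform.tfvars jdoe analytics-poc 2030-01-31
```

The function does not check whether the file already defines env_tags, so run it only once per file.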

Step 3: Launch the deployment

Run the Terraform commands to initialize the working directory and launch the deployment:

terraform init
terraform apply

Terraform displays a plan with the list of cloud provider and Cloudera resources that will be created.

When prompted, type yes to instruct Terraform to perform the deployment. The deployment typically takes about 60 minutes. Once it is complete, Terraform prints output similar to the following:

Apply complete! Resources: 46 added, 0 changed, 0 destroyed.

You can navigate to the Cloudera web interface at https://cdp.cloudera.com/ and see your deployment progress. Once the deployment completes, you can create Cloudera Data Hub clusters and data services.
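As a variation on the launch commands in this step, you may want to check the configuration and save the plan before applying it. The sketch below uses the standard terraform validate, terraform plan -out, and terraform apply subcommands. The function name launch_deployment, the TF_BIN variable, and the plan file name tfplan are hypothetical choices made for this example.

```shell
# Path to the Terraform binary. TF_BIN is a hypothetical override that makes
# the function easy to dry-run with a stand-in command.
TF_BIN="${TF_BIN:-terraform}"

# Initialize, validate, save a plan, and apply exactly that saved plan.
# Stops at the first step that fails.
launch_deployment() {
  "$TF_BIN" init &&
  "$TF_BIN" validate &&
  "$TF_BIN" plan -out=tfplan &&
  "$TF_BIN" apply tfplan
}

# Example, run from the cloud provider directory: launch_deployment
```

Note that applying a saved plan file does not ask for the interactive yes confirmation, so review the plan output before the apply step runs, or run the four commands one at a time.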

Clean up the Cloudera environment and infrastructure

If you no longer need the infrastructure provisioned by Terraform, run the following command to remove the deployment infrastructure and terminate all resources:

terraform destroy
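If you would like to see what will be removed before you commit to it, Terraform can produce a destroy plan first. The following sketch previews the teardown with terraform plan -destroy and then runs terraform destroy, which still asks for confirmation. The function name teardown_deployment and the TF_BIN variable are hypothetical.

```shell
# Path to the Terraform binary. TF_BIN is a hypothetical override that makes
# the function easy to dry-run with a stand-in command.
TF_BIN="${TF_BIN:-terraform}"

# Show the resources that would be destroyed, then start the destroy run.
# The destroy step is skipped if the preview fails.
teardown_deployment() {
  "$TF_BIN" plan -destroy || return 1
  "$TF_BIN" destroy
}

# Example, run from the cloud provider directory: teardown_deployment
```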