After you choose the upgrade version, a set of pre-upgrade checks runs to
verify that your cluster is ready for the upgrade.
In the Cloudera Manager UI, under Getting Started for Upgrade Step in the
Upgrade wizard, select the repository URL for upgrade. A
pre-upgrade checklist will appear that verifies that the hosts and services in your
cluster are ready for the upgrade.
Click Download Upgrade Validator.
This downloads the upgrade validator onto all your ECS hosts. The validator
is needed to run the Control Plane Health Check and the Docker Registry
Health Check.
Once the Download Upgrade Validator step completes, the Control
Plane Health Check and Docker Registry Health Check run
automatically.
Here is the pre-upgrade checklist:
Checklist
Description
Hosts check
This verifies the host health status and runs the host prerequisite
inspections and host warning inspections.
Host Health Status - This check verifies that there are no
hosts in bad or concerning health. It also checks for any
stopped roles on the hosts.
Host Prerequisites Inspections - These are host inspections
that must pass in order for you to proceed to upgrade. Currently the
prerequisite inspection includes:
EcsHostDnsInspection - Checks that there
are fewer than three nameserver entries in the
/etc/resolv.conf file, and checks
the connections to the Cloudera Manager cluster and the CDP
console. It also checks whether a ping to
vault.localhost.localdomain
can be resolved. If not, it is likely that the host
/etc/nsswitch.conf file is
misconfigured.
If this inspection fails:
Check the /etc/resolv.conf
and /etc/nsswitch.conf files.
Ensure that
/etc/resolv.conf does not
contain three or more nameservers, and that
/etc/nsswitch.conf
contains myhostname under the
hosts field.
Check that the connections resolve
correctly. If the connection to the CDP console
fails, check whether your DNS wildcard is
configured properly.
Host Warning Inspections - These are host inspections that
are used to detect potential factors that can cause issues during an
upgrade. Currently the warning inspections include:
SecuritySoftwareInspection - Checks to make sure that
there are no security software processes running on the
hosts in the cluster.
Upgrade Storage Inspection - Checks that
there is at least 100 GB of free space under
/var/lib/ and 200 GB of free space
under the Docker data directory.
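The host inspections described above can be approximated by hand when you want to see why a host fails. The following is a minimal sketch, not the validator's actual implementation: the function names are hypothetical, the thresholds are taken from the text above, and it assumes that 1 GB means 1024^3 bytes.

```python
import shutil

# Assumption: the inspection counts 1 GB as 1024**3 bytes.
GB = 1024 ** 3


def count_nameservers(resolv_conf_text):
    """Count active nameserver lines in the content of resolv.conf."""
    count = 0
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Commented-out lines start with "#" or ";" and are skipped here
        # because their first field is not exactly "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            count += 1
    return count


def hosts_sources(nsswitch_text):
    """Return the sources on the hosts line of nsswitch.conf content."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def has_free_space(path, required_gb, disk_usage=shutil.disk_usage):
    """True if the filesystem holding path has at least required_gb free."""
    return disk_usage(path).free >= required_gb * GB
```

For example, a host would be expected to pass when count_nameservers(open("/etc/resolv.conf").read()) is less than 3, "myhostname" is in hosts_sources(open("/etc/nsswitch.conf").read()), and has_free_space("/var/lib", 100) is true. The same helper can be pointed at your Docker data directory with a 200 GB threshold; that path depends on your installation.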
Services Health Check
This verifies that there are no services in bad or concerning
health.
Download Upgrade Validator
This downloads the upgrade validator, which is used to run the Control
Plane and Docker Registry health checks, onto all the hosts in the
cluster.
Control Plane Health Check
This verifies the control plane is in a healthy state before upgrade.
Here is the list of things it checks:
Longhorn Health Check: This verifies that all the
Longhorn volumes are in a healthy, robust state. It also verifies
that PVCs are bound.
Longhorn Engine Check: This verifies that the
Longhorn engine version matches the current Longhorn manager
version.
RKE2 Health Check: This verifies that the
Kubernetes API server is reachable and the nodes are in a Ready and
Schedulable state.
Pod Readiness Health Check: This verifies that all
the pods in the kube-system, longhorn-system, vault-system,
yunikorn, k8tz, ecs-webhooks, and cdp namespaces are in a Ready
state.
Vault Health Check: This verifies that the vault-0
pod is running and the vault is unsealed.
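If the Pod Readiness Health Check fails, it can help to list the pods that are not Ready in each namespace yourself. The sketch below is an illustration under stated assumptions rather than the validator's own logic: unready_pods is a hypothetical helper, it expects the parsed JSON output of `kubectl get pods -n <namespace> -o json`, and it assumes that pods in the Succeeded phase (completed jobs) should be ignored.

```python
# Namespaces named in the Pod Readiness Health Check description.
NAMESPACES = ["kube-system", "longhorn-system", "vault-system",
              "yunikorn", "k8tz", "ecs-webhooks", "cdp"]


def unready_pods(pod_list):
    """Return the names of pods whose Ready condition is not "True".

    pod_list is a parsed Kubernetes PodList (a dict with an "items" list).
    """
    not_ready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Assumption: completed pods are not expected to be Ready.
        if status.get("phase") == "Succeeded":
            continue
        conditions = status.get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready
```

To use it, run the kubectl command for each entry in NAMESPACES, parse the output with json.loads, and pass the result to unready_pods. An empty list for every namespace is consistent with what this check looks for.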
Docker Registry Health Check
This verifies that the selected Docker registry is ready for upgrade:
It verifies the connection to the Docker registry by
pulling an image.
For custom registry setups, it also verifies that the
newly required images listed in manifest.json are present in your
registry before upgrade.