Upgrade cluster components
Cluster components are application-layer components installed in a cluster, such as prometheus, external-dns, opa and cert-manager.
Components are configured as Terraform modules and are called from the cloud-platform-infrastructure repo with a release tag.
When you start working on any cloud-platform-components upgrade ticket:
- Check which chart versions are available to upgrade to
- Check whether the component is upgradeable to that chart version from the current one (some major versions cannot be skipped)
- Check the release notes of the component for any breaking changes
- Check, and take notes from, any upgrade process documented in the component's original GitHub repository
- Check, and take notes from, the CHANGELOG.md entry for every version between the chart version currently installed in your cluster and the chart version you want to upgrade to
- Review the CHANGELOG notes with another member of the team (check for breaking changes, deprecations, changes to the values file and a suggested plan for upgrading the production clusters)
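As a rough sketch of the version checks above, the helper below narrows a list of chart versions down to the ones whose CHANGELOG entries need reviewing. It assumes GNU sort with version sort (-V) is available. The function name versions_to_review, the jetstack Helm repo alias and the version numbers in the usage comment are illustrative assumptions, not values taken from the Cloud Platform repos.

```shell
# Sketch only: print the versions read from stdin that are newer than the
# current version ($1) and no newer than the target version ($2), oldest first.
# Assumes GNU coreutils `sort -V` (version sort) is available.
versions_to_review() {
  current="$1"
  target="$2"
  while read -r v; do
    [ -n "$v" ] || continue
    [ "$v" != "$current" ] || continue
    # Keep v only if current <= v <= target in version order.
    lowest=$(printf '%s\n%s\n' "$current" "$v" | sort -V | head -n 1)
    highest=$(printf '%s\n%s\n' "$v" "$target" | sort -V | tail -n 1)
    if [ "$lowest" = "$current" ] && [ "$highest" = "$target" ]; then
      printf '%s\n' "$v"
    fi
  done | sort -V
}

# Hypothetical usage (repo alias and versions are assumptions; needs helm and jq):
#   helm repo update
#   helm search repo jetstack/cert-manager --versions -o json \
#     | jq -r '.[].version' \
#     | versions_to_review v1.8.0 v1.10.1
```

The output is only a reading list. Whether a major version can be skipped still has to be confirmed from the component's own upgrade notes.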
Testing the upgrade in a test cluster
Your GPG key must be added to the cloud-platform-infrastructure repo so that you are able to run
You have the AWS CLI profile moj-cp with suitable credentials
You have docker installed
Run a shell in the tools image
The cloud platform tools image has all the software required to update a cluster.
From a local copy of the cloud-platform-infrastructure repo, run the following command:
Authenticate to the test cluster
Create the file ~/.kube/config in your tools-image container by running:
aws eks --region eu-west-2 update-kubeconfig --name <cluster-name>
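A minimal sketch of wrapping the command above so that it fails early on a missing cluster name and then confirms that kubectl can reach the cluster. The function name authenticate_test_cluster and the follow-up kubectl checks are assumptions added for illustration; they are not steps from this runbook.

```shell
# Sketch only: write ~/.kube/config for a test cluster, then confirm that
# kubectl can reach it. The kubectl checks are an assumption, not a runbook step.
authenticate_test_cluster() {
  cluster_name="$1"
  if [ -z "$cluster_name" ]; then
    echo "usage: authenticate_test_cluster <cluster-name>" >&2
    return 1
  fi
  aws eks --region eu-west-2 update-kubeconfig --name "$cluster_name" || return 1
  # Show which context is now active, then list the nodes as a connectivity check.
  kubectl config current-context || return 1
  kubectl get nodes
}

# Usage:
#   authenticate_test_cluster <cluster-name>
```

Checking the active context before running anything else is a cheap way to avoid applying test changes to the wrong cluster.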
Run the integration tests
This will ensure the test cluster does not have any existing issues and is ready to use.
To run Go tests:
Testing the upgrade
Make the required changes to the module. For example, to upgrade cert-manager, change the cert-manager Terraform module. This might include:
- The helm chart version
- Changes to the values file (if needed)
Push the changes to a branch of the module (for example, upgrade).
Update your local copy of the cloud-platform-infrastructure repo to reference that branch. For the cert-manager module, the source would change to:
source = "github.com/ministryofjustice/cloud-platform-terraform-certmanager?ref=upgrade"
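If you prefer not to edit the source line by hand, the change can be sketched with sed as below. The function name set_module_ref and the file path in the usage comment are hypothetical. The only assumption about the file is that the module source is a quoted string ending in ?ref=<something>, as in the line above.

```shell
# Sketch only: rewrite the ?ref= value of one module source in a Terraform file.
# $1 = file to edit
# $2 = module repo name as it appears in the source URL
# $3 = new ref (a branch name while testing, a release tag afterwards)
set_module_ref() {
  file="$1"
  repo="$2"
  ref="$3"
  tmp=$(mktemp) || return 1
  # Replace everything between "<repo>?ref=" and the closing quote.
  if sed -E "s|(${repo}\?ref=)[^\"]*|\1${ref}|" "$file" > "$tmp"; then
    # Copy the content back so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Hypothetical usage (the file path is a placeholder, not a real path):
#   set_module_ref path/to/components.tf cloud-platform-terraform-certmanager upgrade
```

Remember to point the source back at a release tag once testing is finished, so the branch reference never reaches a PR.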
Run terraform plan for the changes, verify that the planned changes are correct, then run terraform apply to apply them.
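One way to make sure that what you apply is exactly what you reviewed is to save the plan to a file and apply that file. The sketch below assumes you are already in the right Terraform directory and workspace. The function name plan_then_apply and the default plan file name upgrade.tfplan are illustrative choices.

```shell
# Sketch only: save the plan to a file, ask for confirmation, then apply
# exactly that saved plan. $1 = plan file name (defaults to upgrade.tfplan).
plan_then_apply() {
  plan_file="${1:-upgrade.tfplan}"
  terraform plan -out="$plan_file" || return 1
  printf 'Apply the plan saved in %s? [y/N] ' "$plan_file"
  read -r answer
  case "$answer" in
    y|Y)
      terraform apply "$plan_file"
      ;;
    *)
      echo "Skipped apply."
      ;;
  esac
}

# Usage:
#   plan_then_apply
```

Applying a saved plan file means Terraform does not re-plan at apply time, so nothing that changed after your review is applied unnoticed.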
Check the "Things to observe when testing the upgrade" section for checks specific to the component
Run the integration tests again
Once testing is complete and the integration tests have passed, create a PR for the team to review and make sure the module unit tests pass. After the PR is approved, merge the changes into the main branch of the module and make a release.
Change the module release tag in the eks/components folder of the cloud-platform-infrastructure repo and raise a PR. Verify the terraform plan output from the cloud-platform-infrastructure plan pipeline and get it reviewed by the team.
Once approved, merge the PR and monitor the cloud-platform-infrastructure apply pipeline while the changes are applied.
Things to observe when testing the upgrade
The following are some general things to check during the upgrade; this is not a complete list.
Performing the upgrade
There is a CLI command to perform the upgrade. First navigate to the cloud-platform-infrastructure repo, then run the following command:
cloud-platform environment bump-module --module <module-name> --module-version <version>
The <module-name> value must be a word that appears in the module source. For example, to upgrade the cert-manager module to 0.5.0, you would run:
cloud-platform environment bump-module --module certmanager --module-version 0.5.0
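Before and after running bump-module, it can help to list the module source lines that contain the word you pass to --module. The helper below is a sketch of that check using grep. It only approximates the matching described above and makes no claim about how the CLI matches internally. The function name list_module_refs is hypothetical.

```shell
# Sketch only: list every module source line under a directory whose source
# contains the given word, together with its current ?ref= value.
# $1 = word passed to --module, $2 = directory to search (defaults to .)
list_module_refs() {
  word="$1"
  dir="${2:-.}"
  grep -rn --include='*.tf' -E "source[[:space:]]*=.*${word}.*\?ref=" "$dir"
}

# Usage, from the cloud-platform-infrastructure repo:
#   list_module_refs certmanager      # before: shows the current tag
#   ... run the bump-module command ...
#   list_module_refs certmanager      # after: should show the new tag
```

If the listing shows more source lines than you expected, the word is too broad and the bump may touch modules you did not intend to change.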
Things to observe for specific components:
cert-manager
- The CRDs for cert-manager are not deleted
- The existing certificates do not change in any way. You can check this by comparing the creation timestamps of the certificates
- You are able to create and validate new certificates
prometheus-operator
- Ensure the correct versions of the prometheus-operator CRDs are updated before upgrading prometheus-operator
- The existing PrometheusRules and ServiceMonitors do not change in any way. You can check this by comparing the creation timestamps of those resources
- You are able to query prometheus for metrics
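The timestamp checks for certificates, PrometheusRules and ServiceMonitors can be sketched as a before-and-after snapshot, as below. This assumes kubectl is already authenticated to the test cluster. The function names and snapshot file names are placeholders chosen for illustration.

```shell
# Sketch only: record namespace, name and creation timestamp for every
# resource of one kind, sorted so that two snapshots can be compared.
# Note: a kubectl failure is not detected here because of the pipe into sort.
snapshot_resource() {
  kind="$1"
  out_file="$2"
  kubectl get "$kind" --all-namespaces --no-headers \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
    | sort > "$out_file"
}

# Succeeds only if the two snapshots are identical; otherwise prints the diff.
snapshots_match() {
  diff "$1" "$2"
}

# Hypothetical usage around an upgrade (file names are placeholders):
#   snapshot_resource certificates before.txt
#   ... upgrade the component ...
#   snapshot_resource certificates after.txt
#   snapshots_match before.txt after.txt && echo "certificates unchanged"
```

A changed timestamp on an existing resource suggests it was deleted and recreated during the upgrade. Extra lines for resources you created deliberately while testing are expected. The same helper can be run with prometheusrules or servicemonitors as the kind.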