Skip to main content

Upgrade cluster components

Cluster components are application layer components that are installed in a cluster such as prometheus, external-dns, opa, certmanager etc.

Components are configured as terraform modules and are called from cloud-platform-infrastructure repo with a release tag.

Planning

When you start working on upgrading any cloud-platform-components ticket:

  • Check which chart versions are available to upgrade to
  • Check whether the component is upgradeable to that chart version from the current one (some major versions cannot be skipped)
  • Check the release notes of the component for any breaking changes
  • Check and add notes from the upgrading process mentioned in the original github repository related the component (if any)
  • Check and add notes of every CHANGELOG.md of the component between the current chart version installed in your cluster to the chart version you want to upgrade to
  • Review the CHANGELOG notes with another member of the team (check for breaking changes, deprecations, change to values file and suggested plan for upgrading the production clusters live-1, eks-manager and live)

Testing the upgrade in a test cluster

Setup environment

Setup the environment variables listed in example.env.create-cluster in the cloud-platform-infrastructure repo.

Create test cluster

Run the create-cluster concourse job to create a test cluster.

Run a shell in the tools image

The cloud platform tools image has all the software required to update a cluster.

From a local copy of the cloud-platform-infrastructure repo, run the following command:

make tools-shell

Authenticate to the test cluster

Create the file ~/.kube/config in your tools-image container by running:

aws eks --region eu-west-2 update-kubeconfig --name <cluster-name>

Run the integration tests

This will ensure the test cluster does not have any existing issues and is ready to use.

To run Go tests:

make run-tests

Testing the upgrade

  1. Make the changes required to the module. For example, for upgrading the cert-manager, change the cert-manager terraform module. This might include

    • The helm chart version
    • changes to the values file (in needed)
  2. Push changes to a branch(upgrade) of the module

  3. Update the local copy of the cloud-platform-infrastructure repo with the branch reference. For cert-manager module, the code would change to

    source = "github.com/ministryofjustice/cloud-platform-terraform-certmanager?ref=upgrade"
    
  4. Do a terraform plan for the changes, verify whether the changes are correct and do terraform apply to apply the changes

  5. Check the things to observe section for specific components

  6. Run the integration tests again

  7. Once the testing is complete and integration tests are passed, create a PR to be reviewed by the team and have the module unit tests passed. After the PR is approved, merge the changes to the main branch of the module and make a release.

  8. Change the module release tag in the eks/components folder of cloud-platform-infrastructure repo and raise a PR. Verify the terraform plan from the cloud-platform-infrastructure plan pipeline and get it reviewed by the team.

  9. Once approved, merge the PR and monitor the cloud-platform-infrastructure apply pipeline when applying the changes.

  10. Run the reporting tests in concourse to ensure live/manager/live-2 are working as expected.

Things to observe when testing the upgrade

The below are some of the general things to check when during the upgrade and not a complete list.

Performing the upgrade

There is a cli command to perform the upgrade. First navigate to the cloud-platform-infrastructure repo and run the following command:

cloud-platform environments bump-module --module <module-name> --module-version <version>

The module-name flag must contain a word in the module source. For example, if you were to upgrade the cert-manager module to 0.5.0, you would run the following command:

cloud-platform environment bump-module --module certmanager --module-version 0.5.0

Cert manager

  • The CRDs for cert-manager do not get deleted
  • The existing certificates do not change in any way. This can be done by checking the timestamps of certificate creation
  • Able to create and validate new certificates

Prometheus

  • Ensure correct CRDs versions of prometheus-operator are updated before upgrading prometheus-operator
  • The existing PrometheusRules and ServiceMonitors do not change in any way. This can be done by checking the timestamps of those resourse creation
  • Able to query prometheus for metrics

Logs

Compare the logs for the updated component/addon between test cluster and live to ensure there are no additional warnings or errors.

Compare environment variables

Check the components/addon deamonset to ensure that the environment variables match.

This page was last reviewed on 18 December 2023. It needs to be reviewed again on 18 June 2024 by the page owner #cloud-platform .
This page was set to be reviewed before 18 June 2024 by the page owner #cloud-platform. This might mean the content is out of date.