
Upgrade EKS cluster

The Cloud Platform EKS cluster upgrade consists of three distinct parts:

  • Upgrade EKS Terraform Module
  • Upgrade EKS version (Control Plane and Node Groups)
  • Upgrade addon(s)

The Cloud Platform EKS clusters are created using the official terraform-aws-eks module. The EKS version and addons are currently independent of the version of the terraform-aws-eks module, so an EKS version upgrade does not always require an upgrade of the terraform-aws-eks module and/or the addons. Please check the changelogs for the terraform-aws-eks module, the EKS version and the addons when planning an upgrade.

Run the upgrade via the tools image

The cloud platform [tools image] has all the software required to run the upgrade.

Start from the root directory of a working copy of the [infrastructure repo].

With your environment variables set, launch a bash shell on the tools image:

make tools-shell

Pre-requisites

Before you begin, there are a few pre-requisites:

  • Your GPG key must be added to the infrastructure repo so that you can run git-crypt unlock.

  • You have the AWS CLI profile moj-cp with suitable credentials.

  • You have terraform and docker installed.

  • Review the changelog of the Kubernetes release and the EKS release you are planning to upgrade to.

  • Review the official EKS upgrading a cluster document for any extra steps that are a part of a specific EKS release.

  • Run kubent against the cluster to find deprecated APIs; see the example after this list.
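
A minimal sketch of a kubent run from inside the tools shell, assuming kubectl is already pointed at the cluster and that you are upgrading to Kubernetes 1.15 (the target version is illustrative):

$ kubent --target-version 1.15

Anything kubent reports as removed in the target version should be migrated before the control plane is upgraded.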

Upgrade Steps

Upgrade EKS Terraform Module

As mentioned previously, when a new EKS major version is released, it is normally followed by a release of an associated terraform-aws-eks module version.

1) The first step of the EKS upgrade is to identify the module release that corresponds to the EKS major version you want to upgrade to. Review the changes in the changelog, then plan and make any necessary changes or required updates.

Create a PR in the Cloud Platform Infrastructure repository against the EKS module, making the change to the desired terraform-aws-eks version:

 module "eks" {
   source  = "terraform-aws-modules/eks/aws"
-  version = "v16.2.0"
+  version = "v17.1.0"

2) Execute terraform plan (or the automated plan pipeline) and review the changes. If the changes are as expected, run terraform apply to execute them; a sketch of the manual commands is shown below.
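
A minimal sketch of running the plan manually from the tools shell, assuming the EKS layer lives under terraform/aws-accounts/cloud-platform-aws/vpc/eks and that each cluster has its own Terraform workspace (both the path and the workspace name are assumptions; check the repository layout first):

$ cd terraform/aws-accounts/cloud-platform-aws/vpc/eks
$ terraform init
$ terraform workspace select manager
$ terraform plan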

Note: When you run terraform plan, if it only shows a launch_template version change as below, executing terraform apply will only create a new template version. For the cluster node groups to use the new template version, you need to run terraform apply again, which will trigger a recycle of all the nodes. To avoid recycling the nodes at this stage, we don’t run terraform apply until we complete the upgrade of the node groups, updating the template version at that later stage.

  # module.eks.module.node_groups.aws_launch_template.workers["monitoring_ng"] will be updated in-place
  ~ resource "aws_launch_template" "workers" {
      ~ default_version         = 1 -> (known after apply)
      ~ latest_version          = 1 -> (known after apply)

Upgrade Control Plane

3) Create a PR in the Cloud Platform Infrastructure repository against the EKS module, making the change to the desired EKS cluster version:

 module "eks" {
   source  = "terraform-aws-modules/eks/aws"
-  cluster_version = "1.14"
+  cluster_version = "1.15"

4) Execute terraform plan (or the automated plan pipeline) and review the changes. If the changes are as expected, perform the upgrade of the EKS Control Plane from the AWS Console.

We don’t run terraform apply to upgrade the EKS cluster version, as the apply can take a long time and time out, and also to avoid recycling the nodes, as explained in step 2.
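
The same control plane upgrade can also be started from the CLI; a minimal sketch, assuming the cluster is called manager and the target version is 1.15:

$ aws eks update-cluster-version --name manager --kubernetes-version 1.15

The upgrade runs in the background; the describe-cluster check below confirms when the new version is active.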

Once the process is completed, AWS Console will confirm the Control Plane is on the correct version.

$ aws eks describe-cluster --query 'cluster.version' --name manager
"1.15"
$

[Screenshot: AWS Console]

Upgrade Node Group(s)

The easiest way to upgrade node groups is through the AWS Console. We advise following the official AWS EKS upgrade instructions in the Updating a Managed Node Group documentation.

While updating the node group AMI release version, we should also change to the launch template version created in step 2. To perform both changes together, select the Update Node Group version and Change launch template version options as shown below. Set the update strategy to force update; this does not respect pod disruption budgets and forces node restarts.

[Screenshot: Update Node Group]
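
The same node group update can be started from the CLI; a minimal sketch, assuming a cluster called manager, a node group called monitoring_ng and a launch template called monitoring-ng at version 2 (all of these values are illustrative):

$ aws eks update-nodegroup-version \
    --cluster-name manager \
    --nodegroup-name monitoring_ng \
    --launch-template name=monitoring-ng,version=2 \
    --force

The --force flag mirrors the force update strategy in the console and will evict pods even if that violates their pod disruption budgets.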

Recycle all nodes

When a node group version changes, all of the nodes are recycled. AWS will not evict a pod if doing so would break its pod disruption budget (PDB); this stalls the update and the remaining nodes will not continue to recycle.

To rectify this, run the script mentioned in the Recycle-all-nodes Gotchas section.
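
To see which pod disruption budgets are likely to block eviction, and to confirm that nodes have picked up the new version once they have recycled, a minimal sketch (kubectl assumed to be pointed at the cluster being upgraded):

$ kubectl get pdb --all-namespaces
$ kubectl get nodes -o wide

PDBs showing 0 allowed disruptions are the usual culprits; kubectl get nodes shows the kubelet version running on each node.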

Upgrade addon(s)

We have three addons managed through the cloud-platform-terraform-eks-add-ons module.

Refer to the documents below to get the addon version to be used with the EKS major version you have just upgraded to.

managing-kube-proxy

managing-coredns

managing-vpc-cni
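
You can also ask AWS which addon versions are published for the new Kubernetes version; a minimal sketch, assuming coredns and Kubernetes 1.15 (the addon name and version are illustrative):

$ aws eks describe-addon-versions --addon-name coredns --kubernetes-version 1.15 \
    --query 'addons[].addonVersions[].addonVersion'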

Create a PR in the Cloud Platform Infrastructure repository against the cloud-platform-terraform-eks-add-ons module, making the changes to the desired addon versions here. Execute terraform plan (or the automated plan pipeline) and review the changes. If the changes are as expected, run terraform apply to execute them.
