
Upgrade EKS Terraform Module

The Cloud Platform EKS clusters are created using the official terraform-aws-eks module. The module version is independent of the EKS version and the EKS addons, so upgrading the EKS module will not always require upgrading the EKS version or the EKS addons.

Pre-requisites

Before you begin, there are a few pre-requisites (a quick way to check them is sketched after this list):

  • Your GPG key must be added to the infrastructure repo so that you can run git-crypt unlock.

  • You have the AWS CLI profile moj-cp with suitable credentials.

  • You have terraform and docker installed.

  • Identify the major version you want to upgrade to.

  • Review the changes in the changelog.
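
A quick way to sanity-check these prerequisites locally is sketched below. It assumes the standard CLIs and the moj-cp profile from the list above; it is an illustrative check, not part of the upgrade itself.

  # Run inside the cloud-platform-infrastructure repository.
  git-crypt unlock                              # succeeds once your GPG key has been added
  aws sts get-caller-identity --profile moj-cp  # confirms the AWS credentials work
  terraform version                             # confirms terraform is installed
  docker --version                              # confirms docker is installed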

Upgrade Steps

Create a PR in the Cloud Platform Infrastructure repository against the EKS module, changing it to the desired terraform-aws-eks version:

 module "eks" {
   source  = "terraform-aws-modules/eks/aws"
-  version = "17.24.0"
+  version = "18.31.2"
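
After bumping the module version, Terraform needs to re-initialise the working directory so it downloads the new module source before it can plan. A minimal sketch, assuming you are in the cluster's Terraform directory:

  # Pull the new terraform-aws-eks module version into .terraform/.
  terraform init -upgrade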

Based on the changes in the changelog, decide whether the upgrade contains breaking changes.

Upgrade with no breaking changes

  • Execute terraform plan (or the automated plan pipeline) and review the changes. If the changes are all as expected, run terraform apply to execute them (see the sketch below).
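
If you are running this locally rather than through the plan pipeline, a minimal sketch of that loop, assuming you are in the cluster's Terraform directory (saving the plan to a file is an optional habit, not something this runbook requires):

  terraform plan -out=eks-module-upgrade.tfplan   # review the proposed changes
  terraform apply eks-module-upgrade.tfplan       # apply exactly what was reviewed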

Note: When you run terraform plan, if it only shows a launch_template version change, as below, executing terraform apply will only create a new launch template version.

  # module.eks.module.node_groups.aws_launch_template.workers["monitoring_ng"] will be updated in-place
  ~ resource "aws_launch_template" "workers" {
      ~ default_version         = 1 -> (known after apply)
      ~ latest_version          = 1 -> (known after apply)

For the cluster node groups to use the new template version, you need to run terraform apply again, which will trigger a recycle of all the nodes through terraform. This can be disruptive and can also hit the terraform apply timeout. Hence, follow the steps below to update the node groups with the new template version.

To update the node groups with the new template version (a CLI alternative is sketched after these steps):

  • Login to the AWS console.

  • Select EKS and select the cluster.

  • Click on Compute and select the node group.

  • Click on the Change launch template version option.

  • Select the update strategy as force update and click Update.
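
If you prefer the CLI, the same forced rolling update can be triggered with aws eks update-nodegroup-version. This is an illustrative sketch rather than part of the original runbook; the cluster name, node group name and launch template details are placeholders:

  # Force a rolling update of a managed node group onto the new launch template
  # version. All angle-bracketed values are placeholders.
  aws eks update-nodegroup-version \
    --profile moj-cp \
    --cluster-name <cluster-name> \
    --nodegroup-name <node-group-name> \
    --launch-template name=<launch-template-name>,version=<new-version> \
    --force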

This will perform a rolling update of all the nodes in the node group. When AWS recycles the nodes, it will not evict pods if doing so would violate a PodDisruptionBudget (PDB). This causes the update to stall on that node, and the remaining nodes will not continue to recycle.

This CloudWatch Dashboard is used to monitor the pod eviction status for the live cluster.

To rectify this, run the script mentioned in the Recycle-all-nodes Gotchas section.
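
To see which PodDisruptionBudgets are likely to block eviction while the nodes recycle, a quick kubectl check can help. This is an illustrative sketch, assuming you already have a kubeconfig for the cluster; it is not the script referred to above.

  # List PDBs across all namespaces; an ALLOWED DISRUPTIONS value of 0 means
  # evictions for that application will be blocked and can stall the recycle.
  kubectl get pdb --all-namespaces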

Upgrade with breaking changes

The recent EKS module upgrade from v17 to v18 introduced breaking changes to the resources. Hence, a non-disruptive process was followed: creating a new node group, moving the terraform state, draining the old node group, and finally deleting the old node group.
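
As a rough illustration of what that sequence can look like, the sketch below uses placeholder resource addresses and node labels; it is not the exact set of commands used for the v17 to v18 upgrade, which are in the google doc linked below.

  # 1. Create the new node group with terraform apply, then move any renamed
  #    resources to their new addresses so Terraform does not destroy them.
  terraform state mv 'module.eks.<old_resource_address>' 'module.eks.<new_resource_address>'

  # 2. Cordon and drain the old node group so workloads move to the new one.
  kubectl cordon -l <old-node-group-label>
  kubectl drain -l <old-node-group-label> --ignore-daemonsets --delete-emptydir-data

  # 3. Remove the old node group from the Terraform code and apply.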

Detailed steps are mentioned in the google doc. The PR with the module changes is here: #2458

For any future upgrade where the terraform plan or the changelog shows breaking changes, the procedure needs to be reviewed and modified based on what those breaking changes are.

This page was last reviewed on 22 April 2024. It needs to be reviewed again on 22 October 2024 by the page owner #cloud-platform .