Upgrade EKS cluster
The Cloud Platform EKS cluster upgrade consists of three distinct parts:
- Upgrade EKS Terraform Module
- Upgrade EKS version (Control Plane and Node Groups)
- Upgrade addon(s)
The Cloud Platform EKS clusters are created using the official terraform-aws-eks module. The EKS version and addons are currently independent of the version of the terraform-aws-eks module. Therefore, an upgrade of the EKS version does not always require an upgrade of the terraform-aws-eks module and/or the addons. Please check the changelogs for the terraform-aws-eks module, the EKS version and the addons when planning an upgrade.
Run the upgrade, via the tools image
The cloud platform tools image has all the software required to run the upgrade.
Start from the root directory of a working copy of the infrastructure repo.
With your environment variables set, launch a bash shell on the tools image:
make tools-shell
Pre-requisites
Before you begin, there are a few pre-requisites:
- Your GPG key must be added to the infrastructure repo so that you can run git-crypt unlock.
- You have the AWS CLI profile moj-cp with suitable credentials.
- You have terraform and docker installed.
- Review the changelog of the Kubernetes release and the EKS release you are planning to upgrade to.
- Review the official EKS upgrading a cluster document for any extra steps that are a part of a specific EKS release.
- Run kubent against the cluster to find deprecated APIs.
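The tool and credential checks above can be bundled into a small pre-flight script. This is a minimal sketch, not part of the Cloud Platform tooling: the function names missing_tools and preflight are hypothetical, and the tool list simply mirrors the pre-requisites listed here.

```shell
# Print the name of every tool from the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical wrapper around the pre-requisite checks; nothing runs until
# you call it explicitly.
preflight() {
  local missing
  missing="$(missing_tools git-crypt aws terraform docker kubent)"
  if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
    return 1
  fi
  # Confirm that the moj-cp profile resolves to valid credentials.
  aws sts get-caller-identity --profile moj-cp >/dev/null || return 1
  # Report resources using deprecated or removed APIs in the current context.
  kubent
}
```

Calling preflight from the root of the infrastructure repo stops at the first failed check, so a missing tool or an expired credential shows up before any upgrade step is started.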
Upgrade Steps
Upgrade EKS Terraform Module
As mentioned previously, when a new EKS major version is released, it is normally followed by a release of the associated terraform-aws-eks module.
1) The first step of the EKS upgrade is to identify the terraform-aws-eks module release that corresponds to the EKS major version you want to upgrade to. Review the changes in its changelog and plan or make any necessary changes.
Create a PR in the Cloud Platform Infrastructure repository against the EKS module, changing it to the desired terraform-aws-eks version:
module "eks" {
source = "terraform-aws-modules/eks/aws"
- version = "v16.2.0"
+ version = "v17.1.0"
2) Execute terraform plan (or the automated plan pipeline) and review changes. If changes are all as expected, run terraform apply to execute the changes.
Note: If terraform plan shows only a launch_template version change, as below, terraform apply will only create a new template version. For the cluster node groups to use the new template version, you need to run terraform apply again, which triggers a re-cycle of all the nodes. To avoid re-cycling the nodes at this stage, we don’t run terraform apply until we upgrade the node groups and update the template version together at a later stage.
# module.eks.module.node_groups.aws_launch_template.workers["monitoring_ng"] will be updated in-place
~ resource "aws_launch_template" "workers" {
~ default_version = 1 -> (known after apply)
~ latest_version = 1 -> (known after apply)
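The check described in the note can be approximated mechanically. The sketch below is a rough heuristic, not part of the Cloud Platform tooling: it reads the text output of terraform plan and reports whether every planned change is an aws_launch_template resource. Both function names are hypothetical, and the parsing assumes the "# <address> will be ..." header lines that terraform prints, as in the excerpt above.

```shell
# Read human-readable `terraform plan` output on stdin and print the address
# of each resource that the plan says "will be" or "must be" changed.
planned_addresses() {
  sed -nE 's/^[[:space:]]*# (.*) (will be|must be) .*$/\1/p'
}

# Succeed only when the plan on stdin contains at least one change and every
# changed resource is an aws_launch_template.
only_launch_template_changes() {
  local addresses
  addresses="$(planned_addresses)"
  [ -n "$addresses" ] || return 1
  # grep -v selects addresses that are NOT launch templates; any hit fails.
  if printf '%s\n' "$addresses" | grep -qv 'aws_launch_template\.'; then
    return 1
  fi
  return 0
}
```

For example, terraform plan -no-color | only_launch_template_changes succeeds when the plan touches launch templates only. The -no-color flag keeps colour codes out of the parsed text. Treat the result as a hint and still read the plan yourself.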
Upgrade Control Plane
3) Create a PR in the Cloud Platform Infrastructure repository against the EKS module, changing it to the desired EKS cluster version:
module "eks" {
source = "terraform-aws-modules/eks/aws"
- cluster_version = "1.14"
+ cluster_version = "1.15"
4) Execute terraform plan (or the automated plan pipeline) and review changes. If changes are all as expected, perform the upgrade from the AWS Console EKS Control Plane.
We don’t run terraform apply to change the EKS cluster version, because the apply takes longer and can time out, and because we want to avoid re-cycling the nodes as explained in step 2.
Once the process has completed, the AWS Console will show that the Control Plane is on the correct version. You can confirm this with the AWS CLI:
$ aws eks describe-cluster --query 'cluster.version' --name manager
"1.15"
$
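The describe-cluster check can also be scripted so that it waits for the update to finish before comparing versions. This is a sketch under assumptions: the function names are hypothetical, and the cluster name and version passed in (manager and 1.15 in the example above) are supplied by the caller.

```shell
# Return 0 when the named cluster's control plane reports the expected
# Kubernetes version, 1 when it reports another version, 2 if the call fails.
control_plane_at_version() {
  local cluster="$1" expected="$2" actual
  actual="$(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.version' --output text)" || return 2
  [ "$actual" = "$expected" ]
}

# Block until the cluster status is ACTIVE again, then compare versions.
wait_for_control_plane() {
  aws eks wait cluster-active --name "$1" && control_plane_at_version "$1" "$2"
}
```

Running wait_for_control_plane manager 1.15 after starting the upgrade in the console returns once the cluster is active, with a zero exit status only if the reported version matches.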
Upgrade Node Group(s)
The easiest way to upgrade node groups is through the AWS Console. We advise following the official AWS EKS upgrade instructions in the Updating a Managed Node Group documentation.
While updating the node group AMI release version, we should also change the launch template version to the one created in step 2. To make both changes together, select the Update Node Group version and Change launch template version options. Select Force update as the update strategy; this does not respect pod disruption budgets and forces node restarts.
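For reference, the same console choices map roughly onto a single AWS CLI call. The sketch below only prints the command so that it can be reviewed first. It is an illustration, not the documented procedure: the function name is hypothetical, the launch template name and version are placeholders you must look up, and it assumes the launch template does not pin a custom AMI (AWS rejects a Kubernetes version combined with a custom-AMI template).

```shell
# Print (do not run) an `aws eks update-nodegroup-version` command.
# Arguments: cluster, node group, Kubernetes version, launch template name,
# launch template version. --force corresponds to the force update strategy.
nodegroup_update_command() {
  local cluster="$1" nodegroup="$2" k8s_version="$3"
  local template_name="$4" template_version="$5"
  printf 'aws eks update-nodegroup-version'
  printf ' --cluster-name %s --nodegroup-name %s' "$cluster" "$nodegroup"
  printf ' --kubernetes-version %s' "$k8s_version"
  printf ' --launch-template name=%s,version=%s' "$template_name" "$template_version"
  printf ' --force\n'
}
```

Printing the command rather than executing it keeps the console as the primary route while leaving a reviewable record of what an equivalent CLI update would look like.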
Recycle all nodes
When a node group version changes, all of the nodes are recycled. When AWS recycles the nodes, it will not evict a pod if doing so would violate a pod disruption budget (PDB). This stalls the update, and the remaining nodes are not recycled.
To rectify this, run the script mentioned in the Recycle-all-nodes Gotchas section.
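Before or during a recycle it can help to see which PDBs currently allow no disruptions, since those are the ones likely to stall an eviction. The sketch below is not the script from the Gotchas section; it is a small, hypothetical helper that lists such PDBs using kubectl and awk.

```shell
# Print "namespace/name" for every PodDisruptionBudget whose status reports
# zero allowed disruptions. PDBs without a status value are skipped.
blocking_pdbs() {
  kubectl get pdb --all-namespaces --no-headers \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' |
    awk '$3 == 0 { print $1 "/" $2 }'
}
```

An empty result means no PDB is currently at its limit. Any entries printed point at the workloads to inspect if the node group update stops making progress.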
Update kubectl version in tools image
kubectl is supported within one minor version (older or newer) of the cluster version. Update the kubectl version in the cloud platform tools image to match the current cluster version.
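The one-minor-version rule can be expressed as a small check. This is a minimal sketch with hypothetical function names; it compares minor versions only and assumes both versions share the same major version.

```shell
# Extract the minor number from a version such as "1.15", "v1.15.3" or "1.15+".
minor_of() {
  local v="${1#v}"
  v="${v#*.}"                     # drop the major part
  v="${v%%.*}"                    # drop the patch part, if any
  printf '%s\n' "${v%%[!0-9]*}"   # drop suffixes such as "+"
}

# Succeed when the kubectl (client) minor version is within one minor
# version, older or newer, of the cluster minor version.
kubectl_skew_ok() {
  local client cluster diff
  client="$(minor_of "$1")"
  cluster="$(minor_of "$2")"
  diff=$(( client - cluster ))
  [ "${diff#-}" -le 1 ]
}
```

For example, kubectl_skew_ok 1.16 1.15 succeeds, while kubectl_skew_ok 1.17 1.15 fails, which signals that the kubectl version in the tools image needs to change.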
Upgrade addon(s)
We have 3 addons managed through the cloud-platform-terraform-eks-add-ons module.
Refer to the below documents to get the addon version to be used with the EKS major version you just upgraded to.
Create a PR in the Cloud Platform Infrastructure repository against the cloud-platform-terraform-eks-add-ons module, changing the addons to the desired versions. Execute terraform plan (or the automated plan pipeline) and review changes. If changes are all as expected, run terraform apply to execute the changes.
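To compare what is running before and after the apply, the image tag of an addon workload can be read from the cluster. This is a sketch under assumptions: the function name is hypothetical, and it assumes the addons run as workloads in the kube-system namespace (for example the commonly used aws-node and kube-proxy daemonsets and the coredns deployment, which this page does not itself name).

```shell
# Print the image tag of the first container of a kube-system workload.
# Arguments: kind (e.g. daemonset, deployment) and workload name.
# Assumes the image is referenced by tag rather than by digest.
workload_image_tag() {
  local kind="$1" name="$2" image
  image="$(kubectl get "$kind" "$name" --namespace kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
  printf '%s\n' "${image##*:}"
}
```

Calling workload_image_tag deployment coredns before and after the change gives a quick confirmation that the new addon version has actually rolled out.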