Skip to main content

Add nodes to the cluster

This runbook covers how to increase the number of nodes in a Kops Kubernetes cluster.

In the steps below, we assume we want to increase the number of nodes in Live-1, from 27 to 30.

Requirements

1. Ensure you have access to the Cloud Platform AWS account

2. Set your environment variables :

export KOPS_STATE_STORE=s3://cloud-platform-kops-state
export AWS_PROFILE=moj-cp

3. Start a shell in the tools image

This will ensure you have the right versions of everything

cd cloud-platform-infrastructure
make tools-shell

4. Be authenticated to the cluster :

kops export kubecfg live-1.cloud-platform.service.justice.gov.uk

Inline edit

Once authenticated to the cluster

bash-5.0# kops get instancegroup
Using cluster from kubectl context: live-1.cloud-platform.service.justice.gov.uk

NAME                                    ROLE    MACHINETYPE     MIN     MAX     ZONES
2xlarge-nodes-1.17.12                   Node    r5.2xlarge      2       2       eu-west-2b
ingress-nodes-1.17.12-eu-west-2a        Node    c5.xlarge       1       1       eu-west-2a
ingress-nodes-1.17.12-eu-west-2b        Node    c5.xlarge       1       1       eu-west-2b
ingress-nodes-1.17.12-eu-west-2c        Node    c5.xlarge       1       1       eu-west-2c
master-eu-west-2a                       Master  c5.4xlarge      1       1       eu-west-2a
master-eu-west-2b                       Master  c5.4xlarge      1       1       eu-west-2b
master-eu-west-2c                       Master  c5.4xlarge      1       1       eu-west-2c
nodes-1.17.12-eu-west-2a                Node    r5.xlarge       9       9       eu-west-2a
nodes-1.17.12-eu-west-2b                Node    r5.xlarge       9       9       eu-west-2b
nodes-1.17.12-eu-west-2c                Node    r5.xlarge       9       9       eu-west-2c

The 3 main worker node groups are nodes-1.17.12-eu-west-2*. To increase the worker node count to 30, you need to add 1 to each of those groups.

Do an inline edit for each group, and set minSize and maxSize to the same value, the increased number of nodes.

bash-5.0# kops edit instancegroup nodes-1.17.12-eu-west-2a
spec:
    ...
    maxSize: 10
    minSize: 10
    ...

Repeat for the 2b and 2c instancegroups.

Before we apply the changes, kops allows us to confirm what is going to happen:

% kops update cluster
Will modify resources:
AutoscalingGroup/nodes-1.17.12-eu-west-2a.live-1.cloud-platform.service.justice.gov.uk
    MaxSize                  9 -> 10
    MinSize                  9 -> 10
...

THESE SHOULD BE THE ONLY CHANGES THAT YOU SEE, IF YOU SEE ANYTHING MORE: ABORT

If you need to abort, overwrite the kops state on S3 with the unchanged YAML file in the repo like this

If happy with the information above, the changes can be applied with:

% kops update cluster --yes

Change progress can be monitored with:

% kops validate cluster

Persisting the changes

In order to ensure that the new configuration does not get overwritten at the next kops update, we need to persist those changes in the code.

1. Create a new branch of the cloud-platform-infrastructure repo

2. In /terraform/cloud-platform/variables.tf, modify the cluster_node_count_* variables

variable "cluster_node_count_a" {
  description = "The number of worker node in the cluster in Availability Zone eu-west-2a"
  default = {
    live-1  = "10"
    default = "1"
  }
}

Make the same change for cluster_node_count_b and cluster_node_count_c.

3. Push, Review, Merge

Once the PR is merged and a terraform apply is run, the cloud-platform-infrastructure/kops/live-1.yaml will be updated with the new size.

The updated kops/live-1.yaml file should also be committed to the repository, in a second PR.

For more information on kops update & kops rolling-updtae, have a look at this runbook.

This page was last reviewed on 11 May 2021. It needs to be reviewed again on 11 August 2021 by the page owner #cloud-platform .
This page was set to be reviewed before 11 August 2021 by the page owner #cloud-platform. This might mean the content is out of date.