
Making changes to EKS node groups or instance types

Why?

You may need to change an EKS cluster’s node group or instance type configuration. We can’t just let Terraform apply these changes, because Terraform doesn’t gracefully roll out the new nodes and retire the old ones. Terraform will bring down all of the old nodes immediately, which will cause outages for users.

How?

To avoid bringing down all the nodes at once, follow these steps:

  1. Add a new node group with your updated changes.
  2. Re-run the infrastructure-account/terraform-apply pipeline to update the Modsecurity Audit logs cluster so that it maps roles to both the old and new node group IAM roles. This avoids losing modsec audit logs from the new node group.
  3. Look up the old node group name (you can find this in the AWS console).
  4. Once your change is merged in, you can drain the old node group using the command below:

    cloud-platform pipeline cordon-and-drain --cluster-name <cluster_name> --node-group <old_node_group_name>

     Because this command runs remotely in Concourse (see the script source), you can’t use it to drain the default node group on the manager cluster.

  5. Raise a new PR deleting the old node group.

  6. Re-run the infrastructure-account/terraform-apply pipeline again to update the Modsecurity Audit logs cluster so that it maps roles to only the new node group IAM role.

  7. Run the integration tests to ensure the cluster is healthy.
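
For step 3, the old node group name can also be looked up from the command line rather than the console. The following is a minimal sketch, assuming the AWS CLI is installed and authenticated against the right account. The function name `list_node_groups` and the cluster name in the usage note are hypothetical and not part of the cloud-platform tooling.

```shell
# Sketch only: a helper for looking up node group names.
# Assumes the AWS CLI is installed and has credentials for the cluster's account.

# Print the node group names of an EKS cluster, one per line.
list_node_groups() {
  local cluster_name="$1"
  # With --output text the names come back tab-separated on a single line,
  # so convert the tabs to newlines.
  aws eks list-nodegroups \
    --cluster-name "${cluster_name}" \
    --query 'nodegroups[]' \
    --output text | tr '\t' '\n'
}
```

For example, `list_node_groups example-cluster` (a made-up cluster name) would print every node group in that cluster, including both the old and the new one while they run side by side.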

Notes:

  • When making changes to the default node group in live, it’s handy to pause the pipelines for each of our environments for the duration of the change.
  • The cloud-platform pipeline cordon-and-drain command cordons and drains the nodes in a given node group, waiting 5 minutes between each drained node.
  • If you can avoid it, don’t change the target node group in the AWS console (for example, by reducing the desired number of nodes). AWS deletes nodes in an unpredictable order, which might cause the pipeline command to fail. It is still possible if you need to.
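
To illustrate what cordoning and draining a node group with a pause between nodes involves, here is a rough sketch using plain kubectl. This is not the source of the cloud-platform pipeline command. It assumes the nodes carry the `eks.amazonaws.com/nodegroup` label that EKS managed node groups apply, and that cordoning every node before draining is acceptable (that ordering is an assumption, not something the pipeline is known to do). The function names and the `DRAIN_WAIT_SECONDS` variable are hypothetical.

```shell
# Sketch only: NOT the implementation of `cloud-platform pipeline cordon-and-drain`.

# Seconds to wait between drained nodes (300 = the 5 minutes noted above).
DRAIN_WAIT_SECONDS="${DRAIN_WAIT_SECONDS:-300}"

# Print the names of the nodes in a node group, one per line.
# Assumption: nodes are labelled eks.amazonaws.com/nodegroup=<node group name>.
nodes_in_group() {
  local node_group="$1"
  kubectl get nodes \
    -l "eks.amazonaws.com/nodegroup=${node_group}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Cordon every node in the group, then drain them one at a time.
cordon_and_drain_group() {
  local node_group="$1"
  local nodes node
  nodes="$(nodes_in_group "${node_group}")" || return 1

  # Cordon first so evicted pods are not rescheduled onto another old node.
  for node in ${nodes}; do
    kubectl cordon "${node}" || return 1
  done

  # Drain one node at a time, pausing so workloads can settle on the new nodes.
  for node in ${nodes}; do
    kubectl drain "${node}" --ignore-daemonsets --delete-emptydir-data || return 1
    sleep "${DRAIN_WAIT_SECONDS}"
  done
}
```

Calling `cordon_and_drain_group <old_node_group_name>` would then work through the old node group one node at a time. Stopping at the first failed drain is a deliberate choice in this sketch, so that a node which cannot be drained gets investigated before any more capacity is removed.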
This page was last reviewed on 15 December 2023. It needs to be reviewed again on 15 June 2024 by the page owner #cloud-platform .