Provisioning EKS clusters
IMPORTANT: All cluster names must be globally unique. Each cluster creates a Route53 hosted zone which is unique to its name.
You can create a new EKS test cluster using the cluster build pipeline. Alternatively, you can use the create-cluster script.
Execute the script, passing the -k eks (or --kind eks) option and the desired name of your new cluster, e.g.:
./create-cluster.rb --name mogaal-eks --kind eks
Check the pre-requisites and environment variables section of this document before running this script.
NB: Your cluster name must be no more than 12 characters. Any longer, and some of the computed strings which include the cluster name will exceed their maximum allowed values. The error messages you get if this happens are unhelpful. In order to prevent this, the build script will fail immediately if you supply a name which is too long.
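If you want to check a candidate name before kicking off a build, a minimal shell sketch like this performs the same length check (the name shown is just the example used in this document):

CLUSTER_NAME=mogaal-eks   # substitute the name you intend to use
if [ ${#CLUSTER_NAME} -gt 12 ]; then echo "'${CLUSTER_NAME}' is ${#CLUSTER_NAME} characters; the limit is 12"; fi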
See our [cluster naming policy] for information on how to choose a suitable name for your cluster.
By default, the script will create a small cluster. This means the master and worker EC2 instances will be less powerful machine types than in our production cluster.
You can see more options to use when creating the cluster by running:
./create-cluster.rb --help
The script takes around 30 minutes to execute. At the end, you should see output like this:
2022-05-04 12:10:27 kubectl cluster-info
Kubernetes control plane is running at https://B04C5FC7828A0515457E141A9610815D.sk1.eu-west-2.eks.amazonaws.com
CoreDNS is running at https://B04C5FC7828A0515457E141A9610815D.sk1.eu-west-2.eks.amazonaws.com
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The section before kubectl cluster-info shows the cluster integration tests, which run at the end of the build process. The number of tests will change, so the output will vary from what is shown here.
Alternatively, if you need more control over the test cluster parameters, or you just prefer to do it manually, the rest of this document describes the process.
Pre-requisites
- Make sure you have access to the Cloud Platform AWS account
- You have an [AWS CLI] profile moj-cp, with suitable credentials
- Set your environment variables (you might also want to include AWS_* envs):
  export AWS_PROFILE=moj-cp
- Your GPG key must be added to the [infrastructure repo] so that you are able to run git-crypt unlock (the script will run this for you, but you must be able to do it)
- You have [docker] installed
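As a rough sanity check before running the script, you can confirm each pre-requisite from a shell. This is only a sketch, assuming the aws, git-crypt and docker CLIs are on your PATH:

export AWS_PROFILE=moj-cp
aws sts get-caller-identity   # should return the account/role for the moj-cp profile
git-crypt unlock              # run this inside your checkout of the infrastructure repo
docker --version              # confirms docker is installed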
Environment Variables
See the file [example.env.create-cluster] in the [infrastructure repo]. This shows examples of the environment variables which must be set in order to run the create-cluster.rb script to create a new cluster.
You can get the auth0 values from the terraform-provider-auth0 application on auth0.
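Purely for illustration, an environment file for the script looks something like the snippet below. The auth0 variable names here are placeholders rather than the real ones; always take the actual names and current values from [example.env.create-cluster] and the terraform-provider-auth0 application.

export AWS_PROFILE=moj-cp
export AUTH0_DOMAIN=...        # placeholder name; see example.env.create-cluster for the real variable
export AUTH0_CLIENT_ID=...     # placeholder
export AUTH0_CLIENT_SECRET=... # placeholder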
Provisioning
1. VPC
We need to create a VPC in which to deploy the cluster. VPCs are deployed using the terraform code under the cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc folder:
cd cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc
terraform init
terraform workspace new <WorkspaceName>
terraform apply
You should be able to see your new VPC (called WorkspaceName) inside the AWS Console. Check it before jumping to the next step.
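If you prefer the command line to the console, something like this should list the new VPC, assuming it is tagged with a Name equal to your workspace name:

aws ec2 describe-vpcs --region eu-west-2 --filters "Name=tag:Name,Values=<WorkspaceName>"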
NOTE: For consistency, please use the same terraform workspace name in every step.
2. Creating the EKS cluster
Now it's time to provision the cluster itself. Change directory to terraform/aws-accounts/cloud-platform-aws/vpc/eks and follow a very similar terraform workflow:
terraform init
terraform workspace new <WorkspaceName>
terraform apply -var="vpc_name=$VPC_NAME"
vpc_name is the only required variable and must be set. You can also pass variables such as cluster_node_count and worker_node_machine_type to set up different node group parameters.
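For example, a non-default apply might look like the line below; the values are purely illustrative, not recommendations:

terraform apply -var="vpc_name=$VPC_NAME" -var="cluster_node_count=4" -var="worker_node_machine_type=m5.large"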
Once terraform finishes, you should be able to generate your kubeconfig file (used by kubectl to establish a connection to the cluster) using aws-cli in the following way:
aws eks --region eu-west-2 update-kubeconfig --name mogaal-eks
NOTE: Replace "mogaal-eks" with your cluster name (the workspace name).
The final check for this step is confirming that you can connect to the cluster:
$ kubectl cluster-info
Kubernetes master is running at https://B04C5FC7828A0515457E141A9610815D.sk1.eu-west-2.eks.amazonaws.com
CoreDNS is running at https://B04C5FC7828A0515457E141A9610815D.sk1.eu-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
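You can also list the worker nodes to confirm they have joined the cluster:

kubectl get nodes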
3. Deploy components
Installing the components is the last step. For this, you need the KUBECONFIG environment variable exported and access to the cluster. To deploy the components, go to terraform/aws-accounts/cloud-platform-aws/vpc/eks/components and follow exactly the same terraform workflow:
terraform init
terraform workspace new <WorkspaceName>
terraform apply
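If the components apply fails with authentication or connection errors, check that KUBECONFIG points at the right file and that kubectl can reach the cluster before re-running it, for example:

export KUBECONFIG=~/.kube/config
kubectl cluster-info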
4. Delete the EKS cluster
Delete the EKS cluster using the script
There is a destroy-cluster.rb script which you can use to delete your cluster.
Read the script before using it. Deleting a cluster is something you should be very cautious about, and ensure you know exactly what you’re doing.
The script is entirely non-interactive, and will not prompt you to confirm anything. It just destroys things.
First, run make tools-shell. The delete cluster script must always be run in a container. This ensures that the environment of the script is fully controlled, and you don't run into problems such as the kubernetes context being changed in another window, or extra environment variables causing unwanted effects.
Then invoke the script like this:
./destroy-cluster.rb --name [short cluster name] --yes
Run without --yes to do a dry run, and see what commands would be executed.
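For example, a dry run against the example cluster created earlier would be:

./destroy-cluster.rb --name mogaal-eks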
You can get more information using:
./destroy-cluster.rb --help
If any steps fail:
- Fix the underlying problem
- Edit the script to comment out any sections of the ClusterDeleter.run function which you no longer need to run
- Re-run the script
Delete the cluster using concourse fly commands
If you prefer to use the concourse pipeline to destroy the cluster, follow these steps to delete it using concourse fly commands.
First, cd to the working copy of the concourse [pipelines repo][pipelines repo]. Make the following two changes to the eks-create-test-destroy.yaml file.
In the eks-create-test-destroy pipeline definition, comment out the line below in the destroy-cluster job:
args:
# export $(cat keyval/keyval.properties | grep CLUSTER_NAME )
Commenting this out means the CLUSTER_NAME provided by the create-cluster-run-tests job will not be set.
Update $CLUSTER_NAME to <cluster-name-to-be-deleted> in the line below:
./destroy-cluster.rb --name $CLUSTER_NAME --yes
Run the commands below, updating <cluster-name-to-be-deleted>.
The first fly command applies the changes made to the eks-create-test-destroy.yaml file, with the hardcoded CLUSTER_NAME in the destroy-cluster job. The second command triggers the destroy-cluster job for the CLUSTER_NAME you updated.
fly -t manager sp -p eks-create-test-destroy -c eks-create-test-destroy.yaml
fly -t manager trigger-job -j eks-create-test-destroy/destroy-cluster
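If you want to follow the job output from your terminal rather than the concourse web UI, fly can tail it (assuming the pipeline name used above):

fly -t manager watch -j eks-create-test-destroy/destroy-cluster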
Note: After the destroy-cluster job has completed successfully, run the [bootstrap pipeline][bootstrap pipleine] to discard the changes made to the eks-create-test-destroy.yaml file.
fly -t manager trigger-job -j bootstrap/bootstrap-pipelines
Delete the EKS cluster manually
Follow these steps to delete the EKS cluster.
First, set the kubectl context for the EKS cluster you are deleting. The easiest way to do this is with the aws command:
$ export KUBECONFIG=~/.kube/config
$ export cluster=<cluster-name>
$ aws eks --region eu-west-2 update-kubeconfig --name ${cluster}
You should see this output:
Added new context arn:aws:eks:eu-west-2:754256621582:cluster/<cluster-name> to .kube/config
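It is worth double-checking that the new context is now the active one before destroying anything:

$ kubectl config current-context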
Then, from the root of a checkout of the cloud-platform-infrastructure repository, run these commands to destroy all cluster components, and delete the terraform workspace:
$ cd terraform/aws-accounts/cloud-platform-aws/vpc/eks/components
$ terraform init
$ terraform workspace select ${cluster}
$ terraform destroy
The destroy process often gets stuck on prometheus operator. If that happens, running this in a separate window usually works:
kubectl -n monitoring delete job prometheus-operator-operator-cleanup
$ terraform workspace select default
$ terraform workspace delete ${cluster}
Change directories and perform the following to destroy the EKS cluster, and delete the terraform workspace.
$ cd .. # working dir is now `eks`
$ terraform init
$ terraform workspace select ${cluster}
$ terraform destroy
$ terraform workspace select default
$ terraform workspace delete ${cluster}
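At this point the cluster itself should be gone. A quick way to confirm is to list the EKS clusters in the region and check that yours is no longer present:

$ aws eks list-clusters --region eu-west-2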
Change directories and perform the following to destroy the cluster VPC, and delete the terraform workspace.
$ cd .. # working dir is now `vpc`
$ terraform init
$ terraform workspace select ${cluster}
$ terraform destroy
$ terraform workspace select default
$ terraform workspace delete ${cluster}
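Finally, you can confirm the VPC has been removed, assuming it was tagged with the workspace/cluster name:

$ aws ec2 describe-vpcs --region eu-west-2 --filters "Name=tag:Name,Values=${cluster}"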