
Replacing Live-1 Would be Hard Because…

Treating clusters as cattle, not pets, is one of our strategic goals.

The purpose of this document is to collect all the reasons we currently treat live-1 as a pet, so that we keep them top of mind and prioritise solving them.

When we have an incident, why don’t we build a new cluster from scratch instead of nursing live-1 back to health?

Reasons replacing live-1 is hard:

  • Teams would have to adjust their deployment pipelines to target the new cluster (this doesn’t seem particularly hard)
  • Teams would need to rotate the service account credentials used in their CircleCI (or other CI) deployment pipelines (this doesn’t seem like a big deal either, TBH)
  • Possibly something to do with cert-manager? (not yet investigated)
  • Possibly something to do with external-dns? (not yet investigated)

Solved issues

  • Restoring resources we don’t control (e.g. certificates) is solved by Velero
This page was last reviewed on 7 May 2021. It needs to be reviewed again on 7 August 2021 by the page owner #cloud-platform.