Incident on 2022-03-10 - All ingress resources using *.apps.live.cloud-platform urls showing certificate issue
Key events
- First detected 2022-03-10 11:48
- Incident declared 2022-03-10 11.50
- Repaired 2022-03-10 11:56
- Resolved 2022-03-10 11.56
Time to repair: 8m
Time to resolve: 8m
Identified: Users reported in #ask-cloud-platform that they are seeing errors for CP domain urls.
Hostname/IP does not match certificate's altnamesImpact: All ingress resources using the *apps.live.cloud-platform.service.justice.gov.uk have mismatched certificates.
Context:
- Occurred immediately following a terraform apply to a test cluster
- The change amended the default certificate of
livecluster to*.apps.yy-1003-0100.cloud-platform.service.justice.gov.uk. - Timeline: timeline for the incident
- Slack thread: #ask-cloud-platform for the incident
Resolution:
- The immediate repair was to perform an inline edit of the default certificate in
live. Adding the wildcard dnsNames*.apps.live,*.live,*.apps.live-1and*.live-1to the default certificate i.e. reverting the faulty change. - Further investigation followed finding the cause of the incident was actually was due to the environment variable KUBE_CONFIG set to the config path which had
livecontext set - The terraform kubectl provider used to apply
kubectl_manifestresources uses environment variableKUBECONFIGandKUBE_CONFIG_PATH. But it has been found that it can also use variableKUBE_CONFIGcausing the apply of certificate to the wrong cluster.
- The immediate repair was to perform an inline edit of the default certificate in
Review actions:
- Ticket raised to configure kubectl provider to use data source #3589