
Incident Log

Q3 2024 (July-September)

Q1 2024 (January-March)

Q4 2023 (October-December)

Q3 2023 (July-September)

Q2 2023 (April-June)

Q1 2023 (January-March)

Q4 2022 (October-December)

Q3 2022 (July-September)

Q1 2022 (January-March)

Q4 2021 (October-December)

Q3 2021 (July-September)

Q2 2021 (April-June)

Q1 2021 (January-March)

  • Mean Time to Repair: N/A

  • Mean Time to Resolve: N/A

No incidents declared

Q4 2020 (October-December)

Q3 2020 (July-September)

Q2 2020 (April-June)

Q1 2020 (January-March)

About this incident log

We publish this incident log:

  • for the Cloud Platform team to learn from incidents
  • for the Cloud Platform team and its stakeholders to track incident trends and performance
  • because we operate in the open

Definitions:

  • The words used in the timeline of an incident: fault occurs; team becomes aware (of something bad); incident declared (the team acknowledges the incident and has an idea of the impact); repaired (the system is fully functional); resolved (the system is fully functional and future failures are prevented)
  • Incident time - The start of the failure (before March 2020, it was the time the incident was declared)
  • Time to Repair - The time between the incident being declared (or when the team became aware of the fault) and when service is fully restored. Only includes Hours of Support.
  • Time to Resolve - The time between when the fault occurs and when the system is fully functional, including any immediate work done to prevent future failures. Only includes Hours of Support. This is a broader metric of incident response performance than Time to Repair.
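
As an illustration of how "Only includes Hours of Support" affects both metrics, here is a minimal Python sketch. It assumes support hours of 10:00 to 17:00, Monday to Friday; those values are hypothetical and are not stated in this log, and the function name `supported_duration` and the example timestamps are made up for the example. Datetimes are treated as London wall-clock times.

```python
from datetime import datetime, time, timedelta

# Assumption: hypothetical support hours, not taken from this incident log.
SUPPORT_START = time(10, 0)
SUPPORT_END = time(17, 0)


def supported_duration(start: datetime, end: datetime) -> timedelta:
    """Sum the parts of the interval [start, end] that fall inside support hours."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            window_start = datetime.combine(day, SUPPORT_START)
            window_end = datetime.combine(day, SUPPORT_END)
            overlap_start = max(start, window_start)
            overlap_end = min(end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total


# Invented example incident (Thursday 2024-01-04 into Friday 2024-01-05).
fault_occurs = datetime(2024, 1, 4, 16, 0)
declared = datetime(2024, 1, 4, 16, 30)
repaired = datetime(2024, 1, 5, 11, 0)
resolved = datetime(2024, 1, 5, 12, 15)

time_to_repair = supported_duration(declared, repaired)       # 1h 30m
time_to_resolve = supported_duration(fault_occurs, resolved)  # 3h 15m
```

In this sketch the overnight gap between 17:00 and 10:00 does not count towards either metric, which is why Time to Repair comes out at 1h 30m rather than 18h 30m.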

Source: Atlassian

Datestamps: please use YYYY-MM-DD HH:MM (almost ISO 8601, but more readable) in the London timezone.
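
One possible way to produce datestamps in this format is sketched below using Python's standard `zoneinfo` module. The helper names `to_datestamp` and `from_datestamp` are hypothetical, and the sketch assumes the input is a timezone-aware datetime (for example, a UTC timestamp from monitoring).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")
DATESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def to_datestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DD HH:MM in London local time."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(LONDON).strftime(DATESTAMP_FORMAT)


def from_datestamp(text: str) -> datetime:
    """Parse a YYYY-MM-DD HH:MM datestamp, interpreting it as London local time."""
    return datetime.strptime(text, DATESTAMP_FORMAT).replace(tzinfo=LONDON)


# A UTC timestamp during British Summer Time is shifted forward by one hour.
summer = to_datestamp(datetime(2024, 7, 1, 9, 30, tzinfo=timezone.utc))   # "2024-07-01 10:30"
# In winter, London time matches UTC.
winter = to_datestamp(datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc))  # "2024-01-15 09:30"
```

Converting explicitly avoids an off-by-one-hour datestamp when the source timestamp is in UTC and the incident happens during British Summer Time.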

Template

Incident on YYYY-MM-DD - [Brief description]

  • Key events

    • First detected YYYY-MM-DD HH:MM
    • Incident declared YYYY-MM-DD HH:MM
    • Repaired YYYY-MM-DD HH:MM
    • Resolved YYYY-MM-DD HH:MM
  • Time to repair: Xh Xm

  • Time to resolve: Xh Xm

  • Identified:

  • Impact:

  • Context:

    • Timeline: [Timeline](URL of Google document) for the incident
    • Slack thread: [Slack thread](URL of primary incident thread) for the incident
  • Resolution:

  • Review actions:
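
The `Xh Xm` fields in the template can be filled in from a computed duration. The following Python sketch shows one way to do that; `format_duration` is a hypothetical helper name, and rounding down to whole minutes is an assumption rather than a rule stated in this log.

```python
from datetime import timedelta


def format_duration(duration: timedelta) -> str:
    """Render a non-negative duration in the template's 'Xh Xm' form."""
    if duration < timedelta(0):
        raise ValueError("duration must not be negative")
    # Assumption: partial minutes are rounded down.
    total_minutes = int(duration.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


short = format_duration(timedelta(hours=1, minutes=30))               # "1h 30m"
multi_day = format_duration(timedelta(days=1, hours=2, minutes=5))    # "26h 5m"
```

Durations longer than a day are kept in hours rather than split into days, which matches the two-part `Xh Xm` shape of the template.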