How We Work

This page describes the tasks and duties of members of the Cloud Platform (CP) team, and explains some of our working practices.

Getting Help

If you need help at any point, don’t hesitate to ask the team. Among other things, you can:

  • Post in #cloud-platform explaining the problem
  • Post in #cloud-platform asking for someone to pair-program with you
  • Mention the issue during the daily standup

Sprints and ceremonies

We work in two-week sprints, which are usually product-led. Our team ceremonies include a daily stand-up at 10:30am on Google Meet, planning every fortnight on Wednesday, and our sprint demo followed by a retro on Thursday.

Firebreak

For one sprint every two months, we have a Firebreak sprint. Firebreaks are team-led and give team members the opportunity to work on tickets that are of interest to them and also of value to the organisation, such as:

  1. Addressing tech debt
  2. Prototyping innovative new features
  3. Trying out new technologies

As with our usual sprints, at the end of the firebreak sprint we showcase and demo our progress and learning to the team and other interested parties. Find out more about firebreaks on GOV.UK.

Story

A story (or ticket) is an item of work, and should have enough information that all team members know what is required. When creating a new ticket, please use the appropriate template for a support request story or firebreak story.

Stories are estimated in story points during planning, based on complexity, using the Fibonacci sequence. Voting is done in the open by show of fingers.

Story points

Story Points Complexity
1 This is straightforward and simple. I know exactly the code I would write if I went back to my desk right now.
2 This is quite easy. I know roughly what I’d have to do. I might have to look one or two things up.
3 This is a bit complex. I might have to refresh my memory on a few things and there are a couple of unknowns.
5 This is big. I have only a rough idea of how I’d do this.
8+ This is too big, it will have to be broken down.
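
As a rough illustration of how the scale above is meant to be read, here is a minimal Python sketch. The function name `review_estimate` and the shortened descriptions are assumptions made for this example; they are not part of any team tooling.

```python
# Complexity guidance keyed by story points, paraphrased from the table above.
STORY_POINT_SCALE = {
    1: "straightforward and simple",
    2: "quite easy",
    3: "a bit complex",
    5: "big",
}

# Estimates at or above this value mean the ticket is too big ("8+").
BREAKDOWN_THRESHOLD = 8


def review_estimate(points: int) -> str:
    """Return the guidance for an estimate (hypothetical helper)."""
    if points >= BREAKDOWN_THRESHOLD:
        return "too big: break the ticket down"
    if points not in STORY_POINT_SCALE:
        raise ValueError(f"{points} is not on the story point scale")
    return STORY_POINT_SCALE[points]
```

For example, `review_estimate(3)` returns the guidance for a three-point story, while any estimate of 8 or more signals that the ticket should be split before it enters a sprint.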

The Board / Tickets

We use a kanban process to manage our backlog of work on this GitHub Project board, which aggregates GitHub Issues from the various CP team repositories.

During the sprint, the process of getting work done should look like this:

  • Assign the topmost ticket (i.e. GitHub issue) in the “This Sprint” column to yourself, and move it to the “In Progress” column
  • Let the team know what you’re working on by posting in the #cloud-platform Slack channel
  • Create a new branch in each affected GitHub repository and make whatever changes are necessary
  • Raise a pull request (PR), and get at least one other CP team member to review your changes (two reviews are required for infrastructure PRs)
  • After your PR has been approved, merge it and, if necessary, apply your changes using Terraform or whatever else is required to apply your change to our infrastructure
  • Close the GitHub issue, and move on to the next ticket
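
The review rule in the steps above (one approval normally, two for infrastructure PRs) can be sketched in Python as follows. The `PullRequest` class and its fields are hypothetical, invented for illustration only; in practice the rule is enforced by the team and by repository settings, not by this code.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a PR for this sketch."""
    approvals: int            # approvals from other CP team members
    is_infrastructure: bool   # True if the PR changes infrastructure


def required_approvals(pr: PullRequest) -> int:
    # Infrastructure PRs need two reviews; everything else needs one.
    return 2 if pr.is_infrastructure else 1


def can_merge(pr: PullRequest) -> bool:
    # A PR can be merged once it has enough approvals.
    return pr.approvals >= required_approvals(pr)
```

Under these assumptions, an infrastructure PR with a single approval is not yet ready to merge, whereas a documentation PR with one approval is.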

Adding tickets

Anyone in the team is encouraged to add new tickets at any time. So if you think of something we ought to do, please raise a ticket for it.

Support tickets

If the time spent working on a non-ticketed issue or request, such as one from the #ask-cloud-platform channel, is more than 15 minutes:

  • The team member working on the issue will create a ticket and estimate the ticket based on complexity, using the agreed criteria for story points.
  • Where appropriate, ask the user who raised the issue to create a ticket.
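
The 15-minute rule above amounts to a simple threshold check. A minimal sketch, where `needs_ticket` is a hypothetical helper name used only for this example:

```python
from datetime import timedelta

# Threshold taken from the rule above: more than 15 minutes needs a ticket.
TICKET_THRESHOLD = timedelta(minutes=15)


def needs_ticket(time_spent: timedelta) -> bool:
    """Return True if non-ticketed work has run long enough to need a ticket."""
    return time_spent > TICKET_THRESHOLD
```

Note that the comparison is strict: work that takes exactly 15 minutes does not need a ticket under this reading of the rule.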

Making changes to code

Please read these technical guidelines for how we prefer to work on code.

Reviewing/Merging PRs

  • Whoever raises a PR is responsible for getting someone to review it, and for merging it after it has been approved
  • Please be pro-active about reviewing other team members’ PRs
  • When reviewing a PR, please add a “reaction” emoji to the corresponding Slack message, so that other team members know you’re doing so. This avoids duplicated effort. We tend to use 👀 to show we’re reviewing a PR, and/or ✔ when we’ve approved it.

Support Squad

We have a support squad to manage the support requests and alerts that come in. The support squad is responsible for the following, in order of priority:

  • Acknowledging high-priority alerts in the #high-priority-alarms Slack channel during support hours, and calling in the team to respond
  • The 🔨 Hammer of Justice
  • Acknowledging and responding to alerts in the #lower-priority-alarms Slack channel, which include:
    • Alerts related to the platform: lower priority alarms which are triggered from Prometheus, AWS, and Pingdom
    • Alerts from Concourse pipelines related to integration tests, infrastructure and divergence
    • Alerts from Concourse pipelines related to the environments repository, i.e. apply-namespace, apply-live
    • Any other alerts from Concourse pipelines
  • Support tickets raised by users
  • Actions from the How out of date are we? report (e.g. reviewing documentation pages, or carefully destroying orphaned AWS resources)
  • Open Dependabot PRs raised against the cloud-platform repositories, which are managed in our GitHub Project here
  • Any issues from the link checker report
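
To make the priority order above concrete, here is a small Python sketch that sorts incoming work by where its category sits in the list. The category labels and the `sort_by_priority` helper are assumptions made for this example and do not correspond to any real tool or label set.

```python
# Categories in priority order, mirroring the list above (labels are illustrative).
SUPPORT_PRIORITIES = [
    "high-priority alert",
    "hammer",
    "lower-priority alert",
    "support ticket",
    "out-of-date report action",
    "dependabot pr",
    "link checker issue",
]


def sort_by_priority(items: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Order (category, description) pairs so the most urgent come first.

    sorted() is stable, so items in the same category keep their arrival order.
    """
    rank = {name: position for position, name in enumerate(SUPPORT_PRIORITIES)}
    return sorted(items, key=lambda item: rank[item[0]])
```

Given a Dependabot PR, a support ticket and a high-priority alert, the alert comes out first and the Dependabot PR last.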

The 🔨 Hammer of Justice

The origin of the name is lost, but it sounds a lot more fun than “support manager” 😏

We designate one member of the team to be the Hammer on each working day. Please volunteer when you feel comfortable, so that we all take a turn.

The Hammer is responsible for:

  • Ensuring questions/problems in the #ask-cloud-platform Slack channel are being worked on, and that users receive frequent updates until a problem is resolved
  • Ensuring that users’ PRs raised against the environments repository are reviewed in a timely fashion

It is not the Hammer’s job to answer every query in the channel or review every PR.

It is the Hammer’s job to ensure that all queries are handled, and that PRs are reviewed. This may involve asking other team members for help with particular queries where they have relevant expertise, or if there are more queries coming in than you can handle.

Anyone can (and should) respond to queries in #ask-cloud-platform, and review PRs. You don’t have to be the Hammer to help.

Backlog Tickets

Working on tickets in the backlog when you’re the Hammer is not advised. The constant context switching makes it hard to get significant work done, and there is also the risk that questions go unanswered and PRs get blocked waiting for review because you’re head down in a problem and don’t notice them.

Support Tickets

Support tickets are created by users of Cloud Platform for various reasons. These can include:

  • a request for help with a technical problem
  • a request for a new feature or service
  • setting up an Alertmanager receiver
  • setting up a Pingdom integration

Support tickets are triaged by the support squad. If the support ticket is a quick change, e.g. setting up an Alertmanager receiver, the ticket should be assigned to a member of the support squad and should be finished in a day or two.

If the ticket involves some investigation work, then this can be assigned to a support squad member in the same sprint, or discussed in backlog refinement and added to the following sprint.

When working on a support ticket, ensure that the ticket is updated with progress and that the user is kept informed.

Documentation

Most of our user-facing documentation is in the user guide, and documentation for the team is in the runbooks site.

There are also a lot of important README.md files like this one, especially for our Terraform modules. We also have code samples like this for each of our Terraform modules.

It is important to keep all of this up to date as the underlying code changes, so please remember to factor this in when estimating and working on tickets.

This page hosts a list of documents which are overdue for review. Please feel free to review any of the documents listed, and raise a PR making any updates (including updating the last_reviewed_on date).
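
As a sketch of what “overdue for review” means, the Python below computes a due date from a page’s `last_reviewed_on` date and checks it against today. The three-month interval is an assumption inferred from the review dates shown in this page’s footer, and `add_months` and `is_overdue` are hypothetical helpers, not the implementation used by the overdue-documents page.

```python
import calendar
from datetime import date

# Assumed review interval in months (inferred from this page's footer dates).
REVIEW_INTERVAL_MONTHS = 3


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamping to the end of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_overdue(last_reviewed_on: date, today: date) -> bool:
    """A page is treated as overdue once today is past its review due date."""
    due = add_months(last_reviewed_on, REVIEW_INTERVAL_MONTHS)
    return today > due
```

Under these assumptions, a page last reviewed on 11 September 2024 falls due on 11 December 2024 and counts as overdue from the following day.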

This page was last reviewed on 11 September 2024. It needs to be reviewed again on 11 December 2024 by the page owner #cloud-platform .