How We Work

This page describes the tasks and duties of members of the Cloud Platform (CP) team, and explains some of our working practices.

Getting Help

If you need help at any point, don’t hesitate to ask the team. Among other things, you can:

Post in #cloud-platform explaining the problem
Post in #cloud-platform asking for someone to pair-program with you
Mention the issue during the daily standup

Sprints and ceremonies

We work in two-week sprints, which are usually product-led. Our team ceremonies include a daily standup at 10:30am on Google Meet, planning every fortnight on Wednesday, and our sprint demo followed by a retro on Thursday.

Firebreak

For one sprint every two months, we have a Firebreak sprint. Firebreaks are team-led and give team members the opportunity to work on tickets that interest them and are also of value to the organisation, such as:

Addressing tech debt
Prototyping innovative new features
Trying out new technologies

As with our usual sprints, at the end of the firebreak sprint we showcase and demo our progress and learning to the team and other interested parties. Find out more about firebreaks on GOV.UK.

Story

A story (or ticket) is an item of work and should contain enough information for all team members to know what is required. When creating a new ticket, please use the appropriate template for a support request, story, or firebreak story.

During planning, stories are estimated in story points based on complexity, using the Fibonacci sequence. Everyone votes by entering their score into the call chat window at the same time. The scores are then discussed to reach a consensus.

Story points

Story Points	Complexity
1	This is straightforward and simple. I know exactly the code I would write if I went back to my desk right now.
2	This is quite easy. I know roughly what I’d have to do. I might have to look one or two things up.
3	This is a bit complex. I might have to refresh my memory on a few things and there are a couple of unknowns.
5	This is big. I have only a rough idea of how I’d do this.
8+	This is too big. It will have to be broken down.

We generally score spikes as a 5. We do not score bugs or support tickets unless they are moved into general ticketed work.
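
The voting and scoring rules above can be sketched in code. This is a minimal illustration only, not a tool the team uses: the function name summarise_votes and the shape of the returned dictionary are assumptions made for this example, and 8 stands in for the "8+" row of the table.

```python
# Illustrative sketch only: names and return shape are assumptions, not team tooling.
ALLOWED_POINTS = (1, 2, 3, 5, 8)  # 8 stands in for "8+" in the table above


def summarise_votes(votes):
    """Summarise one round of story-point votes entered in the call chat."""
    if not votes:
        raise ValueError("At least one vote is required")
    invalid = [v for v in votes if v not in ALLOWED_POINTS]
    if invalid:
        raise ValueError(f"Votes must be one of {ALLOWED_POINTS}, got {invalid}")

    distinct = sorted(set(votes))
    consensus = len(distinct) == 1
    return {
        "consensus": consensus,
        # Only report an agreed score once everyone has voted the same way.
        "agreed_points": distinct[0] if consensus else None,
        # Differing scores are discussed before the team settles on one.
        "needs_discussion": not consensus,
        # An agreed score of 8+ means the story should be broken down.
        "needs_breakdown": consensus and distinct[0] >= 8,
    }


print(summarise_votes([3, 3, 3]))
print(summarise_votes([2, 5, 3]))
```

In this sketch, a round with mixed scores simply reports that discussion is needed; how the team then converges on a number is left to the conversation in planning.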

The Board / Tickets

We use a Scrum process to manage our backlog of work on this GitHub Project board, which aggregates GitHub Issues from the various CP team repositories.

During the sprint, the process of getting work done should look like this:

Assign the topmost ticket (i.e. GitHub issue) in the “This Sprint” column to yourself, and move it to the “In Progress” column
Create a new branch in each affected GitHub repository and make whatever changes are necessary
Raise a pull request (PR), and get at least one other CP team member to review your changes (two reviews are required for infrastructure PRs)
After your PR has been approved, merge it and, if necessary, apply your changes using Terraform or whatever else is required to apply your change to our infrastructure
Move the ticket into review so that another team member can review the completed work
Once reviewed, move the GitHub issue to done, and move on to the next ticket
If you cannot work on a ticket as you are waiting on someone else, move the ticket to blocked and start work or help on another ticket.
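
The board workflow above can be read as a small set of allowed column moves plus a review rule. The sketch below is purely illustrative and is not how the GitHub Project board is implemented: the capitalised names "Review", "Done" and "Blocked", the assumption that a blocked ticket returns to "In Progress", and all function names are hypothetical.

```python
# Illustrative sketch only: the board itself lives in GitHub Projects.
# Column names and the Blocked -> In Progress move are assumptions.
ALLOWED_MOVES = {
    "This Sprint": {"In Progress"},
    "In Progress": {"Review", "Blocked"},
    "Blocked": {"In Progress"},
    "Review": {"Done"},
    "Done": set(),
}


def required_approvals(is_infrastructure_pr):
    """At least one CP reviewer per PR; infrastructure PRs need two."""
    return 2 if is_infrastructure_pr else 1


def can_merge(approvals, is_infrastructure_pr):
    """True once the PR has enough approving reviews to be merged."""
    return approvals >= required_approvals(is_infrastructure_pr)


def move_ticket(current_column, new_column):
    """Return the new column, or raise if the move skips a step."""
    if new_column not in ALLOWED_MOVES.get(current_column, set()):
        raise ValueError(
            f"Cannot move a ticket from {current_column!r} to {new_column!r}"
        )
    return new_column


column = move_ticket("This Sprint", "In Progress")
print(column, can_merge(approvals=1, is_infrastructure_pr=True))
```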

Running out of tickets mid sprint

If there are no more tickets in Todo to pick up:

Ask in standup or the team channel whether anyone needs help or pairing. The priority should always be to finish all tickets in the sprint before bringing anything else in
Do any mandatory learning or use your 10% time
Help the support team with their tasks, such as Dependabot or document reviews
If everything above is done, discuss bringing more work into the sprint with the team (do not bring things into the sprint without asking)

Adding tickets to the backlog

Anyone in the team is encouraged to add new tickets at any time. So if you think of something we ought to do, please raise a ticket for it.

Support tickets

If the time spent working on a non-ticketed issue or request, such as one from the #ask-cloud-platform channel, is more than 15 minutes:

The team member working on the issue will create a ticket and estimate the ticket based on complexity, using the agreed criteria for story points.
Where appropriate, ask the user who raised the issue to create a ticket.

Making changes to code

Please read these technical guidelines for how we prefer to work on code.

Reviewing/Merging PRs

Whoever raises a PR is responsible for getting someone to review it, and for merging it after it has been approved
Please be proactive about reviewing other team members’ PRs
When reviewing a PR, please add a “reaction” emoji to the corresponding Slack message, so that other team members know you’re doing so. This avoids duplicated effort. We tend to use 👀 to show we’re reviewing a PR, and/or ✔ when we’ve approved it.

Support Squad

We have a support squad to manage the support requests and alerts that come in. The support squad is responsible for the following, in order of priority:

Acknowledging high-priority alerts in the #high-priority-alarms Slack channel during support hours, and calling in the team to respond to them
The 🔨 Hammer of Justice
Acknowledging and responding to alerts in the #lower-priority-alarms Slack channel, which include:
- Alerts related to the platform: lower-priority alarms triggered from Prometheus, AWS, and Pingdom
- Alerts from Concourse pipelines related to integration tests, infrastructure and divergence
- Alerts from Concourse pipelines related to the environments repository, i.e. apply-namespace and apply-live
- Any other alerts from Concourse pipelines
Support tickets raised by users
Actions from the How out of date are we? report (e.g. reviewing documentation pages, or carefully destroying orphaned AWS resources)
Open Dependabot PRs raised against the cloud-platform repositories, which are managed in our GitHub Project here
Any issues from the link checker report

The 🔨 Hammer of Justice

The origin of the name is lost, but it sounds a lot more fun than “support manager” 😏

We designate one member of the team to be the Hammer on each working day. Please volunteer when you feel comfortable, so that we all take a turn.

The Hammer is responsible for:

Ensuring questions/problems in the #ask-cloud-platform Slack channel are being worked on, and that users receive frequent updates until a problem is resolved
Ensuring that users’ PRs raised against the environments repository are reviewed in a timely fashion

It is not the Hammer’s job to answer every query in the channel or review every PR.

It is the Hammer’s job to ensure that all queries are handled, and that PRs are reviewed. This may involve asking other team members for help with particular queries where they have relevant expertise, or if there are more queries coming in than you can handle.

Anyone can (and should) respond to queries in #ask-cloud-platform, and review PRs. You don’t have to be the Hammer to help.

Sprint Tickets

Do not work on tickets in the sprint when you’re the Hammer. The constant context switching makes it hard to get significant work done, and there is also the risk that questions go unanswered and PRs get blocked waiting for review because you’re head down in a problem and don’t notice them.

Secondary Support

The secondary support role is to cover other support work (not covered by the Hammer) and to support the Hammer where needed.

Support Tickets

Support tickets are created by users of Cloud Platform for various reasons. These can include:

a request for help with a technical problem
a request for a new feature or service
a request to set up an Alertmanager receiver
a request to set up a Pingdom integration

Support tickets are triaged by the support squad. If the support ticket is a quick change, e.g. setting up an Alertmanager receiver, the ticket should be assigned to a member of the support squad and should be finished in a day or two.

If the ticket involves some investigation work, then this can be assigned to a support squad member in the same sprint, or discussed in backlog refinement and added to the following sprint.

When working on a support ticket, ensure that the ticket is updated with progress and that the user is kept informed.

Documentation

Most of our user-facing documentation is in the user guide, and documentation for the team is in the runbooks site.

There are also a lot of important README.md files like this one, especially for our Terraform modules. We also have code samples like this for each of our Terraform modules.

It is important to keep all of this up to date as the underlying code changes, so please remember to factor this in when estimating and working on tickets.

This page hosts a list of documents which are overdue for review. Please feel free to review any of the documents listed, and raise a PR making any updates (including updating the last_reviewed_on date).
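
As a rough illustration of how a review date relates to last_reviewed_on, the sketch below adds a review interval to a last-reviewed date and checks whether a page is overdue. The six-month interval, the function names and the date handling are assumptions made for this example; the documentation site's own tooling determines the real behaviour.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` after `start`, clamped to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def is_overdue(last_reviewed_on, today, review_in_months=6):
    """True if the review date has passed (the 6-month default is an assumption)."""
    return today > add_months(last_reviewed_on, review_in_months)


due = add_months(date(2025, 2, 25), 6)
print(due)  # 2025-08-25
```

When raising a PR for a reviewed document, updating last_reviewed_on to the review date is what moves this due date forward.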

This page was last reviewed on 25 February 2025. It needs to be reviewed again on 25 August 2025 by the page owner #cloud-platform.
