How We Work
This page describes the tasks and duties of members of the Cloud Platform (CP) team, and explains some of our working practices.
Getting Help
If you need help at any point, don’t hesitate to ask the team. Among other things, you can:
- Post in #cloud-platform explaining the problem
- Post in #cloud-platform asking for someone to pair-program with you
- Mention the issue during the daily standup
Sprints and ceremonies
We work in two-week sprints, which are usually product-led. Our team ceremonies include a daily stand-up at 10:30am on Google Meet, planning every fortnight on Wednesday, and our sprint demo followed by a retro on Thursday.
Firebreak
One sprint in every two months is a Firebreak sprint. Firebreaks are team-led and give team members the opportunity to work on tickets that interest them and are also of value to the organisation, such as:
- Addressing tech debt
- Prototyping innovative new features
- Trying out new technologies
As with our usual sprints, at the end of the firebreak sprint we showcase and demo our progress and learning to the team and other interested parties. Find out more about firebreaks on GOV.UK.
Story
Stories (tickets) are items of work, and each should contain enough information for all team members to know what is required. When creating a new ticket, please use the appropriate template for a support request story or firebreak story.
Stories are estimated during planning, using story points from the Fibonacci sequence to reflect complexity. Voting is done in the open by a show of fingers.
Story points
Story Points | Complexity |
---|---|
1 | This is straightforward and simple. I know exactly the code I would write if I went back to my desk right now. |
2 | This is quite easy. I know roughly what I’d have to do. I might have to look one or two things up. |
3 | This is a bit complex. I might have to refresh my memory on a few things and there are a couple of unknowns. |
5 | This is big. I have only a rough idea of how I’d do this. |
8+ | This is too big, it will have to be broken down. |
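As a rough illustration of the scale above, the sketch below encodes the allowed estimates and the rule that anything at 8 or more must be broken down. The names `STORY_POINT_GUIDE` and `check_estimate` are hypothetical and not part of any team tooling; the summaries are shortened paraphrases of the table.

```python
# Hypothetical helper for illustration only; not part of any team tooling.
# Summaries are shortened paraphrases of the story points table.
STORY_POINT_GUIDE = {
    1: "Straightforward and simple",
    2: "Quite easy",
    3: "A bit complex, with a couple of unknowns",
    5: "Big, with only a rough idea of the approach",
}


def check_estimate(points: int) -> str:
    """Return guidance for a proposed estimate on the team's scale."""
    if points >= 8:
        # The table treats 8 and above as a signal to split the story.
        return "Too big: break the story down"
    if points in STORY_POINT_GUIDE:
        return STORY_POINT_GUIDE[points]
    raise ValueError(f"{points} is not on the scale (1, 2, 3, 5, 8+)")
```

For example, `check_estimate(13)` returns `"Too big: break the story down"`, while `check_estimate(4)` raises a `ValueError` because 4 is not on the scale.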
The Board / Tickets
We use a kanban process to manage our backlog of work on this GitHub Project board, which aggregates GitHub Issues from the various CP team repositories.
During the sprint, the process of getting work done should look like this:
- Assign the topmost ticket (i.e. GitHub issue) of the “This Sprint” column to yourself, and move it to the “In Progress” column
- Let the team know what you’re working on by posting in the #cloud-platform Slack channel
- Create a new branch in each affected GitHub repository and make whatever changes are necessary
- Raise a pull request (PR), and get at least one other CP team member to review your changes (two reviews are required for infrastructure PRs)
- After your PR has been approved, merge it and, if necessary, apply your changes using Terraform or whatever else is required to apply your change to our infrastructure
- Close the GitHub issue, and move on to the next ticket
Adding tickets
Anyone in the team is encouraged to add new tickets at any time. So if you think of something we ought to do, please raise a ticket for it.
Support tickets
If more than 15 minutes is spent working on a non-ticketed issue or request, such as one from the #ask-cloud-platform channel:
- The team member working on the issue will create a ticket and estimate the ticket based on complexity, using the agreed criteria for story points.
- Where appropriate, ask the user who raised the issue to create a ticket.
Making changes to code
Please read these technical guidelines for how we prefer to work on code.
Reviewing/Merging PRs
- Whoever raises a PR is responsible for getting someone to review it, and for merging it after it has been approved
- Please be proactive about reviewing other team members’ PRs
- When reviewing a PR, please add a “reaction” emoji to the corresponding Slack message, so that other team members know you’re doing so. This avoids duplicated effort. We tend to use 👀 to show we’re reviewing a PR, and/or ✅ when we’ve approved it.
Support Squad
We have a support squad to manage the support requests and alerts that come in. The support squad is responsible for the following, in order of priority:
- Acknowledging high-priority alerts in the #high-priority-alarms Slack channel during support hours, and calling in the team to deal with them
- The 🔨 Hammer of Justice
- Acknowledging and responding to alerts in the #lower-priority-alarms Slack channel, which include:
  - Alerts related to the platform: lower-priority alarms triggered from Prometheus, AWS, and Pingdom
  - Alerts from Concourse pipelines related to integration tests, infrastructure and divergence
  - Alerts from Concourse pipelines related to the environments repository, i.e. apply-namespace, apply-live
  - Any other alerts from Concourse pipelines
- Support tickets raised by users
- Actions from the How out of date are we? report (e.g. reviewing documentation pages, or carefully destroying orphaned AWS resources)
- Open Dependabot PRs raised against the cloud-platform repositories, which are managed in our GitHub Project here
- Any issues from the link checker report
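The priority order above can be sketched as a simple sort. This is only an illustration: the `SupportPriority` names and the `triage` function are hypothetical, and the ordering assumes the top-level items of the list are exactly as written here.

```python
from enum import IntEnum


class SupportPriority(IntEnum):
    """Hypothetical labels for the support squad duties; lower means more urgent."""
    HIGH_PRIORITY_ALARM = 1
    HAMMER_DUTIES = 2
    LOWER_PRIORITY_ALARM = 3
    USER_SUPPORT_TICKET = 4
    OUT_OF_DATE_REPORT_ACTION = 5
    DEPENDABOT_PR = 6
    LINK_CHECKER_ISSUE = 7


def triage(items):
    """Sort (priority, description) pairs, most urgent first.

    sorted() is stable, so items with equal priority keep their arrival order.
    """
    return sorted(items, key=lambda item: item[0])


# Example queue with made-up descriptions.
queue = [
    (SupportPriority.DEPENDABOT_PR, "Bump a provider version"),
    (SupportPriority.HIGH_PRIORITY_ALARM, "Alert in #high-priority-alarms"),
    (SupportPriority.USER_SUPPORT_TICKET, "Request for an Alertmanager receiver"),
]
for priority, description in triage(queue):
    print(priority.name, "-", description)
```

In practice the squad makes this call by judgement; the sketch only shows that a high-priority alarm is picked up before a user ticket, and a user ticket before a Dependabot PR.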
The 🔨 Hammer of Justice
The origin of the name is lost, but it sounds a lot more fun than “support manager”.
We designate one member of the team to be the Hammer on each working day. Please volunteer when you feel comfortable, so that we all take a turn.
The Hammer is responsible for:
- Ensuring questions/problems in the #ask-cloud-platform Slack channel are being worked on, and that users receive frequent updates until a problem is resolved
- Ensuring that users’ PRs raised against the environments repository are reviewed in a timely fashion
It is not the Hammer’s job to answer every query in the channel or review every PR.
It is the Hammer’s job to ensure that all queries are handled, and that PRs are reviewed. This may involve asking other team members for help with particular queries where they have relevant expertise, or if there are more queries coming in than you can handle.
Anyone can (and should) respond to queries in #ask-cloud-platform, and review PRs. You don’t have to be the Hammer to help.
Backlog Tickets
Working on tickets in the backlog when you’re the Hammer is not advised. The constant context switching makes it hard to get significant work done, and there is also the risk that questions go unanswered and PRs get blocked waiting for review because you’re head down in a problem and don’t notice them.
Support Tickets
Support tickets are created by users of Cloud Platform for various reasons. These include:
- a request for help with a technical problem
- a request for a new feature or service
- setting up an Alertmanager receiver
- setting up a Pingdom integration
Support tickets are triaged by the support squad. If the support ticket is a quick change, e.g. setting up an Alertmanager receiver, the ticket should be assigned to a member of the support squad and should be finished in a day or two.
If the ticket involves some investigation work, then this can be assigned to a support squad member in the same sprint, or discussed in backlog refinement and added to the following sprint.
When working on a support ticket, ensure that the ticket is updated with progress and that the user is kept informed.
Documentation
Most of our user-facing documentation is in the user guide, and documentation for the team is in the runbooks site.
There are also a lot of important README.md files like this one, especially for our Terraform modules.
We also have code samples like this for each of our Terraform modules.
It is important to keep all of this up to date as the underlying code changes, so please remember to factor this in when estimating and working on tickets.
This page hosts a list of documents which are overdue for review. Please feel free to review any of the documents listed, and raise a PR making any updates (including updating the last_reviewed_on date).
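Updating the review date can be scripted. The sketch below is a minimal example that assumes the date sits on its own line in the form `last_reviewed_on: YYYY-MM-DD`; that layout, and the function name `bump_review_date`, are assumptions for illustration, so check the actual page source before relying on it.

```python
import re
from datetime import date

# Assumption: the review date is stored on a line of the form
# "last_reviewed_on: YYYY-MM-DD". Adjust the pattern if the pages differ.
_REVIEW_LINE = re.compile(
    r"^(last_reviewed_on:[ \t]*)\d{4}-\d{2}-\d{2}[ \t]*$",
    re.MULTILINE,
)


def bump_review_date(text: str, today: date) -> str:
    """Return text with the first last_reviewed_on date replaced by today."""
    new_text, count = _REVIEW_LINE.subn(
        lambda match: match.group(1) + today.isoformat(),
        text,
        count=1,
    )
    if count == 0:
        # Fail loudly rather than silently leaving the page unchanged.
        raise ValueError("no last_reviewed_on line found")
    return new_text
```

Passing the date in explicitly, rather than calling `date.today()` inside the function, keeps the behaviour predictable and easy to test. The change still goes through a PR like any other.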