Add a new Alertmanager receiver and a Slack webhook
When development teams need to send custom alerts to their own Slack channels, a new receiver has to be added to the Alertmanager configuration.
Prerequisites
You will need the following details from the development team:
- namespace name
- team name
- application name
- Slack channel
- severity level (warning/information)
- Kubernetes secret containing the Slack webhook URL (see the retrieval sketch after this list)
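If you need to read the webhook URL out of the team's Kubernetes secret, a minimal sketch is below; the secret name slack-webhook-url and the key url are illustrative assumptions, so substitute whatever the team actually provided:

# secret name and key are illustrative; substitute the real ones
kubectl -n <namespace> get secret slack-webhook-url \
  -o jsonpath='{.data.url}' | base64 -d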
Creating a new receiver set
Fill in the template below with the details provided by the development team and add the entry to the array in the terraform.tfvars file. The terraform.tfvars file is encrypted, so you have to run git-crypt unlock to view its contents. Check the git-crypt documentation in the user guide for more information on how to set up git-crypt.

{
  severity = "<unique name based on the team info provided>"
  webhook  = "<slack webhook url>"
  channel  = "<slack channel>"
}
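In terraform.tfvars the receivers accumulate as a list of objects, roughly as sketched below; the variable name slack_receivers is an assumption for illustration, so keep whatever name the file already uses:

# "slack_receivers" is an illustrative name; match the variable name already in the file
slack_receivers = [
  {
    severity = "existing-team"
    webhook  = "https://hooks.slack.com/services/<existing-team-webhook>"
    channel  = "#existing-team-alerts"
  },
  {
    severity = "cp-team"
    webhook  = "https://hooks.slack.com/services/<new-webhook>"
    channel  = "#cloud-platform-alerts"
  },
]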
Create a PR and merge the changes to master
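A typical flow for raising the change is sketched below; the branch name is illustrative:

git checkout -b add-cp-team-receiver   # branch name is an example
git add terraform.tfvars
git commit -m "Add Alertmanager Slack receiver for cp-team"
git push origin add-cp-team-receiver

Then open the PR against master and merge once approved.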
Apply the Terraform code changes to the live cluster
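From the directory holding the Alertmanager Terraform configuration (the exact path depends on the repository), the usual sequence is:

terraform init
terraform plan    # review: only the new receiver should be added
terraform apply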
Once the changes are applied, provide the severity value to the development team that requested it. Development teams have to use this value as the severity label when creating all custom application alerts.
An example receiver set and the corresponding PrometheusRule look as below:
{
severity = "cp-team"
webhook = "https://hooks.slack.com/services/HGJKHJKSH/DFJKHKIUO/DJFHKDUJFKSUHUGIDHKUDGD"
channel = "#cloud-platform-alerts"
}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  namespace: monitoring
  labels:
    role: alert-rules
  name: prometheus-custom-rules-my-application
spec:
  groups:
  - name: node.rules
    rules:
    - alert: Quota-Exceeded
      expr: 100 * kube_resourcequota{job="kube-state-metrics",type="used",namespace="monitoring"} / ignoring(instance, job, type) (kube_resourcequota{job="kube-state-metrics",type="hard"} > 0) > 90
      for: 5m
      labels:
        severity: cp-team
      annotations:
        message: Namespace {{ $labels.namespace }} is using {{ printf "%0.0f" $value}}% of its {{ $labels.resource }} quota.
        runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded
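Development teams load the rule by applying the manifest; a sketch, assuming the YAML above is saved as prometheus-custom-rules-my-application.yaml:

kubectl apply -f prometheus-custom-rules-my-application.yaml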
Information Alerts
Alertmanager receivers for Slack integration are configured for two types of alerts:
- to send a warning when an alert rule condition is met, and a recovery message once the condition is reversed
- to send an information-type alert for monitoring non-problem/non-failure events, with no recovery message - e.g. a rule condition used to positively know that something happened (like a database refresh or an app deployment) which is not a problem or failure
All alerts are routed using the severity label set in the PrometheusRule. If the severity label matches the keyword info-, the alert is routed to the information receiver, which sends no recovery message. To receive information alerts, teams must prefix their severity value with info- when setting the severity label of the PrometheusRule.
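Under the hood, the generated Alertmanager configuration plausibly contains a pair of routes along these lines; the receiver names and exact structure here are illustrative assumptions, not the literal generated output:

route:
  routes:
  - match:
      severity: cp-team
    receiver: slack-cp-team          # receiver names are illustrative
  - match:
      severity: info-cp-team
    receiver: slack-info-cp-team
receivers:
- name: slack-cp-team
  slack_configs:
  - api_url: https://hooks.slack.com/services/<webhook>
    channel: "#cloud-platform-alerts"
    send_resolved: true              # warning alerts also get a recovery message
- name: slack-info-cp-team
  slack_configs:
  - api_url: https://hooks.slack.com/services/<webhook>
    channel: "#cloud-platform-alerts"
    send_resolved: false             # information alerts send no recovery message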
An example receiver set and a PrometheusRule for an information alert look as below:
{
severity = "cp-team"
webhook = "https://hooks.slack.com/services/HGJKHJKSH/DFJKHKIUO/DJFHKDUJFKSUHUGIDHKUDGD"
channel = "#cloud-platform-alerts"
}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  namespace: monitoring
  labels:
    role: alert-rules
  name: prometheus-custom-rules-my-application
spec:
  groups:
  - name: application-rules
    rules:
    - alert: Job-Completed
      expr: |-
        rate(kube_job_status_succeeded{job="kube-state-metrics",namespace="monitoring"}[5m]) > 0
      for: 30s
      labels:
        severity: info-cp-team # Should include `info-` before the severity mentioned in the receiver set above
        status_icon: information # To have an "i" icon instead of the default "?"
      annotations:
        message: Latest backup checker Job completed on Namespace {{ $labels.namespace }}
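Once the rule is deployed, a quick check confirms the Prometheus Operator has picked it up:

kubectl -n monitoring get prometheusrules prometheus-custom-rules-my-application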
Check the user guide for more details on how to set up custom Prometheus alerts.