Skip to main content

Grafana Dashboards

Kubernetes Number of Pods per Node

This dashboard was created to show the current number of pods per node in the cluster.

Dashboard Layout

Each box on the dashboard has the name of the node and the number of pods on the node - This number includes pods with a statuses such as “DeadlineExceeded” and completed".

The exception is the Max Pods per Node box. This is a constant number set on creation for the maximum allowed pods per node.

The current architecture does not allow instance group id to be viewed on the dashboard:

We currently have 2 instance groups:

  • Default worker node group (r6i.2xlarge)
  • Monitoring node group (r6i.8xlarge Nodes)

As the dashboard is set in descending order, the last two boxes are normally from the monitoring Nodes group (2 instances), and the rest are from the default Nodes group.

You can run the following command to confirm this and get more information about a node:

kubectl describe node <node_name>

Troubleshooting

Fixing “failed to load dashboard” errors

The kibana alert has reported an error similar to:

Grafana failed to load one or more dashboards - This could prevent new dashboards from being created ⚠️

You can also see errors from the Grafana pod by running:

kubectl logs -n monitoring prometheus-operator-grafana-<pod-id> -f -c grafana

You’ll see an error similar to:

t=2021-12-03T13:37:35+0000 lvl=eror msg="failed to load dashboard from " logger=provisioning.dashboard type=file name=sidecarProvider file=/tmp/dashboards/<MY-DASHBOARD>.json error="invalid character 'c' looking for beginning of value"

Identify the namespace and name of the configmap which contains this dashboard name by running:

kubectl get configmaps -A -ojson | jq -r '.items[] | select (.data."<MY-DASHBOARD>.json") | .metadata.namespace + "/" + .metadata.name'

This will return the namespace and name of the configmap which contains the dashboard config. Describe the namespace and find the user’s slack-channel which is a annotation on the namespace:

kubectl describe namespace <namespace>

Contact the user in the given slack-channel and ask them to fix it. Provide the list of affected dashboards and the error message to help diagnose the issue.

Fixing “duplicate dashboard uid” errors

The kibana alert has reported an error similar to:

Duplicate Grafana dashboard UID’s found

To help in identifying the dashboards, you can exec into the Grafana pod as follows:

kubectl exec --stdin --tty prometheus-operator-grafana-xxxxxx-yyyyy --container grafana-sc-dashboard  -- sh

Then, cd into the /tmp/dashboards directory, and execute the find command as below:

cd /tmp/dashboards/
for i in $(find ./ -type f); do tail $i; done|grep '"uid": '|cut -f2 -d':'|sort|uniq -c|grep -v ' 1 '

This will show output any duplicate UIDs and the number of occurrences of each.

ie:

2    "[duplicate-dashboard-uid]",
...

Now you can grep the dashboard json files within the directory to identify which dashboards contain the duplicate UID with:

grep -Rnw . -e "[duplicate-dashboard-uid]"

./my-test-dashboard.json:  "uid": "duplicate-dashboard-uid",
./my-test-dashboard-2.json:  "uid": "duplicate-dashboard-uid",

Identify the namespace and name of the configmap which contains this dashboard name by running:

kubectl get configmaps -A -ojson | jq -r '.items[] | select (.data."my-test-dashboard.json") | .metadata.namespace + "/" + .metadata.name'

This will return the namespace and name of the configmap which contains the dashboard config. Describe the namespace and find the user’s slack-channel which is a annotation on the namespace:

kubectl describe namespace <namespace>

Contact the user in the given slack-channel and ask them to fix it. Provide the list of affected dashboards and the error message to help diagnose the issue.

This page was last reviewed on 29 December 2023. It needs to be reviewed again on 29 March 2024 by the page owner #cloud-platform .
This page was set to be reviewed before 29 March 2024 by the page owner #cloud-platform. This might mean the content is out of date.