How to Investigate External-Dns Errors
When there are errors in external-dns logs, “ErrorsInExternalDNS” alerts are sent to the low priority slack channel.
Troubleshooting
If we see an ErrorsInExternalDNS alert in low-priority-alerts, this is usually because an external-dns is unable to write records to Route-53 for a particual hosted zone. You can see errors from the external-dns pod by running:
kubectl logs -n kube-system external-dns-<pod-id> -f
Invalid Change Batch
level=error msg="InvalidChangeBatch: [RDATA character limit of 32000 exceeded.]"
level=error msg="InvalidChangeBatch: [RRSet with DNS name _external_dns.development.moj.service.justice.gov.uk., type TXT cannot be created as other RRSets exist with the same name and type.,
RRSet with DNS name development.moj.service.justice.gov.uk., type A cannot be created as other RRSets exist with the same name and type.]"
level=error msg="InvalidChangeBatch: [Tried to delete resource record set [name='_external_dns.test.live.cloud-platform.service.justice.gov.uk.', type='TXT', set-identifier='test-live-1'] but it was not found,
Tried to delete resource record set [name='test.live.cloud-platform.service.justice.gov.uk.', type='A', set-identifier='test-live-1'] but it was not found]
level=error msg="InvalidChangeBatch: [Tried to create resource record set [name='cluster-cname-thisingress.example.', type='TXT'] but it already exists, Tried to create
resource record set [name='cluster-thisingress.example.', type='TXT'] but it already exists]
level=error msg="failed to submit all changes for the following zones: [/hostedzone/ABCDEFGHIJKLMN /hostedzone/NMLKJIHGFEDCBA]"
Identify the ingress causing the error in question and fix the ingress depending on the error message. We may need to inform users to update the ingress causing the error.
Follow this troubleshoot guide for common error messages to determine the error’s cause and how to troubleshoot it.
Once the error is cleared, you will see info logs similar to:
level=info msg="Applying provider record filter for domains: [dev.example. .dev.example.]"
level=info msg="Desired change: CREATE cluster-a-new.dev.example TXT [Id: /hostedzone/ZZZ]"
level=info msg="All records are already up to date"
Rate Limited / Throttled
level=error msg="records retrieval failed: failed to list hosted zones: Throttling: Rate exceeded\n\tstatus code: 400, request id: 9df2a-7blah"
level=error msg="failed to list resource records sets for zone /hostedzone/BLAH_MOX: Throttling: Rate exceeded\n\tstatus code: 400, request id: 0-9216-435fblah"
There isn’t much we can do about being rate limited; acknowledge the alert.