Add DNS troubleshooting
commit 05c0016072 (parent b4f4bfb8cd), committed by Denise

@@ -24,6 +24,10 @@ This section contains information to help you troubleshoot issues when using Rancher

Steps to troubleshoot networking issues can be found here.

- [DNS]({{< baseurl >}}/rancher/v2.x/en/troubleshooting/dns/)

When you experience name resolution issues in your cluster.

- [Rancher HA]({{< baseurl >}}/rancher/v2.x/en/troubleshooting/rancherha/)

If you experience issues with your [High Availability (HA) Install]({{< baseurl >}}/rancher/v2.x/en/installation/ha/).

@@ -0,0 +1,141 @@
---
title: DNS
weight: 103
---

The commands and steps listed on this page can be used to check for name resolution issues in your cluster.

Make sure you have configured the correct kubeconfig (for example, `export KUBECONFIG=$PWD/kube_config_rancher-cluster.yml` for Rancher HA) or are using the embedded kubectl via the UI.
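
For example, a quick way to confirm that kubectl is talking to the intended cluster before running the checks below (a minimal sketch, assuming the Rancher HA kubeconfig path shown above):

```
# Point kubectl at the cluster and confirm connectivity by listing the nodes
export KUBECONFIG=$PWD/kube_config_rancher-cluster.yml
kubectl get nodes
```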

Before running the DNS checks, make sure that [the overlay network is functioning correctly]({{< baseurl >}}/rancher/v2.x/en/troubleshooting/networking/#check-if-overlay-network-is-functioning-correctly), as a broken overlay network can also be the reason why DNS resolution fails (partly or completely).

### Check if DNS pods are running

```
kubectl -n kube-system get pods -l k8s-app=kube-dns
```

Example output:

```
NAME                        READY     STATUS    RESTARTS   AGE
kube-dns-5fd74c7488-h6f7n   3/3       Running   0          4m13s
```
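
If a pod is not in `Running` state, inspecting it usually points at the cause; a minimal sketch (the pod name is just the example from the output above):

```
# Show events and status details for the DNS pod
kubectl -n kube-system describe pod kube-dns-5fd74c7488-h6f7n
# Show the logs of the kubedns container in that pod
kubectl -n kube-system logs kube-dns-5fd74c7488-h6f7n -c kubedns
```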

### Check if the DNS service is present with the correct cluster-ip

```
kubectl -n kube-system get svc -l k8s-app=kube-dns
```

Example output:

```
NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
service/kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP   4m13s
```
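
As an additional check (not part of the original steps above), you can verify that the service actually has endpoints backing it, i.e. that it is wired to the running DNS pods:

```
# The ENDPOINTS column should list the DNS pod IP(s) on port 53
kubectl -n kube-system get endpoints kube-dns
```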

### Check if domain names are resolving

Check if internal cluster names are resolving (in this example, `kubernetes.default`). The IP shown after `Server:` should be the same as the `CLUSTER-IP` of the `kube-dns` service.

```
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default
```

Example output:

```
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.43.0.1 kubernetes.default.svc.cluster.local
pod "busybox" deleted
```
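
If internal names fail to resolve while external names work (or vice versa), it can help to look at the resolver configuration a pod actually receives; a small additional check, assuming the same busybox image as above:

```
# Shows the nameserver, search domains and ndots options injected into pods
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- cat /etc/resolv.conf
```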

Check if external names are resolving (in this example, `www.google.com`):

```
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup www.google.com
```

Example output:

```
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      www.google.com
Address 1: 2a00:1450:4009:80b::2004 lhr35s04-in-x04.1e100.net
Address 2: 216.58.211.100 ams15s32-in-f4.1e100.net
pod "busybox" deleted
```
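
If an external lookup fails, it can also help to separate a kube-dns problem from a general outbound connectivity problem by bypassing kube-dns and querying a public resolver directly; a sketch (not part of the original steps), where `8.8.8.8` is just an example upstream resolver:

```
# Query 8.8.8.8 directly instead of the cluster DNS service
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup www.google.com 8.8.8.8
```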

If you want to check resolution of domain names on all of the hosts, execute the following steps:

1. Save the following file as `ds-dnstest.yml`:

```
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dnstest
spec:
  selector:
    matchLabels:
      name: dnstest
  template:
    metadata:
      labels:
        name: dnstest
    spec:
      tolerations:
      - operator: Exists
      containers:
      - image: busybox:1.28
        imagePullPolicy: Always
        name: alpine
        command: ["sh", "-c", "tail -f /dev/null"]
        terminationMessagePath: /dev/termination-log
```

2. Launch it using `kubectl create -f ds-dnstest.yml`.
3. Wait until `kubectl rollout status ds/dnstest -w` returns: `daemon set "dnstest" successfully rolled out`.
4. Configure the environment variable `DOMAIN` to a fully qualified domain name (FQDN) that the host should be able to resolve (`www.google.com` is used as an example) and run the following command to let each container on every host resolve the configured domain name (it's a single-line command; a more readable multi-line version is sketched at the end of this section).

```
export DOMAIN=www.google.com; echo "=> Start DNS resolve test"; kubectl get pods -l name=dnstest --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | while read pod host; do kubectl exec $pod -- /bin/sh -c "nslookup $DOMAIN > /dev/null 2>&1"; RC=$?; if [ $RC -ne 0 ]; then echo $host cannot resolve $DOMAIN; fi; done; echo "=> End DNS resolve test"
```

5. When this command has finished running, the output indicating everything is correct is:

```
=> Start DNS resolve test
=> End DNS resolve test
```

If you see errors in the output, the listed host(s) are not able to resolve the given FQDN.

Example error output from a situation where the host with IP 209.97.182.150 had its UDP ports blocked:

```
=> Start DNS resolve test
command terminated with exit code 1
209.97.182.150 cannot resolve www.google.com
=> End DNS resolve test
```

Clean up the `dnstest` DaemonSet by running `kubectl delete ds/dnstest`.
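
For readability, the check from step 4 can also be written as a short multi-line script; a functionally equivalent sketch using the same `dnstest` DaemonSet and `DOMAIN` variable:

```
export DOMAIN=www.google.com
echo "=> Start DNS resolve test"
# List every dnstest pod together with the IP of the host it runs on
kubectl get pods -l name=dnstest --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | \
while read pod host; do
  # Run nslookup inside the pod; report the host IP if resolution fails
  if ! kubectl exec "$pod" -- /bin/sh -c "nslookup $DOMAIN > /dev/null 2>&1"; then
    echo "$host cannot resolve $DOMAIN"
  fi
done
echo "=> End DNS resolve test"
```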

### Check upstream nameservers in kubedns container

By default, the nameservers configured on the host (in `/etc/resolv.conf`) are used as upstream nameservers for `kube-dns`. Sometimes the host runs a local caching DNS nameserver, which means the address in `/etc/resolv.conf` points to an address in the loopback range (`127.0.0.0/8`) that is unreachable from the container. On Ubuntu 18.04, for instance, this is done by `systemd-resolved`. Since Rancher v2.0.7, we detect if `systemd-resolved` is running and automatically use the `/etc/resolv.conf` file with the correct upstream nameservers (which is located at `/run/systemd/resolve/resolv.conf`).
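
To see whether a host is affected, you can compare the two files directly on the node; a quick check, assuming shell access to the host:

```
# If this lists a nameserver in 127.0.0.0/8, a local caching resolver is in use
cat /etc/resolv.conf
# On systemd-resolved systems, the real upstream nameservers are listed here
cat /run/systemd/resolve/resolv.conf
```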

Use the following command to check the upstream nameservers used by the kubedns container:

```
kubectl -n kube-system get pods -l k8s-app=kube-dns --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | while read pod host; do echo "Pod ${pod} on host ${host}"; kubectl -n kube-system exec $pod -c kubedns cat /etc/resolv.conf; done
```

Example output:

```
Pod kube-dns-667c7cb9dd-z4dsf on host x.x.x.x
nameserver 1.1.1.1
nameserver 8.8.4.4
```

@@ -17,50 +17,45 @@ The pod can be scheduled to any of the hosts you used for your cluster, but that

To test the overlay network, you can launch the following `DaemonSet` definition. This will run an `alpine` container on every host, which we will use to run a `ping` test between containers on all hosts.

-1. Save the following file as `ds-alpine.yml`
+1. Save the following file as `ds-overlaytest.yml`

```
apiVersion: apps/v1
kind: DaemonSet
metadata:
-  name: alpine
+  name: overlaytest
spec:
  selector:
    matchLabels:
-      name: alpine
+      name: overlaytest
  template:
    metadata:
      labels:
-        name: alpine
+        name: overlaytest
    spec:
      tolerations:
-      - effect: NoExecute
-        key: "node-role.kubernetes.io/etcd"
-        value: "true"
-      - effect: NoSchedule
-        key: "node-role.kubernetes.io/controlplane"
-        value: "true"
+      - operator: Exists
      containers:
-      - image: alpine
+      - image: busybox:1.28
        imagePullPolicy: Always
        name: alpine
        command: ["sh", "-c", "tail -f /dev/null"]
        terminationMessagePath: /dev/termination-log
```

-2. Launch it using `kubectl create -f ds-alpine.yml`
-3. Wait until `kubectl rollout status ds/alpine -w` returns: `daemon set "alpine" successfully rolled out`.
+2. Launch it using `kubectl create -f ds-overlaytest.yml`
+3. Wait until `kubectl rollout status ds/overlaytest -w` returns: `daemon set "overlaytest" successfully rolled out`.
4. Run the following command to let each container on every host ping each other (it's a single line command).

```
-echo "=> Start"; kubectl get pods -l name=alpine -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{"\n"}{end}' | while read spod shost; do kubectl get pods -l name=alpine -o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' | while read tip thost; do kubectl --request-timeout='10s' exec $spod -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"; RC=$?; if [ $RC -ne 0 ]; then echo $shost cannot reach $thost; fi; done; done; echo "=> End"
+echo "=> Start network overlay test"; kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{"\n"}{end}' | while read spod shost; do kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' | while read tip thost; do kubectl --request-timeout='10s' exec $spod -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"; RC=$?; if [ $RC -ne 0 ]; then echo $shost cannot reach $thost; fi; done; done; echo "=> End network overlay test"
```

5. When this command has finished running, the output indicating everything is correct is:

```
-=> Start
-=> End
+=> Start network overlay test
+=> End network overlay test
```

If you see errors in the output, that means that the [required ports]({{< baseurl >}}/rancher/v2.x/en/installation/references/) for overlay networking are not opened between the hosts indicated.

@@ -68,7 +63,7 @@ If you see error in the output, that means that the [required ports

Example error output of a situation where NODE1 had the UDP ports blocked:

```
-=> Start
+=> Start network overlay test
command terminated with exit code 1
NODE2 cannot reach NODE1
command terminated with exit code 1
@@ -77,9 +72,11 @@ command terminated with exit code 1
NODE1 cannot reach NODE2
command terminated with exit code 1
NODE1 cannot reach NODE3
-=> End
+=> End network overlay test
```

Clean up the `overlaytest` DaemonSet by running `kubectl delete ds/overlaytest`.

### Resolved issues

#### Overlay network broken when using Canal/Flannel due to missing node annotations
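
As a general, hedged pointer (not part of the original commit): Canal/Flannel stores per-node networking state in node annotations such as `flannel.alpha.coreos.com/public-ip` and `flannel.alpha.coreos.com/backend-type`, and whether a node is missing them can be checked with:

```
# Print each node's name followed by its annotations; flannel.alpha.coreos.com/* entries should be present on every node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{.metadata.annotations}{"\n\n"}{end}'
```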