---
title: DNS
---

<head>
  <link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/dns"/>
</head>

The commands/steps listed on this page can be used to check name resolution issues in your cluster.

Make sure you have configured the correct kubeconfig (for example, `export KUBECONFIG=$PWD/kube_config_cluster.yml` for Rancher HA) or are using the embedded kubectl via the UI.

Before running the DNS checks, check the [default DNS provider](../../reference-guides/cluster-configuration/rancher-server-configuration/rke1-cluster-configuration.md#default-dns-provider) for your cluster and make sure that [the overlay network is functioning correctly](networking.md#check-if-overlay-network-is-functioning-correctly), as this can also be the reason why DNS resolution (partly) fails.

## Check if DNS pods are running

```
kubectl -n kube-system get pods -l k8s-app=kube-dns
```

Example output when using CoreDNS:
```
NAME READY STATUS RESTARTS AGE
coredns-799dffd9c4-6jhlz 1/1 Running 0 76m
```

Example output when using kube-dns:
```
NAME READY STATUS RESTARTS AGE
kube-dns-5fd74c7488-h6f7n 3/3 Running 0 4m13s
```

## Check if the DNS service is present with the correct cluster-ip

```
kubectl -n kube-system get svc -l k8s-app=kube-dns
```

```
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP 4m13s
```
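
Pods that use the default `ClusterFirst` DNS policy should point at this `CLUSTER-IP` as their nameserver. As a quick sanity check (a minimal sketch; the pod name `resolvconf-check` is just an example), print `/etc/resolv.conf` from a throwaway pod and compare the `nameserver` line with the `CLUSTER-IP` shown above:

```
kubectl run -i --rm --restart=Never resolvconf-check --image=busybox:1.28 -- cat /etc/resolv.conf
```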

## Check if domain names are resolving

Check if internal cluster names are resolving (in this example, `kubernetes.default`). The IP shown after `Server:` should be the same as the `CLUSTER-IP` from the `kube-dns` service.

```
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default
```

Example output:
```
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.43.0.1 kubernetes.default.svc.cluster.local
pod "busybox" deleted
```

Check if external names are resolving (in this example, `www.google.com`):

```
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup www.google.com
```

Example output:
```
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      www.google.com
Address 1: 2a00:1450:4009:80b::2004 lhr35s04-in-x04.1e100.net
Address 2: 216.58.211.100 ams15s32-in-f4.1e100.net
pod "busybox" deleted
```

If you want to check resolution of domain names on all of the hosts, execute the following steps:

1. Save the following file as `ds-dnstest.yml`:

    ```
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: dnstest
    spec:
      selector:
        matchLabels:
          name: dnstest
      template:
        metadata:
          labels:
            name: dnstest
        spec:
          tolerations:
          - operator: Exists
          containers:
          - image: busybox:1.28
            imagePullPolicy: Always
            name: alpine
            command: ["sleep", "infinity"]
            terminationMessagePath: /dev/termination-log
    ```

2. Launch it using `kubectl create -f ds-dnstest.yml`.
3. Wait until `kubectl rollout status ds/dnstest -w` returns: `daemon set "dnstest" successfully rolled out`.
4. Configure the environment variable `DOMAIN` to a fully qualified domain name (FQDN) that the hosts should be able to resolve (`www.google.com` is used as an example) and run the following command to let each container on every host resolve the configured domain name (it's a single-line command; an equivalent multi-line form is shown after this list):

    ```
    export DOMAIN=www.google.com; echo "=> Start DNS resolve test"; kubectl get pods -l name=dnstest --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | while read pod host; do kubectl exec $pod -- /bin/sh -c "nslookup $DOMAIN > /dev/null 2>&1"; RC=$?; if [ $RC -ne 0 ]; then echo $host cannot resolve $DOMAIN; fi; done; echo "=> End DNS resolve test"
    ```

5. When this command has finished running, the output indicating everything is correct is:

    ```
    => Start DNS resolve test
    => End DNS resolve test
    ```

If you see errors in the output, the mentioned host(s) are not able to resolve the given FQDN.

Example error output of a situation where the host with IP 209.97.182.150 had the UDP ports blocked:

```
=> Start DNS resolve test
command terminated with exit code 1
209.97.182.150 cannot resolve www.google.com
=> End DNS resolve test
```

Clean up the `dnstest` DaemonSet by running `kubectl delete ds/dnstest`.
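
For readability, the single-line test from step 4 can also be written as a short script. This is an equivalent sketch of the same check; it assumes the `dnstest` DaemonSet above is still running and uses `www.google.com` as the example domain:

```
#!/bin/bash
DOMAIN=${DOMAIN:-www.google.com}
echo "=> Start DNS resolve test"
# One line per dnstest pod: the pod name and the IP of the host it runs on
kubectl get pods -l name=dnstest --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP |
while read pod host
do
  # Run nslookup inside the pod; report the host if resolution fails
  if ! kubectl exec $pod -- /bin/sh -c "nslookup $DOMAIN > /dev/null 2>&1"
  then echo "$host cannot resolve $DOMAIN"
  fi
done
echo "=> End DNS resolve test"
```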

## CoreDNS specific

### Check CoreDNS logging

```
kubectl -n kube-system logs -l k8s-app=kube-dns
```

### Check configuration

The CoreDNS configuration is stored in the configmap `coredns` in the `kube-system` namespace.

```
kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}
```

### Check upstream nameservers in resolv.conf

By default, the nameservers configured on the host (in `/etc/resolv.conf`) are used as upstream nameservers for CoreDNS. You can check this file on the host, or run the following pod with `dnsPolicy` set to `Default`, which inherits the `/etc/resolv.conf` from the host it is running on.

```
kubectl run -i --restart=Never --rm test-${RANDOM} --image=ubuntu --overrides='{"kind":"Pod", "apiVersion":"v1", "spec": {"dnsPolicy":"Default"}}' -- sh -c 'cat /etc/resolv.conf'
```

### Enable query logging

Query logging can be enabled by adding the [log plugin](https://coredns.io/plugins/log/) to the Corefile configuration in the configmap `coredns`. You can do so by using `kubectl -n kube-system edit configmap coredns`, or use the command below to replace the configuration in place:

```
kubectl get configmap -n kube-system coredns -o json | sed -e 's_loadbalance_log\\n loadbalance_g' | kubectl apply -f -
```

All queries will now be logged and can be checked using the command in [Check CoreDNS logging](#check-coredns-logging).

## kube-dns specific

### Check upstream nameservers in kubedns container

By default, the nameservers configured on the host (in `/etc/resolv.conf`) are used as upstream nameservers for kube-dns. Sometimes the host runs a local caching DNS nameserver, in which case the address in `/etc/resolv.conf` points to an address in the loopback range (`127.0.0.0/8`) that is unreachable from the container. On Ubuntu 18.04, this is done by `systemd-resolved`. Rancher detects if `systemd-resolved` is running and automatically uses the `/etc/resolv.conf` file with the correct upstream nameservers (located at `/run/systemd/resolve/resolv.conf`).

Use the following command to check the upstream nameservers used by the kubedns container:

```
kubectl -n kube-system get pods -l k8s-app=kube-dns --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | while read pod host; do echo "Pod ${pod} on host ${host}"; kubectl -n kube-system exec $pod -c kubedns cat /etc/resolv.conf; done
```

Example output:
```
Pod kube-dns-667c7cb9dd-z4dsf on host x.x.x.x
nameserver 1.1.1.1
nameserver 8.8.4.4
```

If the output shows an address in the loopback range (`127.0.0.0/8`), you can correct this in two ways:

* Make sure the correct nameservers are listed in `/etc/resolv.conf` on the nodes in your cluster; consult your operating system documentation on how to do this. Make sure you do this before provisioning a cluster, or reboot the nodes after making the modification.
* Configure the `kubelet` to use a different file for resolving names, by using `extra_args` as shown below (where `/run/resolvconf/resolv.conf` is the file with the correct nameservers):

  ```
  services:
    kubelet:
      extra_args:
        resolv-conf: "/run/resolvconf/resolv.conf"
  ```

:::note

As the `kubelet` runs inside a container, files located in `/etc` and `/usr` on the host are found at `/host/etc` and `/host/usr` inside the `kubelet` container.

:::

See [Editing Clusters with YAML](../../reference-guides/cluster-configuration/rancher-server-configuration/rke1-cluster-configuration.md#editing-clusters-with-yaml) for how to apply this change. When the provisioning of the cluster has finished, you have to delete the kube-dns pod to activate the new setting in the pod:

```
kubectl delete pods -n kube-system -l k8s-app=kube-dns
pod "kube-dns-5fd74c7488-6pwsf" deleted
```

Try to resolve names again using [Check if domain names are resolving](#check-if-domain-names-are-resolving).

If you want to check the kube-dns configuration in your cluster (for example, to check if there are different upstream nameservers configured), you can run the following command to list the kube-dns configuration:

```
kubectl -n kube-system get configmap kube-dns -o go-template='{{range $key, $value := .data}}{{ $key }}{{":"}}{{ $value }}{{"\n"}}{{end}}'
```

Example output:
```
upstreamNameservers:["1.1.1.1"]
```

---
title: Rotation of Expired Webhook Certificates
---

<head>
  <link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/expired-webhook-certificate-rotation"/>
</head>

Certain Rancher versions that have `rancher-webhook` installed created certificates that expire after one year. If the certificate did not renew, you will need to rotate the webhook certificate.

In Rancher v2.6.3 and up, `rancher-webhook` deployments automatically renew their TLS certificate when it is within 30 or fewer days of its expiration date. If you are using v2.6.2 or below, there are two methods to work around this issue:

## 1. Users with Cluster Access: Run the Following Commands

```
kubectl delete secret -n cattle-system cattle-webhook-tls
kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io --ignore-not-found=true rancher.cattle.io
kubectl delete pod -n cattle-system -l app=rancher-webhook
```

## 2. Users with No Cluster Access via `kubectl`

1. Delete the `cattle-webhook-tls` secret in the `cattle-system` namespace in the local cluster.

2. Delete the `rancher.cattle.io` mutating webhook.

3. Delete the `rancher-webhook` pod in the `cattle-system` namespace in the local cluster.

:::note

The webhook certificate expiration issue is not specific to the `cattle-webhook-tls` secret listed in the examples. Substitute the name of your expired certificate secret accordingly.

:::
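
After following either method, you can confirm that the webhook picked up a fresh certificate. The check below is a sketch using standard `kubectl` and `openssl`; it assumes the renewed certificate is stored under the `tls.crt` key of the `cattle-webhook-tls` secret (substitute your secret name if it differs):

```
kubectl -n cattle-system get secret cattle-webhook-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate
```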

---
title: Kubernetes Resources
---

<head>
  <link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/kubernetes-resources"/>
</head>

The commands/steps listed on this page can be used to check the most important Kubernetes resources and apply to [Rancher Launched Kubernetes](../../how-to-guides/new-user-guides/launch-kubernetes-with-rancher/launch-kubernetes-with-rancher.md) clusters.

Make sure you have configured the correct kubeconfig (for example, `export KUBECONFIG=$PWD/kube_config_cluster.yml` for Rancher HA) or are using the embedded kubectl via the UI.

## Nodes

### Get nodes

Run the command below and check the following:

- All nodes in your cluster should be listed; make sure none are missing.
- All nodes should have the **Ready** status. If a node is not in **Ready** state, check the `kubelet` container logs on that node using `docker logs kubelet`.
- Check if all nodes report the correct version.
- Check if the OS/Kernel/Docker values are shown as expected (you may be able to relate issues to an upgraded OS/Kernel/Docker).

```
kubectl get nodes -o wide
```

Example output:

```
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
controlplane-0 Ready controlplane 31m v1.13.5 138.68.188.91 <none> Ubuntu 18.04.2 LTS 4.15.0-47-generic docker://18.9.5
etcd-0 Ready etcd 31m v1.13.5 138.68.180.33 <none> Ubuntu 18.04.2 LTS 4.15.0-47-generic docker://18.9.5
worker-0 Ready worker 30m v1.13.5 139.59.179.88 <none> Ubuntu 18.04.2 LTS 4.15.0-47-generic docker://18.9.5
```

### Get node conditions

Run the command below to list nodes with [Node Conditions](https://kubernetes.io/docs/concepts/architecture/nodes/#condition):

```
kubectl get nodes -o go-template='{{range .items}}{{$node := .}}{{range .status.conditions}}{{$node.metadata.name}}{{": "}}{{.type}}{{":"}}{{.status}}{{"\n"}}{{end}}{{end}}'
```

Run the command below to list nodes with active [Node Conditions](https://kubernetes.io/docs/concepts/architecture/nodes/#condition) that could prevent normal operation:

```
kubectl get nodes -o go-template='{{range .items}}{{$node := .}}{{range .status.conditions}}{{if ne .type "Ready"}}{{if eq .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{":"}}{{.status}}{{"\n"}}{{end}}{{else}}{{if ne .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{": "}}{{.status}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}'
```

Example output:

```
worker-0: DiskPressure:True
```

## Kubernetes leader election

### Kubernetes Controller Manager leader

The leader is determined by a leader election process. After the leader has been determined, the leader (`holderIdentity`) is saved in the `kube-controller-manager` endpoint (in this example, `controlplane-0`).

```
kubectl -n kube-system get endpoints kube-controller-manager -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
{"holderIdentity":"controlplane-0_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","leaseDurationSeconds":15,"acquireTime":"2018-12-27T08:59:45Z","renewTime":"2018-12-27T09:44:57Z","leaderTransitions":0}
```

### Kubernetes Scheduler leader

The leader is determined by a leader election process. After the leader has been determined, the leader (`holderIdentity`) is saved in the `kube-scheduler` endpoint (in this example, `controlplane-0`).

```
kubectl -n kube-system get endpoints kube-scheduler -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
{"holderIdentity":"controlplane-0_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","leaseDurationSeconds":15,"acquireTime":"2018-12-27T08:59:45Z","renewTime":"2018-12-27T09:44:57Z","leaderTransitions":0}
```
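
On newer Kubernetes versions, the control plane components typically record their leader in `Lease` objects rather than in Endpoints annotations. If the commands above return nothing, the following sketch (assuming leases named `kube-controller-manager` and `kube-scheduler` exist in `kube-system`) shows the current holders:

```
kubectl -n kube-system get lease kube-controller-manager kube-scheduler
```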

## Ingress Controller

The default Ingress Controller is NGINX and is deployed as a DaemonSet in the `ingress-nginx` namespace. The pods are only scheduled to nodes with the `worker` role.

Check if the pods are running on all nodes:

```
kubectl -n ingress-nginx get pods -o wide
```

Example output:

```
kubectl -n ingress-nginx get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
default-http-backend-797c5bc547-kwwlq 1/1 Running 0 17m x.x.x.x worker-1
nginx-ingress-controller-4qd64 1/1 Running 0 14m x.x.x.x worker-1
nginx-ingress-controller-8wxhm 1/1 Running 0 13m x.x.x.x worker-0
```

If a pod is unable to run (the status is not **Running**, the ready status does not show `1/1`, or you see a high count of restarts), check the pod details, logs and namespace events.

### Pod details

```
kubectl -n ingress-nginx describe pods -l app=ingress-nginx
```

### Pod container logs

The command below shows the logs of all pods labeled `app=ingress-nginx`, but it displays only the last 10 lines per pod because of the defaults of the `kubectl logs` command when a label selector is used. Refer to `--tail` in `kubectl logs -h` for more information.

```
kubectl -n ingress-nginx logs -l app=ingress-nginx
```
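
For example, to raise that limit while keeping the label selector, pass `--tail` explicitly (200 is just an example value):

```
kubectl -n ingress-nginx logs -l app=ingress-nginx --tail=200
```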

If the full log is needed, specify the pod name:

```
kubectl -n ingress-nginx logs <pod name>
```

### Namespace events

```
kubectl -n ingress-nginx get events
```

### Debug logging

To enable debug logging:

```
kubectl -n ingress-nginx patch ds nginx-ingress-controller --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--v=5"}]'
```

### Check configuration

Retrieve the generated configuration in each pod:

```
kubectl -n ingress-nginx get pods -l app=ingress-nginx --no-headers -o custom-columns=.NAME:.metadata.name | while read pod; do kubectl -n ingress-nginx exec $pod -- cat /etc/nginx/nginx.conf; done
```

## Rancher agents

Communication to the cluster (Kubernetes API via `cattle-cluster-agent`) and communication to the nodes (cluster provisioning via `cattle-node-agent`) is done through Rancher agents.

#### cattle-node-agent

Check if the cattle-node-agent pods are present on each node, have status **Running** and don't have a high count of restarts:

```
kubectl -n cattle-system get pods -l app=cattle-agent -o wide
```

Example output:

```
NAME READY STATUS RESTARTS AGE IP NODE
cattle-node-agent-4gc2p 1/1 Running 0 2h x.x.x.x worker-1
cattle-node-agent-8cxkk 1/1 Running 0 2h x.x.x.x etcd-1
cattle-node-agent-kzrlg 1/1 Running 0 2h x.x.x.x etcd-0
cattle-node-agent-nclz9 1/1 Running 0 2h x.x.x.x controlplane-0
cattle-node-agent-pwxp7 1/1 Running 0 2h x.x.x.x worker-0
cattle-node-agent-t5484 1/1 Running 0 2h x.x.x.x controlplane-1
cattle-node-agent-t8mtz 1/1 Running 0 2h x.x.x.x etcd-2
```

Check the logging of a specific cattle-node-agent pod or of all cattle-node-agent pods:

```
kubectl -n cattle-system logs -l app=cattle-agent
```

#### cattle-cluster-agent

Check if the cattle-cluster-agent pod is present in the cluster, has status **Running** and doesn't have a high count of restarts:

```
kubectl -n cattle-system get pods -l app=cattle-cluster-agent -o wide
```

Example output:

```
NAME READY STATUS RESTARTS AGE IP NODE
cattle-cluster-agent-54d7c6c54d-ht9h4 1/1 Running 0 2h x.x.x.x worker-1
```

Check the logging of the cattle-cluster-agent pod:

```
kubectl -n cattle-system logs -l app=cattle-cluster-agent
```

## Jobs and Pods

### Check that pods or jobs have status **Running**/**Completed**

To check, run the command:

```
kubectl get pods --all-namespaces
```

If a pod is not in **Running** state, you can dig into the root cause by running:

### Describe pod

```
kubectl describe pod POD_NAME -n NAMESPACE
```

### Pod container logs

```
kubectl logs POD_NAME -n NAMESPACE
```

If a job is not in **Completed** state, you can dig into the root cause by running:

### Describe job

```
kubectl describe job JOB_NAME -n NAMESPACE
```

### Logs from the containers of pods of the job

```
kubectl logs -l job-name=JOB_NAME -n NAMESPACE
```

### Evicted pods

Pods can be evicted based on [eviction signals](https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#eviction-policy).

Retrieve a list of evicted pods (pod name and namespace):

```
kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{if eq .status.reason "Evicted"}}{{.metadata.name}}{{" "}}{{.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}'
```

To delete all evicted pods:

```
kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{if eq .status.reason "Evicted"}}{{.metadata.name}}{{" "}}{{.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}' | while read epod enamespace; do kubectl -n $enamespace delete pod $epod; done
```

Retrieve a list of evicted pods, the node they were scheduled on, and the reason:

```
kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{if eq .status.reason "Evicted"}}{{.metadata.name}}{{" "}}{{.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}' | while read epod enamespace; do kubectl -n $enamespace get pod $epod -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,MSG:.status.message; done
```

### Job does not complete

If you have enabled Istio and a Job you deployed is not completing, you will need to add an annotation to your pod using [these steps](../../how-to-guides/advanced-user-guides/istio-setup-guide/enable-istio-in-namespace.md).

Since Istio sidecars run indefinitely, a Job cannot be considered complete even after its task has completed. This is a temporary workaround that disables Istio for any traffic to/from the annotated pod. Keep in mind this may not allow you to continue to use a Job for integration testing, as the Job will not have access to the service mesh.
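
As a sketch of what the result looks like (the authoritative steps are in the link above; `sidecar.istio.io/inject: "false"` is the standard Istio annotation for skipping sidecar injection), the annotation goes on the Job's pod template:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    metadata:
      annotations:
        # Skip Istio sidecar injection so the Job can reach Completed state
        sidecar.istio.io/inject: "false"
    spec:
      restartPolicy: Never
      containers:
      - name: example
        image: busybox:1.28
        command: ["sh", "-c", "echo done"]
```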

---
title: Logging
---

<head>
  <link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/logging"/>
</head>

## Log Levels

The following log levels are used in Rancher:

| Name | Description |
|---------|-------------|
| `info` | Logs informational messages. This is the default log level. |
| `debug` | Logs more detailed messages that can be used to debug. |
| `trace` | Logs very detailed messages on internal functions. This is very verbose and can contain sensitive information. |

### How to Configure a Log Level

#### Kubernetes Install

* Configure debug log level

  ```
  $ KUBECONFIG=./kube_config_cluster.yml
  $ kubectl -n cattle-system get pods -l app=rancher --no-headers -o custom-columns=name:.metadata.name | while read rancherpod; do kubectl -n cattle-system exec $rancherpod -c rancher -- loglevel --set debug; done
  OK
  OK
  OK
  $ kubectl -n cattle-system logs -l app=rancher -c rancher
  ```

* Configure info log level

  ```
  $ KUBECONFIG=./kube_config_cluster.yml
  $ kubectl -n cattle-system get pods -l app=rancher --no-headers -o custom-columns=name:.metadata.name | while read rancherpod; do kubectl -n cattle-system exec $rancherpod -c rancher -- loglevel --set info; done
  OK
  OK
  OK
  ```

#### Docker Install

* Configure debug log level

  ```
  $ docker exec -ti <container_id> loglevel --set debug
  OK
  $ docker logs -f <container_id>
  ```

* Configure info log level

  ```
  $ docker exec -ti <container_id> loglevel --set info
  OK
  ```

## Rancher Machine Debug Logs

If you need to troubleshoot the creation of objects in your infrastructure provider of choice, `rancher-machine` debug logs might be helpful to you.

It's possible to enable debug logs for `rancher-machine` by setting environment variables when launching Rancher.

The `CATTLE_WHITELIST_ENVVARS` environment variable allows users to whitelist specific environment variables to be passed down to `rancher-machine` during provisioning.

The `MACHINE_DEBUG` variable enables debug logs in `rancher-machine`.

Thus, by setting `MACHINE_DEBUG=true` and adding `MACHINE_DEBUG` to the default list of variables in `CATTLE_WHITELIST_ENVVARS` (e.g. `CATTLE_WHITELIST_ENVVARS=HTTP_PROXY,HTTPS_PROXY,NO_PROXY,MACHINE_DEBUG`), it is possible to enable debug logs in `rancher-machine` when provisioning RKE1, RKE2 and k3s clusters.
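
One way to set both variables on a Helm-based Rancher install is through the chart's extra environment variables. This is a sketch of a values snippet, assuming the `rancher` chart's `extraEnv` list (see [Setting Extra Environment Variables](../../getting-started/installation-and-upgrade/installation-references/helm-chart-options.md#setting-extra-environment-variables)):

```
# values.yaml (sketch): whitelist MACHINE_DEBUG and turn it on
extraEnv:
  - name: CATTLE_WHITELIST_ENVVARS
    value: "HTTP_PROXY,HTTPS_PROXY,NO_PROXY,MACHINE_DEBUG"
  - name: MACHINE_DEBUG
    value: "true"
```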

:::caution

Just like the `trace` log level above, `rancher-machine` debug logs can contain sensitive information.

:::

## Cattle-cluster-agent Debug Logs

The `cattle-cluster-agent` log levels can be set when you initialize downstream clusters.

When you create a cluster, under **Cluster Configuration > Agent Environment Vars** you can set variables to define the log level:

- Trace-level logging: Set `CATTLE_TRACE` or `RANCHER_TRACE` to `true`
- Debug-level logging: Set `CATTLE_DEBUG` or `RANCHER_DEBUG` to `true`
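
For a cluster that is already registered, a quick, temporary way to flip the same variable is to set it directly on the agent deployment in the downstream cluster. This is only a sketch, not an official procedure: Rancher manages this deployment and may reconcile manual changes away, so prefer the Agent Environment Vars setting above for anything lasting.

```
kubectl -n cattle-system set env deployment/cattle-cluster-agent CATTLE_DEBUG=true
```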

:::caution

The `cattle-cluster-agent` debug logs may contain sensitive information.

:::

---
title: Networking
---

<head>
  <link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/networking"/>
</head>

The commands/steps listed on this page can be used to check networking related issues in your cluster.

Make sure you have configured the correct kubeconfig (for example, `export KUBECONFIG=$PWD/kube_config_cluster.yml` for Rancher HA) or are using the embedded kubectl via the UI.

## Double Check if All the Required Ports are Opened in Your (Host) Firewall

Double check if all the [required ports](../../how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/node-requirements-for-rancher-managed-clusters.md#networking-requirements) are opened in your (host) firewall. The overlay network uses UDP, in contrast to all other required ports, which use TCP.

## Check if Overlay Network is Functioning Correctly

A pod can be scheduled to any of the hosts you used for your cluster, but that means that the NGINX ingress controller needs to be able to route the request from `NODE_1` to `NODE_2`. This happens over the overlay network. If the overlay network is not functioning, you will experience intermittent TCP/HTTP connection failures due to the NGINX ingress controller not being able to route to the pod.

To test the overlay network, you can launch the following `DaemonSet` definition. It runs a `swiss-army-knife` container on every host (the image was developed by Rancher engineers and can be found at https://github.com/rancherlabs/swiss-army-knife), which we will use to run a `ping` test between containers on all hosts.

:::caution

The `swiss-army-knife` container does not support Windows nodes. It also [does not support ARM nodes](https://github.com/leodotcloud/swiss-army-knife/issues/18), such as a Raspberry Pi. When the test encounters incompatible nodes, this is recorded in the pod logs as an error message, such as `exec user process caused: exec format error` for ARM nodes, or `ImagePullBackOff (Back-off pulling image "rancherlabs/swiss-army-knife)` for Windows nodes.

:::

1. Save the following file as `overlaytest.yml`:

    ```
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: overlaytest
    spec:
      selector:
        matchLabels:
          name: overlaytest
      template:
        metadata:
          labels:
            name: overlaytest
        spec:
          tolerations:
          - operator: Exists
          containers:
          - image: rancherlabs/swiss-army-knife
            imagePullPolicy: Always
            name: overlaytest
            command: ["sleep", "infinity"]
            terminationMessagePath: /dev/termination-log
    ```

2. Launch it using `kubectl create -f overlaytest.yml`.
3. Wait until `kubectl rollout status ds/overlaytest -w` returns: `daemon set "overlaytest" successfully rolled out`.
4. Run the following script, from the same location. It will have each `overlaytest` container on every host ping each other:

    ```
    #!/bin/bash
    echo "=> Start network overlay test"
    kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{"\n"}{end}' |
    while read spod shost
    do kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' |
      while read tip thost
      do kubectl --request-timeout='10s' exec $spod -c overlaytest -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"
        RC=$?
        if [ $RC -ne 0 ]
        then echo FAIL: $spod on $shost cannot reach pod IP $tip on $thost
        else echo $shost can reach $thost
        fi
      done
    done
    echo "=> End network overlay test"
    ```

5. When this command has finished running, it will output the state of each route:

    ```
    => Start network overlay test
    Error from server (NotFound): pods "wk2" not found
    FAIL: overlaytest-5bglp on wk2 cannot reach pod IP 10.42.7.3 on wk2
    Error from server (NotFound): pods "wk2" not found
    FAIL: overlaytest-5bglp on wk2 cannot reach pod IP 10.42.0.5 on cp1
    Error from server (NotFound): pods "wk2" not found
    FAIL: overlaytest-5bglp on wk2 cannot reach pod IP 10.42.2.12 on wk1
    command terminated with exit code 1
    FAIL: overlaytest-v4qkl on cp1 cannot reach pod IP 10.42.7.3 on wk2
    cp1 can reach cp1
    cp1 can reach wk1
    command terminated with exit code 1
    FAIL: overlaytest-xpxwp on wk1 cannot reach pod IP 10.42.7.3 on wk2
    wk1 can reach cp1
    wk1 can reach wk1
    => End network overlay test
    ```

    If you see errors in the output, there is an issue with the route between the pods on the two hosts. In the output above, the node `wk2` has no connectivity over the overlay network. This could be because the [required ports](../../how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/node-requirements-for-rancher-managed-clusters.md#networking-requirements) for overlay networking are not opened for `wk2`.

6. You can now clean up the DaemonSet by running `kubectl delete ds/overlaytest`.

### Check if MTU is Correctly Configured on Hosts and on Peering/Tunnel Appliances/Devices

When the MTU is incorrectly configured (either on hosts running Rancher, on nodes in created/imported clusters, or on appliances/devices in between), error messages will be logged in Rancher and in the agents, similar to:

* `websocket: bad handshake`
* `Failed to connect to proxy`
* `read tcp: i/o timeout`

See [Google Cloud VPN: MTU Considerations](https://cloud.google.com/vpn/docs/concepts/mtu-considerations#gateway_mtu_vs_system_mtu) for an example of how to configure MTU correctly when using Google Cloud VPN between Rancher and cluster nodes.
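
To check for an MTU mismatch from a node, one quick sketch is to send a ping with the Don't Fragment flag set and a payload sized for the MTU you expect. If this fails while smaller packets succeed, the path MTU is lower than expected:

```
# 1472 bytes of ICMP payload + 28 bytes of ICMP/IP headers = a 1500-byte packet;
# adjust the size for your network and replace OTHER_NODE_IP with a real node address
ping -c 1 -M do -s 1472 OTHER_NODE_IP
```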

---
title: Rancher HA
---

<head>
  <link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/rancher-ha"/>
</head>

The commands/steps listed on this page can be used to check your Rancher Kubernetes installation.

Make sure you have configured the correct kubeconfig (for example, `export KUBECONFIG=$PWD/kube_config_cluster.yml`).

## Check Rancher Pods

Rancher pods are managed by a Deployment in the `cattle-system` namespace.

Check if the pods are running:

```
kubectl -n cattle-system get pods -l app=rancher -o wide
```

Example output:

```
NAME READY STATUS RESTARTS AGE IP NODE
rancher-7dbd7875f7-n6t5t 1/1 Running 0 8m x.x.x.x x.x.x.x
rancher-7dbd7875f7-qbj5k 1/1 Running 0 8m x.x.x.x x.x.x.x
rancher-7dbd7875f7-qw7wb 1/1 Running 0 8m x.x.x.x x.x.x.x
```

If a pod is unable to run (the status is not **Running**, the ready status does not show `1/1`, or you see a high count of restarts), check the pod details, logs and namespace events.

### Pod Details

```
kubectl -n cattle-system describe pods -l app=rancher
```

### Pod Container Logs

```
kubectl -n cattle-system logs -l app=rancher
```

### Namespace Events

```
kubectl -n cattle-system get events
```

## Check Ingress

The ingress should have the correct `HOSTS` (showing the configured FQDN) and `ADDRESS` (the host address(es) it will be routed to).

```
kubectl -n cattle-system get ingress
```

Example output:

```
NAME HOSTS ADDRESS PORTS AGE
rancher rancher.yourdomain.com x.x.x.x,x.x.x.x,x.x.x.x 80, 443 2m
```
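
If the ingress looks correct, you can also check whether Rancher answers on the configured hostname. This is a sketch, assuming the `rancher.yourdomain.com` example above and Rancher's `/healthz` endpoint (`-k` skips certificate verification, which is useful with self-signed certificates); a `200` response indicates Rancher is reachable through the ingress:

```
curl -sk -o /dev/null -w '%{http_code}\n' https://rancher.yourdomain.com/healthz
```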

## Check Ingress Controller Logs

If accessing your configured Rancher FQDN does not show you the UI, check the ingress controller logging to see what happens when you try to access Rancher:

```
kubectl -n ingress-nginx logs -l app=ingress-nginx
```

## Leader Election

The leader is determined by a leader election process. After the leader has been determined, the leader (`holderIdentity`) is saved in the `cattle-controllers` Lease in the `kube-system` namespace (in this example, `rancher-dbc7ff869-gvg6k`).

```
kubectl -n kube-system get lease cattle-controllers
```

Example output:

```
NAME HOLDER AGE
cattle-controllers rancher-dbc7ff869-gvg6k 6h10m
```

### Configuration

_Available as of Rancher 2.8.3_

If the Kubernetes API experiences latency, the Rancher replica holding the leader lock may not be able to renew the lease before the lease becomes invalid, which can be observed in the Rancher logs:

```
E0629 04:13:07.293461 34 leaderelection.go:364] Failed to update lock: Put "https://172.17.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cattle-controllers?timeout=15m0s": context deadline exceeded
I0629 04:13:07.293594 34 leaderelection.go:280] failed to renew lease kube-system/cattle-controllers: timed out waiting for the condition
...
2024/06/29 04:13:10 [FATAL] leaderelection lost for cattle-controllers
```

To mitigate this, you can set environment variables in the `rancher` Deployment to modify the default parameters for leader election:

- `CATTLE_ELECTION_LEASE_DURATION`: The [lease duration](https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig.LeaseDuration). The default value is 45s.
- `CATTLE_ELECTION_RENEW_DEADLINE`: The [renew deadline](https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig.RenewDeadline). The default value is 30s.
- `CATTLE_ELECTION_RETRY_PERIOD`: The [retry period](https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig.RetryPeriod). The default value is 2s.

Example:

```
kubectl -n cattle-system set env deploy/rancher CATTLE_ELECTION_LEASE_DURATION=2m CATTLE_ELECTION_RENEW_DEADLINE=90s CATTLE_ELECTION_RETRY_PERIOD=10s
```

This temporarily increases the lease duration, renew deadline and retry period to 120, 90 and 10 seconds respectively. To make such changes permanent, set these environment variables [using Helm values](../../getting-started/installation-and-upgrade/installation-references/helm-chart-options.md#setting-extra-environment-variables) instead.

---
title: Registered Clusters
---

<head>
  <link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/registered-clusters"/>
</head>

The commands/steps listed on this page can be used to check clusters that you are registering or that are already registered in Rancher.

Make sure you have configured the correct kubeconfig (for example, `export KUBECONFIG=$PWD/kubeconfig_from_imported_cluster.yml`).

## Rancher Agents

Communication to the cluster (Kubernetes API via `cattle-cluster-agent`) and communication to the nodes is done through Rancher agents.

If the `cattle-cluster-agent` cannot connect to the configured `server-url`, the cluster will remain in **Pending** state, showing `Waiting for full cluster configuration`.
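
To see which `server-url` the agents are trying to reach, you can read Rancher's `server-url` setting. This sketch must be run against the Rancher local (management) cluster, not against the registered cluster:

```
kubectl get settings.management.cattle.io server-url -o jsonpath='{.value}{"\n"}'
```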

### cattle-node-agent

:::note

cattle-node-agents are only present in clusters created in Rancher with RKE.

:::

Check if the cattle-node-agent pods are present on each node, have status **Running** and don't have a high count of restarts:

```
kubectl -n cattle-system get pods -l app=cattle-agent -o wide
```

Example output:

```
NAME READY STATUS RESTARTS AGE IP NODE
cattle-node-agent-4gc2p 1/1 Running 0 2h x.x.x.x worker-1
cattle-node-agent-8cxkk 1/1 Running 0 2h x.x.x.x etcd-1
cattle-node-agent-kzrlg 1/1 Running 0 2h x.x.x.x etcd-0
cattle-node-agent-nclz9 1/1 Running 0 2h x.x.x.x controlplane-0
cattle-node-agent-pwxp7 1/1 Running 0 2h x.x.x.x worker-0
cattle-node-agent-t5484 1/1 Running 0 2h x.x.x.x controlplane-1
cattle-node-agent-t8mtz 1/1 Running 0 2h x.x.x.x etcd-2
```

Check the logging of a specific cattle-node-agent pod or of all cattle-node-agent pods:

```
kubectl -n cattle-system logs -l app=cattle-agent
```

### cattle-cluster-agent

Check if the cattle-cluster-agent pod is present in the cluster, has status **Running** and doesn't have a high count of restarts:

```
kubectl -n cattle-system get pods -l app=cattle-cluster-agent -o wide
```

Example output:

```
NAME READY STATUS RESTARTS AGE IP NODE
cattle-cluster-agent-54d7c6c54d-ht9h4 1/1 Running 0 2h x.x.x.x worker-1
```

Check the logging of the cattle-cluster-agent pod:

```
kubectl -n cattle-system logs -l app=cattle-cluster-agent
```

---
title: User ID Tracking in Audit Logs
---

<head>
  <link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/user-id-tracking-in-audit-logs"/>
</head>

The following audit logs are used in Rancher to track events occurring on the local and downstream clusters:

* [Kubernetes Audit Logs](https://rancher.com/docs/rke/latest/en/config-options/audit-log/)
* [Rancher API Audit Logs](../../how-to-guides/advanced-user-guides/enable-api-audit-log.md)

Audit logs in Rancher v2.6 have been enhanced to include the external Identity Provider name (the common name of the user in the external auth provider) in both the Rancher and downstream Kubernetes audit logs.

Before v2.6, a Rancher admin could not trace an event from the Rancher audit logs into the Kubernetes audit logs without knowing the mapping of the external Identity Provider username to the userId (`u-xXXX`) used in Rancher. To know this mapping, cluster admins needed access to the Rancher API, the UI, and the local management cluster.

With this feature, a downstream cluster admin can look at the Kubernetes audit logs and know which external Identity Provider (IdP) user performed an action without needing to view anything in Rancher. If the audit logs are shipped off the cluster, a user of the logging system can identify the user in the external Identity Provider system. A Rancher admin can also view the Rancher audit logs and follow through to the Kubernetes audit logs by using the external Identity Provider username.

## Feature Description

- When Kubernetes audit logs are enabled on the downstream cluster, the external Identity Provider's username is now logged for each request, at the `metadata` level.
- When Rancher API audit logs are enabled for a Rancher installation, the external Identity Provider's username is also logged at `auditLog.level=0` for each request that hits the Rancher API server, including login requests.