mirror of
https://github.com/rancher/rancher-docs.git
synced 2026-05-22 04:45:19 +00:00
Docs: Migrate etcd troubleshooting guide from RKE/Docker to RKE2/K3s/containerd
- Updates `troubleshooting-etcd-nodes.md` to replace Docker-based commands with `crictl` and `etcdctl` for RKE2 and K3s.
- Replaces `curl` connectivity checks with `openssl s_client` to support etcd 3.5+ gRPC requirements and isolate transport layer testing.
- Adds prerequisites section with necessary environment exports.
- Updates all `etcdctl` commands to use explicit inline certificate paths for RKE2 and K3s.
- Replaces shell-dependent container commands with host-side processing to support distroless images.
- Updates log level configuration instructions for RKE2/K3s config files.
@@ -6,29 +6,63 @@ title: Troubleshooting etcd Nodes
|
||||
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/kubernetes-components/troubleshooting-etcd-nodes"/>
|
||||
</head>
|
||||
|
||||
This section contains commands and tips for troubleshooting nodes with the `etcd` role.
|
||||
This section contains commands and tips for troubleshooting nodes with the `etcd` role in RKE2 and K3s clusters.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
As RKE2 and K3s rely on `containerd` as the container runtime, `crictl` replaces Docker for container management. Before proceeding with the troubleshooting commands, configure your environment by exporting the following variables:
|
||||
|
||||
### RKE2
|
||||
|
||||
```bash
|
||||
export PATH=$PATH:/var/lib/rancher/rke2/bin/
|
||||
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
|
||||
etcdcontainer=$(crictl ps --name etcd --quiet)
|
||||
```
|
||||
|
||||
### K3s
|
||||
|
||||
> ### ⚠️ **Warning**
|
||||
> K3s does not include `etcdctl` in the system PATH. If you need to perform etcd troubleshooting on a K3s cluster, you may need to install it or locate it within the K3s data directory.
|
||||
|
||||
```bash
|
||||
export PATH=$PATH:/usr/local/bin
|
||||
export CRI_CONFIG_FILE=/var/lib/rancher/k3s/agent/etc/crictl.yaml
|
||||
```
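
Before running the commands in the following sections, it can help to confirm that the tools the exports are meant to expose actually resolve. The sketch below is a hypothetical helper (`missing_tools` is not part of RKE2 or K3s) that relies only on the shell built-in `command -v`:

```bash
# Hypothetical helper (not shipped with RKE2 or K3s): print the commands from
# the argument list that cannot be found on the current PATH.
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command resolves.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing:+$missing }$tool"
    fi
  done
  printf '%s\n' "$missing"
  # Succeed only when nothing is missing.
  [ -z "$missing" ]
}
```

On an RKE2 node, `missing_tools crictl` should print an empty line once `PATH` is exported. On a K3s node, `missing_tools etcdctl` shows whether `etcdctl` still needs to be installed or located.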
|
||||
|
||||
|
||||
## Checking if the etcd Container is Running
|
||||
|
||||
The container for etcd should have status **Up**. The duration shown after **Up** is the time the container has been running.
|
||||
**RKE2**: The container for etcd should be in the **Running** state.
|
||||
|
||||
```
|
||||
docker ps -a -f=name=etcd$
|
||||
```bash
|
||||
crictl ps --name etcd
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||
d26adbd23643 rancher/mirrored-coreos-etcd:v3.5.7 "/usr/local/bin/etcd…" 30 minutes ago Up 30 minutes etcd
|
||||
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
|
||||
f1e289d202ed0 11ad16872a9cf 58 minutes ago Running etcd 0 7b56aab8204ea etcd-cluster1 kube-system
|
||||
```
|
||||
|
||||
## etcd Container Logging
|
||||
|
||||
The logging of the container can contain information on what the problem could be.
|
||||
**K3s**: etcd runs as an embedded process in the K3s service. Check the service status:
|
||||
|
||||
```bash
|
||||
systemctl status k3s
|
||||
```
|
||||
docker logs etcd
|
||||
|
||||
## etcd Logging
|
||||
|
||||
The logs can contain information on what the problem could be.
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl logs $etcdcontainer
|
||||
```
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
journalctl -u k3s | grep -i etcd
|
||||
```
|
||||
| Log | Explanation |
|
||||
|-----|------------------|
|
||||
@@ -46,18 +80,43 @@ The address where etcd is listening depends on the address configuration of the
|
||||
|
||||
Output should contain all the nodes with the `etcd` role and the output should be identical on all nodes.
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
Run the command inside the etcd container.
|
||||
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl member list \
|
||||
--cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
|
||||
--key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
|
||||
--cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl member list
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl member list \
|
||||
--cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
|
||||
--key /var/lib/rancher/k3s/server/tls/etcd/server-client.key \
|
||||
--cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
1c424074df86e854, started, cluster-node1-f289ac71, https://IP:2380, https://IP:2379, false
|
||||
45c68c44c5a792ff, started, cluster-node2-67e3cf6f, https://IP:2380, https://IP:2379, false
|
||||
7c584f77c5180258, started, cluster-node3-e976bc00, https://IP:2380, https://IP:2379, false
|
||||
```
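
Several later commands build their `--endpoints` value from this member list with a `cut | sed | paste` pipeline. As a sketch of what that pipeline does, the hypothetical function below (`client_endpoints` is an illustrative name) wraps the same three steps and reads the member list from standard input:

```bash
# Hypothetical helper: turn `etcdctl member list` output (read from stdin)
# into a comma-separated list of client URLs suitable for --endpoints.
client_endpoints() {
  cut -d, -f5 |        # the fifth comma-separated field is the client URL
    sed -e 's/ //g' |  # drop the padding space that follows each comma
    paste -sd ',' -    # join all lines into one comma-separated line
}
```

Piping the RKE2 or K3s `etcdctl member list` command shown above into `client_endpoints` should yield a single line such as `https://IP:2379,https://IP:2379,https://IP:2379`. Changing `-f5` to `-f4` selects the peer URLs instead.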
|
||||
|
||||
### Check Endpoint Status
|
||||
|
||||
The values for `RAFT TERM` should be equal, and the values for `RAFT INDEX` should not be too far apart from each other.
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl endpoint status --write-out table --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint status --write-out table
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl endpoint status --write-out table --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
@@ -65,17 +124,22 @@ Example output:
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| https://IP:2379 | 333ef673fc4add56 | 3.5.7 | 24 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 5feed52d940ce4cf | 3.5.7 | 24 MB | true | 72 | 66887 |
|
||||
| https://IP:2379 | db6b3bdb559a848d | 3.5.7 | 25 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 333ef673fc4add56 | 3.6.7 | 24 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 5feed52d940ce4cf | 3.6.7 | 24 MB | true | 72 | 66887 |
|
||||
| https://IP:2379 | db6b3bdb559a848d | 3.6.7 | 25 MB | false | 72 | 66887 |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
```
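
The rule for this check (equal `RAFT TERM`, closely matching `RAFT INDEX`) can also be evaluated mechanically. The following is a minimal sketch, assuming the table layout shown in the example output. `check_raft_consistency` is a hypothetical name, and what counts as "too far apart" is left to the operator:

```bash
# Hypothetical helper: read `etcdctl endpoint status --write-out table` output
# on stdin and report how many distinct RAFT TERM values appear and how far
# apart the lowest and highest RAFT INDEX values are.
check_raft_consistency() {
  awk -F'|' '
    # Trim leading and trailing whitespace from a table cell.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Header row: remember which columns hold RAFT TERM and RAFT INDEX.
    /RAFT TERM/ {
      for (i = 1; i <= NF; i++) {
        name = trim($i)
        if (name == "RAFT TERM")  term_col = i
        if (name == "RAFT INDEX") index_col = i
      }
      next
    }
    # Data rows start with a pipe and follow the header row.
    term_col && /^[ \t]*\|/ {
      terms[trim($term_col)] = 1
      idx = trim($index_col) + 0
      if (!seen || idx < min) min = idx
      if (!seen || idx > max) max = idx
      seen = 1
    }
    END {
      n = 0
      for (t in terms) n++
      printf "distinct_terms=%d index_spread=%d\n", n, max - min
    }'
}
```

A healthy cluster should report `distinct_terms=1` and a small `index_spread`. Columns are located by header name so that extra columns in other etcd versions do not shift the result.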
|
||||
|
||||
### Check Endpoint Health
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl endpoint health --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint health
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl endpoint health --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
@@ -84,54 +148,104 @@ https://IP:2379 is healthy: successfully committed proposal: took = 2.113189ms
|
||||
https://IP:2379 is healthy: successfully committed proposal: took = 2.649963ms
|
||||
https://IP:2379 is healthy: successfully committed proposal: took = 2.451201ms
|
||||
```
|
||||
### Check Connectivity on etcd Ports
|
||||
|
||||
### Check Connectivity on Port TCP/2379
|
||||
> etcd 3.5 and later, as shipped with current Kubernetes releases, changed how network traffic is handled. Earlier versions permitted standard HTTP REST requests on the primary client port (`2379`). To improve performance and security, etcd 3.5+ strictly enforces the gRPC protocol on this port.<br />
|
||||
If you attempt to use standard HTTP tools like `curl` to test connectivity on port `2379`, the etcd server will automatically terminate the connection or return an error. This behavior often leads administrators to misinterpret the result as a closed port or a node failure.
|
||||
|
||||
Command:
|
||||
Since standard HTTP clients can no longer probe the primary etcd ports, test connectivity at the transport layer instead. Using `openssl s_client` instead of `curl` bypasses the gRPC application requirement, allowing the raw TCP and TLS handshake to be tested directly.
|
||||
|
||||
These scripts isolate the network and security infrastructure from the database application. A successful `Verify return code: 0 (ok)` explicitly confirms four critical infrastructure components:
|
||||
|
||||
* **Network Path:** Routing is functional, and firewalls permit traffic on TCP port `2379` or `2380`.
|
||||
* **Process Availability:** The etcd service is running and actively listening on the designated port.
|
||||
* **Certificate Validity:** The TLS certificates are active, correctly formatted, and have not expired.
|
||||
* **Mutual Authentication (mTLS):** The node successfully authenticates against the cluster's specific Certificate Authority (CA).
|
||||
|
||||
**How these tests differ from the `etcdctl endpoint health` test**:
|
||||
|
||||
If the `etcdctl endpoint health` test fails, run these port connectivity test scripts. If the scripts succeed, your network and certificates are intact, and the issue is likely confined to the etcd database itself. If the scripts fail, the issue is related to a firewall or network restriction, or to certificate expiration.
|
||||
|
||||
#### Port TCP/2379
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
for endpoint in $(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint} (Client)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
|
||||
-cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
|
||||
-key /var/lib/rancher/rke2/server/tls/etcd/server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
for endpoint in $(docker exec etcd etcdctl member list | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint}/health"
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -w "\n" --cacert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CACERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --cert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --key $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_KEY" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) "${endpoint}/health"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
for endpoint in $(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint} (Client)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
|
||||
-cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
|
||||
-key /var/lib/rancher/k3s/server/tls/etcd/server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379 (Client)
Verify return code: 0 (ok)
Validating connection to https://IP:2379 (Client)
Verify return code: 0 (ok)
Validating connection to https://IP:2379 (Client)
Verify return code: 0 (ok)
|
||||
```
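
The `openssl s_client` scripts reduce each probe to a single `Verify return code` line. As a rough sketch of how that line could be turned into a verdict, the hypothetical function below (`verify_verdict` is an illustrative name) reads `s_client` output from standard input and distinguishes a clean handshake, a TLS verification failure, and a connection that never produced a handshake:

```bash
# Hypothetical helper: read `openssl s_client` output on stdin and print a
# one-line verdict based on the "Verify return code" line.
verify_verdict() {
  local line
  # Keep the last matching line; s_client may print the code more than once.
  line=$(grep -E 'Verify return code' | tail -n 1) || true
  case "$line" in
    "")
      echo "no TLS handshake (connection failed or timed out)" ;;
    *"Verify return code: 0 "*)
      echo "ok" ;;
    *)
      # Strip everything up to the code so only "N (reason)" remains.
      echo "TLS verification failed: ${line#*Verify return code: }" ;;
  esac
}
```

In the loops above, the `grep ... || echo ...` tail could be swapped for `| verify_verdict`, which would report, for example, `TLS verification failed: 10 (certificate has expired)` for an expired certificate.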
|
||||
|
||||
### Check Connectivity on Port TCP/2380
|
||||
#### Port TCP/2380
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
for endpoint in $(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint} (Peer)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt \
|
||||
-cert /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt \
|
||||
-key /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
for endpoint in $(docker exec etcd etcdctl member list | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint}/version";
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl --http1.1 -s -w "\n" --cacert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CACERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --cert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --key $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_KEY" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) "${endpoint}/version"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
for endpoint in $(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint} (Peer)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/k3s/server/tls/etcd/peer-ca.crt \
|
||||
-cert /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.crt \
|
||||
-key /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380 (Peer)
Verify return code: 0 (ok)
Validating connection to https://IP:2380 (Peer)
Verify return code: 0 (ok)
Validating connection to https://IP:2380 (Peer)
Verify return code: 0 (ok)
|
||||
```
|
||||
|
||||
## etcd Alarms
|
||||
|
||||
etcd will trigger alarms, for instance when it runs out of space.
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output when NOSPACE alarm is triggered:
|
||||
@@ -154,10 +268,16 @@ Resolutions:
|
||||
|
||||
### Compact the Keyspace
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
rev=$(crictl exec $etcdcontainer etcdctl endpoint status --write-out json --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' | head -1)
|
||||
crictl exec $etcdcontainer etcdctl compact "$rev" --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
rev=$(docker exec etcd etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
|
||||
docker exec etcd etcdctl compact "$rev"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
rev=$(etcdctl endpoint status --write-out json --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' | head -1)
|
||||
etcdctl compact "$rev" --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
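
The `rev=` assignment pulls the current revision out of the JSON status with two `grep` passes. The sketch below isolates that step in a hypothetical function (`extract_revision` is an illustrative name) and uses `grep -E` in place of `egrep`, which newer GNU grep releases flag as obsolescent:

```bash
# Hypothetical helper: read `etcdctl endpoint status --write-out json` output
# on stdin and print the first revision number found.
extract_revision() {
  grep -Eo '"revision":[0-9]+' |  # isolate each "revision":<number> pair
    grep -Eo '[0-9]+' |           # keep only the digits
    head -n 1                     # one value is enough for compaction
}
```

With this in place, the assignment could be written as `rev=$(<endpoint status command> | extract_revision)`, where the status command is the RKE2 or K3s variant shown above.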
|
||||
|
||||
Example output:
|
||||
@@ -167,55 +287,39 @@ compacted revision xxx
|
||||
|
||||
### Defrag All etcd Members
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl defrag --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl defrag
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl defrag --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
```
|
||||
|
||||
### Check Endpoint Status
|
||||
|
||||
Command:
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint status --write-out table
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| https://IP:2379 | e973e4419737125 | 3.5.7 | 553 kB | false | 32 | 2449410 |
|
||||
| https://IP:2379 | 4a509c997b26c206 | 3.5.7 | 553 kB | false | 32 | 2449410 |
|
||||
| https://IP:2379 | b217e736575e9dd3 | 3.5.7 | 553 kB | true | 32 | 2449410 |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
```
|
||||
|
||||
### Disarm Alarm
|
||||
|
||||
After verifying that the DB size went down after compaction and defragmenting, the alarm needs to be disarmed for etcd to allow writes again.
|
||||
|
||||
Command:
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
docker exec etcd etcdctl alarm disarm
|
||||
docker exec etcd etcdctl alarm list
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
crictl exec $etcdcontainer etcdctl alarm disarm --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
memberID:x alarm:NOSPACE
|
||||
memberID:x alarm:NOSPACE
|
||||
memberID:x alarm:NOSPACE
|
||||
docker exec etcd etcdctl alarm disarm
|
||||
docker exec etcd etcdctl alarm list
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
etcdctl alarm disarm --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
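
To compare the alarm state before and after disarming, the `alarm list` output can be reduced to a count. This is a minimal sketch, assuming the `memberID:x alarm:NOSPACE` line format from the example output. `nospace_alarm_count` is a hypothetical name:

```bash
# Hypothetical helper: read `etcdctl alarm list` output on stdin and print the
# number of members that currently report a NOSPACE alarm.
nospace_alarm_count() {
  # grep -c prints 0 but exits non-zero when nothing matches, so the exit
  # status is masked to keep the function usable under `set -e`.
  grep -c 'alarm:NOSPACE' || true
}
```

A count above zero before `alarm disarm` and a count of `0` afterwards would indicate that the alarm was cleared on all members.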
|
||||
|
||||
## Configure Log Level
|
||||
@@ -228,7 +332,7 @@ You can no longer dynamically change the log level in etcd v3.5 or later.
|
||||
|
||||
### etcd v3.5 And Later
|
||||
|
||||
To configure the log level for etcd, edit the cluster YAML:
|
||||
To configure the log level for etcd, edit the cluster configuration YAML:
|
||||
|
||||
```
|
||||
services:
|
||||
@@ -237,20 +341,7 @@ services:
|
||||
log-level: "debug"
|
||||
```
|
||||
|
||||
### etcd v3.4 And Earlier
|
||||
|
||||
In earlier etcd versions, you can use the API to dynamically change the log level. Configure debug logging using the commands below:
|
||||
|
||||
```
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -XPUT -d '{"Level":"DEBUG"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) $(docker exec etcd printenv ETCDCTL_ENDPOINTS)/config/local/log
|
||||
```
|
||||
|
||||
To reset the log level back to the default (`INFO`), you can use the following command.
|
||||
|
||||
Command:
|
||||
```
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -XPUT -d '{"Level":"INFO"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) $(docker exec etcd printenv ETCDCTL_ENDPOINTS)/config/local/log
|
||||
```
|
||||
After modifying the configuration, restart the service (`systemctl restart rke2-server` or `systemctl restart k3s`) if you are configuring a stand-alone cluster.
|
||||
|
||||
## etcd Content
|
||||
|
||||
@@ -258,24 +349,40 @@ If you want to investigate the contents of your etcd, you can either watch strea
|
||||
|
||||
### Watch Streaming Events
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl watch --prefix /registry --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl watch --prefix /registry
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl watch --prefix /registry --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
If you only want to see the affected keys (and not the binary data), you can append `| grep -a ^/registry` to the command to filter for keys only.
|
||||
|
||||
### Query etcd Directly
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl get /registry --prefix=true --keys-only
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
You can process the data to get a count of keys per resource type, using the command below:
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
```
|
||||
docker exec etcd etcdctl get /registry --prefix=true --keys-only | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
```
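
The `awk` one-liner in these commands is dense, so here is the same logic spread out as a sketch. `summarize_keys` is a hypothetical wrapper that reads the `--keys-only` output from standard input. The grouping rule is unchanged from the commands above:

```bash
# Hypothetical helper: read `etcdctl get /registry --prefix=true --keys-only`
# output on stdin and print "<count> <resource>" lines, largest count first.
summarize_keys() {
  grep -v '^$' |   # etcdctl separates keys with blank lines; drop them
    awk -F'/' '
      {
        # Keys look like /registry/<resource>/<namespace>/<name>, so the
        # resource is field 3. Rancher API groups (cattle.io) are split one
        # level deeper so that each custom resource gets its own row.
        if ($3 ~ /cattle.io/) { count[$3 "/" $4]++ }
        else                  { count[$3]++ }
      }
      END { for (k in count) print count[k], k }' |
    sort -nr       # numeric sort, descending
}
```

Piping either key listing command into `summarize_keys` should give the same summary as the one-liner, which makes it easier to spot a resource type that dominates the keyspace.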
|
||||
|
||||
## Replacing Unhealthy etcd Nodes
|
||||
|
||||
+209
-102
@@ -6,29 +6,63 @@ title: Troubleshooting etcd Nodes
|
||||
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/kubernetes-components/troubleshooting-etcd-nodes"/>
|
||||
</head>
|
||||
|
||||
This section contains commands and tips for troubleshooting nodes with the `etcd` role.
|
||||
This section contains commands and tips for troubleshooting nodes with the `etcd` role in RKE2 and K3s clusters.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
As RKE2 and K3s rely on `containerd` as the container runtime, `crictl` replaces Docker for container management. Before proceeding with the troubleshooting commands, configure your environment by exporting the following variables:
|
||||
|
||||
### RKE2
|
||||
|
||||
```bash
|
||||
export PATH=$PATH:/var/lib/rancher/rke2/bin/
|
||||
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
|
||||
etcdcontainer=$(crictl ps --name etcd --quiet)
|
||||
```
|
||||
|
||||
### K3s
|
||||
|
||||
> ### ⚠️ **Warning**
|
||||
> K3s does not include `etcdctl` in the system PATH. If you need to perform etcd troubleshooting on a K3s cluster, you may need to install it or locate it within the K3s data directory.
|
||||
|
||||
```bash
|
||||
export PATH=$PATH:/usr/local/bin
|
||||
export CRI_CONFIG_FILE=/var/lib/rancher/k3s/agent/etc/crictl.yaml
|
||||
```
|
||||
|
||||
|
||||
## Checking if the etcd Container is Running
|
||||
|
||||
The container for etcd should have status **Up**. The duration shown after **Up** is the time the container has been running.
|
||||
**RKE2**: The container for etcd should be in the **Running** state.
|
||||
|
||||
```
|
||||
docker ps -a -f=name=etcd$
|
||||
```bash
|
||||
crictl ps --name etcd
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||
d26adbd23643 rancher/mirrored-coreos-etcd:v3.5.7 "/usr/local/bin/etcd…" 30 minutes ago Up 30 minutes etcd
|
||||
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
|
||||
f1e289d202ed0 11ad16872a9cf 58 minutes ago Running etcd 0 7b56aab8204ea etcd-cluster1 kube-system
|
||||
```
|
||||
|
||||
## etcd Container Logging
|
||||
|
||||
The logging of the container can contain information on what the problem could be.
|
||||
**K3s**: etcd runs as an embedded process in the K3s service, so there is no separate etcd container. Check the service status:
|
||||
|
||||
```bash
|
||||
systemctl status k3s
|
||||
```
|
||||
docker logs etcd
|
||||
|
||||
## etcd Logging
|
||||
|
||||
The logs can contain information on what the problem could be.
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl logs $etcdcontainer
|
||||
```
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
journalctl -u k3s | grep -i etcd
|
||||
```
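The full log is often noisy. A rough filter for the more serious entries, assuming etcd is writing its default structured JSON log lines with a `"level"` field (if your logs use a different format, adjust the pattern); `etcd_log_problems` is a hypothetical name:

```bash
# Hypothetical filter: keep only log lines at warn level or above.
# Assumes JSON log lines such as {"level":"warn", ...}.
etcd_log_problems() {
  grep -E '"level":"(warn|error|fatal|panic)"'
}

# Usage (2>&1 is included because the logs may arrive on stderr):
#   crictl logs "$etcdcontainer" 2>&1 | etcd_log_problems
#   journalctl -u k3s | grep -i etcd | etcd_log_problems
```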
|
||||
| Log | Explanation |
|
||||
|-----|------------------|
|
||||
@@ -46,18 +80,43 @@ The address where etcd is listening depends on the address configuration of the
|
||||
|
||||
Output should contain all the nodes with the `etcd` role and the output should be identical on all nodes.
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
Run the command inside the etcd container.
|
||||
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl member list \
|
||||
--cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
|
||||
--key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
|
||||
--cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl member list
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl member list \
|
||||
--cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
|
||||
--key /var/lib/rancher/k3s/server/tls/etcd/server-client.key \
|
||||
--cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
1c424074df86e854, started, cluster-node1-f289ac71, https://IP:2380, https://IP:2379, false
|
||||
45c68c44c5a792ff, started, cluster-node2-67e3cf6f, https://IP:2380, https://IP:2379, false
|
||||
7c584f77c5180258, started, cluster-node3-e976bc00, https://IP:2380, https://IP:2379, false
|
||||
```
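Several later commands pass every member's client URL to `--endpoints`. The pipeline they use (`cut`, `sed`, `paste`) takes the fifth comma-separated field of each `member list` line, strips spaces, and joins the results with commas. Isolated as a function for illustration (`client_endpoints` is a hypothetical name):

```bash
# Read "etcdctl member list" output on stdin and print the client URLs
# (field 5) as one comma-separated string suitable for --endpoints.
client_endpoints() {
  cut -d, -f5 | sed -e 's/ //g' | paste -sd ',' -
}

# Usage:
#   crictl exec "$etcdcontainer" etcdctl member list <TLS flags> | client_endpoints
```

The trailing `-` tells `paste` to read standard input explicitly, which some non-GNU implementations require.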
|
||||
|
||||
### Check Endpoint Status
|
||||
|
||||
The values for `RAFT TERM` should be equal, and the values for `RAFT INDEX` should not be too far apart from each other.
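The `RAFT TERM` comparison can be automated. The sketch below reads the `--write-out table` output on standard input, finds the `RAFT TERM` column by its header (the set of columns can differ between etcd versions), and succeeds only if every member reports the same term. `raft_terms_agree` is a hypothetical name:

```bash
# Exit 0 if all rows of an "etcdctl endpoint status --write-out table"
# listing (read from stdin) show the same RAFT TERM, otherwise exit 1.
raft_terms_agree() {
  awk -F'|' '
    # Until the header row is found, look for the RAFT TERM column.
    col == 0 {
      for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/^ +| +$/, "", name)
        if (name == "RAFT TERM") col = i
      }
      next
    }
    # Data rows start with "|"; border rows start with "+" and are skipped.
    /^\|/ {
      term = $col
      gsub(/ /, "", term)
      seen[term] = 1
    }
    END {
      n = 0
      for (t in seen) n++
      exit (n == 1 ? 0 : 1)
    }
  '
}

# Usage:
#   <endpoint status command> | raft_terms_agree || echo "RAFT TERM mismatch"
```

It does not judge `RAFT INDEX`, because how far apart the indexes may be depends on the write load at the moment of sampling.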
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl endpoint status --write-out table --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint status --write-out table
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl endpoint status --write-out table --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
@@ -65,17 +124,22 @@ Example output:
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| https://IP:2379 | 333ef673fc4add56 | 3.5.7 | 24 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 5feed52d940ce4cf | 3.5.7 | 24 MB | true | 72 | 66887 |
|
||||
| https://IP:2379 | db6b3bdb559a848d | 3.5.7 | 25 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 333ef673fc4add56 | 3.6.7 | 24 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 5feed52d940ce4cf | 3.6.7 | 24 MB | true | 72 | 66887 |
|
||||
| https://IP:2379 | db6b3bdb559a848d | 3.6.7 | 25 MB | false | 72 | 66887 |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
```
|
||||
|
||||
### Check Endpoint Health
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl endpoint health --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint health
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl endpoint health --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
@@ -84,54 +148,104 @@ https://IP:2379 is healthy: successfully committed proposal: took = 2.113189ms
|
||||
https://IP:2379 is healthy: successfully committed proposal: took = 2.649963ms
|
||||
https://IP:2379 is healthy: successfully committed proposal: took = 2.451201ms
|
||||
```
|
||||
### Check Connectivity on etcd Ports
|
||||
|
||||
### Check Connectivity on Port TCP/2379
|
||||
> etcd 3.5 and later, as shipped with current Kubernetes versions, changed how network traffic is handled. Earlier versions accepted standard HTTP REST requests on the primary client port (`2379`). To improve performance and security, etcd 3.5+ strictly enforces the gRPC protocol on this port.<br />
|
||||
If you attempt to use standard HTTP tools like `curl` to test connectivity on port `2379`, the etcd server will automatically terminate the connection or return an error. This behavior often leads administrators to misinterpret the result as a closed port or a node failure.
|
||||
|
||||
Command:
|
||||
Since standard HTTP clients can no longer probe the primary etcd ports, the transport layer must be utilized for network troubleshooting. Using `openssl s_client` instead of `curl` bypasses the gRPC application requirement, allowing the raw TCP and TLS handshake to be tested directly.
|
||||
|
||||
These scripts isolate the network and security infrastructure from the database application. A successful `Verify return code: 0 (ok)` explicitly confirms four critical infrastructure components:
|
||||
|
||||
* **Network Path:** Routing is functional, and firewalls permit traffic on TCP port `2379` or `2380`.
|
||||
* **Process Availability:** The etcd service is running and actively listening on the designated port.
|
||||
* **Certificate Validity:** The TLS certificates are active, correctly formatted, and have not expired.
|
||||
* **Mutual Authentication (mTLS):** The node successfully authenticates against the cluster's specific Certificate Authority (CA).
|
||||
|
||||
**How these tests differ from the `etcdctl endpoint health` test**:
|
||||
|
||||
If the `etcdctl endpoint health` test fails, run these port connectivity scripts. If the scripts succeed, your network and certificates are intact, and the issue is likely confined to the etcd database itself. If the scripts fail, the issue is a firewall or network restriction, or a certificate problem such as expiration.
|
||||
|
||||
#### Port TCP/2379
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
for endpoint in $(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint} (Client)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
|
||||
-cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
|
||||
-key /var/lib/rancher/rke2/server/tls/etcd/server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
for endpoint in $(docker exec etcd etcdctl member list | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint}/health"
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -w "\n" --cacert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CACERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --cert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --key $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_KEY" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) "${endpoint}/health"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
for endpoint in $(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint} (Client)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
|
||||
-cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
|
||||
-key /var/lib/rancher/k3s/server/tls/etcd/server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379 (Client)
|
||||
Verify return code: 0 (ok)
|
||||
Validating connection to https://IP:2379 (Client)
|
||||
Verify return code: 0 (ok)
|
||||
Validating connection to https://IP:2379 (Client)
|
||||
Verify return code: 0 (ok)
|
||||
```
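The three possible outcomes of the `openssl s_client` probe map onto the diagnosis described earlier in this section. A sketch that classifies the probe output read from standard input; `tls_probe_result` and its three labels are hypothetical names chosen for illustration:

```bash
# Classify "openssl s_client" output (stdin) into one of three labels:
#   ok                               - TCP, TLS, and certificate chain all verified
#   certificate-problem              - handshake ran, but verification returned non-zero
#   unreachable-or-handshake-failed  - no verification line at all (timeout, refused, reset)
tls_probe_result() {
  verify_line=$(grep -E 'Verify return code' | head -n 1)
  case "$verify_line" in
    *"Verify return code: 0 (ok)"*) echo "ok" ;;
    "") echo "unreachable-or-handshake-failed" ;;
    *) echo "certificate-problem" ;;
  esac
}

# Usage inside the loops above, replacing the final grep:
#   echo | openssl s_client -connect "${endpoint#https://}" <cert flags> 2>/dev/null | tls_probe_result
```

A `certificate-problem` result points at expiry or a CA mismatch, while `unreachable-or-handshake-failed` points at routing, a firewall, or etcd not listening.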
|
||||
|
||||
### Check Connectivity on Port TCP/2380
|
||||
#### Port TCP/2380
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
for endpoint in $(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint} (Peer)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt \
|
||||
-cert /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt \
|
||||
-key /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
for endpoint in $(docker exec etcd etcdctl member list | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint}/version";
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl --http1.1 -s -w "\n" --cacert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CACERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --cert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --key $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_KEY" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) "${endpoint}/version"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
for endpoint in $(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint} (Peer)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/k3s/server/tls/etcd/peer-ca.crt \
|
||||
-cert /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.crt \
|
||||
-key /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380 (Peer)
|
||||
Verify return code: 0 (ok)
|
||||
Validating connection to https://IP:2380 (Peer)
|
||||
Verify return code: 0 (ok)
|
||||
Validating connection to https://IP:2380 (Peer)
|
||||
Verify return code: 0 (ok)
|
||||
```
|
||||
|
||||
## etcd Alarms
|
||||
|
||||
etcd will trigger alarms, for instance when it runs out of space.
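For scripted checks, the `alarm list` output can be reduced to the IDs of the members that raised a `NOSPACE` alarm. A sketch, assuming the `memberID:<id> alarm:NOSPACE` line format shown in this guide; `nospace_members` is a hypothetical name:

```bash
# Read "etcdctl alarm list" output on stdin and print the member ID of
# every NOSPACE alarm, one per line. Prints nothing if there are none.
nospace_members() {
  sed -n 's/^memberID:\([^ ]*\) alarm:NOSPACE.*$/\1/p'
}

# Usage:
#   <alarm list command> | nospace_members
```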
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output when NOSPACE alarm is triggered:
|
||||
@@ -154,10 +268,16 @@ Resolutions:
|
||||
|
||||
### Compact the Keyspace
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
rev=$(crictl exec $etcdcontainer etcdctl endpoint status --write-out json --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' | head -1)
|
||||
crictl exec $etcdcontainer etcdctl compact "$rev" --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
rev=$(docker exec etcd etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
|
||||
docker exec etcd etcdctl compact "$rev"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
rev=$(etcdctl endpoint status --write-out json --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' | head -1)
|
||||
etcdctl compact "$rev" --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
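The first line of each compaction command extracts the current revision from the JSON status output without needing `jq` on the node. The same extraction, isolated as a function for illustration (`latest_revision` is a hypothetical name, and the JSON in the comment is an abbreviated example rather than complete output):

```bash
# Read "etcdctl endpoint status --write-out json" output on stdin and
# print the first "revision" value found.
# Abbreviated example input: [{"Status":{"header":{"revision":12345}}}]
latest_revision() {
  grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Usage:
#   rev=$(<endpoint status --write-out json command> | latest_revision)
```

This sketch uses `grep -E` in place of `egrep`, which recent GNU grep releases report as obsolescent; the behavior is otherwise the same.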
|
||||
|
||||
Example output:
|
||||
@@ -167,55 +287,39 @@ compacted revision xxx
|
||||
|
||||
### Defrag All etcd Members
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl defrag --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl defrag
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl defrag --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
```
|
||||
|
||||
### Check Endpoint Status
|
||||
|
||||
Command:
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint status --write-out table
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| https://IP:2379 | e973e4419737125 | 3.5.7 | 553 kB | false | 32 | 2449410 |
|
||||
| https://IP:2379 | 4a509c997b26c206 | 3.5.7 | 553 kB | false | 32 | 2449410 |
|
||||
| https://IP:2379 | b217e736575e9dd3 | 3.5.7 | 553 kB | true | 32 | 2449410 |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
```
|
||||
|
||||
### Disarm Alarm
|
||||
|
||||
After verifying that the DB size went down after compaction and defragmenting, the alarm needs to be disarmed for etcd to allow writes again.
|
||||
|
||||
Command:
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
docker exec etcd etcdctl alarm disarm
|
||||
docker exec etcd etcdctl alarm list
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
crictl exec $etcdcontainer etcdctl alarm disarm --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
memberID:x alarm:NOSPACE
|
||||
memberID:x alarm:NOSPACE
|
||||
memberID:x alarm:NOSPACE
|
||||
docker exec etcd etcdctl alarm disarm
|
||||
docker exec etcd etcdctl alarm list
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
etcdctl alarm disarm --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
## Configure Log Level
|
||||
@@ -228,7 +332,7 @@ You can no longer dynamically change the log level in etcd v3.5 or later.
|
||||
|
||||
### etcd v3.5 And Later
|
||||
|
||||
To configure the log level for etcd, edit the cluster YAML:
|
||||
To configure the log level for etcd, edit the cluster configuration YAML:
|
||||
|
||||
```
|
||||
services:
|
||||
@@ -237,20 +341,7 @@ services:
|
||||
log-level: "debug"
|
||||
```
|
||||
|
||||
### etcd v3.4 And Earlier
|
||||
|
||||
In earlier etcd versions, you can use the API to dynamically change the log level. Configure debug logging using the commands below:
|
||||
|
||||
```
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -XPUT -d '{"Level":"DEBUG"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) $(docker exec etcd printenv ETCDCTL_ENDPOINTS)/config/local/log
|
||||
```
|
||||
|
||||
To reset the log level back to the default (`INFO`), you can use the following command.
|
||||
|
||||
Command:
|
||||
```
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -XPUT -d '{"Level":"INFO"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) $(docker exec etcd printenv ETCDCTL_ENDPOINTS)/config/local/log
|
||||
```
|
||||
After modifying the configuration, restart the service (`systemctl restart rke2-server` or `systemctl restart k3s`) if you are configuring a stand-alone cluster.
|
||||
|
||||
## etcd Content
|
||||
|
||||
@@ -258,24 +349,40 @@ If you want to investigate the contents of your etcd, you can either watch strea
|
||||
|
||||
### Watch Streaming Events
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl watch --prefix /registry --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl watch --prefix /registry
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl watch --prefix /registry --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
If you only want to see the affected keys (and not the binary data), you can append `| grep -a ^/registry` to the command to filter for keys only.
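The `-a` option matters here because the watch stream mixes key names with binary values, and without it `grep` may stop printing lines once it detects binary data. The filter, wrapped as a function for illustration (`keys_only` is a hypothetical name):

```bash
# Keep only lines that start with a /registry key, treating the
# (partly binary) input as text.
keys_only() {
  grep -a '^/registry'
}

# Usage:
#   <watch command> | keys_only
```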
|
||||
|
||||
### Query etcd Directly
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl get /registry --prefix=true --keys-only
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
You can process the data to get a summary of count per key, using the command below:
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
```
|
||||
docker exec etcd etcdctl get /registry --prefix=true --keys-only | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
```
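The `awk` program in these commands splits each key on `/`, so for `/registry/pods/default/web` the third field is the resource type (`pods`). For `cattle.io` API groups it also keeps the fourth field, so each Rancher resource kind is counted separately. The same logic as a stand-alone function that reads keys from standard input (`count_keys_by_resource` is a hypothetical name):

```bash
# Read one etcd key per line on stdin and print "<count> <resource>",
# highest count first.
count_keys_by_resource() {
  grep -v '^$' |
    awk -F'/' '{ if ($3 ~ /cattle.io/) { h[$3"/"$4]++ } else { h[$3]++ } } END { for (k in h) print h[k], k }' |
    sort -nr
}

# Usage:
#   <get /registry --prefix=true --keys-only command> | count_keys_by_resource
```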
|
||||
|
||||
## Replacing Unhealthy etcd Nodes
|
||||
|
||||
+209
-102
@@ -6,29 +6,63 @@ title: Troubleshooting etcd Nodes
|
||||
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/troubleshooting/kubernetes-components/troubleshooting-etcd-nodes"/>
|
||||
</head>
|
||||
|
||||
This section contains commands and tips for troubleshooting nodes with the `etcd` role.
|
||||
This section contains commands and tips for troubleshooting nodes with the `etcd` role in RKE2 and K3s clusters.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
As RKE2 and K3s rely on `containerd` as the container runtime, `crictl` replaces Docker for container management. Before proceeding with the troubleshooting commands, configure your environment by exporting the following variables:
|
||||
|
||||
### RKE2
|
||||
|
||||
```bash
|
||||
export PATH=$PATH:/var/lib/rancher/rke2/bin/
|
||||
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
|
||||
etcdcontainer=$(crictl ps --name etcd --quiet)
|
||||
```
|
||||
|
||||
### K3s
|
||||
|
||||
> ### ⚠️ **Warning**
|
||||
> K3s does not include `etcdctl` in the system PATH. If you need to perform etcd troubleshooting on a K3s cluster, you may need to install it or locate it within the K3s data directory.
|
||||
|
||||
```bash
|
||||
export PATH=$PATH:/usr/local/bin
|
||||
export CRI_CONFIG_FILE=/var/lib/rancher/k3s/agent/etc/crictl.yaml
|
||||
```
|
||||
|
||||
|
||||
## Checking if the etcd Container is Running
|
||||
|
||||
The container for etcd should have status **Up**. The duration shown after **Up** is the time the container has been running.
|
||||
**RKE2**: The container for etcd should be in the **Running** state.
|
||||
|
||||
```
|
||||
docker ps -a -f=name=etcd$
|
||||
```bash
|
||||
crictl ps --name etcd
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||
d26adbd23643 rancher/mirrored-coreos-etcd:v3.5.7 "/usr/local/bin/etcd…" 30 minutes ago Up 30 minutes etcd
|
||||
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
|
||||
f1e289d202ed0 11ad16872a9cf 58 minutes ago Running etcd 0 7b56aab8204ea etcd-cluster1 kube-system
|
||||
```
|
||||
|
||||
## etcd Container Logging
|
||||
|
||||
The logging of the container can contain information on what the problem could be.
|
||||
**K3s**: Etcd runs as an embedded process in the K3s service. Check the service status:
|
||||
|
||||
```bash
|
||||
systemctl status k3s
|
||||
```
|
||||
docker logs etcd
|
||||
|
||||
## etcd Logging
|
||||
|
||||
The logs can contain information on what the problem could be.
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl logs $etcdcontainer
|
||||
```
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
journalctl -u k3s | grep -i etcd
|
||||
```
|
||||
| Log | Explanation |
|
||||
|-----|------------------|
|
||||
@@ -46,18 +80,43 @@ The address where etcd is listening depends on the address configuration of the
|
||||
|
||||
Output should contain all the nodes with the `etcd` role and the output should be identical on all nodes.
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
Run the command inside the etcd container.
|
||||
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl member list \
|
||||
--cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
|
||||
--key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
|
||||
--cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl member list
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl member list \
|
||||
--cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
|
||||
--key /var/lib/rancher/k3s/server/tls/etcd/server-client.key \
|
||||
--cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
1c424074df86e854, started, cluster-node1-f289ac71, https://IP:2380, https://IP:2379, false
|
||||
45c68c44c5a792ff, started, cluster-node2-67e3cf6f, https://IP:2380, https://IP:2379, false
|
||||
7c584f77c5180258, started, cluster-node3-e976bc00, https://IP:2380, https://IP:2379, false
|
||||
```
|
||||
|
||||
### Check Endpoint Status
|
||||
|
||||
The values for `RAFT TERM` should be equal and `RAFT INDEX` should be not be too far apart from each other.
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl endpoint status --write-out table --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint status --write-out table
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl endpoint status --write-out table --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
@@ -65,17 +124,22 @@ Example output:
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| https://IP:2379 | 333ef673fc4add56 | 3.5.7 | 24 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 5feed52d940ce4cf | 3.5.7 | 24 MB | true | 72 | 66887 |
|
||||
| https://IP:2379 | db6b3bdb559a848d | 3.5.7 | 25 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 333ef673fc4add56 | 3.6.7 | 24 MB | false | 72 | 66887 |
|
||||
| https://IP:2379 | 5feed52d940ce4cf | 3.6.7 | 24 MB | true | 72 | 66887 |
|
||||
| https://IP:2379 | db6b3bdb559a848d | 3.6.7 | 25 MB | false | 72 | 66887 |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
```
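
The `RAFT TERM` and `RAFT INDEX` comparison can also be scripted. The following is a minimal sketch, not part of `etcdctl`: the helper name `check_raft_consistency` is hypothetical, and it assumes the table layout shown in the example output above (columns are located by their header names, so extra columns should not matter).

```bash
# Hypothetical helper: reads `etcdctl endpoint status --write-out table` output on stdin,
# reports whether all RAFT TERM values match and how far the RAFT INDEX values spread.
check_raft_consistency() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Header row: remember which columns hold RAFT TERM and RAFT INDEX.
    /RAFT TERM/ {
      for (i = 1; i <= NF; i++) {
        f = trim($i)
        if (f == "RAFT TERM") tc = i
        if (f == "RAFT INDEX") ic = i
      }
      next
    }
    # Data rows: the first column contains an endpoint URL.
    tc && $2 ~ /:\/\// {
      term = trim($tc) + 0
      idx = trim($ic) + 0
      if (!n++) { first = term; min = idx; max = idx }
      if (term != first) mismatch = 1
      if (idx < min) min = idx
      if (idx > max) max = idx
    }
    END {
      if (!n) { print "no endpoints parsed"; exit 2 }
      printf "members=%d term_match=%s index_spread=%d\n", n, (mismatch ? "no" : "yes"), max - min
      exit (mismatch ? 1 : 0)
    }
  '
}

# Sample input copied from the example output above.
check_raft_consistency <<'EOF'
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| https://IP:2379 | 333ef673fc4add56 | 3.6.7 | 24 MB | false | 72 | 66887 |
| https://IP:2379 | 5feed52d940ce4cf | 3.6.7 | 24 MB | true | 72 | 66887 |
| https://IP:2379 | db6b3bdb559a848d | 3.6.7 | 25 MB | false | 72 | 66887 |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
EOF
```

In practice you would pipe the real command output into the function instead of the sample text. How large an index spread is acceptable depends on cluster load, so the sketch only reports the number and does not judge it.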
|
||||
|
||||
### Check Endpoint Health
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl endpoint health --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint health
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl endpoint health --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
@@ -84,54 +148,104 @@ https://IP:2379 is healthy: successfully committed proposal: took = 2.113189ms
|
||||
https://IP:2379 is healthy: successfully committed proposal: took = 2.649963ms
|
||||
https://IP:2379 is healthy: successfully committed proposal: took = 2.451201ms
|
||||
```
|
||||
### Check Connectivity on etcd Ports
|
||||
|
||||
### Check Connectivity on Port TCP/2379
|
||||
> etcd 3.5 and newer, as shipped with modern versions of Kubernetes, changed how network traffic is handled on the primary client port (`2379`). Earlier versions permitted standard HTTP REST requests on this port. To enhance performance and security, etcd 3.5+ strictly enforces the gRPC protocol on it.<br />
|
||||
If you attempt to use standard HTTP tools like `curl` to test connectivity on port `2379`, the etcd server will automatically terminate the connection or return an error. This behavior often leads administrators to misinterpret the result as a closed port or a node failure.
|
||||
|
||||
Command:
|
||||
Since standard HTTP clients can no longer probe the primary etcd ports, the transport layer must be utilized for network troubleshooting. Using `openssl s_client` instead of `curl` bypasses the gRPC application requirement, allowing the raw TCP and TLS handshake to be tested directly.
|
||||
|
||||
These scripts isolate the network and security infrastructure from the database application. A successful `Verify return code: 0 (ok)` explicitly confirms four critical infrastructure components:
|
||||
|
||||
* **Network Path:** Routing is functional, and firewalls permit traffic on TCP port `2379` or `2380`.
|
||||
* **Process Availability:** The etcd service is running and actively listening on the designated port.
|
||||
* **Certificate Validity:** The TLS certificates are active, correctly formatted, and have not expired.
|
||||
* **Mutual Authentication (mTLS):** The node successfully authenticates against the cluster's specific Certificate Authority (CA).
|
||||
|
||||
**How these tests differ from the `etcdctl endpoint health` test**:
|
||||
|
||||
If the `etcdctl endpoint health` test fails, run the port connectivity test scripts below. If the scripts succeed, your network and certificates are intact, and the issue is likely confined to the etcd database itself. If the scripts fail, the issue is related to a firewall or network restriction, or to certificate expiration.
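
The decision logic described above can be sketched as a small shell function. This is only an illustration: `interpret_checks` is a hypothetical helper, and it assumes that `etcdctl endpoint health` exits non-zero when an endpoint is unhealthy.

```bash
# Hypothetical helper (not part of etcdctl or openssl): maps the outcome of the
# two checks to the likely fault domain.
# $1: exit status of `etcdctl endpoint health` (0 = healthy)
# $2: "ok" if openssl printed "Verify return code: 0 (ok)", anything else otherwise
interpret_checks() {
  local health_rc="$1" tls_result="$2"
  if [ "$health_rc" -eq 0 ]; then
    echo "healthy: no further transport testing needed"
  elif [ "$tls_result" = "ok" ]; then
    echo "network and certificates look intact: investigate etcd itself"
  else
    echo "transport problem: check firewall, routing, and certificate expiry"
  fi
}

interpret_checks 1 ok
interpret_checks 1 failed
```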
|
||||
|
||||
#### Port TCP/2379
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
for endpoint in $(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint} (Client)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
|
||||
-cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
|
||||
-key /var/lib/rancher/rke2/server/tls/etcd/server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
for endpoint in $(docker exec etcd etcdctl member list | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint}/health"
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -w "\n" --cacert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CACERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --cert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --key $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_KEY" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) "${endpoint}/health"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
for endpoint in $(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5); do
|
||||
echo "Validating connection to ${endpoint} (Client)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
|
||||
-cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
|
||||
-key /var/lib/rancher/k3s/server/tls/etcd/server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379/health
|
||||
{"health": "true"}
|
||||
Validating connection to https://IP:2379 (Client)
|
||||
Verify return code: 0 (ok)
|
||||
Validating connection to https://IP:2379 (Client)
|
||||
Verify return code: 0 (ok)
|
||||
Validating connection to https://IP:2379 (Client)
|
||||
Verify return code: 0 (ok)
|
||||
```
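
The connectivity scripts derive each `openssl s_client -connect` target from the `etcdctl member list` output. The sketch below shows that parsing step in isolation on a sample line, so it runs without a cluster. The helper name `member_hostport` is hypothetical, and the sample line is copied from the member list example earlier in this document.

```bash
# Sample line in the format shown in the `etcdctl member list` example output.
sample='1c424074df86e854, started, cluster-node1-f289ac71, https://IP:2380, https://IP:2379, false'

# Hypothetical helper: print host:port for one comma-separated field of each
# member list line read on stdin.
# $1: field number (4 = peer URL, 5 = client URL)
member_hostport() {
  cut -d, -f"$1" | while read -r url; do
    # `read` trims the leading space left by cut; the expansion strips the scheme.
    echo "${url#https://}"
  done
}

echo "$sample" | member_hostport 5   # prints IP:2379
echo "$sample" | member_hostport 4   # prints IP:2380
```

The scripts in this section get the same result through the unquoted `$(...)` in the `for` loop, where word splitting removes the leading space before `${endpoint#https://}` strips the scheme.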
|
||||
|
||||
### Check Connectivity on Port TCP/2380
|
||||
#### Port TCP/2380
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
for endpoint in $(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint} (Peer)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt \
|
||||
-cert /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt \
|
||||
-key /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
for endpoint in $(docker exec etcd etcdctl member list | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint}/version";
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl --http1.1 -s -w "\n" --cacert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CACERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --cert $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_CERT" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) --key $(docker inspect -f '{{range $index, $value := .Config.Env}}{{if eq (index (split $value "=") 0) "ETCDCTL_KEY" }}{{range $i, $part := (split $value "=")}}{{if gt $i 1}}{{print "="}}{{end}}{{if gt $i 0}}{{print $part}}{{end}}{{end}}{{end}}{{end}}' etcd) "${endpoint}/version"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
for endpoint in $(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f4); do
|
||||
echo "Validating connection to ${endpoint} (Peer)";
|
||||
echo | openssl s_client -connect ${endpoint#https://} \
|
||||
-CAfile /var/lib/rancher/k3s/server/tls/etcd/peer-ca.crt \
|
||||
-cert /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.crt \
|
||||
-key /var/lib/rancher/k3s/server/tls/etcd/peer-server-client.key 2>/dev/null | grep -E 'Verify return code' || echo "Connection Failed/Timeout"
|
||||
done
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380/version
|
||||
{"etcdserver":"3.5.7","etcdcluster":"3.5.0"}
|
||||
Validating connection to https://IP:2380 (Peer)
|
||||
Verify return code: 0 (ok)
|
||||
Validating connection to https://IP:2380 (Peer)
|
||||
Verify return code: 0 (ok)
|
||||
Validating connection to https://IP:2380 (Peer)
|
||||
Verify return code: 0 (ok)
|
||||
```
|
||||
|
||||
## etcd Alarms
|
||||
|
||||
etcd will trigger alarms, for instance when it runs out of space.
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output when NOSPACE alarm is triggered:
|
||||
@@ -154,10 +268,16 @@ Resolutions:
|
||||
|
||||
### Compact the Keyspace
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
rev=$(crictl exec $etcdcontainer etcdctl endpoint status --write-out json --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' | head -1)
|
||||
crictl exec $etcdcontainer etcdctl compact "$rev" --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
rev=$(docker exec etcd etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
|
||||
docker exec etcd etcdctl compact "$rev"
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
rev=$(etcdctl endpoint status --write-out json --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' | head -1)
|
||||
etcdctl compact "$rev" --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
@@ -167,55 +287,39 @@ compacted revision xxx
|
||||
|
||||
### Defrag All etcd Members
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl defrag --endpoints=$(crictl exec $etcdcontainer etcdctl member list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl defrag
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl defrag --endpoints=$(etcdctl member list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
Finished defragmenting etcd member[https://IP:2379]
|
||||
```
|
||||
|
||||
### Check Endpoint Status
|
||||
|
||||
Command:
|
||||
```
|
||||
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') etcd etcdctl endpoint status --write-out table
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
| https://IP:2379 | e973e4419737125 | 3.5.7 | 553 kB | false | 32 | 2449410 |
|
||||
| https://IP:2379 | 4a509c997b26c206 | 3.5.7 | 553 kB | false | 32 | 2449410 |
|
||||
| https://IP:2379 | b217e736575e9dd3 | 3.5.7 | 553 kB | true | 32 | 2449410 |
|
||||
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
Finished defragmenting etcd member[https://IP:2379]. took xx.xxxxxxms
|
||||
```
|
||||
|
||||
### Disarm Alarm
|
||||
|
||||
After verifying that the DB size went down after compaction and defragmenting, the alarm needs to be disarmed for etcd to allow writes again.
|
||||
|
||||
Command:
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
docker exec etcd etcdctl alarm disarm
|
||||
docker exec etcd etcdctl alarm list
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
crictl exec $etcdcontainer etcdctl alarm disarm --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
crictl exec $etcdcontainer etcdctl alarm list --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
Example output:
|
||||
```
|
||||
docker exec etcd etcdctl alarm list
|
||||
memberID:x alarm:NOSPACE
|
||||
memberID:x alarm:NOSPACE
|
||||
memberID:x alarm:NOSPACE
|
||||
docker exec etcd etcdctl alarm disarm
|
||||
docker exec etcd etcdctl alarm list
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
etcdctl alarm disarm --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
etcdctl alarm list --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
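
Before disarming, the drop in DB size can be confirmed by comparing `endpoint status` output captured before and after compaction and defragmentation. The following is a minimal sketch under stated assumptions: `db_size_bytes` is a hypothetical helper, the JSON samples are abbreviated and made up, and the `dbSize` field name is an assumption about the `--write-out json` output that you should verify against your etcdctl version.

```bash
# Hypothetical helper: extract the first "dbSize" value (in bytes) from
# `etcdctl endpoint status --write-out json` output read on stdin.
db_size_bytes() {
  grep -Eo '"dbSize":[0-9]+' | grep -Eo '[0-9]+' | head -1
}

# Abbreviated, made-up samples standing in for output captured before and after maintenance.
before='[{"Endpoint":"https://IP:2379","Status":{"header":{"revision":66887},"dbSize":2147483648}}]'
after='[{"Endpoint":"https://IP:2379","Status":{"header":{"revision":66887},"dbSize":25165824}}]'

size_before=$(echo "$before" | db_size_bytes)
size_after=$(echo "$after" | db_size_bytes)

if [ "$size_after" -lt "$size_before" ]; then
  echo "DB size decreased: ${size_before} -> ${size_after} bytes"
else
  echo "DB size did not decrease: ${size_before} -> ${size_after} bytes"
fi
```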
|
||||
|
||||
## Configure Log Level
|
||||
@@ -228,7 +332,7 @@ You can no longer dynamically change the log level in etcd v3.5 or later.
|
||||
|
||||
### etcd v3.5 And Later
|
||||
|
||||
To configure the log level for etcd, edit the cluster YAML:
|
||||
To configure the log level for etcd, edit the cluster configuration YAML:
|
||||
|
||||
```
|
||||
services:
|
||||
@@ -237,20 +341,7 @@ services:
|
||||
log-level: "debug"
|
||||
```
|
||||
|
||||
### etcd v3.4 And Earlier
|
||||
|
||||
In earlier etcd versions, you can use the API to dynamically change the log level. Configure debug logging using the commands below:
|
||||
|
||||
```
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -XPUT -d '{"Level":"DEBUG"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) $(docker exec etcd printenv ETCDCTL_ENDPOINTS)/config/local/log
|
||||
```
|
||||
|
||||
To reset the log level back to the default (`INFO`), you can use the following command.
|
||||
|
||||
Command:
|
||||
```
|
||||
docker run --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -XPUT -d '{"Level":"INFO"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) $(docker exec etcd printenv ETCDCTL_ENDPOINTS)/config/local/log
|
||||
```
|
||||
After modifying the configuration, restart the service (`systemctl restart rke2-server` or `systemctl restart k3s`) if you are configuring a stand-alone cluster.
|
||||
|
||||
## etcd Content
|
||||
|
||||
@@ -258,24 +349,40 @@ If you want to investigate the contents of your etcd, you can either watch strea
|
||||
|
||||
### Watch Streaming Events
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl watch --prefix /registry --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl watch --prefix /registry
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl watch --prefix /registry --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
If you only want to see the affected keys (and not the binary data), you can append `| grep -a ^/registry` to the command to filter for keys only.
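
The effect of the filter can be tried on sample data without a running cluster. This is a small sketch: `keys_only` is a hypothetical wrapper around the same `grep -a ^/registry` filter, and the input is made-up text that imitates watch output (event type, key, then an opaque binary value).

```bash
# Hypothetical helper: keep only lines that start with a /registry key.
# -a makes grep treat binary input as text instead of reporting a binary match.
keys_only() {
  grep -a '^/registry'
}

# Made-up sample resembling watch output, including NUL and control bytes in the values.
printf 'PUT\n/registry/leases/kube-system/example\nk8s\000\001binary-value\nPUT\n/registry/pods/default/example\n\002more-binary\n' | keys_only
```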
|
||||
|
||||
### Query etcd Directly
|
||||
|
||||
Command:
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
docker exec etcd etcdctl get /registry --prefix=true --keys-only
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
|
||||
```
|
||||
|
||||
You can process the data to get a count of keys per resource type, using the command below:
|
||||
|
||||
**RKE2**:
|
||||
```bash
|
||||
crictl exec $etcdcontainer etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
```
|
||||
docker exec etcd etcdctl get /registry --prefix=true --keys-only | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
|
||||
**K3s**:
|
||||
```bash
|
||||
etcdctl get /registry --prefix=true --keys-only --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -nr
|
||||
```
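
To see what the summary pipeline produces, the same `grep`, `awk`, and `sort` stages can be run against a handful of sample keys. This is a sketch only: `summarize_keys` is a hypothetical wrapper, and the key names are made up.

```bash
# Hypothetical helper: the same grep/awk/sort stages as the commands above,
# reading key names on stdin instead of calling etcdctl.
summarize_keys() {
  grep -v '^$' \
    | awk -F'/' '{ if ($3 ~ /cattle.io/) { h[$3"/"$4]++ } else { h[$3]++ } } END { for (k in h) print h[k], k }' \
    | sort -nr
}

# Made-up keys, each followed by a blank line as in `--keys-only` output.
printf '%s\n\n' \
  /registry/pods/default/a \
  /registry/pods/default/b \
  /registry/pods/kube-system/c \
  /registry/management.cattle.io/clusters/c1 \
  /registry/management.cattle.io/clusters/c2 \
  /registry/configmaps/default/x \
  | summarize_keys
```

Keys whose API group contains `cattle.io` are grouped by group and resource (for example `management.cattle.io/clusters`), while all other keys are grouped by the third path segment only.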
|
||||
|
||||
## Replacing Unhealthy etcd Nodes
|
||||
|
||||