rancher-docs/versioned_docs/version-2.9/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
Marty Hernandez Avedon c869ea69ac Fix order of headings (#1465)
2024-09-18 14:02:55 -04:00

2.7 KiB

Tuning etcd for Large Installations

When Rancher manages a large infrastructure, we recommend increasing the etcd keyspace from its default size of 2 GB. The maximum setting is 8 GB, and the host should have enough RAM to keep the entire dataset in memory, so when you increase this value you should also increase the size of the host. You can also adjust the keyspace size in smaller installations if you anticipate a high rate of pod churn during the garbage collection interval.

Kubernetes automatically cleans up the etcd dataset every five minutes. In some situations, such as deployment thrashing, enough events can be written to etcd and then deleted before garbage collection runs that the keyspace fills up. If you see `mvcc: database space exceeded` errors in the etcd logs or the Kubernetes API server logs, consider increasing the keyspace size. You can do this by setting `quota-backend-bytes` on the etcd servers.

Example: This Snippet of the RKE cluster.yml File Increases the Keyspace Size to 5 GB

```yaml
# RKE cluster.yml
---
services:
  etcd:
    extra_args:
      # 5 GiB, expressed in bytes (5 * 1024^3)
      quota-backend-bytes: 5368709120
```
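The `quota-backend-bytes` value is given in bytes. Because 5368709120 is 5 × 1024³, the 5 GB in the example above is 5 GiB. The same arithmetic applied to the 8 GB maximum gives 8 × 1024³ = 8589934592 bytes. The following is a sketch of the same snippet at that maximum. It assumes the host has been sized with enough RAM to hold the full dataset in memory.

```yaml
# RKE cluster.yml
---
services:
  etcd:
    extra_args:
      # 8 GiB (8 * 1024^3 bytes), the maximum keyspace setting
      quota-backend-bytes: 8589934592
```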

Scaling etcd Disk Performance

Follow the recommendations in the etcd documentation on tuning disk priority on the host.

To reduce I/O contention on the etcd disks, you can also use dedicated devices for the data and WAL directories. Following etcd best practices, mirroring RAID configurations (such as RAID 1) are unnecessary because etcd already replicates data between the nodes in the cluster. Striping RAID configurations (such as RAID 0) can be used to increase available IOPS.

To implement this in an RKE cluster, format the dedicated disks and mount them on the underlying host at `/var/lib/etcd/data` and `/var/lib/etcd/wal`. In the `extra_args` directive of the etcd service, you must set `wal-dir` to the `wal_dir` subdirectory of the WAL mount. If you do not specify the `wal_dir` subdirectory, the etcd process will try to manipulate the underlying `wal` mount with insufficient permissions.

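```yaml
# RKE cluster.yml
---
services:
  etcd:
    extra_args:
      # Paths inside the etcd container
      data-dir: '/var/lib/rancher/etcd/data/'
      # WAL goes in a subdirectory of the WAL mount, not the mount point itself
      wal-dir: '/var/lib/rancher/etcd/wal/wal_dir'
    extra_binds:
      # host_path:container_path, mapping the dedicated host disks into the container
      - '/var/lib/etcd/data:/var/lib/rancher/etcd/data'
      - '/var/lib/etcd/wal:/var/lib/rancher/etcd/wal'
```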
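Both the keyspace quota and the dedicated-disk directories are arguments under the same etcd `extra_args` map. The following is a sketch that assumes you want the 5 GB keyspace from the earlier example together with the dedicated data and WAL disks. Both sets of settings go into a single `etcd` entry. Repeating the `etcd` or `extra_args` keys would produce duplicate keys, which the YAML specification does not allow.

```yaml
# RKE cluster.yml
---
services:
  etcd:
    # One extra_args map holds every etcd flag
    extra_args:
      # 5 GiB keyspace quota, in bytes
      quota-backend-bytes: 5368709120
      data-dir: '/var/lib/rancher/etcd/data/'
      wal-dir: '/var/lib/rancher/etcd/wal/wal_dir'
    # Dedicated host disks bound into the etcd container (host_path:container_path)
    extra_binds:
      - '/var/lib/etcd/data:/var/lib/rancher/etcd/data'
      - '/var/lib/etcd/wal:/var/lib/rancher/etcd/wal'
```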