From dde8bfe94910915b24180d8220670c551208cd63 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 12:19:05 +0200
Subject: [PATCH 01/47] installation-requirements: respect RKE2, k3s, RKE ordering

Signed-off-by: Silvio Moioli
---
 .../installation-requirements.md | 60 +++++++++----------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index b7214336b13..5cdcbc3ac40 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp
 If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md)
-### RKE Specific Requirements
+### RKE2 Specific Requirements
-For the container runtime, RKE should work with any modern Docker version.
+For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs.
-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md)
+For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
 ### K3s Specific Requirements
@@ -55,11 +55,11 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow
 If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup.
-### RKE2 Specific Requirements
+### RKE Specific Requirements
-For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs.
+For the container runtime, RKE should work with any modern Docker version.
-For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
+For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md)
 ## Hardware Requirements
@@ -69,6 +69,30 @@ The following sections describe the CPU, memory, and disk requirements for the n
 Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster.
+### RKE2 Kubernetes
+
+These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
+
+| Deployment Size | Clusters | Nodes | vCPUs | RAM |
+| --------------- | -------- | --------- | ----- | ---- |
+| Small | Up to 5 | Up to 50 | 2 | 5 GB |
+| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
+
+
+### K3s Kubernetes
+
+These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
+
+| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
+| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
+| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
+| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
+| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
+| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
+| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
+
+Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+
 ### RKE and Hosted Kubernetes
 These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed.
@@ -85,30 +109,6 @@ These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubern
 Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
-### K3s Kubernetes
-
-These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
-
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
-
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
-
-
-### RKE2 Kubernetes
-
-These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
-
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 2 | 5 GB |
-| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
-
 ### Docker
 These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher.

From 2cb15e0459a1f565824f8c83becf762f7b40f9d5 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 14:59:18 +0200
Subject: [PATCH 02/47] installation-requirements: new preamble on hardware numbers

Signed-off-by: Silvio Moioli
---
 .../installation-requirements.md | 23 ++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index 5cdcbc3ac40..1797b507914 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -63,11 +63,28 @@ For more information see [Installing Docker,](../getting-started/installation-an
 ## Hardware Requirements
-The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed.
+The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed.
-## CPU and Memory
+### Premise
+
+Rancher's hardware footprint depends on a number of factors, including:
+ - size of managed infrastructure (eg. node count, cluster count)
+ - complexity of the desired access control rules (eg. `RoleBinding` object count)
+ - number of workloads (eg. Kubernetes deployments, Fleet deployments)
+ - usage patterns (eg. subset of functionality actively used, frequency of use, number of concurrently active users)
+
+Because of the high number of influencing factors and their variability over time, requirements in this document have to be interpreted as reasonable starting points that worked acceptably well in most observed use cases so far. Nevertheless, particular use cases may differ - with higher or lower requirements. For enquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance.
+
+:::note Evolution:
+
+Rancher's code base evolves, use cases change, and the body of accumulated Rancher experience grows every day.
+
+Because of that, this document's recommendations are subject to change over time with the aim of providing increasingly more concrete and accurate guidelines.
+
+If you find your Rancher deployment does not comply with recommendations in this document, while it used to comply with earlier revisions, you can [contact Rancher](https://rancher.com/contact/) for an ad-hoc evaluation.
+
+:::
-Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster.
 ### RKE2 Kubernetes

From 7f7eb121a5d48aaacbe7dba1843b1415950aa27e Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 14:37:01 +0200
Subject: [PATCH 03/47] installation-requirements: uniform header and trailer

Signed-off-by: Silvio Moioli
---
 .../installation-requirements.md | 20 +++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index 1797b507914..274917f3706 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -88,7 +88,9 @@ If you find your Rancher deployment does not comply with recommendations in this
 ### RKE2 Kubernetes
-These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
+Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below.
+
+Please note that a highly available setup with at least 3 nodes is required for all production usages.
 | Deployment Size | Clusters | Nodes | vCPUs | RAM |
 | --------------- | -------- | --------- | ----- | ---- |
@@ -98,7 +100,9 @@ These CPU and memory requirements apply to each instance with RKE2 installed. Mi
 ### K3s Kubernetes
-These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
+Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below.
+
+Please note that a highly available setup with at least 3 nodes is required for all production usages.
 | Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
 | --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
@@ -108,11 +112,11 @@ These CPU and memory requirements apply to each host in a [K3s Kubernetes cluste
 | X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
 | XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
-
 ### RKE and Hosted Kubernetes
-These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed.
+Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below.
+
+Please note that a highly available setup with at least 3 nodes is required for all production usages.
 These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS.
@@ -124,11 +128,11 @@ These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubern
 | X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB |
 | XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB |
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
-
 ### Docker
-These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher.
+Minimum CPU and memory requirements for a [single node installation of Rancher](rancher-on-a-single-node-with-docker.md) are listed in the table below.
+
+Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments.
 | Deployment Size | Clusters | Nodes | vCPUs | RAM |
 | --------------- | -------- | --------- | ----- | ---- |

From 29b46512d483d559b815f2a9f109b8886fb3a419 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 14:45:49 +0200
Subject: [PATCH 04/47] installation-requirements: drop all references to installations >500 clusters

Signed-off-by: Silvio Moioli
---
 docs/pages-for-subheaders/installation-requirements.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index 274917f3706..d5deaf57d36 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -109,8 +109,6 @@ Please note that a highly available setup with at least 3 nodes is required for
 | Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
 | Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
 | Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
 ### RKE and Hosted Kubernetes
@@ -125,8 +123,6 @@ These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubern
 | Small | Up to 150 | Up to 1500 | 2 | 8 GB |
 | Medium | Up to 300 | Up to 3000 | 4 | 16 GB |
 | Large | Up to 500 | Up to 5000 | 8 | 32 GB |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB |
 ### Docker

From 59d7613de9b1a30ff42362c15b3411daec78c52e Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 15:13:39 +0200
Subject: [PATCH 05/47] Bump up k3s-based requirements, align them with RKE2

Signed-off-by: Silvio Moioli
---
 .../installation-requirements.md | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index d5deaf57d36..ea03a1cb4f1 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -92,11 +92,11 @@ Minimum CPU and memory requirements for each individual node in the [Kubernetes
 Please note that a highly available setup with at least 3 nodes is required for all production usages.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 2 | 5 GB |
-| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
-
+| Deployment Size | Clusters | Nodes | vCPUs | RAM |
+| --------------- | ---------- | ------------ |-------|-------|
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
+| Large | Up to 500 | Up to 5000 | 16 | 64 GB |
 ### K3s Kubernetes
@@ -104,11 +104,11 @@ Minimum CPU and memory requirements for each individual node in the [Kubernetes
 Please note that a highly available setup with at least 3 nodes is required for all production usages.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
+| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
+| --------------- | ---------- | ------------ |-------|-------| ------------------------- |
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
+| Large | Up to 500 | Up to 5000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
 ### RKE and Hosted Kubernetes

From d3aa37b40cd7a9d4e44566a4b2e9ba96cafead85 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 15:15:49 +0200
Subject: [PATCH 06/47] Add larger deployments note

Signed-off-by: Silvio Moioli
---
 .../installation-requirements.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index ea03a1cb4f1..a5e694f4a72 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -92,11 +92,14 @@ Minimum CPU and memory requirements for each individual node in the [Kubernetes
 Please note that a highly available setup with at least 3 nodes is required for all production usages.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | ---------- | ------------ |-------|-------|
-| Small | Up to 150 | Up to 1500 | 4 | 16 GB |
-| Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
-| Large | Up to 500 | Up to 5000 | 16 | 64 GB |
+| Deployment Size | Clusters | Nodes | vCPUs | RAM |
+|-----------------|-----------|------------|-------|-------|
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
+| Large | Up to 500 | Up to 5000 | 16 | 64 GB |
+| Larger | (†) | (†) | (†) | (†) |
+
+(†): Depending on various factors, larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a customized evaluation.
 ### K3s Kubernetes

From ca24819edc4a8e78311c216915aa5a1c992ccf3d Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 16:05:29 +0200
Subject: [PATCH 07/47] split off Hosted Kubernetes from RKE1

Signed-off-by: Silvio Moioli
---
 .../installation-requirements.md | 39 +++++++++++++------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index a5e694f4a72..f14a946ab03 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -107,25 +107,40 @@ Minimum CPU and memory requirements for each individual node in the [Kubernetes
 Please note that a highly available setup with at least 3 nodes is required for all production usages.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ |-------|-------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
+| Deployment Size | Clusters | Nodes | vCPUs | RAM | External Database Host |
+| --------------- | ---------- | ------------ |-------|-------|----------------------------|
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
+| Large | Up to 500 | Up to 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
-### RKE and Hosted Kubernetes
+Note: External Database Host refers to the optional possibility of hosting [k3s cluster data store on an external dedicated host](https://docs.k3s.io/datastore). Exact requirements will depend on the chosen data store, this table is a guideline only.
+
+### Hosted Kubernetes
 Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below.
 Please note that a highly available setup with at least 3 nodes is required for all production usages.
-These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS.
+These requirements apply hosted Kubernetes clusters such as EKS, AKS, or GKE. They do not apply to Rancher SaaS solutions such as Rancher Prime Hosted](https://www.rancher.com/products/rancher).
+
+| Deployment Size | Clusters | Nodes | vCPUs | RAM |
+|-----------------|-----------|------------|-------|-------|
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
+| Large | Up to 500 | Up to 5000 | 16 | 64 GB |
+
+### RKE
+
+Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below.
+
+Please note that a highly available setup with at least 3 nodes is required for all production usages.
+
+| Deployment Size | Clusters | Nodes | vCPUs | RAM |
+|-----------------|-----------|------------|-------|-------|
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
+| Large | Up to 500 | Up to 5000 | 16 | 64 GB |
+
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | ---------- | ------------ | -------| ------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB |
 ### Docker

From 4e46a3fd0ffc0eb7963006496bc8a11e70220951 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 16:13:58 +0200
Subject: [PATCH 08/47] add references to distro docs

Signed-off-by: Silvio Moioli
---
 docs/pages-for-subheaders/installation-requirements.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index f14a946ab03..3dd5ab7863f 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -101,6 +101,8 @@ Please note that a highly available setup with at least 3 nodes is required for
 (†): Depending on various factors, larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a customized evaluation.
+Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements).
+
 ### K3s Kubernetes
 Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below.
@@ -115,6 +117,8 @@ Please note that a highly available setup with at least 3 nodes is required for
 Note: External Database Host refers to the optional possibility of hosting [k3s cluster data store on an external dedicated host](https://docs.k3s.io/datastore). Exact requirements will depend on the chosen data store, this table is a guideline only.
+Refer to k3s documentation for more detailed information on [k3s general requirements](https://docs.k3s.io/installation/requirements).
+
 ### Hosted Kubernetes
 Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below.
@@ -141,6 +145,7 @@ Please note that a highly available setup with at least 3 nodes is required for
 | Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
 | Large | Up to 500 | Up to 5000 | 16 | 64 GB |
+Refer to RKE documentation for more detailed information on [RKE general requirements](https://rke.docs.rancher.com/os).
 ### Docker

From 569f07207c3bb49fe23a83f94527cd2f04269783 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 16:24:22 +0200
Subject: [PATCH 09/47] add reference to tuning page

Signed-off-by: Silvio Moioli
---
 .../installation-requirements.md | 26 ++++++++++++-------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index 3dd5ab7863f..df9e0185cb1 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -96,9 +96,11 @@ Please note that a highly available setup with at least 3 nodes is required for
 |-----------------|-----------|------------|-------|-------|
 | Small | Up to 150 | Up to 1500 | 4 | 16 GB |
 | Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
-| Large | Up to 500 | Up to 5000 | 16 | 64 GB |
+| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB |
 | Larger | (†) | (†) | (†) | (†) |
+(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md).
+
 (†): Depending on various factors, larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a customized evaluation.
 Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements).
@@ -109,13 +111,15 @@ Minimum CPU and memory requirements for each individual node in the [Kubernetes
 Please note that a highly available setup with at least 3 nodes is required for all production usages.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | External Database Host |
-| --------------- | ---------- | ------------ |-------|-------|----------------------------|
-| Small | Up to 150 | Up to 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
-| Large | Up to 500 | Up to 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
+| Deployment Size | Clusters | Nodes | vCPUs | RAM | External Database Host (†) |
+|-----------------|-----------|------------|-------|-------|----------------------------|
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
+| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
-Note: External Database Host refers to the optional possibility of hosting [k3s cluster data store on an external dedicated host](https://docs.k3s.io/datastore). Exact requirements will depend on the chosen data store, this table is a guideline only.
+(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md).
+
+(†): External Database Host refers to the optional possibility of hosting [k3s cluster data store on an external dedicated host](https://docs.k3s.io/datastore). Exact requirements will depend on the chosen data store, this table is a guideline only.
 Refer to k3s documentation for more detailed information on [k3s general requirements](https://docs.k3s.io/installation/requirements).
@@ -131,7 +135,9 @@ These requirements apply hosted Kubernetes clusters such as EKS, AKS, or GKE. Th
 |-----------------|-----------|------------|-------|-------|
 | Small | Up to 150 | Up to 1500 | 4 | 16 GB |
 | Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
-| Large | Up to 500 | Up to 5000 | 16 | 64 GB |
+| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB |
+
+(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md).
 ### RKE
@@ -143,7 +149,9 @@ Please note that a highly available setup with at least 3 nodes is required for
 |-----------------|-----------|------------|-------|-------|
 | Small | Up to 150 | Up to 1500 | 4 | 16 GB |
 | Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
-| Large | Up to 500 | Up to 5000 | 16 | 64 GB |
+| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB |
+
+(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md).
 Refer to RKE documentation for more detailed information on [RKE general requirements](https://rke.docs.rancher.com/os).

From 03318b55b165ccef52dcd621cc9cc4765e566c26 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 16:37:01 +0200
Subject: [PATCH 10/47] make scaling tips page visible in the menu

Signed-off-by: Silvio Moioli
---
 sidebars.js | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sidebars.js b/sidebars.js
index d9c5f34deb4..d1126022a3b 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -828,7 +828,8 @@ const sidebars = {
 items: [
 "reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere",
 "reference-guides/best-practices/rancher-server/rancher-deployment-strategy",
-"reference-guides/best-practices/rancher-server/tips-for-running-rancher"
+"reference-guides/best-practices/rancher-server/tips-for-running-rancher",
+"reference-guides/best-practices/rancher-server/tips-for-scaling-rancher"
 ]
 },
 {

From 5a23d6d994682a8d7d4a5a5b7ca460649532027d Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 21 Sep 2023 16:54:32 +0200
Subject: [PATCH 11/47] do not use deployments (overloaded term)

Signed-off-by: Silvio Moioli
---
 .../installation-requirements.md | 52 +++++++++----------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index df9e0185cb1..dabdca2318f 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -63,7 +63,7 @@ For more information see [Installing Docker,](../getting-started/installation-an
 ## Hardware Requirements
-The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed.
+The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed based on the size of the infrastructure it manages.
 ### Premise
@@ -92,12 +92,12 @@ Minimum CPU and memory requirements for each individual node in the [Kubernetes
 Please note that a highly available setup with at least 3 nodes is required for all production usages.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-|-----------------|-----------|------------|-------|-------|
-| Small | Up to 150 | Up to 1500 | 4 | 16 GB |
-| Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
-| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB |
-| Larger | (†) | (†) | (†) | (†) |
+| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM |
+|-----------------------------|-----------|------------|-------|-------|
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB |
+| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB |
+| Larger (†) | (†) | (†) | (†) | (†) |
 (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md).
@@ -111,11 +111,11 @@ Minimum CPU and memory requirements for each individual node in the [Kubernetes
 Please note that a highly available setup with at least 3 nodes is required for all production usages.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | External Database Host (†) |
-|-----------------|-----------|------------|-------|-------|----------------------------|
-| Small | Up to 150 | Up to 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
-| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
+| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | External Database Host (†) |
+|-----------------------------|-----------|------------|-------|-------|----------------------------|
+| Small | Up to 150 | Up to 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
+| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
+| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
 (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md).
@@ -131,11 +131,11 @@ Please note that a highly available setup with at least 3 nodes is required for
 These requirements apply hosted Kubernetes clusters such as EKS, AKS, or GKE. They do not apply to Rancher SaaS solutions such as Rancher Prime Hosted](https://www.rancher.com/products/rancher).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | -|-----------------|-----------|------------|-------|-------| -| Small | Up to 150 | Up to 1500 | 4 | 16 GB | -| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | -| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | +| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | +|-----------------------------|-----------|------------|-------|-------| +| Small | Up to 150 | Up to 1500 | 4 | 16 GB | +| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | +| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md). @@ -145,11 +145,11 @@ Minimum CPU and memory requirements for each individual node in the [Kubernetes Please note that a highly available setup with at least 3 nodes is required for all production usages. -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -|-----------------|-----------|------------|-------|-------| -| Small | Up to 150 | Up to 1500 | 4 | 16 GB | -| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | -| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | +| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | +|-----------------------------|-----------|------------|-------|-------| +| Small | Up to 150 | Up to 1500 | 4 | 16 GB | +| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | +| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md). @@ -161,10 +161,10 @@ Minimum CPU and memory requirements for a [single node installation of Rancher]( Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments. 
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | -------- | --------- | ----- | ---- | -| Small | Up to 5 | Up to 50 | 1 | 4 GB | -| Medium | Up to 15 | Up to 200 | 2 | 8 GB | +| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | +|-----------------------------|----------|-----------|-------|------| +| Small | Up to 5 | Up to 50 | 1 | 4 GB | +| Medium | Up to 15 | Up to 200 | 2 | 8 GB | ## Ingress From 0bc7552f2e34c0d3aa3c99923dbccb94cd14462d Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Thu, 21 Sep 2023 16:54:40 +0200 Subject: [PATCH 12/47] describe typical use Signed-off-by: Silvio Moioli --- docs/pages-for-subheaders/installation-requirements.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index dabdca2318f..ddfc2dcd855 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -75,6 +75,14 @@ Rancher's hardware footprint depends on a number of factors, including: Because of the high number of influencing factors and their variability over time, requirements in this document have to be interpreted as reasonable starting points that worked acceptably well in most observed use cases so far. Nevertheless, particular use cases may differ - with higher or lower requirements. For enquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. +In particular, requirements on this page are subject to "typical use" conditions, which include: + - total count of Kubernetes resources under 60k per type + - up to 120 pods per node + - up to 200 CRDs in the upstream (local) cluster + - up to 100 CRDs in downstream clusters + - up to 50 Fleet deployments +Higher numbers are possible but requirements might be higher. 
+ :::note Evolution: Rancher's code base evolves, use cases change, and the body of accumulated Rancher experience grows every day. From 6279e31f366ef1946caad4003874f50355d60296 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 12:06:38 +0200 Subject: [PATCH 13/47] Specify UI might slow down before 60k items Signed-off-by: Silvio Moioli --- docs/pages-for-subheaders/installation-requirements.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index ddfc2dcd855..4eb138d46ab 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -76,12 +76,13 @@ Rancher's hardware footprint depends on a number of factors, including: Because of the high number of influencing factors and their variability over time, requirements in this document have to be interpreted as reasonable starting points that worked acceptably well in most observed use cases so far. Nevertheless, particular use cases may differ - with higher or lower requirements. For enquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. In particular, requirements on this page are subject to "typical use" conditions, which include: - - total count of Kubernetes resources under 60k per type + - total count of Kubernetes resources under 60 thousand per type - up to 120 pods per node - up to 200 CRDs in the upstream (local) cluster - up to 100 CRDs in downstream clusters - up to 50 Fleet deployments Higher numbers are possible but requirements might be higher. +Note that visualization in the UI might be impacted for users with visibility on all resources in a type above 20 thousand total items. 
:::note Evolution: From 4329a632ad732abdee992c37cee72e8cb2294cc9 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 12:16:01 +0200 Subject: [PATCH 14/47] tips-for-scaling-rancher: use correct capitalization for etcd Signed-off-by: Silvio Moioli --- .../rancher-server/tips-for-scaling-rancher.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md index 691ef18f576..9bf2ea9417c 100644 --- a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md @@ -19,7 +19,7 @@ This guide aims to introduce the approaches that should be considered to scale R The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. ### Managing Your Object Counts -ETCD eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings. +etcd eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings. Rolebindings are created in the local cluster as a side effect of many operations. 
@@ -59,7 +59,7 @@ A large component of performance is the local cluster and how it was configured. ### Keeping Kubernetes Versions Up to Date Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes. -### Optimizing ETCD -The two main bottlenecks to [ETCD performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding ETCD performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks). +### Optimizing etcd +The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding etcd performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks). -Theoretically, the more nodes in an ETCD cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. 
It can also be inferred that ETCD performance will be inversely affected by distance between nodes as that will slow down network communication. +Theoretically, the more nodes in an etcd cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. It can also be inferred that etcd performance will be inversely affected by distance between nodes as that will slow down network communication. From 6e6864f2f083f5a3ca7a698c742d2be382837c11 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 12:17:02 +0200 Subject: [PATCH 15/47] tips-for-scaling-rancher: use correct capitalization for RoleBindings Signed-off-by: Silvio Moioli --- .../rancher-server/tips-for-scaling-rancher.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md index 9bf2ea9417c..44da173550a 100644 --- a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md @@ -19,18 +19,18 @@ This guide aims to introduce the approaches that should be considered to scale R The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. ### Managing Your Object Counts -etcd eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. 
From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings. +etcd eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is `RoleBindings`. -Rolebindings are created in the local cluster as a side effect of many operations. +`RoleBindings` are created in the local cluster as a side effect of many operations. -Considerations when attempting reduce rolebindings in the local cluster: +Considerations when attempting reduce `RoleBindings` in the local cluster: * Only add users to clusters and projects when necessary * Remove clusters and projects when they are no longer needed * Only use custom roles if necessary * Use as few rules as possible in custom roles * Consider whether adding a role to a user is redundant * Consider that using less, but more powerful, clusters may be more efficient -* Experiment to see if creating new projects or creating new clusters manifests in fewer rolebindings for your specific use case. +* Experiment to see if creating new projects or creating new clusters manifests in fewer `RoleBindings` for your specific use case. ### Using New Apps Over Legacy Apps There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster. 
From 66adb074d5b24ff9bcf65c529aa009c70725f4c7 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 12:24:17 +0200 Subject: [PATCH 16/47] tips-for-scaling-rancher: add suggestions to minimize RoleBindings Signed-off-by: Silvio Moioli --- .../best-practices/rancher-server/tips-for-scaling-rancher.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md index 44da173550a..662ba4c7dcc 100644 --- a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md @@ -23,7 +23,9 @@ etcd eventually encounters limitations to the number of a single Kubernetes reso `RoleBindings` are created in the local cluster as a side effect of many operations. -Considerations when attempting reduce `RoleBindings` in the local cluster: +Considerations when attempting to reduce `RoleBindings` in the local cluster: +* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role, preferring others wherever applicable +* If [external authentication](../../../pages-for-subheaders/authentication-config) is configured, use groups to assign roles preferably * Only add users to clusters and projects when necessary * Remove clusters and projects when they are no longer needed * Only use custom roles if necessary From 3498a2f3603ddc5df462b58f9f33771e2e303698 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 12:30:52 +0200 Subject: [PATCH 17/47] tips-for-scaling-rancher: clarify language on etcd recommendations Signed-off-by: Silvio Moioli --- .../rancher-server/tips-for-scaling-rancher.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) 
diff --git a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md index 662ba4c7dcc..0eb3fec7864 100644 --- a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md @@ -62,6 +62,8 @@ A large component of performance is the local cluster and how it was configured. Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes. ### Optimizing etcd -The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding etcd performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks). +The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. It is thus recommended that etcd runs on dedicated nodes with SSDs with high IOPS and a fast network setup. For more information regarding etcd performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks) page. 
-Theoretically, the more nodes in an etcd cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. It can also be inferred that etcd performance will be inversely affected by distance between nodes as that will slow down network communication. +As adding more nodes in an etcd cluster will make operations slower, for best performance it is recommended to run etcd on exactly 3 nodes. This may be counter-intuitive to common scaling approaches, and it is due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). + +etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication, so it is recommended that etcd nodes are all colocated together with Rancher nodes. From 59e94a05926aa587a0919cbaa1d50354b55fd8e2 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 12:39:35 +0200 Subject: [PATCH 18/47] tune-etcd-for-large-installs: remove reference to 15 clusters being large Signed-off-by: Silvio Moioli --- .../advanced-user-guides/tune-etcd-for-large-installs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md index e024f1dd779..a61a83bd5ac 100644 --- a/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md +++ b/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md @@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations -When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. 
The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. +When running larger Rancher installations it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers. 
From fa3e39189a4d1d9441bb6166215a3362bb6afd0b Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 12:49:17 +0200 Subject: [PATCH 19/47] tips-for-scaling-rancher: recommend unmanaged Kubernetes distros Signed-off-by: Silvio Moioli --- .../rancher-server/tips-for-scaling-rancher.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md index 0eb3fec7864..cc867ffaa0a 100644 --- a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md @@ -58,6 +58,11 @@ In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY ## Optimizations Outside of Rancher A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess. +### Manage local cluster nodes directly, use RKE2 as the Kubernetes distribution of choice +Managed Kubernetes services make it easier to deploy and run Kubernetes clusters, but they also typically limit control on configuration and insights on individual nodes and services. As Rancher can be particularly demanding on the local cluster, especially in large scale scenarios, it is recommended to have full control of the nodes and their configuration. + +Among Rancher Kubernetes distributions RKE2 is recommended for all Rancher large scale use cases. + ### Keeping Kubernetes Versions Up to Date Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes. 
From 1006603d73907f344fe92932478f0616e8051564 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 13:34:43 +0200 Subject: [PATCH 20/47] tips-for-scaling-rancher: rename to tuning-and-best-practices-for-rancher-at-scale Signed-off-by: Silvio Moioli --- docs/pages-for-subheaders/installation-requirements.md | 8 ++++---- ... => tuning-and-best-practices-for-rancher-at-scale.md} | 6 ++++-- docusaurus.config.js | 4 ++++ sidebars.js | 2 +- 4 files changed, 13 insertions(+), 7 deletions(-) rename docs/reference-guides/best-practices/rancher-server/{tips-for-scaling-rancher.md => tuning-and-best-practices-for-rancher-at-scale.md} (96%) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index 4eb138d46ab..a20374c46eb 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -108,7 +108,7 @@ Please note that a highly available setup with at least 3 nodes is required for | Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | | Larger (†) | (†) | (†) | (†) | (†) | -(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md). +(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). (†): Depending on various factors, larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a customized evaluation. 
@@ -126,7 +126,7 @@ Please note that a highly available setup with at least 3 nodes is required for | Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS | | Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS | -(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md). +(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). (†): External Database Host refers to the optional possibility of hosting [k3s cluster data store on an external dedicated host](https://docs.k3s.io/datastore). Exact requirements will depend on the chosen data store, this table is a guideline only. @@ -146,7 +146,7 @@ These requirements apply hosted Kubernetes clusters such as EKS, AKS, or GKE. Th | Medium | Up to 300 | Up to 3000 | 8 | 32 GB | | Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | -(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md). +(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). ### RKE @@ -160,7 +160,7 @@ Please note that a highly available setup with at least 3 nodes is required for | Medium | Up to 300 | Up to 3000 | 8 | 32 GB | | Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | -(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md). +(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). 
Refer to RKE documentation for more detailed information on [RKE general requirements](https://rke.docs.rancher.com/os). diff --git a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md similarity index 96% rename from docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md rename to docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index cc867ffaa0a..6d35c512935 100644 --- a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -1,11 +1,13 @@ --- -title: Tips for Scaling Rancher +title: Tuning and Best Practices for Rancher at Scale --- - + +:docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md + This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups. 
## General Tips on Optimizing Rancher's Performance diff --git a/docusaurus.config.js b/docusaurus.config.js index 3ab17e54bc1..7789a046093 100644 --- a/docusaurus.config.js +++ b/docusaurus.config.js @@ -1166,6 +1166,10 @@ module.exports = { { to: "/v2.7/reference-guides/rancher-security/hardening-guides/rke2-hardening-guide/rke2-self-assessment-guide-with-cis-v1.7-k8s-v1.25", from: "/v2.7/reference-guides/rancher-security/hardening-guides/rke2-hardening-guide/rke2-self-assessment-guide-with-cis-v1.23-k8s-v1.25" + }, + { + to: "/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale", + from: "/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher" } ], }, diff --git a/sidebars.js b/sidebars.js index d1126022a3b..a0e4dcfcfcf 100644 --- a/sidebars.js +++ b/sidebars.js @@ -829,7 +829,7 @@ const sidebars = { "reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere", "reference-guides/best-practices/rancher-server/rancher-deployment-strategy", "reference-guides/best-practices/rancher-server/tips-for-running-rancher", - "reference-guides/best-practices/rancher-server/tips-for-scaling-rancher" + "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale" ] }, { From 234ae7be4d34f53b2d3ccd5c9c90fe8af3370694 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 14:07:45 +0200 Subject: [PATCH 21/47] tuning-and-best-practices-for-rancher-at-scale: make language more assertive Signed-off-by: Silvio Moioli --- ...and-best-practices-for-rancher-at-scale.md | 38 ++++++++++--------- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index 6d35c512935..ba399ebece8 100644 --- 
a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md +++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -8,24 +8,24 @@ title: Tuning and Best Practices for Rancher at Scale -This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups. +This guide describes best practices and tuning approaches to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps that can be taken to minimize the load put on Rancher, as well as optimize Rancher's ability to manage larger infrastructures. -## General Tips on Optimizing Rancher's Performance -* It is advisable to keep Rancher up to date with patch releases. Performance improvements and bug fixes are made throughout the life of a minor release. You can review the release notes to help inform your own decisions on whether an upgrade is necessary but we recommend keeping yourself up to date in most cases. +## General Guidelines on Optimizing Rancher's Performance +* Keep Rancher up to date with patch releases. Performance improvements and bug fixes are made continuously, and the latest release incorporates the largest set of performance-related development, experience and feedback from many users. -* Performance will be negatively impacted by increased latency between Rancher's infrastructure and a downstream cluster's infrastructure (eg. geographic distance). If a user or organization requires clusters/nodes all over the world or spread across many regions, it is best to use multiple Rancher installations. 
+* Please always try to scale up gradually, monitoring and observing any change in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms. -* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms. +* Reduce network latency between Rancher's cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if a user or organization requires clusters/nodes all over the world or spread across many regions, consider multiple Rancher installations. ## Minimizing Load on the local cluster -The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. +One typical bottleneck when scaling up Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. ### Managing Your Object Counts -etcd eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is `RoleBindings`. +etcd is the backing database for Kubernetes and for Rancher, and is known to eventually encounter limitations to the number of a single Kubernetes resource type it can store.
Exact limits vary and depend on a number of factors; however, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60 thousand, and often that type is `RoleBindings`. -`RoleBindings` are created in the local cluster as a side effect of many operations. +This is typical in Rancher, as `RoleBindings` are created in the local cluster as a side effect of many operations. -Considerations when attempting to reduce `RoleBindings` in the local cluster: +It is recommended to attempt reducing `RoleBindings` in the local cluster in the following ways: * Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role, preferring others wherever applicable * If [external authentication](../../../pages-for-subheaders/authentication-config) is configured, use groups to assign roles preferably * Only add users to clusters and projects when necessary @@ -37,14 +37,14 @@ Considerations when attempting to reduce `RoleBindings` in the local cluster: * Experiment to see if creating new projects or creating new clusters manifests in fewer `RoleBindings` for your specific use case. ### Using New Apps Over Legacy Apps -There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster. +There are two app Kubernetes resources that Rancher uses: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, `apps.projects.cattle.io`, were first introduced with the former UI (Cluster Manager) and are now outdated.
New apps, `apps.catalog.cattle.io`, are found in the current UI (Cluster Explorer) for their respective cluster. New apps are preferable because their data resides in downstream clusters, freeing up resources in the local cluster. -We recommend removing apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only. +It is recommended to remove any remaining legacy apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only. ### Using the Authorized Cluster Endpoint (ACE) -There is an _Authorized Cluster Endpoint_ option for Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to kubeconfigs generated for the cluster that uses a direct endpoint to the cluster and bypasses Rancher. However, it is not enough to only enable this option. The user of the Kubeconfig needs to use `kubectl use-context ` in order to start using it. +An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) option exists to access the Kubernetes API of Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled, this adds a context to kubeconfig files generated for the cluster that uses a direct endpoint to the cluster, thereby bypassing Rancher. That reduces load on Rancher for use cases where unmediated API access is acceptable or preferable. -Without using ACE, all kubeconfig requests first route through Rancher. +Note that enabling the option is not enough on its own: to take advantage of ACE, users of the kubeconfig file need to run the `kubectl config use-context ` command to switch to that context. ### Experimental: Option to Reduce Event Handler Executions The bulk of Rancher's logic occurs on event handlers.
These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help. @@ -58,17 +58,21 @@ The value for the environment variable can be a comma separated list of the foll In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`. ## Optimizations Outside of Rancher -A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess. +Important influencing factors in Rancher performance are its underlying cluster's own performance and its configuration. The local cluster, if misconfigured, can indeed introduce a bottleneck Rancher software has no chance to resolve. ### Manage local cluster nodes directly, use RKE2 as the Kubernetes distribution of choice -Managed Kubernetes services make it easier to deploy and run Kubernetes clusters, but they also typically limit control on configuration and insights on individual nodes and services. As Rancher can be particularly demanding on the local cluster, especially in large scale scenarios, it is recommended to have full control of the nodes and their configuration. 
+As Rancher can be particularly demanding on the local cluster, especially in large scale scenarios, it is recommended to have full control of its configuration and its nodes. For example, when Rancher nodes experience high resource usage, standard Linux troubleshooting techniques and tools are recommended to identify whether Rancher, Kubernetes components, or OS components are the root cause of the excess resource consumption. -Among Rancher Kubernetes distributions RKE2 is recommended for all Rancher large scale use cases. +Consequently, although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for Rancher's local cluster in high scale scenarios, because they typically limit control on configuration and insights on individual nodes and services. + +When choosing a Kubernetes distribution, it is recommended to use RKE2 for all Rancher large scale use cases. ### Keeping Kubernetes Versions Up to Date -Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes. +Similar to Rancher versions, it is recommended to keep the local Kubernetes cluster up to date. That will ensure that your cluster contains any available performance enhancements and bug fixes. ### Optimizing etcd +etcd is the backing database for Kubernetes and for Rancher, therefore it plays a very important role in Rancher performance. + The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. It is thus recommended that etcd runs on dedicated nodes with SSDs with high IOPS and a fast network setup. 
For more information regarding etcd performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks) page. As adding more nodes in an etcd cluster will make operations slower, for best performance it is recommended to run etcd on exactly 3 nodes. This may be counter-intuitive to common scaling approaches, and it is due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). From 16651823dd8c5ae15d7804ce4db0eedd8a269b49 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 22 Sep 2023 14:14:15 +0200 Subject: [PATCH 22/47] tuning-and-best-practices-for-rancher-at-scale: add more detailed RBAC considerations Signed-off-by: Silvio Moioli --- .../tuning-and-best-practices-for-rancher-at-scale.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index ba399ebece8..c27a3508741 100644 --- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md +++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -33,9 +33,17 @@ It is recommended to attempt reducing `RoleBindings` in the local cluster in the * Only use custom roles if necessary * Use as few rules as possible in custom roles * Consider whether adding a role to a user is redundant -* Consider that using less, but more powerful, clusters may be more efficient +* Consider using fewer, but more powerful, clusters +* Take into account that
Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Whenever applicable, try to minimize configurations that give access to "all but one aspect" (cluster, project, namespace...) as that will result in the creation of a high number of `RoleBindings` * Experiment to see if creating new projects or creating new clusters manifests in fewer `RoleBindings` for your specific use case. +### RoleBinding count estimation + +The exact number of `RoleBindings` a given configuration will create depends on many factors and is complicated to calculate. However, a first estimate is possible based on the following considerations: +* As a minimum estimate, consider the formula `32C + U + 2UaC + 8P + 5Pa`, where `C` is the cluster count, `U` is the user count, `Ua` is the average count of users with a membership on a cluster, `P` is the project count, and `Pa` is the average number of users with a membership on a project +* The Restricted Admin role follows a different formula, as every user with Restricted Admin role will result in at least `7C + 2P + 2` additional `RoleBindings` +* The number of `RoleBindings` generally increases linearly with cluster count, project count, and user count + ### Using New Apps Over Legacy Apps There are two app Kubernetes resources that Rancher uses: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, `apps.projects.cattle.io`, were first introduced with the former UI (Cluster Manager) and are now outdated. New apps, `apps.catalog.cattle.io`, are found in the current UI (Cluster Explorer) for their respective cluster. New apps are preferable because their data resides in downstream clusters, freeing up resources in the local cluster.
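
The minimum estimate added by this patch can be worked through with a small script. The following is only a sketch, not part of Rancher: the function name and parameter names are hypothetical, and it assumes the Restricted Admin term is simply added on top of the base formula, once per Restricted Admin user. The 60 thousand figure is the per-type object count this guide associates with performance issues.

```python
# Illustrative helper only: names are hypothetical and not part of any Rancher API.
ROLEBINDING_WARNING_THRESHOLD = 60_000  # per-type object count cited in this guide


def estimate_min_rolebindings(clusters, users, avg_users_per_cluster,
                              projects, avg_users_per_project,
                              restricted_admins=0):
    """Lower-bound estimate of RoleBindings in the local cluster.

    Base formula from the guide: 32C + U + 2*Ua*C + 8P + 5*Pa.
    Assumption: each Restricted Admin user adds 7C + 2P + 2 on top of it.
    """
    base = (32 * clusters
            + users
            + 2 * avg_users_per_cluster * clusters
            + 8 * projects
            + 5 * avg_users_per_project)
    restricted = restricted_admins * (7 * clusters + 2 * projects + 2)
    return base + restricted


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    estimate = estimate_min_rolebindings(
        clusters=100, users=500, avg_users_per_cluster=20,
        projects=300, avg_users_per_project=10, restricted_admins=5)
    print(f"estimated minimum RoleBindings: {estimate}")
    if estimate > ROLEBINDING_WARNING_THRESHOLD:
        print("estimate exceeds the 60 thousand guideline")
```

With these made-up inputs, the base formula gives 3200 + 500 + 4000 + 2400 + 50 = 10150, and five Restricted Admin users add 5 × 1302 = 6510, for a total of 16660. Because the formula is a minimum, real counts may be noticeably higher.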
From c27dd902c766d1adb62404310be6133d0417917d Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 3 Oct 2023 09:56:20 +0200 Subject: [PATCH 23/47] tune-etcd-for-large-installs: add backlink with definition of 'large' Signed-off-by: Silvio Moioli --- .../advanced-user-guides/tune-etcd-for-large-installs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md index a61a83bd5ac..fcf9566142e 100644 --- a/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md +++ b/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md @@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations -When running larger Rancher installations it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. +When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. 
deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers. From 8e675df380dec1d6162b9eec86d17852a681e03b Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 3 Oct 2023 10:06:15 +0200 Subject: [PATCH 24/47] Apply suggestions from code review - copy editing Co-authored-by: Marty Hernandez Avedon --- .../installation-requirements.md | 53 ++++++++++--------- 1 file changed, 27 insertions(+), 26 deletions(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index a20374c46eb..c4a13aa952d 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -41,7 +41,7 @@ If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../h ### RKE2 Specific Requirements -For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs. +RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs. For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions). @@ -59,23 +59,24 @@ If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these For the container runtime, RKE should work with any modern Docker version. 
-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md) +For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md). ## Hardware Requirements -The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed based on the size of the infrastructure it manages. +The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure. ### Premise Rancher's hardware footprint depends on a number of factors, including: - - size of managed infrastructure (eg. node count, cluster count) - - complexity of the desired access control rules (eg. `RoleBinding` object count) - - number of workloads (eg. Kubernetes deployments, Fleet deployments) - - usage patterns (eg. subset of functionality actively used, frequency of use, number of concurrently active users) -Because of the high number of influencing factors and their variability over time, requirements in this document have to be interpreted as reasonable starting points that worked acceptably well in most observed use cases so far. Nevertheless, particular use cases may differ - with higher or lower requirements. For enquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. + - Size of the managed infrastructure (eg. node count, cluster count). + - Complexity of the desired access control rules (eg. `RoleBinding` object count). + - Number of workloads (eg. Kubernetes deployments, Fleet deployments). + - Usage patterns (eg. subset of functionality actively used, frequency of use, number of concurrent users). 
-In particular, requirements on this page are subject to "typical use" conditions, which include: +Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have higher or lower requirements. For enquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. + +In particular, requirements on this page are subject to typical use assumptions, which include: - total count of Kubernetes resources under 60 thousand per type - up to 120 pods per node - up to 200 CRDs in the upstream (local) cluster @@ -86,20 +87,20 @@ Note that visualization in the UI might be impacted for users with visibility on :::note Evolution: -Rancher's code base evolves, use cases change, and the body of accumulated Rancher experience grows every day. +Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day. -Because of that, this document's recommendations are subject to change over time with the aim of providing increasingly more concrete and accurate guidelines. +Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete. -If you find your Rancher deployment does not comply with recommendations in this document, while it used to comply with earlier revisions, you can [contact Rancher](https://rancher.com/contact/) for an ad-hoc evaluation. +If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation. ::: ### RKE2 Kubernetes -Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below. 
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). -Please note that a highly available setup with at least 3 nodes is required for all production usages. +Please note that a highly available setup with at least three nodes is required for production. | Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | |-----------------------------|-----------|------------|-------|-------| @@ -110,15 +111,15 @@ Please note that a highly available setup with at least 3 nodes is required for (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). -(†): Depending on various factors, larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a customized evaluation. +(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation. Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements). ### K3s Kubernetes -Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below. +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). -Please note that a highly available setup with at least 3 nodes is required for all production usages. +Please note that a highly available setup with at least three nodes is required for production. 
| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | External Database Host (†) | |-----------------------------|-----------|------------|-------|-------|----------------------------| @@ -128,17 +129,17 @@ Please note that a highly available setup with at least 3 nodes is required for (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). -(†): External Database Host refers to the optional possibility of hosting [k3s cluster data store on an external dedicated host](https://docs.k3s.io/datastore). Exact requirements will depend on the chosen data store, this table is a guideline only. +(†): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store. -Refer to k3s documentation for more detailed information on [k3s general requirements](https://docs.k3s.io/installation/requirements). +Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements). ### Hosted Kubernetes -Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below. +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). -Please note that a highly available setup with at least 3 nodes is required for all production usages. +Please note that a highly available setup with at least three nodes is required for production. -These requirements apply hosted Kubernetes clusters such as EKS, AKS, or GKE. They do not apply to Rancher SaaS solutions such as Rancher Prime Hosted](https://www.rancher.com/products/rancher).
+These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher). | Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | |-----------------------------|-----------|------------|-------|-------| @@ -150,9 +151,9 @@ These requirements apply hosted Kubernetes clusters such as EKS, AKS, or GKE. Th ### RKE -Minimum CPU and memory requirements for each individual node in the [Kubernetes cluster Rancher is installed in](install-upgrade-on-a-kubernetes-cluster.md) are listed in the table below. +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). -Please note that a highly available setup with at least 3 nodes is required for all production usages. +Please note that a highly available setup with at least three nodes is required for production. | Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | |-----------------------------|-----------|------------|-------|-------| @@ -162,11 +163,11 @@ Please note that a highly available setup with at least 3 nodes is required for (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). -Refer to RKE documentation for more detailed information on [RKE general requirements](https://rke.docs.rancher.com/os). +Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os). ### Docker -Minimum CPU and memory requirements for a [single node installation of Rancher](rancher-on-a-single-node-with-docker.md) are listed in the table below. 
+The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md). Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments. From d37a50c041e7379a5ec746cc1c6908c00ede1378 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 3 Oct 2023 10:30:41 +0200 Subject: [PATCH 25/47] installation-requirements: use more specific title Signed-off-by: Silvio Moioli --- docs/pages-for-subheaders/installation-requirements.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index c4a13aa952d..31fd20e5b24 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -65,7 +65,7 @@ For more information, see [Installing Docker](../getting-started/installation-an The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure. 
-### Premise +### Practical Considerations Rancher's hardware footprint depends on a number of factors, including: From e57f628811cc8598ec5aa9e45b802706ae16b5a7 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 3 Oct 2023 10:32:39 +0200 Subject: [PATCH 26/47] installation-requirements: make typical use assumptions more concise Co-authored-by: Marty Hernandez Avedon --- .../installation-requirements.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index 31fd20e5b24..6312b02bdec 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -77,13 +77,13 @@ Rancher's hardware footprint depends on a number of factors, including: Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have higher or lower requirements. For enquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. In particular, requirements on this page are subject to typical use assumptions, which include: - - total count of Kubernetes resources under 60 thousand per type - - up to 120 pods per node - - up to 200 CRDs in the upstream (local) cluster - - up to 100 CRDs in downstream clusters - - up to 50 Fleet deployments -Higher numbers are possible but requirements might be higher. -Note that visualization in the UI might be impacted for users with visibility on all resources in a type above 20 thousand total items. + - Under 60 thousand total Kubernetes resources, per type. + - Up to 120 pods per node. + - Up to 200 CRDs in the upstream (local) cluster. + - Up to 100 CRDs in downstream clusters. + - Up to 50 Fleet deployments. 
+Higher numbers are possible but requirements might be higher. If you have more than 20 thousand resources of the same type, you might not be able to see all resources through the Rancher UI. :::note Evolution: From 7bb0daeb060e68d63aa425db14e25d6579eac9d8 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 3 Oct 2023 10:34:22 +0200 Subject: [PATCH 27/47] installation-requirements: fix up expectation in UI loading time Signed-off-by: Silvio Moioli --- docs/pages-for-subheaders/installation-requirements.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index 6312b02bdec..fd2a2456af7 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -83,7 +83,7 @@ In particular, requirements on this page are subject to typical use assumptions, - Up to 100 CRDs in downstream clusters. - Up to 50 Fleet deployments. -Higher numbers are possible but requirements might be higher. If you have more than 20 thousand resources of the same type, you might not be able to see all resources through the Rancher UI. +Higher numbers are possible but requirements might be higher. If you have more than 20 thousand resources of the same type, loading the whole list through the Rancher UI might take several seconds.
:::note Evolution: From 548e4de67ae9e4146ba9fee4c4eab0cf10ff8897 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 3 Oct 2023 10:42:18 +0200 Subject: [PATCH 28/47] installation-requirements: reformat tables Signed-off-by: Silvio Moioli --- .../installation-requirements.md | 50 +++++++++---------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index fd2a2456af7..4cd83193222 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -102,12 +102,12 @@ The following table lists minimum CPU and memory requirements for each node in t Please note that a highly available setup with at least three nodes is required for production. -| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | -|-----------------------------|-----------|------------|-------|-------| -| Small | Up to 150 | Up to 1500 | 4 | 16 GB | -| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | -| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | -| Larger (†) | (†) | (†) | (†) | (†) | +| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | +| Larger (†) | (†) | (†) | (†) | (†) | (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). @@ -121,11 +121,11 @@ The following table lists minimum CPU and memory requirements for each node in t Please note that a highly available setup with at least three nodes is required for production. 
-| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | External Database Host (†) | -|-----------------------------|-----------|------------|-------|-------|----------------------------| -| Small | Up to 150 | Up to 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS | -| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS | -| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS | +| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM | External Database Host (†) | +|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------| +| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS | +| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS | +| Large (*) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS | (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). @@ -141,11 +141,11 @@ Please note that a highly available setup with at least three nodes is required These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher). 
-| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | -|-----------------------------|-----------|------------|-------|-------| -| Small | Up to 150 | Up to 1500 | 4 | 16 GB | -| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | -| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | +| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). @@ -155,11 +155,11 @@ The following table lists minimum CPU and memory requirements for each node in t Please note that a highly available setup with at least three nodes is required for production. -| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | -|-----------------------------|-----------|------------|-------|-------| -| Small | Up to 150 | Up to 1500 | 4 | 16 GB | -| Medium | Up to 300 | Up to 3000 | 8 | 32 GB | -| Large (*) | Up to 500 | Up to 5000 | 16 | 64 GB | +| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | (*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). 
@@ -171,10 +171,10 @@ The following table lists minimum CPU and memory requirements for a [single Dock Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments. -| Managed Infrastructure Size | Clusters | Nodes | vCPUs | RAM | -|-----------------------------|----------|-----------|-------|------| -| Small | Up to 5 | Up to 50 | 1 | 4 GB | -| Medium | Up to 15 | Up to 200 | 2 | 8 GB | +| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|------| +| Small | 5 | 50 | 1 | 4 GB | +| Medium | 15 | 200 | 2 | 8 GB | ## Ingress From 650579d39163d4a202f06a629858cf90a54978df Mon Sep 17 00:00:00 2001 From: Marty Hernandez Avedon Date: Tue, 3 Oct 2023 15:49:12 -0400 Subject: [PATCH 29/47] Apply suggestions from code review --- ...and-best-practices-for-rancher-at-scale.md | 74 ++++++++++--------- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index c27a3508741..1740d676aa8 100644 --- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md +++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -4,52 +4,56 @@ title: Tuning and Best Practices for Rancher at Scale -:docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md -This guide describes best practices and tuning approaches to scale Rancher setups, and associated challenges with doing so. 
As systems grow performance will naturally reduce, but there are steps that can be taken to minimize the load put on Rancher, as well as optimize Rancher's ability to manage larger infrastructures. +This guide describes the best practices and tuning approaches to scale Rancher setups, and the associated challenges with doing so. As systems grow, performance will naturally degrade, but there are steps that can minimize the load put on Rancher, and optimize Rancher's ability to manage larger infrastructures. + +## Optimizing Rancher Performance -## General Guidelines on Optimizing Rancher's Performance * Keep Rancher up to date with patch releases. Performance improvements and bug fixes are made continuously, and the latest release incorporates the largest set ofperformance related development, experience and feedback from many users. -* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms. +* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause. -* Reduce network latency between Rancher's cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if a user or organization requires clusters/nodes all over the world or spread across many regions, consider multiple Rancher installations. +* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if you require clusters or nodes spread across the world, consider multiple Rancher installations.
+ +## Minimizing Load on the Upstream Cluster -## Minimizing Load on the local cluster One typical bottleneck when scaling up Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. ### Managing Your Object Counts -etcd is the backing database for Kubernetes and for Rancher, and is known to eventually encounters limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors, however experience indicates performance issues frequently arise once a single resource type's object count exceeds 60 thousand, and often that type is `RoleBindings`. -This is typical in Rancher, as `RoleBindings` are created in the local cluster as a side effect of many operations. +Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60 thousand. Often that type is `RoleBinding`. 
-It is recommended to attempt reducing `RoleBindings` in the local cluster in the following ways: -* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role, preferring others wherever applicable -* If [external authentication](../../../pages-for-subheaders/authentication-config) is configured, use groups to assign roles preferably -* Only add users to clusters and projects when necessary -* Remove clusters and projects when they are no longer needed -* Only use custom roles if necessary -* Use as few rules as possible in custom roles -* Consider whether adding a role to a user is redundant -* Consider using less, but more powerful, clusters -* Keep into account that Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Whenever applicable, try to minimize configurations that gives access to "all but one aspect" (cluster, project, namespace...) as that will result in the creation of a high number of `RoleBindings` -* Experiment to see if creating new projects or creating new clusters manifests in fewer `RoleBindings` for your specific use case. +This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect. -### RoleBinding count estimation +You can reduce the number of `RoleBindings` in the upstream cluster in the following ways: +* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible. +* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles. +* Only add users to clusters and projects when necessary. 
+* Remove clusters and projects when they are no longer needed. +* Only use custom roles if necessary. +* Use as few rules as possible in custom roles. +* Consider whether adding a role to a user is redundant. +* Consider using fewer, but more powerful, clusters. +* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects. +* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case. + +### RoleBinding Count Estimation Predicting exactly the number of `RoleBindings` a given configuration will create depends on many factors and is complicated to calculate. However, it is possible to give a first estimation according to considerations below: * As a minimum estimation consider the formula `32C + U + 2UaC + 8P + 5Pa`, where `C` is the cluster count, `U` is the user count, `Ua` is the average count of users with a membership on a cluster, `P` is the project count, and `Pa` is the average number of users with a membership on a project -* The Restricted Admin role follows a different formula, as every user with Restricted Admin role will result in at least `7C + 2P + 2` additional `RoleBindings` -* The number of `RoleBindings` generally increases linearly with cluster count, project count, and user count +* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects. +* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users. ### Using New Apps Over Legacy Apps + There are two app Kubernetes resources that Rancher uses: `apps.projects.cattle.io` and `apps.cattle.cattle.io`.
Legacy apps, `apps.projects.cattle.io`, were first introduced with the former UI (Cluster Manager) and are now outdated. New apps, `apps.catalog.cattle.io`, are found in the current UI (Cluster Explorer) for their respective cluster. New apps are preferable because their data resides in downstream clusters, freeing up resources in the local cluster. It is recommended to remove any remaining legacy apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only. ### Using the Authorized Cluster Endpoint (ACE) + An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) option exist to access the Kubernetes API of Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to generated kubeconfig files generated for the cluster that uses a direct endpoint to the cluster, thereby bypassing Rancher. That reduces load on Rancher for use cases where unmediated API access is acceptable or preferable. Note that, in order for `kubeconfig` to take advantage of ACE, users need to issue the `kubectl use-context ` command in order to start using it. @@ -57,32 +61,36 @@ Note that, in order for `kubeconfig` to take advantage of ACE, users need to iss ### Experimental: Option to Reduce Event Handler Executions The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. 
If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help. -The value for the environment variable can be a comma separated list of the following options. The values refer to types of controllers (the structures that contain and run handlers) and their handlers. Adding the controller types to the variable will disable that set of controllers from running their handlers as part of cache resyncing. +The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable disables that set of controllers from running their handlers as part of cache resyncing. * `mgmt` refers to management controllers which only run on one Rancher node. -* `user` refers to user controllers which run for every cluster. Some of these are ran on the same node as management controllers, while other run in the downstream cluster. This will option targets the former. -* `scaled` refers to scaled controllers which run on every Rancher node. This is not recommended to be set due to the critical functionality the scaled handlers are responsible for. +* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former. +* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability. In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`. 
## Optimizations Outside of Rancher -Important influencing factors in Rancher performance are its underlying cluster's own performance and its configuration. The local cluster, if misconfigured, can indeed introduce a bottleneck Rancher software has no chance to resolve. -### Manage local cluster nodes directly, use RKE2 as the Kubernetes distribution of choice +Important influencing factors are the underlying cluster's own performance and configuration. The upstream cluster, if misconfigured, can introduce a bottleneck Rancher software has no chance to resolve. + +### Manage Upstream Cluster Nodes Directly with RKE2 + As Rancher can be particularly demanding on the local cluster, especially in large scale scenarios, it is recommended to have full control of its configuration and its nodes. For example, when Rancher nodes experience high resource usage, standard Linux troubleshooting techniques and tools are recommended to identify whether Rancher, Kubernetes components, or OS components are the root cause of the excess resource consumption. -Consequently, although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for Rancher's local cluster in high scale scenarios, because they typically limit control on configuration and insights on individual nodes and services. +Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services. -When choosing a Kubernetes distribution, it is recommended to use RKE2 for all Rancher large scale use cases. +Use RKE2 for large scale use cases. ### Keeping Kubernetes Versions Up to Date -Similar to Rancher versions, it is recommended to keep the local Kubernetes cluster up to date. 
That will ensure that your cluster contains any available performance enhancements and bug fixes. + +You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes. ### Optimizing etcd -etcd is the backing database for Kubernetes and for Rancher, therefore it plays a very important role in Rancher performance. + +Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance. The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. It is thus recommended that etcd runs on dedicated nodes with SSDs with high IOPS and a fast network setup. For more information regarding etcd performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks) page. -As adding more nodes in an etcd cluster will make operations slower, for best performance it is recommended to run etcd on exactly 3 nodes. This may be counter-intuitive to common scaling approaches, and it is due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). +It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). -etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication, so it is recommended that etcd nodes are all colocated together with Rancher nodes. 
+Etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication. Etcd nodes should be located together with Rancher nodes. From 740202cd2b96bb0322c566499a19c26f743d6c38 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Wed, 4 Oct 2023 09:11:17 +0200 Subject: [PATCH 30/47] Apply suggestions from code review Co-authored-by: Marty Hernandez Avedon --- ...and-best-practices-for-rancher-at-scale.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index 1740d676aa8..df7ef84a935 100644 --- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md +++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -11,7 +11,7 @@ This guide describes the best practices and tuning approaches to scale Rancher s ## Optimizing Rancher Performance -* Keep Rancher up to date with patch releases. Performance improvements and bug fixes are made continuously, and the latest release incorporates the largest set ofperformance related development, experience and feedback from many users. +* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback. * Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause.
@@ -19,7 +19,7 @@ This guide describes the best practices and tuning approaches to scale Rancher s ## Minimizing Load on the Upstream Cluster -One typical bottleneck when scaling up Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. +When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster. ### Managing Your Object Counts @@ -41,25 +41,25 @@ You can reduce the number of `RoleBindings` in the upstream cluster in the follo ### RoleBinding Count Estimation -Predicting exactly the number of `RoleBindings` a given configuration will create depends on many factors and is complicated to calculate. However, it is possible to give a first estimation according to considerations below: -* As a minimum estimation consider the formula `32C + U + 2UaC + 8P + 5Pa`, where `C` is the cluster count, `U` is the user count, `Ua` is the average count of users with a membership on a cluster, `P` is the project count, and `Pa` is the average number of users with a membership on a project +Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate: +* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`, where `C` is the total number of clusters, `U` is the total number of users, `Ua` is the average number of users with a membership on a cluster, `P` is the total number of projects, and `Pa` is the average number of users with a membership on a project. 
* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects. * The number of `RoleBindings` increases linearly with the number of clusters, projects, and users. ### Using New Apps Over Legacy Apps -There are two app Kubernetes resources that Rancher uses: `apps.projects.cattle.io` and `apps.cattle.cattle.io`. Legacy apps, `apps.projects.cattle.io`, were first introduced with the former UI (Cluster Manager) and are now outdated. New apps, `apps.catalog.cattle.io`, are found in the current UI (Cluster Explorer) for their respective cluster. New apps are preferable because their data resides in downstream clusters, freeing up resources in the local cluster. +Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. `apps.catalog.cattle.io` apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster. -It is recommended to remove any remaining legacy apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only. +You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI. ### Using the Authorized Cluster Endpoint (ACE) -An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) option exist to access the Kubernetes API of Rancher provisioned RKE1, RKE2, and K3s clusters.
When enabled this adds a context to generated kubeconfig files generated for the cluster that uses a direct endpoint to the cluster, thereby bypassing Rancher. That reduces load on Rancher for use cases where unmediated API access is acceptable or preferable. +An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. Note that, in order for `kubeconfig` to take advantage of ACE, users need to issue the `kubectl config use-context ` command in order to start using it. ### Experimental: Option to Reduce Event Handler Executions -The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help. +The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object.
However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help. The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable disables that set of controllers from running their handlers as part of cache resyncing. @@ -75,7 +75,7 @@ Important influencing factors are the underlying cluster's own performance and c ### Manage Upstream Cluster Nodes Directly with RKE2 -As Rancher can be particularly demanding on the local cluster, especially in large scale scenarios, it is recommended to have full control of its configuration and its nodes. For example, when Rancher nodes experience high resource usage, standard Linux troubleshooting techniques and tools are recommended to identify whether Rancher, Kubernetes components, or OS components are the root cause of the excess resource consumption. +As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This can aid in distinguishing between whether Rancher, Kubernetes, or operating system components are causing issues. Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services. @@ -89,7 +89,7 @@ You should keep the local Kubernetes cluster up to date. This will ensure that y Etcd is the backend database for Kubernetes and for Rancher. 
It plays a very important role in Rancher performance. -The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. It is thus recommended that etcd runs on dedicated nodes with SSDs with high IOPS and a fast network setup. For more information regarding etcd performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks) page. +The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed. Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks). It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). 
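
Note on the `RoleBinding` estimate reworded in this patch: the formula can be turned into a quick calculation. The sketch below is only an illustration, not part of Rancher. The function and parameter names are hypothetical, and it assumes `2UaC` means `2 * Ua * C` and `5Pa` means `5 * Pa`, which is one reading of the notation in the guide. The 60,000 figure is the per-type object count the guide cites as the point where performance issues frequently arise.

```python
# Hypothetical helper: a lower-bound RoleBinding estimate based on the
# guide's formula 32C + U + 2UaC + 8P + 5Pa, plus 7C + 2P + 2 for each
# user holding the Restricted Admin role.
OBJECT_COUNT_WARNING = 60_000  # threshold mentioned in the guide


def estimate_rolebindings(clusters, users, avg_users_per_cluster,
                          projects, avg_users_per_project,
                          restricted_admins=0):
    """Return a minimum estimate of RoleBinding objects in the upstream cluster."""
    base = (32 * clusters
            + users
            + 2 * avg_users_per_cluster * clusters  # assumed reading of 2UaC
            + 8 * projects
            + 5 * avg_users_per_project)            # assumed reading of 5Pa
    per_restricted_admin = 7 * clusters + 2 * projects + 2
    return base + restricted_admins * per_restricted_admin


if __name__ == "__main__":
    small = estimate_rolebindings(10, 100, 20, 30, 5, restricted_admins=3)
    large = estimate_rolebindings(500, 2000, 50, 1500, 10)
    for label, count in (("small", small), ("large", large)):
        flag = "above" if count > OBJECT_COUNT_WARNING else "below"
        print(f"{label}: at least {count} RoleBindings ({flag} the 60k mark)")
```

Because the result is a minimum, a configuration that lands near the threshold is probably already past it in practice.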
From 34ba2c5bd9f7e9efdd48364a2f65815e13ae0cb0 Mon Sep 17 00:00:00 2001 From: Marty Hernandez Avedon Date: Wed, 4 Oct 2023 15:21:57 -0400 Subject: [PATCH 31/47] Apply suggestions from code review Co-authored-by: Silvio Moioli --- docs/pages-for-subheaders/installation-requirements.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index 4cd83193222..4ebee458c31 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -109,7 +109,7 @@ Please note that a highly available setup with at least three nodes is required | Large (*) | 500 | 5000 | 16 | 64 GB | | Larger (†) | (†) | (†) | (†) | (†) | -(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md). +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. (†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation. 
From 6b5aa5edfca58c97fdceba1f7a1ad7be8f9a49db Mon Sep 17 00:00:00 2001
From: Marty Hernandez Avedon
Date: Wed, 4 Oct 2023 15:28:16 -0400
Subject: [PATCH 32/47] Apply suggestions from code review

Co-authored-by: Silvio Moioli
---
 .../tuning-and-best-practices-for-rancher-at-scale.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
index df7ef84a935..d1bdbb8af39 100644
--- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
+++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -67,7 +67,7 @@ The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma separated list o
 * `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former.
 * `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability.

-In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`.
+In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.containers.env` list) with the value `mgmt,user`

 ## Optimizations Outside of Rancher

From dddf2b28c45148a1246891066e418577238e725d Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 5 Oct 2023 08:35:44 +0200
Subject: [PATCH 33/47] installation-requirements: uniform language about large deployments

Signed-off-by: Silvio Moioli
---
 docs/pages-for-subheaders/installation-requirements.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index 4ebee458c31..c45b748c1b2 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -127,7 +127,7 @@ Please note that a highly available setup with at least three nodes is required
 | Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
 | Large (*) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |

-(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md).
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.

 (†): External Database Host refers to hosting the K3s cluster data store on an [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.

@@ -147,7 +147,7 @@ These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Ku
 | Medium | 300 | 3000 | 8 | 32 GB |
 | Large (*) | 500 | 5000 | 16 | 64 GB |

-(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md).
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.

 ### RKE

@@ -161,7 +161,7 @@ Please note that a highly available setup with at least three nodes is required
 | Medium | 300 | 3000 | 8 | 32 GB |
 | Large (*) | 500 | 5000 | 16 | 64 GB |

-(*): Large deployments require [additional tuning and following of best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md).
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.

 Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os).
From 2392110de58920cdc182eb8be523ef89f10ad8c7 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 5 Oct 2023 08:49:49 +0200
Subject: [PATCH 34/47] tuning-and-best-practices-for-rancher-at-scale: move ACE configuration instructions in the reference guide

Signed-off-by: Silvio Moioli
---
 .../tuning-and-best-practices-for-rancher-at-scale.md | 4 +---
 .../communicating-with-downstream-user-clusters.md | 10 +++++++++-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
index d1bdbb8af39..8f63b790f77 100644
--- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
+++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -54,9 +54,7 @@ You should remove any remaining legacy apps that appear in the Cluster Manager U

 ### Using the Authorized Cluster Endpoint (ACE)

-An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable.
-
-Note that, in order for `kubeconfig` to take advantage of ACE, users need to issue the `kubectl use-context ` command in order to start using it.
+An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions.

 ### Experimental: Option to Reduce Event Handler Executions
 The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help.
diff --git a/docs/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md b/docs/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
index 0c5b83a3ffe..e5c59a2e3d8 100644
--- a/docs/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
+++ b/docs/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
@@ -74,7 +74,15 @@ Like the authorized cluster endpoint, the `kube-api-auth` authentication service

 With this endpoint enabled for the downstream cluster, Rancher generates an extra Kubernetes context in the kubeconfig file in order to connect directly to the cluster. This file has the credentials for `kubectl` and `helm`.

-You will need to use a context defined in this kubeconfig file to access the cluster if Rancher goes down. Therefore, we recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster. For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file.](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md)
+:::note
+
+To use the ACE context in your kubeconfig, run `kubectl use-context ` after enabling it.
+
+:::
+
+For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md).
+
+We recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster.

 ## Impersonation

From 6ea7053b233b748bde86d19b3ddd816b86fbbf2a Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Thu, 5 Oct 2023 08:53:49 +0200
Subject: [PATCH 35/47] tuning-and-best-practices-for-rancher-at-scale: reword title

Signed-off-by: Silvio Moioli
---
 .../tuning-and-best-practices-for-rancher-at-scale.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
index 8f63b790f77..cf6aa839ce6 100644
--- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
+++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -56,7 +56,7 @@ You should remove any remaining legacy apps that appear in the Cluster Manager U

 An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions.

-### Experimental: Option to Reduce Event Handler Executions
+### Reducing Event Handler Executions
 The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help.

 The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable disables that set of controllers from running their handlers as part of cache resyncing.
From 586c109b17fe40c39447353667adf728d9f421b5 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Fri, 6 Oct 2023 09:53:58 +0200
Subject: [PATCH 36/47] Apply suggestions from code review

Co-authored-by: Billy Tat
---
 .../tune-etcd-for-large-installs.md | 2 +-
 .../installation-requirements.md | 14 +++++++-------
 ...ning-and-best-practices-for-rancher-at-scale.md | 10 ++++++++--
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
index fcf9566142e..7d803ff697e 100644
--- a/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
+++ b/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
@@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations
-When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
+When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.

 The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers.

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index c45b748c1b2..3f6b50d66e7 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -69,12 +69,12 @@ The following sections describe the CPU, memory, and I/O requirements for nodes

 Rancher's hardware footprint depends on a number of factors, including:

- - Size of the managed infrastructure (eg. node count, cluster count).
- - Complexity of the desired access control rules (eg. `RoleBinding` object count).
- - Number of workloads (eg. Kubernetes deployments, Fleet deployments).
- - Usage patterns (eg. subset of functionality actively used, frequency of use, number of concurrent users).
+ - Size of the managed infrastructure (e.g., node count, cluster count).
+ - Complexity of the desired access control rules (e.g., `RoleBinding` object count).
+ - Number of workloads (e.g., Kubernetes deployments, Fleet deployments).
+ - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users).

-Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have higher or lower requirements. For enquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance.
+Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance.

 In particular, requirements on this page are subject to typical use assumptions, which include:
  - Under 60 thousand total Kubernetes resources, per type.
@@ -102,7 +102,7 @@ The following table lists minimum CPU and memory requirements for each node in t

 Please note that a highly available setup with at least three nodes is required for production.

-| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM |
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
 |-----------------------------|----------------------------|-------------------------|-------|-------|
 | Small | 150 | 1500 | 4 | 16 GB |
 | Medium | 300 | 3000 | 8 | 32 GB |
@@ -139,7 +139,7 @@ The following table lists minimum CPU and memory requirements for each node in t

 Please note that a highly available setup with at least three nodes is required for production.

-These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher).
+These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher).

 | Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM |
 |-----------------------------|----------------------------|-------------------------|-------|-------|
diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
index cf6aa839ce6..e8fdf28d045 100644
--- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
+++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -15,7 +15,7 @@ This guide describes the best practices and tuning approaches to scale Rancher s

 * Always scale up gradually, and monitor and observe any changes in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause.

-* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if a you require clusters or nodes spread across the world, consider multiple Rancher installations.
+* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if you require clusters or nodes spread across the world, consider multiple Rancher installations.

 ## Minimizing Load on the Upstream Cluster

@@ -42,7 +42,12 @@ You can reduce the number of `RoleBindings` in the upstream cluster in the follo
 ### RoleBinding Count Estimation

 Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate:
-* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`, where `C` is the total number of clusters, `U` is the total number of users, `Ua` is the average number of users with a membership on a cluster, `P` is the total number of projects, and `Pa` is the average number of users with a membership on a project.
+* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`.
+ * `C` is the total number of clusters.
+ * `U` is the total number of users.
+ * `Ua` is the average number of users with a membership on a cluster.
+ * `P` is the total number of projects.
+ * `Pa` is the average number of users with a membership on a project.
 * The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects.
 * The number of `RoleBindings` increases linearly with the number of clusters, projects, and users.

@@ -57,6 +62,7 @@ You should remove any remaining legacy apps that appear in the Cluster Manager U
 An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions.

 ### Reducing Event Handler Executions
+
 The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help.

 The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable disables that set of controllers from running their handlers as part of cache resyncing.
From 79e276d8cc28c72214b300b47f1f7b4c0043b620 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Fri, 6 Oct 2023 09:55:16 +0200
Subject: [PATCH 37/47] installation-requirements: uniform ordering of footnote symbols

Signed-off-by: Silvio Moioli
---
 docs/pages-for-subheaders/installation-requirements.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index 3f6b50d66e7..4d2346c2eb6 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -121,15 +121,15 @@ The following table lists minimum CPU and memory requirements for each node in t

 Please note that a highly available setup with at least three nodes is required for production.

-| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM | External Database Host (†) |
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) |
 |-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------|
 | Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
 | Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
-| Large (*) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
+| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |

-(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+(*): External Database Host refers to hosting the K3s cluster data store on an [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.

-(†): External Database Host refers to hosting the K3s cluster data store on an [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.
+(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.

 Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements).
From 48c266c288bd674a4013438ed217bd53f8221604 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Fri, 6 Oct 2023 09:55:46 +0200
Subject: [PATCH 38/47] installation-requirements: uniform table header capitalization

Signed-off-by: Silvio Moioli
---
 docs/pages-for-subheaders/installation-requirements.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index 4d2346c2eb6..c296e47f92d 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -141,7 +141,7 @@ Please note that a highly available setup with at least three nodes is required

 These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher).

-| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM |
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
 |-----------------------------|----------------------------|-------------------------|-------|-------|
 | Small | 150 | 1500 | 4 | 16 GB |
 | Medium | 300 | 3000 | 8 | 32 GB |
@@ -155,7 +155,7 @@ The following table lists minimum CPU and memory requirements for each node in t

 Please note that a highly available setup with at least three nodes is required for production.

-| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM |
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
 |-----------------------------|----------------------------|-------------------------|-------|-------|
 | Small | 150 | 1500 | 4 | 16 GB |
 | Medium | 300 | 3000 | 8 | 32 GB |
@@ -171,7 +171,7 @@ The following table lists minimum CPU and memory requirements for a [single Dock

 Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments.

-| Managed Infrastructure Size | Maximum number of Clusters | Maximum number of Nodes | vCPUs | RAM |
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
 |-----------------------------|----------------------------|-------------------------|-------|------|
 | Small | 5 | 50 | 1 | 4 GB |
 | Medium | 15 | 200 | 2 | 8 GB |
From 5904c88effc8a51d7d0bc2b9976e9ce9fae6b707 Mon Sep 17 00:00:00 2001
From: Silvio Moioli
Date: Fri, 6 Oct 2023 10:19:23 +0200
Subject: [PATCH 39/47] revision of hardware/scale requirements and best practices: port changes to 2.7

Signed-off-by: Silvio Moioli
---
 docusaurus.config.js | 4 +
 .../tune-etcd-for-large-installs.md | 2 +-
 .../installation-requirements.md | 138 +++++++++++++-----
 .../tips-for-scaling-rancher.md | 65 ---------
 ...and-best-practices-for-rancher-at-scale.md | 100 +++++++++++++
 versioned_sidebars/version-2.7-sidebars.json | 3 +-
 6 files changed, 205 insertions(+), 107 deletions(-)
 delete mode 100644 versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
 create mode 100644 versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md

diff --git a/docusaurus.config.js b/docusaurus.config.js
index 5595c9a4e82..06739ac516f 100644
--- a/docusaurus.config.js
+++ b/docusaurus.config.js
@@ -1194,6 +1194,10 @@ module.exports = {
 {
 to: "/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale",
 from: "/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher"
+ },
+ {
+ to: "/v2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale",
+ from: "/v2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher"
 }
 ],
 },
diff --git a/versioned_docs/version-2.7/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/versioned_docs/version-2.7/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
index e024f1dd779..7d803ff697e 100644
--- a/versioned_docs/version-2.7/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
+++ b/versioned_docs/version-2.7/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
@@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations
-When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
+When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.

 The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers.

diff --git a/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md
index b7214336b13..c296e47f92d 100644
--- a/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md
+++ b/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md
@@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp

 If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md)

-### RKE Specific Requirements
+### RKE2 Specific Requirements

-For the container runtime, RKE should work with any modern Docker version.
+RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs.

-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md)
+For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).

 ### K3s Specific Requirements

@@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow

 If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup.

-### RKE2 Specific Requirements
+### RKE Specific Requirements

-For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs.
+For the container runtime, RKE should work with any modern Docker version.

-For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
+For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md).

 ## Hardware Requirements

-The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed.
+The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure.

-## CPU and Memory
+### Practical Considerations

-Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster.
+Rancher's hardware footprint depends on a number of factors, including:

-### RKE and Hosted Kubernetes
+ - Size of the managed infrastructure (e.g., node count, cluster count).
+ - Complexity of the desired access control rules (e.g., `RoleBinding` object count).
+ - Number of workloads (e.g., Kubernetes deployments, Fleet deployments).
+ - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users).

-These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed.
+Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance.

-These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS.
+In particular, requirements on this page are subject to typical use assumptions, which include:
+ - Under 60 thousand total Kubernetes resources, per type.
+ - Up to 120 pods per node.
+ - Up to 200 CRDs in the upstream (local) cluster.
+ - Up to 100 CRDs in downstream clusters.
+ - Up to 50 Fleet deployments.

-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | ---------- | ------------ | -------| ------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB |
+Higher numbers are possible but requirements might be higher. If you have more than 20 thousand resources of the same type, loading time of the whole list through the Rancher UI might take several seconds.

-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::note Evolution:

-### K3s Kubernetes
+Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day.

-These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
+Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete.

-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
+If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation.

-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::

 ### RKE2 Kubernetes

-These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).

-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 2 | 5 GB |
-| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
+Please note that a highly available setup with at least three nodes is required for production.
+ +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | +| Larger (†) | (†) | (†) | (†) | (†) | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation. + +Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements). + +### K3s Kubernetes + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) | +|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------| +| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS | +| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS | +| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS | + +(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.
+ +(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements). + +### Hosted Kubernetes + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. + +These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher). + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +### RKE + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. 
+ +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os). ### Docker -These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher. +The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md). -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | -------- | --------- | ----- | ---- | -| Small | Up to 5 | Up to 50 | 1 | 4 GB | -| Medium | Up to 15 | Up to 200 | 2 | 8 GB | +Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments. 
+ +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|------| +| Small | 5 | 50 | 1 | 4 GB | +| Medium | 15 | 200 | 2 | 8 GB | ## Ingress diff --git a/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md deleted file mode 100644 index e8e919bde9b..00000000000 --- a/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ /dev/null @@ -1,65 +0,0 @@ ---- -title: Tips for Scaling Rancher ---- - - - - - -This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups. - -## General Tips on Optimizing Rancher's Performance -* It is advisable to keep Rancher up to date with patch releases. Performance improvements and bug fixes are made throughout the life of a minor release. You can review the release notes to help inform your own decisions on whether an upgrade is necessary but we recommend keeping yourself up to date in most cases. - -* Performance will be negatively impacted by increased latency between Rancher's infrastructure and a downstream cluster's infrastructure (eg. geographic distance). If a user or organization requires clusters/nodes all over the world or spread across many regions, it is best to use multiple Rancher installations. - -* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms. 
- -## Minimizing Load on the local cluster -The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. - -### Managing Your Object Counts -ETCD eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings. - -Rolebindings are created in the local cluster as a side effect of many operations. - -Considerations when attempting reduce rolebindings in the local cluster: -* Only add users to clusters and projects when necessary -* Remove clusters and projects when they are no longer needed -* Only use custom roles if necessary -* Use as few rules as possible in custom roles -* Consider whether adding a role to a user is redundant -* Consider that using less, but more powerful, clusters may be more efficient -* Experiment to see if creating new projects or creating new clusters manifests in fewer rolebindings for your specific use case. - -### Using New Apps Over Legacy Apps -There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster. 
- -We recommend removing apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only. - -### Using the Authorized Cluster Endpoint (ACE) -There is an _Authorized Cluster Endpoint_ option for Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to kubeconfigs generated for the cluster that uses a direct endpoint to the cluster and bypasses Rancher. However, it is not enough to only enable this option. The user of the Kubeconfig needs to use `kubectl use-context ` in order to start using it. - -Without using ACE, all kubeconfig requests first route through Rancher. - -### Experimental: Option to Reduce Event Handler Executions -The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help. - -The value for the environment variable can be a comma separated list of the following options. The values refer to types of controllers (the structures that contain and run handlers) and their handlers. Adding the controller types to the variable will disable that set of controllers from running their handlers as part of cache resyncing. - -* `mgmt` refers to management controllers which only run on one Rancher node. -* `user` refers to user controllers which run for every cluster. Some of these are ran on the same node as management controllers, while other run in the downstream cluster. 
This will option targets the former. -* `scaled` refers to scaled controllers which run on every Rancher node. This is not recommended to be set due to the critical functionality the scaled handlers are responsible for. - -In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`. - -## Optimizations Outside of Rancher -A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess. - -### Keeping Kubernetes Versions Up to Date -Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes. - -### Optimizing ETCD -The two main bottlenecks to [ETCD performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding ETCD performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks). - -Theoretically, the more nodes in an ETCD cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. 
It can also be inferred that ETCD performance will be inversely affected by distance between nodes as that will slow down network communication. diff --git a/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md new file mode 100644 index 00000000000..e8fdf28d045 --- /dev/null +++ b/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -0,0 +1,100 @@ +--- +title: Tuning and Best Practices for Rancher at Scale +--- + + + + + + +This guide describes the best practices and tuning approaches to scale Rancher setups, and the associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps that can minimize the load put on Rancher, and optimize Rancher's ability to manage larger infrastructures. + +## Optimizing Rancher Performance + +* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback. + +* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause. + +* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if you require clusters or nodes spread across the world, consider multiple Rancher installations.
+ +## Minimizing Load on the Upstream Cluster + +When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster. + +### Managing Your Object Counts + +Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60 thousand. Often that type is `RoleBinding`. + +This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect. + +You can reduce the number of `RoleBindings` in the upstream cluster in the following ways: +* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible. +* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles. +* Only add users to clusters and projects when necessary. +* Remove clusters and projects when they are no longer needed. +* Only use custom roles if necessary. +* Use as few rules as possible in custom roles. +* Consider whether adding a role to a user is redundant. +* Consider using fewer, but more powerful, clusters. +* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list).
Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects. +* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case. + +### RoleBinding Count Estimation + +Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate: +* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`. + * `C` is the total number of clusters. + * `U` is the total number of users. + * `Ua` is the average number of users with a membership on a cluster. + * `P` is the total number of projects. + * `Pa` is the average number of users with a membership on a project. +* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects. +* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users. + +### Using New Apps Over Legacy Apps + +Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. `apps.catalog.cattle.io` apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster. + +You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI.
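
Editor's sketch (not part of the patch): the minimum estimate from the "RoleBinding Count Estimation" section can be turned into a small calculator. The function and parameter names below are hypothetical and chosen only for illustration; the arithmetic follows the two formulas exactly as they are written in the text, and it assumes the Restricted Admin term is simply added on top of the base estimate.

```python
def estimate_min_rolebindings(clusters, users, avg_users_per_cluster,
                              projects, avg_users_per_project,
                              restricted_admins=0):
    """Rough lower bound on RoleBinding objects in the upstream cluster.

    Implements 32C + U + 2*Ua*C + 8P + 5*Pa as written in the text, plus
    (7C + 2P + 2) for every user holding the Restricted Admin role.
    All names here are illustrative, not a Rancher API.
    """
    base = (32 * clusters
            + users
            + 2 * avg_users_per_cluster * clusters
            + 8 * projects
            + 5 * avg_users_per_project)
    per_restricted_admin = 7 * clusters + 2 * projects + 2
    return base + restricted_admins * per_restricted_admin


# Example: 10 clusters, 100 users, 5 users per cluster on average,
# 20 projects, 3 users per project on average, 2 Restricted Admins.
print(estimate_min_rolebindings(10, 100, 5, 20, 3, restricted_admins=2))
```

Treat the result as a floor, since the text describes it as a minimum estimate; comparing it against the 60 thousand objects-per-type guideline mentioned earlier gives a first indication of headroom.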
+ +### Using the Authorized Cluster Endpoint (ACE) + +An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions. + +### Reducing Event Handler Executions + +The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help. + +The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable disables that set of controllers from running their handlers as part of cache resyncing. + +* `mgmt` refers to management controllers which only run on one Rancher node. +* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. 
This option targets the former. +* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability. + +In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.template.spec.containers[].env` list) with the value `mgmt,user`. + +## Optimizations Outside of Rancher + +Important influencing factors are the underlying cluster's own performance and configuration. The upstream cluster, if misconfigured, can introduce a bottleneck Rancher software has no chance to resolve. + +### Manage Upstream Cluster Nodes Directly with RKE2 + +As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This can aid in distinguishing whether Rancher, Kubernetes, or operating system components are causing issues. + +Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services. + +Use RKE2 for large scale use cases. + +### Keeping Kubernetes Versions Up to Date + +You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes. + +### Optimizing etcd + +Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance. + +The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed.
Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks). + +It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). + +Etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication. Etcd nodes should be located together with Rancher nodes. 
diff --git a/versioned_sidebars/version-2.7-sidebars.json b/versioned_sidebars/version-2.7-sidebars.json index 5625fdaf67d..2afa9c9297a 100644 --- a/versioned_sidebars/version-2.7-sidebars.json +++ b/versioned_sidebars/version-2.7-sidebars.json @@ -791,7 +791,8 @@ "items": [ "reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere", "reference-guides/best-practices/rancher-server/rancher-deployment-strategy", - "reference-guides/best-practices/rancher-server/tips-for-running-rancher" + "reference-guides/best-practices/rancher-server/tips-for-running-rancher", + "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale" ] }, { From 5257b01237ea38325de804b40fb0b3d87e2a4b85 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Fri, 6 Oct 2023 10:44:34 +0200 Subject: [PATCH 40/47] revision of hardware/scale requirements and best practices: port changes to 2.6 Signed-off-by: Silvio Moioli --- .../tune-etcd-for-large-installs.md | 2 +- .../installation-requirements.md | 138 +++++++++++++----- ...and-best-practices-for-rancher-at-scale.md | 100 +++++++++++++ versioned_sidebars/version-2.6-sidebars.json | 3 +- 4 files changed, 201 insertions(+), 42 deletions(-) create mode 100644 versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md diff --git a/versioned_docs/version-2.6/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/versioned_docs/version-2.6/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md index e024f1dd779..7d803ff697e 100644 --- a/versioned_docs/version-2.6/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md +++ b/versioned_docs/version-2.6/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md @@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations -When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd 
from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. +When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers. 
diff --git a/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md index 1f06f1167b6..25013c493d7 100644 --- a/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md +++ b/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md @@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md) -### RKE Specific Requirements +### RKE2 Specific Requirements -For the container runtime, RKE should work with any modern Docker version. +RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs. -For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md) +For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions). ### K3s Specific Requirements @@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup. -### RKE2 Specific Requirements +### RKE Specific Requirements -For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs. +For the container runtime, RKE should work with any modern Docker version. -For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/rancher-v2-6-10/). 
+For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md). ## Hardware Requirements -The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed. +The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure. -## CPU and Memory +### Practical Considerations -Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster. +Rancher's hardware footprint depends on a number of factors, including: -### RKE and Hosted Kubernetes + - Size of the managed infrastructure (e.g., node count, cluster count). + - Complexity of the desired access control rules (e.g., `RoleBinding` object count). + - Number of workloads (e.g., Kubernetes deployments, Fleet deployments). + - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users). -These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed. +Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. -These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS. 
+In particular, requirements on this page are subject to typical use assumptions, which include: + - Under 60 thousand total Kubernetes resources, per type. + - Up to 120 pods per node. + - Up to 200 CRDs in the upstream (local) cluster. + - Up to 100 CRDs in downstream clusters. + - Up to 50 Fleet deployments. -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | ---------- | ------------ | -------| ------- | -| Small | Up to 150 | Up to 1500 | 2 | 8 GB | -| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | -| Large | Up to 500 | Up to 5000 | 8 | 32 GB | -| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | -| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | +Higher numbers are possible but requirements might be higher. If you have more than 20 thousand resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. -Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours. +:::note Evolution: -### K3s Kubernetes +Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day. -These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md) +Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete. 
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size | -| --------------- | ---------- | ------------ | -------| ---------| ------------------------- | -| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS | -| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS | -| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS | -| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS | -| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS | +If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation. -Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours. +::: ### RKE2 Kubernetes -These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here. +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | -------- | --------- | ----- | ---- | -| Small | Up to 5 | Up to 50 | 2 | 5 GB | -| Medium | Up to 15 | Up to 200 | 3 | 9 GB | +Please note that a highly available setup with at least three nodes is required for production. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | +| Larger (†) | (†) | (†) | (†) | (†) | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. 
+ +(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation. + +Refer to the RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements). + +### K3s Kubernetes + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) | +|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------| +| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS | +| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS | +| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS | + +(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store. + +(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements). + +### Hosted Kubernetes + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production.
+ +These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher). + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +### RKE + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os). ### Docker -These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher. 
+The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md). -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | -------- | --------- | ----- | ---- | -| Small | Up to 5 | Up to 50 | 1 | 4 GB | -| Medium | Up to 15 | Up to 200 | 2 | 8 GB | +Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|------| +| Small | 5 | 50 | 1 | 4 GB | +| Medium | 15 | 200 | 2 | 8 GB | ## Ingress diff --git a/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md new file mode 100644 index 00000000000..e8fdf28d045 --- /dev/null +++ b/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -0,0 +1,100 @@ +--- +title: Tuning and Best Practices for Rancher at Scale +--- + + + + + + +This guide describes the best practices and tuning approaches to scale Rancher setups, and the associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps that can minimize the load put on Rancher, and optimize Rancher's ability to manage larger infrastructures. + +## Optimizing Rancher Performance + +* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback. 
+ +* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause. + +* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if you require clusters or nodes spread across the world, consider multiple Rancher installations. + +## Minimizing Load on the Upstream Cluster + +When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster. + +### Managing Your Object Counts + +Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60 thousand. Often that type is `RoleBinding`. + +This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect. + +You can reduce the number of `RoleBindings` in the upstream cluster in the following ways: +* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible. +* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles. +* Only add users to clusters and projects when necessary.
+* Remove clusters and projects when they are no longer needed. +* Only use custom roles if necessary. +* Use as few rules as possible in custom roles. +* Consider whether adding a role to a user is redundant. +* Consider using fewer, but more powerful, clusters. +* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects. +* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case. + +### RoleBinding Count Estimation + +Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate: +* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`. + * `C` is the total number of clusters. + * `U` is the total number of users. + * `Ua` is the average number of users with a membership on a cluster. + * `P` is the total number of projects. + * `Pa` is the average number of users with a membership on a project. +* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects. +* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users. + +### Using New Apps Over Legacy Apps + +Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. `apps.catalog.cattle.io` apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster.
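As an aside to the RoleBinding Count Estimation section above, the two formulas can be evaluated with a few lines of Python. This is only a minimal sketch: the function and parameter names are hypothetical and not part of Rancher, and each term is read literally as a product of the adjacent symbols (for example, `2UaC` as `2 * Ua * C`).

```python
def estimate_min_rolebindings(clusters, users, avg_users_per_cluster,
                              projects, avg_users_per_project):
    """Minimum estimate from the text: 32C + U + 2UaC + 8P + 5Pa.

    Hypothetical helper; every term is taken literally as written.
    """
    return (32 * clusters                            # 32C
            + users                                  # U
            + 2 * avg_users_per_cluster * clusters   # 2UaC
            + 8 * projects                           # 8P
            + 5 * avg_users_per_project)             # 5Pa


def restricted_admin_extra(clusters, projects, restricted_admins=1):
    """Lower bound on additional RoleBindings: 7C + 2P + 2 per Restricted Admin."""
    return restricted_admins * (7 * clusters + 2 * projects + 2)


# Illustrative numbers only: 10 clusters, 100 users, 5 users per cluster
# on average, 20 projects, 3 users per project on average.
baseline = estimate_min_rolebindings(10, 100, 5, 20, 3)
extra = restricted_admin_extra(10, 20, restricted_admins=2)
print(baseline, extra)
```

Both results are lower bounds, so it may be more useful to compare them against the object count at which performance issues tend to appear than to treat them as exact predictions.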
+ +You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI. + +### Using the Authorized Cluster Endpoint (ACE) + +An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions. + +### Reducing Event Handler Executions + +The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help. + +The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable disables that set of controllers from running their handlers as part of cache resyncing. 
+ +* `mgmt` refers to management controllers which only run on one Rancher node. +* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former. +* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability. + +In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.containers.env` list) with the value `mgmt,user`. + +## Optimizations Outside of Rancher + +Important influencing factors are the underlying cluster's own performance and configuration. The upstream cluster, if misconfigured, can introduce a bottleneck Rancher software has no chance to resolve. + +### Manage Upstream Cluster Nodes Directly with RKE2 + +As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This can help determine whether Rancher, Kubernetes, or operating system components are causing issues. + +Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high-scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services. + +Use RKE2 for large-scale use cases. + +### Keeping Kubernetes Versions Up to Date + +You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes.
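Returning to the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` setting described earlier, the sketch below shows one way to assemble the value and the corresponding container environment entry. The helper name `build_sync_env_entry` is hypothetical, and the sketch only assumes the standard Kubernetes `name`/`value` shape for environment entries. It does not modify a deployment.

```python
ENV_NAME = "CATTLE_SYNC_ONLY_CHANGED_OBJECTS"

# Option names taken from the text above. `scaled` is accepted here,
# but the text advises against setting it.
KNOWN_OPTIONS = ("mgmt", "user", "scaled")


def build_sync_env_entry(options):
    """Return a container env entry with a comma-separated option list.

    Hypothetical helper: raises ValueError for names the text does not list.
    """
    unknown = [opt for opt in options if opt not in KNOWN_OPTIONS]
    if unknown:
        raise ValueError(f"unknown controller types: {unknown}")
    return {"name": ENV_NAME, "value": ",".join(options)}


# The combination recommended in the text.
entry = build_sync_env_entry(["mgmt", "user"])
print(entry)
```

The resulting dictionary mirrors what would be added to the Rancher container's environment list. How it is applied (Helm values, a manifest edit, or another mechanism) depends on how Rancher was installed.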
+ +### Optimizing etcd + +Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance. + +The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed. Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks). + +It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). + +Etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication. Etcd nodes should be located together with Rancher nodes. 
diff --git a/versioned_sidebars/version-2.6-sidebars.json b/versioned_sidebars/version-2.6-sidebars.json index 8de527429bf..366b1dc5f29 100644 --- a/versioned_sidebars/version-2.6-sidebars.json +++ b/versioned_sidebars/version-2.6-sidebars.json @@ -788,7 +788,8 @@ "items": [ "reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere", "reference-guides/best-practices/rancher-server/rancher-deployment-strategy", - "reference-guides/best-practices/rancher-server/tips-for-running-rancher" + "reference-guides/best-practices/rancher-server/tips-for-running-rancher", + "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale" ] }, { From 421e12c05e9f2edaf942290bd7c7ff45fffbd469 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 10 Oct 2023 13:23:59 +0200 Subject: [PATCH 41/47] installation-requirements: redirect to Support Matrix page for RKE Docker requirements Signed-off-by: Silvio Moioli --- docs/pages-for-subheaders/installation-requirements.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index c296e47f92d..7c8e91bfa4c 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -57,7 +57,7 @@ If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these ### RKE Specific Requirements -For the container runtime, RKE should work with any modern Docker version. +RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page. For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md). 
From d5b649a6ca948c5a417415086ba7483d4d1fa4a8 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 10 Oct 2023 13:26:08 +0200 Subject: [PATCH 42/47] uniform thousand numbers Signed-off-by: Silvio Moioli --- docs/pages-for-subheaders/installation-requirements.md | 4 ++-- .../tuning-and-best-practices-for-rancher-at-scale.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md index 7c8e91bfa4c..e90c3bbd087 100644 --- a/docs/pages-for-subheaders/installation-requirements.md +++ b/docs/pages-for-subheaders/installation-requirements.md @@ -77,13 +77,13 @@ Rancher's hardware footprint depends on a number of factors, including: Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. In particular, requirements on this page are subject to typical use assumptions, which include: - - Under 60 thousand total Kubernetes resources, per type. + - Under 60,000 total Kubernetes resources, per type. - Up to 120 pods per node. - Up to 200 CRDs in the upstream (local) cluster. - Up to 100 CRDs in downstream clusters. - Up to 50 Fleet deployments. -Higher numbers are possible but requirements might be higher. If you have more than 20 thousand resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. +Higher numbers are possible but requirements might be higher. If you have more than 20,000 resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. 
:::note Evolution: diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index e8fdf28d045..ff15e8e3734 100644 --- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md +++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -23,7 +23,7 @@ When scaling up Rancher, one typical bottleneck is resource growth in the upstre ### Managing Your Object Counts -Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60 thousand. Often that type is `RoleBinding`. +Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`. This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect. 
From 01dc68f1cbcac1b662e3bab7064bec3265b963d3 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 10 Oct 2023 13:27:55 +0200 Subject: [PATCH 43/47] tuning-and-best-practices-for-rancher-at-scale: comma fixes Signed-off-by: Silvio Moioli --- .../tuning-and-best-practices-for-rancher-at-scale.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index ff15e8e3734..865f1d32f6e 100644 --- a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md +++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -7,7 +7,7 @@ title: Tuning and Best Practices for Rancher at Scale -This guide describes the best practices and tuning approaches to scale Rancher setups, and the associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps that can minimize the load put on Rancher, and optimize Rancher's ability to manage larger infrastructures. +This guide describes the best practices and tuning approaches to scale Rancher setups and the associated challenges with doing so. As systems grow, performance will naturally reduce, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures. 
## Optimizing Rancher Performance From 28b9fae3f190a78a32c1a849d1ed3e2b9e0b5992 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Tue, 10 Oct 2023 13:33:04 +0200 Subject: [PATCH 44/47] revision of hardware/scale requirements and best practices: port changes to 2.7 and 2.6 Signed-off-by: Silvio Moioli --- .../pages-for-subheaders/installation-requirements.md | 6 +++--- .../tuning-and-best-practices-for-rancher-at-scale.md | 4 ++-- .../pages-for-subheaders/installation-requirements.md | 6 +++--- .../tuning-and-best-practices-for-rancher-at-scale.md | 4 ++-- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md index 25013c493d7..2c49efaa834 100644 --- a/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md +++ b/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md @@ -57,7 +57,7 @@ If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these ### RKE Specific Requirements -For the container runtime, RKE should work with any modern Docker version. +RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page. For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md). @@ -77,13 +77,13 @@ Rancher's hardware footprint depends on a number of factors, including: Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. 
In particular, requirements on this page are subject to typical use assumptions, which include: - - Under 60 thousand total Kubernetes resources, per type. + - Under 60,000 total Kubernetes resources, per type. - Up to 120 pods per node. - Up to 200 CRDs in the upstream (local) cluster. - Up to 100 CRDs in downstream clusters. - Up to 50 Fleet deployments. -Higher numbers are possible but requirements might be higher. If you have more than 20 thousand resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. +Higher numbers are possible but requirements might be higher. If you have more than 20,000 resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. :::note Evolution: diff --git a/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index e8fdf28d045..865f1d32f6e 100644 --- a/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md +++ b/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -7,7 +7,7 @@ title: Tuning and Best Practices for Rancher at Scale -This guide describes the best practices and tuning approaches to scale Rancher setups, and the associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps that can minimize the load put on Rancher, and optimize Rancher's ability to manage larger infrastructures. +This guide describes the best practices and tuning approaches to scale Rancher setups and the associated challenges with doing so. 
As systems grow, performance will naturally reduce, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures. ## Optimizing Rancher Performance @@ -23,7 +23,7 @@ When scaling up Rancher, one typical bottleneck is resource growth in the upstre ### Managing Your Object Counts -Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60 thousand. Often that type is `RoleBinding`. +Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`. This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect. diff --git a/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md index c296e47f92d..e90c3bbd087 100644 --- a/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md +++ b/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md @@ -57,7 +57,7 @@ If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these ### RKE Specific Requirements -For the container runtime, RKE should work with any modern Docker version. +RKE requires a Docker container runtime. 
Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page. For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md). @@ -77,13 +77,13 @@ Rancher's hardware footprint depends on a number of factors, including: Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. In particular, requirements on this page are subject to typical use assumptions, which include: - - Under 60 thousand total Kubernetes resources, per type. + - Under 60,000 total Kubernetes resources, per type. - Up to 120 pods per node. - Up to 200 CRDs in the upstream (local) cluster. - Up to 100 CRDs in downstream clusters. - Up to 50 Fleet deployments. -Higher numbers are possible but requirements might be higher. If you have more than 20 thousand resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. +Higher numbers are possible but requirements might be higher. If you have more than 20,000 resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. 
:::note Evolution: diff --git a/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md index e8fdf28d045..865f1d32f6e 100644 --- a/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md +++ b/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -7,7 +7,7 @@ title: Tuning and Best Practices for Rancher at Scale -This guide describes the best practices and tuning approaches to scale Rancher setups, and the associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps that can minimize the load put on Rancher, and optimize Rancher's ability to manage larger infrastructures. +This guide describes the best practices and tuning approaches to scale Rancher setups and the associated challenges with doing so. As systems grow, performance will naturally reduce, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures. ## Optimizing Rancher Performance @@ -23,7 +23,7 @@ When scaling up Rancher, one typical bottleneck is resource growth in the upstre ### Managing Your Object Counts -Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60 thousand. Often that type is `RoleBinding`. +Etcd is the backing database for Kubernetes and for Rancher. 
The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`. This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect. From 7e94414028c668fa923a06b3c3413432412a2344 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Wed, 11 Oct 2023 10:19:48 +0200 Subject: [PATCH 45/47] revision of hardware/scale requirements and best practices: port changes to 2.8 Signed-off-by: Silvio Moioli --- .../tune-etcd-for-large-installs.md | 2 +- .../installation-requirements.md | 138 +++++++++++++----- .../tips-for-scaling-rancher.md | 65 --------- ...and-best-practices-for-rancher-at-scale.md | 100 +++++++++++++ ...unicating-with-downstream-user-clusters.md | 10 +- 5 files changed, 208 insertions(+), 107 deletions(-) delete mode 100644 versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md create mode 100644 versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md diff --git a/versioned_docs/version-2.8/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/versioned_docs/version-2.8/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md index e024f1dd779..7d803ff697e 100644 --- a/versioned_docs/version-2.8/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md +++ b/versioned_docs/version-2.8/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md @@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations -When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. 
The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. +When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers. 
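As a small worked example of the keyspace sizing discussed in this hunk, the sketch below converts a size in gigabytes into a byte count that could be supplied as a `quota-backend-bytes` value. The helper name `quota_backend_bytes` and the use of binary gigabytes (1 GiB = 1024^3 bytes) are assumptions made for illustration; the 8 GB ceiling is taken from the text above.

```python
# Hypothetical helper, not part of etcd or Rancher: converts a keyspace size
# in GiB to bytes for use as a quota-backend-bytes value.
GIB = 1024 ** 3        # assumption: binary gigabytes
MAX_QUOTA_GIB = 8      # maximum setting stated in the document


def quota_backend_bytes(size_gib: int) -> int:
    """Return the byte count for a keyspace quota of size_gib GiB."""
    if not 0 < size_gib <= MAX_QUOTA_GIB:
        raise ValueError(f"quota must be between 1 and {MAX_QUOTA_GIB} GiB")
    return size_gib * GIB


if __name__ == "__main__":
    print(quota_backend_bytes(5))  # 5368709120
```

The host still needs enough RAM to hold the whole dataset, so the result is best read as an upper bound to plan around rather than a recommended value.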
diff --git a/versioned_docs/version-2.8/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-2.8/pages-for-subheaders/installation-requirements.md index b7214336b13..e90c3bbd087 100644 --- a/versioned_docs/version-2.8/pages-for-subheaders/installation-requirements.md +++ b/versioned_docs/version-2.8/pages-for-subheaders/installation-requirements.md @@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md) -### RKE Specific Requirements +### RKE2 Specific Requirements -For the container runtime, RKE should work with any modern Docker version. +RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs. -For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md) +For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions). ### K3s Specific Requirements @@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup. -### RKE2 Specific Requirements +### RKE Specific Requirements -For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs. +RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page. 
-For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions). +For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md). ## Hardware Requirements -The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed. +The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure. -## CPU and Memory +### Practical Considerations -Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster. +Rancher's hardware footprint depends on a number of factors, including: -### RKE and Hosted Kubernetes + - Size of the managed infrastructure (e.g., node count, cluster count). + - Complexity of the desired access control rules (e.g., `RoleBinding` object count). + - Number of workloads (e.g., Kubernetes deployments, Fleet deployments). + - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users). -These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed. +Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. 
-These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS. +In particular, requirements on this page are subject to typical use assumptions, which include: + - Under 60,000 total Kubernetes resources, per type. + - Up to 120 pods per node. + - Up to 200 CRDs in the upstream (local) cluster. + - Up to 100 CRDs in downstream clusters. + - Up to 50 Fleet deployments. -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | ---------- | ------------ | -------| ------- | -| Small | Up to 150 | Up to 1500 | 2 | 8 GB | -| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | -| Large | Up to 500 | Up to 5000 | 8 | 32 GB | -| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | -| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | +Higher numbers are possible but requirements might be higher. If you have more than 20,000 resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. -Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours. +:::note Evolution: -### K3s Kubernetes +Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day. -These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md) +Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete. 
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size | -| --------------- | ---------- | ------------ | -------| ---------| ------------------------- | -| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS | -| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS | -| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS | -| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS | -| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS | +If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation. -Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours. +::: ### RKE2 Kubernetes -These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here. +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | -------- | --------- | ----- | ---- | -| Small | Up to 5 | Up to 50 | 2 | 5 GB | -| Medium | Up to 15 | Up to 200 | 3 | 9 GB | +Please note that a highly available setup with at least three nodes is required for production. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | +| Larger (†) | (†) | (†) | (†) | (†) | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. 
+ +(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation. + +Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements). + +### K3s Kubernetes + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) | +|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------| +| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS | +| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS | +| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS | + +(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store. + +(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements). + +### Hosted Kubernetes + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production.
+ +These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher). + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +### RKE + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os). ### Docker -These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher. 
+The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md). -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | -------- | --------- | ----- | ---- | -| Small | Up to 5 | Up to 50 | 1 | 4 GB | -| Medium | Up to 15 | Up to 200 | 2 | 8 GB | +Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|------| +| Small | 5 | 50 | 1 | 4 GB | +| Medium | 15 | 200 | 2 | 8 GB | ## Ingress diff --git a/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md deleted file mode 100644 index e8e919bde9b..00000000000 --- a/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ /dev/null @@ -1,65 +0,0 @@ ---- -title: Tips for Scaling Rancher ---- - - - - - -This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups. - -## General Tips on Optimizing Rancher's Performance -* It is advisable to keep Rancher up to date with patch releases. Performance improvements and bug fixes are made throughout the life of a minor release. You can review the release notes to help inform your own decisions on whether an upgrade is necessary but we recommend keeping yourself up to date in most cases. 
- -* Performance will be negatively impacted by increased latency between Rancher's infrastructure and a downstream cluster's infrastructure (eg. geographic distance). If a user or organization requires clusters/nodes all over the world or spread across many regions, it is best to use multiple Rancher installations. - -* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms. - -## Minimizing Load on the local cluster -The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. - -### Managing Your Object Counts -ETCD eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings. - -Rolebindings are created in the local cluster as a side effect of many operations. - -Considerations when attempting reduce rolebindings in the local cluster: -* Only add users to clusters and projects when necessary -* Remove clusters and projects when they are no longer needed -* Only use custom roles if necessary -* Use as few rules as possible in custom roles -* Consider whether adding a role to a user is redundant -* Consider that using less, but more powerful, clusters may be more efficient -* Experiment to see if creating new projects or creating new clusters manifests in fewer rolebindings for your specific use case. 
- -### Using New Apps Over Legacy Apps -There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster. - -We recommend removing apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only. - -### Using the Authorized Cluster Endpoint (ACE) -There is an _Authorized Cluster Endpoint_ option for Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to kubeconfigs generated for the cluster that uses a direct endpoint to the cluster and bypasses Rancher. However, it is not enough to only enable this option. The user of the Kubeconfig needs to use `kubectl use-context ` in order to start using it. - -Without using ACE, all kubeconfig requests first route through Rancher. - -### Experimental: Option to Reduce Event Handler Executions -The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help. - -The value for the environment variable can be a comma separated list of the following options. 
The values refer to types of controllers (the structures that contain and run handlers) and their handlers. Adding the controller types to the variable will disable that set of controllers from running their handlers as part of cache resyncing. - -* `mgmt` refers to management controllers which only run on one Rancher node. -* `user` refers to user controllers which run for every cluster. Some of these are ran on the same node as management controllers, while other run in the downstream cluster. This will option targets the former. -* `scaled` refers to scaled controllers which run on every Rancher node. This is not recommended to be set due to the critical functionality the scaled handlers are responsible for. - -In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`. - -## Optimizations Outside of Rancher -A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess. - -### Keeping Kubernetes Versions Up to Date -Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes. - -### Optimizing ETCD -The two main bottlenecks to [ETCD performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. 
For information regarding ETCD performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks). - -Theoretically, the more nodes in an ETCD cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. It can also be inferred that ETCD performance will be inversely affected by distance between nodes as that will slow down network communication. diff --git a/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md new file mode 100644 index 00000000000..865f1d32f6e --- /dev/null +++ b/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -0,0 +1,100 @@ +--- +title: Tuning and Best Practices for Rancher at Scale +--- + + + + + + +This guide describes the best practices and tuning approaches to scale Rancher setups and the associated challenges with doing so. As systems grow, performance will naturally reduce, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures. + +## Optimizing Rancher Performance + +* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. 
The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback. + +* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause. + +* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if you require clusters or nodes spread across the world, consider multiple Rancher installations. + +## Minimizing Load on the Upstream Cluster + +When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster. + +### Managing Your Object Counts + +Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`. + +This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect. + +You can reduce the number of `RoleBindings` in the upstream cluster in the following ways: +* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible.
+* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles. +* Only add users to clusters and projects when necessary. +* Remove clusters and projects when they are no longer needed. +* Only use custom roles if necessary. +* Use as few rules as possible in custom roles. +* Consider whether adding a role to a user is redundant. +* Consider using fewer, but more powerful, clusters. +* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects. +* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case. + +### RoleBinding Count Estimation + +Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate: +* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`. + * `C` is the total number of clusters. + * `U` is the total number of users. + * `Ua` is the average number of users with a membership on a cluster. + * `P` is the total number of projects. + * `Pa` is the average number of users with a membership on a project. +* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects. +* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users. + +### Using New Apps Over Legacy Apps + +Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated.
Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. `apps.catalog.cattle.io` apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster. + +You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI. + +### Using the Authorized Cluster Endpoint (ACE) + +An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions. + +### Reducing Event Handler Executions + +The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help. + +The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma-separated list of the following options.
The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable disables that set of controllers from running their handlers as part of cache resyncing. + +* `mgmt` refers to management controllers which only run on one Rancher node. +* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former. +* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability. + +In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.containers.env` list) with the value `mgmt,user`. + +## Optimizations Outside of Rancher + +Important influencing factors are the underlying cluster's own performance and configuration. The upstream cluster, if misconfigured, can introduce a bottleneck that Rancher software cannot resolve. + +### Manage Upstream Cluster Nodes Directly with RKE2 + +As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This can help determine whether Rancher, Kubernetes, or operating system components are causing issues. + +Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services. + +Use RKE2 for large scale use cases.
+ +### Keeping Kubernetes Versions Up to Date + +You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes. + +### Optimizing etcd + +Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance. + +The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed. Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks). + +It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). + +Etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication. Etcd nodes should be located together with Rancher nodes. 
diff --git a/versioned_docs/version-2.8/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md b/versioned_docs/version-2.8/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md index 586b55b0db8..08639d4b819 100644 --- a/versioned_docs/version-2.8/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md +++ b/versioned_docs/version-2.8/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md @@ -78,7 +78,15 @@ Like the authorized cluster endpoint, the `kube-api-auth` authentication service With this endpoint enabled for the downstream cluster, Rancher generates an extra Kubernetes context in the kubeconfig file in order to connect directly to the cluster. This file has the credentials for `kubectl` and `helm`. -You will need to use a context defined in this kubeconfig file to access the cluster if Rancher goes down. Therefore, we recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster. For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file.](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md) +:::note + +To use the ACE context in your kubeconfig, run `kubectl use-context ` after enabling it. + +::: + +For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md). + +We recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster. 
## Impersonation From f90c66885f96f3f33b2d109f099c14f361c122d9 Mon Sep 17 00:00:00 2001 From: Silvio Moioli Date: Wed, 11 Oct 2023 10:31:33 +0200 Subject: [PATCH 46/47] revision of hardware/scale requirements and best practices: port changes to latest Signed-off-by: Silvio Moioli --- .../tune-etcd-for-large-installs.md | 2 +- .../installation-requirements.md | 138 +++++++++++++----- .../tips-for-scaling-rancher.md | 65 --------- ...and-best-practices-for-rancher-at-scale.md | 100 +++++++++++++ ...unicating-with-downstream-user-clusters.md | 10 +- 5 files changed, 208 insertions(+), 107 deletions(-) create mode 100644 versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md diff --git a/versioned_docs/version-latest/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/versioned_docs/version-latest/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md index e024f1dd779..7d803ff697e 100644 --- a/versioned_docs/version-latest/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md +++ b/versioned_docs/version-latest/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md @@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations -When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. +When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. 
The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval. The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers. diff --git a/versioned_docs/version-latest/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-latest/pages-for-subheaders/installation-requirements.md index b7214336b13..e90c3bbd087 100644 --- a/versioned_docs/version-latest/pages-for-subheaders/installation-requirements.md +++ b/versioned_docs/version-latest/pages-for-subheaders/installation-requirements.md @@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md) -### RKE Specific Requirements +### RKE2 Specific Requirements -For the container runtime, RKE should work with any modern Docker version. +RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs. 
-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md) +For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions). ### K3s Specific Requirements @@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup. -### RKE2 Specific Requirements +### RKE Specific Requirements -For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs. +RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page. -For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions). +For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md). ## Hardware Requirements -The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed. +The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure. -## CPU and Memory +### Practical Considerations -Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster. 
+Rancher's hardware footprint depends on a number of factors, including: -### RKE and Hosted Kubernetes + - Size of the managed infrastructure (e.g., node count, cluster count). + - Complexity of the desired access control rules (e.g., `RoleBinding` object count). + - Number of workloads (e.g., Kubernetes deployments, Fleet deployments). + - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users). -These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed. +Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance. -These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS. +In particular, requirements on this page are subject to typical use assumptions, which include: + - Under 60,000 total Kubernetes resources, per type. + - Up to 120 pods per node. + - Up to 200 CRDs in the upstream (local) cluster. + - Up to 100 CRDs in downstream clusters. + - Up to 50 Fleet deployments. -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | ---------- | ------------ | -------| ------- | -| Small | Up to 150 | Up to 1500 | 2 | 8 GB | -| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | -| Large | Up to 500 | Up to 5000 | 8 | 32 GB | -| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | -| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | +Higher numbers are possible but requirements might be higher. If you have more than 20,000 resources of the same type, loading time of the whole list through the Rancher UI might take several seconds. -Every use case and environment is different. 
Please [contact Rancher](https://rancher.com/contact/) to review yours. +:::note Evolution: -### K3s Kubernetes +Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day. -These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md) +Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete. -| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size | -| --------------- | ---------- | ------------ | -------| ---------| ------------------------- | -| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS | -| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS | -| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS | -| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS | -| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS | +If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation. -Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours. +::: ### RKE2 Kubernetes -These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here. +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | -------- | --------- | ----- | ---- | -| Small | Up to 5 | Up to 50 | 2 | 5 GB | -| Medium | Up to 15 | Up to 200 | 3 | 9 GB | +Please note that a highly available setup with at least three nodes is required for production. 
+ +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | +| Larger (†) | (†) | (†) | (†) | (†) | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation. + +Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements). + +### K3s Kubernetes + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) | +|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------| +| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS | +| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS | +| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS | + +(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store. 
+ +(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements). + +### Hosted Kubernetes + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. + +These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher). + +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +### RKE + +The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md). + +Please note that a highly available setup with at least three nodes is required for production. 
+ +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|-------| +| Small | 150 | 1500 | 4 | 16 GB | +| Medium | 300 | 3000 | 8 | 32 GB | +| Large (*) | 500 | 5000 | 16 | 64 GB | + +(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance. + +Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os). ### Docker -These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher. +The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md). -| Deployment Size | Clusters | Nodes | vCPUs | RAM | -| --------------- | -------- | --------- | ----- | ---- | -| Small | Up to 5 | Up to 50 | 1 | 4 GB | -| Medium | Up to 15 | Up to 200 | 2 | 8 GB | +Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments. 
+ +| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | +|-----------------------------|----------------------------|-------------------------|-------|------| +| Small | 5 | 50 | 1 | 4 GB | +| Medium | 15 | 200 | 2 | 8 GB | ## Ingress diff --git a/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md index e8e919bde9b..e69de29bb2d 100644 --- a/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md +++ b/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md @@ -1,65 +0,0 @@ ---- -title: Tips for Scaling Rancher ---- - - - - - -This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups. - -## General Tips on Optimizing Rancher's Performance -* It is advisable to keep Rancher up to date with patch releases. Performance improvements and bug fixes are made throughout the life of a minor release. You can review the release notes to help inform your own decisions on whether an upgrade is necessary but we recommend keeping yourself up to date in most cases. - -* Performance will be negatively impacted by increased latency between Rancher's infrastructure and a downstream cluster's infrastructure (eg. geographic distance). If a user or organization requires clusters/nodes all over the world or spread across many regions, it is best to use multiple Rancher installations. - -* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. 
It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms. - -## Minimizing Load on the local cluster -The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster. - -### Managing Your Object Counts -ETCD eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings. - -Rolebindings are created in the local cluster as a side effect of many operations. - -Considerations when attempting reduce rolebindings in the local cluster: -* Only add users to clusters and projects when necessary -* Remove clusters and projects when they are no longer needed -* Only use custom roles if necessary -* Use as few rules as possible in custom roles -* Consider whether adding a role to a user is redundant -* Consider that using less, but more powerful, clusters may be more efficient -* Experiment to see if creating new projects or creating new clusters manifests in fewer rolebindings for your specific use case. - -### Using New Apps Over Legacy Apps -There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster. 
- -We recommend removing apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only. - -### Using the Authorized Cluster Endpoint (ACE) -There is an _Authorized Cluster Endpoint_ option for Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to kubeconfigs generated for the cluster that uses a direct endpoint to the cluster and bypasses Rancher. However, it is not enough to only enable this option. The user of the Kubeconfig needs to use `kubectl use-context ` in order to start using it. - -Without using ACE, all kubeconfig requests first route through Rancher. - -### Experimental: Option to Reduce Event Handler Executions -The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help. - -The value for the environment variable can be a comma separated list of the following options. The values refer to types of controllers (the structures that contain and run handlers) and their handlers. Adding the controller types to the variable will disable that set of controllers from running their handlers as part of cache resyncing. - -* `mgmt` refers to management controllers which only run on one Rancher node. -* `user` refers to user controllers which run for every cluster. Some of these are ran on the same node as management controllers, while other run in the downstream cluster. 
This will option targets the former. -* `scaled` refers to scaled controllers which run on every Rancher node. This is not recommended to be set due to the critical functionality the scaled handlers are responsible for. - -In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`. - -## Optimizations Outside of Rancher -A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess. - -### Keeping Kubernetes Versions Up to Date -Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes. - -### Optimizing ETCD -The two main bottlenecks to [ETCD performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding ETCD performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks). - -Theoretically, the more nodes in an ETCD cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. 
It can also be inferred that ETCD performance will be inversely affected by distance between nodes as that will slow down network communication. diff --git a/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md new file mode 100644 index 00000000000..865f1d32f6e --- /dev/null +++ b/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md @@ -0,0 +1,100 @@ +--- +title: Tuning and Best Practices for Rancher at Scale +--- + + + + + + +This guide describes the best practices and tuning approaches to scale Rancher setups and the associated challenges with doing so. As systems grow, performance naturally degrades, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures. + +## Optimizing Rancher Performance + +* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback. + +* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause. + +* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance. If you require clusters or nodes spread across the world, consider multiple Rancher installations. 
+ +## Minimizing Load on the Upstream Cluster + +When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster. + +### Managing Your Object Counts + +Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limits on the number of objects of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`. + +This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect. + +You can reduce the number of `RoleBindings` in the upstream cluster in the following ways: +* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible. +* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles. +* Only add users to clusters and projects when necessary. +* Remove clusters and projects when they are no longer needed. +* Only use custom roles if necessary. +* Use as few rules as possible in custom roles. +* Consider whether adding a role to a user is redundant. +* Consider using fewer, but more powerful, clusters. +* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). 
Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects. +* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case. + +### RoleBinding Count Estimation + +Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate: +* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`. + * `C` is the total number of clusters. + * `U` is the total number of users. + * `Ua` is the average number of users with a membership on a cluster. + * `P` is the total number of projects. + * `Pa` is the average number of users with a membership on a project. +* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects. +* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users. + +### Using New Apps Over Legacy Apps + +Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. Current apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster. + +You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI. 
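
As a rough illustration of the RoleBinding Count Estimation formulas above, the following Python sketch evaluates them for a given setup. It is not part of Rancher: the function name and parameter names are hypothetical, and it assumes that `2UaC` means `2 × Ua × C` and `5Pa` means `5 × Pa`, read literally from the formula. The result is a lower bound, not an exact count.

```python
def estimate_min_rolebindings(clusters, users, avg_users_per_cluster,
                              projects, avg_users_per_project,
                              restricted_admins=0):
    """Return a minimum estimate of RoleBinding objects in the upstream cluster.

    Hypothetical helper: it only evaluates the documented formulas
    32C + U + 2UaC + 8P + 5Pa and, per Restricted Admin user, 7C + 2P + 2.
    """
    # Base estimate: 32C + U + 2*Ua*C + 8P + 5*Pa
    base = (32 * clusters
            + users
            + 2 * avg_users_per_cluster * clusters
            + 8 * projects
            + 5 * avg_users_per_project)
    # Each Restricted Admin user adds at least 7C + 2P + 2 objects.
    restricted = restricted_admins * (7 * clusters + 2 * projects + 2)
    return base + restricted


# Example: 10 clusters, 100 users, 5 users per cluster on average,
# 30 projects, 3 users per project on average, 2 Restricted Admin users.
estimate = estimate_min_rolebindings(10, 100, 5, 30, 3, restricted_admins=2)
print(estimate)  # 775 from the base formula + 264 for Restricted Admins = 1039
```

Comparing the estimate against the 60,000 objects-per-type figure mentioned earlier can give an early signal of whether a planned configuration is likely to run into performance issues.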
+ +### Using the Authorized Cluster Endpoint (ACE) + +An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions. + +### Reducing Event Handler Executions + +The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help. + +The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma-separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable prevents that set of controllers from running their handlers as part of cache resyncing. + +* `mgmt` refers to management controllers which only run on one Rancher node. +* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. 
This option targets the former. +* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability. + +In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.containers.env` list) with the value `mgmt,user`. + +## Optimizations Outside of Rancher + +The underlying cluster's own performance and configuration are important influencing factors. The upstream cluster, if misconfigured, can introduce a bottleneck that Rancher cannot resolve. + +### Manage Upstream Cluster Nodes Directly with RKE2 + +As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This can help you determine whether Rancher, Kubernetes, or operating system components are causing issues. + +Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high-scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services. + +Use RKE2 for large-scale use cases. + +### Keeping Kubernetes Versions Up to Date + +You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes. + +### Optimizing etcd + +Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance. + +The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed.
Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks). + +It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size). + +Etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication. Etcd nodes should be located together with Rancher nodes. diff --git a/versioned_docs/version-latest/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md b/versioned_docs/version-latest/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md index 8abfd0c9f6c..f387852c567 100644 --- a/versioned_docs/version-latest/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md +++ b/versioned_docs/version-latest/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md @@ -77,7 +77,15 @@ Like the authorized cluster endpoint, the `kube-api-auth` authentication service With this endpoint enabled for the downstream cluster, Rancher generates an extra Kubernetes context in the kubeconfig file in order to connect directly to the cluster. This file has the credentials for `kubectl` and `helm`. 
-You will need to use a context defined in this kubeconfig file to access the cluster if Rancher goes down. Therefore, we recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster. For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file.](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md) +:::note + +To use the ACE context in your kubeconfig, run `kubectl config use-context` with the name of the ACE context after enabling it. + +::: + +For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md). + +We recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster. ## Impersonation From bdf45d4ff06cc888b9eb838823908306a3811a3f Mon Sep 17 00:00:00 2001 From: Billy Tat Date: Thu, 12 Oct 2023 08:38:46 -0700 Subject: [PATCH 47/47] Add tuning and best practices to sidebars of latest & 2.8 Signed-off-by: Billy Tat --- versioned_sidebars/version-2.8-sidebars.json | 3 ++- versioned_sidebars/version-latest-sidebars.json | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/versioned_sidebars/version-2.8-sidebars.json b/versioned_sidebars/version-2.8-sidebars.json index 2219c8e78bb..dc211c7ddf4 100644 --- a/versioned_sidebars/version-2.8-sidebars.json +++ b/versioned_sidebars/version-2.8-sidebars.json @@ -791,7 +791,8 @@ "items": [ "reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere", "reference-guides/best-practices/rancher-server/rancher-deployment-strategy", - "reference-guides/best-practices/rancher-server/tips-for-running-rancher" + "reference-guides/best-practices/rancher-server/tips-for-running-rancher", +
"reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale" ] }, { diff --git a/versioned_sidebars/version-latest-sidebars.json b/versioned_sidebars/version-latest-sidebars.json index 5625fdaf67d..2afa9c9297a 100644 --- a/versioned_sidebars/version-latest-sidebars.json +++ b/versioned_sidebars/version-latest-sidebars.json @@ -791,7 +791,8 @@ "items": [ "reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere", "reference-guides/best-practices/rancher-server/rancher-deployment-strategy", - "reference-guides/best-practices/rancher-server/tips-for-running-rancher" + "reference-guides/best-practices/rancher-server/tips-for-running-rancher", + "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale" ] }, {