diff --git a/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
index e024f1dd779..7d803ff697e 100644
--- a/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
+++ b/docs/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
@@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations
-When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
+When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md), it is recommended to increase the etcd keyspace from the default 2 GB. The maximum setting is 8 GB, and the host should have enough RAM to keep the entire dataset in memory. When increasing this value, you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers.
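The `quota-backend-bytes` setting takes a value in bytes. As a minimal sketch, the conversion from a keyspace size to that value might look like the following. It assumes the 2 GB default and 8 GB maximum are binary gigabytes (GiB); the helper name `quota_backend_bytes` is hypothetical and not part of etcd or Rancher.

```python
# Assumption: the 2 GB default and 8 GB maximum are binary gigabytes (GiB).
GIB = 1024 ** 3
DEFAULT_QUOTA_GIB = 2
MAX_QUOTA_GIB = 8


def quota_backend_bytes(size_gib: int) -> int:
    """Return a byte value for quota-backend-bytes (hypothetical helper)."""
    if not DEFAULT_QUOTA_GIB <= size_gib <= MAX_QUOTA_GIB:
        raise ValueError(
            f"keyspace size must be between {DEFAULT_QUOTA_GIB} and {MAX_QUOTA_GIB} GiB"
        )
    return size_gib * GIB


if __name__ == "__main__":
    # Example: raise the keyspace from the default to 5 GiB.
    print(quota_backend_bytes(5))
```

The range check reflects the default and maximum stated above. How the resulting value is passed to etcd depends on how the upstream cluster was installed.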
diff --git a/docs/pages-for-subheaders/installation-requirements.md b/docs/pages-for-subheaders/installation-requirements.md
index b7214336b13..e90c3bbd087 100644
--- a/docs/pages-for-subheaders/installation-requirements.md
+++ b/docs/pages-for-subheaders/installation-requirements.md
@@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp
If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md)
-### RKE Specific Requirements
+### RKE2 Specific Requirements
-For the container runtime, RKE should work with any modern Docker version.
+RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs.
-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md)
+For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
### K3s Specific Requirements
@@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow
If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup.
-### RKE2 Specific Requirements
+### RKE Specific Requirements
-For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs.
+RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page.
-For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
+For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md).
## Hardware Requirements
-The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed.
+The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure.
-## CPU and Memory
+### Practical Considerations
-Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster.
+Rancher's hardware footprint depends on a number of factors, including:
-### RKE and Hosted Kubernetes
+ - Size of the managed infrastructure (e.g., node count, cluster count).
+ - Complexity of the desired access control rules (e.g., `RoleBinding` object count).
+ - Number of workloads (e.g., Kubernetes deployments, Fleet deployments).
+ - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users).
-These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed.
+Because many of these factors may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario, please [contact Rancher](https://rancher.com/contact/) for further guidance.
-These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS.
+In particular, requirements on this page are subject to typical use assumptions, which include:
+ - Under 60,000 total Kubernetes resources, per type.
+ - Up to 120 pods per node.
+ - Up to 200 CRDs in the upstream (local) cluster.
+ - Up to 100 CRDs in downstream clusters.
+ - Up to 50 Fleet deployments.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | ---------- | ------------ | -------| ------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB |
+Higher numbers are possible, but requirements might be higher. If you have more than 20,000 resources of the same type, loading the whole list through the Rancher UI might take several seconds.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::note Evolution:
-### K3s Kubernetes
+Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day.
-These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
+Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
+If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::
### RKE2 Kubernetes
-These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 2 | 5 GB |
-| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+| Larger (†) | (†) | (†) | (†) | (†) |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation.
+
+Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements).
+
+### K3s Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) |
+|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------|
+| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
+| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
+| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
+
+(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.
+
+(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements).
+
+### Hosted Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher).
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+### RKE
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os).
### Docker
-These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher.
+The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 1 | 4 GB |
-| Medium | Up to 15 | Up to 200 | 2 | 8 GB |
+Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|------|
+| Small | 5 | 50 | 1 | 4 GB |
+| Medium | 15 | 200 | 2 | 8 GB |
## Ingress
diff --git a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
deleted file mode 100644
index e8e919bde9b..00000000000
--- a/docs/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
+++ /dev/null
@@ -1,65 +0,0 @@
----
-title: Tips for Scaling Rancher
----
-
-
-
-
-
-This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups.
-
-## General Tips on Optimizing Rancher's Performance
-* It is advisable to keep Rancher up to date with patch releases. Performance improvements and bug fixes are made throughout the life of a minor release. You can review the release notes to help inform your own decisions on whether an upgrade is necessary but we recommend keeping yourself up to date in most cases.
-
-* Performance will be negatively impacted by increased latency between Rancher's infrastructure and a downstream cluster's infrastructure (eg. geographic distance). If a user or organization requires clusters/nodes all over the world or spread across many regions, it is best to use multiple Rancher installations.
-
-* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms.
-
-## Minimizing Load on the local cluster
-The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster.
-
-### Managing Your Object Counts
-ETCD eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings.
-
-Rolebindings are created in the local cluster as a side effect of many operations.
-
-Considerations when attempting reduce rolebindings in the local cluster:
-* Only add users to clusters and projects when necessary
-* Remove clusters and projects when they are no longer needed
-* Only use custom roles if necessary
-* Use as few rules as possible in custom roles
-* Consider whether adding a role to a user is redundant
-* Consider that using less, but more powerful, clusters may be more efficient
-* Experiment to see if creating new projects or creating new clusters manifests in fewer rolebindings for your specific use case.
-
-### Using New Apps Over Legacy Apps
-There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster.
-
-We recommend removing apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only.
-
-### Using the Authorized Cluster Endpoint (ACE)
-There is an _Authorized Cluster Endpoint_ option for Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to kubeconfigs generated for the cluster that uses a direct endpoint to the cluster and bypasses Rancher. However, it is not enough to only enable this option. The user of the Kubeconfig needs to use `kubectl use-context ` in order to start using it.
-
-Without using ACE, all kubeconfig requests first route through Rancher.
-
-### Experimental: Option to Reduce Event Handler Executions
-The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help.
-
-The value for the environment variable can be a comma separated list of the following options. The values refer to types of controllers (the structures that contain and run handlers) and their handlers. Adding the controller types to the variable will disable that set of controllers from running their handlers as part of cache resyncing.
-
-* `mgmt` refers to management controllers which only run on one Rancher node.
-* `user` refers to user controllers which run for every cluster. Some of these are ran on the same node as management controllers, while other run in the downstream cluster. This will option targets the former.
-* `scaled` refers to scaled controllers which run on every Rancher node. This is not recommended to be set due to the critical functionality the scaled handlers are responsible for.
-
-In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`.
-
-## Optimizations Outside of Rancher
-A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess.
-
-### Keeping Kubernetes Versions Up to Date
-Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes.
-
-### Optimizing ETCD
-The two main bottlenecks to [ETCD performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding ETCD performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks).
-
-Theoretically, the more nodes in an ETCD cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. It can also be inferred that ETCD performance will be inversely affected by distance between nodes as that will slow down network communication.
diff --git a/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
new file mode 100644
index 00000000000..865f1d32f6e
--- /dev/null
+++ b/docs/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -0,0 +1,100 @@
+---
+title: Tuning and Best Practices for Rancher at Scale
+---
+
+
+
+
+
+
+This guide describes best practices and tuning approaches for scaling Rancher setups, along with the associated challenges. As systems grow, performance naturally degrades, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures.
+
+## Optimizing Rancher Performance
+
+* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback.
+
+* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause.
+
+* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance. If you require clusters or nodes spread across the world, consider multiple Rancher installations.
+
+## Minimizing Load on the Upstream Cluster
+
+When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster.
+
+### Managing Your Object Counts
+
+Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`.
+
+This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect.
+
+You can reduce the number of `RoleBindings` in the upstream cluster in the following ways:
+* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible.
+* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles.
+* Only add users to clusters and projects when necessary.
+* Remove clusters and projects when they are no longer needed.
+* Only use custom roles if necessary.
+* Use as few rules as possible in custom roles.
+* Consider whether adding a role to a user is redundant.
+* Consider using fewer, but more powerful, clusters.
+* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects.
+* Experiment to see if creating new projects or clusters results in fewer `RoleBindings` for your specific use case.
+
+### RoleBinding Count Estimation
+
+Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate:
+* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`.
+ * `C` is the total number of clusters.
+ * `U` is the total number of users.
+ * `Ua` is the average number of users with a membership on a cluster.
+ * `P` is the total number of projects.
+ * `Pa` is the average number of users with a membership on a project.
+* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects.
+* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users.
+
+### Using New Apps Over Legacy Apps
+
+Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. `apps.catalog.cattle.io` apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster.
+
+You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI.
+
+### Using the Authorized Cluster Endpoint (ACE)
+
+An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions.
+
+### Reducing Event Handler Executions
+
+The bulk of Rancher's logic occurs in event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups, these scheduled runs come with huge performance costs because every handler runs on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help.
+
+The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma-separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding a controller type to the variable stops that set of controllers from running their handlers as part of cache resyncing.
+
+* `mgmt` refers to management controllers which only run on one Rancher node.
+* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former.
+* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability.
+
+In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.containers.env` list) with the value `mgmt,user`.
+
+## Optimizations Outside of Rancher
+
+Important influencing factors are the underlying cluster's own performance and configuration. The upstream cluster, if misconfigured, can introduce a bottleneck Rancher software has no chance to resolve.
+
+### Manage Upstream Cluster Nodes Directly with RKE2
+
+As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This helps determine whether Rancher, Kubernetes, or operating system components are causing issues.
+
+Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high-scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services.
+
+Use RKE2 for large-scale use cases.
+
+### Keeping Kubernetes Versions Up to Date
+
+You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes.
+
+### Optimizing etcd
+
+Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance.
+
+The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed. Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks).
+
+It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size).
+
+Network latency between nodes also degrades etcd performance, because it slows down communication between etcd members. Etcd nodes should be located together with Rancher nodes.
diff --git a/docs/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md b/docs/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
index 586b55b0db8..08639d4b819 100644
--- a/docs/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
+++ b/docs/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
@@ -78,7 +78,15 @@ Like the authorized cluster endpoint, the `kube-api-auth` authentication service
With this endpoint enabled for the downstream cluster, Rancher generates an extra Kubernetes context in the kubeconfig file in order to connect directly to the cluster. This file has the credentials for `kubectl` and `helm`.
-You will need to use a context defined in this kubeconfig file to access the cluster if Rancher goes down. Therefore, we recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster. For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file.](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md)
+:::note
+
+To use the ACE context in your kubeconfig, run `kubectl config use-context <ace-context-name>` after enabling it.
+
+:::
+
+For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md).
+
+We recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster.
## Impersonation
diff --git a/docusaurus.config.js b/docusaurus.config.js
index f012de889d2..b34fd0d83ee 100644
--- a/docusaurus.config.js
+++ b/docusaurus.config.js
@@ -1221,6 +1221,14 @@ module.exports = {
{
to: "/v2.7/reference-guides/rancher-security/hardening-guides/rke2-hardening-guide/rke2-self-assessment-guide-with-cis-v1.7-k8s-v1.25-v1.26-v1.27",
from: "/v2.7/reference-guides/rancher-security/hardening-guides/rke2-hardening-guide/rke2-self-assessment-guide-with-cis-v1.23-k8s-v1.25"
+ },
+ {
+ to: "/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale",
+ from: "/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher"
+ },
+ {
+ to: "/v2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale",
+ from: "/v2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher"
}
],
},
diff --git a/sidebars.js b/sidebars.js
index b4e164c5130..b70797bb6dc 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -828,7 +828,8 @@ const sidebars = {
items: [
"reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere",
"reference-guides/best-practices/rancher-server/rancher-deployment-strategy",
- "reference-guides/best-practices/rancher-server/tips-for-running-rancher"
+ "reference-guides/best-practices/rancher-server/tips-for-running-rancher",
+ "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale"
]
},
{
diff --git a/versioned_docs/version-2.6/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/versioned_docs/version-2.6/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
index e024f1dd779..7d803ff697e 100644
--- a/versioned_docs/version-2.6/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
+++ b/versioned_docs/version-2.6/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
@@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations
-When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
+When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers.
diff --git a/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md
index 1f06f1167b6..2c49efaa834 100644
--- a/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md
+++ b/versioned_docs/version-2.6/pages-for-subheaders/installation-requirements.md
@@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp
If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md)
-### RKE Specific Requirements
+### RKE2 Specific Requirements
-For the container runtime, RKE should work with any modern Docker version.
+RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs.
-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md)
+For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
### K3s Specific Requirements
@@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow
If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup.
-### RKE2 Specific Requirements
+### RKE Specific Requirements
-For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs.
+RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page.
-For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/rancher-v2-6-10/).
+For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md).
## Hardware Requirements
-The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed.
+The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure.
-## CPU and Memory
+### Practical Considerations
-Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster.
+Rancher's hardware footprint depends on a number of factors, including:
-### RKE and Hosted Kubernetes
+ - Size of the managed infrastructure (e.g., node count, cluster count).
+ - Complexity of the desired access control rules (e.g., `RoleBinding` object count).
+ - Number of workloads (e.g., Kubernetes deployments, Fleet deployments).
+ - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users).
-These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed.
+Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance.
-These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS.
+In particular, requirements on this page are subject to typical use assumptions, which include:
+ - Under 60,000 total Kubernetes resources, per type.
+ - Up to 120 pods per node.
+ - Up to 200 CRDs in the upstream (local) cluster.
+ - Up to 100 CRDs in downstream clusters.
+ - Up to 50 Fleet deployments.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | ---------- | ------------ | -------| ------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB |
+Higher numbers are possible but requirements might be higher. If you have more than 20,000 resources of the same type, loading time of the whole list through the Rancher UI might take several seconds.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::note Evolution:
-### K3s Kubernetes
+Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day.
-These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
+Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
+If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::
### RKE2 Kubernetes
-These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 2 | 5 GB |
-| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+| Larger (†) | (†) | (†) | (†) | (†) |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation.
+
+Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements).
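
As a rough illustration of how to read the table above, the following sketch picks the smallest listed tier that covers a given cluster and node count. The function name and the tuple layout are hypothetical; the numbers are copied from the RKE2 table, and anything beyond the "Large" row returns `None` because the table gives no figures for it.

```python
# Rows copied from the RKE2 table: (tier, max clusters, max nodes, vCPUs, RAM in GB).
TIERS = [
    ("Small", 150, 1500, 4, 16),
    ("Medium", 300, 3000, 8, 32),
    ("Large", 500, 5000, 16, 64),
]


def minimum_node_size(clusters, nodes):
    """Return (tier, vCPUs, RAM in GB) for the smallest tier that fits, or None.

    Hypothetical helper: None means the table lists no values for that size
    and a custom evaluation is needed.
    """
    for tier, max_clusters, max_nodes, vcpus, ram_gb in TIERS:
        if clusters <= max_clusters and nodes <= max_nodes:
            return tier, vcpus, ram_gb
    return None


print(minimum_node_size(200, 1000))  # cluster count exceeds "Small", so "Medium" applies
print(minimum_node_size(600, 100))   # beyond "Large": no listed values
```

Both limits must hold for a tier to apply, so a deployment with few nodes but many clusters still lands in the higher tier.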
+
+### K3s Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) |
+|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------|
+| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
+| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
+| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
+
+(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.
+
+(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements).
+
+### Hosted Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher).
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+### RKE
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os).
### Docker
-These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher.
+The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 1 | 4 GB |
-| Medium | Up to 15 | Up to 200 | 2 | 8 GB |
+Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|------|
+| Small | 5 | 50 | 1 | 4 GB |
+| Medium | 15 | 200 | 2 | 8 GB |
## Ingress
diff --git a/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
new file mode 100644
index 00000000000..865f1d32f6e
--- /dev/null
+++ b/versioned_docs/version-2.6/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -0,0 +1,100 @@
+---
+title: Tuning and Best Practices for Rancher at Scale
+---
+
+
+
+
+
+
+This guide describes best practices and tuning approaches for scaling Rancher setups, along with the associated challenges. As systems grow, performance naturally degrades, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures.
+
+## Optimizing Rancher Performance
+
+* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback.
+
+* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause.
+
+* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if you require clusters or nodes spread across the world, consider multiple Rancher installations.
+
+## Minimizing Load on the Upstream Cluster
+
+When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster.
+
+### Managing Your Object Counts
+
+Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`.
+
+This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect.
+
+You can reduce the number of `RoleBindings` in the upstream cluster in the following ways:
+* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible.
+* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles.
+* Only add users to clusters and projects when necessary.
+* Remove clusters and projects when they are no longer needed.
+* Only use custom roles if necessary.
+* Use as few rules as possible in custom roles.
+* Consider whether adding a role to a user is redundant.
+* Consider using fewer, but more powerful, clusters.
+* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects.
+* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case.
+
+### RoleBinding Count Estimation
+
+Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate:
+* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`.
+ * `C` is the total number of clusters.
+ * `U` is the total number of users.
+ * `Ua` is the average number of users with a membership on a cluster.
+ * `P` is the total number of projects.
+ * `Pa` is the average number of users with a membership on a project.
+* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects.
+* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users.
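
The estimate above can be sketched as a small calculation. This is only an illustration: the function and parameter names are hypothetical, the term `2UaC` is read as `2 * Ua * C`, and the Restricted Admin term is added once per user with that role. The result is a lower bound, not a prediction.

```python
def estimate_min_rolebindings(clusters, users, avg_users_per_cluster,
                              projects, avg_users_per_project,
                              restricted_admins=0):
    """Lower-bound RoleBinding estimate: 32C + U + 2*Ua*C + 8P + 5Pa.

    Each Restricted Admin user adds at least 7C + 2P + 2 more objects.
    """
    base = (32 * clusters
            + users
            + 2 * avg_users_per_cluster * clusters
            + 8 * projects
            + 5 * avg_users_per_project)
    per_restricted_admin = 7 * clusters + 2 * projects + 2
    return base + restricted_admins * per_restricted_admin


# Made-up example: 10 clusters, 100 users, 5 users per cluster on average,
# 20 projects, 3 users per project on average, 2 Restricted Admins.
print(estimate_min_rolebindings(10, 100, 5, 20, 3, restricted_admins=2))
```

Comparing the result against the 60,000 objects-per-type figure mentioned earlier gives a first indication of how much headroom a planned configuration leaves.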
+
+### Using New Apps Over Legacy Apps
+
+Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. Apps of type `apps.catalog.cattle.io` are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster.
+
+You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI.
+
+### Using the Authorized Cluster Endpoint (ACE)
+
+An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions.
+
+### Reducing Event Handler Executions
+
+The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help.
+
+The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding the controller types to the variable disables that set of controllers from running their handlers as part of cache resyncing.
+
+* `mgmt` refers to management controllers which only run on one Rancher node.
+* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former.
+* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability.
+
+In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.containers.env` list) with the value `mgmt,user`.
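
As a minimal sketch of how the value is composed, the following builds the `name`/`value` entry that would go into the container's `env` list. The helper name is hypothetical; only the variable name and the three option names come from the text above. How the entry is applied to the Rancher deployment (for example through Helm values or a patch) is outside the scope of this sketch.

```python
# Controller types accepted by CATTLE_SYNC_ONLY_CHANGED_OBJECTS, per the list above.
ALLOWED_TYPES = ("mgmt", "user", "scaled")


def sync_only_changed_env(controller_types):
    """Build the env entry for the given controller types (hypothetical helper).

    Raises ValueError for names that are not in the documented option list.
    """
    unknown = [t for t in controller_types if t not in ALLOWED_TYPES]
    if unknown:
        raise ValueError(f"unknown controller types: {unknown}")
    return {
        "name": "CATTLE_SYNC_ONLY_CHANGED_OBJECTS",
        "value": ",".join(controller_types),
    }


# The combination recommended above; "scaled" is deliberately left out.
print(sync_only_changed_env(["mgmt", "user"]))
```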
+
+## Optimizations Outside of Rancher
+
+Important influencing factors are the underlying cluster's own performance and configuration. The upstream cluster, if misconfigured, can introduce a bottleneck Rancher software has no chance to resolve.
+
+### Manage Upstream Cluster Nodes Directly with RKE2
+
+As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This can aid in distinguishing between whether Rancher, Kubernetes, or operating system components are causing issues.
+
+Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services.
+
+Use RKE2 for large scale use cases.
+
+### Keeping Kubernetes Versions Up to Date
+
+You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes.
+
+### Optimizing etcd
+
+Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance.
+
+The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed. Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks).
+
+It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size).
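
The reasoning behind the three-node recommendation can be illustrated with quorum arithmetic. Etcd commits a write only after a majority of members has acknowledged it, so each additional member adds replication work. The helper names below are hypothetical and the sketch only shows the arithmetic, not etcd's actual implementation.

```python
def quorum_size(members):
    """Smallest majority of a cluster with the given member count."""
    return members // 2 + 1


def failure_tolerance(members):
    """How many members can fail while a majority is still available."""
    return members - quorum_size(members)


for members in (1, 3, 4, 5):
    print(members, quorum_size(members), failure_tolerance(members))
```

Three members tolerate one failure. A fourth member raises the quorum to three without tolerating any additional failure, and a fifth tolerates two failures at the cost of replicating every write to more nodes.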
+
+Network latency between etcd nodes also reduces performance, because it slows replication between members. Etcd nodes should be located together with Rancher nodes.
diff --git a/versioned_docs/version-2.7/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/versioned_docs/version-2.7/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
index e024f1dd779..7d803ff697e 100644
--- a/versioned_docs/version-2.7/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
+++ b/versioned_docs/version-2.7/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
@@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations
-When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
+When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers.
diff --git a/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md
index b7214336b13..e90c3bbd087 100644
--- a/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md
+++ b/versioned_docs/version-2.7/pages-for-subheaders/installation-requirements.md
@@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp
If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md)
-### RKE Specific Requirements
+### RKE2 Specific Requirements
-For the container runtime, RKE should work with any modern Docker version.
+RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs.
-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md)
+For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
### K3s Specific Requirements
@@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow
If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup.
-### RKE2 Specific Requirements
+### RKE Specific Requirements
-For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs.
+RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page.
-For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
+For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md).
## Hardware Requirements
-The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed.
+The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure.
-## CPU and Memory
+### Practical Considerations
-Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster.
+Rancher's hardware footprint depends on a number of factors, including:
-### RKE and Hosted Kubernetes
+ - Size of the managed infrastructure (e.g., node count, cluster count).
+ - Complexity of the desired access control rules (e.g., `RoleBinding` object count).
+ - Number of workloads (e.g., Kubernetes deployments, Fleet deployments).
+ - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users).
-These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed.
+Since there are a high number of influencing factors that may vary over time, the requirements listed here should be understood as reasonable starting points that work well for most use cases. Nevertheless, your use case may have different requirements. For inquiries about a specific scenario please [contact Rancher](https://rancher.com/contact/) for further guidance.
-These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS.
+In particular, requirements on this page are subject to typical use assumptions, which include:
+ - Under 60,000 total Kubernetes resources, per type.
+ - Up to 120 pods per node.
+ - Up to 200 CRDs in the upstream (local) cluster.
+ - Up to 100 CRDs in downstream clusters.
+ - Up to 50 Fleet deployments.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | ---------- | ------------ | -------| ------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB |
+Higher numbers are possible, but hardware requirements may increase accordingly. If you have more than 20,000 resources of the same type, loading the whole list through the Rancher UI might take several seconds.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::note Evolution:
-### K3s Kubernetes
+Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day.
-These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
+Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
+If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::
### RKE2 Kubernetes
-These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 2 | 5 GB |
-| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+| Larger (†) | (†) | (†) | (†) | (†) |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation.
+
+Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements).
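
As a rough illustration of how the sizing table above can be read, the following Python sketch maps a planned cluster and node count to the smallest tier that covers both. The tier values are copied from the RKE2 table; the names `Tier`, `TIERS`, and `pick_tier` are hypothetical and not part of any Rancher tooling.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    max_clusters: int
    max_nodes: int
    vcpus: int    # minimum vCPUs per upstream cluster node
    ram_gb: int   # minimum RAM per upstream cluster node, in GB


# Values copied from the RKE2 table above, ordered from smallest to largest.
TIERS = [
    Tier("Small", 150, 1500, 4, 16),
    Tier("Medium", 300, 3000, 8, 32),
    Tier("Large", 500, 5000, 16, 64),
]


def pick_tier(clusters: int, nodes: int) -> Optional[Tier]:
    """Return the smallest tier whose cluster and node limits both fit.

    Returns None when the infrastructure exceeds the "Large" tier, which
    corresponds to the "Larger" row that needs a custom evaluation.
    """
    for tier in TIERS:
        if clusters <= tier.max_clusters and nodes <= tier.max_nodes:
            return tier
    return None


if __name__ == "__main__":
    tier = pick_tier(clusters=100, nodes=2000)
    print(tier.name if tier else "Larger: contact Rancher")
```

Both limits have to fit, so 100 clusters with 2000 nodes already lands in the Medium tier. The result is only a starting point under the typical use assumptions listed earlier on this page.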
+
+### K3s Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) |
+|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------|
+| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
+| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
+| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
+
+(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.
+
+(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements).
+
+### Hosted Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher).
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+### RKE
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os).
### Docker
-These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher.
+The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 1 | 4 GB |
-| Medium | Up to 15 | Up to 200 | 2 | 8 GB |
+Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|------|
+| Small | 5 | 50 | 1 | 4 GB |
+| Medium | 15 | 200 | 2 | 8 GB |
## Ingress
diff --git a/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
deleted file mode 100644
index e8e919bde9b..00000000000
--- a/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
+++ /dev/null
@@ -1,65 +0,0 @@
----
-title: Tips for Scaling Rancher
----
-
-
-
-
-
-This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups.
-
-## General Tips on Optimizing Rancher's Performance
-* It is advisable to keep Rancher up to date with patch releases. Performance improvements and bug fixes are made throughout the life of a minor release. You can review the release notes to help inform your own decisions on whether an upgrade is necessary but we recommend keeping yourself up to date in most cases.
-
-* Performance will be negatively impacted by increased latency between Rancher's infrastructure and a downstream cluster's infrastructure (eg. geographic distance). If a user or organization requires clusters/nodes all over the world or spread across many regions, it is best to use multiple Rancher installations.
-
-* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms.
-
-## Minimizing Load on the local cluster
-The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster.
-
-### Managing Your Object Counts
-ETCD eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings.
-
-Rolebindings are created in the local cluster as a side effect of many operations.
-
-Considerations when attempting reduce rolebindings in the local cluster:
-* Only add users to clusters and projects when necessary
-* Remove clusters and projects when they are no longer needed
-* Only use custom roles if necessary
-* Use as few rules as possible in custom roles
-* Consider whether adding a role to a user is redundant
-* Consider that using less, but more powerful, clusters may be more efficient
-* Experiment to see if creating new projects or creating new clusters manifests in fewer rolebindings for your specific use case.
-
-### Using New Apps Over Legacy Apps
-There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster.
-
-We recommend removing apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only.
-
-### Using the Authorized Cluster Endpoint (ACE)
-There is an _Authorized Cluster Endpoint_ option for Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to kubeconfigs generated for the cluster that uses a direct endpoint to the cluster and bypasses Rancher. However, it is not enough to only enable this option. The user of the Kubeconfig needs to use `kubectl use-context ` in order to start using it.
-
-Without using ACE, all kubeconfig requests first route through Rancher.
-
-### Experimental: Option to Reduce Event Handler Executions
-The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help.
-
-The value for the environment variable can be a comma separated list of the following options. The values refer to types of controllers (the structures that contain and run handlers) and their handlers. Adding the controller types to the variable will disable that set of controllers from running their handlers as part of cache resyncing.
-
-* `mgmt` refers to management controllers which only run on one Rancher node.
-* `user` refers to user controllers which run for every cluster. Some of these are ran on the same node as management controllers, while other run in the downstream cluster. This will option targets the former.
-* `scaled` refers to scaled controllers which run on every Rancher node. This is not recommended to be set due to the critical functionality the scaled handlers are responsible for.
-
-In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`.
-
-## Optimizations Outside of Rancher
-A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess.
-
-### Keeping Kubernetes Versions Up to Date
-Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes.
-
-### Optimizing ETCD
-The two main bottlenecks to [ETCD performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding ETCD performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks).
-
-Theoretically, the more nodes in an ETCD cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. It can also be inferred that ETCD performance will be inversely affected by distance between nodes as that will slow down network communication.
diff --git a/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
new file mode 100644
index 00000000000..865f1d32f6e
--- /dev/null
+++ b/versioned_docs/version-2.7/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -0,0 +1,100 @@
+---
+title: Tuning and Best Practices for Rancher at Scale
+---
+
+
+
+
+
+
+This guide describes the best practices and tuning approaches to scale Rancher setups and the associated challenges with doing so. As systems grow, performance will naturally reduce, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures.
+
+## Optimizing Rancher Performance
+
+* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback.
+
+* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause.
+
+* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance - if you require clusters or nodes spread across the world, consider multiple Rancher installations.
+
+## Minimizing Load on the Upstream Cluster
+
+When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster.
+
+### Managing Your Object Counts
+
+Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`.
+
+This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect.
+
+You can reduce the number of `RoleBindings` in the upstream cluster in the following ways:
+* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible.
+* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles.
+* Only add users to clusters and projects when necessary.
+* Remove clusters and projects when they are no longer needed.
+* Only use custom roles if necessary.
+* Use as few rules as possible in custom roles.
+* Consider whether adding a role to a user is redundant.
+* Consider using fewer, but more powerful, clusters.
+* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects.
+* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case.
+
+### RoleBinding Count Estimation
+
+Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate:
+* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`.
+ * `C` is the total number of clusters.
+ * `U` is the total number of users.
+ * `Ua` is the average number of users with a membership on a cluster.
+ * `P` is the total number of projects.
+ * `Pa` is the average number of users with a membership on a project.
+* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects.
+* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users.
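
The estimate above can be sketched as a small Python helper. This is only a transcription of the two formulas as written, with the term `2UaC` read as `2 * Ua * C`; the function and parameter names are hypothetical, and the result is a lower bound rather than an exact count.

```python
def estimate_rolebindings(clusters: int, users: int,
                          avg_users_per_cluster: float,
                          projects: int,
                          avg_users_per_project: float) -> float:
    """Minimum RoleBinding estimate: 32C + U + 2UaC + 8P + 5Pa."""
    return (32 * clusters
            + users
            + 2 * avg_users_per_cluster * clusters
            + 8 * projects
            + 5 * avg_users_per_project)


def restricted_admin_rolebindings(clusters: int, projects: int,
                                  restricted_admins: int = 1) -> int:
    """Additional RoleBindings: at least 7C + 2P + 2 per Restricted Admin user."""
    return restricted_admins * (7 * clusters + 2 * projects + 2)


if __name__ == "__main__":
    # Example inputs are made up for illustration only.
    base = estimate_rolebindings(clusters=100, users=500,
                                 avg_users_per_cluster=10,
                                 projects=300, avg_users_per_project=5)
    extra = restricted_admin_rolebindings(clusters=100, projects=300,
                                          restricted_admins=3)
    print(base, extra, base + extra)
```

Comparing the total against the 60,000 objects per type threshold mentioned earlier gives a quick sense of how much headroom a planned configuration leaves.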
+
+### Using New Apps Over Legacy Apps
+
+Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. `apps.catalog.cattle.io` apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster.
+
+You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI.
+
+### Using the Authorized Cluster Endpoint (ACE)
+
+An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions.
+
+### Reducing Event Handler Executions
+
+The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help.
+
+The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma-separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding a controller type to the variable prevents that set of controllers from running their handlers as part of cache resyncing.
+
+* `mgmt` refers to management controllers which only run on one Rancher node.
+* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former.
+* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability.
+
+In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.containers.env` list) with the value `mgmt,user`.
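
To make the expected value format concrete, here is a minimal Python sketch that builds the environment variable entry from a list of controller types. The helper name `build_sync_env` is hypothetical; only the variable name and the three accepted values (`mgmt`, `user`, `scaled`) come from the text above. The returned dictionary follows the usual name/value shape of a Kubernetes container `env` entry.

```python
ENV_NAME = "CATTLE_SYNC_ONLY_CHANGED_OBJECTS"
ALLOWED_TYPES = ("mgmt", "user", "scaled")


def build_sync_env(controller_types):
    """Return an env entry ({"name": ..., "value": ...}) for the Rancher container.

    Unknown controller types are rejected; duplicates are dropped while
    preserving the order in which the types were given.
    """
    selected = []
    for controller_type in controller_types:
        if controller_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown controller type: {controller_type!r}")
        if controller_type not in selected:
            selected.append(controller_type)
    return {"name": ENV_NAME, "value": ",".join(selected)}


if __name__ == "__main__":
    # The recommended combination from the text: management and user controllers.
    print(build_sync_env(["mgmt", "user"]))
```

The sketch accepts `scaled` because it is a valid value, but as noted above, setting it is discouraged.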
+
+## Optimizations Outside of Rancher
+
+Important influencing factors are the underlying cluster's own performance and configuration. The upstream cluster, if misconfigured, can introduce a bottleneck Rancher software has no chance to resolve.
+
+### Manage Upstream Cluster Nodes Directly with RKE2
+
+As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This can help determine whether Rancher, Kubernetes, or operating system components are causing issues.
+
+Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services.
+
+Use RKE2 for large scale use cases.
+
+### Keeping Kubernetes Versions Up to Date
+
+You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes.
+
+### Optimizing etcd
+
+Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance.
+
+The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed. Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks).
+
+It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size).
+
+Etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication. Etcd nodes should be located together with Rancher nodes.
diff --git a/versioned_docs/version-2.8/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/versioned_docs/version-2.8/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
index e024f1dd779..7d803ff697e 100644
--- a/versioned_docs/version-2.8/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
+++ b/versioned_docs/version-2.8/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
@@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations
-When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
+When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers.
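
Because `quota-backend-bytes` takes a value in bytes, a small conversion helper can avoid arithmetic mistakes. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes that the sizes quoted above are binary gigabytes (1 GB = 1024^3 bytes). It rejects values above the 8 GB maximum mentioned in the text.

```python
BYTES_PER_GB = 1024 ** 3   # assumption: sizes are binary gigabytes
MAX_QUOTA_GB = 8           # maximum setting stated in the text above


def quota_backend_bytes(size_gb: int) -> int:
    """Convert a keyspace size in GB to a byte count for quota-backend-bytes."""
    if size_gb <= 0 or size_gb > MAX_QUOTA_GB:
        raise ValueError(f"size must be between 1 and {MAX_QUOTA_GB} GB")
    return size_gb * BYTES_PER_GB


if __name__ == "__main__":
    # Example: raise the keyspace from the 2 GB default to 5 GB.
    print(f"quota-backend-bytes={quota_backend_bytes(5)}")
```

How the resulting number is passed to etcd depends on how the cluster was provisioned, so check your distribution's documentation for the exact configuration key.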
diff --git a/versioned_docs/version-2.8/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-2.8/pages-for-subheaders/installation-requirements.md
index b7214336b13..e90c3bbd087 100644
--- a/versioned_docs/version-2.8/pages-for-subheaders/installation-requirements.md
+++ b/versioned_docs/version-2.8/pages-for-subheaders/installation-requirements.md
@@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp
If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md)
-### RKE Specific Requirements
+### RKE2 Specific Requirements
-For the container runtime, RKE should work with any modern Docker version.
+RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs.
-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md)
+For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
### K3s Specific Requirements
@@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow
If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup.
-### RKE2 Specific Requirements
+### RKE Specific Requirements
-For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs.
+RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page.
-For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
+For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md).
## Hardware Requirements
-The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed.
+The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure.
-## CPU and Memory
+### Practical Considerations
-Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster.
+Rancher's hardware footprint depends on a number of factors, including:
-### RKE and Hosted Kubernetes
+ - Size of the managed infrastructure (e.g., node count, cluster count).
+ - Complexity of the desired access control rules (e.g., `RoleBinding` object count).
+ - Number of workloads (e.g., Kubernetes deployments, Fleet deployments).
+ - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users).
-These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed.
+Because many of these factors vary over time, treat the requirements listed here as reasonable starting points that work well for most use cases. Your use case may still have different requirements. For inquiries about a specific scenario, [contact Rancher](https://rancher.com/contact/) for further guidance.
-These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS.
+In particular, requirements on this page are subject to typical use assumptions, which include:
+ - Under 60,000 total Kubernetes resources, per type.
+ - Up to 120 pods per node.
+ - Up to 200 CRDs in the upstream (local) cluster.
+ - Up to 100 CRDs in downstream clusters.
+ - Up to 50 Fleet deployments.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | ---------- | ------------ | -------| ------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB |
+Higher numbers are possible, but hardware requirements may increase accordingly. If you have more than 20,000 resources of the same type, loading the whole list through the Rancher UI might take several seconds.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::note Evolution:
-### K3s Kubernetes
+Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day.
-These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
+Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
+If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::
### RKE2 Kubernetes
-These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 2 | 5 GB |
-| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+| Larger (†) | (†) | (†) | (†) | (†) |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation.
+
+Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements).
+
+### K3s Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) |
+|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------|
+| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
+| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
+| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
+
+(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.
+
+(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements).
+
+### Hosted Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher).
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+### RKE
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os).
### Docker
-These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher.
+The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 1 | 4 GB |
-| Medium | Up to 15 | Up to 200 | 2 | 8 GB |
+Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|------|
+| Small | 5 | 50 | 1 | 4 GB |
+| Medium | 15 | 200 | 2 | 8 GB |
## Ingress
diff --git a/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
deleted file mode 100644
index e8e919bde9b..00000000000
--- a/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
+++ /dev/null
@@ -1,65 +0,0 @@
----
-title: Tips for Scaling Rancher
----
-
-
-
-
-
-This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups.
-
-## General Tips on Optimizing Rancher's Performance
-* It is advisable to keep Rancher up to date with patch releases. Performance improvements and bug fixes are made throughout the life of a minor release. You can review the release notes to help inform your own decisions on whether an upgrade is necessary but we recommend keeping yourself up to date in most cases.
-
-* Performance will be negatively impacted by increased latency between Rancher's infrastructure and a downstream cluster's infrastructure (eg. geographic distance). If a user or organization requires clusters/nodes all over the world or spread across many regions, it is best to use multiple Rancher installations.
-
-* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms.
-
-## Minimizing Load on the local cluster
-The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster.
-
-### Managing Your Object Counts
-ETCD eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings.
-
-Rolebindings are created in the local cluster as a side effect of many operations.
-
-Considerations when attempting reduce rolebindings in the local cluster:
-* Only add users to clusters and projects when necessary
-* Remove clusters and projects when they are no longer needed
-* Only use custom roles if necessary
-* Use as few rules as possible in custom roles
-* Consider whether adding a role to a user is redundant
-* Consider that using less, but more powerful, clusters may be more efficient
-* Experiment to see if creating new projects or creating new clusters manifests in fewer rolebindings for your specific use case.
-
-### Using New Apps Over Legacy Apps
-There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster.
-
-We recommend removing apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only.
-
-### Using the Authorized Cluster Endpoint (ACE)
-There is an _Authorized Cluster Endpoint_ option for Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to kubeconfigs generated for the cluster that uses a direct endpoint to the cluster and bypasses Rancher. However, it is not enough to only enable this option. The user of the Kubeconfig needs to use `kubectl use-context ` in order to start using it.
-
-Without using ACE, all kubeconfig requests first route through Rancher.
-
-### Experimental: Option to Reduce Event Handler Executions
-The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help.
-
-The value for the environment variable can be a comma separated list of the following options. The values refer to types of controllers (the structures that contain and run handlers) and their handlers. Adding the controller types to the variable will disable that set of controllers from running their handlers as part of cache resyncing.
-
-* `mgmt` refers to management controllers which only run on one Rancher node.
-* `user` refers to user controllers which run for every cluster. Some of these are ran on the same node as management controllers, while other run in the downstream cluster. This will option targets the former.
-* `scaled` refers to scaled controllers which run on every Rancher node. This is not recommended to be set due to the critical functionality the scaled handlers are responsible for.
-
-In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`.
-
-## Optimizations Outside of Rancher
-A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess.
-
-### Keeping Kubernetes Versions Up to Date
-Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes.
-
-### Optimizing ETCD
-The two main bottlenecks to [ETCD performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding ETCD performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks).
-
-Theoretically, the more nodes in an ETCD cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. It can also be inferred that ETCD performance will be inversely affected by distance between nodes as that will slow down network communication.
diff --git a/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
new file mode 100644
index 00000000000..865f1d32f6e
--- /dev/null
+++ b/versioned_docs/version-2.8/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -0,0 +1,100 @@
+---
+title: Tuning and Best Practices for Rancher at Scale
+---
+
+
+
+
+
+
+This guide describes best practices and tuning approaches for scaling Rancher setups, along with the associated challenges. As systems grow, performance naturally degrades, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures.
+
+## Optimizing Rancher Performance
+
+* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback.
+
+* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause.
+
+* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance. If you require clusters or nodes spread across the world, consider multiple Rancher installations.
+
+## Minimizing Load on the Upstream Cluster
+
+When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster.
+
+### Managing Your Object Counts
+
+Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limitations to the number of a single Kubernetes resource type it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`.
+
+This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect.
+
+You can reduce the number of `RoleBindings` in the upstream cluster in the following ways:
+* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible.
+* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles.
+* Only add users to clusters and projects when necessary.
+* Remove clusters and projects when they are no longer needed.
+* Only use custom roles if necessary.
+* Use as few rules as possible in custom roles.
+* Consider whether adding a role to a user is redundant.
+* Consider using fewer, but more powerful, clusters.
+* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects.
+* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case.
+
+### RoleBinding Count Estimation
+
+Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate:
+* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`.
+ * `C` is the total number of clusters.
+ * `U` is the total number of users.
+ * `Ua` is the average number of users with a membership on a cluster.
+ * `P` is the total number of projects.
+ * `Pa` is the average number of users with a membership on a project.
+* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects.
+* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users.
+
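The estimate above can be sketched as a small calculation. This is only an illustration of the documented formulas, assuming that `2UaC` means `2 × Ua × C` and `5Pa` means `5 × Pa`; the function and parameter names are hypothetical and are not part of Rancher.

```python
def estimate_min_rolebindings(clusters, users, avg_users_per_cluster,
                              projects, avg_users_per_project):
    """Minimum RoleBinding estimate: 32C + U + 2*Ua*C + 8P + 5*Pa."""
    return (32 * clusters
            + users
            + 2 * avg_users_per_cluster * clusters
            + 8 * projects
            + 5 * avg_users_per_project)


def restricted_admin_rolebindings(clusters, projects, restricted_admins=1):
    """Additional RoleBindings: at least 7C + 2P + 2 per Restricted Admin user."""
    return restricted_admins * (7 * clusters + 2 * projects + 2)


# Hypothetical installation: 100 clusters, 500 users, 300 projects.
base = estimate_min_rolebindings(
    clusters=100, users=500, avg_users_per_cluster=20,
    projects=300, avg_users_per_project=10,
)
extra = restricted_admin_rolebindings(clusters=100, projects=300, restricted_admins=3)
print(base, extra, base + extra)
```

Comparing the result with the 60,000-object guideline mentioned earlier gives a rough sense of how much headroom a planned configuration leaves. Both formulas are lower bounds, so actual counts may be higher.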
+### Using New Apps Over Legacy Apps
+
+Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. `apps.catalog.cattle.io` apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster.
+
+You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI.
+
+### Using the Authorized Cluster Endpoint (ACE)
+
+An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions.
+
+### Reducing Event Handler Executions
+
+The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If resource allocation spikes are seen every 15 hours, this setting can help.
+
+The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma-separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding a controller type to the variable prevents that set of controllers from running their handlers as part of cache resyncing.
+
+* `mgmt` refers to management controllers which only run on one Rancher node.
+* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former.
+* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability.
+
+In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.containers.env` list) with the value `mgmt,user`.
+
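As a minimal sketch of the setting described above, the following builds a patch document that adds the variable to a Deployment's container environment. It assumes the standard Kubernetes Deployment layout (`spec.template.spec.containers[].env`) and a container named `rancher`. Both the container name and the helper function are assumptions for illustration, so check them against your own installation.

```python
import json

# Controller types documented for CATTLE_SYNC_ONLY_CHANGED_OBJECTS.
ALLOWED_TYPES = ("mgmt", "user", "scaled")


def build_env_patch(controller_types, container_name="rancher"):
    """Return a Deployment patch (as a dict) that sets the environment variable.

    container_name is an assumption; adjust it to match your deployment.
    """
    unknown = [t for t in controller_types if t not in ALLOWED_TYPES]
    if unknown:
        raise ValueError(f"unsupported controller types: {unknown}")
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "env": [
                                {
                                    "name": "CATTLE_SYNC_ONLY_CHANGED_OBJECTS",
                                    "value": ",".join(controller_types),
                                }
                            ],
                        }
                    ]
                }
            }
        }
    }


# The recommended value from this guide: management and user controllers only.
patch = build_env_patch(["mgmt", "user"])
print(json.dumps(patch, indent=2))
```

The helper rejects unknown values so that a typo does not silently produce a setting that Rancher ignores. It still accepts `scaled`, which this guide advises against setting.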
+## Optimizations Outside of Rancher
+
+The underlying cluster's own performance and configuration are important influencing factors. The upstream cluster, if misconfigured, can introduce a bottleneck that Rancher software cannot resolve.
+
+### Manage Upstream Cluster Nodes Directly with RKE2
+
+As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This can help you determine whether Rancher, Kubernetes, or operating system components are causing issues.
+
+Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services.
+
+Use RKE2 for large scale use cases.
+
+### Keeping Kubernetes Versions Up to Date
+
+You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes.
+
+### Optimizing etcd
+
+Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance.
+
+The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed. Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks).
+
+It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size).
+
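The three-node recommendation can be illustrated with quorum arithmetic. The sketch below assumes etcd's majority-based consensus, in which a write must be acknowledged by more than half of the members. The function names are hypothetical.

```python
def quorum(members):
    """Smallest majority of a cluster with the given member count."""
    return members // 2 + 1


def fault_tolerance(members):
    """Number of members that can fail while a majority remains."""
    return members - quorum(members)


for n in (1, 3, 5, 7):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

Each additional pair of members raises the number of acknowledgements a write needs, which is why larger etcd clusters trade write speed for fault tolerance. Three members is the smallest size that tolerates one failure.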
+Etcd performance will also be negatively affected by network latency between nodes as that will slow down network communication. Etcd nodes should be located together with Rancher nodes.
diff --git a/versioned_docs/version-2.8/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md b/versioned_docs/version-2.8/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
index 586b55b0db8..08639d4b819 100644
--- a/versioned_docs/version-2.8/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
+++ b/versioned_docs/version-2.8/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
@@ -78,7 +78,15 @@ Like the authorized cluster endpoint, the `kube-api-auth` authentication service
With this endpoint enabled for the downstream cluster, Rancher generates an extra Kubernetes context in the kubeconfig file in order to connect directly to the cluster. This file has the credentials for `kubectl` and `helm`.
-You will need to use a context defined in this kubeconfig file to access the cluster if Rancher goes down. Therefore, we recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster. For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file.](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md)
+:::note
+
+To use the ACE context in your kubeconfig, run `kubectl config use-context <ace-context-name>` after enabling it.
+
+:::
+
+For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md).
+
+We recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster.
## Impersonation
diff --git a/versioned_docs/version-latest/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md b/versioned_docs/version-latest/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
index e024f1dd779..7d803ff697e 100644
--- a/versioned_docs/version-latest/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
+++ b/versioned_docs/version-latest/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs.md
@@ -6,7 +6,7 @@ title: Tuning etcd for Large Installations
-When running larger Rancher installations with 15 or more clusters it is recommended to increase the default keyspace for etcd from the default 2GB. The maximum setting is 8GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
+When Rancher is used to manage [a large infrastructure](../../pages-for-subheaders/installation-requirements.md) it is recommended to increase the default keyspace for etcd from the default 2 GB. The maximum setting is 8 GB and the host should have enough RAM to keep the entire dataset in memory. When increasing this value you should also increase the size of the host. The keyspace size can also be adjusted in smaller installations if you anticipate a high rate of change of pods during the garbage collection interval.
The etcd data set is automatically cleaned up on a five minute interval by Kubernetes. There are situations, e.g. deployment thrashing, where enough events could be written to etcd and deleted before garbage collection occurs and cleans things up causing the keyspace to fill up. If you see `mvcc: database space exceeded` errors, in the etcd logs or Kubernetes API server logs, you should consider increasing the keyspace size. This can be accomplished by setting the [quota-backend-bytes](https://etcd.io/docs/v3.4.0/op-guide/maintenance/#space-quota) setting on the etcd servers.
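The `quota-backend-bytes` setting takes a value in bytes, so a target keyspace size has to be converted first. The following sketch assumes that the sizes above are binary gigabytes (1 GB = 1024³ bytes). The helper name and the range check are illustrative only.

```python
GIB = 1024 ** 3          # bytes per gigabyte, assuming binary units
MAX_QUOTA_GB = 8         # maximum keyspace size stated above


def quota_backend_bytes(size_gb):
    """Convert a keyspace size in GB to a byte count for quota-backend-bytes."""
    if not 0 < size_gb <= MAX_QUOTA_GB:
        raise ValueError(f"size must be greater than 0 and at most {MAX_QUOTA_GB} GB")
    return int(size_gb * GIB)


# Example: raise the keyspace from the 2 GB default to 5 GB.
print(quota_backend_bytes(5))
```

Remember that the host should have enough RAM to hold the entire dataset, so size the node together with the quota.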
diff --git a/versioned_docs/version-latest/pages-for-subheaders/installation-requirements.md b/versioned_docs/version-latest/pages-for-subheaders/installation-requirements.md
index b7214336b13..e90c3bbd087 100644
--- a/versioned_docs/version-latest/pages-for-subheaders/installation-requirements.md
+++ b/versioned_docs/version-latest/pages-for-subheaders/installation-requirements.md
@@ -39,11 +39,11 @@ If you don't feel comfortable doing so, you might check suggestions in the [resp
If you plan to run Rancher on ARM64, see [Running on ARM64 (Experimental).](../how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64.md)
-### RKE Specific Requirements
+### RKE2 Specific Requirements
-For the container runtime, RKE should work with any modern Docker version.
+RKE2 bundles its own container runtime, containerd. Docker is not required for RKE2 installs.
-For more information see [Installing Docker,](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md)
+For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
### K3s Specific Requirements
@@ -55,68 +55,126 @@ If you are installing Rancher on a K3s cluster with **Raspbian Buster**, follow
If you are installing Rancher on a K3s cluster with Alpine Linux, follow [these steps](https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-alpine-linux-setup) for additional setup.
-### RKE2 Specific Requirements
+### RKE Specific Requirements
-For the container runtime, RKE2 bundles its own containerd. Docker is not required for RKE2 installs.
+RKE requires a Docker container runtime. Supported Docker versions are specified in the [Support Matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/) page.
-For details on which OS versions were tested with RKE2, refer to the [Rancher support matrix](https://www.suse.com/suse-rancher/support-matrix/all-supported-versions).
+For more information, see [Installing Docker](../getting-started/installation-and-upgrade/installation-requirements/install-docker.md).
## Hardware Requirements
-The following sections describe the CPU, memory, and disk requirements for the nodes where the Rancher server is installed.
+The following sections describe the CPU, memory, and I/O requirements for nodes where Rancher is installed. Requirements vary based on the size of the infrastructure.
-## CPU and Memory
+### Practical Considerations
-Hardware requirements scale based on the size of your Rancher deployment. Provision each individual node according to the requirements. The requirements are different depending on if you are installing Rancher in a single container with Docker, or if you are installing Rancher on a Kubernetes cluster.
+Rancher's hardware footprint depends on a number of factors, including:
-### RKE and Hosted Kubernetes
+ - Size of the managed infrastructure (e.g., node count, cluster count).
+ - Complexity of the desired access control rules (e.g., `RoleBinding` object count).
+ - Number of workloads (e.g., Kubernetes deployments, Fleet deployments).
+ - Usage patterns (e.g., subset of functionality actively used, frequency of use, number of concurrent users).
-These CPU and memory requirements apply to each host in the Kubernetes cluster where the Rancher server is installed.
+Because many factors influence these requirements and can change over time, treat the requirements listed here as reasonable starting points that work well for most use cases. Your use case may still have different requirements. For inquiries about a specific scenario, [contact Rancher](https://rancher.com/contact/) for further guidance.
-These requirements apply to RKE Kubernetes clusters, as well as to hosted Kubernetes clusters such as EKS.
+In particular, requirements on this page are subject to typical use assumptions, which include:
+ - Under 60,000 total Kubernetes resources, per type.
+ - Up to 120 pods per node.
+ - Up to 200 CRDs in the upstream (local) cluster.
+ - Up to 100 CRDs in downstream clusters.
+ - Up to 50 Fleet deployments.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | ---------- | ------------ | -------| ------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB |
+Higher numbers are possible, but hardware requirements may increase accordingly. If you have more than 20,000 resources of the same type, loading the whole list through the Rancher UI might take several seconds.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::note Evolution:
-### K3s Kubernetes
+Rancher's codebase evolves, use cases change, and the body of accumulated Rancher experience grows every day.
-These CPU and memory requirements apply to each host in a [K3s Kubernetes cluster where the Rancher server is installed.](install-upgrade-on-a-kubernetes-cluster.md)
+Hardware requirement recommendations are subject to change over time, as guidelines improve in accuracy and become more concrete.
-| Deployment Size | Clusters | Nodes | vCPUs | RAM | Database Size |
-| --------------- | ---------- | ------------ | -------| ---------| ------------------------- |
-| Small | Up to 150 | Up to 1500 | 2 | 8 GB | 2 cores, 4 GB + 1000 IOPS |
-| Medium | Up to 300 | Up to 3000 | 4 | 16 GB | 2 cores, 4 GB + 1000 IOPS |
-| Large | Up to 500 | Up to 5000 | 8 | 32 GB | 2 cores, 4 GB + 1000 IOPS |
-| X-Large | Up to 1000 | Up to 10,000 | 16 | 64 GB | 2 cores, 4 GB + 1000 IOPS |
-| XX-Large | Up to 2000 | Up to 20,000 | 32 | 128 GB | 2 cores, 4 GB + 1000 IOPS |
+If you find that your Rancher deployment no longer complies with the listed recommendations, [contact Rancher](https://rancher.com/contact/) for a re-evaluation.
-Every use case and environment is different. Please [contact Rancher](https://rancher.com/contact/) to review yours.
+:::
### RKE2 Kubernetes
-These CPU and memory requirements apply to each instance with RKE2 installed. Minimum recommendations are outlined here.
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 2 | 5 GB |
-| Medium | Up to 15 | Up to 200 | 3 | 9 GB |
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+| Larger (†) | (†) | (†) | (†) | (†) |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+(†): Larger deployment sizes are generally possible with ad-hoc hardware recommendations and tuning. You can [contact Rancher](https://rancher.com/contact/) for a custom evaluation.
+
+Refer to RKE2 documentation for more detailed information on [RKE2 general requirements](https://docs.rke2.io/install/requirements).
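
As a rough illustration of how the table above can be read, the following Python sketch picks the smallest listed size whose cluster and node limits both cover a planned infrastructure. The `Tier` type and `pick_tier` function are hypothetical names introduced here for illustration. The numbers are copied from the table and remain minimum per-node starting points, not guarantees.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    """One row of the sizing table (hypothetical helper type)."""
    name: str
    max_clusters: int
    max_nodes: int
    vcpus: int
    ram_gb: int


# Values copied from the table above, ordered from smallest to largest.
TIERS = [
    Tier("Small", 150, 1500, 4, 16),
    Tier("Medium", 300, 3000, 8, 32),
    Tier("Large", 500, 5000, 16, 64),
]


def pick_tier(clusters: int, nodes: int) -> Optional[Tier]:
    """Return the smallest tier that covers both limits, or None if none does."""
    for tier in TIERS:
        if clusters <= tier.max_clusters and nodes <= tier.max_nodes:
            return tier
    # Beyond the listed sizes: the table defers to a custom evaluation.
    return None


if __name__ == "__main__":
    tier = pick_tier(clusters=120, nodes=2000)
    print(tier.name if tier else "contact Rancher for a custom evaluation")
```

Both limits must hold at once, so 120 clusters with 2000 nodes lands in the Medium row even though the cluster count alone would fit Small.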
+
+### K3s Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM | External Database Host (*) |
+|-----------------------------|----------------------------|-------------------------|-------|-------|----------------------------|
+| Small | 150 | 1500 | 4 | 16 GB | 2 vCPUs, 8 GB + 1000 IOPS |
+| Medium | 300 | 3000 | 8 | 32 GB | 4 vCPUs, 16 GB + 2000 IOPS |
+| Large (†) | 500 | 5000 | 16 | 64 GB | 8 vCPUs, 32 GB + 4000 IOPS |
+
+(*): External Database Host refers to hosting the K3s cluster data store on a [dedicated external host](https://docs.k3s.io/datastore). This is optional. Exact requirements depend on the external data store.
+
+(†): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the K3s documentation for more detailed information on [general requirements](https://docs.k3s.io/installation/requirements).
+
+### Hosted Kubernetes
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+These requirements apply to hosted Kubernetes clusters such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). They don't apply to Rancher SaaS solutions such as [Rancher Prime Hosted](https://www.rancher.com/products/rancher).
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+### RKE
+
+The following table lists minimum CPU and memory requirements for each node in the [upstream cluster](install-upgrade-on-a-kubernetes-cluster.md).
+
+Please note that a highly available setup with at least three nodes is required for production.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|-------|
+| Small | 150 | 1500 | 4 | 16 GB |
+| Medium | 300 | 3000 | 8 | 32 GB |
+| Large (*) | 500 | 5000 | 16 | 64 GB |
+
+(*): Large deployments require that you [follow best practices](../reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md) for adequate performance.
+
+Refer to the RKE documentation for more detailed information on [general requirements](https://rke.docs.rancher.com/os).
### Docker
-These CPU and memory requirements apply to a host with a [single-node](rancher-on-a-single-node-with-docker.md) installation of Rancher.
+The following table lists minimum CPU and memory requirements for a [single Docker node installation of Rancher](rancher-on-a-single-node-with-docker.md).
-| Deployment Size | Clusters | Nodes | vCPUs | RAM |
-| --------------- | -------- | --------- | ----- | ---- |
-| Small | Up to 5 | Up to 50 | 1 | 4 GB |
-| Medium | Up to 15 | Up to 200 | 2 | 8 GB |
+Please note that a Docker installation is only suitable for development or testing purposes and is not meant to be used in production environments.
+
+| Managed Infrastructure Size | Maximum Number of Clusters | Maximum Number of Nodes | vCPUs | RAM |
+|-----------------------------|----------------------------|-------------------------|-------|------|
+| Small | 5 | 50 | 1 | 4 GB |
+| Medium | 15 | 200 | 2 | 8 GB |
## Ingress
diff --git a/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md b/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
index e8e919bde9b..e69de29bb2d 100644
--- a/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
+++ b/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tips-for-scaling-rancher.md
@@ -1,65 +0,0 @@
----
-title: Tips for Scaling Rancher
----
-
-
-
-
-
-This guide aims to introduce the approaches that should be considered to scale Rancher setups, and associated challenges with doing so. As systems grow performance will naturally reduce, but there are steps we can take to minimize the load put on Rancher, as well as optimize Rancher's ability to handle these larger setups.
-
-## General Tips on Optimizing Rancher's Performance
-* It is advisable to keep Rancher up to date with patch releases. Performance improvements and bug fixes are made throughout the life of a minor release. You can review the release notes to help inform your own decisions on whether an upgrade is necessary but we recommend keeping yourself up to date in most cases.
-
-* Performance will be negatively impacted by increased latency between Rancher's infrastructure and a downstream cluster's infrastructure (eg. geographic distance). If a user or organization requires clusters/nodes all over the world or spread across many regions, it is best to use multiple Rancher installations.
-
-* Please always try to scale up gradually, monitoring and observing any change in behavior while doing do. It is usually easier to resolve performance problems as soon as they surface, and before other problems confuse symptoms.
-
-## Minimizing Load on the local cluster
-The largest bottleneck when scaling Rancher is resource growth in the local Kubernetes cluster. The local cluster contains information for all downstream clusters. Many operations that apply to downstream clusters will create new objects in the local cluster and require computation from handlers running in the local cluster.
-
-### Managing Your Object Counts
-ETCD eventually encounters limitations to the number of a single Kubernetes resource type it can store. These exact numbers are not well documented. From internal observations we usually see performance issues once a single resource type's object count exceeds 60k, and often that type is Rolebindings.
-
-Rolebindings are created in the local cluster as a side effect of many operations.
-
-Considerations when attempting reduce rolebindings in the local cluster:
-* Only add users to clusters and projects when necessary
-* Remove clusters and projects when they are no longer needed
-* Only use custom roles if necessary
-* Use as few rules as possible in custom roles
-* Consider whether adding a role to a user is redundant
-* Consider that using less, but more powerful, clusters may be more efficient
-* Experiment to see if creating new projects or creating new clusters manifests in fewer rolebindings for your specific use case.
-
-### Using New Apps Over Legacy Apps
-There are two app kubernetes resources that Rancher uses: apps.projects.cattle.io and apps.cattle.cattle.io. The legacy apps, apps.projects.cattle.io, were introduced first in the Cluster Manager and are now outdated. The new apps, apps.catalog.cattle.io, are found in the Cluster Explorer for their respective cluster. The new apps are preferrable because they live in the downstream cluster while the legacy apps live in the local cluster.
-
-We recommend removing apps that appear in the Cluster Manager, replacing them with apps in the Cluster Explorer for their target cluster if necessary and creating any future apps in the cluster's Cluster Explorer only.
-
-### Using the Authorized Cluster Endpoint (ACE)
-There is an _Authorized Cluster Endpoint_ option for Rancher provisioned RKE1, RKE2, and K3s clusters. When enabled this adds a context to kubeconfigs generated for the cluster that uses a direct endpoint to the cluster and bypasses Rancher. However, it is not enough to only enable this option. The user of the Kubeconfig needs to use `kubectl use-context ` in order to start using it.
-
-Without using ACE, all kubeconfig requests first route through Rancher.
-
-### Experimental: Option to Reduce Event Handler Executions
-The bulk of Rancher's logic occurs on event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when caches are synced. In scaled setups these scheduled runs come with huge performance costs because every handler is being run on every applicable object. However, this scheduled execution of handlers can be disabled using the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable. If resource allocation spikes are seen on an interval of about 15 hours it is possible this setting can help.
-
-The value for the environment variable can be a comma separated list of the following options. The values refer to types of controllers (the structures that contain and run handlers) and their handlers. Adding the controller types to the variable will disable that set of controllers from running their handlers as part of cache resyncing.
-
-* `mgmt` refers to management controllers which only run on one Rancher node.
-* `user` refers to user controllers which run for every cluster. Some of these are ran on the same node as management controllers, while other run in the downstream cluster. This will option targets the former.
-* `scaled` refers to scaled controllers which run on every Rancher node. This is not recommended to be set due to the critical functionality the scaled handlers are responsible for.
-
-In short, if you notice CPU usage peaks every 15 hours, add the CATTLE_SYNC_ONLY_CHANGED_OBJECTS environment variable to your rancher deployment with the value `mgmt,user`.
-
-## Optimizations Outside of Rancher
-A large component of performance is the local cluster and how it was configured. This cluster can introduce a bottleneck before Rancher software ever runs. When Rancher nodes experience high resource usage, you can use the command "top" to identify whether it is Rancher or a Kubernetes component that is consuming the resource in excess.
-
-### Keeping Kubernetes Versions Up to Date
-Similar to Rancher versions, it is advisable to keep your kubernetes cluster up to date. This will ensure that your cluster contains any available performance enhancements or bug fixes.
-
-### Optimizing ETCD
-The two main bottlenecks to [ETCD performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk speed and network speed. Optimization to either should improve performance. For information regarding ETCD performance see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](https://docs.ranchermanager.rancher.io/how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found [in our docs](https://docs.Ranchermanager.Rancher.io/v2.5/pages-for-subheaders/installation-requirements#disks).
-
-Theoretically, the more nodes in an ETCD cluster the slower it will be due to replication requirements [source](https://etcd.io/docs/v3.3/faq). This may be counter-intuitive to common scaling approaches. It can also be inferred that ETCD performance will be inversely affected by distance between nodes as that will slow down network communication.
diff --git a/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md b/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
new file mode 100644
index 00000000000..865f1d32f6e
--- /dev/null
+++ b/versioned_docs/version-latest/reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale.md
@@ -0,0 +1,100 @@
+---
+title: Tuning and Best Practices for Rancher at Scale
+---
+
+
+
+
+
+
+This guide describes the best practices and tuning approaches to scale Rancher setups and the associated challenges with doing so. As systems grow, performance will naturally reduce, but there are steps that can minimize the load put on Rancher and optimize Rancher's ability to manage larger infrastructures.
+
+## Optimizing Rancher Performance
+
+* Keep Rancher up to date with patch releases. We are continuously improving Rancher with performance enhancements and bug fixes. The latest Rancher release contains all accumulated improvements to performance and stability, plus updates based on developer experience and user feedback.
+
+* Always scale up gradually, and monitor and observe any changes in behavior while doing so. It is usually easier to resolve performance problems as soon as they surface, before other problems obscure the root cause.
+
+* Reduce network latency between the upstream Rancher cluster and downstream clusters to the extent possible. Note that latency is, among other factors, a function of geographic distance. If you require clusters or nodes spread across the world, consider multiple Rancher installations.
+
+## Minimizing Load on the Upstream Cluster
+
+When scaling up Rancher, one typical bottleneck is resource growth in the upstream (local) Kubernetes cluster. The upstream cluster contains information for all downstream clusters. Many operations that apply to downstream clusters create new objects in the upstream cluster and require computation from handlers running in the upstream cluster.
+
+### Managing Your Object Counts
+
+Etcd is the backing database for Kubernetes and for Rancher. The database may eventually encounter limits on the number of objects of a single Kubernetes resource type that it can store. Exact limits vary and depend on a number of factors. However, experience indicates that performance issues frequently arise once a single resource type's object count exceeds 60,000. Often that type is `RoleBinding`.
+
+This is typical in Rancher, as many operations create new `RoleBinding` objects in the upstream cluster as a side effect.
+
+You can reduce the number of `RoleBindings` in the upstream cluster in the following ways:
+* Limit the use of the [Restricted Admin](../../../how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions#restricted-admin) role. Apply other roles wherever possible.
+* If you use [external authentication](../../../pages-for-subheaders/authentication-config), use groups to assign roles.
+* Only add users to clusters and projects when necessary.
+* Remove clusters and projects when they are no longer needed.
+* Only use custom roles if necessary.
+* Use as few rules as possible in custom roles.
+* Consider whether adding a role to a user is redundant.
+* Consider using fewer, but more powerful, clusters.
+* Kubernetes permissions are always "additive" (allow-list) rather than "subtractive" (deny-list). Try to minimize configurations that give access to all but one aspect of a cluster, project, or namespace, as that will result in the creation of a high number of `RoleBinding` objects.
+* Experiment to see if creating new projects or clusters manifests in fewer `RoleBindings` for your specific use case.
+
+### RoleBinding Count Estimation
+
+Predicting how many `RoleBinding` objects a given configuration will create is complicated. However, the following considerations can offer a rough estimate:
+* For a minimum estimate, use the formula `32C + U + 2UaC + 8P + 5Pa`.
+ * `C` is the total number of clusters.
+ * `U` is the total number of users.
+ * `Ua` is the average number of users with a membership on a cluster.
+ * `P` is the total number of projects.
+ * `Pa` is the average number of users with a membership on a project.
+* The Restricted Admin role follows a different formula, as every user with this role results in at least `7C + 2P + 2` additional `RoleBinding` objects.
+* The number of `RoleBindings` increases linearly with the number of clusters, projects, and users.
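
The estimate above can be sketched as a small calculation. The following Python snippet is only an illustration: the function names and the example figures are hypothetical, and the arithmetic follows the two formulas term by term exactly as they are written in this section. The 60,000 figure is the object count mentioned earlier on this page as the point where performance issues frequently arise.

```python
def estimate_min_rolebindings(clusters, users, avg_users_per_cluster,
                              projects, avg_users_per_project):
    """Minimum estimate, following 32C + U + 2UaC + 8P + 5Pa as written."""
    return (32 * clusters
            + users
            + 2 * avg_users_per_cluster * clusters
            + 8 * projects
            + 5 * avg_users_per_project)


def restricted_admin_extra(clusters, projects):
    """Additional RoleBindings per Restricted Admin user: at least 7C + 2P + 2."""
    return 7 * clusters + 2 * projects + 2


if __name__ == "__main__":
    # Hypothetical installation, chosen only to illustrate the arithmetic.
    base = estimate_min_rolebindings(
        clusters=100, users=500, avg_users_per_cluster=20,
        projects=300, avg_users_per_project=10,
    )
    extra = 3 * restricted_admin_extra(clusters=100, projects=300)
    total = base + extra
    print(f"base={base} restricted_admin_extra={extra} total={total}")
    print("below 60,000" if total < 60_000 else "at or above 60,000")
```

Because the result is a lower bound, a total that is already close to 60,000 suggests applying the reduction measures listed above before scaling further.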
+
+### Using New Apps Over Legacy Apps
+
+Rancher uses two Kubernetes app resources: `apps.projects.cattle.io` and `apps.catalog.cattle.io`. Legacy apps, represented by `apps.projects.cattle.io`, were introduced with the former Cluster Manager UI and are now outdated. Current apps, represented by `apps.catalog.cattle.io`, are found in the Cluster Explorer UI for their respective cluster. The `apps.catalog.cattle.io` apps are preferable because their data resides in downstream clusters, which frees up resources in the upstream cluster.
+
+You should remove any remaining legacy apps that appear in the Cluster Manager UI, and replace them with apps in the Cluster Explorer UI. Create any new apps only in the Cluster Explorer UI.
+
+### Using the Authorized Cluster Endpoint (ACE)
+
+An [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) (ACE) provides access to the Kubernetes API of Rancher-provisioned RKE, RKE2, and K3s clusters. When enabled, the ACE adds a context to kubeconfig files generated for the cluster. The context uses a direct endpoint to the cluster, thereby bypassing Rancher. This reduces load on Rancher for cases where unmediated API access is acceptable or preferable. See [Authorized Cluster Endpoint](../../../reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters#4-authorized-cluster-endpoint) for more information and configuration instructions.
+
+### Reducing Event Handler Executions
+
+The bulk of Rancher's logic runs in event handlers. These event handlers run on an object whenever the object is updated, and when Rancher is started. Additionally, they run every 15 hours when Rancher syncs caches. In scaled setups these scheduled runs come with huge performance costs because every handler runs on every applicable object. However, the scheduled handler execution can be disabled with the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable. If you see resource usage spikes every 15 hours, this setting can help.
+
+The value for `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` can be a comma-separated list of the following options. The values refer to types of handlers and controllers (the structures that contain and run handlers). Adding a controller type to the variable stops that set of controllers from running their handlers as part of cache resyncing.
+
+* `mgmt` refers to management controllers which only run on one Rancher node.
+* `user` refers to user controllers which run for every cluster. Some of these run on the same node as management controllers, while others run in the downstream cluster. This option targets the former.
+* `scaled` refers to scaled controllers which run on every Rancher node. You should avoid setting this value, as the scaled handlers are responsible for critical functions and changes may disrupt cluster stability.
+
+In short, if you notice CPU usage peaks every 15 hours, add the `CATTLE_SYNC_ONLY_CHANGED_OBJECTS` environment variable to your Rancher deployment (in the `spec.template.spec.containers[].env` list) with the value `mgmt,user`.
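
As a minimal sketch of how the value could be assembled and checked before it is added to the deployment, the following Python helper accepts only the three options listed above and warns when `scaled` is included. The function names are hypothetical and are not part of Rancher. The `name`/`value` dictionary mirrors the standard shape of a Kubernetes container `env` entry.

```python
import warnings

ENV_NAME = "CATTLE_SYNC_ONLY_CHANGED_OBJECTS"
ALLOWED_TYPES = ("mgmt", "user", "scaled")


def sync_only_changed_value(controller_types):
    """Build the comma-separated value from the allowed controller types."""
    selected = []
    for controller_type in controller_types:
        if controller_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown controller type: {controller_type!r}")
        if controller_type not in selected:  # drop duplicates, keep order
            selected.append(controller_type)
    if "scaled" in selected:
        # This guide advises against setting `scaled`.
        warnings.warn("'scaled' handlers are responsible for critical functions")
    return ",".join(selected)


def env_entry(controller_types):
    """Return a container env entry as a plain dict (name/value pair)."""
    return {"name": ENV_NAME, "value": sync_only_changed_value(controller_types)}


if __name__ == "__main__":
    print(env_entry(["mgmt", "user"]))
```

Validating the list up front avoids deploying a misspelled option that would otherwise go unnoticed until the next cache resync.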
+
+## Optimizations Outside of Rancher
+
+The underlying cluster's own performance and configuration are important influencing factors. If misconfigured, the upstream cluster can introduce a bottleneck that Rancher software cannot resolve.
+
+### Manage Upstream Cluster Nodes Directly with RKE2
+
+As Rancher can be very demanding on the upstream cluster, especially at scale, you should have full administrative control of the cluster's configuration and nodes. To identify the root cause of excess resource consumption, use standard Linux troubleshooting techniques and tools. This helps determine whether Rancher, Kubernetes, or operating system components are causing issues.
+
+Although managed Kubernetes services make it easier to deploy and run Kubernetes clusters, they are discouraged for the upstream cluster in high scale scenarios. Managed Kubernetes services typically limit access to configuration and insights on individual nodes and services.
+
+Use RKE2 for large scale use cases.
+
+### Keeping Kubernetes Versions Up to Date
+
+You should keep the local Kubernetes cluster up to date. This will ensure that your cluster has all available performance enhancements and bug fixes.
+
+### Optimizing etcd
+
+Etcd is the backend database for Kubernetes and for Rancher. It plays a very important role in Rancher performance.
+
+The two main bottlenecks to [etcd performance](https://etcd.io/docs/v3.4/op-guide/performance/) are disk and network speed. Etcd should run on dedicated nodes with a fast network setup and with SSDs that have high input/output operations per second (IOPS). For more information regarding etcd performance, see [Slow etcd performance (performance testing and optimization)](https://www.suse.com/support/kb/doc/?id=000020100) and [Tuning etcd for Large Installations](../../../how-to-guides/advanced-user-guides/tune-etcd-for-large-installs). Information on disks can also be found in the [Installation Requirements](../../../pages-for-subheaders/installation-requirements#disks).
+
+It's best to run etcd on exactly three nodes, as adding more nodes will reduce operation speed. This may be counter-intuitive to common scaling approaches, but it's due to etcd's [replication mechanisms](https://etcd.io/docs/v3.5/faq/#what-is-maximum-cluster-size).
+
+Network latency between nodes also degrades etcd performance, because it slows down communication between etcd members. Etcd nodes should be located together with Rancher nodes.
diff --git a/versioned_docs/version-latest/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md b/versioned_docs/version-latest/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
index 8abfd0c9f6c..f387852c567 100644
--- a/versioned_docs/version-latest/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
+++ b/versioned_docs/version-latest/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters.md
@@ -77,7 +77,15 @@ Like the authorized cluster endpoint, the `kube-api-auth` authentication service
With this endpoint enabled for the downstream cluster, Rancher generates an extra Kubernetes context in the kubeconfig file in order to connect directly to the cluster. This file has the credentials for `kubectl` and `helm`.
-You will need to use a context defined in this kubeconfig file to access the cluster if Rancher goes down. Therefore, we recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster. For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file.](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md)
+:::note
+
+To use the ACE context in your kubeconfig, run `kubectl config use-context ` after enabling it.
+
+:::
+
+For more information, refer to the section on accessing your cluster with [kubectl and the kubeconfig file](../../how-to-guides/new-user-guides/manage-clusters/access-clusters/use-kubectl-and-kubeconfig.md).
+
+We recommend exporting the kubeconfig file so that if Rancher goes down, you can still use the credentials in the file to access your cluster.
## Impersonation
diff --git a/versioned_sidebars/version-2.6-sidebars.json b/versioned_sidebars/version-2.6-sidebars.json
index 8de527429bf..366b1dc5f29 100644
--- a/versioned_sidebars/version-2.6-sidebars.json
+++ b/versioned_sidebars/version-2.6-sidebars.json
@@ -788,7 +788,8 @@
"items": [
"reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere",
"reference-guides/best-practices/rancher-server/rancher-deployment-strategy",
- "reference-guides/best-practices/rancher-server/tips-for-running-rancher"
+ "reference-guides/best-practices/rancher-server/tips-for-running-rancher",
+ "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale"
]
},
{
diff --git a/versioned_sidebars/version-2.7-sidebars.json b/versioned_sidebars/version-2.7-sidebars.json
index 5625fdaf67d..2afa9c9297a 100644
--- a/versioned_sidebars/version-2.7-sidebars.json
+++ b/versioned_sidebars/version-2.7-sidebars.json
@@ -791,7 +791,8 @@
"items": [
"reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere",
"reference-guides/best-practices/rancher-server/rancher-deployment-strategy",
- "reference-guides/best-practices/rancher-server/tips-for-running-rancher"
+ "reference-guides/best-practices/rancher-server/tips-for-running-rancher",
+ "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale"
]
},
{
diff --git a/versioned_sidebars/version-2.8-sidebars.json b/versioned_sidebars/version-2.8-sidebars.json
index bcd0b8430a4..a7d0f465b85 100644
--- a/versioned_sidebars/version-2.8-sidebars.json
+++ b/versioned_sidebars/version-2.8-sidebars.json
@@ -791,7 +791,8 @@
"items": [
"reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere",
"reference-guides/best-practices/rancher-server/rancher-deployment-strategy",
- "reference-guides/best-practices/rancher-server/tips-for-running-rancher"
+ "reference-guides/best-practices/rancher-server/tips-for-running-rancher",
+ "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale"
]
},
{
diff --git a/versioned_sidebars/version-latest-sidebars.json b/versioned_sidebars/version-latest-sidebars.json
index 5625fdaf67d..2afa9c9297a 100644
--- a/versioned_sidebars/version-latest-sidebars.json
+++ b/versioned_sidebars/version-latest-sidebars.json
@@ -791,7 +791,8 @@
"items": [
"reference-guides/best-practices/rancher-server/on-premises-rancher-in-vsphere",
"reference-guides/best-practices/rancher-server/rancher-deployment-strategy",
- "reference-guides/best-practices/rancher-server/tips-for-running-rancher"
+ "reference-guides/best-practices/rancher-server/tips-for-running-rancher",
+ "reference-guides/best-practices/rancher-server/tuning-and-best-practices-for-rancher-at-scale"
]
},
{