From 4b206d2457781afaacd9bb061bf018af8e0402e8 Mon Sep 17 00:00:00 2001
From: Wenhan Shi
Date: Wed, 4 Aug 2021 13:22:51 +0900
Subject: [PATCH 1/2] Fix wrong deployment name of cattle-cluster-agent

In a v2.5.8 rancher environment.
> kubectl get deployment -n cattle-system
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
cattle-cluster-agent   1/1     1            1           36h
---
 .../v2.5/en/installation/resources/update-ca-cert/_index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/rancher/v2.5/en/installation/resources/update-ca-cert/_index.md b/content/rancher/v2.5/en/installation/resources/update-ca-cert/_index.md
index 139ca70ac76..1ff7143f372 100644
--- a/content/rancher/v2.5/en/installation/resources/update-ca-cert/_index.md
+++ b/content/rancher/v2.5/en/installation/resources/update-ca-cert/_index.md
@@ -132,7 +132,7 @@ Using a Kubeconfig for each downstream cluster update the environment variable f
 
 ```
 $ kubectl edit -n cattle-system ds/cattle-node-agent
-$ kubectl edit -n cattle-system deployment/cluster-agent
+$ kubectl edit -n cattle-system deployment/cattle-cluster-agent
 ```
 
 ### Method 3: Recreate Rancher agents

From 223ef3f47600921cb0ea814f7796e412a8294d60 Mon Sep 17 00:00:00 2001
From: Tejeev
Date: Wed, 11 Aug 2021 13:47:13 +0100
Subject: [PATCH 2/2] Added resource useage best practices

We're still seeing lots of OOM events from WAL compaction, and these defaults are not good for all environments. I have attempted to address this in the docs by adding a best practice of tuning resource limits and mentioning the issue in known issues.
---
 content/rancher/v2.5/en/monitoring-alerting/_index.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/content/rancher/v2.5/en/monitoring-alerting/_index.md b/content/rancher/v2.5/en/monitoring-alerting/_index.md
index 49d197877bd..297fe99f61d 100644
--- a/content/rancher/v2.5/en/monitoring-alerting/_index.md
+++ b/content/rancher/v2.5/en/monitoring-alerting/_index.md
@@ -219,7 +219,7 @@ For more information on configuring Alertmanager in Rancher, see [this page.](./
 
 The resource requests and limits can be configured when installing `rancher-monitoring`.
 
-The default values are in the [values.yaml](https://github.com/rancher/charts/blob/main/charts/rancher-monitoring/values.yaml) in the `rancher-monitoring` Helm chart.
+The default values are in the [values.yaml](https://github.com/rancher/charts/blob/main/charts/rancher-monitoring/values.yaml) in the `rancher-monitoring` Helm chart. As every environment is different, it is recommended that you set limits higher than the recommended values and then tune them down to accommodate what your environment uses after observing it in operation for a week or two. As you scale your clusters, you will also need to scale your requests and limits to accommodate the larger amount of data collected from them.
 
 The default values in the table below are the minimum required resource limits and requests.
 
@@ -237,4 +237,6 @@ At least 50Gi storage is recommended.
 
 # Known Issues
 
-There is a [known issue](https://github.com/rancher/rancher/issues/28787#issuecomment-693611821) that K3s clusters require more default memory. If you are enabling monitoring on a K3s cluster, we recommend to setting `prometheus.prometheusSpec.resources.memory.limit` to 2500 Mi and `prometheus.prometheusSpec.resources.memory.request` to 1750 Mi.
+There is a [known issue](https://github.com/rancher/rancher/issues/28787#issuecomment-693611821) that K3s clusters require more default memory. If you are enabling monitoring on a K3s cluster, we recommend setting `prometheus.prometheusSpec.resources.memory.limit` to 2500 Mi and `prometheus.prometheusSpec.resources.memory.request` to 1750 Mi.
+
+It is common that as the number of metrics and deployments being monitored grows, Prometheus's memory and CPU needs outgrow the limits initially placed on them. If you see Prometheus crashing frequently, try increasing the allocated memory and setting alerts for when resource usage of Monitoring pods approaches the limits placed on them.