Merge pull request #2767 from rancher/master

Add monitoring docs changes to staging
2026-05-13 08:33:35 +00:00 · 2020-10-09 17:12:05 -07:00
parent df20032789 d7db443233
commit 6e88234e73
12 changed files with 404 additions and 92 deletions
@@ -179,7 +179,7 @@ helm install rancher-<CHART_REPO>/rancher \
  --set privateCA=true
 ```

-Now that Rancher is deployed, see [Adding TLS Secrets]({{<baseurl>}}/rancher/v2.x/en/installation/options/helm2/helm-rancher/tls-secrets/) to publish the certificate files so Rancher and the ingress controller can use them.
+Now that Rancher is deployed, see [Adding TLS Secrets]({{<baseurl>}}/rancher/v2.x/en/installation/resources/encryption/tls-secrets/) to publish the certificate files so Rancher and the ingress controller can use them.

 After adding the secrets, check if Rancher was rolled out successfully:

@@ -9,86 +9,197 @@ aliases:
  - /rancher/v2.x/en/cluster-admin/tools/monitoring/
 ---

-Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution.
+Using Rancher, you can quickly deploy leading open-source monitoring & alerting solutions such as [Prometheus](https://prometheus.io/), [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/), and [Grafana](https://grafana.com/docs/grafana/latest/getting-started/what-is-grafana/) onto your cluster.

-This page describes how to enable monitoring for a cluster. 
+Rancher's solution (powered by [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)) allows users to:

-This section covers the following topics:
+- Monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments via [Prometheus](https://prometheus.io/), a leading open-source monitoring solution.

- [Changes in Rancher v2.5](#changes-in-rancher-v2-5)
- [About Prometheus](#about-prometheus)
- [Monitoring scope](#monitoring-scope)
- [Enabling cluster monitoring](#enabling-cluster-monitoring)
- [Configuration](#configuration)
- [Examples](#examples)
-  - [Create ServiceMonitor Custom Resource](#create-servicemonitor-custom-resource)
-  - [PodMonitor](#podmonitor)
-  - [PrometheusRule](#prometheusrule)
-  - [Alertmanager Config](#alertmanager-config)
-  - [Configuring a Persistent Grafana Dashboard](#configuring-a-persistent-grafana-dashboard)
-  - [Configuring Grafana to Use Multiple Data Sources](#configuring-grafana-to-use-multiple-data-sources)
+- Define alerts based on metrics collected via [Prometheus](https://prometheus.io/)
+- Create custom dashboards to make it easy to visualize collected metrics via [Grafana](https://grafana.com/docs/grafana/latest/getting-started/what-is-grafana/)
+- Configure alert-based notifications via Email, Slack, PagerDuty, etc. using [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/)
+- Define precomputed, frequently needed or computationally expensive expressions as new time series based on metrics collected via [Prometheus](https://prometheus.io/) (only available in 2.5.x)
+- Expose collected metrics from Prometheus to the Kubernetes Custom Metrics API via [Prometheus Adapter](https://github.com/DirectXMan12/k8s-prometheus-adapter) for use in HPA (only available in 2.5)

+More information about the resources that get deployed onto your cluster to support this solution can be found in the [`rancher-monitoring`](https://github.com/rancher/charts/tree/main/charts/rancher-monitoring) Helm chart, which closely tracks the upstream [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) Helm chart maintained by the Prometheus community with certain changes tracked in the [CHANGELOG.md](https://github.com/rancher/charts/blob/main/charts/rancher-monitoring/CHANGELOG.md).

-# Changes in Rancher v2.5
+This page describes how to enable monitoring & alerting within a cluster using Rancher's new monitoring application, which was introduced in Rancher v2.5.

-Rancher's monitoring application is powered by the Prometheus operator, and it now relies less on Rancher's in-house monitoring tools.
+If you previously enabled Monitoring, Alerting, or Notifiers in Rancher prior to v2.5, there is no upgrade path for switching to the new monitoring/alerting solution. You will need to disable monitoring/alerting/notifiers in Cluster Manager before deploying the new monitoring solution via Cluster Explorer.

-This change allows Rancher to automatically support new features of the Prometheus operator API. Now all of the features exposed by the upstream Prometheus operator are available in the monitoring application, and you have more flexibility to configure monitoring.
+For more information about upgrading the Monitoring app in Rancher 2.5, please refer to the [migration docs](../migrating). 

-Previously, you would use the Rancher UI to configure monitoring. The Rancher UI created CRDs that were maintained by Rancher and updated the Prometheus state. In Rancher v2.5, you directly create CRDs for the monitoring application, and those CRDs are exposed in the Rancher UI.
+For the docs about monitoring for earlier Rancher versions, refer to [this section.](../legacy)

-The differences between Rancher's monitoring feature and the upstream Prometheus operator can be found in the [changelog.](https://github.com/rancher/charts/blob/dev-v2.5/packages/rancher-monitoring/overlay/CHANGELOG.md)
+> Before enabling monitoring, be sure to review the resource requirements. The default values in [this section](#setting-resource-limits-and-requests) are the minimum required resource limits and requests.

-# About Prometheus
+- [Monitoring Components](#monitoring-components)
+  - [Prometheus](#about-prometheus)
+  - [Grafana](#about-grafana)
+  - [Alertmanager](#about-alertmanager)
+  - [Prometheus Operator](#about-prometheus-operator)
+  - [Prometheus Adapter](#about-prometheus-adapter)
+- [Enable Monitoring](#enable-monitoring)
+  - [Default Alerts, Targets and Grafana Dashboards](#default-alerts-targets-and-grafana-dashboards)
+- [Using Monitoring](#using-monitoring)
+  - [Grafana UI](#grafana-ui)
+  - [Prometheus UI](#prometheus-ui)
+  - [Viewing the Prometheus Targets](#viewing-the-prometheus-targets)
+  - [Viewing the Prometheus Rules](#viewing-the-prometheus-rules)
+  - [Viewing Active Alerts in Alertmanager](#viewing-active-alerts-in-alertmanager)
+- [Uninstall Monitoring](#uninstall-monitoring)
+- [Setting Resource Limits and Requests](#setting-resource-limits-and-requests)
+- [Known Issues](#known-issues)

-Prometheus provides a _time series_ of your data, which is, according to [Prometheus documentation](https://prometheus.io/docs/concepts/data_model/):
+# Monitoring Components

->A stream of timestamped values belonging to the same metric and the same set of labeled dimensions, along with comprehensive statistics and metrics of the monitored cluster.
+The `rancher-monitoring` operator is powered by Prometheus, Grafana, Alertmanager, the Prometheus Operator, and the Prometheus adapter.

-In other words, Prometheus lets you view metrics from your different Rancher and Kubernetes objects. Using timestamps, Prometheus lets you query and view these metrics in easy-to-read graphs and visuals, either through the Rancher UI or [Grafana](https://grafana.com/), which is an analytics viewing platform deployed along with Prometheus.
+### About Prometheus
+
+Prometheus provides a time series of your data, which is, according to the [Prometheus documentation:](https://prometheus.io/docs/concepts/data_model/)
+
+> A stream of timestamped values belonging to the same metric and the same set of labeled dimensions, along with comprehensive statistics and metrics of the monitored cluster.
+
+In other words, Prometheus lets you view metrics from your different Rancher and Kubernetes objects. Using timestamps, Prometheus lets you query and view these metrics in easy-to-read graphs and visuals, either through the Rancher UI or Grafana, which is an analytics viewing platform deployed along with Prometheus.

 By viewing data that Prometheus scrapes from your cluster control plane, nodes, and deployments, you can stay on top of everything happening in your cluster. You can then use these analytics to better run your organization: stop system emergencies before they start, develop maintenance strategies, restore crashed servers, etc.

+### About Grafana
+
+[Grafana](https://grafana.com/grafana/) allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.
+
 # Enabling Cluster Monitoring

 As an [administrator]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/global-permissions/) or [cluster owner]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/cluster-project-roles/#cluster-roles), you can configure Rancher to deploy Prometheus to monitor your Kubernetes cluster.

-> **Prerequisite:** Make sure that you are allowing traffic on port 9796 for each of your nodes because Prometheus will scrape metrics from here.
+> If you want to set up Alertmanager, Grafana or Ingress, it has to be done with the settings on the Helm chart deployment. It's problematic to create Ingress outside the deployment.

-> The default username and password for the Grafana instance will be `admin/admin`. However, Grafana dashboards are served via the Rancher authentication proxy, so only users who are currently authenticated into the Rancher server have access to the Grafana dashboard.
+> **Prerequisites:**
+> 
+> - Make sure that you are allowing traffic on port 9796 for each of your nodes because Prometheus will scrape metrics from here.
+> - Make sure your cluster fulfills the resource requirements. The cluster should have at least 1950Mi memory available, 2700m CPU, and 50Gi storage. A breakdown of the resource limits and requests is [here.](#setting-resource-limits-and-requests)

-# Configuration
+1. In the Rancher UI, go to the cluster where you want to install monitoring and click **Cluster Explorer.**
+1. Click **Apps.**
+1. Click the `rancher-monitoring` app.
+1. Optional: Click **Chart Options** and configure alerting, Prometheus and Grafana. For help, refer to the [configuration reference.](../configuration)
+1. Scroll to the bottom of the Helm chart README and click **Install.**

-For information on configuring custom Prometheus metrics and alerting rules, refer to the upstream documentation for the [Prometheus operator.](https://github.com/prometheus-operator/prometheus-operator) This documentation can help you set up RBAC, Thanos, or custom configuration.
+**Result:** The monitoring app is deployed in the `cattle-monitoring-system` namespace.

-To create an additional scrape configuration, refer to [this page.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/additional-scrape-config.md)
+### Default Alerts, Targets and Grafana Dashboards

-# Examples
+By default, Rancher Monitoring deploys exporters (such as [node-exporter](https://github.com/prometheus/node_exporter) and [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)) as well as default Prometheus alerts and Grafana dashboards (curated by the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) project) onto a cluster.

-### Create ServiceMonitor Custom Resource
+To see the default alerts, go to the [Alertmanager UI](#viewing-active-alerts-in-alertmanager) and click **Expand all groups.**

-An example ServiceMonitor custom resource can be found [here.](https://github.com/prometheus-operator/prometheus-operator/blob/master/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml) 
+To see what services you are monitoring, you will need to see your targets. To view the default targets, refer to [Viewing the Prometheus Targets.](#viewing-the-prometheus-targets)

-### PodMonitor
+To see the default dashboards, go to the [Grafana UI.](#grafana-ui) In the left navigation bar, click the icon with four boxes and click **Manage.**

-An example PodMonitor can be found [here.](https://github.com/prometheus-operator/prometheus-operator/blob/master/example/user-guides/getting-started/example-app-pod-monitor.yaml) and an example Prometheus resource that refers to it can be found [here.](https://github.com/prometheus-operator/prometheus-operator/blob/master/example/user-guides/getting-started/prometheus-pod-monitor.yaml)
+### Next Steps

-### PrometheusRule
+To configure Prometheus resources from the Rancher UI, click **Apps & Marketplace > Monitoring** in the upper left corner.

-Prometheus rule files are held in PrometheusRule custom resources. Use the label selector field ruleSelector in the Prometheus object to define the rule files that you want to be mounted into Prometheus. An example PrometheusRule is on [this page.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/alerting.md)
+# Using Monitoring

-### Alertmanager Config
+Installing `rancher-monitoring` makes the following dashboards available from the Rancher UI.

-The Prometheus Operator introduces an Alertmanager resource, which allows users to declaratively describe an Alertmanager cluster.
+### Grafana UI

-The upstream Prometheus documentation includes information on how to [set up](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/alerting.md) and [configure](https://prometheus.io/docs/alerting/latest/configuration/) Alertmanager.
+Rancher allows any users who are authenticated by Kubernetes and have access to the Grafana service deployed by the Rancher Monitoring chart to access Grafana via the Rancher Dashboard UI. By default, all users who are able to access Grafana are given the [Viewer](https://grafana.com/docs/grafana/latest/permissions/organization_roles/#viewer-role) role, which allows them to view any of the default dashboards deployed by Rancher.

-### Configuring a Persistent Grafana Dashboard
+However, users can choose to log in to Grafana as an [Admin](https://grafana.com/docs/grafana/latest/permissions/organization_roles/#admin-role) if necessary. The default Admin username and password for the Grafana instance will be `admin`/`prom-operator`, but alternative credentials can also be supplied on deploying or upgrading the chart.

-To allow the Grafana dashboard to persist after it restarts, you will need to add the configuration JSON into a ConfigMap.
+To see the Grafana UI, install `rancher-monitoring`. Then go to the **Cluster Explorer.** In the top left corner, click **Cluster Explorer > Monitoring.** Then click **Grafana.**

-You can add this configuration to the ConfigMap using the Rancher UI.
+<figcaption>Cluster Compute Resources Dashboard in Grafana</figcaption>
+![Cluster Compute Resources Dashboard in Grafana]({{<baseurl>}}/img/rancher/cluster-compute-resources-dashboard.png)

-### Configuring Grafana to Use Multiple Data Sources
+<figcaption>Default Dashboards in Grafana</figcaption>
+![Default Dashboards in Grafana]({{<baseurl>}}/img/rancher/default-grafana-dashboards.png)

+To allow the Grafana dashboard to persist after it restarts, you will need to add the configuration JSON into a ConfigMap. You can add this configuration to the ConfigMap using the Rancher UI.
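+
+As a rough illustration of the ConfigMap approach, the sketch below wraps a minimal dashboard JSON in a ConfigMap. The ConfigMap name, the `cattle-dashboards` namespace, and the `grafana_dashboard` label are assumptions based on the dashboard sidecar convention of the upstream kube-prometheus-stack chart; check the `rancher-monitoring` chart values for the namespace and label your installation actually watches.
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: example-dashboard            # hypothetical name
+  namespace: cattle-dashboards       # assumption: namespace watched by the Grafana sidecar
+  labels:
+    grafana_dashboard: "1"           # assumption: label the sidecar uses to discover dashboards
+data:
+  # The key is the dashboard file name; the value is the dashboard JSON exported from Grafana.
+  example-dashboard.json: |
+    {
+      "title": "Example Dashboard",
+      "uid": "example-dashboard",
+      "panels": []
+    }
+```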
+
+### Prometheus UI
+
+To see the Prometheus UI, install `rancher-monitoring`. Then go to the **Cluster Explorer.** In the top left corner, click **Cluster Explorer > Monitoring.** Then click **Prometheus Graph.**
+
+<figcaption>Prometheus Graph UI</figcaption>
+![Prometheus Graph UI]({{<baseurl>}}/img/rancher/prometheus-graph-ui.png)
+
+### Viewing the Prometheus Targets
+
+To see the Prometheus Targets, install `rancher-monitoring`. Then go to the **Cluster Explorer.** In the top left corner, click **Cluster Explorer > Monitoring.** Then click **Prometheus Targets.**
+
+<figcaption>Targets in the Prometheus UI</figcaption>
+![Prometheus Targets UI]({{<baseurl>}}/img/rancher/prometheus-targets-ui.png)
+
+### Viewing the Prometheus Rules
+
+To see the Prometheus Rules, install `rancher-monitoring`. Then go to the **Cluster Explorer.** In the top left corner, click **Cluster Explorer > Monitoring.** Then click **Prometheus Rules.**
+
+<figcaption>Rules in the Prometheus UI</figcaption>
+![Prometheus Rules UI]({{<baseurl>}}/img/rancher/prometheus-rules-ui.png)
+
+### Viewing Active Alerts in Alertmanager
+
+When `rancher-monitoring` is installed, the Prometheus Alertmanager UI is deployed.
+
+The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.
+
+In the Alertmanager UI, you can view your alerts and the current Alertmanager configuration.
+
+To see the Alertmanager UI, install `rancher-monitoring`. Then go to the **Cluster Explorer.** In the top left corner, click **Cluster Explorer > Monitoring.** Then click **Alertmanager.**
+
+**Result:** The Alertmanager UI opens in a new tab. For help with configuration, refer to the [official Alertmanager documentation.](https://prometheus.io/docs/alerting/latest/alertmanager/)
+
+<figcaption>The Alertmanager UI</figcaption>
+![Alertmanager UI]({{<baseurl>}}/img/rancher/alertmanager-ui.png)
+
+# Uninstall Monitoring
+
+1. From the **Cluster Explorer,** click **Apps & Marketplace.**
+1. Click **Installed Apps.**
+1. Go to the `cattle-monitoring-system` namespace and check the boxes for `rancher-monitoring-crd` and `rancher-monitoring`.
+1. Click **Delete.**
+1. Confirm **Delete.**
+
+**Result:** `rancher-monitoring` is uninstalled.
+
+# Setting Resource Limits and Requests
+
+The resource requests and limits can be configured when installing `rancher-monitoring`.
+
+The default values are in the [values.yaml](https://github.com/rancher/charts/blob/main/charts/rancher-monitoring/values.yaml) in the `rancher-monitoring` Helm chart.
+
+The default values in the table below are the minimum required resource limits and requests.
+
+| Resource Name | Memory Limit | CPU Limit | Memory Request | CPU Request |
+| ------------- | ------------ | ----------- | ---------------- | ------------------ |
+| alertmanager | 500Mi | 1000m | 100Mi |  100m |
+| grafana | 200Mi | 200m | 100Mi | 100m |
+| kube-state-metrics subchart | 200Mi  | 100m | 130Mi | 100m |
+| prometheus-node-exporter subchart | 50Mi | 200m | 30Mi | 100m |
+| prometheusOperator | 500Mi | 200m | 100Mi | 100m |
+| prometheus | 2500Mi | 1000m | 1750Mi | 750m |
+| **Total**                 | **3950Mi** | **2700m** | **2210Mi** | **1250m** |
+
+At least 50Gi storage is recommended.
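+
+As a sketch of how such an override might look in a Helm values file, the fragment below sets the Prometheus limits and requests to the defaults from the table above. It assumes the chart exposes the standard Kubernetes `resources` block under `prometheus.prometheusSpec`, as the upstream kube-prometheus-stack chart does; verify the exact keys against the chart's `values.yaml` before relying on it.
+
+```yaml
+# Values override for the Prometheus container (assumed key layout).
+prometheus:
+  prometheusSpec:
+    resources:
+      limits:
+        memory: 2500Mi   # Memory Limit column for prometheus
+        cpu: 1000m       # CPU Limit column for prometheus
+      requests:
+        memory: 1750Mi   # Memory Request column for prometheus
+        cpu: 750m        # CPU Request column for prometheus
+```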
+
+
 The data from Prometheus is used as the data source for the Grafana dashboard. Multiple data sources can be configured for Grafana.
+
+For more information about using the Prometheus adapter, refer to this [documentation.](https://github.com/DirectXMan12/k8s-prometheus-adapter/blob/master/docs/config-walkthrough.md)
+
+# Known Issues
+
+There is a [known issue](https://github.com/rancher/rancher/issues/28787#issuecomment-693611821) that K3s clusters require more default memory. If you are enabling monitoring on a K3s cluster, we recommend setting `prometheus.prometheusSpec.resources.memory.limit` to `2500Mi` and `prometheus.prometheusSpec.resources.memory.request` to `1750Mi`.
@@ -0,0 +1,132 @@
+---
+title: Configuration
+weight: 3
+---
+
+This page captures some of the most important options for configuring the custom resources for monitoring.
+
+For information on configuring custom scrape targets and rules for Prometheus, please refer to the upstream documentation for the [Prometheus Operator.](https://github.com/prometheus-operator/prometheus-operator) Some of the most important custom resources are explained in the Prometheus Operator [design documentation.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md) The Prometheus Operator documentation can also help you set up RBAC, Thanos, or custom configuration.
+
+- [Configuring Prometheus](#configuring-prometheus)
+- [Configuring Targets with ServiceMonitors and PodMonitors](#configuring-targets-with-servicemonitors-and-podmonitors)
+  - [ServiceMonitors](#servicemonitors)
+  - [PodMonitors](#podmonitors)
+  - [PrometheusRules](#prometheusrules)
+  - [Alertmanager Config](#alertmanager-config)
+- [Trusted CA for Notifiers](#trusted-ca-for-notifiers)
+- [Additional Scrape Configurations](#additional-scrape-configurations)
+- [Examples](#examples)
+
+# Configuring Prometheus
+
+The primary way that users will be able to customize this feature for specific Monitoring and Alerting use cases is by creating and/or modifying ConfigMaps, Secrets, and Custom Resources pertaining to this deployment.
+
+Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster.
+
+Prometheus Operator will automatically update your Prometheus configuration based on the live state of these custom resources.
+
+There are also certain special types of ConfigMaps/Secrets such as those corresponding to Grafana Dashboards, Grafana Datasources, and Alertmanager Configs that will automatically update your Prometheus configuration via sidecar proxies that observe the live state of those resources within your cluster.
+
+By default, a set of these resources (curated by the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) project) are deployed onto your cluster as part of installing the Rancher Monitoring Application to set up a basic Monitoring / Alerting stack. For more information on how to configure custom targets, alerts, notifiers, and dashboards after deploying the chart, see below.
+
+# Configuring Targets with ServiceMonitors and PodMonitors
+
+Customizing the scrape configuration used by Prometheus to determine which resources to scrape metrics from will primarily involve creating / modifying the following resources within your cluster:
+
+### ServiceMonitors
+
+This CRD declaratively specifies how groups of Kubernetes services should be monitored. Any Services in your cluster that match the labels located within the ServiceMonitor `selector` field will be monitored based on the `endpoints` specified on the ServiceMonitor. For more information on what fields can be specified, please look at the [spec](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#servicemonitor) provided by Prometheus Operator.
+
+For more information about how ServiceMonitors work, refer to the [Prometheus Operator documentation.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/running-exporters.md)
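+
+To make the `selector` and `endpoints` fields concrete, here is a minimal ServiceMonitor sketch. The resource name, namespace, label, and port name are hypothetical; whether Prometheus picks the resource up also depends on the ServiceMonitor selectors configured on your Prometheus resource.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: example-app              # hypothetical name
+  namespace: default             # hypothetical namespace
+spec:
+  # Services carrying this label are selected for scraping.
+  selector:
+    matchLabels:
+      app: example-app           # hypothetical label
+  # Restrict the search for matching Services to these namespaces.
+  namespaceSelector:
+    matchNames:
+      - default
+  endpoints:
+    - port: web                  # name of the Service port that serves metrics (hypothetical)
+      path: /metrics
+      interval: 30s
+```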
+
+### PodMonitors
+
+This CRD declaratively specifies how groups of pods should be monitored. Any Pods in your cluster that match the labels located within the PodMonitor `selector` field will be monitored based on the `podMetricsEndpoints` specified on the PodMonitor. For more information on what fields can be specified, please look at the [spec](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#podmonitorspec) provided by Prometheus Operator.
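+
+A minimal PodMonitor sketch follows, showing how `selector` and `podMetricsEndpoints` fit together. The names, label, and port are hypothetical and only illustrate the shape of the resource.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: example-app              # hypothetical name
+  namespace: default             # hypothetical namespace
+spec:
+  # Pods carrying this label are selected for scraping.
+  selector:
+    matchLabels:
+      app: example-app           # hypothetical label
+  podMetricsEndpoints:
+    - port: web                  # name of the container port that serves metrics (hypothetical)
+      path: /metrics
+      interval: 30s
+```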
+
+### PrometheusRules
+
+This CRD defines a group of Prometheus alerting and/or recording rules.
+
+To add a group of alerting / recording rules, you should create a PrometheusRule CR that defines a RuleGroup with your desired rules, each specifying:
+
+- The name of the new alert / record
+- A PromQL expression for the new alert / record
+- Labels that should be attached to the alert / record that identify it (e.g. cluster name or severity)
+- Annotations that encode any additional important pieces of information that need to be displayed on the notification for an alert (e.g. summary, description, message, runbook URL, etc.). This field is not required for recording rules.
+
+For more information on what fields can be specified, please look at the [Prometheus Operator spec.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#prometheusrulespec)
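+
+The fields listed above can be sketched in a PrometheusRule with one recording rule and one alerting rule. The resource name, metric name, threshold, and label values are hypothetical, and the rule is only loaded if it matches the rule selector configured on your Prometheus resource.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: example-rules                      # hypothetical name
+  namespace: cattle-monitoring-system
+spec:
+  groups:
+    - name: example.rules
+      rules:
+        # Recording rule: stores the result of the expression as a new time series.
+        - record: job:http_requests:rate5m
+          expr: sum by (job) (rate(http_requests_total[5m]))   # hypothetical metric
+        # Alerting rule: fires after the expression has held for 10 minutes.
+        - alert: ExampleHighRequestRate
+          expr: job:http_requests:rate5m > 100                 # hypothetical threshold
+          for: 10m
+          labels:
+            severity: warning
+          annotations:
+            summary: High request rate
+            description: 'Job {{ $labels.job }} is receiving more than 100 requests per second.'
+```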
+
+### Alertmanager Config
+
+The [Alertmanager Config](https://prometheus.io/docs/alerting/latest/configuration/#configuration-file) Secret contains the configuration of an Alertmanager instance that sends out notifications based on alerts it receives from Prometheus.
+
+By default, Rancher Monitoring deploys a single Alertmanager onto a cluster that uses a default Alertmanager Config Secret. As part of the chart deployment options, you can opt to increase the number of replicas of the Alertmanager deployed onto your cluster that can all be managed using the same underlying Alertmanager Config Secret.
+ 
+This Secret should be updated or modified any time you want to:
+ 
+- Add in new notifiers or receivers
+- Change the alerts that should be sent to specific notifiers or receivers
+- Change the group of alerts that are sent out
+
+> By default, you can either choose to supply an existing Alertmanager Config Secret (i.e. any Secret in the `cattle-monitoring-system` namespace) or allow Rancher Monitoring to deploy a default Alertmanager Config Secret onto your cluster. By default, the Alertmanager Config Secret created by Rancher will never be modified / deleted on an upgrade / uninstall of the `rancher-monitoring` chart to prevent users from losing or overwriting their alerting configuration when executing operations on the chart.
+ 
+For more information on what fields can be specified in this secret, please look at the [Prometheus Alertmanager docs.](https://prometheus.io/docs/alerting/latest/alertmanager/)
+
+The full spec for the Alertmanager configuration file and what it takes in can be found [here.](https://prometheus.io/docs/alerting/latest/configuration/#configuration-file)
+
+The notification integrations are configured with the `receiver`, which is documented [here.](https://prometheus.io/docs/alerting/latest/configuration/#receiver)
+
+For more information, refer to the [official Prometheus documentation about configuring routes.](https://www.prometheus.io/docs/alerting/latest/configuration/#route)
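+
+As a sketch of how routes and receivers relate, the configuration below sends alerts labeled `severity: critical` to a separate receiver and everything else to a default one. The receiver names and webhook URL are hypothetical. The `match` field is used here because it is the form accepted by older Alertmanager releases; newer releases prefer `matchers`.
+
+```yaml
+route:
+  receiver: 'default-receiver'         # fallback receiver for alerts that match no child route
+  group_by: ['alertname', 'job']
+  routes:
+    # Child route: alerts with severity=critical go to a dedicated receiver.
+    - match:
+        severity: critical
+      receiver: 'critical-webhook'
+receivers:
+  - name: 'default-receiver'           # no integrations configured: alerts are received but not sent anywhere
+  - name: 'critical-webhook'
+    webhook_configs:
+      - url: 'https://example.com/alert-hook'   # hypothetical endpoint
+        send_resolved: true
+```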
+
+# Trusted CA for Notifiers
+
+If you need to add a trusted CA to your notifier, follow these steps:
+
+1. Create the `cattle-monitoring-system` namespace.
+1. Add your trusted CA secret to the `cattle-monitoring-system` namespace.
+1. Deploy or upgrade the `rancher-monitoring` Helm chart. In the chart options, reference the secret in **Alerting > Additional Secrets.**
+
+**Result:** The default Alertmanager custom resource will have access to your trusted CA.
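+
+The Secret from step 2 might look like the following sketch. The Secret name and the `ca.crt` key are hypothetical choices, and the certificate body is a placeholder to be replaced with your own PEM data.
+
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: notifier-trusted-ca            # hypothetical name; reference it in Alerting > Additional Secrets
+  namespace: cattle-monitoring-system
+type: Opaque
+stringData:
+  # PEM-encoded CA certificate (placeholder).
+  ca.crt: |
+    -----BEGIN CERTIFICATE-----
+    <PEM-encoded CA certificate>
+    -----END CERTIFICATE-----
+```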
+
+# Additional Scrape Configurations
+
+If the scrape configuration you want cannot be specified via a ServiceMonitor or PodMonitor at the moment, you can provide an `additionalScrapeConfigSecret` on deploying or upgrading `rancher-monitoring`.
+
+A [scrape_config section](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) specifies a set of targets and parameters describing how to scrape them. In the general case, one scrape configuration specifies a single job.
+
+An example of where this might be used is with Istio. For more information, see [this section.](https://rancher.com/docs/rancher/v2.x/en/istio/setup/enable-istio-in-cluster/#selectors-scrape-configs)
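+
+For orientation, a scrape configuration supplied this way is a list of `scrape_config` entries in the standard Prometheus format. The sketch below defines a single job with a static target; the job name and target address are hypothetical, and how the list is stored in the Secret depends on the chart options you use.
+
+```yaml
+# One scrape configuration, defining one job.
+- job_name: example-static             # hypothetical job name
+  scrape_interval: 30s
+  metrics_path: /metrics
+  static_configs:
+    - targets:
+        - example.default.svc:8080     # hypothetical host:port serving metrics
+```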
+
+# Examples
+
+### ServiceMonitor
+
+An example ServiceMonitor custom resource can be found [here.](https://github.com/prometheus-operator/prometheus-operator/blob/master/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml) 
+
+### PodMonitor
+
+An example PodMonitor can be found [here.](https://github.com/prometheus-operator/prometheus-operator/blob/master/example/user-guides/getting-started/example-app-pod-monitor.yaml) An example Prometheus resource that refers to it can be found [here.](https://github.com/prometheus-operator/prometheus-operator/blob/master/example/user-guides/getting-started/prometheus-pod-monitor.yaml)
+
+### PrometheusRule
+
+Prometheus rule files are held in PrometheusRule custom resources. Use the label selector field ruleSelector in the Prometheus object to define the rule files that you want to be mounted into Prometheus. An example PrometheusRule is on [this page.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/alerting.md)
+
+### Alertmanager Config
+
+To set up notifications via Slack, the following Alertmanager Config YAML should be placed into the `alertmanager.yaml` key of the Alertmanager Config Secret, where the `api_url` should be updated to use your Webhook URL from Slack:
+
+```yaml
+route:  
+  group_by: ['job']
+  group_wait: 30s
+  group_interval: 5m
+  repeat_interval: 3h 
+  receiver: 'slack-notifications'
+receivers:
+- name: 'slack-notifications'
+  slack_configs:
+  - send_resolved: true
+    text: '{{ template "slack.rancher.text" . }}'
+    api_url: <user-provided slack webhook url here>
+templates:
+- /etc/alertmanager/config/*.tmpl
+```
@@ -0,0 +1,32 @@
+---
+title: Migrating to Rancher v2.5 Monitoring
+weight: 5
+---
+
+If you previously enabled Monitoring, Alerting, or Notifiers in Rancher prior to v2.5, there is no upgrade path for switching to the new monitoring/alerting solution. You will need to disable monitoring/alerting/notifiers in the same way it was disabled in Rancher v2.4 before deploying the new monitoring solution via Cluster Explorer. 
+
+### Monitoring Prior to Rancher v2.5
+
+As of v2.2.0, Rancher's Cluster Manager allowed users to enable Monitoring & Alerting V1 (both powered by [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)) independently within a cluster. For more information on how to configure Monitoring & Alerting V1, see the [docs about monitoring prior to Rancher v2.5](/rancher/v2.x/en/monitoring-alerting/legacy).
+
+When Monitoring is enabled, Monitoring V1 deploys [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/docs/grafana/latest/getting-started/what-is-grafana/) onto a cluster to monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments and to create custom dashboards that make it easy to visualize collected metrics.
+
+Monitoring V1 could be configured on both a cluster-level and on a project-level and would automatically scrape certain workloads deployed as Apps on the Rancher cluster.
+
+When Alerts or Notifiers are enabled, Alerting V1 deploys [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) and a set of Rancher controllers onto a cluster that allows users to define alerts and configure alert-based notifications via Email, Slack, PagerDuty, etc. Users can choose to create different types of alerts depending on what needs to be monitored (e.g. System Services, Resources, CIS Scans, etc.); however, PromQL Expression-based alerts can only be created if Monitoring V1 is enabled.
+
+### Monitoring/Alerting via Cluster Explorer in Rancher 2.5
+
+As of v2.5.0, Rancher's Cluster Explorer now allows users to enable Monitoring & Alerting V2 (both powered by [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)) together within a cluster. 
+
+Unlike in Monitoring & Alerting V1, both features are packaged in a single Helm chart found [here](https://github.com/rancher/charts/blob/main/charts/rancher-monitoring). The behavior of this chart and configurable fields closely matches [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack), a Prometheus Community Helm chart, and any deviations from the upstream chart can be found in the [CHANGELOG.md](https://github.com/rancher/charts/blob/main/charts/rancher-monitoring/CHANGELOG.md) maintained with the chart.
+
+Monitoring V2 can only be configured on the cluster level. Project-level monitoring and alerting is no longer supported.
+
+For more information on how to configure Monitoring & Alerting V2, see the [docs for monitoring in Rancher v2.5](/rancher/v2.x/en/monitoring-alerting).
+
+### Changes to Role-based Access Control
+
+Project owners and members no longer get access to Grafana or Prometheus by default. If view-only users had access to Grafana, they would be able to see data from any namespace. For Kiali, any user can edit things they don’t own in any namespace.
+
+For more information about role-based access control in `rancher-monitoring`, refer to [this page.](../rbac)
@@ -4,14 +4,11 @@ weight: 3
 aliases:
  - /rancher/v2.x/en/cluster-admin/tools/monitoring/rbac
 ---
+This section describes the expectations for RBAC for Rancher Monitoring.

-This section describes the permissions required to access Monitoring features.
+## Cluster Admins

-The `rancher-monitoring` chart installs three `ClusterRoles`.
-
-# Cluster-Admin Access
-
-By default, only those with the cluster-admin `ClusterRole` can:
+By default, only those with the cluster-admin `ClusterRole` should be able to:

 - Install the `rancher-monitoring` App onto a cluster and all other relevant configuration performed on the chart deploy
  - e.g. whether default dashboards are created, what exporters are deployed onto the cluster to collect metrics, etc.
@@ -20,44 +17,83 @@ By default, only those with the cluster-admin `ClusterRole` can:
 - Persist new Grafana dashboards or datasources via creating ConfigMaps in the appropriate namespace
 - Expose certain Prometheus metrics to the k8s Custom Metrics API for HPA via a Secret in the `cattle-monitoring-system` namespace

-## Admin and Edit access
+## Users with k8s ClusterRole-based Permissions

-By default, only Admin and Edit roles can:
+The `rancher-monitoring` chart installs the following three `ClusterRoles`. By default, they aggregate into the corresponding k8s `ClusterRoles`:
+
+| ClusterRole | Aggregates To Default K8s ClusterRole  |
+| ------------------------------| ---------------------------|
+| `monitoring-admin` | `admin`|
+| `monitoring-edit` | `edit` |
+| `monitoring-view` | `view` |
+
+These `ClusterRoles` provide different levels of access to the Monitoring CRDs based on the actions that can be performed:
+
+| CRDs (monitoring.coreos.com) | Admin | Edit | View |
+| ------------------------------| ---------------------------| ---------------------------| ---------------------------|
+| <ul><li>`prometheuses`</li><li>`alertmanagers`</li></ul>| Get, List, Watch | Get, List, Watch | Get, List, Watch |
+| <ul><li>`servicemonitors`</li><li>`podmonitors`</li><li>`prometheusrules`</li></ul>| * | * | Get, List, Watch |
+
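+As an illustration of how this kind of aggregation works in Kubernetes, a `ClusterRole` that carries an `aggregate-to-view` label has its rules merged into the default `view` `ClusterRole`. The manifest below is a minimal sketch under that assumption, not the chart's actual template; the name `example-monitoring-view` is hypothetical.
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  # Hypothetical name, used for illustration only
+  name: example-monitoring-view
+  labels:
+    # Standard Kubernetes aggregation label: these rules are merged
+    # into the default `view` ClusterRole
+    rbac.authorization.k8s.io/aggregate-to-view: "true"
+rules:
+  # Read-only access to the Monitoring CRDs, matching the View column above
+  - apiGroups: ["monitoring.coreos.com"]
+    resources:
+      - prometheuses
+      - alertmanagers
+      - servicemonitors
+      - podmonitors
+      - prometheusrules
+    verbs: ["get", "list", "watch"]
+```
+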
+At a high level, the following permissions are assigned by default as a result.
+
+### Users with k8s Admin / Edit Permissions
+
+Only those with the cluster-admin / admin / edit `ClusterRole` should be able to:

- View the configuration of Prometheuses that are deployed within the cluster
- View the configuration of Alertmanagers that are deployed within the cluster
 - Modify the scrape configuration of Prometheus deployments via ServiceMonitor and PodMonitor CRs
 - Modify the alerting / recording rules of a Prometheus deployment via PrometheusRules CRs

-# Summary of Default Permissions for Kubernetes Default Roles
+### Users with k8s View Permissions

-Monitoring creates three `ClusterRoles` and adds Monitoring CRD access to the following default K8s `ClusterRoles`:
+Only those who have some k8s `ClusterRole` should be able to:

-| ClusterRole created by chart | Default K8s ClusterRole  | 
+- View the configuration of Prometheuses that are deployed within the cluster
+- View the configuration of Alertmanagers that are deployed within the cluster
+- View the scrape configuration of Prometheus deployments via ServiceMonitor and PodMonitor CRs
+- View the alerting / recording rules of a Prometheus deployment via PrometheusRules CRs
+
+## Additional Monitoring Roles
+
+Monitoring also creates six additional `Roles` that are not assigned to users by default but are created within the cluster. Admins should use these roles to provide more fine-grained access to users:
+
+| Role | Purpose  |
 | ------------------------------| ---------------------------|
-| `monitoring-admin` | `admin`|
-| `monitoring-edit`| `edit` |
-| `monitoring-view` | `view `| 
+| monitoring-config-admin | Allow admins to assign roles to users to be able to view / modify Secrets and ConfigMaps within the cattle-monitoring-system namespace. Modifying Secrets / ConfigMaps in this namespace could allow users to alter the cluster's Alertmanager configuration, Prometheus Adapter configuration, additional Grafana datasources, TLS secrets, etc. |
+| monitoring-config-edit | Allow admins to assign roles to users to be able to view / modify Secrets and ConfigMaps within the cattle-monitoring-system namespace. Modifying Secrets / ConfigMaps in this namespace could allow users to alter the cluster's Alertmanager configuration, Prometheus Adapter configuration, additional Grafana datasources, TLS secrets, etc. |
+| monitoring-config-view | Allow admins to assign roles to users to be able to view Secrets and ConfigMaps within the cattle-monitoring-system namespace. Viewing Secrets / ConfigMaps in this namespace could allow users to observe the cluster's Alertmanager configuration, Prometheus Adapter configuration, additional Grafana datasources, TLS secrets, etc. |
+| monitoring-dashboard-admin | Allow admins to assign roles to users to be able to edit / view ConfigMaps within the cattle-dashboards namespace. ConfigMaps in this namespace will correspond to Grafana Dashboards that are persisted onto the cluster. |
+| monitoring-dashboard-edit | Allow admins to assign roles to users to be able to edit / view ConfigMaps within the cattle-dashboards namespace. ConfigMaps in this namespace will correspond to Grafana Dashboards that are persisted onto the cluster. |
+| monitoring-dashboard-view | Allow admins to assign roles to users to be able to view ConfigMaps within the cattle-dashboards namespace. ConfigMaps in this namespace will correspond to Grafana Dashboards that are persisted onto the cluster. |
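+
+To grant one of these roles, an admin can create a `RoleBinding` in the namespace the `Role` applies to. The following is a minimal sketch, assuming the `monitoring-dashboard-edit` `Role` exists in the `cattle-dashboards` namespace; the binding name and the user `jane@example.com` are hypothetical.
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  # Hypothetical binding name
+  name: example-dashboard-editor
+  # Assumption: the Role lives in the namespace that holds dashboard ConfigMaps
+  namespace: cattle-dashboards
+subjects:
+  - kind: User
+    # Hypothetical user; replace with a real user or group in your cluster
+    name: jane@example.com
+    apiGroup: rbac.authorization.k8s.io
+roleRef:
+  kind: Role
+  name: monitoring-dashboard-edit
+  apiGroup: rbac.authorization.k8s.io
+```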

-Rancher will continue to use cluster-owner, cluster-member, project-owner, project-member, etc as role names, but will utilize default roles to determine access. For each default K8s `ClusterRole` there are different Istio CRD permissions and K8s actions (Create (C), Get (G), List (L), Update (U), Patch (P), Delete(D), All (*)) that can be performed. 
+## Users with Rancher Cluster Manager Based Permissions

+The relationship between the default roles deployed by Rancher Cluster Manager (i.e. cluster-owner, cluster-member, project-owner, project-member), the default k8s roles, and the roles deployed by the rancher-monitoring chart is detailed in the table below:

-|CRDs                        | Admin | Edit | View | 
-|----------------------------| ------| -----| -----|
-| <ul><li>`monitoring.coreos.com`</li><ul><li>`prometheuses`</li><li>`alertmanagers`</li></ul></ul>| GLW | GLW | GLW|
-| <ul><li>`monitoring.coreos.com`</li><ul><li>`servicemonitors`</li><li>`podmonitors`</li><li>`prometheusrules`</li></ul></ul>| * | * | GLW|
+| Cluster Manager Role | k8s Role | Monitoring ClusterRole / Role | ClusterRoleBinding or RoleBinding? |
+| --------- | --------- | --------- | --------- |
+| cluster-owner | cluster-admin | N/A | ClusterRoleBinding |
+| cluster-member | admin | monitoring-admin | ClusterRoleBinding |
+| project-owner | admin | monitoring-admin | RoleBinding within Project namespace |
+| project-member | edit | monitoring-edit | RoleBinding within Project namespace |

-# Additional Roles
+### Differences in 2.5.x

-Monitoring also creates six `Roles` to enable admins to assign more fine-grained access to monitoring within a cluster:
+Users with the project-member or project-owner role assigned will not be given access to either Prometheus or Grafana in Rancher 2.5.x, since Grafana and Prometheus are only created at the cluster level.

-| Role created by chart | Purpose  | 
-| ------------------------------| ---------------------------|
-monitoring-config-admin | Allow admins to assign roles to users to be able to view / modify Secrets and ConfigMaps within the cattle-monitoring-system namespace. Modifying Secrets / ConfigMaps in this namespace could allow users to alter the cluster's Alertmanager configuration, Prometheus Adapter configuration, additional Grafana datasources, TLS secrets, etc. |
-monitoring-config-edit | Allow admins to assign roles to users to be able to view / modify Secrets and ConfigMaps within the cattle-monitoring-system namespace. Modifying Secrets / ConfigMaps in this namespace could allow users to alter the cluster's Alertmanager configuration, Prometheus Adapter configuration, additional Grafana datasources, TLS secrets, etc. |
-monitoring-config-view | Allow admins to assign roles to users to be able to view Secrets and ConfigMaps within the cattle-monitoring-system namespace. Viewing Secrets / ConfigMaps in this namespace could allow users to observe the cluster's Alertmanager configuration, Prometheus Adapter configuration, additional Grafana datasources, TLS secrets, etc. |
-monitoring-dashboard-admin | Allow admins to assign roles to users to be able to edit / view ConfigMaps within the cattle-dashboards namespace. ConfigMaps in this namespace will correspond to Grafana Dashboards that are persisted onto the cluster. |
-monitoring-dashboard-edit | Allow admins to assign roles to users to be able to edit / view ConfigMaps within the cattle-dashboards namespace. ConfigMaps in this namespace will correspond to Grafana Dashboards that are persisted onto the cluster. |
-monitoring-dashboard-view | Allow admins to assign roles to users to be able to view ConfigMaps within the cattle-dashboards namespace. ConfigMaps in this namespace will correspond to Grafana Dashboards that are persisted onto the cluster. |
+In addition, while project owners will by default still only be able to add ServiceMonitors / PodMonitors that scrape resources within their project's namespace, PrometheusRules are not scoped to a single namespace / project. Therefore, any alert rules or recording rules created by project owners within their project namespace will be applied across the entire cluster, although they will be unable to view / edit / delete any rules that were created outside the project's namespace.
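+
+To make the cluster-wide effect concrete, the sketch below shows a `PrometheusRule` created inside a project namespace whose expression is not restricted to that namespace. The names `example-project-rules` and `my-project-namespace` are hypothetical, and the example assumes the kube-state-metrics series `kube_pod_container_status_restarts_total` is being collected.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  # Hypothetical name and namespace
+  name: example-project-rules
+  namespace: my-project-namespace
+spec:
+  groups:
+    - name: example.rules
+      rules:
+        - alert: ExamplePodRestarting
+          # No namespace filter: this matches pods in every namespace,
+          # not only those in my-project-namespace
+          expr: sum by (namespace, pod) (rate(kube_pod_container_status_restarts_total[15m])) > 0
+          for: 10m
+          labels:
+            severity: warning
+          annotations:
+            summary: Pod is restarting repeatedly
+```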

-These Roles are not assigned by default but will be created in the cluster.
+### Assigning Additional Access
+
+If cluster-admins would like to provide additional admin/edit access to users outside of the roles offered by the rancher-monitoring chart, the following table identifies the potential impact:
+
+|CRDs (monitoring.coreos.com) | Can it cause impact outside of a namespace / project? | Impact |
+|----------------------------| ------| ----------------------------|
+| `prometheuses`| Yes, this resource can scrape metrics from any targets across the entire cluster (unless the Operator itself is otherwise configured). | User will be able to define the configuration of new cluster-level Prometheus deployments that should be created in the cluster. |
+| `alertmanagers`| No | User will be able to define the configuration of new cluster-level Alertmanager deployments that should be created in the cluster. Note: if you just want to allow users to configure settings like Routes and Receivers, you should just provide access to the Alertmanager Config Secret instead. |
+| <ul><li>`servicemonitors`</li><li>`podmonitors`</li></ul>| No, not by default; this is configurable via `ignoreNamespaceSelectors` on the Prometheus CR. | User will be able to set up scrapes by Prometheus on endpoints exposed by Services / Pods within the namespace they are given this permission in. |
+| `prometheusrules`| Yes, rules defined in a PrometheusRule are applied across the entire cluster, regardless of the namespace in which it is created. | User will be able to define alert or recording rules on Prometheus based on any series collected across the entire cluster. |
+
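+For comparison, a `ServiceMonitor` without a `namespaceSelector` only selects Services in its own namespace, which is why its default impact stays within the project. The manifest below is a minimal sketch; the names, the `app: example-app` label, and the `metrics` port name are hypothetical.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  # Hypothetical name and namespace
+  name: example-app
+  namespace: my-project-namespace
+spec:
+  # No namespaceSelector: only Services in my-project-namespace are matched
+  selector:
+    matchLabels:
+      app: example-app
+  endpoints:
+    # Assumes the Service exposes a port named "metrics"
+    - port: metrics
+      interval: 30s
+```
+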
+| k8s Resources | Namespace | Can it cause impact outside of a namespace / project? | Impact |
+|----------------------------| ------| ------| ----------------------------|
+| <ul><li>`secrets`</li><li>`configmaps`</li></ul>| `cattle-monitoring-system` | Yes, Configs and Secrets in this namespace can impact the entire monitoring / alerting pipeline. | User will be able to create or edit Secrets / ConfigMaps such as the Alertmanager Config, Prometheus Adapter Config, TLS secrets, additional Grafana datasources, etc. This can have broad impact on all cluster monitoring / alerting. |
+| <ul><li>`secrets`</li><li>`configmaps`</li></ul>| `cattle-dashboards` | Yes, Configs and Secrets in this namespace can create dashboards that make queries on all metrics collected at a cluster-level. | User will be able to create Secrets / ConfigMaps that persist new Grafana Dashboards only. |
@@ -23,22 +23,23 @@ Given the following:
 The corresponding configuration for the provider would then be as follows:

 ```yaml
-(...)
-cloud_provider:
-  name: vsphere
-  vsphereCloudProvider:
-    virtual_center:
-      vc.example.com:
-        user: provisioner
-        password: secret
-        port: 443
-        datacenters: /eu-west-1
-    workspace:
-      server: vc.example.com
-      folder: /eu-west-1/folder/myvmfolder
-      default-datastore: /eu-west-1/datastore/ds-1
-      datacenter: /eu-west-1
-      resourcepool-path: /eu-west-1/host/hn1/resources/myresourcepool
+rancher_kubernetes_engine_config:
+  (...)
+  cloud_provider:
+    name: vsphere
+    vsphereCloudProvider:
+      virtual_center:
+        vc.example.com:
+          user: provisioner
+          password: secret
+          port: 443
+          datacenters: /eu-west-1
+      workspace:
+        server: vc.example.com
+        folder: myvmfolder
+        default-datastore: /eu-west-1/datastore/ds-1
+        datacenter: /eu-west-1
+        resourcepool-path: /eu-west-1/host/hn1/resources/myresourcepool

 ```
 # Configuration Options