Edit docs on routes, receivers and PrometheusRules

Catherine Luse
2020-12-15 23:09:33 -07:00
parent ebfd135caa
commit cab6c790aa
5 changed files with 51 additions and 33 deletions
@@ -63,7 +63,7 @@ Set up a notifier so that you can begin configuring and sending alerts.
| URL | From Slack, create a webhook. For instructions, see the [Slack Documentation](https://get.slack.help/hc/en-us/articles/115005265063-Incoming-WebHooks-for-Slack). Then enter the Slack webhook URL. |
| Default Channel | Enter the name of the channel that you want to send alert notifications to, in the following format: `#<channelname>`. Both public and private channels are supported. |
| Proxy URL | Proxy for the Slack webhook. |
| Send Resolved Alerts | _Available as of v2.3.0_ Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
**Validation:** Click **Test**. If the test is successful, the Slack channel you're configuring for the notifier outputs **Slack setting validated.**
@@ -73,7 +73,7 @@ Set up a notifier so that you can begin configuring and sending alerts.
|----------|----------------------|
| Name | Enter a **Name** for the notifier. |
| Default Recipient Address | Enter the email address that you want to receive the notification. |
| Send Resolved Alerts | _Available as of v2.3.0_ Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
SMTP Server Configuration:
@@ -95,7 +95,7 @@ SMTP Server Configuration:
| Name | Enter a **Name** for the notifier. |
| Default Integration Key | From PagerDuty, create a Prometheus integration. For instructions, see the [PagerDuty Documentation](https://www.pagerduty.com/docs/guides/prometheus-integration-guide/). Then enter the integration key. |
| Service Key | The same as the integration key. For instructions on creating a Prometheus integration, see the [PagerDuty Documentation](https://www.pagerduty.com/docs/guides/prometheus-integration-guide/). Then enter the integration key. |
| Send Resolved Alerts | _Available as of v2.3.0_ Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
**Validation:** Click **Test**. If the test is successful, your PagerDuty endpoint outputs **PagerDuty setting validated.**
@@ -106,7 +106,7 @@ SMTP Server Configuration:
| Name | Enter a **Name** for the notifier. |
| URL | Using the app of your choice, create a webhook URL. |
| Proxy URL | Proxy for the webhook. |
| Send Resolved Alerts | _Available as of v2.3.0_ Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
**Validation:** Click **Test**. If the test is successful, the URL you're configuring as a notifier outputs **Webhook setting validated.**
@@ -123,7 +123,7 @@ _Available as of v2.2.0_
| Recipient Type | Party, tag, or user. |
| Default Recipient | The default recipient ID should correspond to the recipient type. It should be the party ID, tag ID, or user account that you want to receive the notification. You can get contact information from the [Contacts page](https://work.weixin.qq.com/wework_admin/frame#contacts). |
| Proxy URL | If you are using a proxy, enter the proxy URL. |
| Send Resolved Alerts | _Available as of v2.3.0_ Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
**Validation:** Click **Test.** If the test is successful, you should receive an alert message.
@@ -137,7 +137,7 @@ _Available as of v2.4.6_
| Webhook URL | Enter the DingTalk webhook URL. For help setting up the webhook, refer to the [DingTalk documentation.](https://www.alibabacloud.com/help/doc-detail/52872.htm) |
| Secret | Optional: Enter a secret for the DingTalk webhook. |
| Proxy URL | Optional: Enter a proxy for the DingTalk webhook. |
| Send Resolved Alerts | Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
**Validation:** Click **Test.** If the test is successful, the DingTalk notifier output is **DingTalk setting validated.**
@@ -150,7 +150,7 @@ _Available as of v2.4.6_
| Name | Enter a **Name** for the notifier. |
| Webhook URL | Enter the Microsoft Teams webhook URL. For help setting up the webhook, refer to the [Teams Documentation.](https://docs.microsoft.com/en-us/microsoftteams/platform/webhooks-and-connectors/how-to/add-incoming-webhook) |
| Proxy URL | Optional: Enter a proxy for the Teams webhook. |
| Send Resolved Alerts | Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
**Validation:** Click **Test.** If the test is successful, the Teams notifier output is **MicrosoftTeams setting validated.**
@@ -136,6 +136,8 @@ To see the Prometheus Rules, install `rancher-monitoring`. Then go to the **Clus
<figcaption>Rules in the Prometheus UI</figcaption>
![Prometheus Rules UI]({{<baseurl>}}/img/rancher/prometheus-rules-ui.png)
For more information on Prometheus Rules in Rancher, see [this page.](./configuration/prometheusrules)
### Viewing Active Alerts in Alertmanager
When `rancher-monitoring` is installed, the Prometheus Alertmanager UI is deployed.
@@ -148,6 +150,8 @@ To see the Prometheus Rules, install `rancher-monitoring`. Then go to the **Clus
**Result:** The Alertmanager UI opens in a new tab. For help with configuration, refer to the [official Alertmanager documentation.](https://prometheus.io/docs/alerting/latest/alertmanager/)
For more information on configuring Alertmanager in Rancher, see [this page.](./configuration/alertmanager)
<figcaption>The Alertmanager UI</figcaption>
![Alertmanager UI]({{<baseurl>}}/img/rancher/alertmanager-ui.png)
@@ -85,7 +85,9 @@ An example PodMonitor can be found [here.](https://github.com/prometheus-operato
### PrometheusRule
Prometheus rule files are held in PrometheusRule custom resources. For users who are familiar with Prometheus, a PrometheusRule contains the alerting and recording rules that you would normally place in a [Prometheus rule file](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/).
Use the label selector field `ruleSelector` in the Prometheus object to define the rule files that you want to be mounted into Prometheus. An example PrometheusRule is on [this page.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/alerting.md)
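As a minimal sketch of how that selection works (the resource names and label key below are illustrative, not taken from a shipped chart), the Prometheus object's `ruleSelector` matches labels on PrometheusRule resources:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  # Mount the rule files of any PrometheusRule carrying this label
  ruleSelector:
    matchLabels:
      role: alert-rules
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  labels:
    role: alert-rules   # matched by the ruleSelector above
spec:
  groups: []            # alerting and/or recording rule groups go here
```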
### Alertmanager Config
@@ -6,6 +6,7 @@ weight: 1
The [Alertmanager Config](https://prometheus.io/docs/alerting/latest/configuration/#configuration-file) Secret contains the configuration of an Alertmanager instance that sends out notifications based on alerts it receives from Prometheus.
- [Overview](#overview)
- [Connecting Routes and PrometheusRules](#connecting-routes-and-prometheusrules)
- [Creating Receivers in the Rancher UI](#creating-receivers-in-the-rancher-ui)
- [Receiver Configuration](#receiver-configuration)
- [Slack](#slack)
@@ -38,10 +39,17 @@ The full spec for the Alertmanager configuration file and what it takes in can b
For more information, refer to the [official Prometheus documentation about configuring routes.](https://www.prometheus.io/docs/alerting/latest/configuration/#route)
### Connecting Routes and PrometheusRules
When you define a Rule within a RuleGroup of a PrometheusRule, the spec of the Rule itself contains labels that are used by Prometheus to figure out which Route should receive this Alert. For example, an Alert with the label `team: front-end` will be sent to all Routes that match on that label.
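As a minimal sketch of the Route side of that example (the receiver names are hypothetical), the sub-route below collects every Alert labeled `team: front-end`:

```yaml
route:
  receiver: default-receiver   # fallback for alerts no sub-route matches
  routes:
    - receiver: front-end-pager
      match:
        team: front-end        # label set on the Rule in the PrometheusRule
```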
# Creating Receivers in the Rancher UI
_Available as of v2.5.4_
> **Prerequisites:**
>
>- The monitoring application needs to be installed.
>- If you configured monitoring with an existing Alertmanager Secret, it must have a format that is supported by Rancher's UI. Otherwise you will only be able to make changes by modifying the Alertmanager Secret directly. Note: We are continuing to make enhancements to the kinds of Alertmanager configurations we can support using the Routes and Receivers UI, so please [file an issue](https://github.com/rancher/rancher/issues/new) if you have a request for a feature enhancement.
To create notification receivers in the Rancher UI,
@@ -56,7 +64,7 @@ To create notification receivers in the Rancher UI,
The notification integrations are configured with the `receiver`, which is explained in the [Prometheus documentation.](https://prometheus.io/docs/alerting/latest/configuration/#receiver)
Rancher v2.5.4 introduced the capability to configure receivers by filling out forms in the Rancher UI.
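As a minimal sketch of the YAML shape involved (the receiver name and URL are placeholders), a receiver bundles one or more notification integrations and is referenced from a route:

```yaml
receivers:
  - name: my-receiver             # referenced by a route's `receiver` field
    webhook_configs:              # one or more integration blocks
      - url: https://example.com/alert-hook
route:
  receiver: my-receiver           # the root route must name a receiver
```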
{{% tabs %}}
{{% tab "Rancher v2.5.4+" %}}
@@ -77,23 +85,23 @@ The custom receiver option can be used to configure any receiver in YAML that ca
| Field | Type | Description |
|------|--------------|------|
| URL | String | Enter your Slack webhook URL. For instructions to create a Slack webhook, see the [Slack documentation.](https://get.slack.help/hc/en-us/articles/115005265063-Incoming-WebHooks-for-Slack) |
| Default Channel | String | Enter the name of the channel that you want to send alert notifications to, in the following format: `#<channelname>`. |
| Proxy URL | String | Proxy for the webhook notifications. |
| Enable Send Resolved Alerts | Bool | Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
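These form fields correspond roughly to a `slack_configs` entry in the underlying Alertmanager configuration; a hedged sketch with placeholder values:

```yaml
receivers:
  - name: slack-receiver
    slack_configs:
      - api_url: https://hooks.slack.com/services/...   # Slack webhook URL
        channel: "#alerts"                              # default channel
        http_config:
          proxy_url: http://proxy.example.com:8080      # optional proxy
        send_resolved: true                             # follow-up on resolution
```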
### Email
| Field | Type | Description |
|------|--------------|------|
| Default Recipient Address | String | The email address that will receive notifications. |
| Enable Send Resolved Alerts | Bool | Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
SMTP options:
| Field | Type | Description |
|------|--------------|------|
| Sender | String | Enter an email address available on your SMTP mail server that you want to send the notification from. |
| Host | String | Enter the IP address or hostname for your SMTP server. Example: `smtp.email.com`. |
| Use TLS | Bool | Use TLS for encryption. |
| Username | String | Enter a username to authenticate with the SMTP server. |
| Password | String | Enter a password to authenticate with the SMTP server. |
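These fields correspond roughly to an `email_configs` entry in the underlying Alertmanager configuration; a hedged sketch with placeholder values:

```yaml
receivers:
  - name: email-receiver
    email_configs:
      - to: oncall@example.com            # default recipient address
        from: alerts@example.com          # sender
        smarthost: smtp.example.com:587   # SMTP host and port
        auth_username: alerts@example.com
        auth_password: changeme
        require_tls: true
        send_resolved: true
```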
@@ -102,10 +110,10 @@ SMTP options:
| Field | Type | Description |
|------|------|-------|
| Integration Type | String | `Events API v2` or `Prometheus`. |
| Default Integration Key | String | For instructions to get an integration key, see the [PagerDuty documentation.](https://www.pagerduty.com/docs/guides/prometheus-integration-guide/) |
| Proxy URL | String | Proxy for the PagerDuty notifications. |
| Enable Send Resolved Alerts | Bool | Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
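These fields correspond roughly to a `pagerduty_configs` entry in the underlying Alertmanager configuration; a hedged sketch:

```yaml
receivers:
  - name: pagerduty-receiver
    pagerduty_configs:
      # Use routing_key with an Events API v2 integration,
      # or service_key with a Prometheus integration.
      - routing_key: <your-integration-key>
        send_resolved: true
```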
### Opsgenie
@@ -113,7 +121,7 @@ SMTP options:
|------|-------------|
| API Key | For instructions to get an API key, refer to the [Opsgenie documentation.](https://docs.opsgenie.com/docs/api-key-management) |
| Proxy URL | Proxy for the Opsgenie notifications. |
| Enable Send Resolved Alerts | Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
Opsgenie Responders:
@@ -127,8 +135,8 @@ Opsgenie Responders:
| Field | Description |
|-------|--------------|
| URL | Webhook URL for the app of your choice. |
| Proxy URL | Proxy for the webhook notification. |
| Enable Send Resolved Alerts | Whether to send a follow-up notification if an alert has been resolved (e.g. [Resolved] High CPU Usage). |
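These fields correspond roughly to a `webhook_configs` entry in the underlying Alertmanager configuration; a hedged sketch with placeholder values:

```yaml
receivers:
  - name: webhook-receiver
    webhook_configs:
      - url: https://example.com/alert-hook          # webhook URL for your app
        http_config:
          proxy_url: http://proxy.example.com:8080   # optional proxy
        send_resolved: true
```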
### Custom
@@ -153,21 +161,21 @@ The route needs to refer to a [receiver](#receiver-configuration) that has alrea
| Field | Default | Description |
|-------|--------------|---------|
| Group By | N/a | The labels by which incoming alerts are grouped together, in the format `[ group_by: '[' <labelname>, ... ']' ]`. Multiple alerts coming in for labels such as `cluster=A` and `alertname=LatencyHigh` can be batched into a single group. To aggregate by all possible labels, use the special value `'...'` as the sole label name, for example, `group_by: ['...']`. Grouping by `...` effectively disables aggregation entirely, passing through all alerts as-is. This is unlikely to be what you want, unless you have a very low alert volume or your upstream notification system performs its own grouping. |
| Group Wait | 30s | How long to wait to buffer alerts of the same group before sending initially. |
| Group Interval | 5m | How long to wait before sending an alert that has been added to a group of alerts for which an initial notification has already been sent. |
| Repeat Interval | 4h | How long to wait before re-sending a given alert that has already been sent. |
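Put together in Route YAML (the receiver name is a placeholder, and the timing values shown are the defaults from the table above):

```yaml
route:
  receiver: my-receiver
  group_by: ['cluster', 'alertname']   # batch alerts that share these labels
  group_wait: 30s                      # buffer before the initial notification
  group_interval: 5m                   # wait before notifying about additions to a group
  repeat_interval: 4h                  # wait before re-sending an alert
```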
### Matching
The **Match** field refers to a set of equality matchers used to identify which alerts to send to a given Route based on labels defined on that alert. When you add key-value pairs to the Rancher UI, they correspond to the YAML in this format:
```yaml
match:
[ <labelname>: <labelvalue>, ... ]
```
The **Match Regex** field refers to a set of regex-matchers used to identify which alerts to send to a given Route based on labels defined on that alert. When you add key-value pairs in the Rancher UI, they correspond to the YAML in this format:
```yaml
match_re:
  [ <labelname>: <regex>, ... ]
```
@@ -3,27 +3,28 @@ title: PrometheusRules
weight: 2
---
A PrometheusRule defines a group of Prometheus alerting and/or recording rules.
- [About PrometheusRule Custom Resources](#about-prometheusrule-custom-resources)
- [Connecting Routes and PrometheusRules](#connecting-routes-and-prometheusrules)
- [Creating PrometheusRules in the Rancher UI](#creating-prometheusrules-in-the-rancher-ui)
- [Configuration](#configuration)
- [Rule Group](#rule-group)
- [Alerting Rules](#alerting-rules)
- [Recording Rules](#recording-rules)
### About PrometheusRule Custom Resources
Prometheus rule files are held in PrometheusRule custom resources.
A PrometheusRule allows you to define one or more RuleGroups. Each RuleGroup consists of a set of alerting or recording rules with the following fields:
- The name of the new alert or record
- A PromQL (Prometheus query language) expression for the new alert or record
- Labels that should be attached to the alert or record that identify it (e.g. cluster name or severity)
- Annotations that encode any additional important pieces of information that need to be displayed on the notification for an alert (e.g. summary, description, message, runbook URL, etc.). This field is not required for recording rules.
Alerting rules define alert conditions based on PromQL queries. Recording rules precompute frequently needed or computationally expensive queries at defined intervals.
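A hedged sketch tying those fields together (the alert name, expression, and threshold are invented for illustration):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
spec:
  groups:
    - name: example.rules
      rules:
        - alert: HighCPUUsage    # name of the new alert
          expr: avg(rate(node_cpu_seconds_total{mode!="idle"}[5m])) > 0.9   # PromQL expression
          for: 5m
          labels:
            severity: warning    # identifying labels
          annotations:
            summary: CPU usage has been above 90 percent for 5 minutes   # extra notification info
```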
For more information on what fields can be specified, please look at the [Prometheus Operator spec.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#prometheusrulespec)
@@ -31,8 +32,11 @@ Use the label selector field `ruleSelector` in the Prometheus object to define t
For examples, refer to the Prometheus documentation on [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) and [alerting rules.](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
### Connecting Routes and PrometheusRules
When you define a Rule within a RuleGroup of a PrometheusRule, the spec of the Rule itself contains labels that are used by Prometheus to figure out which Route should receive this Alert. For example, an Alert with the label `team: front-end` will be sent to all Routes that match on that label.
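Sketched in YAML (the alert name and expression are illustrative), the label lives on the Rule itself, inside the RuleGroup:

```yaml
groups:
  - name: front-end.rules
    rules:
      - alert: FrontEndDown
        expr: up{job="front-end"} == 0
        for: 5m
        labels:
          team: front-end   # Alertmanager matches Routes against this label
```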
### Creating PrometheusRules in the Rancher UI
_Available as of v2.5.4_
@@ -43,7 +47,7 @@ To create rule groups in the Rancher UI,
1. Click **Cluster Explorer > Monitoring** and click **Prometheus Rules.**
1. Click **Create.**
1. Enter a **Group Name.**
1. Configure the rules. In Rancher's UI, we expect a rule group to contain either alert rules or recording rules, but not both. For help filling out the forms, refer to the configuration options below.
1. Click **Create.**
**Result:** Alerts can be configured to send notifications to the receiver(s).
@@ -52,7 +56,7 @@ To create rule groups in the Rancher UI,
{{% tabs %}}
{{% tab "Rancher v2.5.4" %}}
Rancher v2.5.4 introduced the capability to configure PrometheusRules by filling out forms in the Rancher UI.
### Rule Group
@@ -70,12 +74,12 @@ Rancher v2.5.4 introduced the capability to configure reducers by filling out fo
| Field | Description |
|-------|----------------|
| Alert Name | The name of the alert. Must be a valid label value. |
| Wait To Fire For | Duration in seconds. Alerts are considered firing once they have been returned for this long. Alerts which have not yet fired for long enough are considered pending. |
| PromQL Expression | The PromQL expression to evaluate. Prometheus will evaluate the current value of this PromQL expression on every evaluation cycle, and all resultant time series become pending/firing alerts. For more information, refer to the [Prometheus documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/) or our [example PromQL expressions.](../expression) |
| Labels | Labels to add or overwrite for each alert. |
| Severity | When enabled, labels are attached to the alert or record that identify it by the severity level. |
| Severity Label Value | Critical, warning, or none. |
| Annotations | Annotations are a set of informational labels that can be used to store longer additional information, such as alert descriptions or runbook links. A [runbook](https://en.wikipedia.org/wiki/Runbook) is a set of documentation about how to handle alerts. The annotation values can be [templated.](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#templating) |
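For orientation, a hedged sketch of the rule YAML these form fields generate (names and values are illustrative):

```yaml
- alert: HighRequestLatency                        # Alert Name
  expr: job:request_latency_seconds:mean5m > 0.5   # PromQL Expression
  for: 300s                                        # Wait To Fire For (seconds)
  labels:
    severity: warning                              # Severity Label Value
  annotations:
    summary: Request latency is above 500ms        # Annotations (templatable)
```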
### Recording Rules
@@ -84,7 +88,7 @@ Rancher v2.5.4 introduced the capability to configure reducers by filling out fo
| Field | Description |
|-------|----------------|
| Time Series Name | The name of the time series to output to. Must be a valid metric name. |
| PromQL Expression | The PromQL expression to evaluate. Prometheus will evaluate the current value of this PromQL expression on every evaluation cycle, and the result is recorded as a new set of time series with the metric name given by `record`. For more information about expressions, refer to the [Prometheus documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/) or our [example PromQL expressions.](../expression) |
| Labels | Labels to add or overwrite before storing the result. |
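For orientation, a hedged sketch of the rule YAML these form fields generate (the names and expression follow the Prometheus documentation's recording-rule example):

```yaml
- record: job:http_inprogress_requests:sum        # Time Series Name
  expr: sum by (job) (http_inprogress_requests)   # PromQL Expression
  labels:
    source: recording-rule                        # labels added before storing
```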
{{% /tab %}}