From 92e08b2f2adc5660b5c63fec63fdf7fc0f93ad1c Mon Sep 17 00:00:00 2001
From: Frank Mai
Date: Wed, 13 Mar 2019 15:35:34 +0800
Subject: [PATCH] Update _index.md

---
 .../en/tools/notifiers-and-alerts/_index.md | 133 +++++++++---------
 1 file changed, 67 insertions(+), 66 deletions(-)

diff --git a/content/rancher/v2.x/en/tools/notifiers-and-alerts/_index.md b/content/rancher/v2.x/en/tools/notifiers-and-alerts/_index.md
index 2bc31fbfebd..d0faa79f7de 100644
--- a/content/rancher/v2.x/en/tools/notifiers-and-alerts/_index.md
+++ b/content/rancher/v2.x/en/tools/notifiers-and-alerts/_index.md
@@ -21,7 +21,7 @@ Rancher integrates with a variety of popular IT services, including:
 - **Email**: Choose email recipients for alert notifications.
 - **PagerDuty**: Route notifications to staff by phone, SMS, or personal email.
 - **WebHooks**: Update a webpage with alert notifications.
-- **Wechat**: Send alert notifications to your Wechat at Work contacts.
+- **WeChat**: Send alert notifications to your Enterprise WeChat contacts.

@@ -65,12 +65,11 @@ Set up a notifier so that you can begin configuring and sending alerts.
 1. Enter your webhook **URL**.
 1. Click **Test**. If the test is successful, the URL you're configuring as a notifier outputs `Webhook setting validated`.
 {{% /accordion %}}
-{{% accordion id="Wechat" label="Wechat" %}}
+{{% accordion id="WeChat" label="WeChat" %}}
 1. Enter a **Name** for the notifier.
-1. In the **Corporation ID** field, enter corporation id of your corporation, you could get it from [corporation info](https://work.weixin.qq.com/wework_admin/frame#profile).
-1. From Wechat, create an application in the Wechat at Work [Application page](https://work.weixin.qq.com/wework_admin/frame#apps). Enter the **Application Agent ID** and **Application Secret** for this application.
-1. In the **Recipient Type** field, select one of the recipient types.
-1. In the **Default Recipient** field, enter the party name, tag name or user account that you want to receive the notification. For contact information, see [Wechat Contacts](https://work.weixin.qq.com/wework_admin/frame#contacts)
+1. In the **Corporation ID** field, enter the "EnterpriseID" of your corporation, you could get it from [Profile page](https://work.weixin.qq.com/wework_admin/frame#profile).
+1. From Enterprise WeChat, create an application in the [Application page](https://work.weixin.qq.com/wework_admin/frame#apps), and then enter the "AgentId" and "Secret" of this application to the **Application Agent ID** and **Application Secret** fields.
+1. Select the **Recipient Type** and then enter a corresponding name of type to **Default Recipient** field, for example, the party name, tag name or user account that you want to receive the notification. You could get contact information from [Contacts page](https://work.weixin.qq.com/wework_admin/frame#contacts).
 {{% /accordion %}}
 1. Click **Add** to complete adding the notifier.
@@ -149,11 +148,11 @@ This alert type monitor for events that affect one of the Kubernetes master comp
 Select the urgency level based on the importance of the service and how many nodes fill the role within your cluster. For example, if you're making an alert for the `etcd` service, select **Critical**. If you're making an alert for redundant schedulers, **Warning** is more appropriate.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
 {{% accordion id="resource-event" label="Resource Event Alerts" %}}
@@ -184,11 +183,11 @@ This alert type monitors for specific events that are thrown from a resource typ
 - If you set a normal alert for pods, you're likely to receive alerts often, and individual pods usually self-heal, so select an urgency of **Info**.
 - If you set a warning alert for StatefulSets, it's very likely to impact operations, so select an urgency of **Critical**.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially. The group wait time is configured to 1s to receive the alert when the event happened.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts. The group wait time is configured to 1s to receive the alert when the event happened.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
 {{% accordion id="node" label="Node Alerts" %}}
@@ -211,11 +210,11 @@ This alert type monitors for events that occur on a specific node.
 Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when a node's CPU raises above 60% deems an urgency of **Info**, but a node that is **Not Ready** deems an urgency of **Critical**.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
 {{% accordion id="node-selector" label="Node Selector Alerts" %}}
@@ -238,27 +237,27 @@ This alert type monitors for events that occur on any node on marked with a labe
 Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when a node's CPU raises above 60% deems an urgency of **Info**, but a node that is **Not Ready** deems an urgency of **Critical**.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
 {{% accordion id="cluster-expression" label="Metric Expression Alerts" %}}
-This alert type monitors prometheus querying expressions crossed the threshold, it would available after you enable monitoring.
+This alert type monitors for the overload from Prometheus expression querying, it would be available after you enable monitoring.

-1. Input or select an **Expression**, the drop down shows the original expressions from prometheus, below monitoring information are exposed at the cluster level.
- **Kube State Metric**: Add-on agent to generate and expose cluster-level metrics. For more information, see [Kube-State-Metric](https://github.com/kubernetes/kube-state-metrics)
- **Kubelet Cadvisor**: Analyzes resource usage and performance characteristics of running containers. For more information, see [Cadvisor](https://github.com/google/cadvisor)
- **Node Exporter**: Expose machine monitoring information. For more information, see [Node Exporter](https://github.com/prometheus/node_exporter)
- **Kubernetes Metric**: Kubernetes related metrics. For more information, see [Kubernetes](https://github.com/kubernetes/metrics)
- **ETCD Metric**: Expose ETCD monitoring information. For cluster use rke to deploy, could get etcd metrics. For more information, see [Kubernetes](https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/monitoring.md)
- **Fluentd Metric**: Expose the Fluentd monitoring information. For more information, see [Fluentd](https://docs.fluentd.org/v1.0/articles/monitoring-prometheus)
- **Prometheus Metric**: Prometheus itself metrics.
- **Grafana Metric**: Expose Grafana monitoring information. For more information, see [Grafana](http://docs.grafana.org/administration/metrics/)
+1. Input or select an **Expression**, the drop down shows the original metrics from Prometheus, including:
+
+ - [**Node**](https://github.com/prometheus/node_exporter)
+ - [**Container**](https://github.com/google/cadvisor)
+ - [**ETCD**](https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/monitoring.md)
+ - [**Kubernetes Components**](https://github.com/kubernetes/metrics)
+ - [**Kubernetes Resources**](https://github.com/kubernetes/kube-state-metrics)
+ - [**Fluentd**](https://docs.fluentd.org/v1.0/articles/monitoring-prometheus) (supported by [Logging]({{< baseurl >}}/rancher/v2.x/en/tools/logging))
+ - [**Cluster Level Grafana**](http://docs.grafana.org/administration/metrics/)
+ - **Cluster Level Prometheus**

-
 1. Choose a **Comparison**.
 - **Equal**: Trigger alert when expression value equal to the threshold.
@@ -268,11 +267,11 @@ This alert type monitors prometheus querying expressions crossed the threshold,
 - **Greater or Equal**: Trigger alert when expression value greater to equal to the threshold.
 - **Less or Equal**: Trigger alert when expression value less or equal to the threshold.
-1. Input a **Threshold**, will trigger alert when the value of expression cross the threshold.
+1. Input a **Threshold**, for trigger alert when the value of expression cross the threshold.
 1. Choose a **Comparison**.
-1. Select a duration, will trigger alert when expression value crosses the threshold longer than the configured duration.
+1. Select a duration, for trigger alert when expression value crosses the threshold longer than the configured duration.
 1. Select the urgency level of the alert.
@@ -283,11 +282,11 @@ This alert type monitors prometheus querying expressions crossed the threshold,
 Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when a node's load expression ```sum(node_load5) / count(node_cpu_seconds_total{mode="system"})``` raises above 0.6 deems an urgency of **Info**, but 1 deems an urgency of **Critical**.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
@@ -339,11 +338,11 @@ This alert type monitors for the status of a specific pod.
 Select the urgency level of the alert based on pod state and expandability. For example, a stateless pod that's not can be easily replaced, so select **Info**. However, if an important pod isn't scheduled, it may affect operations, so choose **Critical**.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
 {{% accordion id="workload" label="Workload Alerts" %}}
@@ -361,11 +360,11 @@ This alert type monitors for the availability of a workload.
 Select the urgency level of the alert based on the percentage you choose and the importance of the workload.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
 {{% accordion id="workload-selector" label="Workload Selector Alerts" %}}
@@ -381,21 +380,23 @@ This alert type monitors for the availability of all workloads marked with tags
 Select the urgency level of the alert based on the percentage you choose and the importance of the workload.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
-{{% accordion id="project-expression" label="Metric Expression" %}}
-This alert type monitors prometheus querying expressions crossed the threshold, it would available after you enable monitoring.
+{{% accordion id="project-expression" label="Metric Expression Alerts" %}}
+This alert type monitors for the overload from Prometheus expression querying, it would be available after you enable monitoring.

-1. Input or select an **Expression**, the drop down shows the original expressions from prometheus, below monitoring information are exposed at the project level.
- **Kube State Metric**: Add-on agent to generate and expose cluster-level metrics. For more information, see [Kube-State-Metric](https://github.com/kubernetes/kube-state-metrics)
- **Kubelet Cadvisor**: Analyzes resource usage and performance characteristics of running containers. For more information, see [Cadvisor](https://github.com/google/cadvisor)
- **Prometheus Metric**: Project level Prometheus itself metrics.
- **Grafana Metric**: Expose project level Grafana monitoring information. For more information, see [Grafana](http://docs.grafana.org/administration/metrics/)
+1. Input or select an **Expression**, the drop down shows the original metrics from Prometheus, including:
+
+ - [**Container**](https://github.com/google/cadvisor)
+ - [**Kubernetes Resources**](https://github.com/kubernetes/kube-state-metrics)
+ - [**Customize**]({{< baseurl >}}/rancher/v2.x/en/tools/monitoring/#custom-metrics)
+ - [**Project Level Grafana**](http://docs.grafana.org/administration/metrics/)
+ - **Project Level Prometheus**
 1. Choose a comparison.
@@ -406,11 +407,11 @@ This alert type monitors prometheus querying expressions crossed the threshold,
 - **Greater or Equal**: Trigger alert when expression value greater to equal to the threshold.
 - **Less or Equal**: Trigger alert when expression value less or equal to the threshold.
-1. Input a **Threshold**, will trigger alert when the value of expression cross the threshold.
+1. Input a **Threshold**, for trigger alert when the value of expression cross the threshold.
 1. Choose a **Comparison**.
-1. Select a **Duration**, will trigger alert when expression value crosses the threshold longer than the configured duration.
+1. Select a **Duration**, for trigger alert when expression value crosses the threshold longer than the configured duration.
 1. Select the urgency level of the alert.
@@ -421,11 +422,11 @@ This alert type monitors prometheus querying expressions crossed the threshold,
 Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when a expression for container memory close to the limit raises above 60% deems an urgency of **Info**, but raised about 95% deems an urgency of **Critical**.
-1. Config advance options, it inheriting the advance option from group level by default, you configure alert rule use a customized advance option by disabling inherited.
+1. Configure advanced options. By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.

- - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially.
- - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
- - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts.
+ - **Group Wait Time**: How long to wait to buffer alerts of the same group before sending initially, default to 30 seconds.
+ - **Group Interval Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 30 seconds.
+ - **Repeat Wait Time**: How long to wait before sending an alert that has been added to a group which contains already fired alerts, default to 1 hour.
 {{% /accordion %}}
@@ -440,10 +441,10 @@ This alert type monitors prometheus querying expressions crossed the threshold,
 #### Managing Project Alerts
-To manage project alerts, browse to the project that alerts you want to manage. Then select **Resources > Alerts**. You can:
+To manage project alerts, browse to the project that alerts you want to manage. Then select **Tools > Alerts**. You can:
 - Deactivate/Reactive alerts
 - Edit alert settings
 - Delete unnecessary alerts
 - Mute firing alerts
-- Unmute muted alerts
\ No newline at end of file
+- Unmute muted alerts
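
The Metric Expression Alerts steps in this patch combine a **Comparison**, a **Threshold**, and a duration. As a rough illustration of how those three settings might interact, here is a minimal Python sketch. It is not Rancher's or Prometheus's implementation: the function name `should_fire`, the `(timestamp, value)` sample format, and the choice to fire once the condition has held for at least the duration are all assumptions made for the example, and only the three comparison names visible in the diff are included.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

# Comparison names taken from the diff; other options the UI may offer are omitted.
COMPARISONS: Dict[str, Callable[[float, float], bool]] = {
    "Equal": operator.eq,
    "Greater or Equal": operator.ge,
    "Less or Equal": operator.le,
}


def should_fire(
    samples: List[Tuple[float, float]],
    comparison: str,
    threshold: float,
    duration_seconds: float,
) -> bool:
    """Hypothetical helper: return True if the expression value satisfied the
    comparison continuously for at least duration_seconds.

    samples is a list of (timestamp_seconds, value) pairs, oldest first.
    """
    compare = COMPARISONS[comparison]
    breach_started: Optional[float] = None
    for timestamp, value in samples:
        if compare(value, threshold):
            # Remember when the current unbroken breach began.
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started >= duration_seconds:
                return True
        else:
            # Condition no longer holds, so the breach window resets.
            breach_started = None
    return False


# Load ratio sampled once a minute; alert if it stays >= 0.6 for 2 minutes.
steady = [(0, 0.5), (60, 0.7), (120, 0.8), (180, 0.9)]
flapping = [(0, 0.7), (60, 0.5), (120, 0.7), (180, 0.7)]
print(should_fire(steady, "Greater or Equal", 0.6, 120))    # True
print(should_fire(flapping, "Greater or Equal", 0.6, 120))  # False
```

The second series never fires because the dip at the 60-second sample resets the breach window, which is the kind of short spike a duration setting is meant to filter out.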