From 950c8ce9c561ed0f8e88850763da879b7cb5e6ca Mon Sep 17 00:00:00 2001
From: Catherine Luse
Date: Thu, 8 Aug 2019 13:33:07 -0700
Subject: [PATCH] Change default alert paragraphs to tables

---
 .../tools/alerts/default-alerts/_index.md     | 58 ++++++++-----------
 1 file changed, 25 insertions(+), 33 deletions(-)

diff --git a/content/rancher/v2.x/en/cluster-admin/tools/alerts/default-alerts/_index.md b/content/rancher/v2.x/en/cluster-admin/tools/alerts/default-alerts/_index.md
index 12a393236cf..9c5c327c9a4 100644
--- a/content/rancher/v2.x/en/cluster-admin/tools/alerts/default-alerts/_index.md
+++ b/content/rancher/v2.x/en/cluster-admin/tools/alerts/default-alerts/_index.md
@@ -17,58 +17,50 @@ Etcd is the key-value store that contains the state of the Kubernetes cluster. I
 
 A leader is the node that handles all client requests that need cluster consensus. For more information, you can refer to this [explanation of how etcd works.](https://rancher.com/blog/2019/2019-01-29-what-is-etcd/#how-does-etcd-work)
 
-### Alert: A high number of leader changes within the etcd cluster are happening
-A warning alert is triggered when the leader changes more than three times in one hour.
-
 The leader of the cluster can change in response to certain events. It is normal for the leader to change, but too many changes can indicate a problem with the network or a high CPU load. With longer latencies, the default etcd configuration may cause frequent heartbeat timeouts, which trigger a new leader election.
 
-### Alert: Database usage close to the quota 500M
-A warning alert is triggered when the size of etcd exceeds 524,288,000 bytes.
+| Alert | Explanation |
+|-------|-------------|
+| A high number of leader changes within the etcd cluster are happening | A warning alert is triggered when the leader changes more than three times in one hour. |
+| Database usage close to the quota 500M | A warning alert is triggered when the size of etcd exceeds 524.288M.|
+| Etcd is unavailable | A critical alert is triggered when etcd becomes unavailable. |
+| Etcd member has no leader | A critical alert is triggered when etcd does not have a leader for at least three minutes. |
 
-### Alert: Etcd is unavailable
-A critical alert is triggered when etcd becomes unavailable.
-
-### Alert: Etcd member has no leader
-A critical alert is triggered when etcd does not have a leader for at least three minutes.
 
 # Alerts for Kube Components
 Rancher provides alerts when core Kubernetes system components become unhealthy.
 
-### Alert: Controller Manager is unavailable
-A critical warning is triggered when the cluster’s controller-manager becomes unavailable.
-
 Controllers update Kubernetes resources based on changes in etcd. The [controller manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/) monitors the cluster desired state through the Kubernetes API server and makes the necessary changes to the current state to reach the desired state.
 
-### Alert: Scheduler is unavailable
-A critical warning is triggered when the cluster’s scheduler becomes unavailable.
-
 The [scheduler](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/) service is a core component of Kubernetes. It is responsible for scheduling cluster workloads to nodes, based on various configurations, metrics, resource requirements and workload-specific requirements.
 
+| Alert | Explanation |
+|-------|-------------|
+| Controller Manager is unavailable | A critical warning is triggered when the cluster’s controller-manager becomes unavailable. |
+| Scheduler is unavailable | A critical warning is triggered when the cluster’s scheduler becomes unavailable. |
+
+
 # Alerts for Events
 Events can trigger alerts.
 
-### Alert: Get warning deployment event
-A warning alert is triggered when a deployment event happens.
+| Alert | Explanation |
+|-------|-------------|
+| Get warning deployment event | A warning alert is triggered when a deployment event happens. |
+
 
 # Alerts for Node
 Alerts can be triggered based on node metrics.
 
-### Alert: High CPU load
-A warning alert is triggered if the node uses more than 100 percent of the node’s available CPU seconds for at least three minutes.
-
-### Alert: High node memory utilization
-A warning alert is triggered if the node uses more than 80 percent of its available memory for at least three minutes.
-
-### Alert: Node disk is running full within 24 hours
-A critical alert is triggered if the disk space on the node is expected to run out in the next 24 hours based on the disk growth over the last 6 hours.
-
+| Alert | Explanation |
+|-------|-------------|
+| High CPU load | A warning alert is triggered if the node uses more than 100 percent of the node’s available CPU seconds for at least three minutes. |
+| High node memory utilization | A warning alert is triggered if the node uses more than 80 percent of its available memory for at least three minutes. |
+| Node disk is running full within 24 hours | A critical alert is triggered if the disk space on the node is expected to run out in the next 24 hours based on the disk growth over the last 6 hours. |
 
 # Project-level Alerts
 When you enable monitoring for the project, some project-level alerts are provided.
 
-### Alert: Less than half workload available
-A critical alert is triggered if less than half of a workload is available, based on workloads where the key is `app` and the value is `workload.`
-
-### Alert: Memory usage close to the quota
-A warning alert is triggered if the project's memory usage exceeds the memory resource limits for the project.
-
+| Alert | Explanation |
+|-------|-------------|
+| Less than half workload available | A critical alert is triggered if less than half of a workload is available, based on workloads where the key is `app` and the value is `workload.` |
+| Memory usage close to the quota | A warning alert is triggered if the project's memory usage exceeds the memory resource limits for the project. |
\ No newline at end of file
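
The "Database usage close to the quota 500M" row changes the threshold wording from 524,288,000 bytes to 524.288M. Both appear to describe 500 MiB counted in bytes. A minimal Python sketch of that arithmetic follows; the constant and function names are hypothetical and do not come from Rancher or etcd, and only the numbers are taken from the patch.

```python
# Hypothetical names for illustration; only the figures come from the patch.
MIB = 1024 * 1024  # bytes in one mebibyte

# 500 MiB in bytes: the figure quoted in the removed paragraph.
QUOTA_ALERT_BYTES = 500 * MIB


def exceeds_quota_alert(db_size_bytes: int) -> bool:
    """Return True when the database size is strictly above the threshold."""
    return db_size_bytes > QUOTA_ALERT_BYTES


if __name__ == "__main__":
    print(QUOTA_ALERT_BYTES)                 # threshold in bytes
    print(QUOTA_ALERT_BYTES / 1_000_000)     # the same value in units of 10^6 bytes
    print(exceeds_quota_alert(600 * MIB))    # a database larger than the threshold
```

Under this reading, "524.288M" is the same threshold expressed in millions of bytes rather than a different limit.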