[2.0-2.4] Move files out of pages-for-subheaders

This commit is contained in:
Billy Tat
2024-01-12 16:07:44 -08:00
parent 3a4b7e73c0
commit c32fd49367
256 changed files with 1241 additions and 1241 deletions
@@ -0,0 +1,26 @@
---
title: CIS Scans
---
<head>
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/pages-for-subheaders/cis-scans"/>
</head>
_Available as of v2.4.0_
- [Prerequisites](#prerequisites)
- [How-to Guides](#how-to-guides)
## Prerequisites
To run security scans on a cluster and access the generated reports, you must be an [Administrator](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions.md) or [Cluster Owner](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/cluster-and-project-roles.md).
Rancher can only run security scans on clusters that were created with RKE, which includes custom clusters and clusters that Rancher created in an infrastructure provider such as Amazon EC2 or GCE. Imported clusters and clusters in hosted Kubernetes providers can't be scanned by Rancher.
The security scan cannot run in a cluster that has Windows nodes.
You will only be able to see the CIS scan reports for clusters that you have access to.
## How-to Guides
Refer to the [CIS scan guides](../../../how-to-guides/advanced-user-guides/cis-scan-guides/cis-scan-guides.md) for how-to guides on CIS scans.
@@ -0,0 +1,328 @@
---
title: Cluster Alerts
---
<head>
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/pages-for-subheaders/monitoring-and-alerting"/>
</head>
To keep your clusters and applications healthy, and your organization productive, you need to stay informed of events occurring in your clusters and projects, both planned and unplanned. When an event occurs, an alert is triggered and you are sent a notification. If necessary, you can then follow up with corrective actions.
## About Alerts
Notifiers and alerts are built on top of the [Prometheus Alertmanager](https://prometheus.io/docs/alerting/alertmanager/). Leveraging these tools, Rancher can notify [cluster owners](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/cluster-and-project-roles.md#cluster-roles) and [project owners](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/cluster-and-project-roles.md#project-roles) of events they need to address.
Before you can receive alerts, you must configure one or more notifiers in Rancher.
When you create a cluster, some alert rules are predefined. You can receive these alerts if you configure a [notifier](../notifiers.md) for them.
For details about what triggers the predefined alerts, refer to the [documentation on default alerts.](default-alerts.md)
### Alert Event Examples
Some examples of alert events are:
- A Kubernetes master component entering an unhealthy state.
- A node or workload error occurring.
- A scheduled deployment taking place as planned.
- A node's hardware resources becoming overstressed.
### Alerts Triggered by Prometheus Queries
When you edit an alert rule, you will have the opportunity to configure the alert to be triggered based on a Prometheus expression. For examples of expressions, refer to [this page.](../cluster-monitoring/expression.md)
Monitoring must be [enabled](../cluster-monitoring/cluster-monitoring.md) before you can trigger alerts with custom Prometheus queries or expressions.
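As a sketch, a custom alert expression is a PromQL query evaluated against the monitoring data. For example (the metric names below are standard node-exporter metrics, and the 80% threshold is an illustrative assumption):

```
# Fraction of node memory in use; the alert fires when it exceeds 80%
1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.8
```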
### Urgency Levels
You can set an urgency level for each alert. This urgency appears in the notification you receive, helping you to prioritize your response actions. For example, if you have an alert configured to inform you of a routine deployment, no action is required. These alerts can be assigned a low priority level. However, if a deployment fails, it can critically impact your organization, and you need to react quickly. Assign these alerts a high priority level.
### Scope of Alerts
The scope for alerts can be set at either the cluster level or [project level](../../../reference-guides/rancher-project-tools/project-alerts.md).
At the cluster level, Rancher monitors components in your Kubernetes cluster, and sends you alerts related to:
- The state of your nodes.
- The system services that manage your Kubernetes cluster.
- The resource events from specific system services.
- Prometheus expressions crossing their configured thresholds.
### Managing Cluster Alerts
After you set up cluster alerts, you can manage each alert object. To manage alerts, browse to the cluster containing the alerts, and then select **Tools > Alerts**. You can:
- Deactivate/Reactivate alerts
- Edit alert settings
- Delete unnecessary alerts
- Mute firing alerts
- Unmute muted alerts
## Adding Cluster Alerts
As a [cluster owner](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/cluster-and-project-roles.md#cluster-roles), you can configure Rancher to send you alerts for cluster events.
>**Prerequisite:** Before you can receive cluster alerts, you must [add a notifier](../notifiers.md).
1. From the **Global** view, navigate to the cluster that you want to configure cluster alerts for. Select **Tools > Alerts**. Then click **Add Alert Group**.
1. Enter a **Name** for the alert group that describes its purpose. You can use groups to organize alert rules that share a purpose.
1. Based on the type of alert you want to create, refer to the [cluster alert configuration section.](#cluster-alert-configuration)
1. Continue adding more **Alert Rules** to the group.
1. Finally, choose the [notifiers](../notifiers.md) to send the alerts to.
- You can set up multiple notifiers.
- You can change notifier recipients on the fly.
1. Click **Create.**
**Result:** Your alert is configured. A notification is sent when the alert is triggered.
## Cluster Alert Configuration
- [System Service Alerts](#system-service-alerts)
- [Resource Event Alerts](#resource-event-alerts)
- [Node Alerts](#node-alerts)
- [Node Selector Alerts](#node-selector-alerts)
- [CIS Scan Alerts](#cis-scan-alerts)
- [Metric Expression Alerts](#metric-expression-alerts)
## System Service Alerts
This alert type monitors for events that affect one of the Kubernetes master components, regardless of the node they occur on.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Select the **System Services** option, and then select an option from the dropdown:
- [controller-manager](https://kubernetes.io/docs/concepts/overview/components/#kube-controller-manager)
- [etcd](https://kubernetes.io/docs/concepts/overview/components/#etcd)
- [scheduler](https://kubernetes.io/docs/concepts/overview/components/#kube-scheduler)
### Is
The alert will be triggered when the selected Kubernetes master component is unhealthy.
### Send a
Select the urgency level of the alert. The options are:
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level based on the importance of the service and how many nodes fill the role within your cluster. For example, if you're making an alert for the `etcd` service, select **Critical**. If you're making an alert for redundant schedulers, **Warning** is more appropriate.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group whose alerts have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending an alert that has already been sent. Defaults to 1 hour.
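These timing options correspond to the route settings of the underlying Alertmanager. A minimal sketch of the equivalent raw configuration, assuming a hypothetical receiver name:

```
route:
  receiver: example-notifier  # hypothetical receiver name
  group_wait: 30s             # Group Wait Time
  group_interval: 30s         # Group Interval Time
  repeat_interval: 1h         # Repeat Wait Time
```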
## Resource Event Alerts
This alert type monitors for specific events emitted by a resource type.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Choose the type of resource event that triggers an alert. The options are:
- **Normal**: triggers an alert when any standard resource event occurs.
- **Warning**: triggers an alert when unexpected resource events occur.
From the **Choose a Resource** drop-down, select the resource type whose events should trigger the alert.
- [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/)
- [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)
- [Node](https://kubernetes.io/docs/concepts/architecture/nodes/)
- [Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/)
- [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert by considering factors such as how often the event occurs or its importance. For example:
- If you set a normal alert for pods, you're likely to receive alerts often, and individual pods usually self-heal, so select an urgency of **Info**.
- If you set a warning alert for StatefulSets, it's very likely to impact operations, so select an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group whose alerts have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending an alert that has already been sent. Defaults to 1 hour.
## Node Alerts
This alert type monitors for events that occur on a specific node.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Select the **Node** option, and then make a selection from the **Choose a Node** drop-down.
### Is
Choose an event to trigger the alert.
- **Not Ready**: Sends you an alert when the node is unresponsive.
- **CPU usage over**: Sends you an alert when the node's CPU usage rises above the entered percentage of its processing allocation.
- **Mem usage over**: Sends you an alert when the node's memory usage rises above the entered percentage of its memory allocation.
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when a node's CPU rises above 60% might warrant an urgency of **Info**, while a node that is **Not Ready** warrants an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group whose alerts have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending an alert that has already been sent. Defaults to 1 hour.
## Node Selector Alerts
This alert type monitors for events that occur on any node marked with a label. For more information, see the Kubernetes documentation on [Labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/).
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Select the **Node Selector** option, and then click **Add Selector** to enter a key-value pair for a label. This label should be applied to one or more of your nodes. Add as many selectors as you'd like.
### Is
Choose an event to trigger the alert.
- **Not Ready**: Sends you an alert when selected nodes are unresponsive.
- **CPU usage over**: Sends you an alert when a selected node's CPU usage rises above the entered percentage of its processing allocation.
- **Mem usage over**: Sends you an alert when a selected node's memory usage rises above the entered percentage of its memory allocation.
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when a node's CPU rises above 60% might warrant an urgency of **Info**, while a node that is **Not Ready** warrants an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group whose alerts have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending an alert that has already been sent. Defaults to 1 hour.
## CIS Scan Alerts
_Available as of v2.4.0_
This alert type is triggered based on the results of a CIS scan.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Select **CIS Scan.**
### Is
Choose an event to trigger the alert:
- Completed Scan
- Has Failure
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert based on its impact on operations. For example, a routine completed scan might warrant an urgency of **Info**, while a scan that has failures warrants an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group whose alerts have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending an alert that has already been sent. Defaults to 1 hour.
## Metric Expression Alerts
This alert type is triggered based on the result of a Prometheus expression query. It is available only after you [enable monitoring](../cluster-monitoring/cluster-monitoring.md).
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Enter or select an **Expression**. The dropdown shows the source metrics from Prometheus, including:
- [**Node**](https://github.com/prometheus/node_exporter)
- [**Container**](https://github.com/google/cadvisor)
- [**ETCD**](https://etcd.io/docs/v3.4.0/op-guide/monitoring/)
- [**Kubernetes Components**](https://github.com/kubernetes/metrics)
- [**Kubernetes Resources**](https://github.com/kubernetes/kube-state-metrics)
- [**Fluentd**](https://docs.fluentd.org/v1.0/articles/monitoring-prometheus) (supported by [Logging](../cluster-logging/cluster-logging.md))
- [**Cluster Level Grafana**](https://grafana.com/docs/grafana/latest/setup-grafana/set-up-grafana-monitoring/)
- **Cluster Level Prometheus**
### Is
Choose a comparison:
- **Equal**: Triggers when the expression value is equal to the threshold.
- **Not Equal**: Triggers when the expression value is not equal to the threshold.
- **Greater Than**: Triggers when the expression value is greater than the threshold.
- **Less Than**: Triggers when the expression value is less than the threshold.
- **Greater or Equal**: Triggers when the expression value is greater than or equal to the threshold.
- **Less or Equal**: Triggers when the expression value is less than or equal to the threshold.
If applicable, enter the threshold value at which the alert is triggered.
### For
Select a duration. The alert is triggered when the expression value stays across the threshold for longer than the configured duration.
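Taken together, the expression, comparison, threshold, and duration map onto a Prometheus alerting rule. A hedged sketch of such a rule, using a node load expression (the group name, alert name, threshold, and duration are illustrative assumptions):

```
groups:
- name: example-rules    # hypothetical group name
  rules:
  - alert: HighNodeLoad  # hypothetical alert name
    expr: sum(node_load5) / count(node_cpu_seconds_total{mode="system"}) > 1
    for: 3m              # the value must stay over the threshold for this duration
    labels:
      severity: critical # the urgency level of the alert
```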
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert based on its impact on operations. For example, if the node load expression `sum(node_load5) / count(node_cpu_seconds_total{mode="system"})` rises above 0.6, an urgency of **Info** may be appropriate, while a value above 1 warrants an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group whose alerts have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending an alert that has already been sent. Defaults to 1 hour.
@@ -0,0 +1,115 @@
---
title: Cluster Logging
description: Rancher integrates with popular logging services. Learn the requirements and benefits of integrating with logging services, and enable logging on your cluster.
---
<head>
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/pages-for-subheaders/logging"/>
</head>
Logging is helpful because it allows you to:
- Capture and analyze the state of your cluster
- Look for trends in your environment
- Save your logs to a safe location outside of your cluster
- Stay informed of events like a container crashing, a pod eviction, or a node dying
- More easily debug and troubleshoot problems
Rancher supports integration with the following services:
- Elasticsearch
- Splunk
- Kafka
- Syslog
- Fluentd
## How Logging Integrations Work
Rancher can integrate with popular external services used for event streams, telemetry, or search. These services can log errors and warnings in your Kubernetes infrastructure to a stream.
These services collect container log events, which are saved to the `/var/log/containers` directory on each of your nodes. The service collects both standard and error events. You can then log into your services to review the events collected, leveraging each service's unique features.
When configuring Rancher to integrate with these services, you'll have to point Rancher toward the service's endpoint and provide authentication information.
Additionally, you'll have the opportunity to enter key-value pairs to filter the log events collected. The service will only collect events for containers marked with your configured key-value pairs.
>**Note:** You can only configure one logging service per cluster or per project.
## Requirements
The Docker daemon on each node in the cluster should be [configured](https://docs.docker.com/config/containers/logging/configure/) with the (default) log-driver: `json-file`. You can check the log-driver by running the following command:
```
$ docker info | grep 'Logging Driver'
Logging Driver: json-file
```
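If a node is configured with a different log driver, you can restore the default in `/etc/docker/daemon.json` and restart the Docker daemon. The `log-opts` rotation values shown below are illustrative assumptions:

```
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```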
## Logging Scope
You can configure logging at either cluster level or project level.
- Cluster logging writes logs for every pod in the cluster, i.e. in all the projects. For [RKE clusters](../../../how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/launch-kubernetes-with-rancher/launch-kubernetes-with-rancher.md), it also writes logs for all the Kubernetes system components.
- [Project logging](../../../reference-guides/rancher-project-tools/project-logging.md) writes logs for every pod in that particular project.
Logs that are sent to your logging service are from the following locations:
- Pod logs stored at `/var/log/containers`.
- Kubernetes system components logs stored at `/var/lib/rancher/rke/log/`.
## Enabling Cluster Logging
As an [administrator](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions.md) or [cluster owner](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/cluster-and-project-roles.md#cluster-roles), you can configure Rancher to send Kubernetes logs to a logging service.
1. From the **Global** view, navigate to the cluster where you want to configure cluster logging.
1. Select **Tools > Logging** in the navigation bar.
1. Select a logging service and enter the configuration. Refer to the specific service for detailed configuration. Rancher supports integration with the following services:
- [Elasticsearch](elasticsearch.md)
- [Splunk](splunk.md)
- [Kafka](kafka.md)
- [Syslog](syslog.md)
- [Fluentd](fluentd.md)
1. (Optional) Instead of using the UI to configure the logging services, you can enter custom advanced configurations by clicking on **Edit as File**, which is located above the logging targets. This link is only visible after you select a logging service.
- With the file editor, enter the raw Fluentd configuration for any logging service. Refer to the documentation for each logging service on how to set up the output configuration.
- [Elasticsearch Documentation](https://github.com/uken/fluent-plugin-elasticsearch)
- [Splunk Documentation](https://github.com/fluent/fluent-plugin-splunk)
- [Kafka Documentation](https://github.com/fluent/fluent-plugin-kafka)
- [Syslog Documentation](https://github.com/dlackty/fluent-plugin-remote_syslog)
- [Fluentd Documentation](https://docs.fluentd.org/v1.0/articles/out_forward)
- If the logging service is using TLS, you also need to complete the **SSL Configuration** form.
1. Provide the **Client Private Key** and **Client Certificate**. You can either copy and paste them or upload them by using the **Read from a file** button.
- You can use either a self-signed certificate or one provided by a certificate authority.
- You can generate a self-signed certificate using an openssl command. For example:
```
openssl req -x509 -newkey rsa:2048 -keyout myservice.key -out myservice.cert -days 365 -nodes -subj "/CN=myservice.example.com"
```
2. If you are using a self-signed certificate, provide the **CA Certificate PEM**.
1. (Optional) Complete the **Additional Logging Configuration** form.
1. **Optional:** Use the **Add Field** button to add custom log fields to your logging configuration. These fields are key value pairs (such as `foo=bar`) that you can use to filter the logs from another system.
1. Enter a **Flush Interval**. This value determines how often [Fluentd](https://www.fluentd.org/) flushes data to the logging server. Intervals are measured in seconds.
1. **Include System Log**: When checked, logs from pods in the system project and from RKE components are sent to the target. Uncheck it to exclude the system logs.
1. Click **Test**. Rancher sends a test log to the service.
> **Note:** This button is replaced with _Dry Run_ if you are using the custom configuration editor. In this case, Rancher calls the fluentd dry run command to validate the configuration.
1. Click **Save**.
**Result:** Rancher is now configured to send logs to the selected service. Log into the logging service so that you can start viewing the logs.
## Related Links
[Logging Architecture](https://kubernetes.io/docs/concepts/cluster-administration/logging/)
@@ -38,7 +38,7 @@ Some of the biggest metrics to look out for:
### Etcd Metrics
>**Note:** Only supported for [Rancher launched Kubernetes clusters](../../../how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/launch-kubernetes-with-rancher/launch-kubernetes-with-rancher.md).
Etcd metrics display the operations of the etcd database on each of your cluster nodes. After establishing a baseline of normal etcd operational metrics, observe them for abnormal deltas between metric refreshes, which indicate potential issues with etcd. Always address etcd issues immediately!
@@ -60,7 +60,7 @@ Some of the biggest metrics to look out for:
Kubernetes components metrics display data about the cluster's individual Kubernetes components. Primarily, it displays information about connections and latency for each component: the API server, controller manager, scheduler, and ingress controller.
>**Note:** The metrics for the controller manager, scheduler and ingress controller are only supported for [Rancher launched Kubernetes clusters](../../../how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/launch-kubernetes-with-rancher/launch-kubernetes-with-rancher.md).
When analyzing Kubernetes component metrics, don't be concerned about any single standalone metric in the charts and graphs that display. Rather, you should establish a baseline for metrics considered normal following a period of observation, e.g. the range of values that your components usually operate within and are considered normal. After you establish this baseline, be on the lookout for large deltas in the charts and graphs, as these big changes usually indicate a problem that you need to investigate.
@@ -90,7 +90,7 @@ Some of the more important component metrics to monitor are:
## Rancher Logging Metrics
Although the Dashboard for a cluster primarily displays data sourced from Prometheus, it also displays information for cluster logging, provided that you have [configured Rancher to use a logging service](../cluster-logging/cluster-logging.md).
[_Get expressions for Rancher Logging Metrics_](./expression.md#rancher-logging-metrics)
@@ -0,0 +1,110 @@
---
title: Integrating Rancher and Prometheus for Cluster Monitoring
description: Prometheus lets you view metrics from your different Rancher and Kubernetes objects. Learn about the scope of monitoring and how to enable cluster monitoring
---
<head>
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/pages-for-subheaders/monitoring-and-alerting"/>
</head>
_Available as of v2.2.0_
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution.
## About Prometheus
Prometheus provides a _time series_ of your data, which is, according to the [Prometheus documentation](https://prometheus.io/docs/concepts/data_model/):
>A stream of timestamped values belonging to the same metric and the same set of labeled dimensions.
In other words, Prometheus lets you view metrics from your different Rancher and Kubernetes objects. Using timestamps, Prometheus lets you query and view these metrics in easy-to-read graphs and visuals, either through the Rancher UI or [Grafana](https://grafana.com/), an analytics platform deployed along with Prometheus.
You can enable monitoring at either the cluster level or the project level. This page describes how to enable monitoring for a cluster. For details on enabling monitoring for a project, refer to the [project administration section](project-monitoring.md).
By viewing data that Prometheus scrapes from your cluster control plane, nodes, and deployments, you can stay on top of everything happening in your cluster. You can then use these analytics to better run your organization: stop system emergencies before they start, develop maintenance strategies, restore crashed servers, etc.
Multi-tenancy, in the form of cluster-only and project-only Prometheus instances, is also supported.
## Monitoring Scope
Using Prometheus, you can monitor Rancher at both the cluster level and [project level](project-monitoring.md). For each cluster and project that is enabled for monitoring, Rancher deploys a Prometheus server.
- Cluster monitoring allows you to view the health of your Kubernetes cluster. Prometheus collects metrics from the cluster components below, which you can view in graphs and charts.
- Kubernetes control plane
- etcd database
- All nodes (including workers)
- [Project monitoring](project-monitoring.md) allows you to view the state of pods running in a given project. Prometheus collects metrics from the project's deployed HTTP and TCP/UDP workloads.
## Enabling Cluster Monitoring
As an [administrator](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/global-permissions.md) or [cluster owner](../../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/cluster-and-project-roles.md#cluster-roles), you can configure Rancher to deploy Prometheus to monitor your Kubernetes cluster.
> **Prerequisites:** The following TCP ports need to be opened for metrics scraping:
>
> | Port | Node type | Component |
> | --- | --- | --- |
> | 9796 | Worker | Node exporter |
> | 10254 | Worker | Nginx Ingress Controller |
> | 10250 | Worker/Controlplane | Kubelet |
> | 10251 | Controlplane | Kube scheduler |
> | 10252 | Controlplane | Kube controller manager |
> | 2379 | Etcd | Etcd server |
> Monitoring V1 requires Kubernetes version v1.20.x or earlier. To install monitoring on Kubernetes v1.21+, you will need to [migrate to Monitoring V2.](../../version-2.5/how-to-guides/advanced-user-guides/monitoring-alerting-guides/migrate-to-rancher-v2.5%2B-monitoring.md)
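A quick way to verify the ports in the table above is a small shell loop run against each node. This is a sketch under stated assumptions: `nc` must be installed, and the node address passed to `check_ports` is a placeholder you should replace.

```shell
# Sketch: check that the metrics-scraping ports are reachable on a node.
# Assumes `nc` (netcat) is available; 127.0.0.1 is a placeholder address.
check_ports() {
  host="$1"; shift
  for port in "$@"; do
    if nc -z -w 2 "$host" "$port" 2>/dev/null; then
      echo "port $port: open"
    else
      echo "port $port: unreachable"
    fi
  done
}

check_ports 127.0.0.1 9796 10254 10250 10251 10252 2379
```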
1. From the **Global** view, navigate to the cluster where you want to configure cluster monitoring.
1. Select **Tools > Monitoring** in the navigation bar.
1. Select **Enable** to show the [Prometheus configuration options](prometheus.md). Review the [resource consumption recommendations](#resource-consumption) to ensure you have enough resources for Prometheus and on your worker nodes to enable monitoring. Enter in your desired configuration options.
1. Click **Save**.
**Result:** The Prometheus server will be deployed as well as two monitoring applications. The two monitoring applications, `cluster-monitoring` and `monitoring-operator`, are added as an [application](../../../how-to-guides/new-user-guides/helm-charts-in-rancher/helm-charts-in-rancher.md) to the cluster's `system` project. After the applications are `active`, you can start viewing [cluster metrics](cluster-metrics.md) through the Rancher dashboard or directly from Grafana.
> The default username and password for the Grafana instance will be `admin/admin`. However, Grafana dashboards are served via the Rancher authentication proxy, so only users who are currently authenticated into the Rancher server have access to the Grafana dashboard.
## Resource Consumption
When enabling cluster monitoring, you need to ensure that your worker nodes and the Prometheus pod have enough resources. The tables below provide a guide to how many resources will be consumed. In larger deployments, it is strongly advised to place the monitoring infrastructure on dedicated nodes in the cluster.
### Resource Consumption of Prometheus Pods
This table shows the resource consumption of the Prometheus pod, which is based on the total number of nodes in the cluster. The node count includes the worker, control plane, and etcd nodes. Total disk space allocation should be approximated as the ingest `rate * retention` period set at the cluster level. When enabling cluster-level monitoring, you should adjust the CPU and memory limits and reservations accordingly.
Number of Cluster Nodes | CPU (milli CPU) | Memory | Disk
------------------------|-----|--------|------
5 | 500 | 650 MB | ~1 GB/Day
50| 2000 | 2 GB | ~5 GB/Day
256| 4000 | 6 GB | ~18 GB/Day
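The `rate * retention` disk estimate can be computed directly. A minimal sketch, using the table's rough figure for a 50-node cluster and an assumed 15-day retention:

```shell
# Approximate Prometheus disk need: daily ingest rate (GB/day) x retention (days).
# The 5 GB/day figure is the table's rough estimate for a 50-node cluster;
# the 15-day retention is an assumed example value -- substitute your own.
rate_gb_per_day=5
retention_days=15

disk_gb=$(( rate_gb_per_day * retention_days ))
echo "Provision roughly ${disk_gb} GB for Prometheus storage"
```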
Additional pod resource requirements for cluster level monitoring.
| Workload | Container | CPU - Request | Mem - Request | CPU - Limit | Mem - Limit | Configurable |
|---------------------|---------------------------------|---------------|---------------|-------------|-------------|--------------|
| Prometheus | prometheus | 750m | 750Mi | 1000m | 1000Mi | Y |
| | prometheus-proxy | 50m | 50Mi | 100m | 100Mi | Y |
| | prometheus-auth | 100m | 100Mi | 500m | 200Mi | Y |
| | prometheus-config-reloader | - | - | 50m | 50Mi | N |
| | rules-configmap-reloader | - | - | 100m | 25Mi | N |
| Grafana | grafana-init-plugin-json-copy | 50m | 50Mi | 50m | 50Mi | Y |
| | grafana-init-plugin-json-modify | 50m | 50Mi | 50m | 50Mi | Y |
| | grafana | 100m | 100Mi | 200m | 200Mi | Y |
| | grafana-proxy | 50m | 50Mi | 100m | 100Mi | Y |
| Kube-State Exporter | kube-state | 100m | 130Mi | 100m | 200Mi | Y |
| Node Exporter | exporter-node | 200m | 200Mi | 200m | 200Mi | Y |
| Operator | prometheus-operator | 100m | 50Mi | 200m | 100Mi | Y |
### Resource Consumption of Other Pods
Besides the Prometheus pod, there are components that are deployed that require additional resources on the worker nodes.
Pod | CPU (milli CPU) | Memory (MB)
----|-----------------|------------
Node Exporter (Per Node) | 100 | 30
Kube State Cluster Monitor | 100 | 130
Grafana | 100 | 150
Prometheus Cluster Monitoring Nginx | 50 | 50
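Taken together, the figures above give a rough total for the overhead monitoring adds to worker nodes. A back-of-the-envelope sketch for a hypothetical 10-node cluster:

```shell
# CPU overhead (milli CPU) from the table: the node exporter runs on every node,
# while kube-state, Grafana, and the monitoring NGINX proxy run once per cluster.
nodes=10                     # assumed cluster size
node_exporter_cpu=100        # per node
kube_state_cpu=100
grafana_cpu=100
nginx_cpu=50

total_cpu=$(( nodes * node_exporter_cpu + kube_state_cpu + grafana_cpu + nginx_cpu ))
echo "Approximate additional CPU across the cluster: ${total_cpu}m"
```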
@@ -2,7 +2,7 @@
title: Prometheus Custom Metrics Adapter
---
After you've enabled [cluster level monitoring](cluster-monitoring.md), you can view the metrics data from Rancher. You can also deploy the Prometheus custom metrics adapter so that you can use the HPA with metrics stored in cluster monitoring.
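For example, once the adapter is deployed and serving a custom metric, an HPA can target it. The workload name, metric name, and target value below are illustrative assumptions, not values the adapter exposes by default:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa                # hypothetical workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed custom metric served by the adapter
        target:
          type: AverageValue
          averageValue: "50"
```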
## Deploy Prometheus Custom Metrics Adapter
@@ -2,9 +2,9 @@
title: Prometheus Expressions
---
The PromQL expressions in this doc can be used to configure [alerts.](../cluster-alerts/cluster-alerts.md)
> Before expressions can be used in alerts, monitoring must be enabled. For more information, refer to the documentation on enabling monitoring [at the cluster level](cluster-monitoring.md) or [at the project level.](./project-monitoring.md)
For more information about querying Prometheus, refer to the official [Prometheus documentation.](https://prometheus.io/docs/prometheus/latest/querying/basics/)
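As a generic illustration of the kind of expression used in alerts (a PromQL sketch built on standard node exporter metrics, not one of Rancher's predefined rules; the metric name assumes node exporter v0.16+), the following computes average cluster-wide CPU utilization over the last five minutes:

```promql
1 - avg(irate(node_cpu_seconds_total{mode="idle"}[5m]))
```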
@@ -9,9 +9,9 @@ Using Rancher, you can monitor the state and processes of your cluster nodes, Ku
### Monitoring Scope
Using Prometheus, you can monitor Rancher at both the [cluster level](cluster-monitoring.md) and project level. For each cluster and project that is enabled for monitoring, Rancher deploys a Prometheus server.
- [Cluster monitoring](cluster-monitoring.md) allows you to view the health of your Kubernetes cluster. Prometheus collects metrics from the cluster components below, which you can view in graphs and charts.
- Kubernetes control plane
- etcd database
@@ -25,7 +25,7 @@ Only [administrators](../../../how-to-guides/advanced-user-guides/authentication
### Enabling Project Monitoring
> **Prerequisite:** Cluster monitoring must be [enabled.](cluster-monitoring.md)
1. Go to the project where monitoring should be enabled. Note: When cluster monitoring is enabled, monitoring is also enabled by default in the **System** project.
@@ -43,12 +43,12 @@ Prometheus|750m| 750Mi | 1000m | 1000Mi | Yes
Grafana | 100m | 100Mi | 200m | 200Mi | No
**Result:** A single application, `project-monitoring`, is added as an [application](../../../how-to-guides/new-user-guides/helm-charts-in-rancher/helm-charts-in-rancher.md) to the project. After the application is `active`, you can start viewing project metrics through the [Rancher dashboard](cluster-monitoring.md) or directly from Grafana.
> The default username and password for the Grafana instance will be `admin/admin`. However, Grafana dashboards are served via the Rancher authentication proxy, so only users who are currently authenticated into the Rancher server have access to the Grafana dashboard.
### Project Metrics
[Workload metrics](./expression.md#workload-metrics) are available for the project if monitoring is enabled at the [cluster level](cluster-monitoring.md) and at the [project level.](#enabling-project-monitoring)
You can monitor custom metrics from any [exporters.](https://prometheus.io/docs/instrumenting/exporters/) You can also expose some custom endpoints on deployments without needing to configure Prometheus for your project.
@@ -4,7 +4,7 @@ title: Prometheus Configuration
_Available as of v2.2.0_
While configuring monitoring at either the [cluster level](cluster-monitoring.md) or [project level](./project-monitoring.md), there are multiple options that can be configured.
- [Basic Configuration](#basic-configuration)
- [Advanced Options](#advanced-options)
@@ -29,7 +29,7 @@ Selector | Ability to select the nodes in which Prometheus and Grafana pods are
## Advanced Options
Since monitoring is an [application](https://github.com/rancher/system-charts/tree/dev/charts/rancher-monitoring) from the [Rancher catalog](../../../how-to-guides/new-user-guides/helm-charts-in-rancher/helm-charts-in-rancher.md), it can be configured like any other catalog application, by passing in values to Helm.
> **Warning:** Any modification to the application without understanding the entire application can lead to catastrophic errors.
@@ -74,7 +74,7 @@ When configuring Prometheus and enabling the node exporter, enter a host port in
## Persistent Storage
>**Prerequisite:** Configure one or more StorageClasses to use as [persistent storage](../../../how-to-guides/advanced-user-guides/manage-clusters/create-kubernetes-persistent-storage/create-kubernetes-persistent-storage.md) for your Prometheus or Grafana pod.
By default, when you enable Prometheus for either a cluster or project, all monitoring data that Prometheus collects is stored on its own pod. With local storage, if the Prometheus or Grafana pods fail, all the data is lost. Rancher recommends configuring an external persistent storage to the cluster. With the external persistent storage, if the Prometheus or Grafana pods fail, the new pods can recover using data from the persistent storage.
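As a sketch of what such a configuration can look like when passed as Helm values to the monitoring application (the exact answer keys and sizes below are assumptions; verify them against your chart version before use):

```yaml
prometheus:
  persistence:
    enabled: "true"
    storageClass: "my-storage-class"   # assumed pre-created StorageClass
    size: "50Gi"
grafana:
  persistence:
    enabled: "true"
    storageClass: "my-storage-class"
    size: "10Gi"
```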
@@ -4,11 +4,11 @@ title: Viewing Metrics
_Available as of v2.2.0_
After you've enabled monitoring at either the [cluster level](cluster-monitoring.md) or [project level](./project-monitoring.md), you will want to start viewing the data being collected. There are multiple ways to view this data.
## Rancher Dashboard
>**Note:** This is only available if you've enabled monitoring at the [cluster level](cluster-monitoring.md). Project specific analytics must be viewed using the project's Grafana instance.
Rancher's dashboards are available at multiple locations:
@@ -32,7 +32,7 @@ When analyzing these metrics, don't be concerned about any single standalone met
## Grafana
If you've enabled monitoring at either the [cluster level](cluster-monitoring.md) or [project level](./project-monitoring.md), Rancher automatically creates a link to a Grafana instance. Use this link to view monitoring data.
Grafana allows you to query, visualize, alert, and ultimately, understand your cluster and workload data. For more information on Grafana and its capabilities, visit the [Grafana website](https://grafana.com/grafana).
@@ -0,0 +1,11 @@
---
title: Integrations in Rancher
---
<head>
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/pages-for-subheaders/cloud-marketplace"/>
</head>
Over time, Rancher has accrued several products and projects that have been integrated into the Rancher UI.
Examples of some of these integrations are [Istio](istio/istio.md) and [CIS Scans](cis-scans/cis-scans.md).
@@ -0,0 +1,90 @@
---
title: Istio
---
<head>
<link rel="canonical" href="https://ranchermanager.docs.rancher.com/pages-for-subheaders/istio"/>
</head>
_Available as of v2.3.0_
[Istio](https://istio.io/) is an open-source tool that makes it easier for DevOps teams to observe, control, troubleshoot, and secure the traffic within a complex network of microservices.
As a network of microservices changes and grows, the interactions between them can become more difficult to manage and understand. In such a situation, it is useful to have a service mesh as a separate infrastructure layer. Istio's service mesh lets you manipulate traffic between microservices without changing the microservices directly.
Our integration of Istio is designed so that a Rancher operator, such as an administrator or cluster owner, can deliver Istio to developers. Then developers can use Istio to enforce security policies, troubleshoot problems, or manage traffic for blue/green deployments, canary deployments, or A/B testing.
This service mesh provides features that include but are not limited to the following:
- Traffic management features
- Enhanced monitoring and tracing
- Service discovery and routing
- Secure connections and service-to-service authentication with mutual TLS
- Load balancing
- Automatic retries, backoff, and circuit breaking
After Istio is enabled in a cluster, you can leverage Istio's control plane functionality with `kubectl`.
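For example, a canary-style traffic split can be declared with a standard Istio resource and applied with `kubectl apply -f`. The service and subset names here are hypothetical:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews-canary           # hypothetical name
spec:
  hosts:
    - reviews                    # hypothetical in-mesh service
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90             # 90% of traffic to the stable subset
        - destination:
            host: reviews
            subset: v2
          weight: 10             # 10% to the canary subset
```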
Rancher's Istio integration comes with comprehensive visualization aids:
- **Trace the root cause of errors with Jaeger.** [Jaeger](https://www.jaegertracing.io/) is an open-source tool that provides a UI for a distributed tracing system, which is useful for root cause analysis and for determining what causes poor performance. Distributed tracing allows you to view an entire chain of calls, which might originate with a user request and traverse dozens of microservices.
- **Get the full picture of your microservice architecture with Kiali.** [Kiali](https://www.kiali.io/) provides a diagram that shows the services within a service mesh and how they are connected, including the traffic rates and latencies between them. You can check the health of the service mesh, or drill down to see the incoming and outgoing requests to a single component.
- **Gain insights from time series analytics with Grafana dashboards.** [Grafana](https://grafana.com/) is an analytics platform that allows you to query, visualize, alert on and understand the data gathered by Prometheus.
- **Write custom queries for time series data with the Prometheus UI.** [Prometheus](https://prometheus.io/) is a systems monitoring and alerting toolkit. Prometheus scrapes data from your cluster, which is then used by Grafana. A Prometheus UI is also integrated into Rancher, and lets you write custom queries for time series data and see the results in the UI.
Istio needs to be set up by a Rancher administrator or cluster administrator before it can be used in a project.
## Prerequisites
Before enabling Istio, we recommend that you confirm that your Rancher worker nodes have enough [CPU and memory](cpu-and-memory-allocations.md) to run all of the components of Istio.
## Setup Guide
Refer to the [setup guide](../../../how-to-guides/advanced-user-guides/istio-setup-guide/istio-setup-guide.md) for instructions on how to set up Istio and use it in a project.
## Disabling Istio
To remove Istio components from a cluster, namespace, or workload, refer to the section on [disabling Istio.](disable-istio.md)
## Accessing Visualizations
> By default, only cluster owners have access to Jaeger and Kiali. For instructions on how to allow project members to access them, see [this section.](rbac-for-istio.md)
After Istio is set up in a cluster, Grafana, Prometheus, Jaeger, and Kiali are available in the Rancher UI.
Your access to the visualizations depends on your role. Grafana and Prometheus are only available for cluster owners. The Kiali and Jaeger UIs are available only to cluster owners by default, but cluster owners can allow project members to access them by editing the Istio settings. When you go to your project and click **Resources > Istio,** you can go to each UI for Kiali, Jaeger, Grafana, and Prometheus by clicking their icons in the top right corner of the page.
To see the visualizations, go to the cluster where Istio is set up and click **Tools > Istio.** You should see links to each UI at the top of the page.
You can also get to the visualization tools from the project view.
## Viewing the Kiali Traffic Graph
1. From the project view in Rancher, click **Resources > Istio.**
1. If you are a cluster owner, you can go to the **Traffic Graph** tab. This tab has the Kiali network visualization integrated into the UI.
## Viewing Traffic Metrics
Istio's monitoring features provide visibility into the performance of all your services.
1. From the project view in Rancher, click **Resources > Istio.**
1. Go to the **Traffic Metrics** tab. After traffic is generated in your cluster, you should be able to see metrics for **Success Rate, Request Volume, 4xx Response Count, Project 5xx Response Count** and **Request Duration.** Cluster owners can see all of the metrics, while project members can see a subset of the metrics.
## Architecture
Istio installs a service mesh that uses [Envoy](https://www.envoyproxy.io) sidecar proxies to intercept traffic to each workload. These sidecars intercept and manage service-to-service communication, allowing fine-grained observation and control over traffic within the cluster.
Only workloads that have the Istio sidecar injected can be tracked and controlled by Istio.
Enabling Istio in Rancher enables monitoring in the cluster, and enables Istio in all new namespaces that are created in a cluster. You need to manually enable Istio in preexisting namespaces.
When a namespace has Istio enabled, new workloads deployed in the namespace will automatically have the Istio sidecar. You need to manually enable Istio in preexisting workloads.
For more information on the Istio sidecar, refer to the [Istio docs](https://istio.io/docs/setup/kubernetes/additional-setup/sidecar-injection/).
### Two Ingresses
By default, each Rancher-provisioned cluster has one NGINX ingress controller allowing traffic into the cluster. To allow Istio to receive external traffic, you need to enable the Istio ingress gateway for the cluster. The result is that your cluster will have two ingresses.
![In an Istio-enabled cluster, you can have two ingresses: the default Nginx ingress, and the default Istio controller.](/img/istio-ingress.svg)
@@ -189,5 +189,5 @@ After you set up notifiers, you can manage them. From the **Global** view, open
After creating a notifier, set up alerts to receive notifications of Rancher system events.
- [Cluster owners](../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/cluster-and-project-roles.md#cluster-roles) can set up alerts at the [cluster level](cluster-alerts/cluster-alerts.md).
- [Project owners](../../how-to-guides/advanced-user-guides/authentication-permissions-and-global-configuration/manage-role-based-access-control-rbac/cluster-and-project-roles.md#project-roles) can set up alerts at the [project level](../../reference-guides/rancher-project-tools/project-alerts.md).
@@ -32,7 +32,7 @@ OPA Gatekeeper is made available via Rancher's Helm system chart, and it is inst
> **Prerequisites:**
>
> - Only administrators and cluster owners can enable OPA Gatekeeper.
> - The dashboard needs to be enabled using the `dashboard` feature flag. For more information, refer to the [section on enabling experimental features.](../../getting-started/installation-and-upgrade/advanced-options/enable-experimental-features/enable-experimental-features.md)
1. Navigate to the cluster's **Dashboard** view.
1. On the left side menu, expand the cluster menu and click on **OPA Gatekeeper.**