From ca7333b3003fccab399c93075f79538a0706372a Mon Sep 17 00:00:00 2001 From: Jennifer Travinski Date: Thu, 6 Jan 2022 18:26:48 +0000 Subject: [PATCH 01/10] Revised Monitoring V2 page in 2.5 --- .../how-monitoring-works/_index.md | 193 ++++++++---------- 1 file changed, 90 insertions(+), 103 deletions(-) diff --git a/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md b/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md index 6445df9e75a..f5b3cc0fe3f 100644 --- a/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md +++ b/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md @@ -11,52 +11,64 @@ weight: 1 # 1. Architecture Overview -This diagram shows how data flows through the Monitoring V2 application: +_**The following steps describe how data flows through the Monitoring V2 application:**_ -{{% row %}} -{{% column %}} +**1.** **ServiceMonitors and PodMonitors** declaratively specify targets, such as Services and Pods, that need to be monitored. -![How data flows through the monitoring application]({{}}/img/rancher/monitoring-v2-architecture-overview.svg) +- Targets are scraped on a recurring schedule based on the configured Prometheus scrape interval, and the metrics that are scraped are stored in the Prometheus Time Series Database (TSDB). +- In order to perform the scrape, ServiceMonitors and PodMonitors are defined with label selectors that determine which Services or Pods should be scraped and endpoints that determine how the scrape should happen on the given target, e.g., scrape `/metrics` on TCP port 10252, proxying through IP address x.x.x.x. +- Out of the box, Monitoring V2 comes with certain pre-configured exporters that are deployed based on the type of Kubernetes cluster that it is deployed on. + - Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called **PushProx**. 
The Kubernetes components that expose metrics to Prometheus through PushProx are the following: `kube-controller-manager`, `kube-scheduler`, `etcd`, and `kube-proxy`. + - For each PushProx exporter, we deploy one PushProx client onto all target nodes. For example, a PushProx client is deployed onto all controlplane nodes for kube-controller-manager, all etcd nodes for kube-etcd, and all nodes for kubelet. + + - We deploy exactly one PushProx proxy per exporter. The process for exporting metrics is as follows: -{{% /column %}} -{{% column %}} + 1. The PushProx Client establishes an outbound connection with the PushProx Proxy. + 1. The client then polls the proxy for scrape requests that have come into the proxy. + 1. When the proxy receives a scrape request from Prometheus, the client sees it as a result of the poll. + 1. The client scrapes the internal component. + 1. The internal component responds by pushing metrics back to the proxy. + +
+
Process for Exporting Metrics with PushProx:
-1. Rules define what Prometheus metrics or time series database queries should result in alerts being fired. -2. ServiceMonitors and PodMonitors declaratively specify how services and pods should be monitored. They use labels to scrape metrics from pods. -3. Prometheus Operator observes ServiceMonitors, PodMonitors and PrometheusRules being created. -4. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. -5. Recording Rules are not directly used for alerting. They create new time series of precomputed queries. These new time series data can then be queried to generate alerts. -6. Prometheus scrapes all targets in the scrape configuration on a recurring schedule based on the scrape interval, storing the results in its time series database.Depending on the Kubernetes master component and Kubernetes distribution, the metrics from a certain Kubernetes component could be directly exposed to Prometheus, proxied through PushProx, or not available. For details, see Scraping and Exposing Metrics. -7. Prometheus evaluates the alerting rules against the time series database. It fires alerts to Alertmanager whenever an alerting rule evaluates to a positive number. -8. Alertmanager uses routes to group, label and filter the fired alerts to translate them into useful notifications. -9. Alertmanager uses the Receiver configuration to send notifications to Slack, PagerDuty, SMS, or other types of receivers. + ![Process for Exporting Metrics with PushProx]({{}}/img/rancher/pushprox-process.svg) -{{% /column %}} -{{% /row %}} + For more information, see [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics). +**2.** **PrometheusRules** allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. +- **Recording rules** create a new time series based on existing series that have been collected. 
They are frequently used to precompute complex queries. +- **Alerting rules** run a particular query and fire an alert from Prometheus for each time series that the query returns. + +**3.** **Prometheus Operator** observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. + +**4.** Once Prometheus determines that an alert needs to be fired, alerts are forwarded to **Alertmanager**. + +- Alerts contain labels that come from the PromQL query itself and additional labels and annotations that can be provided as part of specifying the initial PrometheusRule. +- Before receiving any alerts, Alertmanager will use the **routes** and **receivers** specified in its configuration to form a routing tree on which all incoming alerts are evaluated. Each node of the routing tree can specify additional grouping, labeling, and filtering that needs to happen based on the labels attached to the Prometheus alert. A node on the routing tree (usually a leaf node) can also specify that an alert that reaches it needs to be sent out to a configured Receiver, e.g., Slack, PagerDuty, or SMS. Note that Alertmanager will first send an alert to the **alertingDriver**, which then forwards the alert to the proper destination. +- Routes and receivers are also stored in the Kubernetes API via the Alertmanager Secret. When the Secret is updated, Alertmanager is also updated automatically. Note that routing occurs via labels only (not via annotations, etc.). + +
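The routing-tree behavior described in step 4 can be illustrated with a small Python sketch. This is only a simplified model under stated assumptions: the `Route` class and `resolve_receiver` function are hypothetical names invented for this example and are not part of Alertmanager; real Alertmanager routes also support regex matchers, the `continue` option, grouping, and inhibition, none of which are modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    """One node of a simplified routing tree (hypothetical model, not Alertmanager's API)."""
    match: Dict[str, str] = field(default_factory=dict)   # labels that must match exactly
    receiver: Optional[str] = None                         # inherited from the parent if None
    routes: List["Route"] = field(default_factory=list)   # child routes, checked in order

    def matches(self, labels: Dict[str, str]) -> bool:
        # Routing looks only at alert labels; annotations are never consulted.
        return all(labels.get(key) == value for key, value in self.match.items())


def resolve_receiver(root: Route, labels: Dict[str, str]) -> Optional[str]:
    """Descend into the first matching child at each level and return the receiver reached."""
    if not root.matches(labels):
        return None
    node, receiver = root, root.receiver
    while True:
        child = next((c for c in node.routes if c.matches(labels)), None)
        if child is None:
            # No deeper route matches, so the current node handles the alert.
            return receiver
        node = child
        receiver = child.receiver or receiver


# Assumed example tree: the top-level route accepts every alert.
tree = Route(receiver="default", routes=[
    Route(match={"team": "front-end"}, receiver="slack-frontend", routes=[
        Route(match={"severity": "critical"}, receiver="pagerduty-frontend"),
    ]),
    Route(match={"team": "database"}, receiver="email-dba"),
])

print(resolve_receiver(tree, {"team": "front-end", "severity": "critical"}))
```

In this sketch, an alert labeled `team: front-end` with a non-critical severity stops at the `slack-frontend` node, and an alert that matches no child route falls back to the top-level receiver.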
How data flows through the monitoring application:
# 2. How Prometheus Works -### 2.1. Storing Time Series Data +### Storing Time Series Data After collecting metrics from exporters, Prometheus stores the time series in a local on-disk time series database. Prometheus optionally integrates with remote systems, but `rancher-monitoring` uses local storage for the time series database. -The database can then be queried using PromQL, the query language for Prometheus. Grafana dashboards use PromQL queries to generate data visualizations. +Once stored, users can query this TSDB using PromQL, the query language for Prometheus. -### 2.2. Querying the Time Series Database +PromQL queries can be visualized in one of two ways: -The PromQL query language is the primary tool to query Prometheus for time series data. +1. By supplying the query in Prometheus's Graph UI, which will show a simple graphical view of the data. +1. By creating a Grafana Dashboard that contains the PromQL query and additional formatting directives that label axes, add units, change colors, use alternative visualizations, etc. -In Grafana, you can right-click a CPU utilization and click Inspect. This opens a panel that shows the [raw query results.](https://grafana.com/docs/grafana/latest/panels/inspect-panel/#inspect-raw-query-results)The raw results demonstrate how each dashboard is powered by PromQL queries. +### Defining Rules for Prometheus -### 2.3. Defining Rules for when Alerts Should be Fired - -Rules define the conditions for Prometheus to fire alerts. When PrometheusRule custom resources are created or updated, the Prometheus Operator observes the change and calls the Prometheus API to synchronize the rule configuration with the Alerting Rules and Recording Rules in Prometheus. 
- -When you define a Rule (which is declared within a RuleGroup in a PrometheusRule resource), the [spec of the Rule itself](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#rule) contains labels that are used by Alertmanager to figure out which Route should receive this Alert. For example, an Alert with the label `team: front-end` will be sent to all Routes that match on that label. +Rules define queries that Prometheus needs to execute on a regular `evaluationInterval` to perform certain actions, such as firing an alert (alerting rules) or precomputing a query based on others existing in its TSDB (recording rules). These rules are encoded in PrometheusRules custom resources. When PrometheusRule custom resources are created or updated, the Prometheus Operator observes the change and calls the Prometheus API to synchronize the set of rules that Prometheus is currently evaluating on a regular interval. A PrometheusRule allows you to define one or more RuleGroups. Each RuleGroup consists of a set of Rule objects that can each represent either an alerting or a recording rule with the following fields: @@ -65,7 +77,9 @@ A PrometheusRule allows you to define one or more RuleGroups. Each RuleGroup con - Labels that should be attached to the alert or record that identify it (e.g. cluster name or severity) - Annotations that encode any additional important pieces of information that need to be displayed on the notification for an alert (e.g. summary, description, message, runbook URL, etc.). This field is not required for recording rules. -### 2.4. Firing Alerts +On evaluating a [rule](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#rule), Prometheus will execute the provided PromQL query, add additional provided labels (or annotations - only for alerting rules), and execute the appropriate action for the rule. 
For example, an Alerting Rule that adds `team: front-end` as a label to the provided PromQL query will append that label to the fired alert, which will allow Alertmanager to forward the alert to the correct Receiver. + +### Alerting and Recording Rules Prometheus doesn't maintain the state of whether alerts are active. It fires alerts repetitively at every evaluation interval, relying on Alertmanager to group and filter the alerts into meaningful notifications. @@ -90,14 +104,18 @@ The Alertmanager handles alerts sent by client applications such as the Promethe - Silencing and inhibition of alerts - Tracking alerts that fire over time - Sending out the status of whether an alert is currently firing, or if it is resolved + +### Alerts Forwarded by alertingDrivers + +When alertingDrivers are installed, a `Service` is created that can be used as the receiver's URL for Teams or SMS, based on the alertingDriver's configuration. Because the URL in the Receiver points to the alertingDriver, Alertmanager first sends the alert to the alertingDriver, which then forwards it to the proper destination. -### 3.1. Routing Alerts to Receivers +### Routing Alerts to Receivers Alertmanager coordinates where alerts are sent. It allows you to group alerts based on labels and fire them based on whether certain labels are matched. One top-level route accepts all alerts. From there, Alertmanager continues routing alerts to receivers based on whether they match the conditions of the next route. -While the Rancher UI forms only allow editing a routing tree that is two levels deep, you can configure more deeply nested routing structures by editing the Alertmanager custom resource YAML. +While the Rancher UI forms only allow editing a routing tree that is two levels deep, you can configure more deeply nested routing structures by editing the Alertmanager Secret. -### 3.2. 
Configuring Multiple Receivers +### Configuring Multiple Receivers By editing the forms in the Rancher UI, you can set up a Receiver resource with all the information Alertmanager needs to send alerts to your notification system. @@ -105,124 +123,93 @@ By editing custom YAML in the Alertmanager or Receiver configuration, you can al # 4. Monitoring V2 Specific Components -Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster. +Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster. Prometheus Operator will automatically update your Prometheus configuration based on the live state of the resources and configuration options that are edited in the Rancher UI. -### 4.1. Resources Deployed by Default +### Resources Deployed by Default By default, a set of resources curated by the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) project are deployed onto your cluster as part of installing the Rancher Monitoring Application to set up a basic Monitoring/Alerting stack. 
The resources that get deployed onto your cluster to support this solution can be found in the [`rancher-monitoring`](https://github.com/rancher/charts/tree/main/charts/rancher-monitoring) Helm chart, which closely tracks the upstream [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) Helm chart maintained by the Prometheus community with certain changes tracked in the [CHANGELOG.md](https://github.com/rancher/charts/blob/main/charts/rancher-monitoring/CHANGELOG.md). -There are also certain special types of ConfigMaps and Secrets such as those corresponding to Grafana Dashboards, Grafana Datasources, and Alertmanager Configs that will automatically update your Prometheus configuration via sidecar proxies that observe the live state of those resources within your cluster. +### Default Exporters -### 4.2. PushProx +Monitoring V2 deploys three default exporters that provide additional metrics for Prometheus to store: -PushProx enhances the security of the monitoring application, allowing it to be installed on hardened Kubernetes clusters. +1. `node-exporter`: exposes hardware and OS metrics for Linux hosts. For more information on `node-exporter`, refer to the [upstream documentation.](https://prometheus.io/docs/guides/node-exporter/) -To expose Kubernetes metrics, PushProxes use a client proxy model to expose specific ports within default Kubernetes components. Node exporters expose metrics to PushProx through an outbound connection. +1. `windows-exporter`: exposes hardware and OS metrics for Windows hosts (only deployed on Windows clusters). -The proxy allows `rancher-monitoring` to scrape metrics from processes on the hostNetwork, such as the `kube-api-server`, without opening up node ports to inbound connections. +1. [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics): exposes additional metrics that track the state of resources contained in the Kubernetes API (e.g. 
pods, workloads, etc.). -PushProx is a DaemonSet that listens for clients that seek to register. Once registered, it proxies scrape requests through the established connection. Then the client executes the request to etcd. +ServiceMonitors and PodMonitors will scrape these exporters, as defined [here](#defining-what-metrics-are-scraped). Prometheus stores these metrics, and you can query the results via either Prometheus's UI or Grafana. -All of the default ServiceMonitors, such as `rancher-monitoring-kube-controller-manager`, are configured to hit the metrics endpoint of the client using this proxy. +See the [architecture](#1-architecture-overview) section for more information on recording rules, alerting rules, and Alertmanager. -For more details about how PushProx works, refer to [Scraping Metrics with PushProx.](#5-5-scraping-metrics-with-pushprox) - - -### 4.3. Default Exporters - -`rancher-monitoring` deploys two exporters to expose metrics to prometheus: `node-exporter` and `windows-exporter`. Both are deployed as DaemonSets. - -`node-exporter` exports container, pod and node metrics for CPU and memory from each Linux node. `windows-exporter` does the same, but for Windows nodes. - -For more information on `node-exporter`, refer to the [upstream documentation.](https://prometheus.io/docs/guides/node-exporter/) - -[kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) is also useful because it exports metrics for Kubernetes components. - -### 4.4. Components Exposed in the Rancher UI +### Components Exposed in the Rancher UI When the monitoring application is installed, you will be able to edit the following components in the Rancher UI: | Component | Type of Component | Purpose and Common Use Cases for Editing | |--------------|------------------------|---------------------------| -| ServiceMonitor | Custom resource | Set up targets to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. 
| -| PodMonitor | Custom resource | Set up targets to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | -| Receiver | Configuration block (part of Alertmanager) | Set up a notification system to receive alerts. Automatically updates the Alertmanager custom resource. | -| Route | Configuration block (part of Alertmanager) | Add identifying information to make alerts more meaningful and direct them to individual teams. Automatically updates the Alertmanager custom resource. | -| PrometheusRule | Custom resource | For more advanced use cases, you may want to define what Prometheus metrics or time series database queries should result in alerts being fired. Automatically updates the Prometheus custom resource. | -| Alertmanager | Custom resource | Edit this custom resource only if you need more advanced configuration options beyond what the Rancher UI exposes in the Routes and Receivers sections. For example, you might want to edit this resource to add a routing tree with more than two levels. | -| Prometheus | Custom resource | Edit this custom resource only if you need more advanced configuration beyond what can be configured using ServiceMonitors, PodMonitors, or [Rancher monitoring Helm chart options.](../configuration/helm-chart-options) | +| ServiceMonitor | Custom resource | Sets up Kubernetes Services to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | +| PodMonitor | Custom resource | Sets up Kubernetes Pods to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | +| Receiver | Configuration block (part of Alertmanager) | Modifies information on where to send an alert (e.g. Slack, PagerDuty, etc.) and any necessary information to send the alert (e.g. TLS certs, proxy URLs, etc.). Automatically updates the Alertmanager custom resource. 
| +| Route | Configuration block (part of Alertmanager) | Modifies the routing tree that is used to filter, label, and group alerts based on labels and send them to the appropriate Receiver. Automatically updates the Alertmanager custom resource. | +| PrometheusRule | Custom resource | Defines additional queries that need to trigger alerts or define materialized views of existing series that are within Prometheus's TSDB. Automatically updates the Prometheus custom resource. | + +### How PushProx Works + +PushProx allows Prometheus to scrape metrics across a network boundary, which prevents users from having to expose metrics ports for internal Kubernetes components on each node in a Kubernetes cluster. + +Since the metrics for Kubernetes components are generally exposed on the host network of nodes in the cluster, PushProx deploys a DaemonSet of clients that sit on the hostNetwork of each node and make an outbound connection to a single proxy that is sitting on the Kubernetes API. Prometheus can then be configured to proxy scrape requests through the proxy to each client, which allows it to scrape metrics from the internal Kubernetes components without requiring any inbound node ports to be open. + +For more details about how PushProx works, refer to [Scraping Metrics with PushProx](#scraping-metrics-with-pushprox). # 5. Scraping and Exposing Metrics -### 5.1. Defining what Metrics are Scraped +### Defining what Metrics are Scraped -ServiceMonitors define targets that are intended for Prometheus to scrape. The [Prometheus custom resource tells](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#prometheus) Prometheus which ServiceMonitors it should use to find out where to scrape metrics from. +ServiceMonitors and PodMonitors define targets that are intended for Prometheus to scrape. 
The [Prometheus custom resource](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#prometheus) tells Prometheus which ServiceMonitors or PodMonitors it should use to find out where to scrape metrics from. -The Prometheus Operator observes the ServiceMonitors. When it observes that ServiceMonitors are created or updated, it calls the Prometheus API to update the scrape configuration in the Prometheus custom resource and keep it in sync with the scrape configuration in the ServiceMonitors. This scrape configuration tells Prometheus which endpoints to scrape metrics from and how it will label the metrics from those endpoints. +The Prometheus Operator observes the ServiceMonitors and PodMonitors. When it observes that they are created or updated, it calls the Prometheus API to update the scrape configuration in the Prometheus custom resource and keep it in sync with the scrape configuration in the ServiceMonitors or PodMonitors. This scrape configuration tells Prometheus which endpoints to scrape metrics from and how it will label the metrics from those endpoints. Prometheus scrapes all of the metrics defined in its scrape configuration at every `scrape_interval`, which is one minute by default. The scrape configuration can be viewed as part of the Prometheus custom resource that is exposed in the Rancher UI. -### 5.2. How the Prometheus Operator Sets up Metrics Scraping +### How the Prometheus Operator Sets up Metrics Scraping The Prometheus Deployment or StatefulSet scrapes metrics, and the configuration of Prometheus is controlled by the Prometheus custom resources. The Prometheus Operator watches for Prometheus and Alertmanager resources, and when they are created, the Prometheus Operator creates a Deployment or StatefulSet for Prometheus or Alertmanager with the user-defined configuration. -
How the Prometheus Operator Sets up Metrics Scraping
+When the Prometheus Operator observes ServiceMonitors, PodMonitors, and PrometheusRules being created, it knows that the scrape configuration needs to be updated in Prometheus. It updates Prometheus by first updating the configuration and rules files in the volumes of Prometheus's Deployment or StatefulSet. Then it calls the Prometheus API to sync the new configuration, which results in the Prometheus Deployment or StatefulSet being modified in place. -![How the Prometheus Operator sets up metrics scraping]({{}}/img/rancher/set-up-scraping.svg) - -When the Prometheus Operator observes ServiceMonitors, PodMonitors and PrometheusRules being created, it knows that the scrape configuration needs to be updated in Prometheus. It updates Prometheus by first updating the configuration and rules files in the volumes of Prometheus's Deployment or StatefulSet. Then it calls the Prometheus API to sync the new configuration, resulting in the Prometheus Deployment or StatefulSet to be modified in place. -![How the Prometheus Operator Updates Scrape Configuration]({{}}/img/rancher/update-scrape-config.svg) - -### 5.3. How Kubernetes Component Metrics are Exposed +### How Kubernetes Component Metrics are Exposed Prometheus scrapes metrics from deployments known as [exporters,](https://prometheus.io/docs/instrumenting/exporters/) which export the time series data in a format that Prometheus can ingest. In Prometheus, time series consist of streams of timestamped values belonging to the same metric and the same set of labeled dimensions. -To allow monitoring to be installed on hardened Kubernetes clusters, `rancher-monitoring` application proxies the communication between Prometheus and the exporter through PushProx for some Kubernetes master components. +### Scraping Metrics with PushProx -### 5.4. Scraping Metrics without PushProx +Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called PushProx. 
For detailed information on PushProx, refer to [How PushProx Works](#how-pushprox-works) and to the above [architecture](#1-architecture-overview) section. -### 5.4. Scraping Metrics without PushProx +### Scraping Metrics -The Kubernetes components that directly expose metrics to Prometheus are the following: +The following Kubernetes components are directly scraped by Prometheus: + +- kubelet* +- ingress-nginx** - coreDns/kubeDns - kube-api-server -\* For RKE and RKE2 clusters, ingress-nginx is deployed by default and treated as an internal Kubernetes component. +\* You can optionally set `hardenedKubelet.enabled` to scrape the kubelet through PushProx, but that is not the default. -### 5.5. Scraping Metrics with PushProx +\*\* For RKE and RKE2 clusters, ingress-nginx is deployed by default and treated as an internal Kubernetes component. -The purpose of this architecture is to allow us to scrape internal Kubernetes components without exposing those ports to inbound requests. As a result, Prometheus can scrape metrics across a network boundary. -The Kubernetes components that expose metrics to Prometheus through PushProx are the following: +### Scraping Metrics Based on Kubernetes Distribution -- kube-controller-manager -- kube-scheduler -- etcd -- kube-proxy - -For each PushProx exporter, we deploy one PushProx client onto all target nodes. For example, a PushProx client is deployed onto all controlplane nodes for kube-controller-manager, all etcd nodes for kube-etcd, and all nodes for kubelet. We deploy exactly one PushProx proxy per exporter. - -The process for exporting metrics is as follows: - -1. The PushProx Client establishes an outbound connection with the PushProx Proxy. -2. The client then polls the proxy for scrape requests that have come into the proxy. -3. When the proxy receives a scrape request from Prometheus, the client sees it as a result of the poll. -4. The client scrapes the internal component. -5. The internal component responds by pushing metrics back to the proxy. - -
Process for Exporting Metrics with PushProx
- -![Process for Exporting Metrics with PushProx]({{}}/img/rancher/pushprox-process.svg) - -Metrics are scraped differently based on the Kubernetes distribution. For help with terminology, see Terminology(#terminology). For details, see the table below: +Metrics are scraped differently based on the Kubernetes distribution. For help with terminology, refer [here](#terminology). For details, see the table below:
How Metrics are Exposed to Prometheus
@@ -239,7 +226,7 @@ Metrics are scraped differently based on the Kubernetes distribution. For help w \* For RKE and RKE2 clusters, ingress-nginx is deployed by default and treated as an internal Kubernetes component. -### 5.6. Terminology +### Terminology - **kube-scheduler:** The internal Kubernetes component that uses information in the pod spec to decide on which node to run a pod. - **kube-controller-manager:** The internal Kubernetes component that is responsible for node management (detecting if a node fails), pod replication and endpoint creation. From 52dd354f7b873bc5f004728ea5997c9214a65df4 Mon Sep 17 00:00:00 2001 From: Jennifer Travinski Date: Thu, 6 Jan 2022 18:27:07 +0000 Subject: [PATCH 02/10] Revised Monitoring V2 page in 2.6 --- .../how-monitoring-works/_index.md | 205 ++++++++---------- 1 file changed, 91 insertions(+), 114 deletions(-) diff --git a/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md b/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md index 9e7690d7de3..86f60a39ef7 100644 --- a/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md +++ b/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md @@ -8,56 +8,68 @@ weight: 1 3. [How Alertmanager Works](#3-how-alertmanager-works) 4. [Monitoring V2 Specific Components](#4-monitoring-v2-specific-components) 5. [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics) -6. [Monitoring on RKE2 Clusters](#6-monitoring-on-rke2-clusters) # 1. Architecture Overview -This diagram shows how data flows through the Monitoring V2 application: +_**The following steps describe how data flows through the Monitoring V2 application:**_ -{{% row %}} -{{% column %}} +**1.** **ServiceMonitors and PodMonitors** declaratively specify targets, such as Services and Pods, that need to be monitored. 
-![How data flows through the monitoring application]({{}}/img/rancher/monitoring-v2-architecture-overview.svg) +- Targets are scraped on a recurring schedule based on the configured Prometheus scrape interval, and the metrics that are scraped are stored in the Prometheus Time Series Database (TSDB). +- In order to perform the scrape, ServiceMonitors and PodMonitors are defined with label selectors that determine which Services or Pods should be scraped and endpoints that determine how the scrape should happen on the given target, e.g., scrape `/metrics` on TCP port 10252, proxying through IP address x.x.x.x. +- Out of the box, Monitoring V2 comes with certain pre-configured exporters that are deployed based on the type of Kubernetes cluster that it is deployed on. + - Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called **PushProx**. The Kubernetes components that expose metrics to Prometheus through PushProx are the following: + `kube-controller-manager`, `kube-scheduler`, `etcd`, and `kube-proxy`. + - For each PushProx exporter, we deploy one PushProx client onto all target nodes. For example, a PushProx client is deployed onto all controlplane nodes for kube-controller-manager, all etcd nodes for kube-etcd, and all nodes for kubelet. + + - We deploy exactly one PushProx proxy per exporter. The process for exporting metrics is as follows: -{{% /column %}} -{{% column %}} + 1. The PushProx Client establishes an outbound connection with the PushProx Proxy. + 1. The client then polls the proxy for scrape requests that have come into the proxy. + 1. When the proxy receives a scrape request from Prometheus, the client sees it as a result of the poll. + 1. The client scrapes the internal component. + 1. The internal component responds by pushing metrics back to the proxy. + +
+
Process for Exporting Metrics with PushProx:
-1. Rules define what Prometheus metrics or time series database queries should result in alerts being fired. -2. ServiceMonitors and PodMonitors declaratively specify how services and pods should be monitored. They use labels to scrape metrics from pods. -3. Prometheus Operator observes ServiceMonitors, PodMonitors and PrometheusRules being created. -4. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. -5. Recording Rules are not directly used for alerting. They create new time series of precomputed queries. These new time series data can then be queried to generate alerts. -6. Prometheus scrapes all targets in the scrape configuration on a recurring schedule based on the scrape interval, storing the results in its time series database.Depending on the Kubernetes master component and Kubernetes distribution, the metrics from a certain Kubernetes component could be directly exposed to Prometheus, proxied through PushProx, or not available. For details, see Scraping and Exposing Metrics. -7. Prometheus evaluates the alerting rules against the time series database. It fires alerts to Alertmanager whenever an alerting rule evaluates to a positive number. -8. Alertmanager uses routes to group, label and filter the fired alerts to translate them into useful notifications. -9. Alertmanager uses the Receiver configuration to send notifications to Slack, PagerDuty, SMS, or other types of receivers. + ![Process for Exporting Metrics with PushProx]({{}}/img/rancher/pushprox-process.svg) -{{% /column %}} -{{% /row %}} + For more information, see [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics). +**2.** **PrometheusRules** allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. +- **Recording rules** create a new time series based on existing series that have been collected. 
They are frequently used to precompute complex queries. +- **Alerting rules** run a particular query and fire an alert from Prometheus if the query evaluates to a non-zero value. + +**3.** **Prometheus Operator** observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. + +**4.** Once Prometheus determines that an alert needs to be fired, alerts are forwarded to **Alertmanager**. + +- Alerts contain labels that come from the PromQL query itself and additional labels and annotations that can be provided as part of specifying the initial PrometheusRule. +- Before receiving any alerts, Alertmanager will use the **routes** and **receivers** specified in its configuration to form a routing tree on which all incoming alerts are evaluated. Each node of the routing tree can specify additional grouping, labeling, and filtering that needs to happen based on the labels attached to the Prometheus alert. A node on the routing tree (usually a leaf node) can also specify that an alert that reaches it needs to be sent out to a configured Receiver, e.g., Slack, PagerDuty, SMS, etc. Note that Alertmanager first sends an alert to the **alertingDriver**, which then sends or forwards the alert to the proper destination. +- Routes and receivers are also stored in the Kubernetes API via the Alertmanager Secret. When the Secret is updated, Alertmanager is also updated automatically. Note that routing occurs via labels only (not via annotations, etc.). + +
How data flows through the monitoring application:
# 2. How Prometheus Works -### 2.1. Storing Time Series Data +### Storing Time Series Data After collecting metrics from exporters, Prometheus stores the time series in a local on-disk time series database. Prometheus optionally integrates with remote systems, but `rancher-monitoring` uses local storage for the time series database. -The database can then be queried using PromQL, the query language for Prometheus. Grafana dashboards use PromQL queries to generate data visualizations. +Once stored, users can query this TSDB using PromQL, the query language for Prometheus. -### 2.2. Querying the Time Series Database +PromQL queries can be visualized in one of two ways: -The PromQL query language is the primary tool to query Prometheus for time series data. +1. By supplying the query in Prometheus's Graph UI, which will show a simple graphical view of the data. +1. By creating a Grafana Dashboard that contains the PromQL query and additional formatting directives that label axes, add units, change colors, use alternative visualizations, etc. -In Grafana, you can right-click a CPU utilization and click Inspect. This opens a panel that shows the [raw query results.](https://grafana.com/docs/grafana/latest/panels/inspect-panel/#inspect-raw-query-results)The raw results demonstrate how each dashboard is powered by PromQL queries. +### Defining Rules for Prometheus -### 2.3. Defining Rules for when Alerts Should be Fired - -Rules define the conditions for Prometheus to fire alerts. When PrometheusRule custom resources are created or updated, the Prometheus Operator observes the change and calls the Prometheus API to synchronize the rule configuration with the Alerting Rules and Recording Rules in Prometheus. 
- -When you define a Rule (which is declared within a RuleGroup in a PrometheusRule resource), the [spec of the Rule itself](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#rule) contains labels that are used by Alertmanager to figure out which Route should receive this Alert. For example, an Alert with the label `team: front-end` will be sent to all Routes that match on that label. +Rules define queries that Prometheus needs to execute on a regular `evaluationInterval` to perform certain actions, such as firing an alert (alerting rules) or precomputing a query based on others existing in its TSDB (recording rules). These rules are encoded in PrometheusRules custom resources. When PrometheusRule custom resources are created or updated, the Prometheus Operator observes the change and calls the Prometheus API to synchronize the set of rules that Prometheus is currently evaluating on a regular interval. A PrometheusRule allows you to define one or more RuleGroups. Each RuleGroup consists of a set of Rule objects that can each represent either an alerting or a recording rule with the following fields: @@ -66,7 +78,9 @@ A PrometheusRule allows you to define one or more RuleGroups. Each RuleGroup con - Labels that should be attached to the alert or record that identify it (e.g. cluster name or severity) - Annotations that encode any additional important pieces of information that need to be displayed on the notification for an alert (e.g. summary, description, message, runbook URL, etc.). This field is not required for recording rules. -### 2.4. Firing Alerts +On evaluating a [rule](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#rule), Prometheus will execute the provided PromQL query, add additional provided labels (or annotations - only for alerting rules), and execute the appropriate action for the rule. 
For example, an Alerting Rule that adds `team: front-end` as a label to the provided PromQL query will append that label to the fired alert, which will allow Alertmanager to forward the alert to the correct Receiver. + +### Alerting and Recording Rules Prometheus doesn't maintain the state of whether alerts are active. It fires alerts repetitively at every evaluation interval, relying on Alertmanager to group and filter the alerts into meaningful notifications. @@ -91,14 +105,18 @@ The Alertmanager handles alerts sent by client applications such as the Promethe - Silencing and inhibition of alerts - Tracking alerts that fire over time - Sending out the status of whether an alert is currently firing, or if it is resolved + +### Alerts Forwarded by alertingDrivers + +When alertingDrivers are installed, a `Service` is created that can be used as the receiver's URL for Teams or SMS, based on the alertingDriver's configuration. The URL in the Receiver points to the alertingDriver, so Alertmanager first sends the alert to the alertingDriver, which then sends or forwards the alert to the proper destination. -### 3.1. Routing Alerts to Receivers +### Routing Alerts to Receivers Alertmanager coordinates where alerts are sent. It allows you to group alerts based on labels and fire them based on whether certain labels are matched. One top-level route accepts all alerts. From there, Alertmanager continues routing alerts to receivers based on whether they match the conditions of the next route. -While the Rancher UI forms only allow editing a routing tree that is two levels deep, you can configure more deeply nested routing structures by editing the Alertmanager custom resource YAML. +While the Rancher UI forms only allow editing a routing tree that is two levels deep, you can configure more deeply nested routing structures by editing the Alertmanager Secret. -### 3.2.
Configuring Multiple Receivers +### Configuring Multiple Receivers By editing the forms in the Rancher UI, you can set up a Receiver resource with all the information Alertmanager needs to send alerts to your notification system. @@ -106,124 +124,93 @@ By editing custom YAML in the Alertmanager or Receiver configuration, you can al # 4. Monitoring V2 Specific Components -Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster. +Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster. Prometheus Operator will automatically update your Prometheus configuration based on the live state of the resources and configuration options that are edited in the Rancher UI. -### 4.1. Resources Deployed by Default +### Resources Deployed by Default By default, a set of resources curated by the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) project are deployed onto your cluster as part of installing the Rancher Monitoring Application to set up a basic Monitoring/Alerting stack. 
The resources that get deployed onto your cluster to support this solution can be found in the [`rancher-monitoring`](https://github.com/rancher/charts/tree/main/charts/rancher-monitoring) Helm chart, which closely tracks the upstream [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) Helm chart maintained by the Prometheus community with certain changes tracked in the [CHANGELOG.md](https://github.com/rancher/charts/blob/main/charts/rancher-monitoring/CHANGELOG.md). -There are also certain special types of ConfigMaps and Secrets such as those corresponding to Grafana Dashboards, Grafana Datasources, and Alertmanager Configs that will automatically update your Prometheus configuration via sidecar proxies that observe the live state of those resources within your cluster. +### Default Exporters -### 4.2. PushProx +Monitoring V2 deploys three default exporters that provide additional metrics for Prometheus to store: -PushProx enhances the security of the monitoring application, allowing it to be installed on hardened Kubernetes clusters. +1. `node-exporter`: exposes hardware and OS metrics for Linux hosts. For more information on `node-exporter`, refer to the [upstream documentation.](https://prometheus.io/docs/guides/node-exporter/) -To expose Kubernetes metrics, PushProxes use a client proxy model to expose specific ports within default Kubernetes components. Node exporters expose metrics to PushProx through an outbound connection. +1. `windows-exporter`: exposes hardware and OS metrics for Windows hosts (only deployed on Windows clusters). -The proxy allows `rancher-monitoring` to scrape metrics from processes on the hostNetwork, such as the `kube-api-server`, without opening up node ports to inbound connections. +1. [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics): expose additional metrics that track the state of resources contained in the Kubernetes API (e.g. 
pods, workloads, etc.). -PushProx is a DaemonSet that listens for clients that seek to register. Once registered, it proxies scrape requests through the established connection. Then the client executes the request to etcd. +ServiceMonitors and PodMonitors will scrape these exporters, as defined [here](#defining-what-metrics-are-scraped). Prometheus stores these metrics, and you can query the results via either Prometheus's UI or Grafana. -All of the default ServiceMonitors, such as `rancher-monitoring-kube-controller-manager`, are configured to hit the metrics endpoint of the client using this proxy. +See [architecture](#1-architecture-overview) section for more information on recording rules, alerting rules, and Alertmanager. -For more details about how PushProx works, refer to [Scraping Metrics with PushProx.](#5-5-scraping-metrics-with-pushprox) - - -### 4.3. Default Exporters - -`rancher-monitoring` deploys two exporters to expose metrics to prometheus: `node-exporter` and `windows-exporter`. Both are deployed as DaemonSets. - -`node-exporter` exports container, pod and node metrics for CPU and memory from each Linux node. `windows-exporter` does the same, but for Windows nodes. - -For more information on `node-exporter`, refer to the [upstream documentation.](https://prometheus.io/docs/guides/node-exporter/) - -[kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) is also useful because it exports metrics for Kubernetes components. - -# 4.4. Components Exposed in the Rancher UI +### Components Exposed in the Rancher UI When the monitoring application is installed, you will be able to edit the following components in the Rancher UI: | Component | Type of Component | Purpose and Common Use Cases for Editing | |--------------|------------------------|---------------------------| -| ServiceMonitor | Custom resource | Set up targets to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. 
| -| PodMonitor | Custom resource | Set up targets to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | -| Receiver | Configuration block (part of Alertmanager) | Set up a notification system to receive alerts. Automatically updates the Alertmanager custom resource. | -| Route | Configuration block (part of Alertmanager) | Add identifying information to make alerts more meaningful and direct them to individual teams. Automatically updates the Alertmanager custom resource. | -| PrometheusRule | Custom resource | For more advanced use cases, you may want to define what Prometheus metrics or time series database queries should result in alerts being fired. Automatically updates the Prometheus custom resource. | -| Alertmanager | Custom resource | Edit this custom resource only if you need more advanced configuration options beyond what the Rancher UI exposes in the Routes and Receivers sections. For example, you might want to edit this resource to add a routing tree with more than two levels. | -| Prometheus | Custom resource | Edit this custom resource only if you need more advanced configuration beyond what can be configured using ServiceMonitors, PodMonitors, or [Rancher monitoring Helm chart options.](../configuration/helm-chart-options) | +| ServiceMonitor | Custom resource | Sets up Kubernetes Services to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | +| PodMonitor | Custom resource | Sets up Kubernetes Pods to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | +| Receiver | Configuration block (part of Alertmanager) | Modifies information on where to send an alert (e.g. Slack, PagerDuty, etc.) and any necessary information to send the alert (e.g. TLS certs, proxy URLs, etc.). Automatically updates the Alertmanager custom resource. 
| +| Route | Configuration block (part of Alertmanager) | Modifies the routing tree that is used to filter, label, and group alerts based on labels and send them to the appropriate Receiver. Automatically updates the Alertmanager custom resource. | +| PrometheusRule | Custom resource | Defines additional queries that need to trigger alerts or define materialized views of existing series that are within Prometheus's TSDB. Automatically updates the Prometheus custom resource. | + +### How PushProx Works + +PushProx allows Prometheus to scrape metrics across a network boundary, which prevents users from having to expose metrics ports for internal Kubernetes components on each node in a Kubernetes cluster. + +Since the metrics for Kubernetes components are generally exposed on the host network of nodes in the cluster, PushProx deploys a DaemonSet of clients that sit on the hostNetwork of each node and make an outbound connection to a single proxy that is sitting on the Kubernetes API. Prometheus can then be configured to proxy scrape requests through the proxy to each client, which allows it to scrape metrics from the internal Kubernetes components without requiring any inbound node ports to be open. + +For more details about how PushProx works, refer to [Scraping Metrics with PushProx](#scraping-metrics-with-pushprox). # 5. Scraping and Exposing Metrics -### 5.1. Defining what Metrics are Scraped +### Defining what Metrics are Scraped -ServiceMonitors define targets that are intended for Prometheus to scrape. The [Prometheus custom resource tells](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#prometheus) Prometheus which ServiceMonitors it should use to find out where to scrape metrics from. +ServiceMonitors and PodMonitors define targets that are intended for Prometheus to scrape. 
The [Prometheus custom resource](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#prometheus) tells Prometheus which ServiceMonitors or PodMonitors it should use to find out where to scrape metrics from. -The Prometheus Operator observes the ServiceMonitors. When it observes that ServiceMonitors are created or updated, it calls the Prometheus API to update the scrape configuration in the Prometheus custom resource and keep it in sync with the scrape configuration in the ServiceMonitors. This scrape configuration tells Prometheus which endpoints to scrape metrics from and how it will label the metrics from those endpoints. +The Prometheus Operator observes the ServiceMonitors and PodMonitors. When it observes that they are created or updated, it calls the Prometheus API to update the scrape configuration in the Prometheus custom resource and keep it in sync with the scrape configuration in the ServiceMonitors or PodMonitors. This scrape configuration tells Prometheus which endpoints to scrape metrics from and how it will label the metrics from those endpoints. Prometheus scrapes all of the metrics defined in its scrape configuration at every `scrape_interval`, which is one minute by default. The scrape configuration can be viewed as part of the Prometheus custom resource that is exposed in the Rancher UI. -### 5.2. How the Prometheus Operator Sets up Metrics Scraping +### How the Prometheus Operator Sets up Metrics Scraping The Prometheus Deployment or StatefulSet scrapes metrics, and the configuration of Prometheus is controlled by the Prometheus custom resources. The Prometheus Operator watches for Prometheus and Alertmanager resources, and when they are created, the Prometheus Operator creates a Deployment or StatefulSet for Prometheus or Alertmanager with the user-defined configuration. -
How the Prometheus Operator Sets up Metrics Scraping
+When the Prometheus Operator observes ServiceMonitors, PodMonitors, and PrometheusRules being created, it knows that the scrape configuration needs to be updated in Prometheus. It updates Prometheus by first updating the configuration and rules files in the volumes of Prometheus's Deployment or StatefulSet. Then it calls the Prometheus API to sync the new configuration, resulting in the Prometheus Deployment or StatefulSet being modified in place. -![How the Prometheus Operator sets up metrics scraping]({{}}/img/rancher/set-up-scraping.svg) - -When the Prometheus Operator observes ServiceMonitors, PodMonitors and PrometheusRules being created, it knows that the scrape configuration needs to be updated in Prometheus. It updates Prometheus by first updating the configuration and rules files in the volumes of Prometheus's Deployment or StatefulSet. Then it calls the Prometheus API to sync the new configuration, resulting in the Prometheus Deployment or StatefulSet to be modified in place. - -![How the Prometheus Operator Updates Scrape Configuration]({{}}/img/rancher/update-scrape-config.svg) - -### 5.3. How Kubernetes Component Metrics are Exposed +### How Kubernetes Component Metrics are Exposed Prometheus scrapes metrics from deployments known as [exporters,](https://prometheus.io/docs/instrumenting/exporters/) which export the time series data in a format that Prometheus can ingest. In Prometheus, time series consist of streams of timestamped values belonging to the same metric and the same set of labeled dimensions. -To allow monitoring to be installed on hardened Kubernetes clusters, `rancher-monitoring` application proxies the communication between Prometheus and the exporter through PushProx for some Kubernetes master components. +### Scraping Metrics with PushProx -### 5.4. Scraping Metrics without PushProx +Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called PushProx.
For detailed information on PushProx, refer [here](#how-pushprox-works) and to the above [architecture](#1-architecture-overview) section. -The Kubernetes components that directly expose metrics to Prometheus are the following: +### Scraping Metrics + +The following Kubernetes components are directly scraped by Prometheus: + +- kubelet* +- ingress-nginx** - coreDns/kubeDns - kube-api-server -\* For RKE and RKE2 clusters, ingress-nginx is deployed by default and treated as an internal Kubernetes component. +\* You can optionally use hardenedKubelet.enabled to use a PushProx, but that is not the default. -### 5.5. Scraping Metrics with PushProx +** For RKE and RKE2 clusters, ingress-nginx is deployed by default and treated as an internal Kubernetes component. -The purpose of this architecture is to allow us to scrape internal Kubernetes components without exposing those ports to inbound requests. As a result, Prometheus can scrape metrics across a network boundary. -The Kubernetes components that expose metrics to Prometheus through PushProx are the following: +### Scraping Metrics Based on Kubernetes Distribution -- kube-controller-manager -- kube-scheduler -- etcd -- kube-proxy - -For each PushProx exporter, we deploy one PushProx client onto all target nodes. For example, a PushProx client is deployed onto all controlplane nodes for kube-controller-manager, all etcd nodes for kube-etcd, and all nodes for kubelet. We deploy exactly one PushProx proxy per exporter. - -The process for exporting metrics is as follows: - -1. The PushProx Client establishes an outbound connection with the PushProx Proxy. -2. The client then polls the proxy for scrape requests that have come into the proxy. -3. When the proxy receives a scrape request from Prometheus, the client sees it as a result of the poll. -4. The client scrapes the internal component. -5. The internal component responds by pushing metrics back to the proxy. - -
Process for Exporting Metrics with PushProx
- -![Process for Exporting Metrics with PushProx]({{}}/img/rancher/pushprox-process.svg) - -Metrics are scraped differently based on the Kubernetes distribution. For help with terminology, see Terminology(#terminology). For details, see the table below: +Metrics are scraped differently based on the Kubernetes distribution. For help with terminology, refer [here](#terminology). For details, see the table below:
How Metrics are Exposed to Prometheus
@@ -240,7 +227,7 @@ Metrics are scraped differently based on the Kubernetes distribution. For help w \* For RKE and RKE2 clusters, ingress-nginx is deployed by default and treated as an internal Kubernetes component. -### 5.6. Terminology +### Terminology - **kube-scheduler:** The internal Kubernetes component that uses information in the pod spec to decide on which node to run a pod. - **kube-controller-manager:** The internal Kubernetes component that is responsible for node management (detecting if a node fails), pod replication and endpoint creation. @@ -250,13 +237,3 @@ Metrics are scraped differently based on the Kubernetes distribution. For help w - **ingress-nginx:** An Ingress controller for Kubernetes using NGINX as a reverse proxy and load balancer. - **coreDns/kubeDns:** The internal Kubernetes component responsible for DNS. - **kube-api-server:** The main internal Kubernetes component that is responsible for exposing APIs for the other master components. - -# 6. Monitoring on RKE2 Clusters - -Rancher v2.6 introduced the ability to provision new Kubernetes clusters with [RKE2,](https://docs.rke2.io/) which is Rancher's fully conformant Kubernetes distribution that focuses on security and compliance within the U.S. Federal Government sector. To allow Monitoring V2 to be installed on RKE2 Kubernetes clusters, the `rkeIngressNginx` and `rke2IngressNginx` sub-charts were introduced to scrape metrics from the `ingress-nginx` Deployment/DaemonSet in RKE and RKE2 clusters respectively. - -The PushProx pod needs to run on the same nodes as the `ingress-nginx` pod. - -When the RKE2 cluster's Kubernetes version is <= 1.20, the workload type of `ingress-nginx` is a Deployment. The `pushprox-ingress-nginx-client` is deployed as a Deployment, and the Rancher UI sets the Helm chart value `rke2IngressNginx.deployment.enabled=true`. - -For Kubernetes >= 1.21, the workload type of `ingress-nginx` is a DaemonSet. 
The `pushprox-ingress-nginx-client` is deployed as a DaemonSet, which is the default behavior. \ No newline at end of file From 73557d78e36a28e366f6a10fe36dfa21a3406e7b Mon Sep 17 00:00:00 2001 From: Jennifer Travinski Date: Thu, 6 Jan 2022 22:25:07 +0000 Subject: [PATCH 03/10] Updated per feedback --- .../how-monitoring-works/_index.md | 20 +++++++++---------- .../how-monitoring-works/_index.md | 20 +++++++++---------- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md b/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md index f5b3cc0fe3f..9c9edb6a4ed 100644 --- a/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md +++ b/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md @@ -13,7 +13,7 @@ weight: 1 _**The following steps describe how data flows through the Monitoring V2 application:**_ -**1.** **ServiceMonitors and PodMonitors** declaratively specify targets, such as Services and Pods, that need to be monitored. +**1. ServiceMonitors and PodMonitors** declaratively specify targets, such as Services and Pods, that need to be monitored. - Targets are scraped on a recurring schedule based on the configured Prometheus scrape interval, and the metrics that are scraped are stored into the Prometheus Time Series Database (TSDB). - In order to perform the scrape, ServiceMonitors and PodMonitors are defined with label selectors that determine which Services or Pods should be scraped and endpoints that determine how the scrape should happen on the given target, e.g., scrape/metrics in TCP 10252, proxying through IP addr x.x.x.x. @@ -37,12 +37,12 @@ _**The following steps describe how data flows through the Monitoring V2 applica For more information, see [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics). 
-**2.** **PrometheusRules** allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. +**2. PrometheusRules** allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. - **Recording rules** create a new time series based on existing series that have been collected. They are frequently used to precompute complex queries. - **Alerting rules** run a particular query and fire an alert from Prometheus if the query evaluates to a non-zero value. -**3.** **Prometheus Operator** observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. +**3. Prometheus Operator** observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. **4.** Once Prometheus determines that an alert needs to be fired, alerts are forwarded to **Alertmanager**. @@ -123,7 +123,7 @@ By editing custom YAML in the Alertmanager or Receiver configuration, you can al # 4. Monitoring V2 Specific Components -Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster. +Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster. 
Prometheus Operator will automatically update your Prometheus configuration based on the live state of the resources and configuration options that are edited in the Rancher UI. @@ -137,15 +137,15 @@ The resources that get deployed onto your cluster to support this solution can b Monitoring V2 deploys three default exporters that provide additional metrics for Prometheus to store: -1. `node-exporter`: exposes hardware and OS metrics for Linux hosts. For more information on `node-exporter`, refer to the [upstream documentation.](https://prometheus.io/docs/guides/node-exporter/) +1. `node-exporter`: exposes hardware and OS metrics for Linux hosts. For more information on `node-exporter`, refer to the [upstream documentation](https://prometheus.io/docs/guides/node-exporter/). -1. `windows-exporter`: exposes hardware and OS metrics for Windows hosts (only deployed on Windows clusters). +1. `windows-exporter`: exposes hardware and OS metrics for Windows hosts (only deployed on Windows clusters). For more information on `windows-exporter`, refer to the [upstream documentation](https://github.com/prometheus-community/windows_exporter). -1. [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics): expose additional metrics that track the state of resources contained in the Kubernetes API (e.g. pods, workloads, etc.). +1. `kube-state-metrics`: expose additional metrics that track the state of resources contained in the Kubernetes API (e.g., pods, workloads, etc.). For more information on `kube-state-metrics`, refer to the [upstream documentation](https://github.com/kubernetes/kube-state-metrics/tree/master/docs). ServiceMonitors and PodMonitors will scrape these exporters, as defined [here](#defining-what-metrics-are-scraped). Prometheus stores these metrics, and you can query the results via either Prometheus's UI or Grafana. -See [architecture](#1-architecture-overview) section for more information on recording rules, alerting rules, and Alertmanager. 
+See the [architecture](#1-architecture-overview) section for more information on recording rules, alerting rules, and Alertmanager. ### Components Exposed in the Rancher UI @@ -155,7 +155,7 @@ When the monitoring application is installed, you will be able to edit the follo |--------------|------------------------|---------------------------| | ServiceMonitor | Custom resource | Sets up Kubernetes Services to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | | PodMonitor | Custom resource | Sets up Kubernetes Pods to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | -| Receiver | Configuration block (part of Alertmanager) | Modifies information on where to send an alert (e.g. Slack, PagerDuty, etc.) and any necessary information to send the alert (e.g. TLS certs, proxy URLs, etc.). Automatically updates the Alertmanager custom resource. | +| Receiver | Configuration block (part of Alertmanager) | Modifies information on where to send an alert (e.g., Slack, PagerDuty, etc.) and any necessary information to send the alert (e.g., TLS certs, proxy URLs, etc.). Automatically updates the Alertmanager custom resource. | | Route | Configuration block (part of Alertmanager) | Modifies the routing tree that is used to filter, label, and group alerts based on labels and send them to the appropriate Receiver. Automatically updates the Alertmanager custom resource. | | PrometheusRule | Custom resource | Defines additional queries that need to trigger alerts or define materialized views of existing series that are within Prometheus's TSDB. Automatically updates the Prometheus custom resource. | @@ -202,7 +202,7 @@ The following Kubernetes components are directly scraped by Prometheus: - coreDns/kubeDns - kube-api-server -\* You can optionally use hardenedKubelet.enabled to use a PushProx, but that is not the default. 
+\* You can optionally use `hardenedKubelet.enabled` to use a PushProx, but that is not the default. ** For RKE and RKE2 clusters, ingress-nginx is deployed by default and treated as an internal Kubernetes component. diff --git a/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md b/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md index 86f60a39ef7..61c8a9c2c36 100644 --- a/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md +++ b/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md @@ -13,7 +13,7 @@ weight: 1 _**The following steps describe how data flows through the Monitoring V2 application:**_ -**1.** **ServiceMonitors and PodMonitors** declaratively specify targets, such as Services and Pods, that need to be monitored. +**1. ServiceMonitors and PodMonitors** declaratively specify targets, such as Services and Pods, that need to be monitored. - Targets are scraped on a recurring schedule based on the configured Prometheus scrape interval, and the metrics that are scraped are stored into the Prometheus Time Series Database (TSDB). - In order to perform the scrape, ServiceMonitors and PodMonitors are defined with label selectors that determine which Services or Pods should be scraped and endpoints that determine how the scrape should happen on the given target, e.g., scrape/metrics in TCP 10252, proxying through IP addr x.x.x.x. @@ -38,12 +38,12 @@ _**The following steps describe how data flows through the Monitoring V2 applica For more information, see [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics). -**2.** **PrometheusRules** allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. +**2. PrometheusRules** allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. 
- **Recording rules** create a new time series based on existing series that have been collected. They are frequently used to precompute complex queries. - **Alerting rules** run a particular query and fire an alert from Prometheus if the query evaluates to a non-zero value. -**3.** **Prometheus Operator** observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. +**3. Prometheus Operator** observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. **4.** Once Prometheus determines that an alert needs to be fired, alerts are forwarded to **Alertmanager**. @@ -124,7 +124,7 @@ By editing custom YAML in the Alertmanager or Receiver configuration, you can al # 4. Monitoring V2 Specific Components -Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster. +Prometheus Operator introduces a set of [Custom Resource Definitions](https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions) that allow users to deploy and manage Prometheus and Alertmanager instances by creating and modifying those custom resources on a cluster. Prometheus Operator will automatically update your Prometheus configuration based on the live state of the resources and configuration options that are edited in the Rancher UI. @@ -138,15 +138,15 @@ The resources that get deployed onto your cluster to support this solution can b Monitoring V2 deploys three default exporters that provide additional metrics for Prometheus to store: -1. 
`node-exporter`: exposes hardware and OS metrics for Linux hosts. For more information on `node-exporter`, refer to the [upstream documentation.](https://prometheus.io/docs/guides/node-exporter/) +1. `node-exporter`: exposes hardware and OS metrics for Linux hosts. For more information on `node-exporter`, refer to the [upstream documentation](https://prometheus.io/docs/guides/node-exporter/). -1. `windows-exporter`: exposes hardware and OS metrics for Windows hosts (only deployed on Windows clusters). +1. `windows-exporter`: exposes hardware and OS metrics for Windows hosts (only deployed on Windows clusters). For more information on `windows-exporter`, refer to the [upstream documentation](https://github.com/prometheus-community/windows_exporter). -1. [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics): expose additional metrics that track the state of resources contained in the Kubernetes API (e.g. pods, workloads, etc.). +1. `kube-state-metrics`: exposes additional metrics that track the state of resources contained in the Kubernetes API (e.g., pods, workloads, etc.). For more information on `kube-state-metrics`, refer to the [upstream documentation](https://github.com/kubernetes/kube-state-metrics/tree/master/docs). ServiceMonitors and PodMonitors will scrape these exporters, as defined [here](#defining-what-metrics-are-scraped). Prometheus stores these metrics, and you can query the results via either Prometheus's UI or Grafana. -See [architecture](#1-architecture-overview) section for more information on recording rules, alerting rules, and Alertmanager. +See the [architecture](#1-architecture-overview) section for more information on recording rules, alerting rules, and Alertmanager. 
### Components Exposed in the Rancher UI @@ -156,7 +156,7 @@ When the monitoring application is installed, you will be able to edit the follo |--------------|------------------------|---------------------------| | ServiceMonitor | Custom resource | Sets up Kubernetes Services to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | | PodMonitor | Custom resource | Sets up Kubernetes Pods to scrape custom metrics from. Automatically updates the scrape configuration in the Prometheus custom resource. | -| Receiver | Configuration block (part of Alertmanager) | Modifies information on where to send an alert (e.g. Slack, PagerDuty, etc.) and any necessary information to send the alert (e.g. TLS certs, proxy URLs, etc.). Automatically updates the Alertmanager custom resource. | +| Receiver | Configuration block (part of Alertmanager) | Modifies information on where to send an alert (e.g., Slack, PagerDuty, etc.) and any necessary information to send the alert (e.g., TLS certs, proxy URLs, etc.). Automatically updates the Alertmanager custom resource. | | Route | Configuration block (part of Alertmanager) | Modifies the routing tree that is used to filter, label, and group alerts based on labels and send them to the appropriate Receiver. Automatically updates the Alertmanager custom resource. | | PrometheusRule | Custom resource | Defines additional queries that need to trigger alerts or define materialized views of existing series that are within Prometheus's TSDB. Automatically updates the Prometheus custom resource. | @@ -203,7 +203,7 @@ The following Kubernetes components are directly scraped by Prometheus: - coreDns/kubeDns - kube-api-server -\* You can optionally use hardenedKubelet.enabled to use a PushProx, but that is not the default. +\* You can optionally use `hardenedKubelet.enabled` to use a PushProx, but that is not the default. 
** For RKE and RKE2 clusters, ingress-nginx is deployed by default and treated as an internal Kubernetes component. From dca72c4775d22aaf7be472815f206956d2c06eb3 Mon Sep 17 00:00:00 2001 From: Jennifer Travinski Date: Fri, 7 Jan 2022 22:42:41 +0000 Subject: [PATCH 04/10] Updated sections based on feedback --- .../how-monitoring-works/_index.md | 60 +++++++++++------- .../how-monitoring-works/_index.md | 61 ++++++++++++------- 2 files changed, 78 insertions(+), 43 deletions(-) diff --git a/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md b/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md index 9c9edb6a4ed..30b19cdbc66 100644 --- a/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md +++ b/content/rancher/v2.5/en/monitoring-alerting/how-monitoring-works/_index.md @@ -11,43 +11,57 @@ weight: 1 # 1. Architecture Overview -_**The following steps describe how data flows through the Monitoring V2 application:**_ +_**The following sections describe how data flows through the Monitoring V2 application:**_ -**1. ServiceMonitors and PodMonitors** declaratively specify targets, such as Services and Pods, that need to be monitored. +### Prometheus Operator + +Prometheus Operator observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. As the diagram at the end of this section shows, the Prometheus Operator acts as the intermediary between Prometheus and Kubernetes, calling the Prometheus API to synchronize Prometheus with the monitoring-related resources in Kubernetes. + +### ServiceMonitors and PodMonitors + +ServiceMonitors and PodMonitors declaratively specify targets, such as Services and Pods, that need to be monitored. 
- Targets are scraped on a recurring schedule based on the configured Prometheus scrape interval, and the metrics that are scraped are stored into the Prometheus Time Series Database (TSDB). + - In order to perform the scrape, ServiceMonitors and PodMonitors are defined with label selectors that determine which Services or Pods should be scraped and endpoints that determine how the scrape should happen on the given target, e.g., scrape/metrics in TCP 10252, proxying through IP addr x.x.x.x. -- Out of the box, Monitoring V2 comes with certain pre-configured exporters that are deployed based on the type of Kubernetes cluster that it is deployed on. - - Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called **PushProx**. The Kubernetes components that expose metrics to Prometheus through PushProx are the following: `kube-controller-manager`, `kube-scheduler`, `etcd`, and `kube-proxy`. - - For each PushProx exporter, we deploy one PushProx client onto all target nodes. For example, a PushProx client is deployed onto all controlplane nodes for kube-controller-manager, all etcd nodes for kube-etcd, and all nodes for kubelet. + +- Out of the box, Monitoring V2 comes with certain pre-configured exporters that are deployed based on the type of Kubernetes cluster that it is deployed on. For more information, see [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics). + +### How PushProx Works + +- Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called **PushProx**. The Kubernetes components that expose metrics to Prometheus through PushProx are the following: +`kube-controller-manager`, `kube-scheduler`, `etcd`, and `kube-proxy`. + +- For each PushProx exporter, we deploy one PushProx client onto all target nodes. For example, a PushProx client is deployed onto all controlplane nodes for kube-controller-manager, all etcd nodes for kube-etcd, and all nodes for kubelet. 
- - We deploy exactly one PushProx proxy per exporter. The process for exporting metrics is as follows: +- We deploy exactly one PushProx proxy per exporter. The process for exporting metrics is as follows: - 1. The PushProx Client establishes an outbound connection with the PushProx Proxy. - 1. The client then polls the proxy for scrape requests that have come into the proxy. - 1. When the proxy receives a scrape request from Prometheus, the client sees it as a result of the poll. - 1. The client scrapes the internal component. - 1. The internal component responds by pushing metrics back to the proxy. +1. The PushProx Client establishes an outbound connection with the PushProx Proxy. +1. The client then polls the proxy for scrape requests that have come into the proxy. +1. When the proxy receives a scrape request from Prometheus, the client sees it as a result of the poll. +1. The client scrapes the internal component. +1. The internal component responds by pushing metrics back to the proxy. -
-
Process for Exporting Metrics with PushProx:
+

Process for Exporting Metrics with PushProx:
- ![Process for Exporting Metrics with PushProx]({{}}/img/rancher/pushprox-process.svg) +![Process for Exporting Metrics with PushProx]({{}}/img/rancher/pushprox-process.svg) - For more information, see [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics). +### PrometheusRules -**2. PrometheusRules** allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. +PrometheusRules allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. - **Recording rules** create a new time series based on existing series that have been collected. They are frequently used to precompute complex queries. - **Alerting rules** run a particular query and fire an alert from Prometheus if the query evaluates to a non-zero value. -**3. Prometheus Operator** observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. +### Alert Routing -**4.** Once Prometheus determines that an alert needs to be fired, alerts are forwarded to **Alertmanager**. +Once Prometheus determines that an alert needs to be fired, alerts are forwarded to **Alertmanager**. - Alerts contain labels that come from the PromQL query itself and additional labels and annotations that can be provided as part of specifying the initial PrometheusRule. + - Before receiving any alerts, Alertmanager will use the **routes** and **receivers** specified in its configuration to form a routing tree on which all incoming alerts are evaluated. Each node of the routing tree can specify additional grouping, labeling, and filtering that needs to happen based on the labels attached to the Prometheus alert. 
A node on the routing tree (usually a leaf node) can also specify that an alert that reaches it needs to be sent out to a configured Receiver, e.g., Slack, PagerDuty, SMS, etc. Note that Alertmanager will send an alert first to **alertingDriver**, then alertingDriver will send or forward alert to the proper destination. + - Routes and receivers are also stored in the Kubernetes API via the Alertmanager Secret. When the Secret is updated, Alertmanager is also updated automatically. Note that routing occurs via labels only (not via annotations, etc.).
How data flows through the monitoring application:
@@ -94,6 +108,7 @@ Alerting rules are more commonly used. Whenever an alerting rule evaluates to a The Rule file adds labels and annotations to alerts before firing them, depending on the use case: - Labels indicate information that identifies the alert and could affect the routing of the alert. For example, if when sending an alert about a certain container, the container ID could be used as a label. + - Annotations denote information that doesn't affect where an alert is routed, for example, a runbook or an error message. # 3. How Alertmanager Works @@ -101,8 +116,11 @@ The Rule file adds labels and annotations to alerts before firing them, dependin The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of the following tasks: - Deduplicating, grouping, and routing alerts to the correct receiver integration such as email, PagerDuty, or OpsGenie + - Silencing and inhibition of alerts + - Tracking alerts that fire over time + - Sending out the status of whether an alert is currently firing, or if it is resolved ### Alerts Forwarded by alertingDrivers @@ -159,13 +177,13 @@ When the monitoring application is installed, you will be able to edit the follo | Route | Configuration block (part of Alertmanager) | Modifies the routing tree that is used to filter, label, and group alerts based on labels and send them to the appropriate Receiver. Automatically updates the Alertmanager custom resource. | | PrometheusRule | Custom resource | Defines additional queries that need to trigger alerts or define materialized views of existing series that are within Prometheus's TSDB. Automatically updates the Prometheus custom resource. | -### How PushProx Works +### PushProx PushProx allows Prometheus to scrape metrics across a network boundary, which prevents users from having to expose metrics ports for internal Kubernetes components on each node in a Kubernetes cluster. 
Since the metrics for Kubernetes components are generally exposed on the host network of nodes in the cluster, PushProx deploys a DaemonSet of clients that sit on the hostNetwork of each node and make an outbound connection to a single proxy that is sitting on the Kubernetes API. Prometheus can then be configured to proxy scrape requests through the proxy to each client, which allows it to scrape metrics from the internal Kubernetes components without requiring any inbound node ports to be open. -For more details about how PushProx works, refer to [Scraping Metrics with PushProx](#scraping-metrics-with-pushprox). +Refer to [Scraping Metrics with PushProx](#scraping-metrics-with-pushprox) for more. # 5. Scraping and Exposing Metrics @@ -191,7 +209,7 @@ Prometheus scrapes metrics from deployments known as [exporters,](https://promet ### Scraping Metrics with PushProx -Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called PushProx. For detailed information on PushProx, refer [here](#pushprox) and to the above [architecture](#1-architecture-overview) section. +Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called PushProx. For detailed information on PushProx, refer [here](#how-pushprox-works) and to the above [architecture](#1-architecture-overview) section. ### Scraping Metrics diff --git a/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md b/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md index 61c8a9c2c36..30b19cdbc66 100644 --- a/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md +++ b/content/rancher/v2.6/en/monitoring-alerting/how-monitoring-works/_index.md @@ -11,44 +11,57 @@ weight: 1 # 1. 
Architecture Overview -_**The following steps describe how data flows through the Monitoring V2 application:**_ +_**The following sections describe how data flows through the Monitoring V2 application:**_ -**1. ServiceMonitors and PodMonitors** declaratively specify targets, such as Services and Pods, that need to be monitored. +### Prometheus Operator + +Prometheus Operator observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. As the diagram at the end of this section shows, the Prometheus Operator acts as the intermediary between Prometheus and Kubernetes, calling the Prometheus API to synchronize Prometheus with the monitoring-related resources in Kubernetes. + +### ServiceMonitors and PodMonitors + +ServiceMonitors and PodMonitors declaratively specify targets, such as Services and Pods, that need to be monitored. - Targets are scraped on a recurring schedule based on the configured Prometheus scrape interval, and the metrics that are scraped are stored into the Prometheus Time Series Database (TSDB). + - In order to perform the scrape, ServiceMonitors and PodMonitors are defined with label selectors that determine which Services or Pods should be scraped and endpoints that determine how the scrape should happen on the given target, e.g., scrape/metrics in TCP 10252, proxying through IP addr x.x.x.x. -- Out of the box, Monitoring V2 comes with certain pre-configured exporters that are deployed based on the type of Kubernetes cluster that it is deployed on. - - Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called **PushProx**. The Kubernetes components that expose metrics to Prometheus through PushProx are the following: - `kube-controller-manager`, `kube-scheduler`, `etcd`, and `kube-proxy`. 
- - For each PushProx exporter, we deploy one PushProx client onto all target nodes. For example, a PushProx client is deployed onto all controlplane nodes for kube-controller-manager, all etcd nodes for kube-etcd, and all nodes for kubelet. + +- Out of the box, Monitoring V2 comes with certain pre-configured exporters that are deployed based on the type of Kubernetes cluster that it is deployed on. For more information, see [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics). + +### How PushProx Works + +- Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called **PushProx**. The Kubernetes components that expose metrics to Prometheus through PushProx are the following: +`kube-controller-manager`, `kube-scheduler`, `etcd`, and `kube-proxy`. + +- For each PushProx exporter, we deploy one PushProx client onto all target nodes. For example, a PushProx client is deployed onto all controlplane nodes for kube-controller-manager, all etcd nodes for kube-etcd, and all nodes for kubelet. - - We deploy exactly one PushProx proxy per exporter. The process for exporting metrics is as follows: +- We deploy exactly one PushProx proxy per exporter. The process for exporting metrics is as follows: - 1. The PushProx Client establishes an outbound connection with the PushProx Proxy. - 1. The client then polls the proxy for scrape requests that have come into the proxy. - 1. When the proxy receives a scrape request from Prometheus, the client sees it as a result of the poll. - 1. The client scrapes the internal component. - 1. The internal component responds by pushing metrics back to the proxy. +1. The PushProx Client establishes an outbound connection with the PushProx Proxy. +1. The client then polls the proxy for scrape requests that have come into the proxy. +1. When the proxy receives a scrape request from Prometheus, the client sees it as a result of the poll. +1. The client scrapes the internal component. +1. 
The internal component responds by pushing metrics back to the proxy. -
-
Process for Exporting Metrics with PushProx:
+

Process for Exporting Metrics with PushProx:
- ![Process for Exporting Metrics with PushProx]({{}}/img/rancher/pushprox-process.svg) +![Process for Exporting Metrics with PushProx]({{}}/img/rancher/pushprox-process.svg) - For more information, see [Scraping and Exposing Metrics](#5-scraping-and-exposing-metrics). +### PrometheusRules -**2. PrometheusRules** allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. +PrometheusRules allow users to define rules for what metrics or time series database queries should result in alerts being fired. Rules are evaluated on an interval. - **Recording rules** create a new time series based on existing series that have been collected. They are frequently used to precompute complex queries. - **Alerting rules** run a particular query and fire an alert from Prometheus if the query evaluates to a non-zero value. -**3. Prometheus Operator** observes ServiceMonitors, PodMonitors, and PrometheusRules being created. When the Prometheus configuration resources are created, Prometheus Operator calls the Prometheus API to sync the new configuration. +### Alert Routing -**4.** Once Prometheus determines that an alert needs to be fired, alerts are forwarded to **Alertmanager**. +Once Prometheus determines that an alert needs to be fired, alerts are forwarded to **Alertmanager**. - Alerts contain labels that come from the PromQL query itself and additional labels and annotations that can be provided as part of specifying the initial PrometheusRule. + - Before receiving any alerts, Alertmanager will use the **routes** and **receivers** specified in its configuration to form a routing tree on which all incoming alerts are evaluated. Each node of the routing tree can specify additional grouping, labeling, and filtering that needs to happen based on the labels attached to the Prometheus alert. 
A node on the routing tree (usually a leaf node) can also specify that an alert that reaches it needs to be sent out to a configured Receiver, e.g., Slack, PagerDuty, SMS, etc. Note that Alertmanager will send an alert first to **alertingDriver**, then alertingDriver will send or forward alert to the proper destination. + - Routes and receivers are also stored in the Kubernetes API via the Alertmanager Secret. When the Secret is updated, Alertmanager is also updated automatically. Note that routing occurs via labels only (not via annotations, etc.).
How data flows through the monitoring application:
@@ -95,6 +108,7 @@ Alerting rules are more commonly used. Whenever an alerting rule evaluates to a The Rule file adds labels and annotations to alerts before firing them, depending on the use case: - Labels indicate information that identifies the alert and could affect the routing of the alert. For example, if when sending an alert about a certain container, the container ID could be used as a label. + - Annotations denote information that doesn't affect where an alert is routed, for example, a runbook or an error message. # 3. How Alertmanager Works @@ -102,8 +116,11 @@ The Rule file adds labels and annotations to alerts before firing them, dependin The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of the following tasks: - Deduplicating, grouping, and routing alerts to the correct receiver integration such as email, PagerDuty, or OpsGenie + - Silencing and inhibition of alerts + - Tracking alerts that fire over time + - Sending out the status of whether an alert is currently firing, or if it is resolved ### Alerts Forwarded by alertingDrivers @@ -160,13 +177,13 @@ When the monitoring application is installed, you will be able to edit the follo | Route | Configuration block (part of Alertmanager) | Modifies the routing tree that is used to filter, label, and group alerts based on labels and send them to the appropriate Receiver. Automatically updates the Alertmanager custom resource. | | PrometheusRule | Custom resource | Defines additional queries that need to trigger alerts or define materialized views of existing series that are within Prometheus's TSDB. Automatically updates the Prometheus custom resource. | -### How PushProx Works +### PushProx PushProx allows Prometheus to scrape metrics across a network boundary, which prevents users from having to expose metrics ports for internal Kubernetes components on each node in a Kubernetes cluster. 
Since the metrics for Kubernetes components are generally exposed on the host network of nodes in the cluster, PushProx deploys a DaemonSet of clients that sit on the hostNetwork of each node and make an outbound connection to a single proxy that is sitting on the Kubernetes API. Prometheus can then be configured to proxy scrape requests through the proxy to each client, which allows it to scrape metrics from the internal Kubernetes components without requiring any inbound node ports to be open. -For more details about how PushProx works, refer to [Scraping Metrics with PushProx](#scraping-metrics-with-pushprox). +Refer to [Scraping Metrics with PushProx](#scraping-metrics-with-pushprox) for more. # 5. Scraping and Exposing Metrics @@ -192,7 +209,7 @@ Prometheus scrapes metrics from deployments known as [exporters,](https://promet ### Scraping Metrics with PushProx -Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called PushProx. For detailed information on PushProx, refer [here](#pushprox) and to the above [architecture](#1-architecture-overview) section. +Certain internal Kubernetes components are scraped via a proxy deployed as part of Monitoring V2 called PushProx. For detailed information on PushProx, refer [here](#how-pushprox-works) and to the above [architecture](#1-architecture-overview) section. ### Scraping Metrics From c273c5b47df4cf57b757a722964be894c558fe8f Mon Sep 17 00:00:00 2001 From: Jamie Phillips Date: Mon, 11 Apr 2022 16:34:36 -0400 Subject: [PATCH 05/10] Creating docs for creating VMware vSphere VM templates. 
--- .gitignore | 2 + .../rke-clusters/node-pools/vsphere/_index.md | 3 +- .../vsphere/creating-a-vm-template/_index.md | 141 ++++++++++++++++++ 3 files changed, 145 insertions(+), 1 deletion(-) create mode 100644 content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md diff --git a/.gitignore b/.gitignore index daefacf85e2..e50d424efcf 100644 --- a/.gitignore +++ b/.gitignore @@ -11,3 +11,5 @@ package-lock.json /scripts/converters/results_to_markdown/.terraform /scripts/converters/results_to_markdown/terraform.tfstate* /scripts/converters/results_to_markdown/*.tfvars + +.idea/ diff --git a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/_index.md b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/_index.md index 13c897c89b3..e11d83949fb 100644 --- a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/_index.md +++ b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/_index.md @@ -36,6 +36,7 @@ For the fields to be populated, your setup needs to fulfill the [prerequisites.] ### More Supported Operating Systems You can provision VMs with any operating system that supports `cloud-init`. Only YAML format is supported for the [cloud config.](https://cloudinit.readthedocs.io/en/latest/topics/examples.html) + ### Video Walkthrough of v2.3.3 Node Template Features In this YouTube video, we demonstrate how to set up a node template with the new features designed to help you bring cloud operations to on-premises clusters. @@ -54,4 +55,4 @@ For an example of how to provision storage in vSphere using Rancher, refer to [t When a cloud provider is set up in Rancher, the Rancher server can automatically provision new infrastructure for the cluster, including new nodes or persistent storage devices. 
-For details, refer to the section on [enabling the vSphere cloud provider.]({{}}/rancher/v2.6/en/cluster-provisioning/rke-clusters/cloud-providers/vsphere) \ No newline at end of file +For details, refer to the section on [enabling the vSphere cloud provider.]({{}}/rancher/v2.6/en/cluster-provisioning/rke-clusters/cloud-providers/vsphere) diff --git a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md new file mode 100644 index 00000000000..ef49277a333 --- /dev/null +++ b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md @@ -0,0 +1,141 @@ +--- +title: Creating a vSphere Virtual Machine Template +weight: 2 +--- + +Creating virtual machines in a repeatable and reliable fashion can often be difficult. VMware vSphere offers the ability to build one VM that can then be converted to a template. The template can then be used to create identically configured VMs. Rancher leverages this capability within node pools to create identical RKE1 and RKE2 nodes. With that said, Rancher requires specific software to be pre-installed on the VM before the template can be used to create new VMs. After configuring the VM with the requirements, the VM will need to be prepared before creating the template. Once preparation is complete, the VM can be converted to a template and moved into a content library, ready for Rancher node pool usage. + +- [Requirements](#requirements) +- [Template Creation](#template-creation) +- [Preparation](#preparation) +- [Converting to a Template](#converting-to-a-template) +- [Moving to a content library](#moving-to-a-content-library) +- [Other Resources](#other-resources) + +# Requirements + +Specific tooling is required for both Linux and Windows VMs to be usable by the vSphere node driver.
The most critical dependency is [cloud-init](https://cloud-init.io/) for Linux and [cloudbase-init](https://cloudbase.it/cloudbase-init/) for Windows. Both of these are used for provisioning the VMs by configuring the hostname, settings up the SSH access, and default Rancher user. Users can add additional content to these as desired if additional configuration is desired. Outside of that some additional requirements will be listed below. If you have any specific firewall rules or configuration, this should be added to the VM before creating a template. + +## Linux Dependencies + +Here is the list of packages that need installed on the template. These will have slightly varied names based on distribution with some distributions shipping these by default. + +* curl +* wget +* git +* net-tools +* unzip +* apparmor-parser +* ca-certificates +* cloud-init +* cloud-guest-utils +* cloud-image-utils +* growpart +* cloud-initramfs-growroot +* open-iscsi +* openssh-server +* [open-vm-tools](https://docs.vmware.com/en/VMware-Tools/11.3.0/com.vmware.vsphere.vmwaretools.doc/GUID-8B6EA5B7-453B-48AA-92E5-DB7F061341D1.html) + +## Windows Dependencies + +Here is the list of packages that need installed on the template. + +* Windows Container Feature +* [cloudbase-init](https://cloudbase.it/cloudbase-init/#download) +* [Docker EE](https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/set-up-environment?tabs=Windows-Server#install-docker) - RKE1 Only + +**Now here is where the configuration for Windows templates need to differ between RKE1 and RKE2. RKE1 leverages Docker, so any templates for RKE1 will need to have Docker EE preinstalled too. RKE2 doesn't require Docker EE and doesn't require that it be installed.** + +# Template Creation + +There a few different approaches that can be pursued at this step. 
You can manually create your VM by following [these instructions](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-AE8AFBF1-75D1-4172-988C-378C35C9FAF2.html) from VMware. Once you have a VM running, you can manually install the dependency listed above to configure the VM correctly for the vSphere node driver. After the required dependencies are configured, you can further customize based on your specific environment and requirements. Finally, you are ready to precede with the final preparation before creating your template. + +## Alternatives to manual creation + +Alternatives to manual creation do exist and below is a list of tools that can assist. + +* [VMware PowerCLI](https://developer.vmware.com/powercli) +* [Packer](https://www.packer.io/) +* [SaltStack](https://saltproject.io/) +* [Ansible](https://www.ansible.com/) + +Packer is often used and here is a good [reference](https://github.com/vmware-samples/packer-examples-for-vsphere) for usage with vSphere. + +# Preparation + +Once you have a VM created with all the dependencies listed above and any additional items that are required, the most critical step is next. That step is preparing the VM to be turned into a template. This basically resets the VM hostname, IPs, etc. to prevent that information from being brought into a new VM. When VMs are created from a template without this step, those VMs could have the same hostname, IP address, etc. The steps differ between Linux and Windows. + +## Linux Preparation + +Here is how to achieve the different items that need reset. + +```Bash +# Cleaning logs. +if [ -f /var/log/audit/audit.log ]; then + cat /dev/null > /var/log/audit/audit.log +fi +if [ -f /var/log/wtmp ]; then + cat /dev/null > /var/log/wtmp +fi +if [ -f /var/log/lastlog ]; then + cat /dev/null > /var/log/lastlog +fi + +# Cleaning udev rules. 
+if [ -f /etc/udev/rules.d/70-persistent-net.rules ]; then + rm /etc/udev/rules.d/70-persistent-net.rules +fi + +# Cleaning the /tmp directories +rm -rf /tmp/* +rm -rf /var/tmp/* + +# Cleaning the SSH host keys +rm -f /etc/ssh/ssh_host_* + +# Cleaning the machine-id +truncate -s 0 /etc/machine-id +rm /var/lib/dbus/machine-id +ln -s /etc/machine-id /var/lib/dbus/machine-id + +# Cleaning the shell history +unset HISTFILE +history -cw +echo > ~/.bash_history +rm -fr /root/.bash_history + +# Truncating hostname, hosts, resolv.conf and setting hostname to localhost +truncate -s 0 /etc/{hostname,hosts,resolv.conf} +hostnamectl set-hostname localhost + +# Clean cloud-init +cloud-init clean -s -l +``` + +## Windows Preparation + +Windows has a utility called [sysprep](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/sysprep--generalize--a-windows-installation) that is used to generalize an image and reset the same items listed above for Linux. The command would look like this. + +```PowerShell +sysprep.exe /generalize /shutdown /oobe +``` + +# Converting to a Template + +To convert a VM to a template the first step is to shut down and stop the VM. Once it has been stopped, right-click on the VM in the inventory list and select Template. Then click on `Convert to Template`. Once that process has finished, there is now a template that can be used. + +* [VMware guide](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-5B3737CC-28DB-4334-BD18-6E12011CDC9F.html) + +# Moving to a content library + +Rancher has the ability to consume templates provided by a content library. Content libraries store and manage content within vSphere. Content libraries offer the ability to publish and share that content. 
+ +* [Create a content library](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-2A0F1C13-7336-45CE-B211-610D39A6E1F4.html) +* [Clone the template to the content library](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-AC1545F0-F8BA-4CD2-96EB-21B3DFAA1DC1.html) + +# Other Resources + +Here is a list of additional resources that may be useful. + +* [Tutorial for creating a Linux template](https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/manage/hybrid/server/best-practices/vmware-ubuntu-template) +* [Tutorial for creating a Windows template](https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/manage/hybrid/server/best-practices/vmware-windows-template) From f8ac666f73fdce9e3f05ce4167dadc8c0400feeb Mon Sep 17 00:00:00 2001 From: Jamie Phillips Date: Fri, 22 Apr 2022 08:45:02 -0400 Subject: [PATCH 06/10] Implementing requested changes. --- .../vsphere/creating-a-vm-template/_index.md | 24 ++++++++++++------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md index ef49277a333..00fdaea7956 100644 --- a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md +++ b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md @@ -1,9 +1,11 @@ --- title: Creating a vSphere Virtual Machine Template -weight: 2 +weight: 4 --- -Creating virtual machines in a repeatable and reliable fashion can often be difficult. VMware vSphere offers the ability to build one VM that can then be converted to a template. The template can then be used to create identically configured VMs. 
Rancher leverages this capability within node pools to create identical RKE1 and RKE2 nodes. With that said, Rancher does have some specific requirements for the VM to have pre-installed to leverage the template to create new VMs. After configuring the VM with the requirements, the VM will need to be prepared before creating the template. Once preparation is complete, the VM can be converted to a template and moved into a content library, ready for Rancher node pool usage. +Creating virtual machines in a repeatable and reliable fashion can often be difficult. VMware vSphere offers the ability to build one VM that can then be converted to a template. The template can then be used to create identically configured VMs. Rancher leverages this capability within node pools to create identical RKE1 and RKE2 nodes. + +In order to leverage the template to create new VMs, Rancher has some [specific requirements](#requirements) that the VM must have pre-installed. After you configure the VM with these requirements, you will next need to [prepare the VM](#preparing-your-vm) before [creating the template](#creating-a-template). Finally, once preparation is complete, the VM can be [converted to a template](#converting-to-a-template) and [moved into a content library](#moving-to-a-content-library), ready for Rancher node pool usage. - [Requirements](#requirements) - [Template Creation](#template-creation) @@ -14,11 +16,13 @@ Creating virtual machines in a repeatable and reliable fashion can often be diff # Requirements -There are specific tooling required for both Linux and Windows VMs to be usable by the vSphere node driver. The most critical dependency is [cloud-init](https://cloud-init.io/) for Linux and [cloudbase-init](https://cloudbase.it/cloudbase-init/) for Windows. Both of these are used for provisioning the VMs by configuring the hostname, settings up the SSH access, and default Rancher user. 
Users can add additional content to these as desired if additional configuration is desired. Outside of that some additional requirements will be listed below. If you have any specific firewall rules or configuration, this should be added to the VM before creating a template. +There is specific tooling required for both Linux and Windows VMs to be usable by the vSphere node driver. The most critical dependency is [cloud-init](https://cloud-init.io/) for Linux and [cloudbase-init](https://cloudbase.it/cloudbase-init/) for Windows. Both of these are used for provisioning the VMs by configuring the hostname and by setting up the SSH access and the default Rancher user. Users can add additional content to these as desired if other configuration is needed. In addition, other requirements are listed below for reference. + +**Note:** If you have any specific firewall rules or configuration, you will need to add this to the VM before creating a template. ## Linux Dependencies -Here is the list of packages that need installed on the template. These will have slightly varied names based on distribution with some distributions shipping these by default. +The packages that need to be installed on the template are listed below. These will have slightly different names based on distribution; some distributions ship these by default, for example. * curl * wget @@ -38,7 +42,7 @@ Here is the list of packages that need installed on the template. These will hav ## Windows Dependencies -Here is the list of packages that need installed on the template. +The list of packages that need to be installed on the template is as follows: * Windows Container Feature * [cloudbase-init](https://cloudbase.it/cloudbase-init/#download) @@ -124,18 +128,20 @@ sysprep.exe /generalize /shutdown /oobe To convert a VM to a template the first step is to shut down and stop the VM. Once it has been stopped, right-click on the VM in the inventory list and select Template. Then click on `Convert to Template`. 
Once that process has finished, there is now a template that can be used. -* [VMware guide](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-5B3737CC-28DB-4334-BD18-6E12011CDC9F.html) +For additional information on converting a VM to a template, see the [VMware guide](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-5B3737CC-28DB-4334-BD18-6E12011CDC9F.html). -# Moving to a content library +# Moving to a Content library -Rancher has the ability to consume templates provided by a content library. Content libraries store and manage content within vSphere. Content libraries offer the ability to publish and share that content. +Rancher has the ability to use templates provided by a content library. Content libraries store and manage content within vSphere, and they also offer the ability to publish and share that content. + +Below are some helpful links on content libraries: * [Create a content library](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-2A0F1C13-7336-45CE-B211-610D39A6E1F4.html) * [Clone the template to the content library](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-AC1545F0-F8BA-4CD2-96EB-21B3DFAA1DC1.html) # Other Resources -Here is a list of additional resources that may be useful. +Here is a list of additional resources that may be useful: * [Tutorial for creating a Linux template](https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/manage/hybrid/server/best-practices/vmware-ubuntu-template) * [Tutorial for creating a Windows template](https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/manage/hybrid/server/best-practices/vmware-windows-template) From 46c149930d71d0cd279516b828e02959f1912b95 Mon Sep 17 00:00:00 2001 From: Jamie Phillips Date: Mon, 25 Apr 2022 10:36:55 -0400 Subject: [PATCH 07/10] Updating vSphere permissions to add tagging and custom attributes. 
This came up from this issue: https://github.com/rancher/rancher/issues/37440. It seems we have some capabilities in the UI that require these additional permissions. Our UI doesn't indicate they aren't configured, it just greys out the fields. --- .../node-pools/vsphere/creating-credentials/_index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-credentials/_index.md b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-credentials/_index.md index 6f83c84c66c..55ac548274e 100644 --- a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-credentials/_index.md +++ b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-credentials/_index.md @@ -10,9 +10,11 @@ The following table lists the permissions required for the vSphere user account:
 | Privilege Group | Operations |
 |:----------------------|:-----------------------------------------------------------------------|
 | Datastore | AllocateSpace <br/> Browse <br/> FileManagement (Low level file operations) <br/> UpdateVirtualMachineFiles <br/> UpdateVirtualMachineMetadata |
+| Global | Set custom attribute |
 | Network | Assign |
 | Resource | AssignVMToPool |
 | Virtual Machine | Config (All) <br/> GuestOperations (All) <br/> Interact (All) <br/> Inventory (All) <br/> Provisioning (All) |
+| vSphere Tagging | Assign or Unassign vSphere Tag <br/> Assign or Unassign vSphere Tag on Object |
The following steps create a role with the required privileges and then assign it to a new user in the vSphere console: From 1f25844a8a17ddeeba3b68330b5a5c199566fbcb Mon Sep 17 00:00:00 2001 From: Manuel Buil Date: Fri, 22 Apr 2022 14:21:04 +0200 Subject: [PATCH 08/10] Update ipv6-only docs Signed-off-by: Manuel Buil --- .../k3s/latest/en/installation/network-options/_index.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/content/k3s/latest/en/installation/network-options/_index.md b/content/k3s/latest/en/installation/network-options/_index.md index c667a7cef73..652f22f6adc 100644 --- a/content/k3s/latest/en/installation/network-options/_index.md +++ b/content/k3s/latest/en/installation/network-options/_index.md @@ -89,10 +89,8 @@ If you are using a custom cni plugin, i.e. a cni plugin different from flannel, ### IPv6 only installation -IPv6 only setup is supported on k3s v1.22 or above. As in dual-stack operation, IPv6 node addresses cannot be auto-detected; all nodes must have an explicitly configured IPv6 `node-ip`. This is an example of a valid configuration: +IPv6 only setup is supported on k3s v1.22 or above. Note that network policy enforcement is not supported on IPv6-only clusters when using the default flannel CNI. This is an example of a valid configuration: ``` -k3s server --node-ip 2a05:d012:c6f:4611:5c2:5602:eed2:898c --cluster-cidr 2001:cafe:42:0::/56 --service-cidr 2001:cafe:42:1::/112 +k3s server --disable-network-policy ``` - -Note that you can specify only one IPv6 `cluster-cidr` value. From d04785ee3067f41b8176c1737186b53ed75b66a0 Mon Sep 17 00:00:00 2001 From: Jamie Phillips Date: Tue, 26 Apr 2022 14:10:23 -0400 Subject: [PATCH 09/10] Additional changes.
--- .../vsphere/creating-a-vm-template/_index.md | 39 ++++++++++++------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md index 00fdaea7956..d3da38f1d2d 100644 --- a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md +++ b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md @@ -8,8 +8,8 @@ Creating virtual machines in a repeatable and reliable fashion can often be diff In order to leverage the template to create new VMs, Rancher has some [specific requirements](#requirements) that the VM must have pre-installed. After you configure the VM with these requirements, you will next need to [prepare the VM](#preparing-your-vm) before [creating the template](#creating-a-template). Finally, once preparation is complete, the VM can be [converted to a template](#converting-to-a-template) and [moved into a content library](#moving-to-a-content-library), ready for Rancher node pool usage. 
- [Requirements](#requirements) -- [Template Creation](#template-creation) -- [Preparation](#preparation) +- [Creating a Template](#creating-a-template) +- [Preparing Your VM](#preparing-your-vm) - [Converting to a Template](#converting-to-a-template) - [Moving to a content library](#moving-to-a-content-library) - [Other Resources](#other-resources) @@ -48,30 +48,39 @@ The list of packages that need to be installed on the template is as follows: * [cloudbase-init](https://cloudbase.it/cloudbase-init/#download) * [Docker EE](https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/set-up-environment?tabs=Windows-Server#install-docker) - RKE1 Only -**Now here is where the configuration for Windows templates need to differ between RKE1 and RKE2. RKE1 leverages Docker, so any templates for RKE1 will need to have Docker EE preinstalled too. RKE2 doesn't require Docker EE and doesn't require that it be installed.** +**Important to note: The configuration for Windows templates varies between RKE1 and RKE2:** +- RKE1 leverages Docker, so any RKE1 templates need to have Docker EE pre-installed as well +- RKE2 does not require Docker EE, and thus it does not need to be installed -# Template Creation +# Creating a Template -There a few different approaches that can be pursued at this step. You can manually create your VM by following [these instructions](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-AE8AFBF1-75D1-4172-988C-378C35C9FAF2.html) from VMware. Once you have a VM running, you can manually install the dependency listed above to configure the VM correctly for the vSphere node driver. After the required dependencies are configured, you can further customize based on your specific environment and requirements. Finally, you are ready to precede with the final preparation before creating your template. 
+You may either manually create your VM or you can utilize [other alternatives](#alternatives-to-manual-creation) to create your VM. -## Alternatives to manual creation +## Manual Creation +1. Manually create your VM by following [these instructions](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-AE8AFBF1-75D1-4172-988C-378C35C9FAF2.html) from VMware. Once you have a VM running, you can manually install the dependency listed above to configure the VM correctly for the vSphere node driver. +2. Customize as needed based on your specific environment and requirements. +3. Proceed with the final preparation before creating your template. -Alternatives to manual creation do exist and below is a list of tools that can assist. +## Alternatives to Manual Creation + +Other alternative options to create VMs are listed below: * [VMware PowerCLI](https://developer.vmware.com/powercli) * [Packer](https://www.packer.io/) * [SaltStack](https://saltproject.io/) * [Ansible](https://www.ansible.com/) -Packer is often used and here is a good [reference](https://github.com/vmware-samples/packer-examples-for-vsphere) for usage with vSphere. +Packer is a frequently-used alternative. Refer to this [reference](https://github.com/vmware-samples/packer-examples-for-vsphere) for examples of its usage with vSphere. -# Preparation +# Preparing Your VM -Once you have a VM created with all the dependencies listed above and any additional items that are required, the most critical step is next. That step is preparing the VM to be turned into a template. This basically resets the VM hostname, IPs, etc. to prevent that information from being brought into a new VM. When VMs are created from a template without this step, those VMs could have the same hostname, IP address, etc. The steps differ between Linux and Windows. 
+After creating a VM with all the required dependencies (and any additional required items), you must perform the most critical step next: preparing the VM to be turned into a template. This preparation will reset critical data such as the VM hostname, IPs, etc., to prevent that information from being brought into a new VM. If you fail to perform this step, you could create a VM with the same hostname, IP address, etc. + +Note that these preparatory steps differ between Linux and Windows. ## Linux Preparation -Here is how to achieve the different items that need reset. +The commands below will reset your VM in Linux: ```Bash # Cleaning logs. @@ -118,7 +127,7 @@ cloud-init clean -s -l ## Windows Preparation -Windows has a utility called [sysprep](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/sysprep--generalize--a-windows-installation) that is used to generalize an image and reset the same items listed above for Linux. The command would look like this. +Windows has a utility called [sysprep](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/sysprep--generalize--a-windows-installation) that is used to generalize an image and reset the same items listed above for Linux. The command is as follows: ```PowerShell sysprep.exe /generalize /shutdown /oobe @@ -126,7 +135,11 @@ sysprep.exe /generalize /shutdown /oobe # Converting to a Template -To convert a VM to a template the first step is to shut down and stop the VM. Once it has been stopped, right-click on the VM in the inventory list and select Template. Then click on `Convert to Template`. Once that process has finished, there is now a template that can be used. +1. Shut down and stop the VM. +2. Right-click on the VM in the inventory list and select **Template**. +3. Click on **Convert to Template**. + +**Result:** Once the process has completed, a template will be available for use. 
For additional information on converting a VM to a template, see the [VMware guide](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-5B3737CC-28DB-4334-BD18-6E12011CDC9F.html). From e7013de736c330cd3d004e6a152f55c9d8dfeb11 Mon Sep 17 00:00:00 2001 From: Jamie Phillips Date: Tue, 26 Apr 2022 16:24:24 -0400 Subject: [PATCH 10/10] Correcting one last typo. --- .../node-pools/vsphere/creating-a-vm-template/_index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md index d3da38f1d2d..1ed401c2ebd 100644 --- a/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md +++ b/content/rancher/v2.6/en/cluster-provisioning/rke-clusters/node-pools/vsphere/creating-a-vm-template/_index.md @@ -57,7 +57,7 @@ The list of packages that need to be installed on the template is as follows: You may either manually create your VM or you can utilize [other alternatives](#alternatives-to-manual-creation) to create your VM. ## Manual Creation -1. Manually create your VM by following [these instructions](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-AE8AFBF1-75D1-4172-988C-378C35C9FAF2.html) from VMware. Once you have a VM running, you can manually install the dependency listed above to configure the VM correctly for the vSphere node driver. +1. Manually create your VM by following [these instructions](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-AE8AFBF1-75D1-4172-988C-378C35C9FAF2.html) from VMware. Once you have a VM running, you can manually install the dependencies listed above to configure the VM correctly for the vSphere node driver. 2. 
Customize as needed based on your specific environment and requirements. 3. Proceed with the final preparation before creating your template.