Fix cluster admin links

Catherine Luse
2020-10-06 15:15:04 -07:00
parent 105c157c4e
commit 628043a8cd
31 changed files with 82 additions and 1033 deletions
@@ -1,6 +1,8 @@
---
title: API Tokens
weight: 1
aliases:
- /rancher/v2.x/en/cluster-admin/api/api-tokens/
---
By default, some cluster-level API tokens are generated with infinite time-to-live (`ttl=0`). In other words, API tokens with `ttl=0` never expire unless you invalidate them. Tokens are not invalidated by changing a password.
@@ -11,10 +11,7 @@ Use the navigation bar on the left to find the current best practices for managi
For more guidance on best practices, you can consult these resources:
- [Rancher Docs]({{<baseurl>}})
- [Monitoring]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/)
- [Backups and Disaster Recovery]({{<baseurl>}}/rancher/v2.x/en/backups/)
- [Security]({{<baseurl>}}/rancher/v2.x/en/security/)
- [Security]({{<baseurl>}}/rancher/v2.x/en/security/)
- [Rancher Blog](https://rancher.com/blog/)
- [Articles about best practices on the Rancher blog](https://rancher.com/tags/best-practices/)
- [101 More Security Best Practices for Kubernetes](https://rancher.com/blog/2019/2019-01-17-101-more-kubernetes-security-best-practices/)
@@ -34,5 +34,5 @@ However, metrics-driven capacity planning analysis should be the ultimate guidan
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with Prometheus, a leading open-source monitoring solution, and Grafana, which lets you visualize the metrics from Prometheus.
After you [enable monitoring]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/) in the cluster, you can set up [a notification channel]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers/) and [cluster alerts]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/alerts/) to let you know if your cluster is approaching its capacity. You can also use the Prometheus and Grafana monitoring framework to establish a baseline for key metrics as you scale.
After you [enable monitoring]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) in the cluster, you can set up [a notification channel]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers/) and [cluster alerts]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/alerts/) to let you know if your cluster is approaching its capacity. You can also use the Prometheus and Grafana monitoring framework to establish a baseline for key metrics as you scale.
@@ -78,7 +78,7 @@ Provision 3 or 5 etcd nodes. Etcd requires a quorum to determine a leader by the
Provision two or more control plane nodes. Some control plane components, such as the `kube-apiserver`, run in [active-active](https://www.jscape.com/blog/active-active-vs-active-passive-high-availability-cluster) mode and will give you more scalability. Other components such as kube-scheduler and kube-controller run in active-passive mode (leader elect) and give you more fault tolerance.
### Monitor Your Cluster
Closely monitor and scale your nodes as needed. You should [enable cluster monitoring]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/) and use the Prometheus metrics and Grafana visualization options as a starting point.
Closely monitor and scale your nodes as needed. You should [enable cluster monitoring]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) and use the Prometheus metrics and Grafana visualization options as a starting point.
# Tips for Security
@@ -4,6 +4,8 @@ description: The Rancher CLI is a unified tool that you can use to interact with
metaTitle: "Using the Rancher Command Line Interface "
metaDescription: "The Rancher CLI is a unified tool that you can use to interact with Rancher. With it, you can operate Rancher using a command line interface rather than the GUI"
weight: 21
aliases:
- /rancher/v2.x/en/cluster-admin/cluster-access/cli
---
The Rancher CLI (Command Line Interface) is a unified tool that you can use to interact with Rancher. With this tool, you can operate Rancher using a command line rather than the GUI.
@@ -15,7 +15,7 @@ After you download the kubeconfig file, you will be able to use the kubeconfig f
_Available as of v2.4.6_
If admins have [enforced TTL on kubeconfig tokens](../../api/api-tokens/#setting-ttl-on-kubeconfig-tokens), the kubeconfig file requires [rancher cli](../cli) to be present in your PATH.
If admins have [enforced TTL on kubeconfig tokens]({{<baseurl>}}/rancher/v2.x/en/api/api-tokens/#setting-ttl-on-kubeconfig-tokens), the kubeconfig file requires [rancher cli](../cli) to be present in your PATH.
### Two Authentication Methods for RKE Clusters
@@ -17,15 +17,8 @@ Rancher contains a variety of tools that aren't included in Kubernetes to assist
<!-- /TOC -->
## Notifiers and Alerts
Notifiers and alerts are two features that work together to inform you of events in the Rancher system.
[Notifiers]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers) are services that inform you of alert events. You can configure notifiers to send alert notifications to staff best suited to take corrective action. Notifications can be sent with Slack, email, PagerDuty, WeChat, and webhooks.
[Alerts]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/alerts) are rules that trigger those notifications. Before you can receive alerts, you must configure one or more notifiers in Rancher. The scope for alerts can be set at either the cluster or project level.
## Logging
# Logging
Logging is helpful because it allows you to:
@@ -37,18 +30,24 @@ Logging is helpful because it allows you to:
Rancher can integrate with Elasticsearch, Splunk, Kafka, syslog, and Fluentd.
For details, refer to the [logging section.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/logging)
For details, refer to the [logging section.]({{<baseurl>}}/rancher/v2.x/en/logging)
## Monitoring
# Monitoring
_Available as of v2.2.0_
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution. For details, refer to the [monitoring section.]({{<baseurl>}}/rancher/v2.x/en/monitoring)
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution. For details, refer to the [monitoring section.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring)
### Notifiers and Alerts
## Istio
After monitoring is enabled, you can set up alerts and notifiers that provide the mechanism to receive them.
[Istio](https://istio.io/) is an open-source tool that makes it easier for DevOps teams to observe, control, troubleshoot, and secure the traffic within a complex network of microservices. For details on how to enable Istio in Rancher, refer to the [Istio section.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/istio)
Notifiers are services that inform you of alert events. You can configure notifiers to send alert notifications to staff best suited to take corrective action. Notifications can be sent with Slack, email, PagerDuty, WeChat, and webhooks.
Alerts are rules that trigger those notifications. Before you can receive alerts, you must configure one or more notifiers in Rancher. The scope for alerts can be set at either the cluster or project level.
# Istio
[Istio](https://istio.io/) is an open-source tool that makes it easier for DevOps teams to observe, control, troubleshoot, and secure the traffic within a complex network of microservices. For details on how to enable Istio in Rancher, refer to the [Istio section.]({{<baseurl>}}/rancher/v2.x/en/istio)
## OPA Gatekeeper
[OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper) is an open-source project that provides integration between OPA and Kubernetes to provide policy control via admission controller webhooks. For details on how to enable Gatekeeper in Rancher, refer to the [OPA Gatekeeper section.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/opa-gatekeeper)
[OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper) is an open-source project that provides integration between OPA and Kubernetes to provide policy control via admission controller webhooks. For details on how to enable Gatekeeper in Rancher, refer to the [OPA Gatekeeper section.]({{<baseurl>}}/rancher/v2.x/en/opa-gatekeeper)
@@ -1,31 +0,0 @@
---
title: Release Notes
---
# Important note on Istio 1.5.x versions
When upgrading from any 1.4 version of Istio to any 1.5 version, the Rancher installer will delete several resources in order to complete the upgrade, at which point they will be immediately re-installed. This includes the `istio-reader-service-account`. If your Istio installation is using this service account be aware that any secrets tied to the service account will be deleted. Most notably this will **break specific [multi-cluster deployments](https://archive.istio.io/v1.4/docs/setup/install/multicluster/)**. Downgrades back to 1.4 are not possible.
See the official upgrade notes for additional information on the 1.5 release and upgrading from 1.4: https://istio.io/latest/news/releases/1.5.x/announcing-1.5/upgrade-notes/
> **Note:** Rancher continues to use the Helm installation method, which produces a different architecture from an istioctl installation.
## Istio 1.5.9 release notes
**Bug fixes**
* The Kiali traffic graph is now working [#28109](https://github.com/rancher/rancher/issues/28109)
**Known Issues**
* The Kiali traffic graph is offset in the UI [#28207](https://github.com/rancher/rancher/issues/28207)
## Istio 1.5.8 release notes
**Known Issues**
* The Kiali traffic graph is currently not working [#24924](https://github.com/istio/istio/issues/24924)
@@ -1,489 +0,0 @@
---
title: Prometheus Custom Metrics Adapter
weight: 5
---
After you've enabled [cluster level monitoring]({{< baseurl >}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring), you can view the metrics data from Rancher. You can also deploy the Prometheus custom metrics adapter so that the Horizontal Pod Autoscaler (HPA) can use the metrics stored in cluster monitoring.
## Deploy Prometheus Custom Metrics Adapter
We are going to use the [Prometheus custom metrics adapter](https://github.com/DirectXMan12/k8s-prometheus-adapter/releases/tag/v0.5.0), version v0.5.0. It is a good example of a [custom metrics server](https://github.com/kubernetes-incubator/custom-metrics-apiserver). You must be the *cluster owner* to execute the following steps.
- Get the service account that cluster monitoring is using. It should be configured in the workload ID: `statefulset:cattle-prometheus:prometheus-cluster-monitoring`. If you didn't customize anything, the service account name should be `cluster-monitoring`.
- Grant permission to that service account. You will need two kinds of permission.
One role is `extension-apiserver-authentication-reader` in `kube-system`, so you will need to create a `RoleBinding` in `kube-system`. This permission is used to read the API aggregation configuration from the config map in `kube-system`.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: cluster-monitoring
  namespace: cattle-prometheus
```
The other one is the cluster role `system:auth-delegator`, so you will need to create a `ClusterRoleBinding`. This role grants subject access review permission.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: cluster-monitoring
  namespace: cattle-prometheus
```
- Create the configuration for the custom metrics adapter. The following is an example configuration. Configuration details are covered in the next section.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: cattle-prometheus
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters:
      - isNot: ^container_.*_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)$
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_total$
      resources:
        template: <<.Resource>>
      name:
        matches: ""
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_seconds_total
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters: []
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
    resourceRules:
      cpu:
        containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
        nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[1m])) by (<<.GroupBy>>)
        resources:
          overrides:
            instance:
              resource: node
            namespace:
              resource: namespace
            pod_name:
              resource: pod
        containerLabel: container_name
      memory:
        containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
        nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
        resources:
          overrides:
            instance:
              resource: node
            namespace:
              resource: namespace
            pod_name:
              resource: pod
        containerLabel: container_name
      window: 1m
```
- Create HTTPS TLS certs for your API server. You can use the following command to create a self-signed cert.
```bash
openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out serving.crt -keyout serving.key -subj "/C=CN/CN=custom-metrics-apiserver.cattle-prometheus.svc.cluster.local"
# serving.crt and serving.key are written to the current directory. Next, create a secret from them in the cattle-prometheus namespace.
kubectl create secret generic -n cattle-prometheus cm-adapter-serving-certs --from-file=serving.key=./serving.key --from-file=serving.crt=./serving.crt
```
- Then you can create the Prometheus custom metrics adapter. You will need a service for this deployment too. You can create both via Import YAML or through the Rancher UI. Create those resources in the `cattle-prometheus` namespace.
Here is the Prometheus custom metrics adapter deployment.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: custom-metrics-apiserver
  name: custom-metrics-apiserver
  namespace: cattle-prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-apiserver
  template:
    metadata:
      labels:
        app: custom-metrics-apiserver
      name: custom-metrics-apiserver
    spec:
      serviceAccountName: cluster-monitoring
      containers:
      - name: custom-metrics-apiserver
        image: directxman12/k8s-prometheus-adapter-amd64:v0.5.0
        args:
        - --secure-port=6443
        - --tls-cert-file=/var/run/serving-cert/serving.crt
        - --tls-private-key-file=/var/run/serving-cert/serving.key
        - --logtostderr=true
        - --prometheus-url=http://prometheus-operated/
        - --metrics-relist-interval=1m
        - --v=10
        - --config=/etc/adapter/config.yaml
        ports:
        - containerPort: 6443
        volumeMounts:
        - mountPath: /var/run/serving-cert
          name: volume-serving-cert
          readOnly: true
        - mountPath: /etc/adapter/
          name: config
          readOnly: true
        - mountPath: /tmp
          name: tmp-vol
      volumes:
      - name: volume-serving-cert
        secret:
          secretName: cm-adapter-serving-certs
      - name: config
        configMap:
          name: adapter-config
      - name: tmp-vol
        emptyDir: {}
```
Here is the service for the deployment.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
  namespace: cattle-prometheus
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver
```
- Create the API service for your custom metrics server.
```yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: cattle-prometheus
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```
- Then you can verify your custom metrics server by running `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1`. If the API returns data, the metrics server has been set up successfully.
- You can now create an HPA with custom metrics. Here is an example HPA. You will need to create an nginx deployment in your namespace first.
```yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: nginx
spec:
  scaleTargetRef:
    # point the HPA at the nginx deployment you just created
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  # autoscale between 1 and 10 replicas
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # use a "Pods" metric, which takes the average of the
  # given metric across all pods controlled by the autoscaling target
  - type: Pods
    pods:
      metricName: memory_usage_bytes
      targetAverageValue: 5000000
```
After that, you should see your nginx deployment scaling up, which confirms that HPA with custom metrics works.
## Configuration of Prometheus custom metrics adapter
> Refer to https://github.com/DirectXMan12/k8s-prometheus-adapter/blob/master/docs/config.md
The adapter determines which metrics to expose, and how to expose them,
through a set of "discovery" rules. Each rule is executed independently
(so make sure that your rules are mutually exclusive), and specifies each
of the steps the adapter needs to take to expose a metric in the API.
Each rule can be broken down into roughly four parts:
- *Discovery*, which specifies how the adapter should find all Prometheus
metrics for this rule.
- *Association*, which specifies how the adapter should determine which
Kubernetes resources a particular metric is associated with.
- *Naming*, which specifies how the adapter should expose the metric in
the custom metrics API.
- *Querying*, which specifies how a request for a particular metric on one
or more Kubernetes objects should be turned into a query to Prometheus.
A more comprehensive configuration file can be found in
[sample-config.yaml](sample-config.yaml), but a basic config with one rule
might look like:
```yaml
rules:
# this rule matches cumulative cAdvisor metrics measured in seconds
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
  resources:
    # skip specifying generic resource<->label mappings, and just
    # attach only pod and namespace resources by mapping label names to group-resources
    overrides:
      namespace: {resource: "namespace"}
      pod_name: {resource: "pod"}
  # specify that the `container_` prefix and `_seconds_total` suffix should be removed.
  # this also introduces an implicit filter on metric family names
  name:
    # we use the value of the capture group implicitly as the API name
    # we could also explicitly write `as: "$1"`
    matches: "^container_(.*)_seconds_total$"
  # specify how to construct a query to fetch samples for a given series
  # This is a Go template where the `.Series` and `.LabelMatchers` string values
  # are available, and the delimiters are `<<` and `>>` to avoid conflicts with
  # the prometheus query language
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)'
```
### Discovery
Discovery governs the process of finding the metrics that you want to
expose in the custom metrics API. There are two fields that factor into
discovery: `seriesQuery` and `seriesFilters`.
`seriesQuery` specifies the Prometheus series query (as passed to the
`/api/v1/series` endpoint in Prometheus) to use to find some set of
Prometheus series. The adapter will strip the label values from these
series, and then use the resulting metric-name-label-names combinations
later on.
In many cases, `seriesQuery` will be sufficient to narrow down the list of
Prometheus series. However, sometimes (especially if two rules might
otherwise overlap), it's useful to do additional filtering on metric
names. In this case, `seriesFilters` can be used. After the list of
series is returned from `seriesQuery`, each series has its metric name
filtered through any specified filters.
Filters may be either:
- `is: <regex>`, which matches any series whose name matches the specified
regex.
- `isNot: <regex>`, which matches any series whose name does not match the
specified regex.
For example:
```yaml
# match all cAdvisor metrics that aren't measured in seconds
seriesQuery: '{__name__=~"^container_.*_total",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
- isNot: "^container_.*_seconds_total"
```
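To make the filter semantics concrete, here is a minimal Python sketch of how `is` and `isNot` filters could narrow a list of series names. It is an illustration only, not the adapter's code: the function name `apply_series_filters` and the sample series names are hypothetical, and it assumes unanchored regex matching.

```python
import re


def apply_series_filters(names, filters):
    """Keep only the series names that pass every filter, applied in order.

    Hypothetical helper: `filters` mirrors the `seriesFilters` list, where
    each entry is a dict with a single `is` or `isNot` key.
    """
    kept = list(names)
    for series_filter in filters:
        if "is" in series_filter:
            pattern = re.compile(series_filter["is"])
            kept = [name for name in kept if pattern.search(name)]
        elif "isNot" in series_filter:
            pattern = re.compile(series_filter["isNot"])
            kept = [name for name in kept if not pattern.search(name)]
    return kept


# Sample names chosen for illustration only.
series = [
    "container_cpu_usage_seconds_total",
    "container_network_receive_bytes_total",
    "container_fs_reads_total",
]
filters = [{"isNot": "^container_.*_seconds_total"}]

# The series measured in seconds is dropped; the other two remain.
print(apply_series_filters(series, filters))
```

Because each rule runs independently, filters like this are one way to keep two rules with the same `seriesQuery` from claiming the same series.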
### Association
Association governs the process of figuring out which Kubernetes resources
a particular metric could be attached to. The `resources` field controls
this process.
There are two ways to associate resources with a particular metric. In
both cases, the value of the label becomes the name of the particular
object.
One way is to specify that any label name that matches some particular
pattern refers to some group-resource based on the label name. This can
be done using the `template` field. The pattern is specified as a Go
template, with the `Group` and `Resource` fields representing group and
resource. You don't necessarily have to use the `Group` field (in which
case the group is guessed by the system). For instance:
```yaml
# any label `kube_<group>_<resource>` becomes <group>.<resource> in Kubernetes
resources:
  template: "kube_<<.Group>>_<<.Resource>>"
```
The other way is to specify that some particular label represents some
particular Kubernetes resource. This can be done using the `overrides`
field. Each override maps a Prometheus label to a Kubernetes
group-resource. For instance:
```yaml
# the microservice label corresponds to the apps.deployment resource
resources:
  overrides:
    microservice: {group: "apps", resource: "deployment"}
```
These two can be combined, so you can specify both a template and some
individual overrides.
The resources mentioned can be any resource available in your kubernetes
cluster, as long as you've got a corresponding label.
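
The two association styles can be sketched in Python as a lookup from a Kubernetes group-resource to the Prometheus label that carries its name. This is a simplified illustration under stated assumptions, not the adapter's implementation: `label_for_resource` is a hypothetical helper, overrides are assumed to take precedence over the template, and an override without a `group` is treated as the empty group rather than guessed.

```python
def label_for_resource(resource, group="", template=None, overrides=None):
    """Return the label name expected to hold the given resource's name.

    Hypothetical helper: `overrides` mirrors the `overrides` map and
    `template` mirrors the `template` string from the `resources` field.
    """
    overrides = overrides or {}
    # Explicit overrides are checked first (an assumption of this sketch).
    for label, target in overrides.items():
        if target.get("resource") == resource and target.get("group", "") == group:
            return label
    # Otherwise fall back to filling in the template placeholders.
    if template is not None:
        return template.replace("<<.Group>>", group).replace("<<.Resource>>", resource)
    return None


# Override style: the `microservice` label names an apps.deployment object.
print(label_for_resource(
    "deployment", "apps",
    overrides={"microservice": {"group": "apps", "resource": "deployment"}},
))

# Template style: the label name is derived from the group and resource.
print(label_for_resource("deployment", "apps", template="kube_<<.Group>>_<<.Resource>>"))
```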
### Naming
Naming governs the process of converting a Prometheus metric name into
a metric in the custom metrics API, and vice versa. It's controlled by
the `name` field.
Naming is controlled by specifying a pattern to extract an API name from
a Prometheus name, and potentially a transformation on that extracted
value.
The pattern is specified in the `matches` field, and is just a regular
expression. If not specified, it defaults to `.*`.
The transformation is specified by the `as` field. You can use any
capture groups defined in the `matches` field. If the `matches` field
doesn't contain capture groups, the `as` field defaults to `$0`. If it
contains a single capture group, the `as` field defaults to `$1`.
Otherwise, it's an error not to specify the `as` field.
For example:
```yaml
# turn any name <name>_total into <name>_per_second
# e.g. http_requests_total becomes http_requests_per_second
name:
  matches: "^(.*)_total$"
  as: "${1}_per_second"
```
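The defaulting rules for `as` can be sketched in Python as follows. This is only an approximation of the behavior described above: `api_name` is a hypothetical function, an empty `as` is treated the same as an omitted one, and the Go-style `$1` / `${1}` references are translated to Python's `\g<1>` form because the two regex libraries use different substitution syntax.

```python
import re


def api_name(series_name, matches=".*", as_=None):
    """Derive a custom metrics API name from a Prometheus series name.

    Returns None when the series name does not match `matches`, which is
    the implicit filter that the naming pattern introduces.
    """
    pattern = re.compile(matches)
    match = pattern.search(series_name)
    if match is None:
        return None
    if not as_:
        if pattern.groups == 0:
            as_ = "$0"  # no capture groups: use the whole match
        elif pattern.groups == 1:
            as_ = "$1"  # one capture group: use it directly
        else:
            raise ValueError("`as` must be set when `matches` has several capture groups")
    # Convert Go-style $N and ${N} references into Python's \g<N> syntax.
    python_template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", as_)
    return match.expand(python_template)


print(api_name("http_requests_total", "^(.*)_total$", "${1}_per_second"))
# prints "http_requests_per_second"
print(api_name("container_cpu_usage_seconds_total", "^container_(.*)_seconds_total$"))
# prints "cpu_usage"
```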
### Querying
Querying governs the process of actually fetching values for a particular
metric. It's controlled by the `metricsQuery` field.
The `metricsQuery` field is a Go template that gets turned into
a Prometheus query, using input from a particular call to the custom
metrics API. A given call to the custom metrics API is distilled down to
a metric name, a group-resource, and one or more objects of that
group-resource. These get turned into the following fields in the
template:
- `Series`: the metric name
- `LabelMatchers`: a comma-separated list of label matchers matching the
given objects. Currently, this is the label for the particular
group-resource, plus the label for namespace, if the group-resource is
namespaced.
- `GroupBy`: a comma-separated list of labels to group by. Currently,
this contains the group-resource label used in `LabelMatchers`.
For instance, suppose we had a series `http_requests_total` (exposed as
`http_requests_per_second` in the API) with labels `service`, `pod`,
`ingress`, `namespace`, and `verb`. The first four correspond to
Kubernetes resources. Then, if someone requested the metric
`pods/http_requests_per_second` for the pods `pod1` and `pod2` in the
`somens` namespace, we'd have:
- `Series: "http_requests_total"`
- `LabelMatchers: "pod=~\"pod1|pod2\",namespace=\"somens\""`
- `GroupBy: "pod"`
Additionally, there are two advanced fields that are "raw" forms of other
fields:
- `LabelValuesByName`: a map of the labels and values from the
`LabelMatchers` field. The values are pre-joined by `|`
(for use with the `=~` matcher in Prometheus).
- `GroupBySlice`: the slice form of `GroupBy`.
In general, you'll probably want to use the `Series`, `LabelMatchers`, and
`GroupBy` fields. The other two are for advanced usage.
The query is expected to return one value for each object requested. The
adapter will use the labels on the returned series to associate a given
series back to its corresponding object.
For example:
```yaml
# convert cumulative cAdvisor metrics into rates calculated over 2 minutes
metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)'
```
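
As a rough illustration of how the template fields come together, the following Python sketch renders a `metricsQuery` for the `http_requests_total` example above. The adapter itself uses Go templates; `render_metrics_query` is a hypothetical stand-in that does plain string substitution, and the template omits the `container_name` matcher because the example series is not a cAdvisor metric.

```python
def render_metrics_query(template, series, label_matchers, group_by):
    """Fill the `<<.Field>>` placeholders of a metricsQuery template.

    Hypothetical helper: only the three common fields are supported.
    """
    fields = {
        "<<.Series>>": series,
        "<<.LabelMatchers>>": ",".join(label_matchers),
        "<<.GroupBy>>": ",".join(group_by),
    }
    query = template
    for placeholder, value in fields.items():
        query = query.replace(placeholder, value)
    return query


template = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
query = render_metrics_query(
    template,
    series="http_requests_total",
    label_matchers=['pod=~"pod1|pod2"', 'namespace="somens"'],
    group_by=["pod"],
)
print(query)
# prints: sum(rate(http_requests_total{pod=~"pod1|pod2",namespace="somens"}[2m])) by (pod)
```

Grouping by `pod` is what lets the result contain one value per requested object, which the adapter then maps back to `pod1` and `pod2` through the returned labels.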
@@ -1,430 +0,0 @@
---
title: Prometheus Expressions
weight: 4
---
The PromQL expressions in this doc can be used to configure [alerts.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/alerts/)
> Before expressions can be used in alerts, monitoring must be enabled. For more information, refer to the documentation on enabling monitoring [at the cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring) or [at the project level.]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring)
For more information about querying Prometheus, refer to the official [Prometheus documentation.](https://prometheus.io/docs/prometheus/latest/querying/basics/)
<!-- TOC -->
- [Cluster Metrics](#cluster-metrics)
- [Cluster CPU Utilization](#cluster-cpu-utilization)
- [Cluster Load Average](#cluster-load-average)
- [Cluster Memory Utilization](#cluster-memory-utilization)
- [Cluster Disk Utilization](#cluster-disk-utilization)
- [Cluster Disk I/O](#cluster-disk-i-o)
- [Cluster Network Packets](#cluster-network-packets)
- [Cluster Network I/O](#cluster-network-i-o)
- [Node Metrics](#node-metrics)
- [Node CPU Utilization](#node-cpu-utilization)
- [Node Load Average](#node-load-average)
- [Node Memory Utilization](#node-memory-utilization)
- [Node Disk Utilization](#node-disk-utilization)
- [Node Disk I/O](#node-disk-i-o)
- [Node Network Packets](#node-network-packets)
- [Node Network I/O](#node-network-i-o)
- [Etcd Metrics](#etcd-metrics)
- [Etcd Has a Leader](#etcd-has-a-leader)
- [Number of Times the Leader Changes](#number-of-times-the-leader-changes)
- [Number of Failed Proposals](#number-of-failed-proposals)
- [GRPC Client Traffic](#grpc-client-traffic)
- [Peer Traffic](#peer-traffic)
- [DB Size](#db-size)
- [Active Streams](#active-streams)
- [Raft Proposals](#raft-proposals)
- [RPC Rate](#rpc-rate)
- [Disk Operations](#disk-operations)
- [Disk Sync Duration](#disk-sync-duration)
- [Kubernetes Components Metrics](#kubernetes-components-metrics)
- [API Server Request Latency](#api-server-request-latency)
- [API Server Request Rate](#api-server-request-rate)
- [Scheduling Failed Pods](#scheduling-failed-pods)
- [Controller Manager Queue Depth](#controller-manager-queue-depth)
- [Scheduler E2E Scheduling Latency](#scheduler-e2e-scheduling-latency)
- [Scheduler Preemption Attempts](#scheduler-preemption-attempts)
- [Ingress Controller Connections](#ingress-controller-connections)
- [Ingress Controller Request Process Time](#ingress-controller-request-process-time)
- [Rancher Logging Metrics](#rancher-logging-metrics)
- [Fluentd Buffer Queue Rate](#fluentd-buffer-queue-rate)
- [Fluentd Input Rate](#fluentd-input-rate)
- [Fluentd Output Errors Rate](#fluentd-output-errors-rate)
- [Fluentd Output Rate](#fluentd-output-rate)
- [Workload Metrics](#workload-metrics)
- [Workload CPU Utilization](#workload-cpu-utilization)
- [Workload Memory Utilization](#workload-memory-utilization)
- [Workload Network Packets](#workload-network-packets)
- [Workload Network I/O](#workload-network-i-o)
- [Workload Disk I/O](#workload-disk-i-o)
- [Pod Metrics](#pod-metrics)
- [Pod CPU Utilization](#pod-cpu-utilization)
- [Pod Memory Utilization](#pod-memory-utilization)
- [Pod Network Packets](#pod-network-packets)
- [Pod Network I/O](#pod-network-i-o)
- [Pod Disk I/O](#pod-disk-i-o)
- [Container Metrics](#container-metrics)
- [Container CPU Utilization](#container-cpu-utilization)
- [Container Memory Utilization](#container-memory-utilization)
- [Container Disk I/O](#container-disk-i-o)
<!-- /TOC -->
# Cluster Metrics
### Cluster CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))` |
| Summary | `1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])))` |
### Cluster Load Average
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>load1</td><td>`sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr><tr><td>load5</td><td>`sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr><tr><td>load15</td><td>`sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>load1</td><td>`sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr><tr><td>load5</td><td>`sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr><tr><td>load15</td><td>`sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr></table> |
### Cluster Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - sum(node_memory_MemAvailable_bytes) by (instance) / sum(node_memory_MemTotal_bytes) by (instance)` |
| Summary | `1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes)` |
### Cluster Disk Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `(sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance) - sum(node_filesystem_free_bytes{device!="rootfs"}) by (instance)) / sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance)` |
| Summary | `(sum(node_filesystem_size_bytes{device!="rootfs"}) - sum(node_filesystem_free_bytes{device!="rootfs"})) / sum(node_filesystem_size_bytes{device!="rootfs"})` |
### Cluster Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total[5m])) by (instance)`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total[5m]))`</td></tr></table> |
### Cluster Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr></table> |
| Summary | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr></table> |
### Cluster Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr></table> |
| Summary | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr></table> |
# Node Metrics
### Node CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `avg(irate(node_cpu_seconds_total{mode!="idle", instance=~"$instance"}[5m])) by (mode)` |
| Summary | `1 - (avg(irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])))` |
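
The node, workload, pod, and container expressions contain `$`-prefixed tokens such as `$instance`, `$namespace`, `$podName`, and `$containerName`. These appear to be placeholders that are filled in before the query runs. The sketch below, assuming plain text substitution, renders one expression with Python's `string.Template`. The `render` helper and the node address are hypothetical.

```python
from string import Template

# Expression copied from the Node CPU Utilization summary row.
NODE_CPU_SUMMARY = Template(
    '1 - (avg(irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])))'
)


def render(expression, **variables):
    """Fill in $-style placeholders; raises KeyError if one is missing."""
    return expression.substitute(**variables)


# Hypothetical node address; use the instance label value from your own cluster.
query = render(NODE_CPU_SUMMARY, instance="10.0.0.5:9796")
print(query)
```

Because the expressions match with `=~`, the substituted value is treated as a regular expression, so a pattern such as `10.0.0.*` would select several nodes at once.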
### Node Load Average
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>load1</td><td>`sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load5</td><td>`sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load15</td><td>`sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr></table> |
| Summary | <table><tr><td>load1</td><td>`sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load5</td><td>`sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load15</td><td>`sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr></table> |
### Node Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"})` |
| Summary | `1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"})` |
### Node Disk Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `(sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"}) by (device)) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device)` |
| Summary | `(sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"})) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"})` |
### Node Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m]))`</td></tr></table> |
### Node Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr></table> |
| Summary | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr></table> |
### Node Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr></table> |
| Summary | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr></table> |
# Etcd Metrics
### Etcd Has a Leader
`max(etcd_server_has_leader)`
### Number of Times the Leader Changes
`max(etcd_server_leader_changes_seen_total)`
### Number of Failed Proposals
`sum(etcd_server_proposals_failed_total)`
### GRPC Client Traffic
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>in</td><td>`sum(rate(etcd_network_client_grpc_received_bytes_total[5m])) by (instance)`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_client_grpc_sent_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>in</td><td>`sum(rate(etcd_network_client_grpc_received_bytes_total[5m]))`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_client_grpc_sent_bytes_total[5m]))`</td></tr></table> |
### Peer Traffic
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>in</td><td>`sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>in</td><td>`sum(rate(etcd_network_peer_received_bytes_total[5m]))`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_peer_sent_bytes_total[5m]))`</td></tr></table> |
### DB Size
| Catalog | Expression |
| --- | --- |
| Detail | `sum(etcd_debugging_mvcc_db_total_size_in_bytes) by (instance)` |
| Summary | `sum(etcd_debugging_mvcc_db_total_size_in_bytes)` |
### Active Streams
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>lease-watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance)`</td></tr><tr><td>watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>lease-watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"})`</td></tr><tr><td>watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"})`</td></tr></table> |
### Raft Proposals
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>applied</td><td>`sum(increase(etcd_server_proposals_applied_total[5m])) by (instance)`</td></tr><tr><td>committed</td><td>`sum(increase(etcd_server_proposals_committed_total[5m])) by (instance)`</td></tr><tr><td>pending</td><td>`sum(increase(etcd_server_proposals_pending[5m])) by (instance)`</td></tr><tr><td>failed</td><td>`sum(increase(etcd_server_proposals_failed_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>applied</td><td>`sum(increase(etcd_server_proposals_applied_total[5m]))`</td></tr><tr><td>committed</td><td>`sum(increase(etcd_server_proposals_committed_total[5m]))`</td></tr><tr><td>pending</td><td>`sum(increase(etcd_server_proposals_pending[5m]))`</td></tr><tr><td>failed</td><td>`sum(increase(etcd_server_proposals_failed_total[5m]))`</td></tr></table> |
### RPC Rate
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>total</td><td>`sum(rate(grpc_server_started_total{grpc_type="unary"}[5m])) by (instance)`</td></tr><tr><td>fail</td><td>`sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>total</td><td>`sum(rate(grpc_server_started_total{grpc_type="unary"}[5m]))`</td></tr><tr><td>fail</td><td>`sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m]))`</td></tr></table> |
### Disk Operations
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>commit-called-by-backend</td><td>`sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m])) by (instance)`</td></tr><tr><td>fsync-called-by-wal</td><td>`sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>commit-called-by-backend</td><td>`sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m]))`</td></tr><tr><td>fsync-called-by-wal</td><td>`sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m]))`</td></tr></table> |
### Disk Sync Duration
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>wal</td><td>`histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))`</td></tr><tr><td>db</td><td>`histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))`</td></tr></table> |
| Summary | <table><tr><td>wal</td><td>`sum(histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le)))`</td></tr><tr><td>db</td><td>`sum(histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le)))`</td></tr></table> |
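
The Disk Sync Duration expressions rely on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The Python sketch below mirrors that idea for classic histograms. It is a simplified illustration with hypothetical bucket values, not the Prometheus implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified: assumes 0 < q <= 1 and that a +Inf bucket is present.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for index, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, previous = (0.0, 0.0) if index == 0 else buckets[index - 1]
    if count == previous:
        return upper
    return lower + (upper - lower) * (rank - previous) / (count - previous)


# Hypothetical fsync duration buckets (seconds, cumulative observation count).
fsync_buckets = [
    (0.001, 10),
    (0.002, 60),
    (0.004, 95),
    (0.008, 100),
    (math.inf, 100),
]
p99 = histogram_quantile(0.99, fsync_buckets)
print(f"p99 fsync duration: {p99:.4f}s")
```

Here the 99th observation lands in the 0.004 to 0.008 second bucket, so the estimate is about 0.0072 seconds. The estimate can be no finer than the bucket boundaries allow.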
# Kubernetes Components Metrics
### API Server Request Latency
| Catalog | Expression |
| --- | --- |
| Detail | `avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance, verb) /1e+06` |
| Summary | `avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance) /1e+06` |
### API Server Request Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(apiserver_request_count[5m])) by (instance, code)` |
| Summary | `sum(rate(apiserver_request_count[5m])) by (instance)` |
### Scheduling Failed Pods
| Catalog | Expression |
| --- | --- |
| Detail | `sum(kube_pod_status_scheduled{condition="false"})` |
| Summary | `sum(kube_pod_status_scheduled{condition="false"})` |
### Controller Manager Queue Depth
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>volumes</td><td>`sum(volumes_depth) by (instance)`</td></tr><tr><td>deployment</td><td>`sum(deployment_depth) by (instance)`</td></tr><tr><td>replicaset</td><td>`sum(replicaset_depth) by (instance)`</td></tr><tr><td>service</td><td>`sum(service_depth) by (instance)`</td></tr><tr><td>serviceaccount</td><td>`sum(serviceaccount_depth) by (instance)`</td></tr><tr><td>endpoint</td><td>`sum(endpoint_depth) by (instance)`</td></tr><tr><td>daemonset</td><td>`sum(daemonset_depth) by (instance)`</td></tr><tr><td>statefulset</td><td>`sum(statefulset_depth) by (instance)`</td></tr><tr><td>replicationmanager</td><td>`sum(replicationmanager_depth) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>volumes</td><td>`sum(volumes_depth)`</td></tr><tr><td>deployment</td><td>`sum(deployment_depth)`</td></tr><tr><td>replicaset</td><td>`sum(replicaset_depth)`</td></tr><tr><td>service</td><td>`sum(service_depth)`</td></tr><tr><td>serviceaccount</td><td>`sum(serviceaccount_depth)`</td></tr><tr><td>endpoint</td><td>`sum(endpoint_depth)`</td></tr><tr><td>daemonset</td><td>`sum(daemonset_depth)`</td></tr><tr><td>statefulset</td><td>`sum(statefulset_depth)`</td></tr><tr><td>replicationmanager</td><td>`sum(replicationmanager_depth)`</td></tr></table> |
### Scheduler E2E Scheduling Latency
| Catalog | Expression |
| --- | --- |
| Detail | `histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06` |
| Summary | `sum(histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06)` |
### Scheduler Preemption Attempts
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(scheduler_total_preemption_attempts[5m])) by (instance)` |
| Summary | `sum(rate(scheduler_total_preemption_attempts[5m]))` |
### Ingress Controller Connections
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>reading</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="reading"}) by (instance)`</td></tr><tr><td>waiting</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="waiting"}) by (instance)`</td></tr><tr><td>writing</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="writing"}) by (instance)`</td></tr><tr><td>accepted</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m]))) by (instance)`</td></tr><tr><td>active</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m]))) by (instance)`</td></tr><tr><td>handled</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m]))) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>reading</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="reading"})`</td></tr><tr><td>waiting</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="waiting"})`</td></tr><tr><td>writing</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="writing"})`</td></tr><tr><td>accepted</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m])))`</td></tr><tr><td>active</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m])))`</td></tr><tr><td>handled</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m])))`</td></tr></table> |
### Ingress Controller Request Process Time
| Catalog | Expression |
| --- | --- |
| Detail | `topk(10, histogram_quantile(0.95,sum by (le, host, path)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m]))))` |
| Summary | `topk(10, histogram_quantile(0.95,sum by (le, host)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m]))))` |
# Rancher Logging Metrics
### Fluentd Buffer Queue Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_buffer_queue_length[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_output_status_buffer_queue_length[5m]))` |
### Fluentd Input Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_input_status_num_records_total[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_input_status_num_records_total[5m]))` |
### Fluentd Output Errors Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_num_errors[5m])) by (type)` |
| Summary | `sum(rate(fluentd_output_status_num_errors[5m]))` |
### Fluentd Output Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_num_records_total[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_output_status_num_records_total[5m]))` |
# Workload Metrics
### Workload CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""}) by (pod_name)` |
| Summary | `sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""})` |
### Workload Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
# Pod Metrics
### Pod CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr></table> |
| Summary | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr></table> |
### Pod Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""}) by (container_name)` |
| Summary | `sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""})` |
### Pod Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
### Pod Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
### Pod Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
# Container Metrics
### Container CPU Utilization
| Metric | Expression |
| --- | --- |
| cfs throttled seconds | `sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| usage seconds | `sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| system seconds | `sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| user seconds | `sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
### Container Memory Utilization
`sum(container_memory_working_set_bytes{namespace="$namespace",pod_name="$podName",container_name="$containerName"})`
### Container Disk I/O
| Catalog | Expression |
| --- | --- |
| read | `sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| write | `sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
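
The expressions in these tables contain the placeholders `$namespace`, `$podName`, and `$containerName`, which need concrete values before a query can be run outside of Rancher. As a minimal sketch, the substitution can be done with Python's standard `string.Template`, whose `$name` syntax happens to match the placeholders. The helper name `fill_expression` and the sample values (`default`, `web-0`, `nginx`) are assumptions made for illustration, not part of Rancher.

```python
from string import Template

# Expression copied from the "Container Disk I/O" table above.
READ_EXPR = Template(
    'sum(rate(container_fs_reads_bytes_total{namespace="$namespace",'
    'pod_name="$podName",container_name="$containerName"}[5m]))'
)


def fill_expression(expr, namespace, pod_name, container_name):
    """Return the PromQL string with the three placeholders substituted.

    Hypothetical helper: the function and argument names are illustrative.
    """
    return expr.substitute(
        namespace=namespace,
        podName=pod_name,
        containerName=container_name,
    )


# Sample values are placeholders for a real namespace, pod, and container.
query = fill_expression(READ_EXPR, "default", "web-0", "nginx")
print(query)
```

`Template.substitute` raises `KeyError` if a placeholder is left without a value, which helps catch an expression that was only partially filled in.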
@@ -41,7 +41,7 @@ When you delete an EKS cluster that was created in Rancher, the cluster is destr
After importing a cluster, the cluster owner can:
- [Manage cluster access]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/cluster-project-roles/) through role-based access control
- Enable [monitoring]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/) and [logging]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/logging/)
- Enable [monitoring]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) and [logging]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/logging/)
- Enable [Istio]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/istio/)
- Use [pipelines]({{<baseurl>}}/rancher/v2.x/en/project-admin/pipelines/)
- Configure [alerts]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/alerts/) and [notifiers]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers/)
@@ -361,7 +361,7 @@ See [Docker Root Directory](#docker-root-directory).
### enable_cluster_monitoring
Option to enable or disable [Cluster Monitoring]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/).
Option to enable or disable [Cluster Monitoring]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/).
### enable_network_policy
@@ -20,7 +20,7 @@ After creating a multi-cluster application, you can program a [Global DNS entry]
- [Roles](#roles)
- [Application configuration options](#application-configuration-options)
- [Using a questions.yml file](#using-a-questions-yml-file)
- [Key value pairs for native Helm charts](key-value-pairs-for-native-helm-charts)
- [Key value pairs for native Helm charts](#key-value-pairs-for-native-helm-charts)
- [Members](#members)
- [Overriding application configuration options for specific projects](#overriding-application-configuration-options-for-specific-projects)
- [Upgrading multi-cluster app roles and projects](#upgrading-multi-cluster-app-roles-and-projects)
@@ -10,11 +10,11 @@ aliases:
---
_Available as of v2.3.0_
> In Rancher 2.5, the Istio application was improved. There are now two ways to enable Istio. The older way is documented in this section, and the new application for Istio is documented in the [dashboard section.]({{<baseurl>}}/rancher/v2.x/en/dashboard/istio)
> In Rancher 2.5, the Istio application was improved. There are now two ways to enable Istio. The older way is documented in this section, and the new application for Istio is documented [here.]({{<baseurl>}}/rancher/v2.x/en/istio)
[Istio](https://istio.io/) is an open-source tool that makes it easier for DevOps teams to observe, control, troubleshoot, and secure the traffic within a complex network of microservices.
[Istio](https://istio.io/) is an open-source tool that makes it easier for DevOps teams to observe, control, troubleshoot, and secure the traffic within a complex network of microservices.
As a network of microservices changes and grows, the interactions between them can become more difficult to manage and understand. In such a situation, it is useful to have a service mesh as a separate infrastructure layer. Istio's service mesh lets you manipulate traffic between microservices without changing the microservices directly.
As a network of microservices changes and grows, the interactions between them can become more difficult to manage and understand. In such a situation, it is useful to have a service mesh as a separate infrastructure layer. Istio's service mesh lets you manipulate traffic between microservices without changing the microservices directly.
Our integration of Istio is designed so that a Rancher operator, such as an administrator or cluster owner, can deliver Istio to developers. Then developers can use Istio to enforce security policies, troubleshoot problems, or manage traffic for green/blue deployments, canary deployments, or A/B testing.
@@ -4,6 +4,16 @@ aliases:
- /rancher/v2.x/en/cluster-admin/tools/istio/release-notes
---
## Istio 1.5.9 release notes
**Bug fixes**
* The Kiali traffic graph is now working [#28109](https://github.com/rancher/rancher/issues/28109)
**Known Issues**
* The Kiali traffic graph is offset in the UI [#28207](https://github.com/rancher/rancher/issues/28207)
# Istio 1.5.8
@@ -8,7 +8,7 @@ aliases:
- /rancher/v2.x/en/cluster-admin/tools/logging
---
> In Rancher 2.5, the logging application was improved. There are now two ways to enable logging. The older way is documented in this section, and the new application for logging is documented in the [dashboard section.]({{<baseurl>}}/rancher/v2.x/en/dashboard/logging)
> In Rancher 2.5, the logging application was improved. There are now two ways to enable logging. The older way is documented in this section, and the new application for logging is documented [here.]({{<baseurl>}}/rancher/v2.x/en/logging)
Logging is helpful because it allows you to:
@@ -48,14 +48,6 @@ In other words, Prometheus lets you view metrics from your different Rancher and
By viewing data that Prometheus scrapes from your cluster control plane, nodes, and deployments, you can stay on top of everything happening in your cluster. You can then use these analytics to better run your organization: stop system emergencies before they start, develop maintenance strategies, restore crashed servers, etc.
# Monitoring Scope
Cluster monitoring allows you to view the health of your Kubernetes cluster. Prometheus collects metrics from the cluster components below, which you can view in graphs and charts.
- [Kubernetes control plane]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#kubernetes-components-metrics)
- [etcd database]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#etcd-metrics)
- [All nodes (including workers)]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#cluster-metrics)
# Enabling Cluster Monitoring
As an [administrator]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/global-permissions/) or [cluster owner]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/cluster-project-roles/#cluster-roles), you can configure Rancher to deploy Prometheus to monitor your Kubernetes cluster.
@@ -5,8 +5,7 @@ aliases:
- rancher/v2.x/en/cluster-admin/tools/alerts
---
> In Rancher 2.5, the monitoring application was improved. There are now two ways to enable monitoring and alerting. The older way is documented in this section, and the new application for monitoring and alerting is documented in the [dashboard section.]({{<baseurl>}}/rancher/v2.x/en/dashboard/monitoring-alerting)
> In Rancher 2.5, the monitoring application was improved. There are now two ways to enable monitoring and alerting. The older way is documented in this section, and the new application for monitoring and alerting is documented [here.]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting)
To keep your clusters and applications healthy and driving your organizational productivity forward, you need to stay informed of events occurring in your clusters and projects, both planned and unplanned. When an event occurs, your alert is triggered, and you are sent a notification. You can then, if necessary, follow up with corrective actions.
@@ -38,9 +37,9 @@ Some examples of alert events are:
### Prometheus Queries
> **Prerequisite:** Monitoring must be [enabled]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring) before you can trigger alerts with custom Prometheus queries or expressions.
> **Prerequisite:** Monitoring must be [enabled]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#enabling-cluster-monitoring) before you can trigger alerts with custom Prometheus queries or expressions.
When you edit an alert rule, you will have the opportunity to configure the alert to be triggered based on a Prometheus expression. For examples of expressions, refer to [this page.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/expression)
When you edit an alert rule, you will have the opportunity to configure the alert to be triggered based on a Prometheus expression. For examples of expressions, refer to [this page.]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression)
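
Before using an expression in an alert rule, it can help to try it against Prometheus directly. The sketch below only builds the URL for an instant query against the Prometheus HTTP API endpoint `/api/v1/query`; it does not send the request. The base address `http://prometheus.example:9090` and the function name `instant_query_url` are assumptions for illustration, and how you reach the Prometheus server in your cluster depends on your setup.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, expression):
    """Build a Prometheus instant-query URL for a PromQL expression.

    Hypothetical helper: base_url is whatever address your Prometheus
    server is reachable at; it is not defined by Rancher.
    """
    query_string = urlencode({"query": expression})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


# Placeholder address and a deliberately simple expression.
url = instant_query_url("http://prometheus.example:9090/", "up == 0")
print(url)
```

`urlencode` percent-encodes the characters that PromQL uses heavily, such as braces, quotes, and brackets, so the expression can be pasted in unchanged.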
# Urgency Levels
@@ -61,7 +60,7 @@ At the cluster level, Rancher monitors components in your Kubernetes cluster, an
As a [cluster owner]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/cluster-project-roles/#cluster-roles), you can configure Rancher to send you alerts for cluster events.
>**Prerequisite:** Before you can receive cluster alerts, you must [add a notifier]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers/#adding-notifiers).
>**Prerequisite:** Before you can receive cluster alerts, you must [add a notifier]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/notifiers/#adding-notifiers).
1. From the **Global** view, navigate to the cluster that you want to configure cluster alerts for. Select **Tools > Alerts**. Then click **Add Alert Group**.
@@ -7,7 +7,7 @@ aliases:
When you create a cluster, some alert rules are predefined. These alerts notify you about signs that the cluster could be unhealthy. You can receive these alerts if you configure a [notifier]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers) for them.
Several of the alerts use Prometheus expressions as the metric that triggers the alert. For more information on how expressions work, you can refer to the Rancher [documentation about Prometheus expressions]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/expression/) or the Prometheus [documentation about querying metrics](https://prometheus.io/docs/prometheus/latest/querying/basics/).
Several of the alerts use Prometheus expressions as the metric that triggers the alert. For more information on how expressions work, you can refer to the Rancher [documentation about Prometheus expressions]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression/) or the Prometheus [documentation about querying metrics](https://prometheus.io/docs/prometheus/latest/querying/basics/).
# Alerts for etcd
Etcd is the key-value store that contains the state of the Kubernetes cluster. Rancher provides default alerts if the built-in monitoring detects a potential problem with etcd. You don't have to enable monitoring to receive these alerts.
@@ -42,9 +42,9 @@ Using Prometheus, you can monitor Rancher at both the cluster level and [project
- Cluster monitoring allows you to view the health of your Kubernetes cluster. Prometheus collects metrics from the cluster components below, which you can view in graphs and charts.
- [Kubernetes control plane]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#kubernetes-components-metrics)
- [etcd database]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#etcd-metrics)
- [All nodes (including workers)]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#cluster-metrics)
- [Kubernetes control plane]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/#kubernetes-components-metrics)
- [etcd database]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/#etcd-metrics)
- [All nodes (including workers)]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/#cluster-metrics)
- [Project monitoring]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/) allows you to view the state of pods running in a given project. Prometheus collects metrics from the project's deployed HTTP and TCP/UDP workloads.
@@ -58,11 +58,11 @@ As an [administrator]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/global-p
1. Select **Tools > Monitoring** in the navigation bar.
1. Select **Enable** to show the [Prometheus configuration options]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/prometheus/). Review the [resource consumption recommendations](#resource-consumption) to ensure you have enough resources for Prometheus and on your worker nodes to enable monitoring. Enter in your desired configuration options.
1. Select **Enable** to show the [Prometheus configuration options]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/prometheus/). Review the [resource consumption recommendations](#resource-consumption) to ensure you have enough resources for Prometheus and on your worker nodes to enable monitoring. Enter your desired configuration options.
1. Click **Save**.
**Result:** The Prometheus server will be deployed as well as two monitoring applications. The two monitoring applications, `cluster-monitoring` and `monitoring-operator`, are added as an [application]({{<baseurl>}}/rancher/v2.x/en/catalog/apps/) to the cluster's `system` project. After the applications are `active`, you can start viewing [cluster metrics]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/) through the [Rancher dashboard]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/viewing-metrics/#rancher-dashboard) or directly from [Grafana]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#grafana).
**Result:** The Prometheus server will be deployed as well as two monitoring applications. The two monitoring applications, `cluster-monitoring` and `monitoring-operator`, are added as [applications]({{<baseurl>}}/rancher/v2.x/en/catalog/apps/) to the cluster's `system` project. After the applications are `active`, you can start viewing [cluster metrics]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/) through the Rancher dashboard or directly from [Grafana]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#grafana).
> The default username and password for the Grafana instance will be `admin/admin`. However, Grafana dashboards are served via the Rancher authentication proxy, so only users who are currently authenticated into the Rancher server have access to the Grafana dashboard.
@@ -38,7 +38,7 @@ Some of the biggest metrics to look out for:
1. Click on **Node Metrics**.
[_Get expressions for Cluster Metrics_]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/expression/#cluster-metrics)
[_Get expressions for Cluster Metrics_]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression/#cluster-metrics)
### Etcd Metrics
@@ -58,7 +58,7 @@ Some of the biggest metrics to look out for:
If this statistic suddenly grows, it usually indicates network communication issues that constantly force the cluster to elect a new leader.
[_Get expressions for Etcd Metrics_]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/expression/#etcd-metrics)
[_Get expressions for Etcd Metrics_]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression/#etcd-metrics)
### Kubernetes Components Metrics
@@ -90,13 +90,13 @@ Some of the more important component metrics to monitor are:
How fast ingress is routing connections to your cluster services.
[_Get expressions for Kubernetes Component Metrics_]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/expression/#kubernetes-components-metrics)
[_Get expressions for Kubernetes Component Metrics_]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression/#kubernetes-components-metrics)
## Rancher Logging Metrics
Although the Dashboard for a cluster primarily displays data sourced from Prometheus, it also displays information for cluster logging, provided that you have [configured Rancher to use a logging service]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/logging/).
[_Get expressions for Rancher Logging Metrics_]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/expression/#rancher-logging-metrics)
[_Get expressions for Rancher Logging Metrics_]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression/#rancher-logging-metrics)
## Finding Workload Metrics
@@ -113,4 +113,4 @@ Workload metrics display the hardware utilization for a Kubernetes workload. You
- **View the Pod Metrics:** Click on **Pod Metrics**.
- **View the Container Metrics:** In the **Containers** section, select a specific container and click on its name. Click on **Container Metrics**.
[_Get expressions for Workload Metrics_]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/expression/#workload-metrics)
[_Get expressions for Workload Metrics_]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression/#workload-metrics)
@@ -4,9 +4,10 @@ weight: 5
aliases:
- rancher/v2.x/en/project-admin/tools/monitoring/custom-metrics
- rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics
- /rancher/v2.x/en/cluster-admin/tools/monitoring/custom-metrics
---
After you've enabled [cluster level monitoring]({{< baseurl >}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring), You can view the metrics data from Rancher. You can also deploy the Prometheus custom metrics adapter then you can use the HPA with metrics stored in cluster monitoring.
After you've enabled [cluster level monitoring]({{< baseurl >}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#enabling-cluster-monitoring), you can view the metrics data from Rancher. You can also deploy the Prometheus custom metrics adapter and then use the Horizontal Pod Autoscaler (HPA) with metrics stored in cluster monitoring.
## Deploy Prometheus Custom Metrics Adapter
@@ -305,9 +306,7 @@ Each rule can be broken down into roughly four parts:
- *Querying*, which specifies how a request for a particular metric on one
or more Kubernetes objects should be turned into a query to Prometheus.
A more comprehensive configuration file can be found in
[sample-config.yaml](sample-config.yaml), but a basic config with one rule
might look like:
A basic config with one rule might look like:
```yaml
rules:
@@ -3,12 +3,12 @@ title: Prometheus Expressions
weight: 4
aliases:
- rancher/v2.x/en/project-admin/tools/monitoring/expression
- rancher/v2.x/en/cluster-admin/tools/monitoring/expression
- /rancher/v2.x/en/cluster-admin/tools/monitoring/expression
---
The PromQL expressions in this doc can be used to configure [alerts.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/alerts/)
> Before expression can be used in alerts, monitoring must be enabled. For more information, refer to the documentation on enabling monitoring [at the cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring) or [at the project level.]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring)
> Before expressions can be used in alerts, monitoring must be enabled. For more information, refer to the documentation on enabling monitoring [at the cluster level]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#enabling-cluster-monitoring) or [at the project level.]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring)
For more information about querying Prometheus, refer to the official [Prometheus documentation.](https://prometheus.io/docs/prometheus/latest/querying/basics/)
@@ -3,13 +3,13 @@ title: Prometheus Configuration
weight: 1
aliases:
- rancher/v2.x/en/project-admin/tools/monitoring/prometheus
- rancher/v2.x/en/cluster-admin/tools/monitoring/prometheus
- /rancher/v2.x/en/cluster-admin/tools/monitoring/prometheus/
---
_Available as of v2.2.0_
While configuring monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring) or [project level]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring), there are multiple options that can be configured.
While configuring monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#enabling-cluster-monitoring) or [project level]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring), there are multiple options that can be configured.
Option | Description
-------|-------------
@@ -8,11 +8,11 @@ aliases:
_Available as of v2.2.0_
After you've enabled monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring) or [project level]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring), you will want to be start viewing the data being collected. There are multiple ways to view this data.
After you've enabled monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#enabling-cluster-monitoring) or [project level]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring), you will want to start viewing the data being collected. There are multiple ways to view this data.
## Rancher Dashboard
>**Note:** This is only available if you've enabled monitoring at the [cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring). Project specific analytics must be viewed using the project's Grafana instance.
>**Note:** This is only available if you've enabled monitoring at the [cluster level]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#enabling-cluster-monitoring). Project specific analytics must be viewed using the project's Grafana instance.
Rancher's dashboards are available at multiple locations:
@@ -36,7 +36,7 @@ When analyzing these metrics, don't be concerned about any single standalone met
## Grafana
If you've enabled monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring) or [project level]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring), Rancher automatically creates a link to Grafana instance. Use this link to view monitoring data.
If you've enabled monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#enabling-cluster-monitoring) or [project level]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring), Rancher automatically creates a link to the Grafana instance. Use this link to view monitoring data.
Grafana allows you to query, visualize, alert, and ultimately, understand your cluster and workload data. For more information on Grafana and its capabilities, visit the [Grafana website](https://grafana.com/grafana).
@@ -9,8 +9,6 @@ _Available as of v2.2.4_
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution.
> For more information about how Prometheus works, refer to the [cluster administration section.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#about-prometheus)
This section covers the following topics:
- [Monitoring scope](#monitoring-scope)
@@ -21,13 +19,13 @@ This section covers the following topics:
### Monitoring Scope
Using Prometheus, you can monitor Rancher at both the [cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/) and project level. For each cluster and project that is enabled for monitoring, Rancher deploys a Prometheus server.
Using Prometheus, you can monitor Rancher at both the [cluster level]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) and project level. For each cluster and project that is enabled for monitoring, Rancher deploys a Prometheus server.
- [Cluster monitoring]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/) allows you to view the health of your Kubernetes cluster. Prometheus collects metrics from the cluster components below, which you can view in graphs and charts.
- [Cluster monitoring]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) allows you to view the health of your Kubernetes cluster. Prometheus collects metrics from the cluster components below, which you can view in graphs and charts.
- [Kubernetes control plane]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#kubernetes-components-metrics)
- [etcd database]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#etcd-metrics)
- [All nodes (including workers)]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#cluster-metrics)
- [Kubernetes control plane]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/#kubernetes-components-metrics)
- [etcd database]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/#etcd-metrics)
- [All nodes (including workers)]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/#cluster-metrics)
- Project monitoring allows you to view the state of pods running in a given project. Prometheus collects metrics from the project's deployed HTTP and TCP/UDP workloads.
@@ -37,13 +35,13 @@ Only [administrators]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/global-p
### Enabling Project Monitoring
> **Prerequisite:** Cluster monitoring must be [enabled.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/)
> **Prerequisite:** Cluster monitoring must be [enabled.]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/)
1. Go to the project where monitoring should be enabled. Note: When cluster monitoring is enabled, monitoring is also enabled by default in the **System** project.
1. Select **Tools > Monitoring** in the navigation bar.
1. Select **Enable** to show the [Prometheus configuration options]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/prometheus/). Enter in your desired configuration options.
1. Select **Enable** to show the [Prometheus configuration options]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/prometheus/). Enter your desired configuration options.
1. Click **Save**.
@@ -55,13 +53,13 @@ Prometheus|750m| 750Mi | 1000m | 1000Mi | Yes
Grafana | 100m | 100Mi | 200m | 200Mi | No
**Result:** A single application,`project-monitoring`, is added as an [application]({{<baseurl>}}/rancher/v2.x/en/catalog/apps/) to the project. After the application is `active`, you can start viewing [project metrics](#project-metrics) through the [Rancher dashboard]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#rancher-dashboard) or directly from [Grafana]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#grafana).
**Result:** A single application, `project-monitoring`, is added as an [application]({{<baseurl>}}/rancher/v2.x/en/catalog/apps/) to the project. After the application is `active`, you can start viewing [project metrics](#project-metrics) through the [Rancher dashboard]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#rancher-dashboard) or directly from [Grafana]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/#grafana).
> The default username and password for the Grafana instance will be `admin/admin`. However, Grafana dashboards are served via the Rancher authentication proxy, so only users who are currently authenticated into the Rancher server have access to the Grafana dashboard.
### Project Metrics
[Workload metrics]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/cluster-metrics/#workload-metrics) are available for the project if monitoring is enabled at the [cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/) and at the [project level.](#enabling-project-monitoring)
[Workload metrics]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/#workload-metrics) are available for the project if monitoring is enabled at the [cluster level]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) and at the [project level.](#enabling-project-monitoring)
You can monitor custom metrics from any [exporters.](https://prometheus.io/docs/instrumenting/exporters/) You can also expose some custom endpoints on deployments without needing to configure Prometheus for your project.
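
A custom endpoint serves its metrics in the Prometheus text exposition format. The sketch below shows only what such a response body looks like for a single gauge; it does not start an HTTP server or register anything with Rancher. The function name `render_gauge`, the metric name `demo_queue_depth`, and the label values are hypothetical and chosen purely for illustration.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs. Hypothetical helper,
    written for illustration only.
    """
    lines = [
        "# HELP " + name + " " + help_text,
        "# TYPE " + name + " gauge",
    ]
    for labels, value in samples:
        # Labels are sorted so the output is stable between runs.
        pairs = ",".join(
            '{}="{}"'.format(key, val) for key, val in sorted(labels.items())
        )
        selector = "{" + pairs + "}" if pairs else ""
        lines.append("{}{} {}".format(name, selector, value))
    return "\n".join(lines) + "\n"


# Illustrative metric: the name and labels are not defined by Rancher.
body = render_gauge(
    "demo_queue_depth",
    "Jobs waiting in the queue",
    [({"queue": "default"}, 3)],
)
print(body, end="")
```

In practice an existing exporter or a Prometheus client library would normally produce this output; the sketch is only meant to make the format concrete. It does not escape special characters in label values.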
@@ -4,9 +4,10 @@ weight: 1
aliases:
- rancher/v2.x/en/project-admin/tools/notifiers
- rancher/v2.x/en/cluster-admin/tools/notifiers
- /rancher/v2.x/en/cluster-admin/tools/notifiers
---
> In Rancher 2.5, the notifier application was improved. There are now two ways to enable notifiers. The older way is documented in this section, and the new application for notifiers is documented in the [dashboard section.]({{<baseurl>}}/rancher/v2.x/en/dashboard/notifiers)
> In Rancher 2.5, the notifier application was improved. There are now two ways to enable notifiers. The older way is documented in this section, and the new application for notifiers is documented [here.]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting)
Notifiers are services that inform you of alert events. You can configure notifiers to send alert notifications to staff best suited to take corrective action.
@@ -3,7 +3,7 @@ title: OPA Gatekeeper
weight: 17
aliases:
- /rancher/v2.x/en/cluster-admin/tools/opa-gatekeeper
- /rancher/v2.x/en/opa-gatekeper/Open%20Policy%20Agent
---
_Available as of v2.4.0_
@@ -48,9 +48,9 @@ The Rancher API server is built on top of an embedded Kubernetes API server and
### Cluster Visibility
- **Logging:** Rancher can integrate with a variety of popular logging services and tools that exist outside of your Kubernetes clusters. Logging can be set up [at the cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/logging/) or [at the project level.]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/logging/)
- **Monitoring:** Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with Prometheus, a leading open-source monitoring solution. Monitoring can be configured [at the cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/) or [at the project level.]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/)
- **Alerting:** To keep your clusters and applications healthy and driving your organizational productivity forward, you need to stay informed of events occurring in your clusters and projects, both planned and unplanned. To help you stay informed of these events, you can configure alerts [at the cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/alerts/) or [at the project level.]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/alerts/)
- **Logging:** Rancher can integrate with a variety of popular logging services and tools that exist outside of your Kubernetes clusters.
- **Monitoring:** Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with Prometheus, a leading open-source monitoring solution.
- **Alerting:** To keep your clusters and applications healthy and driving your organizational productivity forward, you need to stay informed of events occurring in your clusters and projects, both planned and unplanned.
# Editing Downstream Clusters with Rancher
@@ -309,7 +309,7 @@ timeout: 30
# Notifications
You can enable notifications to any [notifiers]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers/) based on the build status of a pipeline. Before enabling notifications, Rancher recommends [setting up notifiers]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers/#adding-notifiers) so it will be easy to add recipients immediately.
You can enable notifications to any [notifiers]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers/) based on the build status of a pipeline. Before enabling notifications, Rancher recommends [setting up notifiers]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/notifiers/#adding-notifiers) so it will be easy to add recipients immediately.
### Configuring Notifications by UI
@@ -319,7 +319,7 @@ _Available as of v2.2.0_
1. Select the conditions for the notification. You can select to get a notification for the following statuses: `Failed`, `Success`, `Changed`. For example, if you want to receive notifications when an execution fails, select **Failed**.
1. If you don't have any existing [notifiers]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers), Rancher will provide a warning that no notifiers are set up and provide a link to be able to go to the notifiers page. Follow the [instructions]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers/#adding-notifiers) to add a notifier. If you already have notifiers, you can add them to the notification by clicking the **Add Recipient** button.
1. If you don't have any existing [notifiers]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/notifiers), Rancher will provide a warning that no notifiers are set up and provide a link to be able to go to the notifiers page. Follow the [instructions]({{<baseurl>}}/rancher/v2.x/en/monitoring-alerting/legacy/notifiers/#adding-notifiers) to add a notifier. If you already have notifiers, you can add them to the notification by clicking the **Add Recipient** button.
> **Note:** Notifiers are configured at a cluster level and require a different level of permissions.
@@ -26,7 +26,7 @@ _**Available as of v2.4.6**_
_Requirements_
If admins have [enforced TTL on kubeconfig tokens](../../api/api-tokens/#setting-ttl-on-kubeconfig-tokens), the kubeconfig file requires the [Rancher cli](../cli) to be present in your PATH when you run `kubectl`. Otherwise, you'll see an error like:
If admins have [enforced TTL on kubeconfig tokens]({{<baseurl>}}/rancher/v2.x/en/api/api-tokens/#setting-ttl-on-kubeconfig-tokens), the kubeconfig file requires the [Rancher cli](../cli) to be present in your PATH when you run `kubectl`. Otherwise, you'll see an error like:
`Unable to connect to the server: getting credentials: exec: exec: "rancher": executable file not found in $PATH`.
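The PATH requirement above can be checked before running `kubectl`. The following is a minimal sketch, not part of the Rancher documentation: it assumes the CLI binary is named `rancher` (as the error message suggests), and the helper names `find_rancher_cli` and `preflight_message` are hypothetical.

```python
import shutil
from typing import Optional


def find_rancher_cli(name: str = "rancher") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which(name)


def preflight_message(name: str = "rancher") -> str:
    """Describe whether the executable kubectl would exec is available."""
    path = find_rancher_cli(name)
    if path is None:
        # This is the situation in which kubectl reports
        # 'executable file not found in $PATH'.
        return f'"{name}" not found in PATH; add the Rancher CLI to PATH before running kubectl'
    return f'"{name}" found at {path}'


if __name__ == "__main__":
    print(preflight_message())
```

Running the script on the machine where you use `kubectl` shows whether the executable can be resolved. It only checks that the binary is on PATH, not that it can authenticate against the Rancher server.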
This feature enables kubectl to authenticate with the Rancher server and get a new kubeconfig token when required. The following auth providers are currently supported: