Merge pull request #2749 from catherineluse/staging-merge

Pull master into staging
This commit is contained in:
Catherine Luse
2020-10-05 20:53:06 -07:00
committed by GitHub
61 changed files with 4899 additions and 180 deletions
+23
@@ -314,6 +314,29 @@ rpm -i https://rpm.rancher.io/k3s-selinux-0.1.1-rc1.el7.noarch.rpm
To force the install script to log a warning rather than fail, you can set the following environment variable: `INSTALL_K3S_SELINUX_WARN=true`.
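For example, when using the standard install script, the variable can be set inline (a minimal sketch, assuming the usual `get.k3s.io` installer; combine with any other install variables you normally pass):
```bash
# Log a warning instead of failing if the SELinux policy cannot be applied
curl -sfL https://get.k3s.io | INSTALL_K3S_SELINUX_WARN=true sh -
```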
The way that SELinux enforcement is enabled or disabled depends on the K3s version. Prior to v1.19.x, SELinux enablement for the builtin containerd was automatic but could be disabled by passing `--disable-selinux`. With v1.19.x and beyond, enabling SELinux must be affirmatively configured via the `--selinux` flag or config file entry. Servers and agents that specify both the `--selinux` and (deprecated) `--disable-selinux` flags will fail to start.
Using a custom `--data-dir` under SELinux is not supported. To customize it, you would most likely need to write your own custom policy. For guidance, you could refer to the [containers/container-selinux](https://github.com/containers/container-selinux) repository, which contains the SELinux policy files for Container Runtimes, and the [rancher/k3s-selinux](https://github.com/rancher/k3s-selinux) repository, which contains the SELinux policy for K3s.
{{% tabs %}}
{{% tab "K3s v1.19.1+k3s1" %}}
To leverage experimental SELinux, specify the `--selinux` flag when starting K3s servers and agents.
This option can also be specified in the K3s [configuration file:]({{<baseurl>}}/k3s/latest/en/installation/install-options/#configuration-file)
```
selinux: true
```
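Equivalently, the flag can be passed on the command line when starting the server (a minimal sketch; add your other server flags as needed):
```bash
k3s server --selinux
```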
The `--disable-selinux` option should not be used. It is deprecated, and in future minor releases it will either be ignored or unrecognized, resulting in an error.
{{%/tab%}}
{{% tab "K3s prior to v1.19.1+k3s1" %}}
You can turn off SELinux enforcement in the embedded containerd by launching K3s with the `--disable-selinux` flag.
{{%/tab%}}
{{% /tabs %}}
Note that support for SELinux in containerd is still under development. Progress can be tracked in [this pull request](https://github.com/containerd/cri/pull/1246).
+1 -1
@@ -56,4 +56,4 @@ A unique node ID can be appended to the hostname by launching K3s servers or age
# Automatically Deployed Manifests
The [manifests](https://github.com/rancher/k3s/tree/master/manifests) located at the directory path `/var/lib/rancher/k3s/server/manifests` are bundled into the K3s binary at build time.
The [manifests](https://github.com/rancher/k3s/tree/master/manifests) located at the directory path `/var/lib/rancher/k3s/server/manifests` are bundled into the K3s binary at build time. These will be installed at runtime by the [rancher/helm-controller.](https://github.com/rancher/helm-controller#helm-controller)
@@ -0,0 +1,46 @@
---
title: Backup and Restore Embedded etcd Datastore (Experimental)
shortTitle: Backup and Restore
weight: 26
---
_Available as of v1.19.1+k3s1_
In this section, you'll learn how to create backups of the K3s cluster data and to restore the cluster from backup.
> This is an experimental feature available for K3s clusters with an embedded etcd datastore. If you installed K3s with an external datastore, refer to the upstream documentation for the database for information on backing up the cluster data.
### Creating Snapshots
Snapshots are enabled by default.
The snapshot directory defaults to `${data-dir}/server/db/snapshots` (`/var/lib/rancher/k3s/server/db/snapshots` with the default data directory).
To configure the snapshot interval or the number of retained snapshots, refer to the [options.](#options)
### Restoring a Cluster from a Snapshot
When K3s is restored from backup, the old data directory will be moved to `${data-dir}/server/db/etcd-old/`. K3s then attempts to restore the snapshot by creating a new data directory and starting etcd as a new single-member K3s cluster.
To restore the cluster from backup, run K3s with the `--cluster-reset` option, with the `--cluster-reset-restore-path` also given:
```
./k3s server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
```
**Result:** A message in the logs says that K3s can be restarted without the flags. Start K3s again; it should run successfully and be restored from the specified snapshot.
### Options
These options can be passed in with the command line, or in the [configuration file,]({{<baseurl>}}/k3s/latest/en/installation/install-options/#configuration-file ) which may be easier to use.
| Options | Description |
| ----------- | --------------- |
| `--etcd-disable-snapshots` | Disable automatic etcd snapshots |
| `--etcd-snapshot-schedule-cron` value | Snapshot interval time in cron spec, e.g. every 5 hours: `0 */5 * * *` (default: `0 */12 * * *`) |
| `--etcd-snapshot-retention` value | Number of snapshots to retain (default: 5) |
| `--etcd-snapshot-dir` value | Directory to save db snapshots. (Default location: `${data-dir}/db/snapshots`) |
| `--cluster-reset` | Forget all peers and become the sole member of a new cluster. This can also be set with the environment variable `[$K3S_CLUSTER_RESET]`. |
| `--cluster-reset-restore-path` value | Path to the snapshot file to be restored |
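For example, a server could be started with a custom snapshot schedule and retention policy (a sketch; the values are illustrative):
```bash
# Snapshot every 6 hours and keep the 10 most recent snapshots
k3s server \
  --etcd-snapshot-schedule-cron="0 */6 * * *" \
  --etcd-snapshot-retention=10
```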
+55 -51
@@ -3,48 +3,28 @@ title: Helm
weight: 42
---
K3s release _v1.17.0+k3s.1_ added support for Helm 3. You can access the Helm 3 documentation [here](https://helm.sh/docs/intro/quickstart/).
Helm is the package management tool of choice for Kubernetes. Helm charts provide templating syntax for Kubernetes YAML manifest documents. With Helm we can create configurable deployments instead of just using static files. For more information about creating your own catalog of deployments, check out the docs at [https://helm.sh/docs/intro/quickstart/](https://helm.sh/docs/intro/quickstart/).
Helm is the package management tool of choice for Kubernetes. Helm charts provide templating syntax for Kubernetes YAML manifest documents. With Helm we can create configurable deployments instead of just using static files. For more information about creating your own catalog of deployments, check out the docs at https://helm.sh/.
K3s does not require any special configuration to start using Helm 3. Just be sure you have properly set up your kubeconfig as per the section about [cluster access.](../cluster-access)
K3s does not require any special configuration to use with Helm command-line tools. Just be sure you have properly set up your kubeconfig as per the section about [cluster access](../cluster-access). K3s does include some extra functionality to make deploying both traditional Kubernetes resource manifests and Helm Charts even easier with the [rancher/helm-release CRD.](#using-the-helm-crd)
This section covers the following topics:
- [Upgrading Helm](#upgrading-helm)
- [Deploying manifests and Helm charts](#deploying-manifests-and-helm-charts)
- [Automatically Deploying Manifests and Helm Charts](#automatically-deploying-manifests-and-helm-charts)
- [Using the Helm CRD](#using-the-helm-crd)
- [Customizing Packaged Components with HelmChartConfig](#customizing-packaged-components-with-helmchartconfig)
- [Upgrading from Helm v2](#upgrading-from-helm-v2)
### Upgrading Helm
### Automatically Deploying Manifests and Helm Charts
If you were using Helm v2 in previous versions of K3s, you may upgrade to v1.17.0+k3s.1 or newer and Helm 2 will still function. If you wish to migrate to Helm 3, [this](https://helm.sh/blog/migrate-from-helm-v2-to-helm-v3/) blog post by Helm explains how to use a plugin to successfully migrate. Refer to the official Helm 3 documentation [here](https://helm.sh/docs/) for more information. K3s will handle either Helm v2 or Helm v3 as of v1.17.0+k3s.1. Just be sure you have properly set your kubeconfig as per the examples in the section about [cluster access.](../cluster-access)
Any Kubernetes manifests found in `/var/lib/rancher/k3s/server/manifests` will automatically be deployed to K3s in a manner similar to `kubectl apply`. Manifests deployed in this manner are managed as AddOn custom resources, and can be viewed by running `kubectl get addon -A`. You will find AddOns for packaged components such as CoreDNS, Local-Storage, Traefik, etc. AddOns are created automatically by the deploy controller, and are named based on their filename in the manifests directory.
Note that Helm 3 no longer requires Tiller and the `helm init` command. Refer to the official documentation for details.
It is also possible to deploy Helm charts as AddOns. K3s includes a [Helm Controller](https://github.com/rancher/helm-controller/) that manages Helm charts using a HelmChart Custom Resource Definition (CRD).
### Deploying Manifests and Helm Charts
### Using the Helm CRD
Any file found in `/var/lib/rancher/k3s/server/manifests` will automatically be deployed to Kubernetes in a manner similar to `kubectl apply`.
> **Note:** K3s versions through v0.5.0 used `k3s.cattle.io/v1` as the apiVersion for HelmCharts. This has been changed to `helm.cattle.io/v1` for later versions.
It is also possible to deploy Helm charts. K3s supports a CRD controller for installing charts. A YAML file specification can look as follows (example taken from `/var/lib/rancher/k3s/server/manifests/traefik.yaml`):
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: kube-system
spec:
  chart: stable/traefik
  set:
    rbac.enabled: "true"
    ssl.enabled: "true"
```
Keep in mind that the `namespace` in your HelmChart resource metadata section should always be `kube-system`, because the K3s deploy controller is configured to watch this namespace for new HelmChart resources. If you want to specify the namespace for the actual Helm release, you can do so using the `targetNamespace` key under the `spec` directive, as shown in the configuration example below.
> **Note:** In order for the Helm Controller to know which version of Helm to use to Auto-Deploy a helm app, please specify the `helmVersion` in the spec of your YAML file.
Also note that besides `set`, you can use `valuesContent` under the `spec` directive. And it's okay to use both of them:
The [HelmChart resource definition](https://github.com/rancher/helm-controller#helm-controller) captures most of the options you would normally pass to the `helm` command-line tool. Here's an example of how you might deploy Grafana from the default chart repository, overriding some of the default chart values. Note that the HelmChart resource itself is in the `kube-system` namespace, but the chart's resources will be deployed to the `monitoring` namespace.
```yaml
apiVersion: helm.cattle.io/v1
@@ -68,34 +48,58 @@ spec:
enabled: true
```
K3s versions `<= v0.5.0` used `k3s.cattle.io` for the API group of HelmCharts. This has been changed to `helm.cattle.io` for later versions.
#### HelmChart Field Definitions
### Using the Helm CRD
| Field | Default | Description | Helm Argument / Flag Equivalent |
|-------|---------|-------------|-------------------------------|
| name | | Helm Chart name | NAME |
| spec.chart | | Helm Chart name in repository, or complete HTTPS URL to chart archive (.tgz) | CHART |
| spec.targetNamespace | default | Helm Chart target namespace | `--namespace` |
| spec.version | | Helm Chart version (when installing from repository) | `--version` |
| spec.repo | | Helm Chart repository URL | `--repo` |
| spec.helmVersion | v3 | Helm version to use (`v2` or `v3`) | |
| spec.bootstrap | False | Set to True if this chart is needed to bootstrap the cluster (Cloud Controller Manager, etc) | |
| spec.set | | Override simple default Chart values. These take precedence over options set via valuesContent. | `--set` / `--set-string` |
| spec.valuesContent | | Override complex default Chart values via YAML file content | `--values` |
| spec.chartContent | | Base64-encoded chart archive .tgz - overrides spec.chart | CHART |
You can deploy a third-party Helm chart using an example like this:
Content placed in `/var/lib/rancher/k3s/server/static/` can be accessed anonymously via the Kubernetes APIServer from within the cluster. This URL can be templated using the special variable `%{KUBERNETES_API}%` in the `spec.chart` field. For example, the packaged Traefik component loads its chart from `https://%{KUBERNETES_API}%/static/charts/traefik-1.81.0.tgz`.
### Customizing Packaged Components with HelmChartConfig
To allow overriding values for packaged components that are deployed as HelmCharts (such as Traefik), K3s versions starting with v1.19.0+k3s1 support customizing deployments via HelmChartConfig resources. The HelmChartConfig resource must match the name and namespace of its corresponding HelmChart, and supports providing additional `valuesContent`, which is passed to the `helm` command as an additional value file.
> **Note:** HelmChart `spec.set` values override HelmChart and HelmChartConfig `spec.valuesContent` settings.
For example, to customize the packaged Traefik ingress configuration, you can create a file named `/var/lib/rancher/k3s/server/manifests/traefik-config.yaml` and populate it with the following content:
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    image: traefik
    imageTag: v1.7.26-alpine
    proxyProtocol:
      enabled: true
      trustedIPs:
        - 10.0.0.0/8
    forwardedHeaders:
      enabled: true
      trustedIPs:
        - 10.0.0.0/8
    ssl:
      enabled: true
      permanentRedirect: false
```
You can install a specific version of a Helm chart using an example like this:
### Upgrading from Helm v2
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: stable/nginx-ingress
  namespace: kube-system
spec:
  chart: nginx-ingress
  version: 1.24.4
  targetNamespace: default
```
> **Note:** K3s versions starting with v1.17.0+k3s.1 support Helm v3, and will use it by default. Helm v2 charts can be used by setting `helmVersion: v2` in the spec.
If you were using Helm v2 in previous versions of K3s, you may upgrade to v1.17.0+k3s.1 or newer and Helm 2 will still function. If you wish to migrate to Helm 3, [this](https://helm.sh/blog/migrate-from-helm-v2-to-helm-v3/) blog post by Helm explains how to use a plugin to successfully migrate. Refer to the official Helm 3 documentation [here](https://helm.sh/docs/) for more information. K3s will handle either Helm v2 or Helm v3 as of v1.17.0+k3s.1. Just be sure you have properly set your kubeconfig as per the examples in the section about [cluster access.](../cluster-access)
Note that Helm 3 no longer requires Tiller and the `helm init` command. Refer to the official documentation for details.
@@ -7,7 +7,7 @@ The ability to run Kubernetes using a datastore other than etcd sets K3s apart f
* If your team doesn't have expertise in operating etcd, you can choose an enterprise-grade SQL database like MySQL or PostgreSQL
* If you need to run a simple, short-lived cluster in your CI/CD environment, you can use the embedded SQLite database
* If you wish to deploy Kubernetes on the edge and require a highly available solution but can't afford the operational overhead of managing a database at the edge, you can use K3s's embedded HA datastore built on top of DQLite (currently experimental)
* If you wish to deploy Kubernetes on the edge and require a highly available solution but can't afford the operational overhead of managing a database at the edge, you can use K3s's embedded HA datastore built on top of embedded etcd (currently experimental)
K3s supports the following datastore options:
@@ -16,7 +16,7 @@ K3s supports the following datastore options:
* [MySQL](https://www.mysql.com/) (certified against version 5.7)
* [MariaDB](https://mariadb.org/) (certified against version 10.3.20)
* [etcd](https://etcd.io/) (certified against version 3.3.15)
* Embedded [DQLite](https://dqlite.io/) for High Availability (experimental)
* Embedded etcd for High Availability (experimental)
### External Datastore Configuration Parameters
If you wish to use an external datastore such as PostgreSQL, MySQL, or etcd you must set the `datastore-endpoint` parameter so that K3s knows how to connect to it. You may also specify parameters to configure the authentication and encryption of the connection. The below table summarizes these parameters, which can be passed as either CLI flags or environment variables.
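For example, a server pointed at an external MySQL database might be started like this (a sketch; the username, password, host, and database name are placeholders, so adjust the connection string to your environment):
```bash
k3s server \
  --datastore-endpoint="mysql://username:password@tcp(hostname:3306)/k3s"
```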
@@ -94,5 +94,6 @@ K3S_DATASTORE_KEYFILE='/path/to/client.key' \
k3s server
```
### Embedded DQLite for HA (Experimental)
K3s's use of DQLite is similar to its use of SQLite. It is simple to set up and manage. As such, there is no external configuration or additional steps to take in order to use this option. Please see [High Availability with Embedded DB (Experimental)]({{<baseurl>}}/k3s/latest/en/installation/ha-embedded/) for instructions on how to run with this option.
### Embedded etcd for HA (Experimental)
Please see [High Availability with Embedded DB (Experimental)]({{<baseurl>}}/k3s/latest/en/installation/ha-embedded/) for instructions on how to run with this option.
@@ -3,9 +3,15 @@ title: "High Availability with Embedded DB (Experimental)"
weight: 40
---
As of v1.0.0, K3s is previewing support for running a highly available control plane without the need for an external database. This means there is no need to manage an external etcd or SQL datastore in order to run a reliable production-grade setup. While this feature is currently experimental, we expect it to be the primary architecture for running HA K3s clusters in the future.
K3s is previewing support for running a highly available control plane without the need for an external database. This means there is no need to manage an external etcd or SQL datastore.
This architecture is achieved by embedding a dqlite database within the K3s server process. DQLite is short for "distributed SQLite." According to https://dqlite.io, it is "*a fast, embedded, persistent SQL database with Raft consensus that is perfect for fault-tolerant IoT and Edge devices.*" This makes it a natural fit for K3s.
In K3s 1.0.0, Dqlite was used as the experimental embedded database. In K3s v1.19.1+, embedded etcd is used.
Please note that upgrades from experimental Dqlite to experimental embedded etcd are not supported. If you attempt an upgrade it will not succeed and data will be lost.
### Embedded etcd (Experimental)
_Available as of K3s v1.19.1_
To run K3s in this mode, you must have an odd number of server nodes. We recommend starting with three nodes.
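For example, the first server can initialize the embedded etcd cluster with the `--cluster-init` flag (a sketch; `SECRET` is a placeholder token that the other servers must reuse):
```bash
K3S_TOKEN=SECRET k3s server --cluster-init
```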
@@ -20,3 +26,13 @@ K3S_TOKEN=SECRET k3s server --server https://<ip or hostname of server1>:6443
```
Now you have a highly available control plane. Joining additional worker nodes to the cluster follows the same procedure as a single server cluster.
### Embedded Dqlite (Deprecated)
> **Warning:** Experimental etcd replaced experimental Dqlite in the K3s v1.19.1 release. This is a breaking change. Please note that upgrades from experimental Dqlite to experimental embedded etcd are not supported. If you attempt an upgrade it will not succeed and data will be lost.
As of v1.0.0, K3s previewed support for running a highly available control plane without the need for an external database.
This architecture is achieved by embedding a Dqlite database within the K3s server process. DQLite is short for "distributed SQLite." According to https://dqlite.io, it is "*a fast, embedded, persistent SQL database with Raft consensus that is perfect for fault-tolerant IoT and Edge devices.*"
To run K3s with the embedded Dqlite database, follow the same steps as the [embedded etcd database](#embedded-etcd-experimental) using a K3s release between v1.0.0 and v1.19.1.
@@ -48,7 +48,7 @@ To configure TLS certificates when launching server nodes, refer to the [datasto
> **Note:** The same installation options available to single-server installs are also available for high-availability installs. For more details, see the [Installation and Configuration Options]({{<baseurl>}}/k3s/latest/en/installation/install-options/) documentation.
By default, server nodes will be schedulable and thus your workloads can get launched on them. If you wish to have a dedicated control plane where no user workloads will run, you can use taints. The <span style='white-space: nowrap'>`node-taint`</span> parameter will allow you to configure nodes with taints, for example <span style='white-space: nowrap'>`--node-taint k3s-controlplane=true:NoExecute`</span>.
By default, server nodes will be schedulable and thus your workloads can get launched on them. If you wish to have a dedicated control plane where no user workloads will run, you can use taints. The <span style='white-space: nowrap'>`node-taint`</span> parameter will allow you to configure nodes with taints, for example <span style='white-space: nowrap'>`--node-taint CriticalAddonsOnly=true:NoExecute`</span>.
Once you've launched the `k3s server` process on all server nodes, ensure that the cluster has come up properly with `k3s kubectl get nodes`. You should see your server nodes in the Ready state.
@@ -9,6 +9,9 @@ This page focuses on the options that can be used when you set up K3s for the fi
- [Options for installation from binary](#options-for-installation-from-binary)
- [Registration options for the K3s server](#registration-options-for-the-k3s-server)
- [Registration options for the K3s agent](#registration-options-for-the-k3s-agent)
- [Configuration File](#configuration-file)
In addition to configuring K3s with environment variables and CLI arguments, K3s can also use a [config file.](#configuration-file)
For more advanced options, refer to [this page.]({{<baseurl>}}/k3s/latest/en/advanced)
@@ -71,3 +74,36 @@ For details on configuring the K3s server, refer to the [server configuration re
### Registration Options for the K3s Agent
For details on configuring the K3s agent, refer to the [agent configuration reference.]({{<baseurl>}}/k3s/latest/en/installation/install-options/agent-config)
### Configuration File
In addition to configuring K3s with environment variables and CLI arguments, K3s can also use a config file.
By default, values present in a YAML file located at `/etc/rancher/k3s/config.yaml` will be used on install.
An example of a basic `server` config file is below:
```yaml
write-kubeconfig-mode: "0644"
tls-san:
- "foo.local"
node-label:
- "foo=bar"
- "something=amazing"
```
In general, CLI arguments map to their respective YAML key, with repeatable CLI arguments being represented as YAML lists.
An identical configuration using solely CLI arguments is shown below to demonstrate this:
```bash
k3s server \
--write-kubeconfig-mode "0644" \
--tls-san "foo.local" \
--node-label "foo=bar" \
--node-label "something=amazing"
```
It is also possible to use both a configuration file and CLI arguments. In these situations, values will be loaded from both sources, but CLI arguments will take precedence. For repeatable arguments such as `--node-label`, the CLI arguments will overwrite all values in the list.
Finally, the location of the config file can be changed either through the cli argument `--config FILE, -c FILE`, or the environment variable `$K3S_CONFIG_FILE`.
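For example, an alternate config file location could be supplied either way (a sketch; the file path is a placeholder):
```bash
k3s server --config /etc/rancher/k3s/server-config.yaml
# or equivalently
K3S_CONFIG_FILE=/etc/rancher/k3s/server-config.yaml k3s server
```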
@@ -33,7 +33,7 @@ If you are using **Alpine Linux**, follow [these steps]({{<baseurl>}}/k3s/latest
Hardware requirements scale based on the size of your deployments. Minimum recommendations are outlined here.
* RAM: 512MB Minimum
* RAM: 512MB Minimum (we recommend at least 1GB)
* CPU: 1 Minimum
#### Disks
@@ -25,7 +25,7 @@ Mirrors is a directive that defines the names and endpoints of the private regis
```
mirrors:
  mycustomreg.com:
    endpoint:
      - "https://mycustomreg.com:5000"
```
@@ -41,6 +41,7 @@ Directive | Description
`cert_file` | The client certificate path that will be used to authenticate with the registry
`key_file` | The client key path that will be used to authenticate with the registry
`ca_file` | Defines the CA certificate path to be used to verify the registry's server cert file
`insecure_skip_verify` | Boolean that defines if TLS verification should be skipped for the registry
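As an illustration of how these directives fit together, a `configs` entry in `registries.yaml` might look roughly like the following sketch (the registry name and certificate paths are placeholders, not values from this document):
```yaml
configs:
  "mycustomreg.com:5000":
    tls:
      cert_file: /path/to/client.crt   # client certificate used to authenticate
      key_file: /path/to/client.key    # client key used to authenticate
      ca_file: /path/to/ca.crt         # CA certificate used to verify the registry
      insecure_skip_verify: false      # keep TLS verification enabled
```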
The credentials consist of either username/password or authentication token:
+65 -12
@@ -3,21 +3,27 @@ title: "Networking"
weight: 35
---
>**Note:** CNI options are covered in detail on the [Installation Network Options]({{<baseurl>}}/k3s/latest/en/installation/network-options/) page. Please reference that page for details on Flannel and the various flannel backend options or how to set up your own CNI.
This page explains how CoreDNS, the Traefik Ingress controller, and Klipper service load balancer work within K3s.
Open Ports
----------
Please reference the [Installation Requirements]({{<baseurl>}}/k3s/latest/en/installation/installation-requirements/#networking) page for port information.
Refer to the [Installation Network Options]({{<baseurl>}}/k3s/latest/en/installation/network-options/) page for details on Flannel configuration options and backend selection, or how to set up your own CNI.
CoreDNS
-------
For information on which ports need to be opened for K3s, refer to the [Installation Requirements.]({{<baseurl>}}/k3s/latest/en/installation/installation-requirements/#networking)
- [CoreDNS](#coredns)
- [Traefik Ingress Controller](#traefik-ingress-controller)
- [Service Load Balancer](#service-load-balancer)
- [How the Service LB Works](#how-the-service-lb-works)
- [Usage](#usage)
- [Excluding the Service LB from Nodes](#excluding-the-service-lb-from-nodes)
- [Disabling the Service LB](#disabling-the-service-lb)
# CoreDNS
CoreDNS is deployed on start of the agent. To disable, run each server with the `--disable coredns` option.
If you don't install CoreDNS, you will need to install a cluster DNS provider yourself.
Traefik Ingress Controller
--------------------------
# Traefik Ingress Controller
[Traefik](https://traefik.io/) is a modern HTTP reverse proxy and load balancer made to deploy microservices with ease. It simplifies networking complexity while designing, deploying, and running applications.
@@ -25,15 +31,59 @@ Traefik is deployed by default when starting the server. For more information se
The Traefik ingress controller will use ports 80, 443, and 8080 on the host (i.e. these will not be usable for HostPort or NodePort).
You can tweak traefik to meet your needs by setting options in the traefik.yaml file. Refer to the official [Traefik for Helm Configuration Parameters](https://github.com/helm/charts/tree/master/stable/traefik#configuration) readme for more information.
Traefik can be configured by editing the `traefik.yaml` file. To prevent k3s from using or overwriting the modified version, deploy k3s with `--no-deploy traefik` and store the modified copy in the `k3s/server/manifests` directory. For more information, refer to the official [Traefik for Helm Configuration Parameters.](https://github.com/helm/charts/tree/master/stable/traefik#configuration)
To disable it, start each server with the `--disable traefik` option.
Service Load Balancer
---------------------
# Service Load Balancer
K3s includes a basic service load balancer that uses available host ports. If you try to create a load balancer that listens on port 80, for example, it will try to find a free host in the cluster for port 80. If no port is available, the load balancer will stay in Pending.
Any service load balancer (LB) can be leveraged in your Kubernetes cluster. K3s provides a load balancer known as [Klipper Load Balancer](https://github.com/rancher/klipper-lb) that uses available host ports.
Upstream Kubernetes allows a Service of type LoadBalancer to be created, but doesn't include the implementation of the LB. Some LB services require a cloud provider such as Amazon EC2 or Microsoft Azure. By contrast, the K3s service LB makes it possible to use an LB service without a cloud provider.
### How the Service LB Works
K3s creates a controller that creates a Pod for the service load balancer, which is a Kubernetes object of kind [Service.](https://kubernetes.io/docs/concepts/services-networking/service/)
For each service load balancer, a [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) is created. The DaemonSet creates a pod with the `svc` prefix on each node.
The Service LB controller listens for other Kubernetes Services. After it finds a Service, it creates a proxy Pod for the service using a DaemonSet on all of the nodes. This Pod becomes a proxy to the other Service, so that for example, requests coming to port 8000 on a node could be routed to your workload on port 8888.
If the Service LB runs on a node that has an external IP, it uses the external IP.
If multiple Services are created, a separate DaemonSet is created for each Service.
It is possible to run multiple Services on the same node, as long as they use different ports.
If you try to create a Service LB that listens on port 80, the Service LB will try to find a free host in the cluster for port 80. If no host with that port is available, the LB will stay in Pending.
### Usage
Create a [Service of type LoadBalancer](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer) in K3s.
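For example, a minimal manifest might look like the following (the name, selector, and ports are illustrative):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-lb-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80         # port the Service LB exposes on a node
      targetPort: 8080 # port your workload listens on
```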
### Excluding the Service LB from Nodes
To exclude nodes from running the Service LB, add the following label to the nodes that should run it:
```
svccontroller.k3s.cattle.io/enablelb
```
If the label is used, the service load balancer only runs on the labeled nodes.
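For example, a node can be labeled with `kubectl` (the node name is a placeholder, and a value such as `true` is commonly used):
```bash
kubectl label node my-node svccontroller.k3s.cattle.io/enablelb=true
```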
### Disabling the Service LB
To disable the embedded load balancer, run the server with the `--disable servicelb` option. This is necessary if you wish to run a different load balancer, such as MetalLB.
# Nodes Without a Hostname
@@ -41,3 +91,6 @@ Nodes Without a Hostname
Some cloud providers, such as Linode, will create machines with "localhost" as the hostname and others may not have a hostname set at all. This can cause problems with domain name resolution. You can run K3s with the `--node-name` flag or the `K3S_NODE_NAME` environment variable to pass the node name and resolve this issue.
+2
@@ -8,3 +8,5 @@ This section describes how to upgrade your K3s cluster.
[Upgrade basics]({{< baseurl >}}/k3s/latest/en/upgrades/basic/) describes several techniques for upgrading your cluster manually. It can also be used as a basis for upgrading through third-party Infrastructure-as-Code tools like [Terraform](https://www.terraform.io/).
[Automated upgrades]({{< baseurl >}}/k3s/latest/en/upgrades/automated/) describes how to perform Kubernetes-native automated upgrades using Rancher's [system-upgrade-controller](https://github.com/rancher/system-upgrade-controller).
> The experimental embedded Dqlite data store was deprecated in K3s v1.19.1. Please note that upgrades from experimental Dqlite to experimental embedded etcd are not supported. If you attempt an upgrade it will not succeed and data will be lost.
@@ -38,4 +38,4 @@ For example, on my current test system, I have set the kernel boot line to:
printk.devkmsg=on console=tty1 rancher.autologin=tty1 console=ttyS0 rancher.autologin=ttyS0 rancher.state.dev=LABEL=RANCHER_STATE rancher.state.autoformat=[/dev/sda,/dev/vda] rancher.rm_usr loglevel=8 netconsole=+9999@10.0.2.14/,514@192.168.42.223/
```
The kernel boot parameters can be set during installation using `sudo ros install --append "...."`, or on an installed RancherOS system, by running `sudo ros config syslinx` (which will start vi in a container, editing the `global.cfg` boot config file.
The kernel boot parameters can be set during installation using `sudo ros install --append "...."`, or on an installed RancherOS system, by running `sudo ros config syslinux` (which will start vi in a container, editing the `global.cfg` boot config file).
@@ -3,7 +3,7 @@ title: Quick Start
weight: 1
---
If you have a specific RanchersOS machine requirements, please check out our [guides on running RancherOS]({{< baseurl >}}/os/v1.x/en/installation/platform/). With the rest of this guide, we'll start up a RancherOS using [Docker machine]({{< baseurl >}}/os/v1.x/en/installation/workstation//docker-machine/) and show you some of what RancherOS can do.
If you have specific RancherOS machine requirements, please check out our [guides on running RancherOS]({{< baseurl >}}/os/v1.x/en/installation/). With the rest of this guide, we'll start up RancherOS using [Docker Machine]({{< baseurl >}}/os/v1.x/en/installation/workstation//docker-machine/) and show you some of what RancherOS can do.
### Launching RancherOS using Docker Machine
@@ -100,7 +100,7 @@ The table below details the parameters for the group schema configuration.
| Search Attribute | Attribute used to construct search filters when adding groups to clusters or projects. See description of user schema `Search Attribute`. |
| Search Filter | This filter gets applied to the list of groups that is searched when Rancher attempts to add groups to a site access list or tries to add groups to clusters or projects. For example, a group search filter could be <code>(&#124;(cn=group1)(cn=group2))</code>. Note: If the search filter does not use [valid AD search syntax,](https://docs.microsoft.com/en-us/windows/win32/adsi/search-filter-syntax) the list of groups will be empty. |
| Group DN Attribute | The name of the group attribute whose format matches the values in the user attribute describing the user's memberships. See `User Member Attribute`. |
| Nested Group Membership | This settings defines whether Rancher should resolve nested group memberships. Use only if your organisation makes use of these nested memberships (ie. you have groups that contain other groups as members). |
| Nested Group Membership | This setting defines whether Rancher should resolve nested group memberships. Use only if your organisation makes use of these nested memberships (i.e. you have groups that contain other groups as members). We advise avoiding nested groups when possible. |
---
@@ -24,9 +24,36 @@ If your organization uses Keycloak Identity Provider (IdP) for user authenticati
><sup>1</sup>: Optionally, you can enable either one or both of these settings.
><sup>2</sup>: Rancher SAML metadata won't be generated until a SAML provider is configured and saved.
{{< img "/img/rancher/keycloak/keycloak-saml-client-configuration.png" "">}}
- In the new SAML client, create Mappers to expose the users' fields
- Add all "Builtin Protocol Mappers"
{{< img "/img/rancher/keycloak/keycloak-saml-client-builtin-mappers.png" "">}}
- Create a new "Group list" mapper to map the member attribute to a user's groups
{{< img "/img/rancher/keycloak/keycloak-saml-client-group-mapper.png" "">}}
- Export a `metadata.xml` file from your Keycloak client:
From the `Installation` tab, choose the `SAML Metadata IDPSSODescriptor` format option and download your file.
>**Note**
> Keycloak versions 6.0.0 and up no longer provide the IDP metadata under the `Installation` tab.
> You can still get the XML from the following url:
>
> `https://{KEYCLOAK-URL}/auth/realms/{REALM-NAME}/protocol/saml/descriptor`
>
> The XML obtained from this URL contains `EntitiesDescriptor` as the root element. Rancher expects the root element to be `EntityDescriptor` rather than `EntitiesDescriptor`. So before passing this XML to Rancher, follow these steps to adjust it:
>
> * Copy any attributes from `EntitiesDescriptor` that are not already present onto the `EntityDescriptor` element.
> * Remove the `<EntitiesDescriptor>` tag from the beginning.
> * Remove the `</EntitiesDescriptor>` from the end of the xml.
>
> You are left with something similar to the example below:
>
> ```
> <EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" entityID="https://{KEYCLOAK-URL}/auth/realms/{REALM-NAME}">
> ....
> </EntityDescriptor>
> ```
## Configuring Keycloak in Rancher
@@ -35,18 +62,18 @@ If your organization uses Keycloak Identity Provider (IdP) for user authenticati
1. Select **Keycloak**.
1. Complete the **Configure Keycloak Account** form. Keycloak IdP lets you specify what data store you want to use. You can either add a database or use an existing LDAP server. For example, if you select your Active Directory (AD) server, the examples below describe how you can map AD attributes to fields within Rancher.
1. Complete the **Configure Keycloak Account** form.
| Field | Description |
| ------------------------- | ----------------------------------------------------------------------------- |
| Display Name Field | The AD attribute that contains the display name of users. |
| User Name Field | The AD attribute that contains the user name/given name. |
| UID Field | An AD attribute that is unique to every user. |
| Groups Field | Make entries for managing group memberships. |
| Rancher API Host | The URL for your Rancher Server. |
| Private Key / Certificate | A key/certificate pair to create a secure shell between Rancher and your IdP. |
| IDP-metadata | The `metadata.xml` file that you exported from your IdP server. |
| Field | Description |
| ------------------------- | -------------------------------------------------------------------------------------- |
| Display Name Field | The attribute that contains the display name of users. <br/><br/>Example: `givenName` |
| User Name Field | The attribute that contains the user name/given name. <br/><br/>Example: `email` |
| UID Field | An attribute that is unique to every user. <br/><br/>Example: `email` |
| Groups Field | Make entries for managing group memberships. <br/><br/>Example: `member` |
| Rancher API Host | The URL for your Rancher Server. |
| Private Key / Certificate | A key/certificate pair to create a secure shell between Rancher and your IdP. |
| IDP-metadata | The `metadata.xml` file that you exported from your IdP server. |
>**Tip:** You can generate a key/certificate pair using an openssl command. For example:
>
@@ -96,25 +123,3 @@ Try configuring and saving keycloak as your SAML provider and then accessing the
* Check your Keycloak log.
* If the log displays `request validation failed: org.keycloak.common.VerificationException: SigAlg was null`, set `Client Signature Required` to `OFF` in your Keycloak client.
### Keycloak 6.0.0+: IDPSSODescriptor missing from options
Keycloak versions 6.0.0 and up no longer provide the IDP metadata under the `Installation` tab.
You can still get the XML from the following url:
`https://{KEYCLOAK-URL}/auth/realms/{REALM-NAME}/protocol/saml/descriptor`
The XML obtained from this URL contains `EntitiesDescriptor` as the root element. Rancher expects the root element to be `EntityDescriptor` rather than `EntitiesDescriptor`. So before passing this XML to Rancher, follow these steps to adjust it:
* Copy all the tags from `EntitiesDescriptor` to the `EntityDescriptor`.
* Remove the `<EntitiesDescriptor>` tag from the beginning.
* Remove the `</EntitiesDescriptor>` from the end of the xml.
You are left with something similar to the example below:
```
<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" entityID="https://{KEYCLOAK-URL}/auth/realms/{REALM-NAME}">
....
</EntityDescriptor>
```
@@ -33,6 +33,8 @@ The option to refresh the Kubernetes metadata is available for administrators by
To force Rancher to refresh the Kubernetes metadata, a manual refresh action is available under **Tools > Drivers > Refresh Kubernetes Metadata** in the upper right corner.
You can configure Rancher to only refresh metadata when desired by setting `refresh-interval-minutes` to `0` (see below) and using this button to perform the metadata refresh manually.
### Configuring the Metadata Synchronization
> Only administrators can change these settings.
@@ -88,4 +90,4 @@ After new Kubernetes versions are loaded into the Rancher setup, additional step
1. Download `rancher-images.txt`.
1. Prepare the private registry using the same steps during the [air gap install]({{<baseurl>}}/rancher/v2.x/en/installation/other-installation-methods/air-gap/populate-private-registry), but instead of using the `rancher-images.txt` from the releases page, use the one obtained from the previous steps.
**Result:** The air gap installation of Rancher can now sync the Kubernetes metadata. If you update your private registry when new versions of Kubernetes are released, you can provision clusters with the new version without having to upgrade Rancher.
**Result:** The air gap installation of Rancher can now sync the Kubernetes metadata. If you update your private registry when new versions of Kubernetes are released, you can provision clusters with the new version without having to upgrade Rancher.
@@ -51,6 +51,8 @@ The steps to add custom roles differ depending on the version of Rancher.
1. Use the **Grant Resources** options to assign individual [Kubernetes API endpoints](https://kubernetes.io/docs/reference/) to the role.
> When viewing the resources associated with default roles created by Rancher, if there are multiple Kubernetes API resources on one line item, the resource will have `(Custom)` appended to it. These are not custom resources but just an indication that there are multiple Kubernetes API resources as one resource.
> The Resource text field provides a method to search for pre-defined Kubernetes API resources, or enter a custom resource name for the grant. The pre-defined or `(Custom)` resource must be selected from the dropdown, after entering a resource name into this field.
You can also choose the individual cURL methods (`Create`, `Delete`, `Get`, etc.) available for use with each endpoint you assign.
@@ -82,6 +84,8 @@ The steps to add custom roles differ depending on the version of Rancher.
1. Use the **Grant Resources** options to assign individual [Kubernetes API endpoints](https://kubernetes.io/docs/reference/) to the role.
> When viewing the resources associated with default roles created by Rancher, if there are multiple Kubernetes API resources on one line item, the resource will have `(Custom)` appended to it. These are not custom resources but just an indication that there are multiple Kubernetes API resources as one resource.
> The Resource text field provides a method to search for pre-defined Kubernetes API resources, or enter a custom resource name for the grant. The pre-defined or `(Custom)` resource must be selected from the dropdown, after entering a resource name into this field.
You can also choose the individual cURL methods (`Create`, `Delete`, `Get`, etc.) available for use with each endpoint you assign.
@@ -109,6 +113,9 @@ To create a custom global role based on an existing role,
1. Enter a name for the role.
1. Optional: To assign the custom role default for new users, go to the **New User Default** section and click **Yes: Default role for new users.**
1. In the **Grant Resources** section, select the Kubernetes resource operations that will be enabled for users with the custom role.
> The Resource text field provides a method to search for pre-defined Kubernetes API resources, or enter a custom resource name for the grant. The pre-defined or `(Custom)` resource must be selected from the dropdown, after entering a resource name into this field.
1. Click **Save.**
### Creating a Custom Global Role that Does Not Copy Rules from Another Role
@@ -120,6 +127,9 @@ Custom global roles don't have to be based on existing roles. To create a custom
1. Enter a name for the role.
1. Optional: To assign the custom role default for new users, go to the **New User Default** section and click **Yes: Default role for new users.**
1. In the **Grant Resources** section, select the Kubernetes resource operations that will be enabled for users with the custom role.
> The Resource text field provides a method to search for pre-defined Kubernetes API resources, or enter a custom resource name for the grant. The pre-defined or `(Custom)` resource must be selected from the dropdown, after entering a resource name into this field.
1. Click **Save.**
## Deleting a Custom Global Role
@@ -7,8 +7,6 @@ _Permissions_ are individual access rights that you can assign when selecting a
Global Permissions define user authorization outside the scope of any particular cluster. Out-of-the-box, there are three default global permissions: `Administrator`, `Standard User` and `User-base`.
In Rancher v2.5, a restricted-admin role was added.
- **Administrator:** These users have full control over the entire Rancher system and all clusters within it.
- <a id="user"></a>**Standard User:** These users can create new clusters and use them. Standard users can also assign other users permissions to their clusters.
@@ -7,6 +7,7 @@ By default, some cluster-level API tokens are generated with infinite time-to-li
You can deactivate API tokens by deleting them or by deactivating the user account.
### Deleting tokens
To delete a token,
1. Go to the list of all tokens in the Rancher API view at `https://<Rancher-Server-IP>/v3/tokens`.
@@ -19,7 +20,7 @@ Here is the complete list of tokens that are generated with `ttl=0`:
| Token | Description |
|-------|-------------|
| `kubeconfig-*` | Kubeconfig token |
| `kubeconfig-*` | Kubeconfig token |
| `kubectl-shell-*` | Access to `kubectl` shell in the browser |
| `agent-*` | Token for agent deployment |
| `compose-token-*` | Token for compose |
@@ -27,3 +28,21 @@ Here is the complete list of tokens that are generated with `ttl=0`:
| `*-pipeline*` | Pipeline token for project |
| `telemetry-*` | Telemetry token |
| `drain-node-*` | Token for drain (we use `kubectl` for drain because there is no native Kubernetes API) |
### Setting TTL on Kubeconfig Tokens
_**Available as of v2.4.6**_
Starting with Rancher v2.4.6, admins can set a global TTL on kubeconfig tokens. Once the token expires, the `kubectl` command will require the user to authenticate to Rancher again.
1. Disable the `kubeconfig-generate-token` setting in the Rancher API view at `https://<Rancher-Server-IP>/v3/settings/kubeconfig-generate-token`. This setting instructs Rancher to no longer automatically generate a token when a user clicks to download a kubeconfig file. The kubeconfig file will now provide a command to log in to Rancher.
2. Edit the setting and set the value to `false`.
3. Go to the `kubeconfig-token-ttl-minutes` setting in the Rancher API view at `https://<Rancher-Server-IP>/v3/settings/kubeconfig-token-ttl-minutes`. By default, `kubeconfig-token-ttl-minutes` is 960 (16 hours).
4. Edit the setting and set the value to the desired duration in minutes.
_**Note:**_ This value cannot exceed the max-ttl of API tokens (`https://<Rancher-Server-IP>/v3/settings/auth-token-max-ttl-minutes`). In Rancher v2.4.6, `auth-token-max-ttl-minutes` is set to 1440 (24 hours) by default. Starting with Rancher v2.4.7, `auth-token-max-ttl-minutes` defaults to 0, allowing tokens to never expire, similar to v2.4.5.
@@ -0,0 +1,134 @@
---
title: Restoring Backups—Kubernetes installs
shortTitle: Kubernetes Installs
weight: 370
aliases:
- /rancher/v2.x/en/installation/after-installation/ha-backup-and-restoration/
---
This procedure describes how to use RKE to restore a snapshot of the Rancher Kubernetes cluster.
This will restore the Kubernetes configuration and the Rancher database and state.
> **Note:** This document covers clusters set up with RKE >= v0.2.x, for older RKE versions refer to the [RKE Documentation]({{<baseurl>}}/rke/latest/en/etcd-snapshots/restoring-from-backup).
## Restore Outline
<!-- TOC -->
- [1. Preparation](#1-preparation)
- [2. Place Snapshot](#2-place-snapshot)
- [3. Configure RKE](#3-configure-rke)
- [4. Restore the Database and bring up the Cluster](#4-restore-the-database-and-bring-up-the-cluster)
<!-- /TOC -->
### 1. Preparation
It is advised that you run the restore from your local host or a jump box/bastion where your cluster yaml, rke statefile, and kubeconfig are stored. You will need [RKE]({{<baseurl>}}/rke/latest/en/installation/) and [kubectl]({{<baseurl>}}/rancher/v2.x/en/faq/kubectl/) CLI utilities installed locally.
Prepare by creating 3 new nodes to be the target for the restored Rancher instance. We recommend that you start with fresh nodes and a clean state. For clarification on the requirements, review the [Installation Requirements](https://rancher.com/docs/rancher/v2.x/en/installation/requirements/).
Alternatively you can re-use the existing nodes after clearing Kubernetes and Rancher configurations. This will destroy the data on these nodes. See [Node Cleanup]({{<baseurl>}}/rancher/v2.x/en/faq/cleaning-cluster-nodes/) for the procedure.
> **IMPORTANT:** Before starting the restore make sure all the Kubernetes services on the old cluster nodes are stopped. We recommend powering off the nodes to be sure.
### 2. Place Snapshot
As of RKE v0.2.0, snapshots can be saved to an S3-compatible backend. To restore your cluster from a snapshot stored in an S3-compatible backend, you can skip this step and retrieve the snapshot in [4. Restore the Database and bring up the Cluster](#4-restore-the-database-and-bring-up-the-cluster). Otherwise, you will need to place the snapshot directly on one of the etcd nodes.
Pick one of the clean nodes that will have the etcd role assigned and place the zip-compressed snapshot file in `/opt/rke/etcd-snapshots` on that node.
> **Note:** Because of a current limitation in RKE, the restore process does not work correctly if `/opt/rke/etcd-snapshots` is a NFS share that is mounted on all nodes with the etcd role. The easiest options are to either keep `/opt/rke/etcd-snapshots` as a local folder during the restore process and only mount the NFS share there after it has been completed, or to only mount the NFS share to one node with an etcd role in the beginning.
### 3. Configure RKE
Use your original `rancher-cluster.yml` and `rancher-cluster.rkestate` files. If they are not stored in a version control system, it is a good idea to back them up before making any changes.
```
cp rancher-cluster.yml rancher-cluster.yml.bak
cp rancher-cluster.rkestate rancher-cluster.rkestate.bak
```
If the replaced or cleaned nodes have been configured with new IP addresses, modify the `rancher-cluster.yml` file to ensure the address and optional internal_address fields reflect the new addresses.
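For example, a node entry in `rancher-cluster.yml` might be updated along these lines (the addresses, user, and roles are placeholders):
```yaml
nodes:
  - address: 203.0.113.11          # new public IP of the replacement node
    internal_address: 10.0.0.11    # optional internal IP
    user: ubuntu
    role: [controlplane, worker, etcd]
```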
> **IMPORTANT:** You should not rename the `rancher-cluster.yml` or `rancher-cluster.rkestate` files. It is important that the filenames match each other.
### 4. Restore the Database and bring up the Cluster
You will now use the RKE command-line tool with the `rancher-cluster.yml` and the `rancher-cluster.rkestate` configuration files to restore the etcd database and bring up the cluster on the new nodes.
> **Note:** Ensure your `rancher-cluster.rkestate` is present in the same directory as the `rancher-cluster.yml` file before starting the restore, as this file contains the certificate data for the cluster.
#### Restoring from a Local Snapshot
When restoring etcd from a local snapshot, the snapshot is assumed to be located on the target node in the directory `/opt/rke/etcd-snapshots`.
```
rke etcd snapshot-restore --name snapshot-name --config ./rancher-cluster.yml
```
> **Note:** The `--name` parameter expects the filename of the snapshot without the extension.
#### Restoring from a Snapshot in S3
_Available as of RKE v0.2.0_
When restoring etcd from a snapshot located in an S3 compatible backend, the command needs the S3 information in order to connect to the S3 backend and retrieve the snapshot.
```
$ rke etcd snapshot-restore --config ./rancher-cluster.yml --name snapshot-name \
--s3 --access-key S3_ACCESS_KEY --secret-key S3_SECRET_KEY \
--bucket-name s3-bucket-name --s3-endpoint s3.amazonaws.com \
--folder folder-name # Available as of v2.3.0
```
#### Options for `rke etcd snapshot-restore`
S3 specific options are only available for RKE v0.2.0+.
| Option | Description | S3 Specific |
| --- | --- | ---|
| `--name` value | Specify snapshot name | |
| `--config` value | Specify an alternate cluster YAML file (default: "cluster.yml") [$RKE_CONFIG] | |
| `--s3` | Enabled backup to s3 |* |
| `--s3-endpoint` value | Specify s3 endpoint url (default: "s3.amazonaws.com") | * |
| `--access-key` value | Specify s3 accessKey | *|
| `--secret-key` value | Specify s3 secretKey | *|
| `--bucket-name` value | Specify s3 bucket name | *|
| `--folder` value | Specify s3 folder in the bucket name _Available as of v2.3.0_ | *|
| `--region` value | Specify the s3 bucket location (optional) | *|
| `--ssh-agent-auth` | [Use SSH Agent Auth defined by SSH_AUTH_SOCK]({{<baseurl>}}/rke/latest/en/config-options/#ssh-agent) | |
| `--ignore-docker-version` | [Disable Docker version check]({{<baseurl>}}/rke/latest/en/config-options/#supported-docker-versions) | |
#### Testing the Cluster
Once RKE completes it will have created a credentials file in the local directory. Configure `kubectl` to use the `kube_config_rancher-cluster.yml` credentials file and check on the state of the cluster. See [Installing and Configuring kubectl]({{<baseurl>}}/rancher/v2.x/en/faq/kubectl/#configuration) for details.
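For example, assuming the credentials file was created in the current directory:
```bash
export KUBECONFIG=$(pwd)/kube_config_rancher-cluster.yml
kubectl get nodes
```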
#### Check Kubernetes Pods
Wait for the pods running in `kube-system`, `ingress-nginx` and the `rancher` pod in `cattle-system` to return to the `Running` state.
> **Note:** `cattle-cluster-agent` and `cattle-node-agent` pods will be in an `Error` or `CrashLoopBackOff` state until Rancher server is up and the DNS/Load Balancer have been pointed at the new cluster.
```
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-system cattle-cluster-agent-766585f6b-kj88m 0/1 Error 6 4m
cattle-system cattle-node-agent-wvhqm 0/1 Error 8 8m
cattle-system rancher-78947c8548-jzlsr 0/1 Running 1 4m
ingress-nginx default-http-backend-797c5bc547-f5ztd 1/1 Running 1 4m
ingress-nginx nginx-ingress-controller-ljvkf 1/1 Running 1 8m
kube-system canal-4pf9v 3/3 Running 3 8m
kube-system cert-manager-6b47fc5fc-jnrl5 1/1 Running 1 4m
kube-system kube-dns-7588d5b5f5-kgskt 3/3 Running 3 4m
kube-system kube-dns-autoscaler-5db9bbb766-s698d 1/1 Running 1 4m
kube-system metrics-server-97bc649d5-6w7zc 1/1 Running 1 4m
kube-system tiller-deploy-56c4cf647b-j4whh 1/1 Running 1 4m
```
#### Finishing Up
Rancher should now be running and available to manage your Kubernetes clusters. Review the [recommended architecture]({{<baseurl>}}/rancher/v2.x/en/installation/k8s-install/#recommended-architecture) for Kubernetes installations and update the endpoints for Rancher DNS or the Load Balancer that you built during Step 1 of the Kubernetes install ([1. Create Nodes and Load Balancer]({{<baseurl>}}/rancher/v2.x/en/installation/k8s-install/create-nodes-lb/#load-balancer)) to target the new cluster. Once the endpoints are updated, the agents on your managed clusters should automatically reconnect. This may take 10-15 minutes due to reconnect back off timeouts.
> **IMPORTANT:** Remember to save your updated RKE config (`rancher-cluster.yml`), state file (`rancher-cluster.rkestate`), and `kubectl` credentials (`kube_config_rancher-cluster.yml`) in a safe place for future maintenance, for example in a version control system.
@@ -35,3 +35,5 @@ Rancher contains a variety of tools that aren't included in Kubernetes to assist
- Monitoring
- Istio Service Mesh
- OPA Gatekeeper
For more information, see [Tools]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/)
@@ -13,6 +13,11 @@ This kubeconfig file and its contents are specific to the cluster you are viewin
After you download the kubeconfig file, you will be able to use the kubeconfig file and its Kubernetes [contexts](https://kubernetes.io/docs/reference/kubectl/cheatsheet/#kubectl-context-and-configuration) to access your downstream cluster.
_Available as of v2.4.6_
If admins have [enforced TTL on kubeconfig tokens](../../api/api-tokens/#setting-ttl-on-kubeconfig-tokens), the kubeconfig file requires the [Rancher CLI](../cli) to be present in your PATH.
### Two Authentication Methods for RKE Clusters
If the cluster is not an [RKE cluster,]({{<baseurl>}}/rancher/v2.x/en/cluster-provisioning/rke-clusters/) the kubeconfig file allows you to access the cluster in only one way: you authenticate with the Rancher server, and Rancher then allows you to run kubectl commands on the cluster.
@@ -0,0 +1,31 @@
---
title: Release Notes
---
# Important note on Istio 1.5.x versions
When upgrading from any 1.4 version of Istio to any 1.5 version, the Rancher installer will delete several resources in order to complete the upgrade, at which point they will be immediately re-installed. This includes the `istio-reader-service-account`. If your Istio installation is using this service account, be aware that any secrets tied to the service account will be deleted. Most notably, this will **break specific [multi-cluster deployments](https://archive.istio.io/v1.4/docs/setup/install/multicluster/)**. Downgrades back to 1.4 are not possible.
See the official upgrade notes for additional information on the 1.5 release and upgrading from 1.4: https://istio.io/latest/news/releases/1.5.x/announcing-1.5/upgrade-notes/
> **Note:** Rancher continues to use the Helm installation method, which produces a different architecture from an istioctl installation.
## Istio 1.5.9 release notes
**Bug fixes**
* The Kiali traffic graph is now working [#28109](https://github.com/rancher/rancher/issues/28109)
**Known Issues**
* The Kiali traffic graph is offset in the UI [#28207](https://github.com/rancher/rancher/issues/28207)
## Istio 1.5.8 release notes
**Known Issues**
* The Kiali traffic graph is currently not working [#24924](https://github.com/istio/istio/issues/24924)
@@ -0,0 +1,489 @@
---
title: Prometheus Custom Metrics Adapter
weight: 5
---
After you've enabled [cluster level monitoring]({{< baseurl >}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring), you can view the metrics data from Rancher. You can also deploy the Prometheus custom metrics adapter so that you can use the HPA with metrics stored in cluster monitoring.
## Deploy Prometheus Custom Metrics Adapter
We are going to use the [Prometheus custom metrics adapter](https://github.com/DirectXMan12/k8s-prometheus-adapter/releases/tag/v0.5.0), version v0.5.0. This is a great example of the [custom metrics server](https://github.com/kubernetes-incubator/custom-metrics-apiserver). You must be the *cluster owner* to execute the following steps.
- Get the service account that cluster monitoring is using. It is configured in the workload with ID `statefulset:cattle-prometheus:prometheus-cluster-monitoring`. If you didn't customize anything, the service account name should be `cluster-monitoring`.
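As a sanity check, you could confirm which service account the monitoring StatefulSet references with something like the following sketch (the StatefulSet name comes from the workload ID above and may differ if you customized monitoring):

```bash
# Print the service account used by the cluster monitoring StatefulSet
kubectl -n cattle-prometheus get statefulset prometheus-cluster-monitoring \
  -o jsonpath='{.spec.template.spec.serviceAccountName}'
```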
- Grant permission to that service account. You will need two kinds of permission.
One role is `extension-apiserver-authentication-reader` in `kube-system`, so you will need to create a `RoleBinding` in `kube-system`. This permission allows the adapter to read the API aggregation configuration from the ConfigMap in `kube-system`.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: custom-metrics-auth-reader
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
name: cluster-monitoring
namespace: cattle-prometheus
```
The other one is the cluster role `system:auth-delegator`, so you will need to create a `ClusterRoleBinding`. This grants the adapter permission to perform subject access reviews.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: custom-metrics:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
name: cluster-monitoring
namespace: cattle-prometheus
```
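If you save the two bindings above as YAML files, they can be applied with `kubectl`; the file names below are only examples:

```bash
# Grant the cluster-monitoring service account the two permissions described above
kubectl apply -f custom-metrics-auth-reader-rolebinding.yaml
kubectl apply -f custom-metrics-system-auth-delegator-clusterrolebinding.yaml
```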
- Create the configuration for the custom metrics adapter. The following is an example configuration. Configuration details are explained later in this page.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: adapter-config
namespace: cattle-prometheus
data:
config.yaml: |
rules:
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters: []
resources:
overrides:
namespace:
resource: namespace
pod_name:
resource: pod
name:
matches: ^container_(.*)_seconds_total$
as: ""
metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
- isNot: ^container_.*_seconds_total$
resources:
overrides:
namespace:
resource: namespace
pod_name:
resource: pod
name:
matches: ^container_(.*)_total$
as: ""
metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
- isNot: ^container_.*_total$
resources:
overrides:
namespace:
resource: namespace
pod_name:
resource: pod
name:
matches: ^container_(.*)$
as: ""
metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}) by (<<.GroupBy>>)
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
seriesFilters:
- isNot: .*_total$
resources:
template: <<.Resource>>
name:
matches: ""
as: ""
metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
seriesFilters:
- isNot: .*_seconds_total
resources:
template: <<.Resource>>
name:
matches: ^(.*)_total$
as: ""
metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
seriesFilters: []
resources:
template: <<.Resource>>
name:
matches: ^(.*)_seconds_total$
as: ""
metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
resourceRules:
cpu:
containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[1m])) by (<<.GroupBy>>)
resources:
overrides:
instance:
resource: node
namespace:
resource: namespace
pod_name:
resource: pod
containerLabel: container_name
memory:
containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
resources:
overrides:
instance:
resource: node
namespace:
resource: namespace
pod_name:
resource: pod
containerLabel: container_name
window: 1m
```
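Assuming you save the ConfigMap above to a file (the file name below is just an example), it can be created in the cluster with:

```bash
# Create the adapter configuration in the cattle-prometheus namespace
kubectl apply -f adapter-config.yaml
```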
- Create HTTPS TLS certs for your API server. You can use the following command to create a self-signed certificate.
```bash
openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out serving.crt -keyout serving.key -subj "/C=CN/CN=custom-metrics-apiserver.cattle-prometheus.svc.cluster.local"
# serving.crt and serving.key will be created in the current directory. Next, create a secret in the cattle-prometheus namespace.
kubectl create secret generic -n cattle-prometheus cm-adapter-serving-certs --from-file=serving.key=./serving.key --from-file=serving.crt=./serving.crt
```
- Then you can create the Prometheus custom metrics adapter. You will also need a service for this deployment. Creating them via Import YAML in Rancher will do. Please create those resources in the `cattle-prometheus` namespace.
Here is the Prometheus custom metrics adapter deployment.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: custom-metrics-apiserver
name: custom-metrics-apiserver
namespace: cattle-prometheus
spec:
replicas: 1
selector:
matchLabels:
app: custom-metrics-apiserver
template:
metadata:
labels:
app: custom-metrics-apiserver
name: custom-metrics-apiserver
spec:
serviceAccountName: cluster-monitoring
containers:
- name: custom-metrics-apiserver
image: directxman12/k8s-prometheus-adapter-amd64:v0.5.0
args:
- --secure-port=6443
- --tls-cert-file=/var/run/serving-cert/serving.crt
- --tls-private-key-file=/var/run/serving-cert/serving.key
- --logtostderr=true
- --prometheus-url=http://prometheus-operated/
- --metrics-relist-interval=1m
- --v=10
- --config=/etc/adapter/config.yaml
ports:
- containerPort: 6443
volumeMounts:
- mountPath: /var/run/serving-cert
name: volume-serving-cert
readOnly: true
- mountPath: /etc/adapter/
name: config
readOnly: true
- mountPath: /tmp
name: tmp-vol
volumes:
- name: volume-serving-cert
secret:
secretName: cm-adapter-serving-certs
- name: config
configMap:
name: adapter-config
- name: tmp-vol
emptyDir: {}
```
Here is the service of the deployment.
```yaml
apiVersion: v1
kind: Service
metadata:
name: custom-metrics-apiserver
namespace: cattle-prometheus
spec:
ports:
- port: 443
targetPort: 6443
selector:
app: custom-metrics-apiserver
```
- Create API service for your custom metric server.
```yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
name: v1beta1.custom.metrics.k8s.io
spec:
service:
name: custom-metrics-apiserver
namespace: cattle-prometheus
group: custom.metrics.k8s.io
version: v1beta1
insecureSkipTLSVerify: true
groupPriorityMinimum: 100
versionPriority: 100
```
- Then you can verify your custom metrics server with `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1`. If the API returns data, the metrics server has been set up successfully.
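As a sketch, the verification call below lists the metric names generated from the rules in your adapter configuration; the pretty-printing step assumes Python is available and can be dropped:

```bash
# List all custom metrics currently exposed by the adapter (pretty-printed)
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | python -m json.tool
```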
- You can now create an HPA with custom metrics. You will need to create an nginx deployment in your namespace first.
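One minimal way to create that deployment is sketched below, assuming the public `nginx` image is acceptable for testing; replace `<your-namespace>` with the namespace you are working in:

```bash
# Create a simple nginx deployment for the HPA to target
kubectl -n <your-namespace> create deployment nginx --image=nginx
```

With the deployment in place, an HPA that scales on the custom `memory_usage_bytes` metric exposed by the adapter could look like the following example.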
```yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
name: nginx
spec:
scaleTargetRef:
# point the HPA at the nginx deployment you just created
apiVersion: apps/v1
kind: Deployment
name: nginx
# autoscale between 1 and 10 replicas
minReplicas: 1
maxReplicas: 10
metrics:
# use a "Pods" metric, which takes the average of the
# given metric across all pods controlled by the autoscaling target
- type: Pods
pods:
metricName: memory_usage_bytes
targetAverageValue: 5000000
```
You should then see your nginx deployment scaling up, which confirms that HPA with custom metrics is working.
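A sketch of commands you might use to watch the HPA react; the names assume the example above:

```bash
# Watch the HPA pick up the custom metric and adjust the replica count
kubectl -n <your-namespace> get hpa nginx --watch

# Inspect the raw metric values the adapter reports for the nginx pods
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<your-namespace>/pods/*/memory_usage_bytes"
```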
## Configuration of the Prometheus custom metrics adapter
> Refer to https://github.com/DirectXMan12/k8s-prometheus-adapter/blob/master/docs/config.md
The adapter determines which metrics to expose, and how to expose them,
through a set of "discovery" rules. Each rule is executed independently
(so make sure that your rules are mutually exclusive), and specifies each
of the steps the adapter needs to take to expose a metric in the API.
Each rule can be broken down into roughly four parts:
- *Discovery*, which specifies how the adapter should find all Prometheus
metrics for this rule.
- *Association*, which specifies how the adapter should determine which
Kubernetes resources a particular metric is associated with.
- *Naming*, which specifies how the adapter should expose the metric in
the custom metrics API.
- *Querying*, which specifies how a request for a particular metric on one
or more Kubernetes objects should be turned into a query to Prometheus.
A more comprehensive configuration file can be found in the [Prometheus custom metrics adapter repository](https://github.com/DirectXMan12/k8s-prometheus-adapter), but a basic config with one rule might look like:
```yaml
rules:
# this rule matches cumulative cAdvisor metrics measured in seconds
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
  resources:
    # skip specifying generic resource<->label mappings, and just
    # attach only pod and namespace resources by mapping label names to group-resources
    overrides:
      namespace: {resource: "namespace"}
      pod_name: {resource: "pod"}
  # specify that the `container_` and `_seconds_total` suffixes should be removed.
  # this also introduces an implicit filter on metric family names
  name:
    # we use the value of the capture group implicitly as the API name
    # we could also explicitly write `as: "$1"`
    matches: "^container_(.*)_seconds_total$"
  # specify how to construct a query to fetch samples for a given series
  # This is a Go template where the `.Series` and `.LabelMatchers` string values
  # are available, and the delimiters are `<<` and `>>` to avoid conflicts with
  # the prometheus query language
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)'
```
### Discovery
Discovery governs the process of finding the metrics that you want to
expose in the custom metrics API. There are two fields that factor into
discovery: `seriesQuery` and `seriesFilters`.
`seriesQuery` specifies a Prometheus series query (as passed to the
`/api/v1/series` endpoint in Prometheus) to use to find some set of
Prometheus series. The adapter will strip the label values from this
series, and then use the resulting metric-name-label-names combinations
later on.
In many cases, `seriesQuery` will be sufficient to narrow down the list of
Prometheus series. However, sometimes (especially if two rules might
otherwise overlap), it's useful to do additional filtering on metric
names. In this case, `seriesFilters` can be used. After the list of
series is returned from `seriesQuery`, each series has its metric name
filtered through any specified filters.
Filters may be either:
- `is: <regex>`, which matches any series whose name matches the specified
regex.
- `isNot: <regex>`, which matches any series whose name does not match the
specified regex.
For example:
```yaml
# match all cAdvisor metrics that aren't measured in seconds
seriesQuery: '{__name__=~"^container_.*_total",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
  - isNot: "^container_.*_seconds_total"
```
### Association
Association governs the process of figuring out which Kubernetes resources
a particular metric could be attached to. The `resources` field controls
this process.
There are two ways to associate resources with a particular metric. In
both cases, the value of the label becomes the name of the particular
object.
One way is to specify that any label name that matches some particular
pattern refers to some group-resource based on the label name. This can
be done using the `template` field. The pattern is specified as a Go
template, with the `Group` and `Resource` fields representing group and
resource. You don't necessarily have to use the `Group` field (in which
case the group is guessed by the system). For instance:
```yaml
# any label `kube_<group>_<resource>` becomes <group>.<resource> in Kubernetes
resources:
  template: "kube_<<.Group>>_<<.Resource>>"
```
The other way is to specify that some particular label represents some
particular Kubernetes resource. This can be done using the `overrides`
field. Each override maps a Prometheus label to a Kubernetes
group-resource. For instance:
```yaml
# the microservice label corresponds to the apps.deployment resource
resources:
  overrides:
    microservice: {group: "apps", resource: "deployment"}
```
These two can be combined, so you can specify both a template and some
individual overrides.
The resources mentioned can be any resource available in your Kubernetes
cluster, as long as you've got a corresponding label.
### Naming
Naming governs the process of converting a Prometheus metric name into
a metric in the custom metrics API, and vice versa. It's controlled by
the `name` field.
Naming is controlled by specifying a pattern to extract an API name from
a Prometheus name, and potentially a transformation on that extracted
value.
The pattern is specified in the `matches` field, and is just a regular
expression. If not specified, it defaults to `.*`.
The transformation is specified by the `as` field. You can use any
capture groups defined in the `matches` field. If the `matches` field
doesn't contain capture groups, the `as` field defaults to `$0`. If it
contains a single capture group, the `as` field defaults to `$1`.
Otherwise, it's an error not to specify the as field.
For example:
```yaml
# turn any name <name>_total into <name>_per_second
# e.g. http_requests_total becomes http_requests_per_second
name:
matches: "^(.*)_total$"
as: "${1}_per_second"
```
### Querying
Querying governs the process of actually fetching values for a particular
metric. It's controlled by the `metricsQuery` field.
The `metricsQuery` field is a Go template that gets turned into
a Prometheus query, using input from a particular call to the custom
metrics API. A given call to the custom metrics API is distilled down to
a metric name, a group-resource, and one or more objects of that
group-resource. These get turned into the following fields in the
template:
- `Series`: the metric name
- `LabelMatchers`: a comma-separated list of label matchers matching the
given objects. Currently, this is the label for the particular
group-resource, plus the label for namespace, if the group-resource is
namespaced.
- `GroupBy`: a comma-separated list of labels to group by. Currently,
this contains the group-resource label used in `LabelMatchers`.
For instance, suppose we had a series `http_requests_total` (exposed as
`http_requests_per_second` in the API) with labels `service`, `pod`,
`ingress`, `namespace`, and `verb`. The first four correspond to
Kubernetes resources. Then, if someone requested the metric
`pods/http_requests_per_second` for the pods `pod1` and `pod2` in the
`somens` namespace, we'd have:
- `Series: "http_requests_total"`
- `LabelMatchers: "pod=~\"pod1|pod2\",namespace=\"somens\""`
- `GroupBy`: `pod`
Additionally, there are two advanced fields that are "raw" forms of other
fields:
- `LabelValuesByName`: a map mapping the labels and values from the
`LabelMatchers` field. The values are pre-joined by `|`
(for use with the `=~` matcher in Prometheus).
- `GroupBySlice`: the slice form of `GroupBy`.
In general, you'll probably want to use the `Series`, `LabelMatchers`, and
`GroupBy` fields. The other two are for advanced usage.
The query is expected to return one value for each object requested. The
adapter will use the labels on the returned series to associate a given
series back to its corresponding object.
For example:
```yaml
# convert cumulative cAdvisor metrics into rates calculated over 2 minutes
metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)"
```
@@ -0,0 +1,430 @@
---
title: Prometheus Expressions
weight: 4
---
The PromQL expressions in this doc can be used to configure [alerts.]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/alerts/)
> Before expressions can be used in alerts, monitoring must be enabled. For more information, refer to the documentation on enabling monitoring [at the cluster level]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/monitoring/#enabling-cluster-monitoring) or [at the project level.]({{<baseurl>}}/rancher/v2.x/en/project-admin/tools/monitoring/#enabling-project-monitoring)
For more information about querying Prometheus, refer to the official [Prometheus documentation.](https://prometheus.io/docs/prometheus/latest/querying/basics/)
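If you want to try an expression by hand before using it in an alert, one approach is to port-forward to the cluster monitoring Prometheus and query its HTTP API. This is only a sketch; the service and namespace names assume the default cluster monitoring installation:

```bash
# Forward the Prometheus query port to your workstation
kubectl -n cattle-prometheus port-forward svc/prometheus-operated 9090:9090 &

# Evaluate the cluster CPU utilization summary expression against the Prometheus HTTP API
curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])))'
```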
<!-- TOC -->
- [Cluster Metrics](#cluster-metrics)
- [Cluster CPU Utilization](#cluster-cpu-utilization)
- [Cluster Load Average](#cluster-load-average)
- [Cluster Memory Utilization](#cluster-memory-utilization)
- [Cluster Disk Utilization](#cluster-disk-utilization)
- [Cluster Disk I/O](#cluster-disk-i-o)
- [Cluster Network Packets](#cluster-network-packets)
- [Cluster Network I/O](#cluster-network-i-o)
- [Node Metrics](#node-metrics)
- [Node CPU Utilization](#node-cpu-utilization)
- [Node Load Average](#node-load-average)
- [Node Memory Utilization](#node-memory-utilization)
- [Node Disk Utilization](#node-disk-utilization)
- [Node Disk I/O](#node-disk-i-o)
- [Node Network Packets](#node-network-packets)
- [Node Network I/O](#node-network-i-o)
- [Etcd Metrics](#etcd-metrics)
- [Etcd Has a Leader](#etcd-has-a-leader)
- [Number of Times the Leader Changes](#number-of-times-the-leader-changes)
- [Number of Failed Proposals](#number-of-failed-proposals)
- [GRPC Client Traffic](#grpc-client-traffic)
- [Peer Traffic](#peer-traffic)
- [DB Size](#db-size)
- [Active Streams](#active-streams)
- [Raft Proposals](#raft-proposals)
- [RPC Rate](#rpc-rate)
- [Disk Operations](#disk-operations)
- [Disk Sync Duration](#disk-sync-duration)
- [Kubernetes Components Metrics](#kubernetes-components-metrics)
- [API Server Request Latency](#api-server-request-latency)
- [API Server Request Rate](#api-server-request-rate)
- [Scheduling Failed Pods](#scheduling-failed-pods)
- [Controller Manager Queue Depth](#controller-manager-queue-depth)
- [Scheduler E2E Scheduling Latency](#scheduler-e2e-scheduling-latency)
- [Scheduler Preemption Attempts](#scheduler-preemption-attempts)
- [Ingress Controller Connections](#ingress-controller-connections)
- [Ingress Controller Request Process Time](#ingress-controller-request-process-time)
- [Rancher Logging Metrics](#rancher-logging-metrics)
- [Fluentd Buffer Queue Rate](#fluentd-buffer-queue-rate)
- [Fluentd Input Rate](#fluentd-input-rate)
- [Fluentd Output Errors Rate](#fluentd-output-errors-rate)
- [Fluentd Output Rate](#fluentd-output-rate)
- [Workload Metrics](#workload-metrics)
- [Workload CPU Utilization](#workload-cpu-utilization)
- [Workload Memory Utilization](#workload-memory-utilization)
- [Workload Network Packets](#workload-network-packets)
- [Workload Network I/O](#workload-network-i-o)
- [Workload Disk I/O](#workload-disk-i-o)
- [Pod Metrics](#pod-metrics)
- [Pod CPU Utilization](#pod-cpu-utilization)
- [Pod Memory Utilization](#pod-memory-utilization)
- [Pod Network Packets](#pod-network-packets)
- [Pod Network I/O](#pod-network-i-o)
- [Pod Disk I/O](#pod-disk-i-o)
- [Container Metrics](#container-metrics)
- [Container CPU Utilization](#container-cpu-utilization)
- [Container Memory Utilization](#container-memory-utilization)
- [Container Disk I/O](#container-disk-i-o)
<!-- /TOC -->
# Cluster Metrics
### Cluster CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))` |
| Summary | `1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])))` |
### Cluster Load Average
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>load1</td><td>`sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr><tr><td>load5</td><td>`sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr><tr><td>load15</td><td>`sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>load1</td><td>`sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr><tr><td>load5</td><td>`sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr><tr><td>load15</td><td>`sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr></table> |
### Cluster Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - sum(node_memory_MemAvailable_bytes) by (instance) / sum(node_memory_MemTotal_bytes) by (instance)` |
| Summary | `1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes)` |
### Cluster Disk Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `(sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance) - sum(node_filesystem_free_bytes{device!="rootfs"}) by (instance)) / sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance)` |
| Summary | `(sum(node_filesystem_size_bytes{device!="rootfs"}) - sum(node_filesystem_free_bytes{device!="rootfs"})) / sum(node_filesystem_size_bytes{device!="rootfs"})` |
### Cluster Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total[5m])) by (instance)`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total[5m]))`</td></tr></table> |
### Cluster Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr></table> |
| Summary | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr></table> |
### Cluster Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr></table> |
| Summary | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr></table> |
# Node Metrics
### Node CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `avg(irate(node_cpu_seconds_total{mode!="idle", instance=~"$instance"}[5m])) by (mode)` |
| Summary | `1 - (avg(irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])))` |
### Node Load Average
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>load1</td><td>`sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load5</td><td>`sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load15</td><td>`sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr></table> |
| Summary | <table><tr><td>load1</td><td>`sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load5</td><td>`sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load15</td><td>`sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr></table> |
### Node Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"})` |
| Summary | `1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"}) ` |
### Node Disk Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `(sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"}) by (device)) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device)` |
| Summary | `(sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"})) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"})` |
### Node Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m]))`</td></tr></table> |
### Node Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr></table> |
| Summary | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr></table> |
### Node Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr></table> |
| Summary | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr></table> |
# Etcd Metrics
### Etcd Has a Leader
`max(etcd_server_has_leader)`
### Number of Times the Leader Changes
`max(etcd_server_leader_changes_seen_total)`
### Number of Failed Proposals
`sum(etcd_server_proposals_failed_total)`
### GRPC Client Traffic
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>in</td><td>`sum(rate(etcd_network_client_grpc_received_bytes_total[5m])) by (instance)`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_client_grpc_sent_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>in</td><td>`sum(rate(etcd_network_client_grpc_received_bytes_total[5m]))`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_client_grpc_sent_bytes_total[5m]))`</td></tr></table> |
### Peer Traffic
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>in</td><td>`sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>in</td><td>`sum(rate(etcd_network_peer_received_bytes_total[5m]))`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_peer_sent_bytes_total[5m]))`</td></tr></table> |
### DB Size
| Catalog | Expression |
| --- | --- |
| Detail | `sum(etcd_debugging_mvcc_db_total_size_in_bytes) by (instance)` |
| Summary | `sum(etcd_debugging_mvcc_db_total_size_in_bytes)` |
### Active Streams
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>lease-watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance)`</td></tr><tr><td>watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>lease-watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"})`</td></tr><tr><td>watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"})`</td></tr></table> |
### Raft Proposals
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>applied</td><td>`sum(increase(etcd_server_proposals_applied_total[5m])) by (instance)`</td></tr><tr><td>committed</td><td>`sum(increase(etcd_server_proposals_committed_total[5m])) by (instance)`</td></tr><tr><td>pending</td><td>`sum(increase(etcd_server_proposals_pending[5m])) by (instance)`</td></tr><tr><td>failed</td><td>`sum(increase(etcd_server_proposals_failed_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>applied</td><td>`sum(increase(etcd_server_proposals_applied_total[5m]))`</td></tr><tr><td>committed</td><td>`sum(increase(etcd_server_proposals_committed_total[5m]))`</td></tr><tr><td>pending</td><td>`sum(increase(etcd_server_proposals_pending[5m]))`</td></tr><tr><td>failed</td><td>`sum(increase(etcd_server_proposals_failed_total[5m]))`</td></tr></table> |
### RPC Rate
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>total</td><td>`sum(rate(grpc_server_started_total{grpc_type="unary"}[5m])) by (instance)`</td></tr><tr><td>fail</td><td>`sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>total</td><td>`sum(rate(grpc_server_started_total{grpc_type="unary"}[5m]))`</td></tr><tr><td>fail</td><td>`sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m]))`</td></tr></table> |
### Disk Operations
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>commit-called-by-backend</td><td>`sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m])) by (instance)`</td></tr><tr><td>fsync-called-by-wal</td><td>`sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>commit-called-by-backend</td><td>`sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m]))`</td></tr><tr><td>fsync-called-by-wal</td><td>`sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m]))`</td></tr></table> |
### Disk Sync Duration
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>wal</td><td>`histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))`</td></tr><tr><td>db</td><td>`histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))`</td></tr></table> |
| Summary | <table><tr><td>wal</td><td>`sum(histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le)))`</td></tr><tr><td>db</td><td>`sum(histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le)))`</td></tr></table> |
# Kubernetes Components Metrics
### API Server Request Latency
| Catalog | Expression |
| --- | --- |
| Detail | `avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance, verb) /1e+06` |
| Summary | `avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance) /1e+06` |
### API Server Request Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(apiserver_request_count[5m])) by (instance, code)` |
| Summary | `sum(rate(apiserver_request_count[5m])) by (instance)` |
### Scheduling Failed Pods
| Catalog | Expression |
| --- | --- |
| Detail | `sum(kube_pod_status_scheduled{condition="false"})` |
| Summary | `sum(kube_pod_status_scheduled{condition="false"})` |
### Controller Manager Queue Depth
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>volumes</td><td>`sum(volumes_depth) by instance`</td></tr><tr><td>deployment</td><td>`sum(deployment_depth) by instance`</td></tr><tr><td>replicaset</td><td>`sum(replicaset_depth) by instance`</td></tr><tr><td>service</td><td>`sum(service_depth) by instance`</td></tr><tr><td>serviceaccount</td><td>`sum(serviceaccount_depth) by instance`</td></tr><tr><td>endpoint</td><td>`sum(endpoint_depth) by instance`</td></tr><tr><td>daemonset</td><td>`sum(daemonset_depth) by instance`</td></tr><tr><td>statefulset</td><td>`sum(statefulset_depth) by instance`</td></tr><tr><td>replicationmanager</td><td>`sum(replicationmanager_depth) by instance`</td></tr></table> |
| Summary | <table><tr><td>volumes</td><td>`sum(volumes_depth)`</td></tr><tr><td>deployment</td><td>`sum(deployment_depth)`</td></tr><tr><td>replicaset</td><td>`sum(replicaset_depth)`</td></tr><tr><td>service</td><td>`sum(service_depth)`</td></tr><tr><td>serviceaccount</td><td>`sum(serviceaccount_depth)`</td></tr><tr><td>endpoint</td><td>`sum(endpoint_depth)`</td></tr><tr><td>daemonset</td><td>`sum(daemonset_depth)`</td></tr><tr><td>statefulset</td><td>`sum(statefulset_depth)`</td></tr><tr><td>replicationmanager</td><td>`sum(replicationmanager_depth)`</td></tr></table> |
### Scheduler E2E Scheduling Latency
| Catalog | Expression |
| --- | --- |
| Detail | `histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06` |
| Summary | `sum(histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06)` |
### Scheduler Preemption Attempts
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(scheduler_total_preemption_attempts[5m])) by (instance)` |
| Summary | `sum(rate(scheduler_total_preemption_attempts[5m]))` |
### Ingress Controller Connections
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>reading</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="reading"}) by (instance)`</td></tr><tr><td>waiting</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="waiting"}) by (instance)`</td></tr><tr><td>writing</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="writing"}) by (instance)`</td></tr><tr><td>accepted</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m]))) by (instance)`</td></tr><tr><td>active</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m]))) by (instance)`</td></tr><tr><td>handled</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m]))) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>reading</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="reading"})`</td></tr><tr><td>waiting</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="waiting"})`</td></tr><tr><td>writing</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="writing"})`</td></tr><tr><td>accepted</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m])))`</td></tr><tr><td>active</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m])))`</td></tr><tr><td>handled</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m])))`</td></tr></table> |
### Ingress Controller Request Process Time
| Catalog | Expression |
| --- | --- |
| Detail | `topk(10, histogram_quantile(0.95,sum by (le, host, path)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m]))))` |
| Summary | `topk(10, histogram_quantile(0.95,sum by (le, host)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m]))))` |
# Rancher Logging Metrics
### Fluentd Buffer Queue Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_buffer_queue_length[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_output_status_buffer_queue_length[5m]))` |
### Fluentd Input Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_input_status_num_records_total[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_input_status_num_records_total[5m]))` |
### Fluentd Output Errors Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_num_errors[5m])) by (type)` |
| Summary | `sum(rate(fluentd_output_status_num_errors[5m]))` |
### Fluentd Output Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_num_records_total[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_output_status_num_records_total[5m]))` |
# Workload Metrics
### Workload CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""}) by (pod_name)` |
| Summary | `sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""})` |
### Workload Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
# Pod Metrics
### Pod CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr></table> |
| Summary | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr></table> |
### Pod Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""}) by (container_name)` |
| Summary | `sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""})` |
### Pod Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
### Pod Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
### Pod Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
# Container Metrics
### Container CPU Utilization
| Catalog | Expression |
| --- | --- |
| cfs throttled seconds | `sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| usage seconds | `sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| system seconds | `sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| user seconds | `sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
### Container Memory Utilization
`sum(container_memory_working_set_bytes{namespace="$namespace",pod_name="$podName",container_name="$containerName"})`
### Container Disk I/O
| Catalog | Expression |
| --- | --- |
| read | `sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| write | `sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
@@ -50,8 +50,7 @@ When upgrading the Kubernetes version of a cluster, we recommend that you:
1. Take a snapshot.
1. Initiate a Kubernetes upgrade.
1. If the upgrade fails, revert the cluster to the pre-upgrade Kubernetes version. Before restoring the cluster from the snapshot in the etcd datastore, the cluster should be running the pre-upgrade Kubernetes version.
1. Restore the cluster from the etcd snapshot.
1. If the upgrade fails, revert the cluster to the pre-upgrade Kubernetes version. This is achieved by selecting the **Restore etcd and Kubernetes version** option. This will return your cluster to the pre-upgrade Kubernetes version before restoring the etcd snapshot.
The restore operation will work on a cluster that is not in a healthy or active state.
{{% /tab %}}
@@ -158,4 +157,4 @@ A failed node could be in many different states:
- User drains a node while upgrade is in process, so there are no kubelets on the node
- The upgrade itself failed
If the max unavailable number of nodes is reached during an upgrade, Rancher user clusters will be stuck in updating state and not move forward with upgrading any other control plane nodes. It will continue to evaluate the set of unavailable nodes in case one of the nodes becomes available. If the node cannot be fixed, you must remove the node in order to continue the upgrade.
If the max unavailable number of nodes is reached during an upgrade, Rancher user clusters will be stuck in updating state and not move forward with upgrading any other control plane nodes. It will continue to evaluate the set of unavailable nodes in case one of the nodes becomes available. If the node cannot be fixed, you must remove the node in order to continue the upgrade.
@@ -0,0 +1,22 @@
---
headless: true
---
| Action | [Rancher launched Kubernetes Clusters]({{<baseurl>}}/rancher/v2.x/en/cluster-provisioning/rke-clusters/) | [Hosted Kubernetes Clusters]({{<baseurl>}}/rancher/v2.x/en/cluster-provisioning/hosted-kubernetes-clusters/) | [Imported Clusters]({{<baseurl>}}/rancher/v2.x/en/cluster-provisioning/imported-clusters) |
| --- | --- | ---| ---|
| [Using kubectl and a kubeconfig file to Access a Cluster]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/) | ✓ | ✓ | ✓ |
| [Managing Cluster Members]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cluster-access/cluster-members/) | ✓ | ✓ | ✓ |
| [Editing and Upgrading Clusters]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/editing-clusters/) | ✓ | ✓ | * |
| [Managing Nodes]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/nodes) | ✓ | ✓ | ✓ |
| [Managing Persistent Volumes and Storage Classes]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/volumes-and-storage/) | ✓ | ✓ | ✓ |
| [Managing Projects, Namespaces and Workloads]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/projects-and-namespaces/) | ✓ | ✓ | ✓ |
| [Using App Catalogs]({{<baseurl>}}/rancher/v2.x/en/catalog/) | ✓ | ✓ | ✓ |
| [Configuring Tools (Alerts, Notifiers, Logging, Monitoring, Istio)]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/tools/) | ✓ | ✓ | ✓ |
| [Cloning Clusters]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cloning-clusters/)| ✓ | ✓ | |
| [Ability to rotate certificates]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/certificate-rotation/) | ✓ | | |
| [Ability to back up your Kubernetes Clusters]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/backing-up-etcd/) | ✓ | | |
| [Ability to recover and restore etcd]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/restoring-etcd/) | ✓ | | |
| [Cleaning Kubernetes components when clusters are no longer reachable from Rancher]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cleaning-cluster-nodes/) | ✓ | | |
| [Configuring Pod Security Policies]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/pod-security-policy/) | ✓ | | |
| [Running Security Scans]({{<baseurl>}}/rancher/v2.x/en/security/security-scan/) | ✓ | | |
\* Cluster configuration options can't be edited for imported clusters, except for [K3s clusters.]({{<baseurl>}}/rancher/v2.x/en/cluster-provisioning/imported-clusters/#additional-features-for-imported-k3s-clusters)
@@ -6,6 +6,18 @@ aliases:
- /rancher/v2.x/en/tasks/clusters/creating-a-cluster/create-cluster-azure/
---
In this section, you'll learn how to set up a Kubernetes cluster in Azure through Rancher. During this process, Rancher will provision new nodes in Azure.
- [Creating an Azure Cluster](#creating-an-azure-cluster)
- [Creating an Azure Node Template](#creating-an-azure-node-template)
- [Preparation in Azure](#preparation-in-azure)
- [Creating the Template](#creating-the-template)
- [Template Configuration](#template-configuration)
# Creating an Azure Cluster
> **Prerequisite:** Before Rancher can create a cluster in Azure, a node template needs to be created using your Azure credentials and configuration. For details, see [this section.](#creating-an-azure-node-template)
Use {{< product >}} to create a Kubernetes cluster in Azure.
1. From the **Clusters** page, click **Add Cluster**.
@@ -16,35 +28,62 @@ Use {{< product >}} to create a Kubernetes cluster in Azure.
4. {{< step_create-cluster_member-roles >}}
5. {{< step_create-cluster_cluster-options >}}
5. {{< step_create-cluster_cluster-options >}} For more information, see the [cluster configuration reference.](../../options)
6. {{< step_create-cluster_node-pools >}}
1. Click **Add Node Template**.
7. **Optional:** Add additional node pools.
2. Complete the **Azure Options** form.
- **Account Access** stores your account information for authenticating with Azure. Note: As of v2.2.0, account access information is stored as a cloud credential. Cloud credentials are stored as Kubernetes secrets. Multiple node templates can use the same cloud credential. You can use an existing cloud credential or create a new one. To create a new cloud credential, enter **Name** and **Account Access** data, then click **Create.**
- **Placement** sets the geographical region where your cluster is hosted and other location metadata.
- **Network** configures the networking used in your cluster.
- **Instance** customizes your VM configuration.
3. {{< step_rancher-template >}}
4. Click **Create**.
5. **Optional:** Add additional node pools.
<br>
7. Review your options to confirm they're correct. Then click **Create**.
8. Review your options to confirm they're correct. Then click **Create**.
{{< result_create-cluster >}}
# Optional Next Steps
### Optional Next Steps
After creating your cluster, you can access it through the Rancher UI. As a best practice, we recommend setting up these alternate ways of accessing your cluster:
- **Access your cluster with the kubectl CLI:** Follow [these steps]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/#accessing-clusters-with-kubectl-on-your-workstation) to access clusters with kubectl on your workstation. In this case, you will be authenticated through the Rancher server's authentication proxy, then Rancher will connect you to the downstream cluster. This method lets you manage the cluster without the Rancher UI.
- **Access your cluster with the kubectl CLI, using the authorized cluster endpoint:** Follow [these steps]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/#authenticating-directly-with-a-downstream-cluster) to access your cluster with kubectl directly, without authenticating through Rancher. We recommend setting up this alternative method to access your cluster so that in case you can't connect to Rancher, you can still access the cluster.
- **Access your cluster with the kubectl CLI, using the authorized cluster endpoint:** Follow [these steps]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/#authenticating-directly-with-a-downstream-cluster) to access your cluster with kubectl directly, without authenticating through Rancher. We recommend setting up this alternative method to access your cluster so that in case you can't connect to Rancher, you can still access the cluster.
# Creating an Azure Node Template
Creating a node template for Azure will allow Rancher to provision new nodes when it sets up a Kubernetes cluster in Azure.
### Preparation in Azure
Before creating a **node template** in Rancher using a cloud infrastructure such as Azure, we must configure Rancher to allow the manipulation of resources in an Azure subscription.
To do this, we will first create a new Azure **service principal (SP)** in Azure **Active Directory (AD)**, which, in Azure, is an application user who has permission to manage Azure resources.
The following is a template `az cli` script that you can run to create the service principal; fill in your SP name, role, and scope:
```
az ad sp create-for-rbac --name="<Rancher ServicePrincipal name>" --role="Contributor" --scopes="/subscriptions/<subscription Id>"
```
The creation of this service principal returns three pieces of identification information: the application ID (also called the client ID), the client secret, and the tenant ID. This information will be used in the following section when adding the **node template**.
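If you prefer to capture those three values directly from the command output, one option is to use the Azure CLI's built-in `--query` (JMESPath) support. This is only a sketch that reuses the placeholders from the script above; adjust it to your environment:

```
# Create the service principal and print only the values needed for the node template:
# the client ID (appId), the client secret (password), and the tenant ID.
az ad sp create-for-rbac \
  --name="<Rancher ServicePrincipal name>" \
  --role="Contributor" \
  --scopes="/subscriptions/<subscription Id>" \
  --query "{clientId: appId, clientSecret: password, tenantId: tenant}" \
  --output json
```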
### Creating the Template
1. Click **Add Node Template**.
1. Complete the **Azure Options** form. For help filling out the form, refer to [Configuration](#azure-node-template-configuration) below.
1. Click **Create**.
**Result:** The node template can be used during the cluster creation process.
### Template Configuration
- **Account Access** stores your account information for authenticating with Azure. Note: As of v2.2.0, account access information is stored as a cloud credential. Cloud credentials are stored as Kubernetes secrets. Multiple node templates can use the same cloud credential. You can use an existing cloud credential or create a new one. To create a new cloud credential, enter **Name** and **Account Access** data, then click **Create.**
- **Placement** sets the geographical region where your cluster is hosted and other location metadata.
- **Network** configures the networking used in your cluster.
- **Instance** customizes your VM configuration.
{{< step_rancher-template >}}
@@ -15,6 +15,8 @@ Use Rancher to create a Kubernetes cluster in Amazon EC2.
- [Example IAM Policy to allow encrypted EBS volumes](#example-iam-policy-to-allow-encrypted-ebs-volumes)
- **IAM Policy added as Permission** to the user. See [Amazon Documentation: Adding Permissions to a User (Console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_change-permissions.html#users_change_permissions-add-console) how to attach it to an user.
> **Note:** Rancher v2.4.6 and v2.4.7 had an issue where the `kms:ListKeys` permission was required to create, edit, or clone Amazon EC2 node templates. This requirement was removed in v2.4.8.
# Creating an EC2 Cluster
The steps to create a cluster differ based on your Rancher version.
@@ -117,6 +119,10 @@ After creating your cluster, you can access it through the Rancher UI. As a best
- **Access your cluster with the kubectl CLI:** Follow [these steps]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/#accessing-clusters-with-kubectl-on-your-workstation) to access clusters with kubectl on your workstation. In this case, you will be authenticated through the Rancher server's authentication proxy, then Rancher will connect you to the downstream cluster. This method lets you manage the cluster without the Rancher UI.
- **Access your cluster with the kubectl CLI, using the authorized cluster endpoint:** Follow [these steps]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/#authenticating-directly-with-a-downstream-cluster) to access your cluster with kubectl directly, without authenticating through Rancher. We recommend setting up this alternative method to access your cluster so that in case you can't connect to Rancher, you can still access the cluster.
# IAM Policies
> **Note:** Rancher v2.4.6 and v2.4.7 had an issue where the `kms:ListKeys` permission was required to create, edit, or clone Amazon EC2 node templates. This requirement was removed in v2.4.8.
### Example IAM Policy
```json
@@ -5,7 +5,11 @@ weight: 3
---
> **Prerequisite:**
> Set up the Rancher Kubernetes cluster. As of Rancher v2.5, Rancher can be installed on any Kubernetes cluster. This cluster can use upstream Kubernetes, or it can use one of Rancher's Kubernetes distributions, or it can be a managed Kubernetes cluster from a provider such as Amazon EKS.
> Set up the Rancher server's local Kubernetes cluster.
>
> - As of Rancher v2.5, Rancher can be installed on any Kubernetes cluster. This cluster can use upstream Kubernetes, or it can use one of Rancher's Kubernetes distributions, or it can be a managed Kubernetes cluster from a provider such as Amazon EKS.
> - In Rancher v2.4.x, Rancher needs to be installed on a K3s Kubernetes cluster or an RKE Kubernetes cluster.
> - In Rancher prior to v2.4, Rancher needs to be installed on an RKE Kubernetes cluster.
# Install the Rancher Helm Chart
@@ -0,0 +1,14 @@
---
title: Installing Rancher behind an HTTP Proxy
weight: 4
---
In many enterprise environments, servers or VMs running on premises do not have direct Internet access, but must connect to external services through an HTTP(S) proxy for security reasons. This tutorial shows, step by step, how to set up a highly available Rancher installation in such an environment.
Alternatively, it is also possible to set up Rancher completely air-gapped without any Internet access. This process is described in detail in the [Rancher docs]({{<baseurl>}}/rancher/v2.x/en/installation/other-installation-methods/air-gap/).
# Installation Outline
1. [Set up infrastructure]({{<baseurl>}}/rancher/v2.x/en/installation/other-installation-methods/behind-proxy/prepare-nodes/)
2. [Set up a Kubernetes cluster]({{<baseurl>}}/rancher/v2.x/en/installation/other-installation-methods/behind-proxy/launch-kubernetes/)
3. [Install Rancher]({{<baseurl>}}/rancher/v2.x/en/installation/other-installation-methods/behind-proxy/install-rancher/)
@@ -0,0 +1,86 @@
---
title: 3. Install Rancher
weight: 300
---
Now that you have a running RKE cluster, you can install Rancher in it. For security reasons all traffic to Rancher must be encrypted with TLS. For this tutorial you are going to automatically issue a self-signed certificate through [cert-manager](https://cert-manager.io/). In a real-world use-case you will likely use Let's Encrypt or provide your own certificate. For more details see [SSL configuration]({{<baseurl>}}/rancher/v2.x/en/installation/k8s-install/helm-rancher/#4-choose-your-ssl-configuration).
> **Note:** These installation instructions assume you are using Helm 3.
### Install cert-manager
Add the cert-manager helm repository:
```
helm repo add jetstack https://charts.jetstack.io
```
Create a namespace for cert-manager:
```
kubectl create namespace cert-manager
```
Install the CustomResourceDefinitions of cert-manager:
```
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.2/cert-manager.crds.yaml
```
And install it with Helm. Note that cert-manager also needs your proxy configured in case it needs to communicate with Let's Encrypt or other external certificate issuers:
```
helm upgrade --install cert-manager jetstack/cert-manager \
--namespace cert-manager --version v0.15.2 \
--set http_proxy=http://${proxy_host} \
--set https_proxy=http://${proxy_host} \
--set no_proxy=127.0.0.0/8\\,10.0.0.0/8\\,172.16.0.0/12\\,192.168.0.0/16
```
Now you should wait until cert-manager is finished starting up:
```
kubectl rollout status deployment -n cert-manager cert-manager
kubectl rollout status deployment -n cert-manager cert-manager-webhook
```
### Install Rancher
Next you can install Rancher itself. First add the helm repository:
```
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
```
Create a namespace:
```
kubectl create namespace cattle-system
```
And install Rancher with Helm. Rancher also needs a proxy configuration so that it can communicate with external application catalogs or retrieve Kubernetes version update metadata:
```
helm upgrade --install rancher rancher-latest/rancher \
--namespace cattle-system \
--set hostname=rancher.example.com \
--set proxy=http://${proxy_host}
```
Wait for the deployment to finish rolling out:
```
kubectl rollout status deployment -n cattle-system rancher
```
You can now navigate to `https://rancher.example.com` and start using Rancher.
> **Note:** If you don't intend to send telemetry data, opt out of [telemetry]({{<baseurl>}}/rancher/v2.x/en/faq/telemetry/) during the initial login. Leaving this active in an air-gapped environment can cause issues if the sockets cannot be opened successfully.
### Additional Resources
These resources could be helpful when installing Rancher:
- [Rancher Helm chart options]({{<baseurl>}}/rancher/v2.x/en/installation/options/chart-options/)
- [Adding TLS secrets]({{<baseurl>}}/rancher/v2.x/en/installation/options/tls-secrets/)
- [Troubleshooting Rancher Kubernetes Installations]({{<baseurl>}}/rancher/v2.x/en/installation/options/troubleshooting/)
@@ -0,0 +1,151 @@
---
title: '2. Install Kubernetes'
weight: 200
---
Once the infrastructure is ready, you can continue with setting up an RKE cluster to install Rancher in.
### Installing Docker
First, you have to install Docker and set up the HTTP proxy on all three Linux nodes. To do so, perform the following steps on all three nodes.
For convenience, export the IP address and port of your proxy into an environment variable and set up the HTTP_PROXY variables for your current shell:
```
export proxy_host="10.0.0.5:8888"
export HTTP_PROXY=http://${proxy_host}
export HTTPS_PROXY=http://${proxy_host}
export NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
```
Next configure apt to use this proxy when installing packages. If you are not using Ubuntu, you have to adapt this step accordingly:
```
cat <<EOF | sudo tee /etc/apt/apt.conf.d/proxy.conf > /dev/null
Acquire::http::Proxy "http://${proxy_host}/";
Acquire::https::Proxy "http://${proxy_host}/";
EOF
```
Now you can install Docker:
```
curl -sL https://releases.rancher.com/install-docker/19.03.sh | sh
```
Then ensure that your current user is able to access the Docker daemon without sudo:
```
sudo usermod -aG docker YOUR_USERNAME
```
And configure the Docker daemon to use the proxy to pull images:
```
sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf > /dev/null
[Service]
Environment="HTTP_PROXY=http://${proxy_host}"
Environment="HTTPS_PROXY=http://${proxy_host}"
Environment="NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
EOF
```
To apply the configuration, restart the Docker daemon:
```
sudo systemctl daemon-reload
sudo systemctl restart docker
```
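Optionally, you can verify that the daemon actually picked up the proxy settings. The following commands should show the configured proxy:

```
# The Environment= entries from the systemd drop-in should appear here.
sudo systemctl show --property=Environment docker

# "docker info" also reports the configured HTTP/HTTPS proxy.
docker info | grep -i proxy
```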
### Creating the RKE Cluster
To create and interact with the cluster, you need several command line tools on the host from which you have SSH access to the Linux nodes:
* [RKE CLI binary]({{<baseurl>}}/rke/latest/en/installation/#download-the-rke-binary)
```
sudo curl -fsSL -o /usr/local/bin/rke https://github.com/rancher/rke/releases/download/v1.1.4/rke_linux-amd64
sudo chmod +x /usr/local/bin/rke
```
* [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
```
curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
```
* [helm](https://helm.sh/docs/intro/install/)
```
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod +x get_helm.sh
sudo ./get_helm.sh
```
Next, create a YAML file that describes the RKE cluster. Ensure that the IP addresses of the nodes and the SSH username are correct. For more information on the cluster YAML, have a look at the [RKE documentation]({{<baseurl>}}/rke/latest/en/example-yamls/).
```
nodes:
- address: 10.0.1.200
user: ubuntu
role: [controlplane,worker,etcd]
- address: 10.0.1.201
user: ubuntu
role: [controlplane,worker,etcd]
- address: 10.0.1.202
user: ubuntu
role: [controlplane,worker,etcd]
services:
etcd:
backup_config:
interval_hours: 12
retention: 6
```
After that, you can create the Kubernetes cluster by running:
```
rke up --config rancher-cluster.yaml
```
RKE creates a state file called `rancher-cluster.rkestate`, which is needed if you want to perform updates, modify your cluster configuration, or restore it from a backup. It also creates a `kube_config_rancher-cluster.yaml` file that you can use to connect to the remote Kubernetes cluster locally with tools like kubectl or Helm. Make sure to save all of these files in a secure location, for example by putting them into a version control system.
To have a look at your cluster run:
```
export KUBECONFIG=kube_config_rancher-cluster.yaml
kubectl cluster-info
kubectl get pods --all-namespaces
```
You can also verify that your external load balancer works and that the DNS entry is set up correctly. If you send a request to either, you should receive an HTTP 404 response from the ingress controller:
```
$ curl 10.0.1.100
default backend - 404
$ curl rancher.example.com
default backend - 404
```
### Save Your Files
> **Important**
> The files mentioned below are needed to maintain, troubleshoot and upgrade your cluster.
Save a copy of the following files in a secure location:
- `rancher-cluster.yaml`: The RKE cluster configuration file.
- `kube_config_rancher-cluster.yaml`: The [Kubeconfig file]({{<baseurl>}}/rke/latest/en/kubeconfig/) for the cluster; this file contains credentials for full access to the cluster.
- `rancher-cluster.rkestate`: The [Kubernetes Cluster State file]({{<baseurl>}}/rke/latest/en/installation/#kubernetes-cluster-state); this file contains the current state of the cluster, including the RKE configuration and the certificates.
> **Note:** The "rancher-cluster" parts of the latter two file names depend on how you name the RKE cluster configuration file.
### Issues or errors?
See the [Troubleshooting]({{<baseurl>}}/rancher/v2.x/en/installation/options/troubleshooting/) page.
### [Next: Install Rancher](../install-rancher)
@@ -0,0 +1,61 @@
---
title: '1. Set up Infrastructure'
weight: 100
---
In this section, you will provision the underlying infrastructure for your Rancher management server with internet access through an HTTP proxy.
To install the Rancher management server on a high-availability RKE cluster, we recommend setting up the following infrastructure:
- **Three Linux nodes,** typically virtual machines, in an infrastructure provider such as Amazon's EC2, Google Compute Engine, or vSphere.
- **A load balancer** to direct front-end traffic to the three nodes.
- **A DNS record** to map a URL to the load balancer. This will become the Rancher server URL, and downstream Kubernetes clusters will need to reach it.
These nodes must be in the same region/data center. You may place these servers in separate availability zones.
### Why three nodes?
In an RKE cluster, Rancher server data is stored on etcd. This etcd database runs on all three nodes.
The etcd database requires an odd number of nodes so that it can always elect a leader with a majority of the etcd cluster. If the etcd database cannot elect a leader, etcd can suffer from [split brain](https://www.quora.com/What-is-split-brain-in-distributed-systems), requiring the cluster to be restored from backup. If one of the three etcd nodes fails, the two remaining nodes can elect a leader because they have the majority of the total number of etcd nodes.
### 1. Set up Linux Nodes
These hosts will connect to the internet through an HTTP proxy.
Make sure that your nodes fulfill the general installation requirements for [OS, container runtime, hardware, and networking.]({{<baseurl>}}/rancher/v2.x/en/installation/requirements/)
For an example of one way to set up Linux nodes, refer to this [tutorial]({{<baseurl>}}/rancher/v2.x/en/installation/options/ec2-node) for setting up nodes as instances in Amazon EC2.
### 2. Set up the Load Balancer
You will also need to set up a load balancer to direct traffic to the Rancher replicas on all three nodes. That will prevent an outage of any single node from taking down communications to the Rancher management server.
When Kubernetes gets set up in a later step, the RKE tool will deploy an NGINX Ingress controller. This controller will listen on ports 80 and 443 of the worker nodes, answering traffic destined for specific hostnames.
When Rancher is installed (also in a later step), the Rancher system creates an Ingress resource. That Ingress tells the NGINX Ingress controller to listen for traffic destined for the Rancher hostname. The NGINX Ingress controller, when receiving traffic destined for the Rancher hostname, will forward that traffic to the running Rancher pods in the cluster.
For your implementation, consider if you want or need to use a Layer-4 or Layer-7 load balancer:
- **A layer-4 load balancer** is the simpler of the two choices, in which you are forwarding TCP traffic to your nodes. We recommend configuring your load balancer as a Layer 4 balancer, forwarding traffic to ports TCP/80 and TCP/443 to the Rancher management cluster nodes. The Ingress controller on the cluster will redirect HTTP traffic to HTTPS and terminate SSL/TLS on port TCP/443. The Ingress controller will forward traffic to port TCP/80 to the Ingress pod in the Rancher deployment.
- **A layer-7 load balancer** is a bit more complicated but can offer features that you may want. For instance, a layer-7 load balancer is capable of handling TLS termination at the load balancer, as opposed to Rancher doing TLS termination itself. This can be beneficial if you want to centralize your TLS termination in your infrastructure. Layer-7 load balancing also offers the capability for your load balancer to make decisions based on HTTP attributes such as cookies, etc. that a layer-4 load balancer is not able to concern itself with. If you decide to terminate the SSL/TLS traffic on a layer-7 load balancer, you will need to use the `--set tls=external` option when installing Rancher in a later step. For more information, refer to the [Rancher Helm chart options.]({{<baseurl>}}/rancher/v2.x/en/installation/options/chart-options/#external-tls-termination)
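If you do choose external TLS termination, the Rancher install command in the later step would carry the corresponding chart option. The following is only a sketch with a placeholder hostname; the full set of options is covered in the linked chart options page:

```
helm upgrade --install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set tls=external
```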
For an example showing how to set up an NGINX load balancer, refer to [this page.]({{<baseurl>}}/rancher/v2.x/en/installation/options/nginx/)
For a how-to guide for setting up an Amazon ELB Network Load Balancer, refer to [this page.]({{<baseurl>}}/rancher/v2.x/en/installation/options/nlb/)
> **Important:**
> Do not use this load balancer (i.e., the `local` cluster Ingress) to load balance applications other than Rancher following installation. Sharing this Ingress with other applications may result in websocket errors to Rancher following Ingress configuration reloads for other apps. We recommend dedicating the `local` cluster to Rancher and no other applications.
### 3. Set up the DNS Record
Once you have set up your load balancer, you will need to create a DNS record to send traffic to this load balancer.
Depending on your environment, this may be an A record pointing to the LB IP, or it may be a CNAME pointing to the load balancer hostname. In either case, make sure this record is the hostname that you intend Rancher to respond on.
You will need to specify this hostname in a later step when you install Rancher, and it is not possible to change it later. Make sure that your decision is a final one.
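Once the record has been created, you can quickly verify it from a workstation before moving on. Assuming the example hostname `rancher.example.com` used later in this tutorial:

```
# An A record should return the load balancer IP; a CNAME should return the load balancer hostname.
dig +short rancher.example.com
```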
For a how-to guide for setting up a DNS record to route domain traffic to an Amazon ELB load balancer, refer to the [official AWS documentation.](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-to-elb-load-balancer)
### [Next: Set up a Kubernetes cluster]({{<baseurl>}}/rancher/v2.x/en/installation/other-installation-methods/behind-proxy/launch-kubernetes/)
@@ -161,11 +161,11 @@ Placeholder | Description
```
docker run -d --volumes-from rancher-data \
--restart=unless-stopped \
- 80:80 -p 443:443 \
- /<CERT_DIRECTORY>/<FULL_CHAIN.pem>:/etc/rancher/ssl/cert.pem \
- /<CERT_DIRECTORY>/<PRIVATE_KEY.pem>:/etc/rancher/ssl/key.pem \
- /<CERT_DIRECTORY>/<CA_CERTS.pem>:/etc/rancher/ssl/cacerts.pem \
--privileged
-p 80:80 -p 443:443 \
-v /<CERT_DIRECTORY>/<FULL_CHAIN.pem>:/etc/rancher/ssl/cert.pem \
-v /<CERT_DIRECTORY>/<PRIVATE_KEY.pem>:/etc/rancher/ssl/key.pem \
-v /<CERT_DIRECTORY>/<CA_CERTS.pem>:/etc/rancher/ssl/cacerts.pem \
--privileged \
rancher/rancher:<RANCHER_VERSION_TAG>
```
@@ -188,9 +188,9 @@ Placeholder | Description
```
docker run -d --volumes-from rancher-data \
--restart=unless-stopped \
- 80:80 -p 443:443 \
- /<CERT_DIRECTORY>/<FULL_CHAIN.pem>:/etc/rancher/ssl/cert.pem \
- /<CERT_DIRECTORY>/<PRIVATE_KEY.pem>:/etc/rancher/ssl/key.pem \
-p 80:80 -p 443:443 \
-v /<CERT_DIRECTORY>/<FULL_CHAIN.pem>:/etc/rancher/ssl/cert.pem \
-v /<CERT_DIRECTORY>/<PRIVATE_KEY.pem>:/etc/rancher/ssl/key.pem \
rancher/rancher:<RANCHER_VERSION_TAG> \
--privileged \
--no-cacerts
@@ -0,0 +1,18 @@
---
title: Installing Docker
weight: 1
---
Docker must be installed on any node that runs the Rancher server.
There are a couple of options for installing Docker. One option is to refer to the [official Docker documentation](https://docs.docker.com/install/) about how to install Docker on Linux. The steps will vary based on the Linux distribution.
Another option is to use one of Rancher's Docker installation scripts, which are available for most recent versions of Docker.
For example, this command could be used to install Docker 19.03 on Ubuntu:
```
curl https://releases.rancher.com/install-docker/19.03.sh | sh
```
Rancher has installation scripts for every version of upstream Docker that Kubernetes supports. To find out whether a script is available for installing a certain Docker version, refer to this [GitHub repository,](https://github.com/rancher/install-docker) which contains all of Rancher's Docker installation scripts.
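After running one of the install scripts, it is worth confirming that the expected Docker version is installed and that the daemon is running, for example:

```
# Print the installed Docker version and check the daemon status.
docker --version
sudo systemctl status docker --no-pager
```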
@@ -148,7 +148,7 @@ If you have private registries, catalogs or a proxy that intercepts certificates
Once the Rancher deployment is created, copy your CA certs in pem format into a file named `ca-additional.pem` and use `kubectl` to create the `tls-ca-additional` secret in the `cattle-system` namespace.
```plain
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem=./ca-additional.pem
```
### Private Registry and Air Gap Installs
@@ -26,11 +26,9 @@ If you are using a private CA, Rancher requires a copy of the CA certificate whi
Copy the CA certificate into a file named `cacerts.pem` and use `kubectl` to create the `tls-ca` secret in the `cattle-system` namespace.
>**Important:** Make sure the file is called `cacerts.pem` as Rancher uses that filename to configure the CA certificate.
```
kubectl -n cattle-system create secret generic tls-ca \
--from-file=cacerts.pem
--from-file=cacerts.pem=./cacerts.pem
```
> **Note:** The configured `tls-ca` secret is retrieved when Rancher starts. On a running Rancher installation the updated CA will take effect after new Rancher pods are started.
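One way to start new Rancher pods so that an updated `tls-ca` secret takes effect is to trigger a rolling restart of the deployment, for example:

```
# Restart the Rancher deployment so that new pods pick up the updated CA certificate.
kubectl -n cattle-system rollout restart deployment/rancher
kubectl -n cattle-system rollout status deployment/rancher
```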
@@ -1,6 +1,9 @@
---
title: Setting up Amazon ELB Network Load Balancer
weight: 5
aliases:
- /rancher/v2.x/en/installation/ha/create-nodes-lb/nlb
- /rancher/v2.x/en/installation/k8s-install/create-nodes-lb/nlb
---
This how-to guide describes how to set up a Network Load Balancer (NLB) in Amazon's EC2 service that will direct traffic to multiple instances on EC2.
@@ -15,6 +15,7 @@ The following table lists some of the most noteworthy issues to be considered wh
Upgrade Scenario | Issue
---|---
Upgrading to v2.4.6 or v2.4.7 | These Rancher versions had an issue where the `kms:ListKeys` permission was required to create, edit, or clone Amazon EC2 node templates. This requirement was removed in v2.4.8.
Upgrading to v2.3.0+ | Any user provisioned cluster will be automatically updated upon any edit as tolerations were added to the images used for Kubernetes provisioning.
Upgrading to v2.2.0-v2.2.x | Rancher introduced the [system charts](https://github.com/rancher/system-charts) repository which contains all the catalog items required for features such as monitoring, logging, alerting and global DNS. To be able to use these features in an air gap install, you will need to mirror the `system-charts` repository locally and configure Rancher to use that repository. Please follow the instructions to [configure Rancher system charts]({{<baseurl>}}/rancher/v2.x/en/installation/options/local-system-charts/#setting-up-system-charts-for-rancher-prior-to-v2-3-0).
Upgrading from v2.0.13 or earlier | If your cluster's certificates have expired, you will need to perform [additional steps]({{<baseurl>}}/rancher/v2.x/en/cluster-admin/certificate-rotation/#rotating-expired-certificates-after-upgrading-older-rancher-versions) to rotate the certificates.
@@ -16,6 +16,7 @@ If you installed Rancher using the RKE Add-on yaml, follow the directions to [mi
>
> - If you are upgrading to Rancher v2.5 from a Rancher server that was started with the Helm chart option `--add-local=false`, you will need to drop that flag when upgrading. Otherwise, the Rancher server will not start. The `restricted-admin` role can be used to continue restricting access to the local cluster. For more information, see [this section.]({{<baseurl>}}/rancher/v2.x/en/admin-settings/rbac/global-permissions/#upgrading-from-rancher-with-a-hidden-local-cluster)
> - [Let's Encrypt will be blocking cert-manager instances older than 0.8.0 starting November 1st 2019.](https://community.letsencrypt.org/t/blocking-old-cert-manager-versions/98753) Upgrade cert-manager to the latest version by following [these instructions.]({{<baseurl>}}/rancher/v2.x/en/installation/options/upgrading-cert-manager)
> - Helm should be run from the same location as your kubeconfig file (where you run your kubectl commands from; if you installed K8s with RKE, the config will have been created in the directory you ran `rke up` in), or it should manually target the kubeconfig for the intended cluster with the `--kubeconfig` flag (see: https://helm.sh/docs/helm/helm/)
> - The upgrade instructions assume you are using Helm 3. For migration of installs started with Helm 2, refer to the official [Helm 2 to 3 migration docs.](https://helm.sh/blog/migrate-from-helm-v2-to-helm-v3/) The [Helm 2 upgrade page here]({{<baseurl>}}/rancher/v2.x/en/upgrades/upgrades/ha/helm2) provides a copy of the older upgrade instructions that used Helm 2, and it is intended to be used if upgrading to Helm 3 is not feasible.
> - If you are upgrading Rancher from v2.x to v2.3+, and you are using external TLS termination, you will need to edit the cluster.yml to [enable using forwarded host headers.]({{<baseurl>}}/rancher/v2.x/en/installation/options/chart-options/#configuring-ingress-for-external-tls-when-using-nginx-v0-25)
@@ -102,7 +103,16 @@ helm upgrade rancher rancher-<CHART_REPO>/rancher \
--set hostname=rancher.my.org
```
> **Note:** There will be many more options from the previous step that need to be appended.
> **Note:** The above is an example, there may be more values from the previous step that need to be appended.
Alternatively, it's possible to reuse current values and make small changes with the `--reuse-values` flag. For example, to only change the Rancher version:
```
helm upgrade rancher rancher-<CHART_REPO>/rancher \
--namespace cattle-system \
--reuse-values \
--version=2.4.5
```
{{% /accordion %}}
@@ -22,7 +22,7 @@ Prometheus [CPU Reservation](https://kubernetes.io/docs/concepts/configuration/m
Prometheus [Memory Limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) | Memory resource limit for the Prometheus pod.
Prometheus [Memory Reservation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) | Memory resource requests for the Prometheus pod.
Selector | Ability to select the nodes in which Prometheus and Grafana pods are deployed to. To use this option, the nodes must have labels.
Advanced Options | Since monitoring is an [application](https://github.com/rancher/system-charts/tree/dev/charts/rancher-monitoring) from the [Rancher catalog]({{<baseurl>}}/rancher/v2.x/en/catalog/), it can be [configured like other catalog application]({{<baseurl>}}/rancher/v2.x/en/catalog/apps/#configuration-options). _Warning: Any modification to the application without understanding the entire application can lead to catastrophic errors._
Advanced Options | Since monitoring is an [application](https://github.com/rancher/system-charts/tree/dev/charts/rancher-monitoring) from the [Rancher catalog]({{<baseurl>}}/rancher/v2.x/en/catalog/), it can be [configured like any other catalog application]({{<baseurl>}}/rancher/v2.x/en/catalog/catalog-config/). _Warning: Any modification to the application without understanding the entire application can lead to catastrophic errors._
## Node Exporter
@@ -20,3 +20,22 @@ Configure kubectl by visiting your cluster in the Rancher Web UI then clicking o
Run `kubectl cluster-info` or `kubectl get pods` successfully.
## Authentication with kubectl and kubeconfig Tokens with TTL
_**Available as of v2.4.6**_
_Requirements_
If admins have [enforced TTL on kubeconfig tokens](../../api/api-tokens/#setting-ttl-on-kubeconfig-tokens), the kubeconfig file requires the [Rancher CLI](../cli) to be present in your PATH when you run `kubectl`. Otherwise, you'll see an error like:
`Unable to connect to the server: getting credentials: exec: exec: "rancher": executable file not found in $PATH`.
This feature enables kubectl to authenticate with the Rancher server and get a new kubeconfig token when required. The following auth providers are currently supported:
1. Local
2. Active Directory
3. FreeIpa, OpenLdap
4. SAML providers - Ping, Okta, ADFS, Keycloak, Shibboleth
When you first run kubectl, for example, `kubectl get pods`, it will ask you to pick an auth provider and log in with the Rancher server.
The kubeconfig token is cached in the path where you run kubectl, under `./.cache/token`. This token is valid until [it expires](../../api/api-tokens/#expiration-period) or [is deleted from the Rancher server](../../api/api-tokens/#deleting-tokens).
Upon expiration, the next `kubectl get pods` will ask you to log in with the Rancher server again.
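If you want to force a fresh login before the cached token expires, you can simply delete the cached token file from the directory where you run kubectl:

```
# Remove the cached kubeconfig token; the next kubectl command will prompt you to log in again.
rm -f ./.cache/token
```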
@@ -12,9 +12,7 @@ The following steps will quickly deploy a Rancher Server on AWS with a single no
- [Amazon AWS Account](https://aws.amazon.com/account/): An Amazon AWS Account is required to create resources for deploying Rancher and Kubernetes.
- [Amazon AWS Access Key](https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html): Use this link to follow a tutorial to create an Amazon AWS Access Key if you don't have one yet.
- [Amazon AWS Key Pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair) Use this link and follow instructions to create a Key Pair.
- Install [Terraform](https://www.terraform.io/downloads.html): Used to provision the server and cluster in Amazon AWS.
- Install the [RKE terraform provider.](https://github.com/rancher/terraform-provider-rke#installing-the-provider) You will need to download the binary for the Terraform provider for RKE that corresponds to the operating system of your workstation. Then you will need to move the binary into your Terraform plugin directory. The name of the directory will depend on your operating system. For more information how to install Terraform plugins, refer to the [Terraform documentation.](https://www.terraform.io/docs/plugins/basics.html#installing-a-plugin)
## Getting Started
@@ -36,7 +34,6 @@ Suggestions include:
- `aws_region` - Amazon AWS region, choose the closest instead of the default
- `prefix` - Prefix for all created resources
- `instance_type` - EC2 instance size used, minimum is `t3a.medium` but `t3a.large` or `t3a.xlarge` could be used if within budget
- `ssh_key_file_name` - Use a specific SSH key instead of `~/.ssh/id_rsa` (public key is assumed to be `${ssh_key_file_name}.pub`)
1. Run `terraform init`.
@@ -48,7 +45,7 @@ Suggestions include:
Outputs:
rancher_node_ip = xx.xx.xx.xx
rancher_server_url = https://ec2-xx-xx-xx-xx.compute-1.amazonaws.com
rancher_server_url = https://rancher.xx.xx.xx.xx.xip.io
workload_node_ip = yy.yy.yy.yy
```
@@ -56,7 +53,7 @@ Suggestions include:
#### Result
Two Kubernetes clusters are deployed into your AWS account, one running Rancher Server and the other ready for experimentation deployments.
Two Kubernetes clusters are deployed into your AWS account, one running Rancher Server and the other ready for experimentation deployments. Please note that while this setup is a great way to explore Rancher functionality, a production setup should follow our high availability setup guidelines.
### What's Next?
@@ -37,8 +37,6 @@ Suggestions include:
1. Run `terraform init`.
1. Install the [RKE terraform provider](https://github.com/rancher/terraform-provider-rke), see [installation instructions](https://github.com/rancher/terraform-provider-rke#using-the-provider).
1. To initiate the creation of the environment, run `terraform apply --auto-approve`. Then wait for output similar to the following:
```
@@ -38,8 +38,6 @@ Suggestions include:
1. Run `terraform init`.
1. Install the [RKE terraform provider](https://github.com/rancher/terraform-provider-rke), see [installation instructions](https://github.com/rancher/terraform-provider-rke#using-the-provider).
1. To initiate the creation of the environment, run `terraform apply --auto-approve`. Then wait for output similar to the following:
```
@@ -48,7 +46,7 @@ Suggestions include:
Outputs:
rancher_node_ip = xx.xx.xx.xx
rancher_server_url = https://xx-xx-xx-xx.nip.io
rancher_server_url = https://rancher.xx.xx.xx.xx.xip.io
workload_node_ip = yy.yy.yy.yy
```
@@ -43,8 +43,6 @@ Suggestions include:
1. Run `terraform init`.
1. Install the [RKE terraform provider](https://github.com/rancher/terraform-provider-rke), see [installation instructions](https://github.com/rancher/terraform-provider-rke#using-the-provider).
1. To initiate the creation of the environment, run `terraform apply --auto-approve`. Then wait for output similar to the following:
```
@@ -53,7 +51,7 @@ Suggestions include:
Outputs:
rancher_node_ip = xx.xx.xx.xx
rancher_server_url = https://xx-xx-xx-xx.nip.io
rancher_server_url = https://rancher.xx.xx.xx.xx.xip.io
workload_node_ip = yy.yy.yy.yy
```
@@ -10,6 +10,13 @@ The following steps quickly deploy a Rancher Server with a single node cluster a
- [Virtualbox](https://www.virtualbox.org): The virtual machines that Vagrant provisions need to be provisioned to VirtualBox.
- At least 4GB of free RAM.
### Note
- Vagrant will require plugins to create VirtualBox VMs. Install them with the following commands:
`vagrant plugin install vagrant-vboxmanage`
`vagrant plugin install vagrant-vbguest`
## Getting Started
1. Clone [Rancher Quickstart](https://github.com/rancher/quickstart) to a folder using `git clone https://github.com/rancher/quickstart`.
@@ -21,7 +28,7 @@ The following steps quickly deploy a Rancher Server with a single node cluster a
- Change the number of nodes and the memory allocations, if required. (`node.count`, `node.cpus`, `node.memory`)
- Change the password of the `admin` user for logging into Rancher. (`default_password`)
4. To initiate the creation of the environment run, `vagrant up`.
4. To initiate the creation of the environment run, `vagrant up --provider=virtualbox`.
5. Once provisioning finishes, go to `https://172.22.101.101` in the browser. The default user/password is `admin/admin`.
File diff suppressed because it is too large
@@ -0,0 +1,720 @@
---
title: Hardening Guide v2.4
weight: 99
---
This document provides prescriptive guidance for hardening a production installation of Rancher v2.4. It outlines the configurations and controls required to address Kubernetes benchmark controls from the Center for Internet Security (CIS).
> This hardening guide describes how to secure the nodes in your cluster, and it is recommended to follow this guide before installing Kubernetes.
This hardening guide is intended to be used with specific versions of the CIS Kubernetes Benchmark, Kubernetes, and Rancher:
Hardening Guide Version | Rancher Version | CIS Benchmark Version | Kubernetes Version
------------------------|----------------|-----------------------|------------------
Hardening Guide v2.4 | Rancher v2.4 | Benchmark v1.5 | Kubernetes 1.15
[Click here to download a PDF version of this document](https://releases.rancher.com/documents/security/2.4/Rancher_Hardening_Guide.pdf)
### Overview
This document provides prescriptive guidance for hardening a production installation of Rancher v2.4 with Kubernetes v1.15. It outlines the configurations required to address Kubernetes benchmark controls from the Center for Internet Security (CIS).
For more detail about evaluating a hardened cluster against the official CIS benchmark, refer to the [CIS Benchmark Rancher Self-Assessment Guide - Rancher v2.4]({{< baseurl >}}/rancher/v2.x/en/security/benchmark-2.4/).
#### Known Issues
- Rancher **exec shell** and **view logs** for pods are **not** functional in a CIS 1.5 hardened setup when only a public IP is provided when registering custom nodes. This functionality requires a private IP to be provided when registering the custom nodes.
- When setting the `default_pod_security_policy_template_id:` to `restricted`, Rancher creates **RoleBindings** and **ClusterRoleBindings** on the default service accounts. The CIS 1.5 5.1.5 check requires that the default service accounts have no roles or cluster roles bound to them apart from the defaults. In addition, the default service accounts should be configured such that they do not provide a service account token and do not have any explicit rights assignments.
### Configure Kernel Runtime Parameters
The following `sysctl` configuration is recommended for all node types in the cluster. Set the following parameters in `/etc/sysctl.d/90-kubelet.conf`:
```
vm.overcommit_memory=1
vm.panic_on_oom=0
kernel.panic=10
kernel.panic_on_oops=1
kernel.keys.root_maxbytes=25000000
```
Run `sysctl -p /etc/sysctl.d/90-kubelet.conf` to enable the settings.
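To confirm that the parameters are active, you can read them back, for example:

```
# Each value should match the settings from /etc/sysctl.d/90-kubelet.conf.
sysctl vm.overcommit_memory vm.panic_on_oom kernel.panic kernel.panic_on_oops kernel.keys.root_maxbytes
```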
### Configure `etcd` user and group
A user account and group for the **etcd** service must be set up prior to installing RKE. The **uid** and **gid** for the **etcd** user will be used in the RKE **config.yml** to set the proper permissions for files and directories at installation time.
#### Create the `etcd` user and group
To create the **etcd** user and group, run the following console commands.
The commands below use `52034` for the **uid** and **gid** for example purposes. Any valid unused **uid** or **gid** could be used in lieu of `52034`.
```
groupadd --gid 52034 etcd
useradd --comment "etcd service account" --uid 52034 --gid 52034 etcd
```
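You can confirm that the account was created with the expected IDs before updating the RKE configuration:

```
# Should report uid=52034(etcd) gid=52034(etcd) groups=52034(etcd).
id etcd
```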
Update the RKE **config.yml** with the **uid** and **gid** of the **etcd** user:
``` yaml
services:
etcd:
gid: 52034
uid: 52034
```
#### Set `automountServiceAccountToken` to `false` for `default` service accounts
Kubernetes provides a default service account which is used by cluster workloads where no specific service account is assigned to the pod. Where access to the Kubernetes API from a pod is required, a specific service account should be created for that pod, and rights granted to that service account. The default service account should be configured such that it does not provide a service account token and does not have any explicit rights assignments.
For each namespace, including **default** and **kube-system** on a standard RKE install, the **default** service account must include this value:
```
automountServiceAccountToken: false
```
Save the following yaml to a file called `account_update.yaml`
``` yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: default
automountServiceAccountToken: false
```
Create a bash script file called `account_update.sh`. Be sure to `chmod +x account_update.sh` so the script has execute permissions.
```
#!/bin/bash -e
for namespace in $(kubectl get namespaces -A -o json | jq -r '.items[].metadata.name'); do
kubectl patch serviceaccount default -n ${namespace} -p "$(cat account_update.yaml)"
done
```
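Run the script once the cluster is up and your kubeconfig is in place (note that it relies on `jq` being installed). A quick spot check on a single namespace afterwards might look like this:

```
# Patch the default service account in every namespace.
./account_update.sh

# Spot-check one namespace; the field should now be reported as false.
kubectl get serviceaccount default -n default -o yaml | grep automountServiceAccountToken
```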
### Ensure that all Namespaces have Network Policies defined
Running different applications on the same Kubernetes cluster creates a risk of one
compromised application attacking a neighboring application. Network segmentation is
important to ensure that containers can communicate only with those they are supposed
to. A network policy is a specification of how selections of pods are allowed to
communicate with each other and other network endpoints.
Network Policies are namespace scoped. When a network policy is introduced to a given
namespace, all traffic not allowed by the policy is denied. However, if there are no network
policies in a namespace all traffic will be allowed into and out of the pods in that
namespace. To enforce network policies, a CNI (container network interface) plugin must be enabled.
This guide uses [canal](https://github.com/projectcalico/canal) to provide the policy enforcement.
Additional information about CNI providers can be found
[here](https://rancher.com/blog/2019/2019-03-21-comparing-kubernetes-cni-providers-flannel-calico-canal-and-weave/)
Once a CNI provider is enabled on a cluster a default network policy can be applied. For reference purposes a
**permissive** example is provided below. If you want to allow all traffic to all pods in a namespace
(even if policies are added that cause some pods to be treated as “isolated”),
you can create a policy that explicitly allows all traffic in that namespace. Save the following `yaml` as
`default-allow-all.yaml`. Additional [documentation](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
about network policies can be found on the Kubernetes site.
> This `NetworkPolicy` is not recommended for production use
``` yaml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-allow-all
spec:
podSelector: {}
ingress:
- {}
egress:
- {}
policyTypes:
- Ingress
- Egress
```
Create a bash script file called `apply_networkPolicy_to_all_ns.sh`. Be sure to
`chmod +x apply_networkPolicy_to_all_ns.sh` so the script has execute permissions.
```
#!/bin/bash -e
for namespace in $(kubectl get namespaces -A -o json | jq -r '.items[].metadata.name'); do
kubectl apply -f default-allow-all.yaml -n ${namespace}
done
```
Execute this script to apply the `default-allow-all.yaml` **permissive** `NetworkPolicy` to all namespaces.
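You can then list the policies to confirm that every namespace received one:

```
# Every namespace should now contain a "default-allow-all" NetworkPolicy.
kubectl get networkpolicy --all-namespaces
```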
### Reference Hardened RKE `cluster.yml` configuration
The reference `cluster.yml` is used by the RKE CLI and provides the configuration needed to achieve a hardened install
of Rancher Kubernetes Engine (RKE). The install [documentation](https://rancher.com/docs/rke/latest/en/installation/)
provides additional details about the configuration items. This reference `cluster.yml` does not include the required **nodes** directive, which will vary depending on your environment. Documentation for node configuration can be found here: https://rancher.com/docs/rke/latest/en/config-options/nodes
``` yaml
# If you intend to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
kubernetes_version: "v1.15.9-rancher1-1"
enable_network_policy: true
default_pod_security_policy_template_id: "restricted"
# the nodes directive is required and will vary depending on your environment
# documentation for node configuration can be found here:
# https://rancher.com/docs/rke/latest/en/config-options/nodes
nodes:
services:
etcd:
uid: 52034
gid: 52034
kube-api:
pod_security_policy: true
secrets_encryption_config:
enabled: true
audit_log:
enabled: true
admission_configuration:
event_rate_limit:
enabled: true
kube-controller:
extra_args:
feature-gates: "RotateKubeletServerCertificate=true"
scheduler:
image: ""
extra_args: {}
extra_binds: []
extra_env: []
kubelet:
generate_serving_certificate: true
extra_args:
feature-gates: "RotateKubeletServerCertificate=true"
protect-kernel-defaults: "true"
tls-cipher-suites: "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256"
extra_binds: []
extra_env: []
cluster_domain: ""
infra_container_image: ""
cluster_dns_server: ""
fail_swap_on: false
kubeproxy:
image: ""
extra_args: {}
extra_binds: []
extra_env: []
network:
plugin: ""
options: {}
mtu: 0
node_selector: {}
authentication:
strategy: ""
sans: []
webhook: null
addons: |
---
apiVersion: v1
kind: Namespace
metadata:
name: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: default-psp-role
namespace: ingress-nginx
rules:
- apiGroups:
- extensions
resourceNames:
- default-psp
resources:
- podsecuritypolicies
verbs:
- use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: default-psp-rolebinding
namespace: ingress-nginx
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: default-psp-role
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:authenticated
---
apiVersion: v1
kind: Namespace
metadata:
name: cattle-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: default-psp-role
namespace: cattle-system
rules:
- apiGroups:
- extensions
resourceNames:
- default-psp
resources:
- podsecuritypolicies
verbs:
- use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: default-psp-rolebinding
namespace: cattle-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: default-psp-role
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:authenticated
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: restricted
spec:
requiredDropCapabilities:
- NET_RAW
privileged: false
allowPrivilegeEscalation: false
defaultAllowPrivilegeEscalation: false
fsGroup:
rule: RunAsAny
runAsUser:
rule: MustRunAsNonRoot
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- emptyDir
- secret
- persistentVolumeClaim
- downwardAPI
- configMap
- projected
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: psp:restricted
rules:
- apiGroups:
- extensions
resourceNames:
- restricted
resources:
- podsecuritypolicies
verbs:
- use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: psp:restricted
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: psp:restricted
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:authenticated
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: tiller
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: tiller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: tiller
namespace: kube-system
addons_include: []
system_images:
etcd: ""
alpine: ""
nginx_proxy: ""
cert_downloader: ""
kubernetes_services_sidecar: ""
kubedns: ""
dnsmasq: ""
kubedns_sidecar: ""
kubedns_autoscaler: ""
coredns: ""
coredns_autoscaler: ""
kubernetes: ""
flannel: ""
flannel_cni: ""
calico_node: ""
calico_cni: ""
calico_controllers: ""
calico_ctl: ""
calico_flexvol: ""
canal_node: ""
canal_cni: ""
canal_flannel: ""
canal_flexvol: ""
weave_node: ""
weave_cni: ""
pod_infra_container: ""
ingress: ""
ingress_backend: ""
metrics_server: ""
windows_pod_infra_container: ""
ssh_key_path: ""
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
mode: ""
options: {}
ignore_docker_version: false
private_registries: []
ingress:
provider: ""
options: {}
node_selector: {}
extra_args: {}
dns_policy: ""
extra_envs: []
extra_volumes: []
extra_volume_mounts: []
cluster_name: ""
prefix_path: ""
addon_job_timeout: 0
bastion_host:
address: ""
port: ""
user: ""
ssh_key: ""
ssh_key_path: ""
ssh_cert: ""
ssh_cert_path: ""
monitoring:
provider: ""
options: {}
node_selector: {}
restore:
restore: false
snapshot_name: ""
dns: null
```
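Once the `nodes` directive has been filled in for your environment, the hardened cluster can be provisioned with the RKE CLI in the usual way, for example:

```
# Provision the hardened cluster from the reference configuration.
rke up --config cluster.yml
```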
### Reference Hardened RKE Template configuration
The reference RKE Template provides the configuration needed to achieve a hardened install of Kubernetes.
RKE Templates are used to provision Kubernetes and define Rancher settings. Follow the Rancher
[documentation](https://rancher.com/docs/rancher/v2.x/en/installation) for additional installation and RKE Template details.
``` yaml
#
# Cluster Config
#
default_pod_security_policy_template_id: restricted
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: true
#
# Rancher Config
#
rancher_kubernetes_engine_config:
addon_job_timeout: 30
addons: |-
---
apiVersion: v1
kind: Namespace
metadata:
name: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: default-psp-role
namespace: ingress-nginx
rules:
- apiGroups:
- extensions
resourceNames:
- default-psp
resources:
- podsecuritypolicies
verbs:
- use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: default-psp-rolebinding
namespace: ingress-nginx
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: default-psp-role
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:authenticated
---
apiVersion: v1
kind: Namespace
metadata:
name: cattle-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: default-psp-role
namespace: cattle-system
rules:
- apiGroups:
- extensions
resourceNames:
- default-psp
resources:
- podsecuritypolicies
verbs:
- use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: default-psp-rolebinding
namespace: cattle-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: default-psp-role
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:authenticated
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: restricted
spec:
requiredDropCapabilities:
- NET_RAW
privileged: false
allowPrivilegeEscalation: false
defaultAllowPrivilegeEscalation: false
fsGroup:
rule: RunAsAny
runAsUser:
rule: MustRunAsNonRoot
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- emptyDir
- secret
- persistentVolumeClaim
- downwardAPI
- configMap
- projected
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: psp:restricted
rules:
- apiGroups:
- extensions
resourceNames:
- restricted
resources:
- podsecuritypolicies
verbs:
- use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: psp:restricted
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: psp:restricted
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:authenticated
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: tiller
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: tiller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: tiller
namespace: kube-system
ignore_docker_version: true
kubernetes_version: v1.15.9-rancher1-1
#
# If you are using calico on AWS
#
# network:
# plugin: calico
# calico_network_provider:
# cloud_provider: aws
#
# # To specify flannel interface
#
# network:
# plugin: flannel
# flannel_network_provider:
# iface: eth1
#
# # To specify flannel interface for canal plugin
#
# network:
# plugin: canal
# canal_network_provider:
# iface: eth1
#
network:
mtu: 0
plugin: canal
#
# services:
# kube-api:
# service_cluster_ip_range: 10.43.0.0/16
# kube-controller:
# cluster_cidr: 10.42.0.0/16
# service_cluster_ip_range: 10.43.0.0/16
# kubelet:
# cluster_domain: cluster.local
# cluster_dns_server: 10.43.0.10
#
services:
etcd:
backup_config:
enabled: false
interval_hours: 12
retention: 6
safe_timestamp: false
creation: 12h
extra_args:
election-timeout: '5000'
heartbeat-interval: '500'
gid: 52034
retention: 72h
snapshot: false
uid: 52034
kube_api:
always_pull_images: false
audit_log:
enabled: true
event_rate_limit:
enabled: true
pod_security_policy: true
secrets_encryption_config:
enabled: true
service_node_port_range: 30000-32767
kube_controller:
extra_args:
address: 127.0.0.1
feature-gates: RotateKubeletServerCertificate=true
profiling: 'false'
terminated-pod-gc-threshold: '1000'
kubelet:
extra_args:
anonymous-auth: 'false'
event-qps: '0'
feature-gates: RotateKubeletServerCertificate=true
make-iptables-util-chains: 'true'
protect-kernel-defaults: 'true'
streaming-connection-idle-timeout: 1800s
tls-cipher-suites: >-
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
fail_swap_on: false
generate_serving_certificate: true
scheduler:
extra_args:
address: 127.0.0.1
profiling: 'false'
ssh_agent_auth: false
windows_prefered_cluster: false
```
### Hardened Reference Ubuntu 18.04 LTS **cloud-config**:
The reference **cloud-config** is typically used in cloud infrastructure environments to manage the configuration of
compute instances. The reference config applies the Ubuntu operating system-level settings
required before installing Kubernetes.
``` yaml
#cloud-config
packages:
- curl
- jq
runcmd:
- sysctl -w vm.overcommit_memory=1
- sysctl -w kernel.panic=10
- sysctl -w kernel.panic_on_oops=1
- curl https://releases.rancher.com/install-docker/18.09.sh | sh
- usermod -aG docker ubuntu
- return=1; while [ $return != 0 ]; do sleep 2; docker ps; return=$?; done
- addgroup --gid 52034 etcd
- useradd --comment "etcd service account" --uid 52034 --gid 52034 etcd
write_files:
- path: /etc/sysctl.d/kubelet.conf
owner: root:root
permissions: "0644"
content: |
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
```
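After an instance provisioned with this **cloud-config** boots, the kernel settings, Docker install, and etcd service account can be verified before Kubernetes is installed. A quick sanity check, assuming a standard Ubuntu 18.04 image:
```
sysctl vm.overcommit_memory kernel.panic kernel.panic_on_oops
id etcd
docker version
```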
@@ -3,4 +3,4 @@ title: Security Scans
weight: 299
---
The documentation about CIS security scans has moved [here.]({{<baseurl>}}/rancher/v2.x/en/cis-scans)
The documentation about CIS security scans has moved [here.]({{<baseurl>}}/rancher/v2.x/en/cis-scans)
@@ -28,6 +28,9 @@ API Keys are composed of four components:
3. **Optional:** Enter a description for the API key and select an expiration period or a scope. We recommend setting an expiration date.
The API key won't be valid after expiration. Shorter expiration periods are more secure.
_Available as of v2.4.6_
The expiration period is bound by `v3/settings/auth-token-max-ttl-minutes`. If the requested period exceeds the max-ttl, the API key is created with the max-ttl as its expiration period.
A scope will limit the API key so that it will only work against the Kubernetes API of the specified cluster. If the cluster is configured with an Authorized Cluster Endpoint, you will be able to use a scoped token directly against the cluster's API without proxying through the Rancher server. See [Authorized Cluster Endpoints]({{<baseurl>}}/rancher/v2.x/en/overview/architecture/#4-authorized-cluster-endpoint) for more information.
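For example, a scoped API key can be presented as a bearer token directly to the downstream cluster's Kubernetes API. A rough sketch, assuming `<access-key>:<secret-key>` is the scoped API key, `<fqdn>` is the Authorized Cluster Endpoint, and the cluster CA certificate has been saved locally as `cluster-ca.pem` (the endpoint and port are placeholders for your environment):
```
curl --cacert cluster-ca.pem \
  -H "Authorization: Bearer <access-key>:<secret-key>" \
  https://<fqdn>:6443/version
```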
@@ -34,8 +34,6 @@ You can upload this snapshot directly to an S3 backend with the [S3 options]({{<
$ rke etcd snapshot-save --name snapshot.db --config cluster.yml
```
{{< img "/img/rke/rke-etcd-backup.png" "etcd snapshot" >}}
### 2. Simulate a Node Failure
To simulate the failure, let's power down `node2`.
@@ -133,8 +131,6 @@ Back up the Kubernetes cluster by taking a local snapshot:
$ rke etcd snapshot-save --name snapshot.db --config cluster.yml
```
{{< img "/img/rke/rke-etcd-backup.png" "etcd snapshot" >}}
<a id="store-the-snapshot-externally-rke-prior-to-v0.2.0"></a>
### 2. Store the Snapshot Externally
+2 -2
View File
@@ -7,8 +7,8 @@ weight: 50
RKE is a fast, versatile Kubernetes installer that you can use to install Kubernetes on your Linux hosts. You can get started in a couple of quick and easy steps:
1. [Download the RKE Binary](#download-the-rke-binary)
1. [Alternative RKE macOS Install - Homebrew](#alternative-rke-macos-install---homebrew)
1. [Alternative RKE macOS Install - MacPorts](#alternative-rke-macos-install---macports)
1. [Alternative RKE macOS Install - Homebrew](#alternative-rke-macos-x-install-homebrew)
1. [Alternative RKE macOS Install - MacPorts](#alternative-rke-macos-install-macports)
1. [Prepare the Nodes for the Kubernetes Cluster](#prepare-the-nodes-for-the-kubernetes-cluster)
1. [Creating the Cluster Configuration File](#creating-the-cluster-configuration-file)
1. [Deploying Kubernetes with RKE](#deploying-kubernetes-with-rke)
@@ -39,13 +39,11 @@ The following certificates must exist in the certificate directory.
| Kube Node | kube-node.pem | kube-node-key.pem |
| Apiserver Proxy Client | kube-apiserver-proxy-client.pem | kube-apiserver-proxy-client-key.pem |
| Etcd Nodes | kube-etcd-x-x-x-x.pem | kube-etcd-x-x-x-x-key.pem |
| Kube Api Request Header CA | kube-apiserver-requestheader-ca.pem* | kube-apiserver-requestheader-ca-key.pem** |
| Kube Api Request Header CA | kube-apiserver-requestheader-ca.pem* | kube-apiserver-requestheader-ca-key.pem |
| Service Account Token | - | kube-service-account-token-key.pem |
\* Is the same as kube-ca.pem
\** Is the same as kube-ca-key
## Generating Certificate Signing Requests (CSRs) and Keys
If you want to create and sign the certificates by a real Certificate Authority (CA), you can use RKE to generate a set of Certificate Signing Requests (CSRs) and keys. Using the `rke cert generate-csr` command, you can generate the CSRs and keys.
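A sketch of generating the CSRs and private keys into a local directory; the flag names are assumptions based on the RKE CLI and should be confirmed with `rke cert generate-csr --help`:
```
rke cert generate-csr --config cluster.yml --cert-dir ./certs
```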
@@ -26,7 +26,7 @@ You can add/remove only worker nodes, by running `rke up --update-only`. This wi
In order to remove the Kubernetes components from nodes, you use the `rke remove` command.
> **Warning:** This command is irreversible and will destroy the Kubernetes cluster, including etcd snapshots on S3. If there is a disaster and your cluster is inaccessible, refer to the process for [restoring your cluster from a snapshot]({{<baseurl>}}rke/latest/en/etcd-snapshots/#etcd-disaster-recovery).
> **Warning:** This command is irreversible and will destroy the Kubernetes cluster, including etcd snapshots on S3. If there is a disaster and your cluster is inaccessible, refer to the process for [restoring your cluster from a snapshot]({{<baseurl>}}/rke/latest/en/etcd-snapshots/#etcd-disaster-recovery).
The `rke remove` command does the following to each node in the `cluster.yml`:
+3 -3
View File
@@ -2035,9 +2035,9 @@ lodash.sortby@^4.7.0:
integrity sha1-7dFMgk4sycHgsKG0K7UhBRakJDg=
lodash@^4.13.1, lodash@^4.17.10, lodash@^4.17.5:
version "4.17.15"
resolved "https://registry.yarnpkg.com/lodash/-/lodash-4.17.15.tgz#b447f6670a0455bbfeedd11392eff330ea097548"
integrity sha512-8xOcRHvCjnocdS5cpwXQXVzmmh5e5+saE2QGoeQmbKmRS6J3VQppPOIt0MnmE+4xlZoumy0GPG0D0MVIQbNA1A==
version "4.17.19"
resolved "https://registry.yarnpkg.com/lodash/-/lodash-4.17.19.tgz#e48ddedbe30b3321783c5b4301fbd353bc1e4a4b"
integrity sha512-JNvd8XER9GQX0v2qJgsaN/mzFCNA5BRe/j8JN9d+tWyGLSodKQHKFicdwNYzWwI3wjRnaKPsGj1XkBjx/F96DQ==
loose-envify@^1.0.0:
version "1.4.0"