Merge pull request #3088 from rancher/staging

Merge staging into master
This commit is contained in:
Denise
2021-03-03 19:09:31 -08:00
committed by GitHub
817 changed files with 97369 additions and 151 deletions
@@ -0,0 +1,67 @@
.close {
float: right;
font-size: 20px;
font-weight: bold;
line-height: 18px;
color: #000000;
text-shadow: 0 1px 0 #ffffff;
opacity: 0.2;
filter: alpha(opacity=20);
}
.close:hover {
color: #000000;
text-decoration: none;
opacity: 0.4;
filter: alpha(opacity=40);
cursor: pointer;
}
.alert {
padding: 8px 35px 8px 14px;
margin-bottom: 18px;
text-shadow: 0 1px 0 rgba(255, 255, 255, 0.5);
background-color: #fcf8e3;
border: 1px solid #fbeed5;
-webkit-border-radius: 4px;
-moz-border-radius: 4px;
border-radius: 4px;
color: #c09853;
font-size: 15px;
line-height: 1.5;
}
.alert-heading {
color: inherit;
}
.alert .close {
position: relative;
top: -2px;
right: -21px;
line-height: 18px;
}
.alert-success {
background-color: #dff0d8;
border-color: #d6e9c6;
color: #468847;
}
.alert-danger,
.alert-error {
background-color: #f2dede;
border-color: #eed3d7;
color: #b94a48;
}
.alert-info {
background-color: #d9edf7;
border-color: #bce8f1;
color: #3a87ad;
}
.alert-block {
padding-top: 14px;
padding-bottom: 14px;
}
.alert-block > p,
.alert-block > ul {
margin-bottom: 0;
}
.alert-block p + p {
margin-top: 5px;
}
@@ -7,6 +7,7 @@
'instantsearch',
'instantsearch-theme-algolia',
'izimodal',
'alert',
'../../node_modules/rancher-website-theme/assets/sass/theme';
.modal-content {
@@ -58,4 +58,4 @@ A unique node ID can be appended to the hostname by launching K3s servers or age
# Automatically Deployed Manifests
The [manifests](https://github.com/rancher/k3s/tree/master/manifests) located at the directory path `/var/lib/rancher/k3s/server/manifests` are bundled into the K3s binary at build time. These will be installed at runtime by the [rancher/helm-controller.](https://github.com/rancher/helm-controller#helm-controller)
The [manifests](https://github.com/rancher/k3s/tree/master/manifests) located at the directory path `/var/lib/rancher/k3s/server/manifests` are bundled into the K3s binary at build time. These will be installed at runtime by the [rancher/helm-controller.](https://github.com/rancher/helm-controller#helm-controller)
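For example, a minimal sketch of deploying a Helm chart this way (the file name, chart name, and repository URL are placeholders, not part of K3s itself):
```
# Run on a K3s server node; any manifest written to this directory is applied
# automatically, and a HelmChart resource is picked up by the bundled helm-controller.
sudo tee /var/lib/rancher/k3s/server/manifests/example-nginx.yaml > /dev/null <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: example-nginx
  namespace: kube-system
spec:
  chart: nginx
  repo: https://charts.bitnami.com/bitnami
  targetNamespace: default
EOF
```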
@@ -0,0 +1,5 @@
---
title: v2.0-v2.4
weight: 2
showBreadcrumb: false
---
@@ -0,0 +1,21 @@
---
title: "Rancher v2.0-v2.4"
shortTitle: "Rancher 2.0-2.4"
description: "Rancher adds significant value on top of Kubernetes: managing hundreds of clusters from one interface, centralizing RBAC, enabling monitoring and alerting. Read more."
metaTitle: "Rancher 2.x Docs: What is New?"
metaDescription: "Rancher 2 adds significant value on top of Kubernetes: managing hundreds of clusters from one interface, centralizing RBAC, enabling monitoring and alerting. Read more."
insertOneSix: true
weight: 1
ctaBanner: 0
---
Rancher was originally built to work with multiple orchestrators, and it included its own orchestrator called Cattle. With the rise of Kubernetes in the marketplace, Rancher 2.x exclusively deploys and manages Kubernetes clusters running anywhere, on any provider.
Rancher can provision Kubernetes from a hosted provider, provision compute nodes and then install Kubernetes onto them, or import existing Kubernetes clusters running anywhere.
One Rancher server installation can manage thousands of Kubernetes clusters and thousands of nodes from the same user interface.
Rancher adds significant value on top of Kubernetes, first by centralizing authentication and role-based access control (RBAC) for all of the clusters, giving global admins the ability to control cluster access from one location.
It then enables detailed monitoring and alerting for clusters and their resources, ships logs to external providers, and integrates directly with Helm via the Application Catalog. If you have an external CI/CD system, you can plug it into Rancher, but if you don't, Rancher even includes a pipeline engine to help you automatically deploy and upgrade workloads.
Rancher is a _complete_ container management platform for Kubernetes, giving you the tools to successfully run Kubernetes anywhere.
@@ -0,0 +1,60 @@
---
title: Authentication, Permissions and Global Configuration
weight: 6
aliases:
- /rancher/v2.0-v2.4/en/concepts/global-configuration/
- /rancher/v2.0-v2.4/en/tasks/global-configuration/
- /rancher/v2.0-v2.4/en/concepts/global-configuration/server-url/
- /rancher/v2.0-v2.4/en/tasks/global-configuration/server-url/
- /rancher/v2.0-v2.4/en/admin-settings/log-in/
---
After installation, the [system administrator]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) should configure authentication, authorization, security, default settings, security policies, drivers and global DNS entries.
## First Log In
After you log into Rancher for the first time, Rancher will prompt you for a **Rancher Server URL**. You should set the URL to the main entry point to the Rancher Server. When a load balancer sits in front of a Rancher Server cluster, the URL should resolve to the load balancer. The system will automatically try to infer the Rancher Server URL from the IP address or host name of the host running the Rancher Server. This inferred URL is only correct if you are running a single-node Rancher Server installation. In most cases, therefore, you need to set the Rancher Server URL to the correct value yourself.
>**Important!** After you set the Rancher Server URL, we do not support updating it. Set the URL with extreme care.
## Authentication
One of the key features that Rancher adds to Kubernetes is centralized user authentication. This feature allows you to set up local users and/or connect to an external authentication provider. By connecting to an external authentication provider, you can leverage that provider's users and groups.
For more information on how authentication works and how to configure each provider, see [Authentication]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/).
## Authorization
Within Rancher, each person authenticates as a _user_, which is a login that grants you access to Rancher. Once the user logs in to Rancher, their _authorization_, or their access rights within the system, is determined by the user's role. Rancher provides built-in roles to allow you to easily configure a user's permissions to resources, but Rancher also provides the ability to customize the roles for each Kubernetes resource.
For more information on how authorization works and how to customize roles, see [Role-Based Access Control (RBAC)]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/).
## Pod Security Policies
_Pod Security Policies_ (or PSPs) are objects that control security-sensitive aspects of pod specification, e.g. root privileges. If a pod does not meet the conditions specified in the PSP, Kubernetes will not allow it to start, and Rancher will display an error message.
For more information on how to create and use PSPs, see [Pod Security Policies]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/pod-security-policies/).
## Provisioning Drivers
Drivers in Rancher allow you to manage which providers can be used to provision [hosted Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/) or [nodes in an infrastructure provider]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/) to allow Rancher to deploy and manage Kubernetes.
For more information, see [Provisioning Drivers]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/drivers/).
## Adding Kubernetes Versions into Rancher
_Available as of v2.3.0_
With this feature, you can upgrade to the latest version of Kubernetes as soon as it is released, without upgrading Rancher. This feature allows you to easily upgrade Kubernetes patch versions (e.g. `v1.15.X`), but it is not intended for upgrading Kubernetes minor versions (e.g. `v1.X.0`), as Kubernetes tends to deprecate or add APIs between minor versions.
The information that Rancher uses to provision [RKE clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) is now located in the Rancher Kubernetes Metadata. For details on metadata configuration and how to change the Kubernetes version used for provisioning RKE clusters, see [Rancher Kubernetes Metadata.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/k8s-metadata/)
Rancher Kubernetes Metadata contains the Kubernetes version information that Rancher uses to provision [RKE clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/).
For more information on how metadata works and how to configure it, see [Rancher Kubernetes Metadata]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/k8s-metadata/).
## Enabling Experimental Features
_Available as of v2.3.0_
Rancher includes some features that are experimental and disabled by default. Feature flags were introduced to allow you to try these features. For more information, refer to the section about [feature flags.]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/options/feature-flags/)
@@ -0,0 +1,97 @@
---
title: Authentication
weight: 1115
aliases:
- /rancher/v2.0-v2.4/en/concepts/global-configuration/authentication/
- /rancher/v2.0-v2.4/en/tasks/global-configuration/authentication/
---
One of the key features that Rancher adds to Kubernetes is centralized user authentication. This feature allows your users to use one set of credentials to authenticate with any of your Kubernetes clusters.
This centralized user authentication is accomplished using the Rancher authentication proxy, which is installed along with the rest of Rancher. This proxy authenticates your users and forwards their requests to your Kubernetes clusters using a service account.
## External vs. Local Authentication
The Rancher authentication proxy integrates with the following external authentication services. The following table lists the version of Rancher in which each service debuted.
| Auth Service | Available as of |
| ------------------------------------------------------------------------------------------------ | ---------------- |
| [Microsoft Active Directory]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/ad/) | v2.0.0 |
| [GitHub]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/github/) | v2.0.0 |
| [Microsoft Azure AD]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/azure-ad/) | v2.0.3 |
| [FreeIPA]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/freeipa/) | v2.0.5 |
| [OpenLDAP]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/openldap/) | v2.0.5 |
| [Microsoft AD FS]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/microsoft-adfs/) | v2.0.7 |
| [PingIdentity]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/ping-federate/) | v2.0.7 |
| [Keycloak]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/keycloak/) | v2.1.0 |
| [Okta]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/okta/) | v2.2.0 |
| [Google OAuth]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/google/) | v2.3.0 |
| [Shibboleth]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/shibboleth) | v2.4.0 |
<br/>
However, Rancher also provides [local authentication]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/local/).
In most cases, you should use an external authentication service over local authentication, as external authentication allows user management from a central location. However, you may want a few local authentication users for managing Rancher under rare circumstances, such as if your external authentication provider is unavailable or undergoing maintenance.
## Users and Groups
Rancher relies on users and groups to determine who is allowed to log in to Rancher and which resources they can access. When authenticating with an external provider, groups are provided from the external provider based on the user. These users and groups are given specific roles to resources like clusters, projects, multi-cluster apps, and global DNS providers and entries. When you give access to a group, all users who are a member of that group in the authentication provider will be able to access the resource with the permissions that you've specified. For more information on roles and permissions, see [Role Based Access Control]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/).
> **Note:** Local authentication does not support creating or managing groups.
For more information, see [Users and Groups]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/user-groups/)
## Scope of Rancher Authorization
After you configure Rancher to allow sign on using an external authentication service, you should configure who should be allowed to log in and use Rancher. The following options are available:
| Access Level | Description |
|----------------------------------------------|-------------|
| Allow any valid Users | _Any_ user in the authorization service can access Rancher. We generally discourage use of this setting! |
| Allow members of Clusters, Projects, plus Authorized Users and Organizations | Any user in the authorization service and any group added as a **Cluster Member** or **Project Member** can log in to Rancher. Additionally, any user in the authentication service or group you add to the **Authorized Users and Organizations** list may log in to Rancher. |
| Restrict access to only Authorized Users and Organizations | Only users in the authentication service or groups added to the Authorized Users and Organizations can log in to Rancher. |
To set the Rancher access level for users in the authorization service, follow these steps:
1. From the **Global** view, click **Security > Authentication.**
1. Use the **Site Access** options to configure the scope of user authorization. The table above explains the access level for each option.
1. Optional: If you choose an option other than **Allow any valid Users,** you can add users to the list of authorized users and organizations by searching for them in the text field that appears.
1. Click **Save.**
**Result:** The Rancher access configuration settings are applied.
{{< saml_caveats >}}
## External Authentication Configuration and Principal Users
Configuration of external authentication requires:
- A local user assigned the administrator role, called hereafter the _local principal_.
- An external user that can authenticate with your external authentication service, called hereafter the _external principal_.
Configuration of external authentication affects how principal users are managed within Rancher. Follow the list below to better understand these effects.
1. Sign into Rancher as the local principal and complete configuration of external authentication.
![Sign In]({{<baseurl>}}/img/rancher/sign-in.png)
2. Rancher associates the external principal with the local principal. These two users share the local principal's user ID.
![Principal ID Sharing]({{<baseurl>}}/img/rancher/principal-ID.png)
3. After you complete configuration, Rancher automatically signs out the local principal.
![Sign Out Local Principal]({{<baseurl>}}/img/rancher/sign-out-local.png)
4. Then, Rancher automatically signs you back in as the external principal.
![Sign In External Principal]({{<baseurl>}}/img/rancher/sign-in-external.png)
5. Because the external principal and the local principal share an ID, no unique object for the external principal displays on the Users page.
![Sign In External Principal]({{<baseurl>}}/img/rancher/users-page.png)
6. The external principal and the local principal share the same access rights.
@@ -0,0 +1,199 @@
---
title: Configuring Active Directory (AD)
weight: 1112
aliases:
- /rancher/v2.0-v2.4/en/tasks/global-configuration/authentication/active-directory/
---
If your organization uses Microsoft Active Directory as a central user repository, you can configure Rancher to communicate with an Active Directory server to authenticate users. This allows Rancher admins to control access to clusters and projects based on users and groups managed externally in the Active Directory, while allowing end users to authenticate with their AD credentials when logging in to the Rancher UI.
Rancher uses LDAP to communicate with the Active Directory server. The authentication flow for Active Directory is therefore the same as for the [OpenLDAP authentication]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/openldap) integration.
> **Note:**
>
> Before you start, please familiarise yourself with the concepts of [External Authentication Configuration and Principal Users]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-authentication-configuration-and-principal-users).
## Prerequisites
You'll need to create a new AD user to serve as a service account for Rancher, or obtain one from your AD administrator. This user must have sufficient permissions to perform LDAP searches and read attributes of users and groups under your AD domain.
Usually a (non-admin) **Domain User** account should be used for this purpose, as by default such a user has read-only privileges for most objects in the domain partition.
Note, however, that in some locked-down Active Directory configurations this default behaviour may not apply. In that case you will need to ensure that the service account user has at least **Read** and **List Content** permissions granted either on the Base OU (enclosing users and groups) or globally for the domain.
> **Using TLS?**
>
> If the certificate used by the AD server is self-signed or not from a recognised certificate authority, make sure you have the CA certificate (concatenated with any intermediate certificates) in PEM format at hand. You will have to paste this certificate in during the configuration so that Rancher is able to validate the certificate chain.
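For example, a minimal sketch of assembling and sanity-checking such a chain from a shell (the file names and the `ad.acme.com` host are placeholders):
```
# Concatenate the CA certificate and any intermediates into one PEM file
# to paste into the Rancher configuration form (placeholder file names).
cat root-ca.pem intermediate-ca.pem > ad-ca-chain.pem

# Optional sanity check: confirm the AD server's LDAPS certificate verifies
# against the assembled chain.
openssl s_client -connect ad.acme.com:636 -CAfile ad-ca-chain.pem </dev/null
```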
## Configuration Steps
### Open Active Directory Configuration
1. Log into the Rancher UI using the initial local `admin` account.
2. From the **Global** view, navigate to **Security** > **Authentication**
3. Select **Active Directory**. The **Configure an AD server** form will be displayed.
### Configure Active Directory Server Settings
In the section titled `1. Configure an Active Directory server`, complete the fields with the information specific to your Active Directory server. Please refer to the following table for detailed information on the required values for each parameter. A quick `ldapsearch` check you can use to validate these values is shown after the table.
> **Note:**
>
> If you are unsure about the correct values to enter in the user/group Search Base field, please refer to [Identify Search Base and Schema using ldapsearch](#annex-identify-search-base-and-schema-using-ldapsearch).
**Table 1: AD Server parameters**
| Parameter | Description |
|:--|:--|
| Hostname | Specify the hostname or IP address of the AD server |
| Port | Specify the port at which the Active Directory server is listening for connections. Unencrypted LDAP normally uses the standard port of 389, while LDAPS uses port 636.|
| TLS | Check this box to enable LDAP over SSL/TLS (commonly known as LDAPS).|
| Server Connection Timeout | The duration, in seconds, that Rancher waits before considering the AD server unreachable. |
| Service Account Username | Enter the username of an AD account with read-only access to your domain partition (see [Prerequisites](#prerequisites)). The username can be entered in NetBIOS format (e.g. "DOMAIN\serviceaccount") or UPN format (e.g. "serviceaccount@domain.com"). |
| Service Account Password | The password for the service account. |
| Default Login Domain | When you configure this field with the NetBIOS name of your AD domain, usernames entered without a domain (e.g. "jdoe") will automatically be converted to a slashed, NetBIOS logon (e.g. "LOGIN_DOMAIN\jdoe") when binding to the AD server. If your users authenticate with the UPN (e.g. "jdoe@acme.com") as username then this field **must** be left empty. |
| User Search Base | The Distinguished Name of the node in your directory tree from which to start searching for user objects. All users must be descendents of this base DN. For example: "ou=people,dc=acme,dc=com".|
| Group Search Base | If your groups live under a different node than the one configured under `User Search Base` you will need to provide the Distinguished Name here. Otherwise leave it empty. For example: "ou=groups,dc=acme,dc=com".|
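Before entering these values in Rancher, you can optionally sanity-check them with a simple bind as the service account. The sketch below reuses the placeholder host and domain from the annex at the end of this page; substitute your own values:
```
# Bind as the service account and read the root DSE; success confirms the
# hostname, port and service account credentials (placeholder values shown).
ldapsearch -x -D "acme\serviceaccount" -w 'secret' -p 389 \
  -h ad.acme.com -b "" -s base "(objectClass=*)" namingContexts
```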
---
### Configure User/Group Schema
In the section titled `2. Customize Schema` you must provide Rancher with a correct mapping of user and group attributes corresponding to the schema used in your directory.
Rancher uses LDAP queries to search for and retrieve information about users and groups within the Active Directory. The attribute mappings configured in this section are used to construct search filters and resolve group membership. It is therefore paramount that the provided settings reflect the reality of your AD domain.
> **Note:**
>
> If you are unfamiliar with the schema used in your Active Directory domain, please refer to [Identify Search Base and Schema using ldapsearch](#annex-identify-search-base-and-schema-using-ldapsearch) to determine the correct configuration values.
#### User Schema
The table below details the parameters for the user schema section configuration.
**Table 2: User schema configuration parameters**
| Parameter | Description |
|:--|:--|
| Object Class | The name of the object class used for user objects in your domain. If defined, only specify the name of the object class - *don't* include it in an LDAP wrapper such as &(objectClass=xxxx) |
| Username Attribute | The user attribute whose value is suitable as a display name. |
| Login Attribute | The attribute whose value matches the username part of credentials entered by your users when logging in to Rancher. If your users authenticate with their UPN (e.g. "jdoe@acme.com") as username then this field must normally be set to `userPrincipalName`. Otherwise for the old, NetBIOS-style logon names (e.g. "jdoe") it's usually `sAMAccountName`. |
| User Member Attribute | The attribute containing the groups that a user is a member of. |
| Search Attribute | When a user enters text to add users or groups in the UI, Rancher queries the AD server and attempts to match users by the attributes provided in this setting. Multiple attributes can be specified by separating them with the pipe ("\|") symbol. To match UPN usernames (e.g. jdoe@acme.com) you should usually set the value of this field to `userPrincipalName`. |
| Search Filter | This filter gets applied to the list of users that is searched when Rancher attempts to add users to a site access list or tries to add members to clusters or projects. For example, a user search filter could be <code>(&#124;(memberOf=CN=group1,CN=Users,DC=testad,DC=rancher,DC=io)(memberOf=CN=group2,CN=Users,DC=testad,DC=rancher,DC=io))</code>. Note: If the search filter does not use [valid AD search syntax,](https://docs.microsoft.com/en-us/windows/win32/adsi/search-filter-syntax) the list of users will be empty. |
| User Enabled Attribute | The attribute containing an integer value representing a bitwise enumeration of user account flags. Rancher uses this to determine if a user account is disabled. You should normally leave this set to the AD standard `userAccountControl`. |
| Disabled Status Bitmask | This is the value of the `User Enabled Attribute` designating a disabled user account. You should normally leave this set to the default value of "2" as specified in the Microsoft Active Directory schema (see [here](https://docs.microsoft.com/en-us/windows/desktop/adschema/a-useraccountcontrol#remarks)). |
---
#### Group Schema
The table below details the parameters for the group schema configuration.
**Table 3: Group schema configuration parameters**
| Parameter | Description |
|:--|:--|
| Object Class | The name of the object class used for group objects in your domain. If defined, only specify the name of the object class - *don't* include it in an LDAP wrapper such as &(objectClass=xxxx) |
| Name Attribute | The group attribute whose value is suitable for a display name. |
| Group Member User Attribute | The name of the **user attribute** whose format matches the group members in the `Group Member Mapping Attribute`. |
| Group Member Mapping Attribute | The name of the group attribute containing the members of a group. |
| Search Attribute | Attribute used to construct search filters when adding groups to clusters or projects. See description of user schema `Search Attribute`. |
| Search Filter | This filter gets applied to the list of groups that is searched when Rancher attempts to add groups to a site access list or tries to add groups to clusters or projects. For example, a group search filter could be <code>(&#124;(cn=group1)(cn=group2))</code>. Note: If the search filter does not use [valid AD search syntax,](https://docs.microsoft.com/en-us/windows/win32/adsi/search-filter-syntax) the list of groups will be empty. |
| Group DN Attribute | The name of the group attribute whose format matches the values in the user attribute describing the user's memberships. See `User Member Attribute`. |
| Nested Group Membership | This setting defines whether Rancher should resolve nested group memberships. Use it only if your organisation makes use of nested memberships (i.e. you have groups that contain other groups as members). We advise avoiding nested groups when possible. |
---
### Test Authentication
Once you have completed the configuration, proceed by testing the connection to the AD server **using your AD admin account**. If the test is successful, authentication with the configured Active Directory will be enabled, and the account you test with will be mapped as a Rancher administrator.
> **Note:**
>
> The AD user pertaining to the credentials entered in this step will be mapped to the local principal account and assigned administrator privileges in Rancher. You should therefore make a conscious decision on which AD account you use to perform this step.
1. Enter the **username** and **password** for the AD account that should be mapped to the local principal account.
2. Click **Authenticate with Active Directory** to finalise the setup.
**Result:**
- Active Directory authentication has been enabled.
- You have been signed into Rancher as administrator using the provided AD credentials.
> **Note:**
>
> You will still be able to log in using the locally configured `admin` account and password in case of a disruption of LDAP services.
## Annex: Identify Search Base and Schema using ldapsearch
In order to successfully configure AD authentication it is crucial that you provide the correct configuration pertaining to the hierarchy and schema of your AD server.
The [`ldapsearch`](http://manpages.ubuntu.com/manpages/artful/man1/ldapsearch.1.html) tool allows you to query your AD server to learn about the schema used for user and group objects.
For the purpose of the example commands provided below we will assume:
- The Active Directory server has a hostname of `ad.acme.com`
- The server is listening for unencrypted connections on port `389`
- The Active Directory domain is `acme`
- You have a valid AD account with the username `jdoe` and password `secret`
### Identify Search Base
First we will use `ldapsearch` to identify the Distinguished Name (DN) of the parent node(s) for users and groups:
```
$ ldapsearch -x -D "acme\jdoe" -w "secret" -p 389 \
-h ad.acme.com -b "dc=acme,dc=com" -s sub "sAMAccountName=jdoe"
```
This command performs an LDAP search with the search base set to the domain root (`-b "dc=acme,dc=com"`) and a filter targeting the user account (`sAMAccountName=jdoe`), returning the attributes for said user:
{{< img "/img/rancher/ldapsearch-user.png" "LDAP User">}}
Since in this case the user's DN is `CN=John Doe,CN=Users,DC=acme,DC=com` [5], we should configure the **User Search Base** with the parent node DN `CN=Users,DC=acme,DC=com`.
Similarly, based on the DN of the group referenced in the **memberOf** attribute [4], the correct value for the **Group Search Base** would be the parent node of that value, i.e. `OU=Groups,DC=acme,DC=com`.
### Identify User Schema
The output of the above `ldapsearch` query also allows us to determine the correct values to use in the user schema configuration:
- `Object Class`: **person** [1]
- `Username Attribute`: **name** [2]
- `Login Attribute`: **sAMAccountName** [3]
- `User Member Attribute`: **memberOf** [4]
> **Note:**
>
> If the AD users in our organisation were to authenticate with their UPN (e.g. jdoe@acme.com) instead of the short logon name, then we would have to set the `Login Attribute` to **userPrincipalName** instead.
We'll also set the `Search Attribute` parameter to **sAMAccountName|name**. That way users can be added to clusters/projects in the Rancher UI either by entering their username or full name.
### Identify Group Schema
Next, we'll query one of the groups associated with this user, in this case `CN=examplegroup,OU=Groups,DC=acme,DC=com`:
```
$ ldapsearch -x -D "acme\jdoe" -w "secret" -p 389 \
-h ad.acme.com -b "ou=groups,dc=acme,dc=com" \
-s sub "CN=examplegroup"
```
This command shows us the attributes used for group objects:
{{< img "/img/rancher/ldapsearch-group.png" "LDAP Group">}}
Again, this allows us to determine the correct values to enter in the group schema configuration:
- `Object Class`: **group** [1]
- `Name Attribute`: **name** [2]
- `Group Member Mapping Attribute`: **member** [3]
- `Search Attribute`: **sAMAccountName** [4]
Looking at the value of the **member** attribute, we can see that it contains the DN of the referenced user. This corresponds to the **distinguishedName** attribute in our user object. Accordingly, we will have to set the value of the `Group Member User Attribute` parameter to this attribute.
In the same way, we can observe that the value in the **memberOf** attribute in the user object corresponds to the **distinguishedName** [5] of the group. We therefore need to set the value for the `Group DN Attribute` parameter to this attribute.
## Annex: Troubleshooting
If you are experiencing issues while testing the connection to the Active Directory server, first double-check the credentials entered for the service account as well as the search base configuration. You may also inspect the Rancher logs to help pinpoint the cause of the problem. Debug logs may contain more detailed information about the error. Please refer to [How can I enable debug logging]({{<baseurl>}}/rancher/v2.0-v2.4/en/faq/technical/#how-can-i-enable-debug-logging) in this documentation.
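For reference, a hedged sketch of what enabling debug logging typically looks like; the pod selector and container name below are assumptions, and the FAQ linked above is the authoritative procedure:
```
# Kubernetes (HA) install: raise the log level on a Rancher server pod.
kubectl -n cattle-system exec -it \
  $(kubectl -n cattle-system get pods -l app=rancher -o jsonpath='{.items[0].metadata.name}') \
  -- loglevel --set debug

# Single-node Docker install (container name is a placeholder).
docker exec rancher-server loglevel --set debug
```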
@@ -0,0 +1,209 @@
---
title: Configuring Azure AD
weight: 1115
aliases:
- /rancher/v2.0-v2.4/en/tasks/global-configuration/authentication/azure-ad/
---
_Available as of v2.0.3_
If you have an instance of Active Directory (AD) hosted in Azure, you can configure Rancher to allow your users to log in using their AD accounts. Configuration of Azure AD external authentication requires you to make configurations in both Azure and Rancher.
>**Note:** Azure AD integration only supports Service Provider initiated logins.
>**Prerequisite:** Have an instance of Azure AD configured.
>**Note:** Most of this procedure takes place from the [Microsoft Azure Portal](https://portal.azure.com/).
## Azure Active Directory Configuration Outline
Configuring Rancher to allow your users to authenticate with their Azure AD accounts involves multiple procedures. Review the outline below before getting started.
<a id="tip"></a>
>**Tip:** Before you start, we recommend creating an empty text file. You can use this file to copy values from Azure that you'll paste into Rancher later.
<!-- TOC -->
- [1. Register Rancher with Azure](#1-register-rancher-with-azure)
- [2. Create a new client secret](#2-create-a-new-client-secret)
- [3. Set Required Permissions for Rancher](#3-set-required-permissions-for-rancher)
- [4. Add a Reply URL](#4-add-a-reply-url)
- [5. Copy Azure Application Data](#5-copy-azure-application-data)
- [6. Configure Azure AD in Rancher](#6-configure-azure-ad-in-rancher)
<!-- /TOC -->
### 1. Register Rancher with Azure
Before enabling Azure AD within Rancher, you must register Rancher with Azure.
1. Log in to [Microsoft Azure](https://portal.azure.com/) as an administrative user. Configuration in future steps requires administrative access rights.
1. Use search to open the **App registrations** service.
![Open App Registrations]({{<baseurl>}}/img/rancher/search-app-registrations.png)
1. Click **New registrations** and complete the **Create** form.
![New App Registration]({{<baseurl>}}/img/rancher/new-app-registration.png)
1. Enter a **Name** (something like `Rancher`).
1. From **Supported account types**, select "Accounts in this organizational directory only (AzureADTest only - Single tenant)". This corresponds to the legacy app registration options.
1. In the **Redirect URI** section, make sure **Web** is selected from the dropdown and enter the URL of your Rancher Server in the text box next to the dropdown. This Rancher server URL should be appended with the verification path: `<MY_RANCHER_URL>/verify-auth-azure`.
>**Tip:** You can find your personalized Azure reply URL in Rancher on the Azure AD Authentication page (Global View > Security Authentication > Azure AD).
1. Click **Register**.
>**Note:** It can take up to five minutes for this change to take effect, so don't be alarmed if you can't authenticate immediately after Azure AD configuration.
### 2. Create a new client secret
From the Azure portal, create a client secret. Rancher will use this key to authenticate with Azure AD.
1. Use search to open **App registrations** services. Then open the entry for Rancher that you created in the last procedure.
![Open Rancher Registration]({{<baseurl>}}/img/rancher/open-rancher-app.png)
1. From the navigation pane on the left, click **Certificates and Secrets**.
1. Click **New client secret**.
![Create new client secret]({{< baseurl >}}/img/rancher/select-client-secret.png)
1. Enter a **Description** (something like `Rancher`).
1. Select duration for the key from the options under **Expires**. This drop-down sets the expiration date for the key. Shorter durations are more secure, but require you to create a new key after expiration.
1. Click **Add** (you don't need to enter a value—it will automatically populate after you save).
<a id="secret"></a>
1. Copy the key value and save it to an [empty text file](#tip).
You'll enter this key into the Rancher UI later as your **Application Secret**.
You won't be able to access the key value again within the Azure UI.
### 3. Set Required Permissions for Rancher
Next, set API permissions for Rancher within Azure.
1. From the navigation pane on the left, select **API permissions**.
![Open Required Permissions]({{<baseurl>}}/img/rancher/select-required-permissions.png)
1. Click **Add a permission**.
1. From the **Azure Active Directory Graph**, select the following **Delegated Permissions**:
![Select API Permissions]({{< baseurl >}}/img/rancher/select-required-permissions-2.png)
<br/>
<br/>
- **Access the directory as the signed-in user**
- **Read directory data**
- **Read all groups**
- **Read all users' full profiles**
- **Read all users' basic profiles**
- **Sign in and read user profile**
1. Click **Add permissions**.
1. From **API permissions**, click **Grant admin consent**. Then click **Yes**.
>**Note:** You must be signed in as an Azure administrator to successfully save your permission settings.
### 4. Add a Reply URL
To use Azure AD with Rancher you must whitelist Rancher with Azure. You can complete this whitelisting by providing Azure with a reply URL for Rancher, which is your Rancher Server URL followed by a verification path.
1. From the **Settings** blade, select **Reply URLs**.
![Azure: Enter Reply URL]({{<baseurl>}}/img/rancher/enter-azure-reply-url.png)
1. From the **Reply URLs** blade, enter the URL of your Rancher Server, appended with the verification path: `<MY_RANCHER_URL>/verify-auth-azure`.
>**Tip:** You can find your personalized Azure reply URL in Rancher on the Azure AD Authentication page (Global View > Security Authentication > Azure AD).
1. Click **Save**.
**Result:** Your reply URL is saved.
>**Note:** It can take up to five minutes for this change to take effect, so don't be alarmed if you can't authenticate immediately after Azure AD configuration.
### 5. Copy Azure Application Data
As your final step in Azure, copy the data that you'll use to configure Rancher for Azure AD authentication and paste it into an empty text file.
1. Obtain your Rancher **Tenant ID**.
1. Use search to open the **Azure Active Directory** service.
![Open Azure Active Directory]({{<baseurl>}}/img/rancher/search-azure-ad.png)
1. From the left navigation pane, open **Overview**.
2. Copy the **Directory ID** and paste it into your [text file](#tip).
You'll paste this value into Rancher as your **Tenant ID**.
1. Obtain your Rancher **Application ID**.
1. Use search to open **App registrations**.
![Open App Registrations]({{<baseurl>}}/img/rancher/search-app-registrations.png)
1. Find the entry you created for Rancher.
1. Copy the **Application ID** and paste it to your [text file](#tip).
1. Obtain your Rancher **Graph Endpoint**, **Token Endpoint**, and **Auth Endpoint**.
1. From **App registrations**, click **Endpoints**.
![Click Endpoints]({{<baseurl>}}/img/rancher/click-endpoints.png)
2. Copy the following endpoints to your clipboard and paste them into your [text file](#tip) (these values will be your Rancher endpoint values).
- **Microsoft Graph API endpoint** (Graph Endpoint)
- **OAuth 2.0 token endpoint (v1)** (Token Endpoint)
- **OAuth 2.0 authorization endpoint (v1)** (Auth Endpoint)
>**Note:** Copy the v1 version of the endpoints
### 6. Configure Azure AD in Rancher
From the Rancher UI, enter information about your AD instance hosted in Azure to complete configuration.
Enter the values that you copied to your [text file](#tip).
1. Log into Rancher. From the **Global** view, select **Security > Authentication**.
1. Select **Azure AD**.
1. Complete the **Configure Azure AD Account** form using the information you copied while completing [Copy Azure Application Data](#5-copy-azure-application-data).
>**Important:** When entering your Graph Endpoint, remove the tenant ID from the URL, like below.
>
><code>http<span>s://g</span>raph.windows.net/<del>abb5adde-bee8-4821-8b03-e63efdc7701c</del></code>
The following table maps the values you copied in the Azure portal to the fields in Rancher.
| Rancher Field | Azure Value |
| ------------------ | ------------------------------------- |
| Tenant ID | Directory ID |
| Application ID | Application ID |
| Application Secret | Key Value |
| Endpoint | https://login.microsoftonline.com/ |
| Graph Endpoint | Microsoft Azure AD Graph API Endpoint |
| Token Endpoint | OAuth 2.0 Token Endpoint |
| Auth Endpoint | OAuth 2.0 Authorization Endpoint |
1. Click **Authenticate with Azure**.
**Result:** Azure Active Directory authentication is configured.
@@ -0,0 +1,56 @@
---
title: Configuring FreeIPA
weight: 1114
aliases:
- /rancher/v2.0-v2.4/en/tasks/global-configuration/authentication/freeipa/
---
_Available as of v2.0.5_
If your organization uses FreeIPA for user authentication, you can configure Rancher to allow your users to log in using their FreeIPA credentials.
>**Prerequisites:**
>
>- You must have a [FreeIPA Server](https://www.freeipa.org/) configured.
>- Create a service account in FreeIPA with `read-only` access. Rancher uses this account to verify group membership when a user makes a request using an API key.
>- Read [External Authentication Configuration and Principal Users]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-authentication-configuration-and-principal-users).
1. Sign into Rancher using a local user assigned the `administrator` role (i.e., the _local principal_).
2. From the **Global** view, select **Security > Authentication** from the main menu.
3. Select **FreeIPA**.
4. Complete the **Configure an FreeIPA server** form.
You may need to log in to your domain controller to find the information requested in the form.
>**Using TLS?**
>If the certificate is self-signed or not from a recognized certificate authority, make sure you provide the complete chain. That chain is needed to verify the server's certificate.
<br/>
<br/>
>**User Search Base vs. Group Search Base**
>
>Search base allows Rancher to search for users and groups that are in your FreeIPA. These fields are only for search bases and not for search filters.
>
>* If your users and groups are in the same search base, complete only the User Search Base.
>* If your groups are in a different search base, you can optionally complete the Group Search Base. This field is dedicated to searching groups, but is not required.
5. If your FreeIPA directory deviates from the standard schema, complete the **Customize Schema** form to match it. Otherwise, skip this step.
>**Search Attribute** The Search Attribute field defaults to three specific values: `uid|sn|givenName`. After FreeIPA is configured, when a user enters text to add users or groups, Rancher automatically queries the FreeIPA server and attempts to match fields by user ID, last name, or first name. Rancher specifically searches for users/groups that begin with the text entered in the search field.
>
>The default field value is `uid|sn|givenName`, but you can configure this field to use a subset of these attributes. The pipe (`|`) between the fields separates them. (An example `ldapsearch` query for inspecting these attributes appears at the end of this page.)
>
> * `uid`: User ID
> * `sn`: Last Name
> * `givenName`: First Name
>
> With this search attribute, Rancher creates search filters for users and groups, but you *cannot* add your own search filters in this field.
6. Enter your FreeIPA username and password in **Authenticate with FreeIPA** to confirm that Rancher is configured to use FreeIPA authentication.
**Result:**
- FreeIPA authentication is configured.
- You are signed into Rancher with your FreeIPA account (i.e., the _external principal_).
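If you are unsure which attributes your FreeIPA server exposes for users, a quick `ldapsearch` query can confirm the `uid`, `sn` and `givenName` attributes referenced above. All values below are placeholders; FreeIPA stores users under `cn=users,cn=accounts` by default:
```
# Placeholder host, bind DN, and user; adjust to your FreeIPA domain.
ldapsearch -x -H ldap://ipa.example.com \
  -D "uid=svc-rancher,cn=users,cn=accounts,dc=example,dc=com" -w 'secret' \
  -b "cn=users,cn=accounts,dc=example,dc=com" \
  "(uid=jdoe)" uid sn givenName memberOf
```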
@@ -0,0 +1,53 @@
---
title: Configuring GitHub
weight: 1116
aliases:
- /rancher/v2.0-v2.4/en/tasks/global-configuration/authentication/github/
---
In environments using GitHub, you can configure Rancher to allow sign on using GitHub credentials.
>**Prerequisites:** Read [External Authentication Configuration and Principal Users]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-authentication-configuration-and-principal-users).
1. Sign into Rancher using a local user assigned the `administrator` role (i.e., the _local principal_).
2. From the **Global** view, select **Security > Authentication** from the main menu.
3. Select **GitHub**.
4. Follow the directions displayed to **Setup a GitHub Application**. Rancher redirects you to GitHub to complete registration.
>**What's an Authorization Callback URL?**
>
>The Authorization Callback URL is the URL where users go to begin using your application (i.e. the splash screen).
>When you use external authentication, authentication does not actually take place in your application. Instead, authentication takes place externally (in this case, GitHub). After this external authentication completes successfully, the Authorization Callback URL is the location where the user re-enters your application.
5. From GitHub, copy the **Client ID** and **Client Secret**. Paste them into Rancher.
>**Where do I find the Client ID and Client Secret?**
>
>From GitHub, select Settings > Developer Settings > OAuth Apps. The Client ID and Client Secret are displayed prominently.
6. Click **Authenticate with GitHub**.
7. Use the **Site Access** options to configure the scope of user authorization.
- **Allow any valid Users**
_Any_ GitHub user can access Rancher. We generally discourage use of this setting!
- **Allow members of Clusters, Projects, plus Authorized Users and Organizations**
Any GitHub user or group added as a **Cluster Member** or **Project Member** can log in to Rancher. Additionally, any GitHub user or group you add to the **Authorized Users and Organizations** list may log in to Rancher.
- **Restrict access to only Authorized Users and Organizations**
Only GitHub users or groups added to the Authorized Users and Organizations can log in to Rancher.
<br/>
8. Click **Save**.
**Result:**
- GitHub authentication is configured.
- You are signed into Rancher with your GitHub account (i.e., the _external principal_).
@@ -0,0 +1,106 @@
---
title: Configuring Google OAuth
---
_Available as of v2.3.0_
If your organization uses G Suite for user authentication, you can configure Rancher to allow your users to log in using their G Suite credentials.
Only admins of the G Suite domain have access to the Admin SDK. Therefore, only G Suite admins can configure Google OAuth for Rancher.
Within Rancher, only administrators or users with the **Manage Authentication** [global role]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) can configure authentication.
# Prerequisites
- You must have a [G Suite admin account](https://admin.google.com) configured.
- G Suite requires a [top private domain FQDN](https://github.com/google/guava/wiki/InternetDomainNameExplained#public-suffixes-and-private-domains) as an authorized domain. One way to get an FQDN is by creating an A-record in Route53 for your Rancher server. You do not need to update your Rancher Server URL setting with that record, because there could be clusters using that URL.
- You must have the Admin SDK API enabled for your G Suite domain. You can enable it using the steps on [this page.](https://support.google.com/a/answer/60757?hl=en)
After the Admin SDK API is enabled, your G Suite domain's API screen should look like this:
![Enable Admin APIs]({{<baseurl>}}/img/rancher/Google-Enable-APIs-Screen.png)
# Setting up G Suite for OAuth with Rancher
Before you can set up Google OAuth in Rancher, you need to log in to your G Suite account and do the following:
1. [Add Rancher as an authorized domain in G Suite](#1-adding-rancher-as-an-authorized-domain)
1. [Generate OAuth2 credentials for the Rancher server](#2-creating-oauth2-credentials-for-the-rancher-server)
1. [Create service account credentials for the Rancher server](#3-creating-service-account-credentials)
1. [Register the service account key as an OAuth Client](#4-register-the-service-account-key-as-an-oauth-client)
### 1. Adding Rancher as an Authorized Domain
1. Click [here](https://console.developers.google.com/apis/credentials) to go to credentials page of your Google domain.
1. Select your project and click **OAuth consent screen.**
![OAuth Consent Screen]({{<baseurl>}}/img/rancher/Google-OAuth-consent-screen-tab.png)
1. Go to **Authorized Domains** and enter the top private domain of your Rancher server URL in the list. The top private domain is the rightmost superdomain; for example, www.foo.co.uk has a top private domain of foo.co.uk. For more information on top-level domains, refer to [this article.](https://github.com/google/guava/wiki/InternetDomainNameExplained#public-suffixes-and-private-domains)
1. Go to **Scopes for Google APIs** and make sure **email,** **profile** and **openid** are enabled.
**Result:** Rancher has been added as an authorized domain for the Admin SDK API.
### 2. Creating OAuth2 Credentials for the Rancher Server
1. Go to the Google API console, select your project, and go to the [credentials page.](https://console.developers.google.com/apis/credentials)
![Credentials]({{<baseurl>}}/img/rancher/Google-Credentials-tab.png)
1. On the **Create Credentials** dropdown, select **OAuth client ID.**
1. Click **Web application.**
1. Provide a name.
1. Fill out the **Authorized JavaScript origins** and **Authorized redirect URIs.** Note: The Rancher UI page for setting up Google OAuth (available from the Global view under **Security > Authentication > Google**) provides you the exact links to enter for this step.
- Under **Authorized JavaScript origins,** enter your Rancher server URL.
- Under **Authorized redirect URIs,** enter your Rancher server URL appended with the path `verify-auth`. For example, if your URI is `https://rancherServer`, you will enter `https://rancherServer/verify-auth`.
1. Click on **Create.**
1. After the credential is created, you will see a screen with a list of your credentials. Choose the credential you just created, and in that row on rightmost side, click **Download JSON.** Save the file so that you can provide these credentials to Rancher.
**Result:** Your OAuth credentials have been successfully created.
### 3. Creating Service Account Credentials
Since the Google Admin SDK is available only to admins, regular users cannot use it to retrieve profiles of other users or their groups. Regular users cannot even retrieve their own groups.
Since Rancher provides group-based membership access, we require the users to be able to get their own groups, and look up other users and groups when needed.
As a workaround to get this capability, G Suite recommends creating a service account and delegating authority of your G Suite domain to that service account.
This section describes how to:
- Create a service account
- Create a key for the service account and download the credentials as JSON
1. Click [here](https://console.developers.google.com/iam-admin/serviceaccounts) and select your project for which you generated OAuth credentials.
1. Click on **Create Service Account.**
1. Enter a name and click **Create.**
![Service account creation Step 1]({{<baseurl>}}/img/rancher/Google-svc-acc-step1.png)
1. Don't provide any roles on the **Service account permissions** page and click **Continue**
![Service account creation Step 2]({{<baseurl>}}/img/rancher/Google-svc-acc-step2.png)
1. Click on **Create Key** and select the JSON option. Download the JSON file and save it so that you can provide it as the service account credentials to Rancher.
![Service account creation Step 3]({{<baseurl>}}/img/rancher/Google-svc-acc-step3-key-creation.png)
**Result:** Your service account is created.
### 4. Register the Service Account Key as an OAuth Client
You will need to grant some permissions to the service account you created in the last step. Rancher requires you to grant only read-only permissions for users and groups.
Using the Unique ID of the service account key, register it as an OAuth client using the following steps:
1. Get the Unique ID of the key you just created. If it's not displayed in the list of keys right next to the one you created, you will have to enable it. To enable it, click **Unique ID** and click **OK.** This will add a **Unique ID** column to the list of service account keys. Save the one listed for the service account you created. NOTE: This is a numeric key, not to be confused with the alphanumeric field **Key ID.**
![Service account Unique ID]({{<baseurl>}}/img/rancher/Google-Select-UniqueID-column.png)
1. Go to the [**Manage OAuth Client Access** page.](https://admin.google.com/AdminHome?chromeless=1#OGX:ManageOauthClients)
1. Add the Unique ID obtained in the previous step in the **Client Name** field.
1. In the **One or More API Scopes** field, add the following scopes:
```
openid,profile,email,https://www.googleapis.com/auth/admin.directory.user.readonly,https://www.googleapis.com/auth/admin.directory.group.readonly
```
1. Click **Authorize.**
**Result:** The service account is registered as an OAuth client in your G Suite account.
# Configuring Google OAuth in Rancher
1. Sign into Rancher using a local user assigned the [administrator]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions) role. This user is also called the local principal.
1. From the **Global** view, click **Security > Authentication** from the main menu.
1. Click **Google.** The instructions in the UI cover the steps to set up authentication with Google OAuth.
1. Admin Email: Provide the email of an administrator account from your G Suite setup. In order to perform user and group lookups, the Google APIs require an administrator's email in conjunction with the service account key.
1. Domain: Provide the domain on which you have configured G Suite. Provide the exact domain and not any aliases.
1. Nested Group Membership: Check this box to enable nested group memberships. Rancher admins can disable this at any time after configuring auth.
- **Step One** is about adding Rancher as an authorized domain, which we already covered in [this section.](#1-adding-rancher-as-an-authorized-domain)
- For **Step Two,** provide the OAuth credentials JSON that you downloaded after completing [this section.](#2-creating-oauth2-credentials-for-the-rancher-server) You can upload the file or paste the contents into the **OAuth Credentials** field.
- For **Step Three,** provide the service account credentials JSON that you downloaded at the end of [this section.](#3-creating-service-account-credentials) The credentials will only work if you successfully [registered the service account key](#4-register-the-service-account-key-as-an-oauth-client) as an OAuth client in your G Suite account.
1. Click **Authenticate with Google**.
1. Click **Save**.
**Result:** Google authentication is successfully configured.
@@ -0,0 +1,126 @@
---
title: Configuring Keycloak (SAML)
description: Create a Keycloak SAML client and configure Rancher to work with Keycloak. By the end your users will be able to sign into Rancher using their Keycloak logins
weight: 1200
---
_Available as of v2.1.0_
If your organization uses Keycloak Identity Provider (IdP) for user authentication, you can configure Rancher to allow your users to log in using their IdP credentials.
## Prerequisites
- You must have a [Keycloak IdP Server](https://www.keycloak.org/docs/latest/server_installation/) configured.
- In Keycloak, create a [new SAML client](https://www.keycloak.org/docs/latest/server_admin/#saml-clients), with the settings below. See the [Keycloak documentation](https://www.keycloak.org/docs/latest/server_admin/#saml-clients) for help.
Setting | Value
------------|------------
`Sign Documents` | `ON` <sup>1</sup>
`Sign Assertions` | `ON` <sup>1</sup>
All other `ON/OFF` Settings | `OFF`
`Client ID` | Either `https://yourRancherHostURL/v1-saml/keycloak/saml/metadata` or the value configured in the `Entity ID Field` of the Rancher Keycloak configuration<sup>2</sup>
`Client Name` | <CLIENT_NAME> (e.g. `rancher`)
`Client Protocol` | `SAML`
`Valid Redirect URI` | `https://yourRancherHostURL/v1-saml/keycloak/saml/acs`
><sup>1</sup>: Optionally, you can enable either one or both of these settings.
><sup>2</sup>: Rancher SAML metadata won't be generated until a SAML provider is configured and saved.
{{< img "/img/rancher/keycloak/keycloak-saml-client-configuration.png" "">}}
- In the new SAML client, create Mappers to expose the user's fields
- Add all "Builtin Protocol Mappers"
{{< img "/img/rancher/keycloak/keycloak-saml-client-builtin-mappers.png" "">}}
- Create a new "Group list" mapper to map the member attribute to a user's groups
{{< img "/img/rancher/keycloak/keycloak-saml-client-group-mapper.png" "">}}
- Export a `metadata.xml` file from your Keycloak client:
From the `Installation` tab, choose the `SAML Metadata IDPSSODescriptor` format option and download your file.
>**Note**
> Keycloak versions 6.0.0 and up no longer provide the IDP metadata under the `Installation` tab.
> You can still get the XML from the following url:
>
> `https://{KEYCLOAK-URL}/auth/realms/{REALM-NAME}/protocol/saml/descriptor`
>
> The XML obtained from this URL contains `EntitiesDescriptor` as the root element. Rancher expects the root element to be `EntityDescriptor` rather than `EntitiesDescriptor`. So before passing this XML to Rancher, follow these steps to adjust it:
>
> * Copy to `EntityDescriptor` all of the attributes from `EntitiesDescriptor` that are not already present on it.
> * Remove the opening `<EntitiesDescriptor>` tag from the beginning of the XML.
> * Remove the closing `</EntitiesDescriptor>` tag from the end of the XML.
>
> You are left with something similar to the example below:
>
> ```
> <EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" entityID="https://{KEYCLOAK-URL}/auth/realms/{REALM-NAME}">
> ....
> </EntityDescriptor>
> ```
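A minimal shell sketch for retrieving that descriptor before editing it by hand (the Keycloak host and realm name are placeholders):
```
# Placeholder host and realm; substitute your own values.
curl -s "https://keycloak.example.com/auth/realms/myrealm/protocol/saml/descriptor" -o metadata.xml
# Then adjust metadata.xml as described in the note above
# (EntitiesDescriptor -> EntityDescriptor) before providing it to Rancher.
```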
## Configuring Keycloak in Rancher
1. From the **Global** view, select **Security > Authentication** from the main menu.
1. Select **Keycloak**.
1. Complete the **Configure Keycloak Account** form.
| Field | Description |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Display Name Field | The attribute that contains the display name of users. <br/><br/>Example: `givenName` |
| User Name Field | The attribute that contains the user name/given name. <br/><br/>Example: `email` |
| UID Field | An attribute that is unique to every user. <br/><br/>Example: `email` |
| Groups Field | Make entries for managing group memberships. <br/><br/>Example: `member` |
| Entity ID Field | The ID that needs to be configured as a client ID in the Keycloak client. <br/><br/>Default: `https://yourRancherHostURL/v1-saml/keycloak/saml/metadata` |
| Rancher API Host | The URL for your Rancher Server. |
| Private Key / Certificate | A key/certificate pair to create a secure shell between Rancher and your IdP. |
| IDP-metadata | The `metadata.xml` file that you exported from your IdP server. |
>**Tip:** You can generate a key/certificate pair using an openssl command. For example:
>
> openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -keyout myservice.key -out myservice.cert
1. After you complete the **Configure Keycloak Account** form, click **Authenticate with Keycloak**, which is at the bottom of the page.
Rancher redirects you to the IdP login page. Enter credentials that authenticate with Keycloak IdP to validate your Rancher Keycloak configuration.
>**Note:** You may have to disable your popup blocker to see the IdP login page.
**Result:** Rancher is configured to work with Keycloak. Your users can now sign into Rancher using their Keycloak logins.
{{< saml_caveats >}}
## Annex: Troubleshooting
If you are experiencing issues while testing the connection to the Keycloak server, first double-check the configuration of your SAML client. You can also inspect the Rancher logs to help pinpoint the cause of the problem. Debug logs may contain more detailed information about the error. Please refer to [How can I enable debug logging]({{<baseurl>}}/rancher/v2.0-v2.4/en/faq/technical/#how-can-i-enable-debug-logging) in this documentation.
### You are not redirected to Keycloak
When you click **Authenticate with Keycloak**, you are not redirected to your IdP.
* Verify your Keycloak client configuration.
* Make sure `Force Post Binding` is set to `OFF`.
### Forbidden message displayed after IdP login
You are correctly redirected to your IdP login page and you are able to enter your credentials, however you get a `Forbidden` message afterwards.
* Check the Rancher debug log.
* If the log displays `ERROR: either the Response or Assertion must be signed`, make sure either `Sign Documents` or `Sign Assertions` is set to `ON` in your Keycloak client.
### HTTP 502 when trying to access /v1-saml/keycloak/saml/metadata
This is usually due to the metadata not being created until a SAML provider is configured.
Try configuring and saving Keycloak as your SAML provider and then accessing the metadata again.
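As a quick check (assuming your Rancher server is reachable at the hypothetical `rancher.example.com`), you can request the metadata endpoint directly; once a SAML provider has been saved, it should return XML instead of an error:
```
curl -sk "https://rancher.example.com/v1-saml/keycloak/saml/metadata"
```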
### Keycloak Error: "We're sorry, failed to process response"
* Check your Keycloak log.
* If the log displays `failed: org.keycloak.common.VerificationException: Client does not have a public key`, set `Encrypt Assertions` to `OFF` in your Keycloak client.
### Keycloak Error: "We're sorry, invalid requester"
* Check your Keycloak log.
* If the log displays `request validation failed: org.keycloak.common.VerificationException: SigAlg was null`, set `Client Signature Required` to `OFF` in your Keycloak client.
@@ -0,0 +1,16 @@
---
title: Local Authentication
weight: 1111
aliases:
- /rancher/v2.0-v2.4/en/tasks/global-configuration/authentication/local-authentication/
---
Local authentication is the default until you configure an external authentication provider. With local authentication, Rancher itself stores the user information, such as names and passwords, of the people who can log in to Rancher. By default, the `admin` user that logs in to Rancher for the first time is a local user.
## Adding Local Users
Regardless of whether you use external authentication, you should create a few local authentication users so that you can continue using Rancher if your external authentication service encounters issues.
1. From the **Global** view, select **Users** from the navigation bar.
2. Click **Add User**. Then complete the **Add User** form. Click **Create** when you're done.
@@ -0,0 +1,31 @@
---
title: Configuring Microsoft Active Directory Federation Service (SAML)
weight: 1205
---
_Available as of v2.0.7_
If your organization uses Microsoft Active Directory Federation Services (AD FS) for user authentication, you can configure Rancher to allow your users to log in using their AD FS credentials.
## Prerequisites
You must have Rancher installed.
- Obtain your Rancher Server URL. During AD FS configuration, substitute this URL for the `<RANCHER_SERVER>` placeholder.
- You must have a global administrator account on your Rancher installation.
You must have a [Microsoft AD FS Server](https://docs.microsoft.com/en-us/windows-server/identity/active-directory-federation-services) configured.
- Obtain your AD FS Server IP/DNS name. During AD FS configuration, substitute this IP/DNS name for the `<AD_SERVER>` placeholder.
- You must have access to add [Relying Party Trusts](https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/operations/create-a-relying-party-trust) on your AD FS Server.
## Setup Outline
Setting up Microsoft AD FS with Rancher Server requires configuring AD FS on your Active Directory server, and configuring Rancher to utilize your AD FS server. The following pages serve as guides for setting up Microsoft AD FS authentication on your Rancher installation.
- [1. Configuring Microsoft AD FS for Rancher]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/microsoft-adfs/microsoft-adfs-setup)
- [2. Configuring Rancher for Microsoft AD FS]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/microsoft-adfs/rancher-adfs-setup)
{{< saml_caveats >}}
### [Next: Configuring Microsoft AD FS for Rancher]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/microsoft-adfs/microsoft-adfs-setup)
@@ -0,0 +1,82 @@
---
title: 1. Configuring Microsoft AD FS for Rancher
weight: 1205
---
Before configuring Rancher to support AD FS users, you must add Rancher as a [relying party trust](https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/technical-reference/understanding-key-ad-fs-concepts) in AD FS.
1. Log into your AD server as an administrative user.
1. Open the **AD FS Management** console. Select **Add Relying Party Trust...** from the **Actions** menu and click **Start**.
{{< img "/img/rancher/adfs/adfs-overview.png" "">}}
1. Select **Enter data about the relying party manually** as the option for obtaining data about the relying party.
{{< img "/img/rancher/adfs/adfs-add-rpt-2.png" "">}}
1. Enter your desired **Display name** for your Relying Party Trust. For example, `Rancher`.
{{< img "/img/rancher/adfs/adfs-add-rpt-3.png" "">}}
1. Select **AD FS profile** as the configuration profile for your relying party trust.
{{< img "/img/rancher/adfs/adfs-add-rpt-4.png" "">}}
1. Leave the **optional token encryption certificate** empty, as Rancher AD FS will not be using one.
{{< img "/img/rancher/adfs/adfs-add-rpt-5.png" "">}}
1. Select **Enable support for the SAML 2.0 WebSSO protocol**
and enter `https://<rancher-server>/v1-saml/adfs/saml/acs` for the service URL.
{{< img "/img/rancher/adfs/adfs-add-rpt-6.png" "">}}
1. Add `https://<rancher-server>/v1-saml/adfs/saml/metadata` as the **Relying party trust identifier**.
{{< img "/img/rancher/adfs/adfs-add-rpt-7.png" "">}}
1. This tutorial will not cover multi-factor authentication; please refer to the [Microsoft documentation](https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/operations/configure-additional-authentication-methods-for-ad-fs) if you would like to configure multi-factor authentication.
{{< img "/img/rancher/adfs/adfs-add-rpt-8.png" "">}}
1. From **Choose Issuance Authorization Rules**, you may select either of the available options according to your use case. However, for the purposes of this guide, select **Permit all users to access this relying party**.
{{< img "/img/rancher/adfs/adfs-add-rpt-9.png" "">}}
1. After reviewing your settings, select **Next** to add the relying party trust.
{{< img "/img/rancher/adfs/adfs-add-rpt-10.png" "">}}
1. Select **Open the Edit Claim Rules...** and click **Close**.
{{< img "/img/rancher/adfs/adfs-add-rpt-11.png" "">}}
1. On the **Issuance Transform Rules** tab, click **Add Rule...**.
{{< img "/img/rancher/adfs/adfs-edit-cr.png" "">}}
1. Select **Send LDAP Attributes as Claims** as the **Claim rule template**.
{{< img "/img/rancher/adfs/adfs-add-tcr-1.png" "">}}
1. Set the **Claim rule name** to your desired name (for example, `Rancher Attributes`) and select **Active Directory** as the **Attribute store**. Create the following mapping to reflect the table below:
| LDAP Attribute | Outgoing Claim Type |
| -------------------------------------------- | ------------------- |
| Given-Name | Given Name |
| User-Principal-Name | UPN |
| Token-Groups - Qualified by Long Domain Name | Group |
| SAM-Account-Name | Name |
<br/>
{{< img "/img/rancher/adfs/adfs-add-tcr-2.png" "">}}
1. Download the `federationmetadata.xml` from your AD server at:
```
https://<AD_SERVER>/federationmetadata/2007-06/federationmetadata.xml
```
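    For example, assuming your AD FS server is reachable at the hypothetical address `adfs.example.com`, you could download the file with `curl`:
    ```
    curl -k -o federationmetadata.xml \
      "https://adfs.example.com/federationmetadata/2007-06/federationmetadata.xml"
    ```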
**Result:** You've added Rancher as a relying trust party. Now you can configure Rancher to leverage AD.
### [Next: Configuring Rancher for Microsoft AD FS]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/microsoft-adfs/rancher-adfs-setup/)
@@ -0,0 +1,56 @@
---
title: 2. Configuring Rancher for Microsoft AD FS
weight: 1205
---
_Available as of v2.0.7_
After you complete [Configuring Microsoft AD FS for Rancher]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/microsoft-adfs/microsoft-adfs-setup/), enter your AD FS information into Rancher to allow AD FS users to authenticate with Rancher.
>**Important Notes For Configuring Your AD FS Server:**
>
>- The SAML 2.0 WebSSO Protocol Service URL is: `https://<RANCHER_SERVER>/v1-saml/adfs/saml/acs`
>- The Relying Party Trust identifier URL is: `https://<RANCHER_SERVER>/v1-saml/adfs/saml/metadata`
>- You must export the `federationmetadata.xml` file from your AD FS server. This can be found at: `https://<AD_SERVER>/federationmetadata/2007-06/federationmetadata.xml`
1. From the **Global** view, select **Security > Authentication** from the main menu.
1. Select **Microsoft Active Directory Federation Services**.
1. Complete the **Configure AD FS Account** form. Microsoft AD FS lets you specify an existing Active Directory (AD) server. The [configuration section below](#configuration) describes how you can map AD attributes to fields within Rancher.
1. After you complete the **Configure AD FS Account** form, click **Authenticate with AD FS**, which is at the bottom of the page.
Rancher redirects you to the AD FS login page. Enter credentials that authenticate with Microsoft AD FS to validate your Rancher AD FS configuration.
>**Note:** You may have to disable your popup blocker to see the AD FS login page.
**Result:** Rancher is configured to work with AD FS. Your users can now sign into Rancher using their AD FS logins.
# Configuration
| Field | Description |
|---------------------------|-----------------|
| Display Name Field | The AD attribute that contains the display name of users. <br/><br/>Example: `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name` |
| User Name Field | The AD attribute that contains the user name/given name. <br/><br/>Example: `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname` |
| UID Field | An AD attribute that is unique to every user. <br/><br/>Example: `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/upn` |
| Groups Field | Make entries for managing group memberships. <br/><br/>Example: `http://schemas.xmlsoap.org/claims/Group` |
| Rancher API Host | The URL for your Rancher Server. |
| Private Key / Certificate | This is a key-certificate pair to create a secure shell between Rancher and your AD FS. Ensure you set the Common Name (CN) to your Rancher Server URL.<br/><br/>[Certificate creation command](#cert-command) |
| Metadata XML | The `federationmetadata.xml` file exported from your AD FS server. <br/><br/>You can find this file at `https://<AD_SERVER>/federationmetadata/2007-06/federationmetadata.xml`. |
<a id="cert-command"></a>
**Tip:** You can generate a certificate using an openssl command. For example:
```
openssl req -x509 -newkey rsa:2048 -keyout myservice.key -out myservice.cert -days 365 -nodes -subj "/CN=myservice.example.com"
```
@@ -0,0 +1,53 @@
---
title: Configuring Okta (SAML)
weight: 1210
---
_Available as of v2.2.0_
If your organization uses Okta Identity Provider (IdP) for user authentication, you can configure Rancher to allow your users to log in using their IdP credentials.
>**Note:** Okta integration only supports Service Provider initiated logins.
## Prerequisites
In Okta, create a SAML Application with the settings below. See the [Okta documentation](https://developer.okta.com/standards/SAML/setting_up_a_saml_application_in_okta) for help.
Setting | Value
------------|------------
`Single Sign on URL` | `https://yourRancherHostURL/v1-saml/okta/saml/acs`
`Audience URI (SP Entity ID)` | `https://yourRancherHostURL/v1-saml/okta/saml/metadata`
## Configuring Okta in Rancher
1. From the **Global** view, select **Security > Authentication** from the main menu.
1. Select **Okta**.
1. Complete the **Configure Okta Account** form. The examples below describe how you can map Okta attributes from attribute statements to fields within Rancher.
| Field | Description |
| ------------------------- | ----------------------------------------------------------------------------- |
| Display Name Field | The attribute name from an attribute statement that contains the display name of users. |
| User Name Field | The attribute name from an attribute statement that contains the user name/given name. |
| UID Field | The attribute name from an attribute statement that is unique to every user. |
| Groups Field | The attribute name in a group attribute statement that exposes your groups. |
| Rancher API Host | The URL for your Rancher Server. |
| Private Key / Certificate | A key/certificate pair used for Assertion Encryption. |
| Metadata XML | The `Identity Provider metadata` file that you find in the application `Sign On` section. |
>**Tip:** You can generate a key/certificate pair using an openssl command. For example:
>
> openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -keyout myservice.key -out myservice.crt
1. After you complete the **Configure Okta Account** form, click **Authenticate with Okta**, which is at the bottom of the page.
Rancher redirects you to the IdP login page. Enter credentials that authenticate with Okta IdP to validate your Rancher Okta configuration.
>**Note:** If nothing seems to happen, it's likely because your browser blocked the pop-up. Make sure you disable the pop-up blocker for your Rancher domain and whitelist it in any other extensions you use.
**Result:** Rancher is configured to work with Okta. Your users can now sign into Rancher using their Okta logins.
{{< saml_caveats >}}
@@ -0,0 +1,52 @@
---
title: Configuring OpenLDAP
weight: 1113
aliases:
- /rancher/v2.0-v2.4/en/tasks/global-configuration/authentication/openldap/
---
_Available as of v2.0.5_
If your organization uses LDAP for user authentication, you can configure Rancher to communicate with an OpenLDAP server to authenticate users. This allows Rancher admins to control access to clusters and projects based on users and groups managed externally in the organization's central user repository, while allowing end users to authenticate with their LDAP credentials when logging in to the Rancher UI.
## Prerequisites
Rancher must be configured with a LDAP bind account (aka service account) to search and retrieve LDAP entries pertaining to users and groups that should have access. It is recommended to not use an administrator account or personal account for this purpose and instead create a dedicated account in OpenLDAP with read-only access to users and groups under the configured search base (see below).
> **Using TLS?**
>
> If the certificate used by the OpenLDAP server is self-signed or not from a recognized certificate authority, make sure you have the CA certificate (concatenated with any intermediate certificates) at hand in PEM format. You will have to paste this certificate in during the configuration so that Rancher is able to validate the certificate chain.
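>
> If you do not already have the certificate chain in PEM format, one way to inspect the certificates presented by the server (assuming it listens on the hypothetical `ldap.example.com:636`) is with `openssl`:
>
> ```
> openssl s_client -connect ldap.example.com:636 -showcerts </dev/null
> ```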
## Configure OpenLDAP in Rancher
Configure the settings for the OpenLDAP server, groups and users. For help filling out each field, refer to the [configuration reference.](./openldap-config)
> Before you proceed with the configuration, please familiarize yourself with the concepts of [External Authentication Configuration and Principal Users]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-authentication-configuration-and-principal-users).
1. Log into the Rancher UI using the initial local `admin` account.
2. From the **Global** view, navigate to **Security** > **Authentication**
3. Select **OpenLDAP**. The **Configure an OpenLDAP server** form will be displayed.
### Test Authentication
Once you have completed the configuration, proceed by testing the connection to the OpenLDAP server. Authentication with OpenLDAP will be enabled implicitly if the test is successful.
> **Note:**
>
> The OpenLDAP user pertaining to the credentials entered in this step will be mapped to the local principal account and assigned administrator privileges in Rancher. You should therefore make a conscious decision on which LDAP account you use to perform this step.
1. Enter the **username** and **password** for the OpenLDAP account that should be mapped to the local principal account.
2. Click **Authenticate With OpenLDAP** to test the OpenLDAP connection and finalize the setup.
**Result:**
- OpenLDAP authentication is configured.
- The LDAP user pertaining to the entered credentials is mapped to the local principal (administrative) account.
> **Note:**
>
> You will still be able to login using the locally configured `admin` account and password in case of a disruption of LDAP services.
## Annex: Troubleshooting
If you are experiencing issues while testing the connection to the OpenLDAP server, first double-check the credentials entered for the service account as well as the search base configuration. You can also inspect the Rancher logs to help pinpoint the cause of the problem. Debug logs may contain more detailed information about the error. Please refer to [How can I enable debug logging]({{<baseurl>}}/rancher/v2.0-v2.4/en/faq/technical/#how-can-i-enable-debug-logging) in this documentation.
@@ -0,0 +1,86 @@
---
title: OpenLDAP Configuration Reference
weight: 2
---
This section is intended to be used as a reference when setting up an OpenLDAP authentication provider in Rancher.
For further details on configuring OpenLDAP, refer to the [official documentation.](https://www.openldap.org/doc/)
> Before you proceed with the configuration, please familiarize yourself with the concepts of [External Authentication Configuration and Principal Users]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-authentication-configuration-and-principal-users).
- [Background: OpenLDAP Authentication Flow](#background-openldap-authentication-flow)
- [OpenLDAP server configuration](#openldap-server-configuration)
- [User/group schema configuration](#user-group-schema-configuration)
- [User schema configuration](#user-schema-configuration)
- [Group schema configuration](#group-schema-configuration)
## Background: OpenLDAP Authentication Flow
1. When a user attempts to log in with their LDAP credentials, Rancher creates an initial bind to the LDAP server using a service account with permissions to search the directory and read user/group attributes.
2. Rancher then searches the directory for the user by using a search filter based on the provided username and configured attribute mappings.
3. Once the user has been found, they are authenticated with another LDAP bind request using the user's DN and the provided password.
4. Once authentication succeeds, Rancher resolves the group memberships both from the membership attribute in the user's object and by performing a group search based on the configured user mapping attribute.
# OpenLDAP Server Configuration
You will need to enter the address, port, and protocol to connect to your OpenLDAP server. `389` is the standard port for insecure traffic, `636` for TLS traffic.
> **Using TLS?**
>
> If the certificate used by the OpenLDAP server is self-signed or not from a recognized certificate authority, make sure you have the CA certificate (concatenated with any intermediate certificates) at hand in PEM format. You will have to paste this certificate in during the configuration so that Rancher is able to validate the certificate chain.
If you are in doubt about the correct values to enter in the user/group Search Base configuration fields, consult your LDAP administrator or refer to the section [Identify Search Base and Schema using ldapsearch]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/ad/#annex-identify-search-base-and-schema-using-ldapsearch) in the Active Directory authentication documentation.
<figcaption>OpenLDAP Server Parameters</figcaption>
| Parameter | Description |
|:--|:--|
| Hostname | Specify the hostname or IP address of the OpenLDAP server |
| Port | Specify the port at which the OpenLDAP server is listening for connections. Unencrypted LDAP normally uses the standard port of 389, while LDAPS uses port 636.|
| TLS | Check this box to enable LDAP over SSL/TLS (commonly known as LDAPS). You will also need to paste in the CA certificate if the server uses a self-signed/enterprise-signed certificate. |
| Server Connection Timeout | The duration in number of seconds that Rancher waits before considering the server unreachable. |
| Service Account Distinguished Name | Enter the Distinguished Name (DN) of the user that should be used to bind, search and retrieve LDAP entries. |
| Service Account Password | The password for the service account. |
| User Search Base | Enter the Distinguished Name of the node in your directory tree from which to start searching for user objects. All users must be descendants of this base DN. For example: "ou=people,dc=acme,dc=com".|
| Group Search Base | If your groups live under a different node than the one configured under `User Search Base` you will need to provide the Distinguished Name here. Otherwise leave this field empty. For example: "ou=groups,dc=acme,dc=com".|
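Before saving the configuration, it can be useful to confirm the service account credentials and search base outside of Rancher. The following is a minimal sketch using hypothetical values (`ldap.example.com`, a service account DN of `cn=rancher-svc,dc=acme,dc=com`, and a user search base of `ou=people,dc=acme,dc=com`); it binds as the service account and lists user entries under the search base:

```
ldapsearch -x \
  -H ldaps://ldap.example.com:636 \
  -D "cn=rancher-svc,dc=acme,dc=com" \
  -W \
  -b "ou=people,dc=acme,dc=com" \
  "(objectClass=inetOrgPerson)" dn uid
```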
# User/Group Schema Configuration
If your OpenLDAP directory deviates from the standard OpenLDAP schema, you must complete the **Customize Schema** section to match it.
Note that the attribute mappings configured in this section are used by Rancher to construct search filters and resolve group membership. It is therefore always recommended to verify that the configuration here matches the schema used in your OpenLDAP.
If you are unfamiliar with the user/group schema used in the OpenLDAP server, consult your LDAP administrator or refer to the section [Identify Search Base and Schema using ldapsearch]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/ad/#annex-identify-search-base-and-schema-using-ldapsearch) in the Active Directory authentication documentation.
### User Schema Configuration
The table below details the parameters for the user schema configuration.
<figcaption>User Schema Configuration Parameters</figcaption>
| Parameter | Description |
|:--|:--|
| Object Class | The name of the object class used for user objects in your domain. If defined, only specify the name of the object class - *don't* include it in an LDAP wrapper such as &(objectClass=xxxx) |
| Username Attribute | The user attribute whose value is suitable as a display name. |
| Login Attribute | The attribute whose value matches the username part of credentials entered by your users when logging in to Rancher. This is typically `uid`. |
| User Member Attribute | The user attribute containing the Distinguished Name of groups a user is member of. Usually this is one of `memberOf` or `isMemberOf`. |
| Search Attribute | When a user enters text to add users or groups in the UI, Rancher queries the LDAP server and attempts to match users by the attributes provided in this setting. Multiple attributes can be specified by separating them with the pipe ("\|") symbol. |
| User Enabled Attribute | If the schema of your OpenLDAP server supports a user attribute whose value can be evaluated to determine if the account is disabled or locked, enter the name of that attribute. The default OpenLDAP schema does not support this and the field should usually be left empty. |
| Disabled Status Bitmask | This is the value for a disabled/locked user account. The parameter is ignored if `User Enabled Attribute` is empty. |
### Group Schema Configuration
The table below details the parameters for the group schema configuration.
<figcaption>Group Schema Configuration Parameters</figcaption>
| Parameter | Description |
|:--|:--|
| Object Class | The name of the object class used for group entries in your domain. If defined, only specify the name of the object class - *don't* include it in an LDAP wrapper such as &(objectClass=xxxx) |
| Name Attribute | The group attribute whose value is suitable for a display name. |
| Group Member User Attribute | The name of the **user attribute** whose format matches the group members in the `Group Member Mapping Attribute`. |
| Group Member Mapping Attribute | The name of the group attribute containing the members of a group. |
| Search Attribute | Attribute used to construct search filters when adding groups to clusters or projects in the UI. See description of user schema `Search Attribute`. |
| Group DN Attribute | The name of the group attribute whose format matches the values in the user's group membership attribute. See `User Member Attribute`. |
| Nested Group Membership | This setting defines whether Rancher should resolve nested group memberships. Use only if your organization makes use of these nested memberships (i.e. you have groups that contain other groups as members). This option is disabled if you are using Shibboleth. |
@@ -0,0 +1,52 @@
---
title: Configuring PingIdentity (SAML)
weight: 1200
---
_Available as of v2.0.7_
If your organization uses Ping Identity Provider (IdP) for user authentication, you can configure Rancher to allow your users to log in using their IdP credentials.
>**Prerequisites:**
>
>- You must have a [Ping IdP Server](https://www.pingidentity.com/) configured.
>- Following are the Rancher Service Provider URLs needed for configuration:
Metadata URL: `https://<rancher-server>/v1-saml/ping/saml/metadata`
Assertion Consumer Service (ACS) URL: `https://<rancher-server>/v1-saml/ping/saml/acs`
Note that these URLs will not return valid data until the authentication configuration is saved in Rancher.
>- Export a `metadata.xml` file from your IdP Server. For more information, see the [PingIdentity documentation](https://documentation.pingidentity.com/pingfederate/pf83/index.shtml#concept_exportingMetadata.html).
1. From the **Global** view, select **Security > Authentication** from the main menu.
1. Select **PingIdentity**.
1. Complete the **Configure Ping Account** form. Ping IdP lets you specify what data store you want to use. You can either add a database or use an existing LDAP server. For example, if you select your Active Directory (AD) server, the examples below describe how you can map AD attributes to fields within Rancher.
1. **Display Name Field**: Enter the AD attribute that contains the display name of users (example: `displayName`).
1. **User Name Field**: Enter the AD attribute that contains the user name/given name (example: `givenName`).
1. **UID Field**: Enter an AD attribute that is unique to every user (example: `sAMAccountName`, `distinguishedName`).
1. **Groups Field**: Make entries for managing group memberships (example: `memberOf`).
1. **Rancher API Host**: Enter the URL for your Rancher Server.
1. **Private Key** and **Certificate**: This is a key-certificate pair to create a secure shell between Rancher and your IdP.
You can generate one using an openssl command. For example:
```
openssl req -x509 -newkey rsa:2048 -keyout myservice.key -out myservice.cert -days 365 -nodes -subj "/CN=myservice.example.com"
```
1. **IDP-metadata**: The `metadata.xml` file that you [exported from your IdP server](https://documentation.pingidentity.com/pingfederate/pf83/index.shtml#concept_exportingMetadata.html).
1. After you complete the **Configure Ping Account** form, click **Authenticate with Ping**, which is at the bottom of the page.
Rancher redirects you to the IdP login page. Enter credentials that authenticate with Ping IdP to validate your Rancher PingIdentity configuration.
>**Note:** You may have to disable your popup blocker to see the IdP login page.
**Result:** Rancher is configured to work with PingIdentity. Your users can now sign into Rancher using their PingIdentity logins.
{{< saml_caveats >}}
@@ -0,0 +1,109 @@
---
title: Configuring Shibboleth (SAML)
weight: 1210
---
_Available as of v2.4.0_
If your organization uses Shibboleth Identity Provider (IdP) for user authentication, you can configure Rancher to allow your users to log in to Rancher using their Shibboleth credentials.
In this configuration, when Rancher users log in, they will be redirected to the Shibboleth IdP to enter their credentials. After authentication, they will be redirected back to the Rancher UI.
If you also configure OpenLDAP as the back end to Shibboleth, it will return a SAML assertion to Rancher with user attributes that include groups. Then the authenticated user will be able to access resources in Rancher that their groups have permissions for.
> The instructions in this section assume that you understand how Rancher, Shibboleth, and OpenLDAP work together. For a more detailed explanation of how it works, refer to [this page.](./about)
This section covers the following topics:
- [Setting up Shibboleth in Rancher](#setting-up-shibboleth-in-rancher)
- [Shibboleth Prerequisites](#shibboleth-prerequisites)
- [Configure Shibboleth in Rancher](#configure-shibboleth-in-rancher)
- [SAML Provider Caveats](#saml-provider-caveats)
- [Setting up OpenLDAP in Rancher](#setting-up-openldap-in-rancher)
- [OpenLDAP Prerequisites](#openldap-prerequisites)
- [Configure OpenLDAP in Rancher](#configure-openldap-in-rancher)
- [Troubleshooting](#troubleshooting)
# Setting up Shibboleth in Rancher
### Shibboleth Prerequisites
>
>- You must have a Shibboleth IdP Server configured.
>- Following are the Rancher Service Provider URLs needed for configuration:
Metadata URL: `https://<rancher-server>/v1-saml/shibboleth/saml/metadata`
Assertion Consumer Service (ACS) URL: `https://<rancher-server>/v1-saml/shibboleth/saml/acs`
>- Export a `metadata.xml` file from your IdP Server. For more information, see the [Shibboleth documentation.](https://wiki.shibboleth.net/confluence/display/SP3/Home)
### Configure Shibboleth in Rancher
If your organization uses Shibboleth for user authentication, you can configure Rancher to allow your users to log in using their IdP credentials.
1. From the **Global** view, select **Security > Authentication** from the main menu.
1. Select **Shibboleth**.
1. Complete the **Configure Shibboleth Account** form. Shibboleth IdP lets you specify what data store you want to use. You can either add a database or use an existing LDAP server. For example, if you select your Active Directory (AD) server, the examples below describe how you can map AD attributes to fields within Rancher.
1. **Display Name Field**: Enter the AD attribute that contains the display name of users (example: `displayName`).
1. **User Name Field**: Enter the AD attribute that contains the user name/given name (example: `givenName`).
1. **UID Field**: Enter an AD attribute that is unique to every user (example: `sAMAccountName`, `distinguishedName`).
1. **Groups Field**: Make entries for managing group memberships (example: `memberOf`).
1. **Rancher API Host**: Enter the URL for your Rancher Server.
1. **Private Key** and **Certificate**: This is a key-certificate pair to create a secure shell between Rancher and your IdP.
You can generate one using an openssl command. For example:
```
openssl req -x509 -newkey rsa:2048 -keyout myservice.key -out myservice.cert -days 365 -nodes -subj "/CN=myservice.example.com"
```
1. **IDP-metadata**: The `metadata.xml` file that you exported from your IdP server.
1. After you complete the **Configure Shibboleth Account** form, click **Authenticate with Shibboleth**, which is at the bottom of the page.
Rancher redirects you to the IdP login page. Enter credentials that authenticate with Shibboleth IdP to validate your Rancher Shibboleth configuration.
>**Note:** You may have to disable your popup blocker to see the IdP login page.
**Result:** Rancher is configured to work with Shibboleth. Your users can now sign into Rancher using their Shibboleth logins.
### SAML Provider Caveats
If you configure Shibboleth without OpenLDAP, the following caveats apply because the SAML protocol does not support searching or looking up users or groups.
- There is no validation on users or groups when assigning permissions to them in Rancher.
- When adding users, the exact user IDs (i.e. UID Field) must be entered correctly. As you type the user ID, there will be no search for other user IDs that may match.
- When adding groups, you must select the group from the drop-down that is next to the text box. Rancher assumes that any input from the text box is a user.
- The group drop-down shows only the groups that you are a member of. You will not be able to add groups that you are not a member of.
To enable searching for groups when assigning permissions in Rancher, you will need to configure a back end for the SAML provider that supports groups, such as OpenLDAP.
# Setting up OpenLDAP in Rancher
If you also configure OpenLDAP as the back end to Shibboleth, it will return a SAML assertion to Rancher with user attributes that include groups. Then authenticated users will be able to access resources in Rancher that their groups have permissions for.
### OpenLDAP Prerequisites
Rancher must be configured with a LDAP bind account (aka service account) to search and retrieve LDAP entries pertaining to users and groups that should have access. It is recommended to not use an administrator account or personal account for this purpose and instead create a dedicated account in OpenLDAP with read-only access to users and groups under the configured search base (see below).
> **Using TLS?**
>
> If the certificate used by the OpenLDAP server is self-signed or not from a recognized certificate authority, make sure you have the CA certificate (concatenated with any intermediate certificates) at hand in PEM format. You will have to paste this certificate in during the configuration so that Rancher is able to validate the certificate chain.
### Configure OpenLDAP in Rancher
Configure the settings for the OpenLDAP server, groups and users. For help filling out each field, refer to the [configuration reference.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/openldap/openldap-config) Note that nested group membership is not available for Shibboleth.
> Before you proceed with the configuration, please familiarize yourself with the concepts of [External Authentication Configuration and Principal Users]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-authentication-configuration-and-principal-users).
1. Log into the Rancher UI using the initial local `admin` account.
2. From the **Global** view, navigate to **Security** > **Authentication**
3. Select **OpenLDAP**. The **Configure an OpenLDAP server** form will be displayed.
# Troubleshooting
If you are experiencing issues while testing the connection to the OpenLDAP server, first double-check the credentials entered for the service account as well as the search base configuration. You can also inspect the Rancher logs to help pinpoint the cause of the problem. Debug logs may contain more detailed information about the error. Please refer to [How can I enable debug logging]({{<baseurl>}}/rancher/v2.0-v2.4/en/faq/technical/#how-can-i-enable-debug-logging) in this documentation.
@@ -0,0 +1,34 @@
---
title: Group Permissions with Shibboleth and OpenLDAP
weight: 1
---
_Available as of Rancher v2.4_
This page provides background information and context for Rancher users who intend to set up the Shibboleth authentication provider in Rancher.
Because Shibboleth is a SAML provider, it does not support searching for groups. While a Shibboleth integration can validate user credentials, it can't be used to assign permissions to groups in Rancher without additional configuration.
One solution to this problem is to configure an OpenLDAP identity provider. With an OpenLDAP back end for Shibboleth, you will be able to search for groups in Rancher and assign them to resources such as clusters, projects, or namespaces from the Rancher UI.
### Terminology
- **Shibboleth** is a single sign-on log-in system for computer networks and the Internet. It allows people to sign in using just one identity to various systems. It validates user credentials, but does not, on its own, handle group memberships.
- **SAML:** Security Assertion Markup Language, an open standard for exchanging authentication and authorization data between an identity provider and a service provider.
- **OpenLDAP:** a free, open-source implementation of the Lightweight Directory Access Protocol (LDAP). It is used to manage an organization's computers and users. OpenLDAP is useful for Rancher users because it supports groups. In Rancher, it is possible to assign permissions to groups so that they can access resources such as clusters, projects, or namespaces, as long as the groups already exist in the identity provider.
- **IdP or IDP:** An identity provider. OpenLDAP is an example of an identity provider.
### Adding OpenLDAP Group Permissions to Rancher Resources
The diagram below illustrates how members of an OpenLDAP group can access resources in Rancher that the group has permissions for.
For example, a cluster owner could add an OpenLDAP group to a cluster so that they have permissions view most cluster level resources and create new projects. Then the OpenLDAP group members will have access to the cluster as soon as they log in to Rancher.
In this scenario, OpenLDAP allows the cluster owner to search for groups when assigning permissions. Without OpenLDAP, the functionality to search for groups would not be supported.
When a member of the OpenLDAP group logs in to Rancher, she is redirected to Shibboleth and enters her username and password.
Shibboleth validates her credentials, and retrieves user attributes from OpenLDAP, including groups. Then Shibboleth sends a SAML assertion to Rancher including the user attributes. Rancher uses the group data so that she can access all of the resources and permissions that her groups have permissions for.
![Adding OpenLDAP Group Permissions to Rancher Resources]({{<baseurl>}}/img/rancher/shibboleth-with-openldap-groups.svg)
@@ -0,0 +1,64 @@
---
title: Users and Groups
weight: 1
---
Rancher relies on users and groups to determine who is allowed to log in to Rancher and which resources they can access. When you configure an external authentication provider, users from that provider will be able to log in to your Rancher server. When a user logs in, the authentication provider will supply your Rancher server with a list of groups to which the user belongs.
Access to clusters, projects, multi-cluster apps, and global DNS providers and entries can be controlled by adding either individual users or groups to these resources. When you add a group to a resource, all users who are members of that group in the authentication provider, will be able to access the resource with the permissions that you've specified for the group. For more information on roles and permissions, see [Role Based Access Control]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/).
## Managing Members
When adding a user or group to a resource, you can search for users or groups by beginning to type their name. The Rancher server will query the authentication provider to find users and groups that match what you've entered. Searching is limited to the authentication provider that you are currently logged in with. For example, if you've enabled GitHub authentication but are logged in using a [local]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/local/) user account, you will not be able to search for GitHub users or groups.
All users, whether they are local users or from an authentication provider, can be viewed and managed. From the **Global** view, click on **Users**.
{{< saml_caveats >}}
## User Information
Rancher maintains information about each user that logs in through an authentication provider. This information includes whether the user is allowed to access your Rancher server and the list of groups that the user belongs to. Rancher keeps this user information so that the CLI, API, and kubectl can accurately reflect the access that the user has based on their group membership in the authentication provider.
Whenever a user logs in to the UI using an authentication provider, Rancher automatically updates this user information.
### Automatically Refreshing User Information
_Available as of v2.2.0_
Rancher will periodically refresh the user information even before a user logs in through the UI. You can control how often Rancher performs this refresh. From the **Global** view, click on **Settings**. Two settings control this behavior:
- **`auth-user-info-max-age-seconds`**
This setting controls how old a user's information can be before Rancher refreshes it. If a user makes an API call (either directly or by using the Rancher CLI or kubectl) and the time since the user's last refresh is greater than this setting, then Rancher will trigger a refresh. This setting defaults to `3600` seconds, i.e. 1 hour.
- **`auth-user-info-resync-cron`**
This setting controls a recurring schedule for resyncing authentication provider information for all users. Regardless of whether a user has logged in or used the API recently, this will cause the user to be refreshed at the specified interval. This setting defaults to `0 0 * * *`, i.e. once a day at midnight. See the [Cron documentation](https://en.wikipedia.org/wiki/Cron) for more information on valid values for this setting.
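    For example, assuming you wanted Rancher to resync user information every six hours instead of once a day, the value would be a standard cron expression like the following:

    ```
    0 */6 * * *
    ```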
> **Note:** Since SAML does not support user lookup, SAML-based authentication providers do not support periodically refreshing user information. User information will only be refreshed when the user logs into the Rancher UI.
### Manually Refreshing User Information
If you are not sure the last time Rancher performed an automatic refresh of user information, you can perform a manual refresh of all users.
1. From the **Global** view, click on **Users** in the navigation bar.
1. Click on **Refresh Group Memberships**.
**Results:** Rancher refreshes the user information for all users. Requesting this refresh will update which users can access Rancher as well as all the groups that each user belongs to.
>**Note:** Since SAML does not support user lookup, SAML-based authentication providers do not support the ability to manually refresh user information. User information will only be refreshed when the user logs into the Rancher UI.
## Session Length
_Available as of v2.3.0_
The default length (TTL) of each user session is adjustable. The default session length is 16 hours.
1. From the **Global** view, click on **Settings**.
1. In the **Settings** page, find **`auth-user-session-ttl-minutes`** and click **Edit.**
1. Enter the amount of time in minutes a session length should last and click **Save.**
**Result:** Users are automatically logged out of Rancher after the set number of minutes.
@@ -0,0 +1,44 @@
---
title: Configuring a Global Default Private Registry
weight: 400
aliases:
---
You might want to use a private Docker registry to share your custom base images within your organization. With a private registry, you can keep a private, consistent, and centralized source of truth for the Docker images that are used in your clusters.
There are two main ways to set up private registries in Rancher: by setting up the global default registry through the **Settings** tab in the global view, and by setting up a private registry in the advanced options in the cluster-level settings. The global default registry is intended to be used for air-gapped setups, for registries that do not require credentials. The cluster-level private registry is intended to be used in all setups in which the private registry requires credentials.
This section is about configuring the global default private registry, and focuses on how to configure the registry from the Rancher UI after Rancher is installed.
For instructions on setting up a private registry with command line options during the installation of Rancher, refer to the [air gapped Docker installation]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/air-gap-single-node) or [air gapped Kubernetes installation]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/air-gap-high-availability) instructions.
If your private registry requires credentials, it cannot be used as the default registry. There is no global way to set up a private registry with authorization for every Rancher-provisioned cluster. Therefore, if you want a Rancher-provisioned cluster to pull images from a private registry with credentials, you will have to [pass in the registry credentials through the advanced cluster options](#setting-a-private-registry-with-credentials-when-deploying-a-cluster) every time you create a new cluster.
# Setting a Private Registry with No Credentials as the Default Registry
1. Log into Rancher and configure the default administrator password.
1. Go into the **Settings** view.
{{< img "/img/rancher/airgap/settings.png" "Settings" >}}
1. Look for the setting called `system-default-registry` and choose **Edit**.
{{< img "/img/rancher/airgap/edit-system-default-registry.png" "Edit" >}}
1. Change the value to your registry (e.g. `registry.yourdomain.com:port`). Do not prefix the registry with `http://` or `https://`.
{{< img "/img/rancher/airgap/enter-system-default-registry.png" "Save" >}}
**Result:** Rancher will use your private registry to pull system images.
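For illustration, if `system-default-registry` were set to the hypothetical value `registry.yourdomain.com:5000`, a system image such as `rancher/rke-tools` would typically be pulled with that registry prefixed to the image name:
```
registry.yourdomain.com:5000/rancher/rke-tools:<tag>
```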
# Setting a Private Registry with Credentials when Deploying a Cluster
You can follow these steps to configure a private registry when you provision a cluster with Rancher:
1. When you create a cluster through the Rancher UI, go to the **Cluster Options** section and click **Show Advanced Options.**
1. In the <b>Enable Private Registries</b> section, click **Enabled.**
1. Enter the registry URL and credentials.
1. Click **Save.**
**Result:** The new cluster will be able to pull images from the private registry.
@@ -0,0 +1,46 @@
---
title: Provisioning Drivers
weight: 1140
---
Drivers in Rancher allow you to manage which providers can be used to deploy [hosted Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/) or [nodes in an infrastructure provider]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/) to allow Rancher to deploy and manage Kubernetes.
### Rancher Drivers
With Rancher drivers, you can enable/disable existing built-in drivers that are packaged in Rancher. Alternatively, you can add your own driver if Rancher has not yet implemented it.
There are two types of drivers within Rancher:
* [Cluster Drivers](#cluster-drivers)
* [Node Drivers](#node-drivers)
### Cluster Drivers
_Available as of v2.2.0_
Cluster drivers are used to provision [hosted Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/), such as GKE, EKS, and AKS. Whether a cluster driver appears as an option when creating a cluster is determined by the driver's status: only `active` cluster drivers are displayed as options for creating hosted Kubernetes clusters. By default, Rancher is packaged with several existing cluster drivers, but you can also create custom cluster drivers to add to Rancher.
By default, Rancher has activated several hosted Kubernetes cloud providers including:
* [Amazon EKS]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/eks/)
* [Google GKE]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/gke/)
* [Azure AKS]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/aks/)
There are several other hosted Kubernetes cloud providers that are disabled by default, but are packaged in Rancher:
* [Alibaba ACK]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/ack/)
* [Huawei CCE]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/cce/)
* [Tencent]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/tke/)
### Node Drivers
Node drivers are used to provision hosts, which Rancher uses to launch and manage Kubernetes clusters. A node driver is the same as a [Docker Machine driver](https://docs.docker.com/machine/drivers/). Whether a node driver appears as an option when creating node templates is determined by the driver's status: only `active` node drivers are displayed as options for creating node templates. By default, Rancher is packaged with many existing Docker Machine drivers, but you can also create custom node drivers to add to Rancher.
If there are specific node drivers that you don't want to show to your users, you need to deactivate those node drivers.
Rancher supports several major cloud providers; by default, these node drivers are active and available for deployment:
* [Amazon EC2]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/ec2/)
* [Azure]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/azure/)
* [Digital Ocean]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/digital-ocean/)
* [vSphere]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/vsphere/)
@@ -0,0 +1,44 @@
---
title: Cluster Drivers
weight: 1
---
_Available as of v2.2.0_
Cluster drivers are used to create clusters in a [hosted Kubernetes provider]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/), such as Google GKE. Whether a cluster driver appears as an option when creating clusters is determined by the driver's status: only `active` cluster drivers are displayed as options for creating clusters. By default, Rancher is packaged with several existing cloud provider cluster drivers, but you can also add custom cluster drivers to Rancher.
If there are specific cluster drivers that you do not want to show your users, you may deactivate those cluster drivers within Rancher and they will not appear as an option for cluster creation.
### Managing Cluster Drivers
>**Prerequisites:** To create, edit, or delete cluster drivers, you need _one_ of the following permissions:
>
>- [Administrator Global Permissions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/)
>- [Custom Global Permissions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/#custom-global-permissions) with the [Manage Cluster Drivers]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) role assigned.
## Activating/Deactivating Cluster Drivers
By default, Rancher only activates drivers for the most popular cloud providers: Google GKE, Amazon EKS, and Azure AKS. If you want to show or hide any cluster driver, you can change its status.
1. From the **Global** view, choose **Tools > Drivers** in the navigation bar.
2. From the **Drivers** page, select the **Cluster Drivers** tab.
3. Select the driver that you wish to **Activate** or **Deactivate** and select the appropriate icon.
## Adding Custom Cluster Drivers
If you want to use a cluster driver that Rancher doesn't support out-of-the-box, you can add that provider's driver so that you can start using it to create _hosted_ Kubernetes clusters.
1. From the **Global** view, choose **Tools > Drivers** in the navigation bar.
2. From the **Drivers** page select the **Cluster Drivers** tab.
3. Click **Add Cluster Driver**.
4. Complete the **Add Cluster Driver** form. Then click **Create**.
### Developing your own Cluster Driver
In order to develop a cluster driver to add to Rancher, please refer to our [example](https://github.com/rancher-plugins/kontainer-engine-driver-example).
@@ -0,0 +1,40 @@
---
title: Node Drivers
weight: 2
aliases:
- /rancher/v2.0-v2.4/en/concepts/global-configuration/node-drivers/
- /rancher/v2.0-v2.4/en/tasks/global-configuration/node-drivers/
---
Node drivers are used to provision hosts, which Rancher uses to launch and manage Kubernetes clusters. A node driver is the same as a [Docker Machine driver](https://docs.docker.com/machine/drivers/). Whether a node driver appears as an option when creating node templates is determined by the driver's status: only `active` node drivers are displayed as options for creating node templates. By default, Rancher is packaged with many existing Docker Machine drivers, but you can also create custom node drivers to add to Rancher.
If there are specific node drivers that you don't want to show to your users, you need to deactivate those node drivers.
#### Managing Node Drivers
>**Prerequisites:** To create, edit, or delete drivers, you need _one_ of the following permissions:
>
>- [Administrator Global Permissions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/)
>- [Custom Global Permissions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/#custom-global-permissions) with the [Manage Node Drivers]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) role assigned.
## Activating/Deactivating Node Drivers
By default, Rancher only activates drivers for the most popular cloud providers, Amazon EC2, Azure, DigitalOcean and vSphere. If you want to show or hide any node driver, you can change its status.
1. From the **Global** view, choose **Tools > Drivers** in the navigation bar. From the **Drivers** page, select the **Node Drivers** tab. In versions before v2.2.0, you can select **Node Drivers** directly in the navigation bar.
2. Select the driver that you wish to **Activate** or **Deactivate** and select the appropriate icon.
## Adding Custom Node Drivers
If you want to use a node driver that Rancher doesn't support out-of-the-box, you can add that provider's driver so that you can start using it to create node templates and, eventually, node pools for your Kubernetes cluster.
1. From the **Global** view, choose **Tools > Drivers** in the navigation bar. From the **Drivers** page, select the **Node Drivers** tab. In versions before v2.2.0, you can select **Node Drivers** directly in the navigation bar.
2. Click **Add Node Driver**.
3. Complete the **Add Node Driver** form. Then click **Create**.
### Developing your own node driver
Node drivers are implemented with [Docker Machine](https://docs.docker.com/machine/).
@@ -0,0 +1,93 @@
---
title: Upgrading Kubernetes without Upgrading Rancher
weight: 1120
---
_Available as of v2.3.0_
The RKE metadata feature allows you to provision clusters with new versions of Kubernetes as soon as they are released, without upgrading Rancher. This feature is useful for taking advantage of patch versions of Kubernetes, for example, if you want to upgrade to Kubernetes v1.14.7 when your Rancher server originally supported v1.14.6.
> **Note:** The Kubernetes API can change between minor versions. Therefore, we don't support introducing minor Kubernetes versions, such as introducing v1.15 when Rancher currently supports v1.14. You would need to upgrade Rancher to add support for minor Kubernetes versions.
Rancher's Kubernetes metadata contains information specific to the Kubernetes version that Rancher uses to provision [RKE clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/). Rancher syncs the data periodically and creates custom resource definitions (CRDs) for **system images,** **service options** and **addon templates.** Consequently, when a new Kubernetes version is compatible with the Rancher server version, the Kubernetes metadata makes the new version available to Rancher for provisioning clusters. The metadata gives you an overview of the information that the [Rancher Kubernetes Engine]({{<baseurl>}}/rke/latest/en/) (RKE) uses for deploying various Kubernetes versions.
The table below describes the CRDs that are affected by the periodic data sync.
> **Note:** Only administrators can edit metadata CRDs. It is recommended not to update existing objects unless explicitly advised.
| Resource | Description | Rancher API URL |
|----------|-------------|-----------------|
| System Images | List of system images used to deploy Kubernetes through RKE. | `<RANCHER_SERVER_URL>/v3/rkek8ssystemimages` |
| Service Options | Default options passed to Kubernetes components like `kube-api`, `scheduler`, `kubelet`, `kube-proxy`, and `kube-controller-manager` | `<RANCHER_SERVER_URL>/v3/rkek8sserviceoptions` |
| Addon Templates | YAML definitions used to deploy addon components like Canal, Calico, Flannel, Weave, Kube-dns, CoreDNS, `metrics-server`, `nginx-ingress` | `<RANCHER_SERVER_URL>/v3/rkeaddons` |
Administrators might configure the RKE metadata settings to do the following:
- Refresh the Kubernetes metadata, if a new patch version of Kubernetes comes out and they want Rancher to provision clusters with the latest version of Kubernetes without having to upgrade Rancher
- Change the metadata URL that Rancher uses to sync the metadata, which is useful for air gap setups if you need to sync Rancher locally instead of with GitHub
- Prevent Rancher from auto-syncing the metadata, which is one way to prevent new and unsupported Kubernetes versions from being available in Rancher
### Refresh Kubernetes Metadata
The option to refresh the Kubernetes metadata is available for administrators by default, or for any user who has the **Manage Cluster Drivers** [global role.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/)
To force Rancher to refresh the Kubernetes metadata, use the manual refresh action available under **Tools > Drivers > Refresh Kubernetes Metadata**, in the right-hand corner of the page.
You can configure Rancher to only refresh metadata on demand by setting `refresh-interval-minutes` to `0` (see below) and using this button to perform the metadata refresh manually when needed.
### Configuring the Metadata Synchronization
> Only administrators can change these settings.
The RKE metadata config controls how often Rancher syncs metadata and where it downloads data from. You can configure the metadata from the settings in the Rancher UI, or through the Rancher API at the endpoint `v3/settings/rke-metadata-config`.
The way that the metadata is configured depends on the Rancher version.
{{% tabs %}}
{{% tab "Rancher v2.4+" %}}
To edit the metadata config in Rancher,
1. Go to the **Global** view and click the **Settings** tab.
1. Go to the **rke-metadata-config** section. Click the **&#8942;** and click **Edit.**
1. You can optionally fill in the following parameters:
- `refresh-interval-minutes`: This is the amount of time that Rancher waits to sync the metadata. To disable the periodic refresh, set `refresh-interval-minutes` to 0.
- `url`: This is the HTTP path that Rancher fetches data from. The path must be a direct path to a JSON file. For example, the default URL for Rancher v2.4 is `https://releases.rancher.com/kontainer-driver-metadata/release-v2.4/data.json`.
If you don't have an air gap setup, you don't need to specify the URL where Rancher gets the metadata, because the default setting is to pull from [Rancher's metadata Git repository.](https://github.com/rancher/kontainer-driver-metadata/blob/dev-v2.5/data/data.json)
However, if you have an [air gap setup,](#air-gap-setups) you will need to mirror the Kubernetes metadata repository in a location available to Rancher. Then you need to change the URL to point to the new location of the JSON file.
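For example, in an air-gapped environment an administrator might set the two fields along these lines. This is a minimal sketch: the mirror URL is hypothetical, and the exact format shown in the UI or API may differ.

```yaml
# Illustrative values only -- adjust for your environment.
refresh-interval-minutes: "0"   # 0 disables the automatic periodic refresh
url: "https://mirror.example.com/kontainer-driver-metadata/release-v2.4/data.json"   # hypothetical local mirror of data.json
```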
{{% /tab %}}
{{% tab "Rancher v2.3" %}}
To edit the metadata config in Rancher,
1. Go to the **Global** view and click the **Settings** tab.
1. Go to the **rke-metadata-config** section. Click the **&#8942;** and click **Edit.**
1. You can optionally fill in the following parameters:
- `refresh-interval-minutes`: This is the amount of time that Rancher waits to sync the metadata. To disable the periodic refresh, set `refresh-interval-minutes` to 0.
- `url`: This is the HTTP path that Rancher fetches data from.
- `branch`: This refers to the Git branch name if the URL is a Git URL.
If you don't have an air gap setup, you don't need to specify the URL or Git branch where Rancher gets the metadata, because the default setting is to pull from [Rancher's metadata Git repository.](https://github.com/rancher/kontainer-driver-metadata.git)
However, if you have an [air gap setup,](#air-gap-setups) you will need to mirror the Kubernetes metadata repository in a location available to Rancher. Then you need to change the URL and Git branch in the `rke-metadata-config` settings to point to the new location of the repository.
{{% /tab %}}
{{% /tabs %}}
### Air Gap Setups
Rancher relies on a periodic refresh of the `rke-metadata-config` to download new Kubernetes version metadata if it is supported by the current version of the Rancher server. For a table of compatible Kubernetes and Rancher versions, refer to the [service terms section.](https://rancher.com/support-maintenance-terms/all-supported-versions/rancher-v2.2.8/)
If you have an air gap setup, you might not be able to get the automatic periodic refresh of the Kubernetes metadata from Rancher's Git repository. In that case, you should disable the periodic refresh to prevent your logs from showing errors. Optionally, you can configure your metadata settings so that Rancher can sync with a local copy of the RKE metadata.
To sync Rancher with a local mirror of the RKE metadata, an administrator would configure the `rke-metadata-config` settings to point to the mirror. For details, refer to [Configuring the Metadata Synchronization.](#configuring-the-metadata-synchronization)
After new Kubernetes versions are loaded into the Rancher setup, additional steps would be required in order to use them for launching clusters. Rancher needs access to updated system images. While the metadata settings can only be changed by administrators, any user can download the Rancher system images and prepare a private Docker registry for them.
1. To download the system images for the private registry, click the Rancher server version at the bottom left corner of the Rancher UI.
1. Download the OS specific image lists for Linux or Windows.
1. Download `rancher-images.txt`.
1. Prepare the private registry using the same steps during the [air gap install]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/other-installation-methods/air-gap/populate-private-registry), but instead of using the `rancher-images.txt` from the releases page, use the one obtained from the previous steps.
**Result:** The air gap installation of Rancher can now sync the Kubernetes metadata. If you update your private registry when new versions of Kubernetes are released, you can provision clusters with the new version without having to upgrade Rancher.
---
title: Pod Security Policies
weight: 1135
aliases:
- /rancher/v2.0-v2.4/en/concepts/global-configuration/pod-security-policies/
- /rancher/v2.0-v2.4/en/tasks/global-configuration/pod-security-policies/
- /rancher/v2.0-v2.4/en/tasks/clusters/adding-a-pod-security-policy/
---
_Pod Security Policies_ (or PSPs) are objects that control security-sensitive aspects of the pod specification (such as root privileges).
If a pod does not meet the conditions specified in the PSP, Kubernetes will not allow it to start, and Rancher will display an error message of `Pod <NAME> is forbidden: unable to validate...`.
- [How PSPs Work](#how-psps-work)
- [Default PSPs](#default-psps)
- [Restricted](#restricted)
- [Unrestricted](#unrestricted)
- [Creating PSPs](#creating-psps)
- [Requirements](#requirements)
- [Creating PSPs in the Rancher UI](#creating-psps-in-the-rancher-ui)
- [Configuration](#configuration)
# How PSPs Work
You can assign PSPs at the cluster or project level.
PSPs work through inheritance:
- By default, PSPs assigned to a cluster are inherited by its projects, as well as any namespaces added to those projects.
- **Exception:** Namespaces that are not assigned to projects do not inherit PSPs, regardless of whether the PSP is assigned to a cluster or project. Because these namespaces have no PSPs, workload deployments to these namespaces will fail, which is the default Kubernetes behavior.
- You can override the default PSP by assigning a different PSP directly to the project.
Any workloads that are already running in a cluster or project before a PSP is assigned will not be checked for compliance with the PSP. Workloads would need to be cloned or upgraded to see if they pass the PSP.
Read more about Pod Security Policies in the [Kubernetes Documentation](https://kubernetes.io/docs/concepts/policy/pod-security-policy/).
# Default PSPs
_Available as of v2.0.7_
Rancher ships with two default Pod Security Policies (PSPs): the `restricted` and `unrestricted` policies.
### Restricted
This policy is based on the Kubernetes [example restricted policy](https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/policy/restricted-psp.yaml). It significantly restricts what types of pods can be deployed to a cluster or project. This policy:
- Prevents pods from running as a privileged user and prevents escalation of privileges.
- Validates that server-required security mechanisms are in place, such as restricting what volumes can be mounted to only the core volume types and preventing root supplemental groups from being added.
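For reference, here is an abridged sketch of a policy along these lines, based on the upstream Kubernetes example linked above. The policy name is illustrative; see the upstream file for the complete definition.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-example        # illustrative name
spec:
  privileged: false               # no privileged pods
  allowPrivilegeEscalation: false # no privilege escalation
  requiredDropCapabilities:
    - ALL
  runAsUser:
    rule: MustRunAsNonRoot        # pods may not run as root
  supplementalGroups:
    rule: MustRunAs
    ranges:
      - min: 1                    # forbid adding the root (0) group
        max: 65535
  fsGroup:
    rule: MustRunAs
    ranges:
      - min: 1
        max: 65535
  volumes:                        # only core volume types may be mounted
    - configMap
    - emptyDir
    - projected
    - secret
    - downwardAPI
    - persistentVolumeClaim
  hostNetwork: false
  hostIPC: false
  hostPID: false
```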
### Unrestricted
This policy is equivalent to running Kubernetes with the PSP controller disabled. It has no restrictions on what pods can be deployed into a cluster or project.
# Creating PSPs
Using Rancher, you can create a Pod Security Policy in the Rancher UI rather than by writing a YAML file.
### Requirements
Rancher can only assign PSPs for clusters that are [launched using RKE.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/)
You must enable PSPs at the cluster level before you can assign them to a project. This can be configured by [editing the cluster.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/editing-clusters/)
It is a best practice to set PSPs at the cluster level.
We recommend adding PSPs during cluster and project creation instead of adding them to an existing cluster or project.
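Under the hood, enabling PSPs for an RKE cluster corresponds to turning on the PodSecurityPolicy admission controller on the Kubernetes API server. Below is a minimal sketch of the relevant RKE option, assuming the cluster configuration is edited as YAML; in a Rancher-provisioned cluster this lives under the `rancher_kubernetes_engine_config` section, and a default policy still needs to be selected when editing the cluster.

```yaml
# Minimal sketch: enable the PodSecurityPolicy admission controller via RKE.
# In a Rancher-provisioned cluster, nest this under rancher_kubernetes_engine_config.
services:
  kube-api:
    pod_security_policy: true   # turns on PSP enforcement on the API server
```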
### Creating PSPs in the Rancher UI
1. From the **Global** view, select **Security** > **Pod Security Policies** from the main menu. Then click **Add Policy**.
**Step Result:** The **Add Policy** form opens.
2. Name the policy.
3. Complete each section of the form. Refer to the [Kubernetes documentation](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) for more information on what each policy does.
# Configuration
The Kubernetes documentation on PSPs is [here.](https://kubernetes.io/docs/concepts/policy/pod-security-policy/)
---
title: Role-Based Access Control (RBAC)
weight: 1120
aliases:
- /rancher/v2.0-v2.4/en/concepts/global-configuration/users-permissions-roles/
---
Within Rancher, each person authenticates as a _user_, which is a login that grants you access to Rancher. As mentioned in [Authentication]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/), users can either be local or external.
After you configure external authentication, the users that display on the **Users** page change.
- If you are logged in as a local user, only local users display.
- If you are logged in as an external user, both external and local users display.
## Users and Roles
Once the user logs in to Rancher, their _authorization_, or their access rights within the system, is determined by _global permissions_, and _cluster and project roles_.
- [Global Permissions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/):
Define user authorization outside the scope of any particular cluster.
- [Cluster and Project Roles]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/):
Define user authorization inside the specific cluster or project where they are assigned the role.
Both global permissions and cluster and project roles are implemented on top of [Kubernetes RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/). Therefore, enforcement of permissions and roles is performed by Kubernetes.
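For orientation, the objects that Kubernetes ultimately evaluates are ordinary RBAC resources. The following is a generic, hypothetical Kubernetes example (not the exact objects Rancher creates) showing the kind of Role and RoleBinding that this enforcement is built on:

```yaml
# Generic Kubernetes RBAC sketch: a Role allowing read access to pods in one
# namespace, bound to a single (hypothetical) user.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: demo          # hypothetical namespace
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: demo
  name: read-pods
subjects:
  - kind: User
    name: jane             # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```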
---
title: Cluster and Project Roles
weight: 1127
---
Cluster and project roles define user authorization inside a cluster or project. You can manage these roles from the **Global > Security > Roles** page.
### Membership and Role Assignment
The projects and clusters accessible to non-administrative users are determined by _membership_. Membership is a list of users who have access to a specific cluster or project based on the roles they were assigned in that cluster or project. Each cluster and project includes a tab that a user with the appropriate permissions can use to manage membership.
When you create a cluster or project, Rancher automatically assigns you as the `Owner` for it. Users assigned the `Owner` role can assign other users roles in the cluster or project.
> **Note:** Non-administrative users cannot access any existing projects/clusters by default. A user with appropriate permissions (typically the owner) must explicitly assign the project and cluster membership.
### Cluster Roles
_Cluster roles_ are roles that you can assign to users, granting them access to a cluster. There are two primary cluster roles: `Owner` and `Member`.
- **Cluster Owner:**
These users have full control over the cluster and all resources in it.
- **Cluster Member:**
These users can view most cluster level resources and create new projects.
#### Custom Cluster Roles
Rancher lets you assign _custom cluster roles_ to a standard user instead of the typical `Owner` or `Member` roles. These roles can be either a built-in custom cluster role or one defined by a Rancher administrator. They are convenient for defining narrow or specialized access for a standard user within a cluster. See the table below for a list of built-in custom cluster roles.
#### Cluster Role Reference
The following table lists each built-in custom cluster role available and whether that level of access is included in the default cluster-level permissions, `Cluster Owner` and `Cluster Member`.
| Built-in Cluster Role | Owner | Member <a id="clus-roles"></a> |
| ---------------------------------- | ------------- | --------------------------------- |
| Create Projects | ✓ | ✓ |
| Manage Cluster Backups | ✓ | |
| Manage Cluster Catalogs | ✓ | |
| Manage Cluster Members | ✓ | |
| Manage Nodes | ✓ | |
| Manage Storage | ✓ | |
| View All Projects | ✓ | |
| View Cluster Catalogs | ✓ | ✓ |
| View Cluster Members | ✓ | ✓ |
| View Nodes | ✓ | ✓ |
For details on how each cluster role can access Kubernetes resources, you can go to the **Global** view in the Rancher UI. Then click **Security > Roles** and go to the **Clusters** tab. If you click an individual role, you can refer to the **Grant Resources** table to see all of the operations and resources that are permitted by the role.
> **Note:**
>When viewing the resources associated with default roles created by Rancher, if there are multiple Kubernetes API resources on one line item, the resource will have `(Custom)` appended to it. These are not custom resources but just an indication that there are multiple Kubernetes API resources as one resource.
### Giving a Custom Cluster Role to a Cluster Member
After an administrator [sets up a custom cluster role,]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/default-custom-roles/) cluster owners and admins can then assign those roles to cluster members.
To assign a custom role to a new cluster member, you can use the Rancher UI. To modify the permissions of an existing member, you will need to use the Rancher API view.
To assign the role to a new cluster member,
1. Go to the **Cluster** view, then go to the **Members** tab.
1. Click **Add Member.** Then in the **Cluster Permissions** section, choose the custom cluster role that should be assigned to the member.
1. Click **Create.**
**Result:** The member has the assigned role.
To assign any custom role to an existing cluster member,
1. Go to the member you want to give the role to. Click the **&#8942; > View in API.**
1. In the **roleTemplateId** field, go to the drop-down menu and choose the role you want to assign to the member. Click **Show Request** and **Send Request.**
**Result:** The member has the assigned role.
### Project Roles
_Project roles_ are roles that can be used to grant users access to a project. There are three primary project roles: `Owner`, `Member`, and `Read Only`.
- **Project Owner:**
These users have full control over the project and all resources in it.
- **Project Member:**
These users can manage project-scoped resources like namespaces and workloads, but cannot manage other project members.
- **Read Only:**
These users can view everything in the project but cannot create, update, or delete anything.
>**Caveat:**
>
>Users assigned the `Owner` or `Member` role for a project automatically inherit the `namespace creation` role. However, this role is a [Kubernetes ClusterRole](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#role-and-clusterrole), meaning its scope extends to all projects in the cluster. Therefore, users explicitly assigned the `owner` or `member` role for a project can create namespaces in other projects they're assigned to, even with only the `Read Only` role assigned.
#### Custom Project Roles
Rancher lets you assign _custom project roles_ to a standard user instead of the typical `Owner`, `Member`, or `Read Only` roles. These roles can be either a built-in custom project role or one defined by a Rancher administrator. They are convenient for defining narrow or specialized access for a standard user within a project. See the table below for a list of built-in custom project roles.
#### Project Role Reference
The following table lists each built-in custom project role available in Rancher and whether it is also granted by the `Owner`, `Member`, or `Read Only` role.
| Built-in Project Role | Owner | Member<a id="proj-roles"></a> | Read Only |
| ---------------------------------- | ------------- | ----------------------------- | ------------- |
| Manage Project Members | ✓ | | |
| Create Namespaces | ✓ | ✓ | |
| Manage Config Maps | ✓ | ✓ | |
| Manage Ingress | ✓ | ✓ | |
| Manage Project Catalogs | ✓ | | |
| Manage Secrets | ✓ | ✓ | |
| Manage Service Accounts | ✓ | ✓ | |
| Manage Services | ✓ | ✓ | |
| Manage Volumes | ✓ | ✓ | |
| Manage Workloads | ✓ | ✓ | |
| View Secrets | ✓ | ✓ | |
| View Config Maps | ✓ | ✓ | ✓ |
| View Ingress | ✓ | ✓ | ✓ |
| View Project Members | ✓ | ✓ | ✓ |
| View Project Catalogs | ✓ | ✓ | ✓ |
| View Service Accounts | ✓ | ✓ | ✓ |
| View Services | ✓ | ✓ | ✓ |
| View Volumes | ✓ | ✓ | ✓ |
| View Workloads | ✓ | ✓ | ✓ |
> **Notes:**
>
>- Each project role listed above, including `Owner`, `Member`, and `Read Only`, is comprised of multiple rules granting access to various resources. You can view the roles and their rules on the Global > Security > Roles page.
>- When viewing the resources associated with default roles created by Rancher, if there are multiple Kubernetes API resources on one line item, the resource will have `(Custom)` appended to it. These are not custom resources but just an indication that there are multiple Kubernetes API resources as one resource.
>- The `Manage Project Members` role allows the project owner to manage any members of the project **and** grant them any project-scoped role, regardless of their access to the project resources. Be cautious when assigning this role to individual users.
### Defining Custom Roles
As previously mentioned, custom roles can be defined for use at the cluster or project level. The context field defines whether the role will appear on the cluster member page, project member page, or both.
When defining a custom role, you can grant access to specific resources or specify roles from which the custom role should inherit. A custom role can be made up of a combination of specific grants and inherited roles. All grants are additive. This means that defining a narrower grant for a specific resource **will not** override a broader grant defined in a role that the custom role is inheriting from.
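As an illustration of how inherited roles and specific grants combine, the sketch below expresses a custom project role as a Rancher `RoleTemplate` object. The field names assume the `management.cattle.io/v3` schema and the inherited role ID is hypothetical; the same role can be created entirely through the UI described above.

```yaml
# Illustrative sketch only -- verify field names against your Rancher version.
apiVersion: management.cattle.io/v3
kind: RoleTemplate
metadata:
  name: workload-viewer-plus-configmaps     # hypothetical role ID
displayName: Workload Viewer Plus ConfigMaps
context: project            # show this role on the project member page
roleTemplateNames:          # roles this custom role inherits from
  - workloads-view          # hypothetical ID of a built-in "View Workloads" role
rules:                      # additional grants; all grants are additive
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
```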
### Default Cluster and Project Roles
By default, when a standard user creates a new cluster or project, they are automatically assigned an ownership role: either [cluster owner](#cluster-roles) or [project owner](#project-roles). However, in some organizations, these roles may grant more administrative access than intended. In that case, you can change the default role to something more restrictive, such as a set of individual roles or a custom role.
There are two methods for changing default cluster/project roles:
- **Assign Custom Roles**: Create a [custom role]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/default-custom-roles) for either your [cluster](#custom-cluster-roles) or [project](#custom-project-roles), and then set the custom role as default.
- **Assign Individual Roles**: Configure multiple [cluster](#cluster-role-reference)/[project](#project-role-reference) roles as default for assignment to the creating user.
For example, instead of assigning a role that inherits other roles (such as `cluster owner`), you can choose a mix of individual roles (such as `manage nodes` and `manage storage`).
>**Note:**
>
>- Although you can [lock]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/locked-roles/) a default role, the system still assigns the role to users who create a cluster/project.
>- Only users that create clusters/projects inherit their roles. Users added to the cluster/project membership afterward must be explicitly assigned their roles.
### Configuring Default Roles for Cluster and Project Creators
You can change the cluster or project role(s) that are automatically assigned to the creating user.
1. From the **Global** view, select **Security > Roles** from the main menu. Select either the **Cluster** or **Project** tab.
1. Find the custom or individual role that you want to use as default. Then edit the role by selecting **&#8942; > Edit**.
1. Enable the role as default.
{{% accordion id="cluster" label="For Clusters" %}}
1. From **Cluster Creator Default**, choose **Yes: Default role for new cluster creation**.
1. Click **Save**.
{{% /accordion %}}
{{% accordion id="project" label="For Projects" %}}
1. From **Project Creator Default**, choose **Yes: Default role for new project creation**.
1. Click **Save**.
{{% /accordion %}}
1. If you want to remove a default role, edit the permission and select **No** from the default roles option.
**Result:** The default roles are configured based on your changes. Roles assigned to cluster/project creators display a check in the **Cluster/Project Creator Default** column.
### Cluster Membership Revocation Behavior
When you revoke the cluster membership for a standard user that's explicitly assigned membership to both the cluster _and_ a project within the cluster, that standard user [loses their cluster roles](#clus-roles) but [retains their project roles](#proj-roles). In other words, although you have revoked the user's permissions to access the cluster and its nodes, the standard user can still:
- Access the projects they hold membership in.
- Exercise any [individual project roles](#project-role-reference) they are assigned.
If you want to completely revoke a user's access within a cluster, revoke both their cluster and project memberships.
---
title: Custom Roles
weight: 1128
aliases:
- /rancher/v2.0-v2.4/en/tasks/global-configuration/roles/
---
Within Rancher, _roles_ determine what actions a user can make within a cluster or project.
Note that _roles_ are different from _permissions_, which determine what clusters and projects you can access.
This section covers the following topics:
- [Prerequisites](#prerequisites)
- [Creating a custom role for a cluster or project](#creating-a-custom-role-for-a-cluster-or-project)
- [Creating a custom global role](#creating-a-custom-global-role)
- [Deleting a custom global role](#deleting-a-custom-global-role)
- [Assigning a custom global role to a group](#assigning-a-custom-global-role-to-a-group)
## Prerequisites
To complete the tasks on this page, one of the following permissions is required:
- [Administrator Global Permissions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/).
- [Custom Global Permissions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/#custom-global-permissions) with the [Manage Roles]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) role assigned.
## Creating A Custom Role for a Cluster or Project
While Rancher comes out-of-the-box with a set of default user roles, you can also create custom roles to provide users with very specific permissions within Rancher.
The steps to add custom roles differ depending on the version of Rancher.
{{% tabs %}}
{{% tab "Rancher v2.0.7+" %}}
1. From the **Global** view, select **Security > Roles** from the main menu.
1. Select a tab to determine the scope of the roles you're adding. The tabs are:
- **Cluster:** The role is valid for assignment when adding/managing members to _only_ clusters.
- **Project:** The role is valid for assignment when adding/managing members to _only_ projects.
1. Click **Add Cluster/Project Role.**
1. **Name** the role.
1. Optional: Choose the **Cluster/Project Creator Default** option to assign this role to a user when they create a new cluster or project. Using this feature, you can expand or restrict the default roles for cluster/project creators.
> Out of the box, the Cluster Creator Default and the Project Creator Default roles are `Cluster Owner` and `Project Owner` respectively.
1. Use the **Grant Resources** options to assign individual [Kubernetes API endpoints](https://kubernetes.io/docs/reference/) to the role.
> When viewing the resources associated with default roles created by Rancher, if there are multiple Kubernetes API resources on one line item, the resource will have `(Custom)` appended to it. These are not custom resources but just an indication that there are multiple Kubernetes API resources as one resource.
> The Resource text field provides a method to search for pre-defined Kubernetes API resources, or enter a custom resource name for the grant. The pre-defined or `(Custom)` resource must be selected from the dropdown, after entering a resource name into this field.
You can also choose the individual verbs (`Create`, `Delete`, `Get`, etc.) available for use with each endpoint you assign.
1. Use the **Inherit from a Role** options to assign individual Rancher roles to your custom roles. Note: When a custom role inherits from a parent role, the parent role cannot be deleted until the child role is deleted.
1. Click **Create**.
{{% /tab %}}
{{% tab "Rancher before v2.0.7" %}}
1. From the **Global** view, select **Security > Roles** from the main menu.
1. Click **Add Role**.
1. **Name** the role.
1. Choose whether to set the role to a status of [locked]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/locked-roles/).
> **Note:** Locked roles cannot be assigned to users.
1. In the **Context** dropdown menu, choose the scope of the role assigned to the user. The contexts are:
- **All:** The user can use their assigned role regardless of context. This role is valid for assignment when adding/managing members to clusters or projects.
- **Cluster:** This role is valid for assignment when adding/managing members to _only_ clusters.
- **Project:** This role is valid for assignment when adding/managing members to _only_ projects.
1. Use the **Grant Resources** options to assign individual [Kubernetes API endpoints](https://kubernetes.io/docs/reference/) to the role.
> When viewing the resources associated with default roles created by Rancher, if there are multiple Kubernetes API resources on one line item, the resource will have `(Custom)` appended to it. These are not custom resources but just an indication that there are multiple Kubernetes API resources as one resource.
> The Resource text field provides a method to search for pre-defined Kubernetes API resources, or enter a custom resource name for the grant. The pre-defined or `(Custom)` resource must be selected from the dropdown, after entering a resource name into this field.
You can also choose the individual verbs (`Create`, `Delete`, `Get`, etc.) available for use with each endpoint you assign.
1. Use the **Inherit from a Role** options to assign individual Rancher roles to your custom roles. Note: When a custom role inherits from a parent role, the parent role cannot be deleted until the child role is deleted.
1. Click **Create**.
{{% /tab %}}
{{% /tabs %}}
## Creating a Custom Global Role
_Available as of v2.4.0_
### Creating a Custom Global Role that Copies Rules from an Existing Role
If you have a group of individuals that need the same level of access in Rancher, it can save time to create a custom global role in which all of the rules from another role, such as the administrator role, are copied into a new role. This allows you to only configure the variations between the existing role and the new role.
The custom global role can then be assigned to a user or group so that the custom global role takes effect the first time the user or users sign into Rancher.
To create a custom global role based on an existing role,
1. Go to the **Global** view and click **Security > Roles.**
1. On the **Global** tab, go to the role that the custom global role will be based on. Click **&#8942; (…) > Clone.**
1. Enter a name for the role.
1. Optional: To make the custom role a default role for new users, go to the **New User Default** section and click **Yes: Default role for new users.**
1. In the **Grant Resources** section, select the Kubernetes resource operations that will be enabled for users with the custom role.
> The Resource text field provides a method to search for pre-defined Kubernetes API resources, or enter a custom resource name for the grant. The pre-defined or `(Custom)` resource must be selected from the dropdown, after entering a resource name into this field.
1. Click **Save.**
### Creating a Custom Global Role that Does Not Copy Rules from Another Role
Custom global roles don't have to be based on existing roles. To create a custom global role by choosing the specific Kubernetes resource operations that should be allowed for the role, follow these steps:
1. Go to the **Global** view and click **Security > Roles.**
1. On the **Global** tab, click **Add Global Role.**
1. Enter a name for the role.
1. Optional: To make the custom role a default role for new users, go to the **New User Default** section and click **Yes: Default role for new users.**
1. In the **Grant Resources** section, select the Kubernetes resource operations that will be enabled for users with the custom role.
> The Resource text field provides a method to search for pre-defined Kubernetes API resources, or enter a custom resource name for the grant. The pre-defined or `(Custom)` resource must be selected from the dropdown, after entering a resource name into this field.
1. Click **Save.**
## Deleting a Custom Global Role
_Available as of v2.4.0_
When deleting a custom global role, all global role bindings with this custom role are deleted.
If a user is only assigned one custom global role, and the role is deleted, the user would lose access to Rancher. For the user to regain access, an administrator would need to edit the user and apply new global permissions.
Custom global roles can be deleted, but built-in roles cannot be deleted.
To delete a custom global role,
1. Go to the **Global** view and click **Security > Roles.**
2. On the **Global** tab, go to the custom global role that should be deleted and click **&#8942; (…) > Delete.**
3. Click **Delete.**
## Assigning a Custom Global Role to a Group
_Available as of v2.4.0_
If you have a group of individuals that need the same level of access in Rancher, it can save time to create a custom global role. When the role is assigned to a group, the users in the group have the appropriate level of access the first time they sign into Rancher.
When a user in the group logs in, they get the built-in Standard User global role by default. They will also get the permissions assigned to their groups.
If a user is removed from the external authentication provider group, they would lose their permissions from the custom global role that was assigned to the group. They would continue to have their individual Standard User role.
> **Prerequisites:** You can only assign a global role to a group if:
>
> * You have set up an [external authentication provider]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-vs-local-authentication)
> * The external authentication provider supports [user groups]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/user-groups/)
> * You have already set up at least one user group with the authentication provider
To assign a custom global role to a group, follow these steps:
1. From the **Global** view, go to **Security > Groups.**
1. Click **Assign Global Role.**
1. In the **Select Group To Add** field, choose the existing group that will be assigned the custom global role.
1. In the **Custom** section, choose any custom global role that will be assigned to the group.
1. Optional: In the **Global Permissions** or **Built-in** sections, select any additional permissions that the group should have.
1. Click **Create.**
**Result:** The custom global role will take effect when the users in the group log into Rancher.
---
title: Global Permissions
weight: 1126
---
_Permissions_ are individual access rights that you can assign when selecting a custom permission for a user.
Global Permissions define user authorization outside the scope of any particular cluster. Out-of-the-box, there are three default global permissions: `Administrator`, `Standard User`, and `User-Base`.
- **Administrator:** These users have full control over the entire Rancher system and all clusters within it.
- <a id="user"></a>**Standard User:** These users can create new clusters and use them. Standard users can also assign other users permissions to their clusters.
- **User-Base:** User-Base users have login-access only.
You cannot update or delete the built-in Global Permissions.
This section covers the following topics:
- [Global permission assignment](#global-permission-assignment)
- [Global permissions for new local users](#global-permissions-for-new-local-users)
- [Global permissions for users with external authentication](#global-permissions-for-users-with-external-authentication)
- [Custom global permissions](#custom-global-permissions)
- [Custom global permissions reference](#custom-global-permissions-reference)
- [Configuring default global permissions for new users](#configuring-default-global-permissions)
- [Configuring global permissions for existing individual users](#configuring-global-permissions-for-existing-individual-users)
- [Configuring global permissions for groups](#configuring-global-permissions-for-groups)
- [Refreshing group memberships](#refreshing-group-memberships)
### List of `restricted-admin` Permissions
The `restricted-admin` permissions are as follows:
- Has full admin access to all downstream clusters managed by Rancher.
- Has very limited access to the local Kubernetes cluster. Can access Rancher custom resource definitions, but has no access to any Kubernetes native types.
- Can add other users and assign them to clusters outside of the local cluster.
- Can create other restricted admins.
- Cannot grant any permissions in the local cluster they don't currently have. (This is how Kubernetes normally operates)
### Changing Global Administrators to Restricted Admins
If Rancher already has a global administrator, they should change all global administrators over to the new `restricted-admin` role.
This can be done through **Security > Users** by changing any Administrator role over to Restricted Administrator.
Signed-in users can change themselves over to the `restricted-admin` role if they wish, but they should only do that as the last step; otherwise, they will no longer have the permissions to change the remaining administrators.
# Global Permission Assignment
Global permissions for local users are assigned differently than users who log in to Rancher using external authentication.
### Global Permissions for New Local Users
When you create a new local user, you assign them a global permission as you complete the **Add User** form.
To see the default permissions for new users, go to the **Global** view and click **Security > Roles.** On the **Global** tab, there is a column named **New User Default.** When adding a new local user, the user receives all default global permissions that are marked as checked in this column. You can [change the default global permissions to meet your needs.](#configuring-default-global-permissions)
### Global Permissions for Users with External Authentication
When a user logs into Rancher using an external authentication provider for the first time, they are automatically assigned the **New User Default** global permissions. By default, Rancher assigns the **Standard User** permission for new users.
To see the default permissions for new users, go to the **Global** view and click **Security > Roles.** On the **Global** tab, there is a column named **New User Default.** When adding a new local user, the user receives all default global permissions that are marked as checked in this column, and you can [change them to meet your needs.](#configuring-default-global-permissions)
Permissions can be assigned to an individual user with [these steps.](#configuring-global-permissions-for-individual-users)
As of Rancher v2.4.0, you can [assign a role to everyone in the group at the same time](#configuring-global-permissions-for-groups) if the external authentication provider supports groups.
# Custom Global Permissions
Using custom permissions is convenient for providing users with narrow or specialized access to Rancher.
When a user from an [external authentication source]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/) signs into Rancher for the first time, they're automatically assigned a set of global permissions (hereafter, permissions). By default, after a user logs in for the first time, they are created as a user and assigned the default `user` permission. The standard `user` permission allows users to log in and create clusters.
However, in some organizations, these permissions may grant too much access. Rather than assigning users the default global permissions of `Administrator` or `Standard User`, you can assign them a more restrictive set of custom global permissions.
The default roles, Administrator and Standard User, each come with multiple global permissions built into them. The Administrator role includes all global permissions, while the default user role includes three global permissions: Create Clusters, Use Catalog Templates, and User Base, which is equivalent to the minimum permission to log in to Rancher. In other words, the custom global permissions are modularized so that if you want to change the default user role permissions, you can choose which subset of global permissions are included in the new default user role.
Administrators can enforce custom global permissions in multiple ways:
- [Changing the default permissions for new users](#configuring-default-global-permissions)
- [Configuring global permissions for individual users](#configuring-global-permissions-for-individual-users)
- [Configuring global permissions for groups](#configuring-global-permissions-for-groups)
### Custom Global Permissions Reference
The following table lists each custom global permission available and whether it is included in the default global permissions, `Administrator`, `Standard User` and `User-Base`.
| Custom Global Permission | Administrator | Standard User | User-Base |
| ---------------------------------- | ------------- | ------------- |-----------|
| Create Clusters | ✓ | ✓ | |
| Create RKE Templates | ✓ | ✓ | |
| Manage Authentication | ✓ | | |
| Manage Catalogs | ✓ | | |
| Manage Cluster Drivers | ✓ | | |
| Manage Node Drivers | ✓ | | |
| Manage PodSecurityPolicy Templates | ✓ | | |
| Manage Roles | ✓ | | |
| Manage Settings | ✓ | | |
| Manage Users | ✓ | | |
| Use Catalog Templates | ✓ | ✓ | |
| User Base\* (Basic log-in access) | ✓ | ✓ | ✓ |
> \*This role has two names:
>
> - When you go to the <b>Users</b> tab and edit a user's global role, this role is called <b>Login Access</b> in the custom global permissions list.
> - When you go to the <b>Security</b> tab and edit the roles from the roles page, this role is called <b>User Base.</b>
For details on which Kubernetes resources correspond to each global permission, you can go to the **Global** view in the Rancher UI. Then click **Security > Roles** and go to the **Global** tab. If you click an individual role, you can refer to the **Grant Resources** table to see all of the operations and resources that are permitted by the role.
> **Notes:**
>
> - Each permission listed above is comprised of multiple individual permissions not listed in the Rancher UI. For a full list of these permissions and the rules they are comprised of, access through the API at `/v3/globalRoles`.
> - When viewing the resources associated with default roles created by Rancher, if there are multiple Kubernetes API resources on one line item, the resource will have `(Custom)` appended to it. These are not custom resources but just an indication that there are multiple Kubernetes API resources as one resource.
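As a point of reference for the `/v3/globalRoles` note above, a custom global permission is represented by a global role object roughly like the sketch below. The field names assume the `management.cattle.io/v3` `GlobalRole` schema and the role shown is hypothetical; verify the exact schema against your Rancher version.

```yaml
# Illustrative sketch only -- verify against /v3/globalRoles on your server.
apiVersion: management.cattle.io/v3
kind: GlobalRole
metadata:
  name: view-settings-only        # hypothetical role ID
displayName: View Settings Only   # hypothetical display name
newUserDefault: false             # do not assign this role to new users by default
rules:
  - apiGroups: ["management.cattle.io"]
    resources: ["settings"]
    verbs: ["get", "list", "watch"]
```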
### Configuring Default Global Permissions
If you want to restrict the default permissions for new users, you can remove the `user` permission as a default role and then assign multiple individual permissions as defaults instead. Conversely, you can also add administrative permissions on top of a set of other standard permissions.
> **Note:** Default roles are only assigned to users added from an external authentication provider. For local users, you must explicitly assign global permissions when adding a user to Rancher. You can customize these global permissions when adding the user.
To change the default global permissions that are assigned to external users upon their first log in, follow these steps:
1. From the **Global** view, select **Security > Roles** from the main menu. Make sure the **Global** tab is selected.
1. Find the permissions set that you want to add or remove as a default. Then edit the permission by selecting **&#8942; > Edit**.
1. If you want to add the permission as a default, select **Yes: Default role for new users** and then click **Save**.
1. If you want to remove a default permission, edit the permission and select **No** from **New User Default**.
**Result:** The default global permissions are configured based on your changes. Permissions assigned to new users display a check in the **New User Default** column.
### Configuring Global Permissions for Individual Users
To configure permission for a user,
1. Go to the **Users** tab.
1. On this page, go to the user whose access level you want to change and click **&#8942; > Edit.**
1. In the **Global Permissions** section, click **Custom.**
1. Check the boxes for each subset of permissions you want the user to have access to.
1. Click **Save.**
> **Result:** The user's global permissions have been updated.
### Configuring Global Permissions for Groups
_Available as of v2.4.0_
If you have a group of individuals that need the same level of access in Rancher, it can save time to assign permissions to the entire group at once, so that the users in the group have the appropriate level of access the first time they sign into Rancher.
After you assign a custom global role to a group, the custom global role will be assigned to a user in the group when they log in to Rancher.
For existing users, the new permissions will take effect when the users log out of Rancher and back in again, or when an administrator [refreshes the group memberships.](#refreshing-group-memberships)
For new users, the new permissions take effect when the users log in to Rancher for the first time. New users from this group will receive the permissions from the custom global role in addition to the **New User Default** global permissions. By default, the **New User Default** permissions are equivalent to the **Standard User** global role, but the default permissions can be [configured.](#configuring-default-global-permissions)
If a user is removed from the external authentication provider group, they would lose their permissions from the custom global role that was assigned to the group. They would continue to have any remaining roles that were assigned to them, which would typically include the roles marked as **New User Default.** Rancher will remove the permissions that are associated with the group when the user logs out, or when an administrator [refreshes group memberships,](#refreshing-group-memberships) whichever comes first.
> **Prerequisites:** You can only assign a global role to a group if:
>
> * You have set up an [external authentication provider]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-vs-local-authentication)
> * The external authentication provider supports [user groups]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/user-groups/)
> * You have already set up at least one user group with the authentication provider
To assign a custom global role to a group, follow these steps:
1. From the **Global** view, go to **Security > Groups.**
1. Click **Assign Global Role.**
1. In the **Select Group To Add** field, choose the existing group that will be assigned the custom global role.
1. In the **Global Permissions,** **Custom,** and/or **Built-in** sections, select the permissions that the group should have.
1. Click **Create.**
**Result:** The custom global role will take effect when the users in the group log into Rancher.
### Refreshing Group Memberships
When an administrator updates the global permissions for a group, the changes take effect for individual group members after they log out of Rancher and log in again.
To make the changes take effect immediately, an administrator or cluster owner can refresh group memberships.
An administrator might also want to refresh group memberships if a user is removed from a group in the external authentication service. In that case, the refresh makes Rancher aware that the user was removed from the group.
To refresh group memberships,
1. From the **Global** view, click **Security > Users.**
1. Click **Refresh Group Memberships.**
**Result:** Any changes to the group members' permissions will take effect.
---
title: Locked Roles
weight: 1129
---
You can set roles to a status of `locked`. Locking a role prevents it from being assigned to users in the future.
Locked roles:
- Cannot be assigned to users that don't already have it assigned.
- Are not listed in the **Member Roles** drop-down when you are adding a user to a cluster or project.
- Do not affect users assigned the role before you lock the role. These users retain the access that the role provides.
**Example:** Let's say your organization creates an internal policy that prohibits users assigned to a cluster from creating new projects. It's your job to enforce this policy.
To enforce it, before you add new users to the cluster, you should lock the following roles: `Cluster Owner`, `Cluster Member`, and `Create Projects`. You could then create a new custom role that includes the same permissions as a __Cluster Member__, minus the ability to create projects, and use that custom role when adding users to the cluster.
Roles can be locked by the following users:
- Any user assigned the `Administrator` global permission.
- Any user assigned the `Custom Users` permission, along with the `Manage Roles` role.
## Locking/Unlocking Roles
If you want to prevent a role from being assigned to users, you can set it to a status of `locked`.
You can lock roles in two contexts:
- When you're [adding a custom role]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/default-custom-roles/).
- When you're editing an existing role (see below).
1. From the **Global** view, select **Security** > **Roles**.
2. From the role that you want to lock (or unlock), select **&#8942;** > **Edit**.
3. From the **Locked** option, choose the **Yes** or **No** radio button. Then click **Save**.
---
title: RKE Templates
weight: 7010
---
_Available as of Rancher v2.3.0_
RKE templates are designed to allow DevOps and security teams to standardize and simplify the creation of Kubernetes clusters.
RKE is the [Rancher Kubernetes Engine,]({{<baseurl>}}/rke/latest/en/) which is the tool that Rancher uses to provision Kubernetes clusters.
With Kubernetes increasing in popularity, there is a trend toward managing a larger number of smaller clusters. When you want to create many clusters, it's more important to manage them consistently. Multi-cluster management brings challenges in enforcing security and add-on configurations, which need to be standardized before clusters are turned over to end users.
RKE templates help standardize these configurations. Regardless of whether clusters are created with the Rancher UI, the Rancher API, or an automated process, Rancher will guarantee that every cluster it provisions from an RKE template is uniform and consistent in the way it is produced.
Admins control which cluster options can be changed by end users. RKE templates can also be shared with specific users and groups, so that admins can create different RKE templates for different sets of users.
If a cluster was created with an RKE template, you can't change it to a different RKE template. You can only update the cluster to a new revision of the same template.
As of Rancher v2.3.3, you can [save the configuration of an existing cluster as an RKE template.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates/#converting-an-existing-cluster-to-use-an-rke-template) Then the cluster's settings can only be changed if the template is updated. The new template can also be used to launch new clusters.
The core features of RKE templates allow DevOps and security teams to:
- Standardize cluster configuration and ensure that Rancher-provisioned clusters are created following best practices
- Prevent less technical users from making uninformed choices when provisioning clusters
- Share different templates with different sets of users and groups
- Delegate ownership of templates to users who are trusted to make changes to them
- Control which users can create templates
- Require users to create clusters from a template
# Configurable Settings
RKE templates can be created in the Rancher UI or defined in YAML format. They can define all the same parameters that can be specified when you use Rancher to provision custom nodes or nodes from an infrastructure provider:
- Cloud provider options
- Pod security options
- Network providers
- Ingress controllers
- Network security configuration
- Network plugins
- Private registry URL and credentials
- Add-ons
- Kubernetes options, including configurations for Kubernetes components such as kube-api, kube-controller, kubelet, and services
The [add-on section](#add-ons) of an RKE template is especially powerful because it allows a wide range of customization options.
# Scope of RKE Templates
RKE templates are supported for Rancher-provisioned clusters. The templates can be used to provision custom clusters or clusters that are launched by an infrastructure provider.
RKE templates are for defining Kubernetes and Rancher settings. Node templates are responsible for configuring nodes. For tips on how to use RKE templates in conjunction with hardware, refer to [RKE Templates and Hardware]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/rke-templates-and-hardware).
RKE templates can be created from scratch to pre-define cluster configuration. They can be applied to launch new clusters, or templates can also be exported from existing running clusters.
As of v2.3.3, the settings of an existing cluster can be [saved as an RKE template.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates/#converting-an-existing-cluster-to-use-an-rke-template) This creates a new template and binds the cluster settings to the template, so that the cluster can only be upgraded if the [template is updated]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#updating-a-template), and the cluster is upgraded to [use a newer version of the template.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#upgrading-a-cluster-to-use-a-new-template-revision) The new template can also be used to create new clusters.
# Example Scenarios
When an organization has both basic and advanced Rancher users, administrators might want to give the advanced users more options for cluster creation, while restricting the options for basic users.
These [example scenarios]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/example-scenarios) describe how an organization could use templates to standardize cluster creation.
Some of the example scenarios include the following:
- **Enforcing templates:** Administrators might want to [enforce one or more template settings for everyone]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/example-scenarios/#enforcing-a-template-setting-for-everyone) if they want all new Rancher-provisioned clusters to have those settings.
- **Sharing different templates with different users:** Administrators might give [different templates to basic and advanced users,]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/example-scenarios/#templates-for-basic-and-advanced-users) so that basic users can have more restricted options and advanced users can use more discretion when creating clusters.
- **Updating template settings:** If an organization's security and DevOps teams decide to embed best practices into the required settings for new clusters, those best practices could change over time. If the best practices change, [a template can be updated to a new revision]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/example-scenarios/#updating-templates-and-clusters-created-with-them) and clusters created from the template can [upgrade to the new version]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#upgrading-a-cluster-to-use-a-new-template-revision) of the template.
- **Sharing ownership of a template:** When a template owner no longer wants to maintain a template, or wants to share ownership of the template, this scenario describes how [template ownership can be shared.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/example-scenarios/#allowing-other-users-to-control-and-share-a-template)
# Template Management
When you create an RKE template, it is available in the Rancher UI from the **Global** view under **Tools > RKE Templates.** When you create a template, you become the template owner, which gives you permission to revise and share the template. You can share the RKE templates with specific users or groups, and you can also make it public.
Administrators can turn on template enforcement to require users to always use RKE templates when creating a cluster. This allows administrators to guarantee that Rancher always provisions clusters with specific settings.
RKE template updates are handled through a revision system. If you want to change or update a template, you create a new revision of the template. Then a cluster that was created with the older version of the template can be upgraded to the new template revision.
In an RKE template, settings can be restricted to what the template owner chooses, or they can be open for the end user to select the value. The difference is indicated by the **Allow User Override** toggle over each setting in the Rancher UI when the template is created.
For the settings that cannot be overridden, the end user will not be able to directly edit them. For a user to get different options for these settings, an RKE template owner would need to create a new revision of the RKE template, which would allow the user to upgrade and change that option.
The documents in this section explain the details of RKE template management:
- [Getting permission to create templates]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creator-permissions/)
- [Creating and revising templates]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/)
- [Enforcing template settings](./enforcement/#requiring-new-clusters-to-use-an-rke-template)
- [Overriding template settings]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/overrides/)
- [Sharing templates with cluster creators]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing/#sharing-templates-with-specific-users-or-groups)
- [Sharing ownership of a template]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing/#sharing-ownership-of-templates)
An [example YAML configuration file for a template]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/example-yaml) is provided for reference.
# Applying Templates
You can [create a cluster from a template]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates/#creating-a-cluster-from-an-rke-template) that you created, or from a template that has been [shared with you.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing)
If the RKE template owner creates a new revision of the template, you can [upgrade your cluster to that revision.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates/#updating-a-cluster-created-with-an-rke-template)
RKE templates can be created from scratch to pre-define cluster configuration, and they can be applied to launch new clusters. Templates can also be exported from existing running clusters.
As of Rancher v2.3.3, you can [save the configuration of an existing cluster as an RKE template.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates/#converting-an-existing-cluster-to-use-an-rke-template) Then the cluster's settings can only be changed if the template is updated.
# Standardizing Hardware
RKE templates are designed to standardize Kubernetes and Rancher settings. If you want to standardize your infrastructure as well, you can use RKE templates [in conjunction with other tools]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/rke-templates-and-hardware).
# YAML Customization
If you define an RKE template as a YAML file, you can modify this [example RKE template YAML]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/example-yaml). The YAML in the RKE template uses the same customization that Rancher uses when creating an RKE cluster, but since the YAML is located within the context of a Rancher provisioned cluster, you will need to nest the RKE template customization under the `rancher_kubernetes_engine_config` directive in the YAML.
The RKE documentation also has [annotated]({{<baseurl>}}/rke/latest/en/example-yamls/) `cluster.yml` files that you can use for reference.
For guidance on available options, refer to the RKE documentation on [cluster configuration.]({{<baseurl>}}/rke/latest/en/config-options/)
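As a quick illustration of that nesting, a trimmed-down template sketch is shown below. It uses values similar to the full example linked above; the Kubernetes version shown is only a sample, not a recommendation:

```yaml
# Cluster-level Rancher options stay at the top level of the template YAML.
docker_root_dir: /var/lib/docker
enable_cluster_monitoring: false

# RKE options (the same options used in an RKE cluster.yml) are nested
# under the rancher_kubernetes_engine_config directive.
rancher_kubernetes_engine_config:
  kubernetes_version: v1.15.3-rancher3-1
  network:
    plugin: canal
  services:
    etcd:
      snapshot: false
```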
### Add-ons
The add-on section of the RKE template configuration file works the same way as the [add-on section of a cluster configuration file]({{<baseurl>}}/rke/latest/en/config-options/add-ons/).
The user-defined add-ons directive allows you to either call out and pull down Kubernetes manifests or put them inline directly. If you include these manifests as part of your RKE template, Rancher will provision those in the cluster.
Some things you could do with add-ons include:
- Install applications on the Kubernetes cluster after it starts
- Install plugins on nodes that are deployed with a Kubernetes DaemonSet
- Automatically set up namespaces, service accounts, or role bindings
The RKE template configuration must be nested within the `rancher_kubernetes_engine_config` directive. To set add-ons, when creating the template, you will click **Edit as YAML.** Then use the `addons` directive to add a manifest, or the `addons_include` directive to set which YAML files are used for the add-ons. For more information on custom add-ons, refer to the [user-defined add-ons documentation.]({{<baseurl>}}/rke/latest/en/config-options/add-ons/user-defined-add-ons/)
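For example, a minimal sketch of what the add-ons section of a template could look like is shown below. The namespace name and manifest URL are placeholders chosen for illustration only:

```yaml
rancher_kubernetes_engine_config:
  # Inline manifests listed under `addons` are deployed after the cluster starts.
  addons: |-
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: example-namespace   # placeholder
  # `addons_include` pulls manifests from URLs or file paths.
  addons_include:
    - https://example.com/manifests/example-addon.yaml   # placeholder
```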
@@ -0,0 +1,63 @@
---
title: Applying Templates
weight: 50
---
You can create a cluster from an RKE template that you created, or from a template that has been [shared with you.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing)
RKE templates can be applied to new clusters.
As of Rancher v2.3.3, you can [save the configuration of an existing cluster as an RKE template.](#converting-an-existing-cluster-to-use-an-rke-template) Then the cluster's settings can only be changed if the template is updated.
You can't change a cluster to use a different RKE template. You can only update the cluster to a new revision of the same template.
This section covers the following topics:
- [Creating a cluster from an RKE template](#creating-a-cluster-from-an-rke-template)
- [Updating a cluster created with an RKE template](#updating-a-cluster-created-with-an-rke-template)
- [Converting an existing cluster to use an RKE template](#converting-an-existing-cluster-to-use-an-rke-template)
### Creating a Cluster from an RKE Template
To add a cluster [hosted by an infrastructure provider]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters) using an RKE template, use these steps:
1. From the **Global** view, go to the **Clusters** tab.
1. Click **Add Cluster** and choose the infrastructure provider.
1. Provide the cluster name and node template details as usual.
1. To use an RKE template, under the **Cluster Options**, check the box for **Use an existing RKE template and revision.**
1. Choose an existing template and revision from the dropdown menu.
1. Optional: You can edit any settings that the RKE template owner marked as **Allow User Override** when the template was created. If there are settings that you want to change, but don't have the option to, you will need to contact the template owner to get a new revision of the template. Then you will need to edit the cluster to upgrade it to the new revision.
1. Click **Save** to launch the cluster.
### Updating a Cluster Created with an RKE Template
When the template owner creates a template, each setting has a switch in the Rancher UI that indicates if users can override the setting.
- If the setting allows a user override, you can update these settings in the cluster by [editing the cluster.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/editing-clusters/)
- If the switch is turned off, you cannot change these settings unless the cluster owner creates a template revision that lets you override them. If there are settings that you want to change, but don't have the option to, you will need to contact the template owner to get a new revision of the template.
If a cluster was created from an RKE template, you can edit the cluster to update the cluster to a new revision of the template.
As of Rancher v2.3.3, an existing cluster's settings can be [saved as an RKE template.](#converting-an-existing-cluster-to-use-an-rke-template) In that situation, you can also edit the cluster to update the cluster to a new revision of the template.
> **Note:** You can't change the cluster to use a different RKE template. You can only update the cluster to a new revision of the same template.
### Converting an Existing Cluster to Use an RKE Template
_Available as of v2.3.3_
This section describes how to create an RKE template from an existing cluster.
RKE templates cannot be applied to existing clusters, except if you save an existing cluster's settings as an RKE template. This exports the cluster's settings as a new RKE template, and also binds the cluster to that template. The result is that the cluster can only be changed if the [template is updated,]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#updating-a-template) and the cluster is upgraded to [use a newer version of the template.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#upgrading-a-cluster-to-use-a-new-template-revision)
To convert an existing cluster to use an RKE template,
1. From the **Global** view in Rancher, click the **Clusters** tab.
1. Go to the cluster that will be converted to use an RKE template. Click **&#8942;** > **Save as RKE Template.**
1. Enter a name for the template in the form that appears, and click **Create.**
**Results:**
- A new RKE template is created.
- The cluster is converted to use the new template.
- New clusters can be [created from the new template.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates/#creating-a-cluster-from-an-rke-template)
@@ -0,0 +1,162 @@
---
title: Creating and Revising Templates
weight: 32
---
This section describes how to manage RKE templates and revisions. You can create, share, update, and delete templates from the **Global** view under **Tools > RKE Templates.**
Template updates are handled through a revision system. When template owners want to change or update a template, they create a new revision of the template. Individual revisions cannot be edited. However, if you want to prevent a revision from being used to create a new cluster, you can disable it.
Template revisions can be used in two ways: to create a new cluster, or to upgrade a cluster that was created with an earlier version of the template. The template creator can choose a default revision, but when end users create a cluster, they can choose any template and any template revision that is available to them. After the cluster is created from a specific revision, it cannot change to another template, but the cluster can be upgraded to a newer available revision of the same template.
The template owner has full control over template revisions, and can create new revisions to update the template, delete or disable revisions that should not be used to create clusters, and choose which template revision is the default.
This section covers the following topics:
- [Prerequisites](#prerequisites)
- [Creating a template](#creating-a-template)
- [Updating a template](#updating-a-template)
- [Deleting a template](#deleting-a-template)
- [Creating a revision based on the default revision](#creating-a-revision-based-on-the-default-revision)
- [Creating a revision based on a cloned revision](#creating-a-revision-based-on-a-cloned-revision)
- [Disabling a template revision](#disabling-a-template-revision)
- [Re-enabling a disabled template revision](#re-enabling-a-disabled-template-revision)
- [Setting a template revision as default](#setting-a-template-revision-as-default)
- [Deleting a template revision](#deleting-a-template-revision)
- [Upgrading a cluster to use a new template revision](#upgrading-a-cluster-to-use-a-new-template-revision)
- [Exporting a running cluster to a new RKE template and revision](#exporting-a-running-cluster-to-a-new-rke-template-and-revision)
### Prerequisites
You can create RKE templates if you have the **Create RKE Templates** permission, which can be [given by an administrator.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creator-permissions)
You can revise, share, and delete a template if you are an owner of the template. For details on how to become an owner of a template, refer to [the documentation on sharing template ownership.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing/#sharing-ownership-of-templates)
### Creating a Template
1. From the **Global** view, click **Tools > RKE Templates.**
1. Click **Add Template.**
1. Provide a name for the template. An auto-generated name is already provided for the template's first revision, which is created along with this template.
1. Optional: Share the template with other users or groups by [adding them as members.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing/#sharing-templates-with-specific-users-or-groups) You can also make the template public to share with everyone in the Rancher setup.
1. Then follow the form on screen to save the cluster configuration parameters as part of the template's revision. The revision can be marked as default for this template.
**Result:** An RKE template with one revision is configured. You can use this RKE template revision later when you [provision a Rancher-launched cluster]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters). After a cluster is managed by an RKE template, it cannot be disconnected and the option to uncheck **Use an existing RKE Template and Revision** will be unavailable.
### Updating a Template
When you update an RKE template, you are creating a revision of the existing template. Clusters that were created with an older version of the template can be updated to match the new revision.
You can't edit individual revisions of a template. To prevent a revision from being used, you can [disable it.](#disabling-a-template-revision)
When new template revisions are created, clusters using an older revision of the template are unaffected.
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the template that you want to edit and click the **&#8942; > Edit.**
1. Edit the required information and click **Save.**
1. Optional: You can change the default revision of this template and also change who it is shared with.
**Result:** The template is updated. To apply it to a cluster using an older version of the template, refer to the section on [upgrading a cluster to use a new revision of a template.](#upgrading-a-cluster-to-use-a-new-template-revision)
### Deleting a Template
When you no longer use an RKE template for any of your clusters, you can delete it.
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the RKE template that you want to delete and click the **&#8942; > Delete.**
1. Confirm the deletion when prompted.
**Result:** The template is deleted.
### Creating a Revision Based on the Default Revision
You can clone the default template revision and quickly update its settings rather than creating a new revision from scratch. Cloning templates saves you the hassle of re-entering the access keys and other parameters needed for cluster creation.
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the RKE template that you want to clone and click the **&#8942; > New Revision From Default.**
1. Complete the rest of the form to create a new revision.
**Result:** The RKE template revision is cloned and configured.
### Creating a Revision Based on a Cloned Revision
When creating new RKE template revisions from your user settings, you can clone an existing revision and quickly update its settings rather than creating a new one from scratch. Cloning template revisions saves you the hassle of re-entering the cluster parameters.
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the template revision you want to clone. Then select **&#8942; > Clone Revision.**
1. Complete the rest of the form.
**Result:** The RKE template revision is cloned and configured. You can use the RKE template revision later when you provision a cluster. Any existing cluster using this RKE template can be upgraded to this new revision.
### Disabling a Template Revision
When you no longer want an RKE template revision to be used for creating new clusters, you can disable it. A disabled revision can be re-enabled.
You can disable the revision if it is not being used by any cluster.
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the template revision you want to disable. Then select **&#8942; > Disable.**
**Result:** The RKE template revision cannot be used to create a new cluster.
### Re-enabling a Disabled Template Revision
If you decide that a disabled RKE template revision should be used to create new clusters, you can re-enable it.
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the template revision you want to re-enable. Then select **&#8942; > Enable.**
**Result:** The RKE template revision can be used to create a new cluster.
### Setting a Template Revision as Default
When end users create a cluster using an RKE template, they can choose which revision to create the cluster with. You can configure which revision is used by default.
To set an RKE template revision as default,
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the RKE template revision that should be default and click the **&#8942; > Set as Default.**
**Result:** The RKE template revision will be used as the default option when clusters are created with the template.
### Deleting a Template Revision
You can delete all revisions of a template except for the default revision.
To permanently delete a revision,
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the RKE template revision that should be deleted and click the **&#8942; > Delete.**
**Result:** The RKE template revision is deleted.
### Upgrading a Cluster to Use a New Template Revision
> This section assumes that you already have a cluster that [has an RKE template applied.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates)
> This section also assumes that you have [updated the template that the cluster is using](#updating-a-template) so that a new template revision is available.
To upgrade a cluster to use a new template revision,
1. From the **Global** view in Rancher, click the **Clusters** tab.
1. Go to the cluster that you want to upgrade and click **&#8942; > Edit.**
1. In the **Cluster Options** section, click the dropdown menu for the template revision, then select the new template revision.
1. Click **Save.**
**Result:** The cluster is upgraded to use the settings defined in the new template revision.
### Exporting a Running Cluster to a New RKE Template and Revision
You can save an existing cluster's settings as an RKE template.
This exports the cluster's settings as a new RKE template, and also binds the cluster to that template. The result is that the cluster can only be changed if the [template is updated,]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#updating-a-template) and the cluster is upgraded to [use a newer version of the template.](#upgrading-a-cluster-to-use-a-new-template-revision)
To convert an existing cluster to use an RKE template,
1. From the **Global** view in Rancher, click the **Clusters** tab.
1. Go to the cluster that will be converted to use an RKE template. Click **&#8942;** > **Save as RKE Template.**
1. Enter a name for the template in the form that appears, and click **Create.**
**Results:**
- A new RKE template is created.
- The cluster is converted to use the new template.
- New clusters can be [created from the new template and revision.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates/#creating-a-cluster-from-an-rke-template)
@@ -0,0 +1,50 @@
---
title: Template Creator Permissions
weight: 10
---
Administrators have the permission to create RKE templates, and only administrators can give that permission to other users.
For more information on administrator permissions, refer to the [documentation on global permissions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/).
# Giving Users Permission to Create Templates
Templates can only be created by users who have the global permission **Create RKE Templates.**
Administrators have the global permission to create templates, and only administrators can give that permission to other users.
For information on allowing users to modify existing templates, refer to [Sharing Templates.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing)
Administrators can give users permission to create RKE templates in two ways:
- By editing the permissions of an [individual user](#allowing-a-user-to-create-templates)
- By changing the [default permissions of new users](#allowing-new-users-to-create-templates-by-default)
### Allowing a User to Create Templates
An administrator can individually grant the role **Create RKE Templates** to any existing user by following these steps:
1. From the global view, click the **Users** tab. Choose the user you want to edit and click the **&#8942; > Edit.**
1. In the **Global Permissions** section, choose **Custom** and select the **Create RKE Templates** role along with any other roles the user should have. Click **Save.**
**Result:** The user has permission to create RKE templates.
### Allowing New Users to Create Templates by Default
Alternatively, the administrator can give all new users the default permission to create RKE templates by following these steps. This will not affect the permissions of existing users.
1. From the **Global** view, click **Security > Roles.**
1. Under the **Global** roles tab, go to the role **Create RKE Templates** and click the **&#8942; > Edit**.
1. Select the option **Yes: Default role for new users** and click **Save.**
**Result:** Any new user created in this Rancher installation will be able to create RKE templates. Existing users will not get this permission.
### Revoking Permission to Create Templates
Administrators can remove a user's permission to create templates with the following steps:
1. From the global view, click the **Users** tab. Choose the user you want to edit and click the **&#8942; > Edit.**
1. In the **Global Permissions** section, un-check the box for **Create RKE Templates**. In this section, you can change the user back to a standard user, or give the user a different set of custom permissions.
1. Click **Save.**
**Result:** The user cannot create RKE templates.
@@ -0,0 +1,38 @@
---
title: Template Enforcement
weight: 32
---
This section describes how template administrators can enforce templates in Rancher, restricting the ability of users to create clusters without a template.
By default, any standard user in Rancher can create clusters. But when RKE template enforcement is turned on,
- Only an administrator has the ability to create clusters without a template.
- All standard users must use an RKE template to create a new cluster; they cannot create a cluster without using a template.
Users can only create new templates if the administrator [gives them permission.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creator-permissions/#allowing-a-user-to-create-templates)
After a cluster is created with an RKE template, the cluster creator cannot edit settings that are defined in the template. The only way to change those settings after the cluster is created is to [upgrade the cluster to a new revision]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/applying-templates/#updating-a-cluster-created-with-an-rke-template) of the same template. If cluster creators want to change template-defined settings, they would need to contact the template owner to get a new revision of the template. For details on how template revisions work, refer to the [documentation on revising templates.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#updating-a-template)
# Requiring New Clusters to Use an RKE Template
You might want to require new clusters to use a template to ensure that any cluster launched by a [standard user]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) will use the Kubernetes and/or Rancher settings that are vetted by administrators.
To require new clusters to use an RKE template, administrators can turn on RKE template enforcement with the following steps:
1. From the **Global** view, click the **Settings** tab.
1. Go to the `cluster-template-enforcement` setting. Click the vertical **&#8942;** and click **Edit.**
1. Set the value to **True** and click **Save.**
**Result:** All clusters provisioned by Rancher must use a template, unless the creator is an administrator.
# Disabling RKE Template Enforcement
To allow new clusters to be created without an RKE template, administrators can turn off RKE template enforcement with the following steps:
1. From the **Global** view, click the **Settings** tab.
1. Go to the `cluster-template-enforcement` setting. Click the vertical **&#8942;** and click **Edit.**
1. Set the value to **False** and click **Save.**
**Result:** When clusters are provisioned by Rancher, they don't need to use a template.
@@ -0,0 +1,71 @@
---
title: Example Scenarios
weight: 5
---
These example scenarios describe how an organization could use templates to standardize cluster creation.
- **Enforcing templates:** Administrators might want to [enforce one or more template settings for everyone](#enforcing-a-template-setting-for-everyone) if they want all new Rancher-provisioned clusters to have those settings.
- **Sharing different templates with different users:** Administrators might give [different templates to basic and advanced users,](#templates-for-basic-and-advanced-users) so that basic users have more restricted options and advanced users have more discretion when creating clusters.
- **Updating template settings:** If an organization's security and DevOps teams decide to embed best practices into the required settings for new clusters, those best practices could change over time. If the best practices change, [a template can be updated to a new revision](#updating-templates-and-clusters-created-with-them) and clusters created from the template can upgrade to the new version of the template.
- **Sharing ownership of a template:** When a template owner no longer wants to maintain a template, or wants to delegate ownership of the template, this scenario describes how [template ownership can be shared.](#allowing-other-users-to-control-and-share-a-template)
# Enforcing a Template Setting for Everyone
Let's say there is an organization in which the administrators decide that all new clusters should be created with Kubernetes version 1.14.
1. First, an administrator creates a template which specifies the Kubernetes version as 1.14 and marks all other settings as **Allow User Override**.
1. The administrator makes the template public.
1. The administrator turns on template enforcement.
**Results:**
- All Rancher users in the organization have access to the template.
- All new clusters created by [standard users]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) with this template will use Kubernetes 1.14, and those users will be unable to use a different Kubernetes version. By default, standard users don't have permission to create templates, so this template will be the only template they can use unless more templates are shared with them.
- All standard users must use a cluster template to create a new cluster. They cannot create a cluster without using a template.
In this way, the administrators enforce the Kubernetes version across the organization, while still allowing end users to configure everything else.
# Templates for Basic and Advanced Users
Let's say an organization has both basic and advanced users. Administrators want the basic users to be required to use a template, while the advanced users and administrators create their clusters however they want.
1. First, an administrator turns on [RKE template enforcement.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/enforcement/#requiring-new-clusters-to-use-an-rke-template) This means that every [standard user]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) in Rancher will need to use an RKE template when they create a cluster.
1. The administrator then creates two templates:
- One template for basic users, with almost every option specified except for access keys
- One template for advanced users, with **Allow User Override** turned on for most or all options
1. The administrator shares the advanced template with only the advanced users.
1. The administrator makes the template for basic users public, so the more restrictive template is an option for everyone who creates a Rancher-provisioned cluster.
**Result:** All Rancher users, except for administrators, are required to use a template when creating a cluster. Everyone has access to the restrictive template, but only advanced users have permission to use the more permissive template. The basic users are more restricted, while advanced users have more freedom when configuring their Kubernetes clusters.
# Updating Templates and Clusters Created with Them
Let's say an organization has a template that requires clusters to use Kubernetes v1.14. However, as time goes on, the administrators change their minds. They decide they want users to be able to upgrade their clusters to use newer versions of Kubernetes.
In this organization, many clusters were created with a template that requires Kubernetes v1.14. Because the template does not allow that setting to be overridden, the users who created the cluster cannot directly edit that setting.
The template owner has several options for allowing the cluster creators to upgrade Kubernetes on their clusters:
- **Specify Kubernetes v1.15 on the template:** The template owner can create a new template revision that specifies Kubernetes v1.15. Then the owner of each cluster that uses that template can upgrade their cluster to a new revision of the template. This template upgrade allows the cluster creator to upgrade Kubernetes to v1.15 on their cluster.
- **Allow any Kubernetes version on the template:** When creating a template revision, the template owner can also mark the Kubernetes version as **Allow User Override** using the switch near that setting in the Rancher UI. This will allow clusters that upgrade to this template revision to use any version of Kubernetes.
- **Allow the latest minor Kubernetes version on the template:** The template owner can also create a template revision in which the Kubernetes version is defined as **Latest v1.14 (Allows patch version upgrades).** This means clusters that use that revision will be able to get patch version upgrades, but major version upgrades will not be allowed.
# Allowing Other Users to Control and Share a Template
Let's say Alice is a Rancher administrator. She owns an RKE template that reflects her organization's agreed-upon best practices for creating a cluster.
Bob is an advanced user who can make informed decisions about cluster configuration. Alice trusts Bob to create new revisions of her template as the best practices get updated over time. Therefore, she decides to make Bob an owner of the template.
To share ownership of the template with Bob, Alice [adds Bob as an owner of her template.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing/#sharing-ownership-of-templates)
The result is that as a template owner, Bob is in charge of version control for that template. Bob can now do all of the following:
- [Revise the template]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#updating-a-template) when the best practices change
- [Disable outdated revisions]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#disabling-a-template-revision) of the template so that no new clusters can be created with it
- [Delete the whole template]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#deleting-a-template) if the organization wants to go in a different direction
- [Set a certain revision as default]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising/#setting-a-template-revision-as-default) when users create a cluster with it. End users of the template will still be able to choose which revision they want to create the cluster with.
- [Share the template]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/template-access-and-sharing) with specific users, make the template available to all Rancher users, or share ownership of the template with another user.
@@ -0,0 +1,112 @@
---
title: Example YAML
weight: 60
---
Below is an example RKE template configuration file for reference.
The YAML in the RKE template uses the same customization that is used when you create an RKE cluster. However, since the YAML is within the context of a Rancher provisioned RKE cluster, the customization from the RKE docs needs to be nested under the `rancher_kubernetes_engine_config` directive.
```yaml
#
# Cluster Config
#
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
# This setting is not enforced. Clusters
# created with this sample template
# would have alerting turned off by default,
# but end users could still turn alerting
# on or off.
enable_cluster_monitoring: true
# This setting is not enforced. Clusters
# created with this sample template
# would have monitoring turned on
# by default, but end users could still
# turn monitoring on or off.
enable_network_policy: false
local_cluster_auth_endpoint:
  enabled: true
#
# Rancher Config
#
rancher_kubernetes_engine_config: # Your RKE template config goes here.
  addon_job_timeout: 30
  authentication:
    strategy: x509
  ignore_docker_version: true
#
#   # Currently only nginx ingress provider is supported.
#   # To disable ingress controller, set `provider: none`
#   # To enable ingress on specific nodes, use the node_selector, eg:
#   provider: nginx
#   node_selector:
#     app: ingress
#
  ingress:
    provider: nginx
  kubernetes_version: v1.15.3-rancher3-1
  monitoring:
    provider: metrics-server
#
#   If you are using calico on AWS
#
#   network:
#     plugin: calico
#     calico_network_provider:
#       cloud_provider: aws
#
#   # To specify flannel interface
#
#   network:
#     plugin: flannel
#     flannel_network_provider:
#       iface: eth1
#
#   # To specify flannel interface for canal plugin
#
#   network:
#     plugin: canal
#     canal_network_provider:
#       iface: eth1
#
  network:
    options:
      flannel_backend_type: vxlan
    plugin: canal
#
#   services:
#     kube-api:
#       service_cluster_ip_range: 10.43.0.0/16
#     kube-controller:
#       cluster_cidr: 10.42.0.0/16
#       service_cluster_ip_range: 10.43.0.0/16
#     kubelet:
#       cluster_domain: cluster.local
#       cluster_dns_server: 10.43.0.10
#
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        safe_timestamp: false
      creation: 12h
      extra_args:
        election-timeout: 5000
        heartbeat-interval: 500
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube_api:
      always_pull_images: false
      pod_security_policy: false
      service_node_port_range: 30000-32767
  ssh_agent_auth: false
windows_prefered_cluster: false
```
@@ -0,0 +1,15 @@
---
title: Overriding Template Settings
weight: 33
---
When a user creates an RKE template, each setting in the template has a switch in the Rancher UI that indicates if users can override the setting. This switch marks those settings as **Allow User Override.**
After a cluster is created with a template, end users can't update any of the settings defined in the template unless the template owner marked them as **Allow User Override.** However, if the template is [updated to a new revision]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising) that changes the settings or allows end users to change them, the cluster can be upgraded to a new revision of the template and the changes in the new revision will be applied to the cluster.
When any parameter is set as **Allow User Override** on the RKE template, it means that end users have to fill out those fields during cluster creation and they can edit those settings afterward at any time.
The **Allow User Override** model of the RKE template is useful for situations such as:
- Administrators know that some settings will need the flexibility to be frequently updated over time
- End users will need to enter their own access keys or secret keys, for example, cloud credentials or credentials for backup snapshots
@@ -0,0 +1,70 @@
---
title: RKE Templates and Infrastructure
weight: 90
---
In Rancher, RKE templates are used to provision Kubernetes and define Rancher settings, while node templates are used to provision nodes.
Therefore, even if RKE template enforcement is turned on, the end user still has flexibility in choosing the underlying hardware when creating a Rancher cluster. The end users of an RKE template can still choose an infrastructure provider and the nodes they want to use.
If you want to standardize the hardware in your clusters, use RKE templates in conjunction with node templates or with a server provisioning tool such as Terraform.
### Node Templates
[Node templates]({{<baseurl>}}/rancher/v2.0-v2.4/en/user-settings/node-templates) are responsible for node configuration and node provisioning in Rancher. From your user profile, you can set up node templates to define which templates are used in each of your node pools. With node pools enabled, you can make sure you have the required number of nodes in each node pool, and ensure that all nodes in the pool are the same.
### Terraform
Terraform is a server provisioning tool. It uses an infrastructure-as-code approach that lets you create almost every aspect of your infrastructure with Terraform configuration files. It can automate the process of server provisioning in a way that is self-documenting and easy to track in version control.
This section focuses on how to use Terraform with the [Rancher 2 Terraform provider](https://www.terraform.io/docs/providers/rancher2/), which is a recommended option to standardize the hardware for your Kubernetes clusters. If you use the Rancher Terraform provider to provision hardware, and then use an RKE template to provision a Kubernetes cluster on that hardware, you can quickly create a comprehensive, production-ready cluster.
Terraform allows you to:
- Define almost any kind of infrastructure-as-code, including servers, databases, load balancers, monitoring, firewall settings, and SSL certificates
- Leverage catalog apps and multi-cluster apps
- Codify infrastructure across many platforms, including Rancher and major cloud providers
- Commit infrastructure-as-code to version control
- Easily repeat configuration and setup of infrastructure
- Incorporate infrastructure changes into standard development practices
- Prevent configuration drift, in which some servers become configured differently than others
# How Does Terraform Work?
Terraform is written in files with the extension `.tf`. It is written in HashiCorp Configuration Language, which is a declarative language that lets you define the infrastructure you want in your cluster, the cloud provider you are using, and your credentials for the provider. Then Terraform makes API calls to the provider in order to efficiently create that infrastructure.
To create a Rancher-provisioned cluster with Terraform, go to your Terraform configuration file and define the provider as Rancher 2. You can set up your Rancher 2 provider with a Rancher API key. Note: The API key has the same permissions and access level as the user it is associated with.
Then Terraform calls the Rancher API to provision your infrastructure, and Rancher calls the infrastructure provider. As an example, if you wanted to use Rancher to provision infrastructure on AWS, you would provide both your Rancher API key and your AWS credentials in the Terraform configuration file or in environment variables so that they could be used to provision the infrastructure.
When you need to make changes to your infrastructure, instead of manually updating the servers, you can make changes in the Terraform configuration files. Those files can then be committed to version control, validated, and reviewed as necessary. When you run `terraform apply`, the changes are deployed.
# Tips for Working with Terraform
- There are examples of how to provide most aspects of a cluster in the [documentation for the Rancher 2 provider.](https://www.terraform.io/docs/providers/rancher2/)
- In the Terraform settings, you can install Docker Machine by using the Docker Machine node driver.
- You can also modify auth in the Terraform provider.
- You can reverse engineer how to define a setting in Terraform by changing the setting in Rancher, then going back and checking your Terraform state file to see how it maps to the current state of your infrastructure.
- If you want to manage Kubernetes cluster settings, Rancher settings, and hardware settings all in one place, use [Terraform modules](https://github.com/rancher/terraform-modules). You can pass a cluster configuration YAML file or an RKE template configuration file to a Terraform module so that the Terraform module will create it. In that case, you could use your infrastructure-as-code to manage the version control and revision history of both your Kubernetes cluster and its underlying hardware.
# Tip for Creating CIS Benchmark Compliant Clusters
This section describes one way that you can make security and compliance-related config files standard in your clusters.
When you create a [CIS benchmark compliant cluster,]({{<baseurl>}}/rancher/v2.0-v2.4/en/security/) you have an encryption config file and an audit log config file.
Your infrastructure provisioning system can write those files to disk. Then in your RKE template, you would specify where those files will be, then add your encryption config file and audit log config file as extra mounts to the `kube-api-server`.
Then you would make sure that the `kube-api-server` flag in your RKE template uses your CIS-compliant config files.
In this way, you can create flags that comply with the CIS benchmark.
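A rough sketch of what that could look like in a template is shown below, using the RKE `extra_binds` and `extra_args` service options. The host paths and file names are placeholders chosen for illustration; the actual files and flag values come from your own encryption and audit log configuration:

```yaml
rancher_kubernetes_engine_config:
  services:
    kube_api:
      # Mount the config files that your provisioning system wrote to disk
      # into the kube-apiserver container (placeholder paths).
      extra_binds:
        - "/opt/kubernetes/encryption.yaml:/opt/kubernetes/encryption.yaml"
        - "/opt/kubernetes/audit-policy.yaml:/opt/kubernetes/audit-policy.yaml"
      # Point the kube-apiserver flags at those files.
      extra_args:
        encryption-provider-config: /opt/kubernetes/encryption.yaml
        audit-policy-file: /opt/kubernetes/audit-policy.yaml
```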
# Resources
- [Terraform documentation](https://www.terraform.io/docs/)
- [Rancher2 Terraform provider documentation](https://www.terraform.io/docs/providers/rancher2/)
- [The RanchCast - Episode 1: Rancher 2 Terraform Provider](https://youtu.be/YNCq-prI8-8): In this demo, Director of Community Jason van Brackel walks through using the Rancher 2 Terraform Provider to provision nodes and create a custom cluster.
@@ -0,0 +1,61 @@
---
title: Access and Sharing
weight: 31
---
If you are an RKE template owner, you can share it with users or groups of users, who can then use the template to create clusters.
Since RKE templates are specifically shared with users and groups, owners can share different RKE templates with different sets of users.
When you share a template, each user can have one of two access levels:
- **Owner:** This user can update, delete, and share the templates that they own. The owner can also share the template with other users.
- **User:** These users can create clusters using the template. They can also upgrade those clusters to new revisions of the same template. When you share a template as **Make Public (read-only),** all users in your Rancher setup have the User access level for the template.
If you create a template, you automatically become an owner of that template.
If you want to delegate responsibility for updating the template, you can share ownership of the template. For details on how owners can modify templates, refer to the [documentation about revising templates.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rke-templates/creating-and-revising)
There are several ways to share templates:
- Add users to a new RKE template during template creation
- Add users to an existing RKE template
- Make the RKE template public, sharing it with all users in the Rancher setup
- Share template ownership with users who are trusted to modify the template
### Sharing Templates with Specific Users or Groups
To allow users or groups to create clusters using your template, you can give them the basic **User** access level for the template.
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the template that you want to share and click the **&#8942; > Edit.**
1. In the **Share Template** section, click on **Add Member**.
1. Search in the **Name** field for the user or group you want to share the template with.
1. Choose the **User** access type.
1. Click **Save.**
**Result:** The user or group can create clusters using the template.
### Sharing Templates with All Users
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the template that you want to share and click the **&#8942; > Edit.**
1. Under **Share Template,** click **Make Public (read-only).** Then click **Save.**
**Result:** All users in the Rancher setup can create clusters using the template.
### Sharing Ownership of Templates
If you are the creator of a template, you might want to delegate responsibility for maintaining and updating a template to another user or group.
In that case, you can give users the Owner access type, which allows another user to update your template, delete it, or share access to it with other users.
To give Owner access to a user or group,
1. From the **Global** view, click **Tools > RKE Templates.**
1. Go to the RKE template that you want to share and click the **&#8942; > Edit.**
1. Under **Share Template**, click on **Add Member** and search in the **Name** field for the user or group you want to share the template with.
1. In the **Access Type** field, click **Owner.**
1. Click **Save.**
**Result:** The user or group has the Owner access type, and can modify, share, or delete the template.
@@ -0,0 +1,52 @@
---
title: API
weight: 24
---
## How to use the API
The API has its own user interface accessible from a web browser. This is an easy way to see resources, perform actions, and see the equivalent cURL or HTTP request & response. To access it, click on your user avatar in the upper right corner. Under **API & Keys**, you can find the URL endpoint as well as create [API keys]({{<baseurl>}}/rancher/v2.0-v2.4/en/user-settings/api-keys/).
## Authentication
API requests must include authentication information. Authentication is done with HTTP basic authentication using [API Keys]({{<baseurl>}}/rancher/v2.0-v2.4/en/user-settings/api-keys/). API keys can create new clusters and have access to multiple clusters via `/v3/clusters/`. [Cluster and project roles]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/) apply to these keys and restrict what clusters and projects the account can see and what actions they can take.
By default, some cluster-level API tokens are generated with infinite time-to-live (`ttl=0`). In other words, API tokens with `ttl=0` never expire unless you invalidate them. For details on how to invalidate them, refer to the [API tokens page]({{<baseurl>}}/rancher/v2.0-v2.4/en/api/api-tokens).
## Making requests
The API is generally RESTful but has several features to make the definition of everything discoverable by a client so that generic clients can be written instead of having to write specific code for every type of resource. For detailed info about the generic API spec, [see here](https://github.com/rancher/api-spec/blob/master/specification.md).
- Every type has a Schema which describes:
- The URL to get to the collection of this type of resources
- Every field the resource can have, along with their type, basic validation rules, whether they are required or optional, etc.
- Every action that is possible on this type of resource, with their inputs and outputs (also as schemas).
- Every field that filtering is allowed on
- What HTTP verb methods are available for the collection itself, or for individual resources in the collection.
- So the theory is that you can load just the list of schemas and know everything about the API. This is, in fact, how the UI for the API works; it contains no code specific to Rancher itself. The URL to get Schemas is sent in every HTTP response as an `X-Api-Schemas` header. From there you can follow the `collection` link on each schema to know where to list resources, and other `links` inside of the returned resources to get any other information.
- In practice, you will probably just want to construct URL strings. We highly suggest limiting this to the top-level to list a collection (`/v3/<type>`) or get a specific resource (`/v3/<type>/<id>`). Anything deeper than that is subject to change in future releases.
- Resources have relationships between each other called links. Each resource includes a map of `links` with the name of the link and the URL to retrieve that information. Again you should `GET` the resource and then follow the URL in the `links` map, not construct these strings yourself.
- Most resources have actions, which do something or change the state of the resource. To use these, send an HTTP `POST` to the URL in the `actions` map for the action you want. Some actions require input or produce output; see the individual documentation for each type or the schemas for specific information.
- To edit a resource, send an HTTP `PUT` to the `links.update` link on the resource with the fields that you want to change. If the link is missing, then you don't have permission to update the resource. Unknown fields and ones that are not editable are ignored.
- To delete a resource, send an HTTP `DELETE` to the `links.remove` link on the resource. If the link is missing, then you don't have permission to remove the resource.
- To create a new resource, send an HTTP `POST` to the collection URL in the schema (which is `/v3/<type>`).
## Filtering
Most collections can be filtered on the server-side by common fields using HTTP query parameters. The `filters` map shows you what fields can be filtered on and what the filter values were for the request you made. The API UI has controls to set up filtering and show you the appropriate request. For simple "equals" matches it's just `field=value`. Modifiers can be added to the field name, e.g. `field_gt=42` for "field is greater than 42". See the [API spec](https://github.com/rancher/api-spec/blob/master/specification.md#filtering) for full details.
## Sorting
Most collections can be sorted on the server-side by common fields using HTTP query parameters. The `sortLinks` map shows you what sorts are available, along with the URL to get the collection sorted by that. It also includes info about what the current response was sorted by, if specified.
## Pagination
API responses are paginated with a limit of 100 resources per page by default. This can be changed with the `limit` query parameter, up to a maximum of 1000, e.g. `/v3/pods?limit=1000`. The `pagination` map in collection responses tells you whether or not you have the full result set and has a link to the next page if you do not.
@@ -0,0 +1,51 @@
---
title: API Tokens
weight: 1
aliases:
- /rancher/v2.0-v2.4/en/cluster-admin/api/api-tokens/
---
By default, some cluster-level API tokens are generated with infinite time-to-live (`ttl=0`). In other words, API tokens with `ttl=0` never expire unless you invalidate them. Tokens are not invalidated by changing a password.
You can deactivate API tokens by deleting them or by deactivating the user account.
### Deleting tokens
To delete a token,
1. Go to the list of all tokens in the Rancher API view at `https://<Rancher-Server-IP>/v3/tokens`.
1. Access the token you want to delete by its ID. For example, `https://<Rancher-Server-IP>/v3/tokens/kubectl-shell-user-vqkqt`
1. Click **Delete.**
Here is the complete list of tokens that are generated with `ttl=0`:
| Token | Description |
|-------|-------------|
| `kubeconfig-*` | Kubeconfig token |
| `kubectl-shell-*` | Access to `kubectl` shell in the browser |
| `agent-*` | Token for agent deployment |
| `compose-token-*` | Token for compose |
| `helm-token-*` | Token for Helm chart deployment |
| `*-pipeline*` | Pipeline token for project |
| `telemetry-*` | Telemetry token |
| `drain-node-*` | Token for drain (we use `kubectl` for drain because there is no native Kubernetes API) |
### Setting TTL on Kubeconfig Tokens
_**Available as of v2.4.6**_
Starting with Rancher v2.4.6, admins can set a global TTL on kubeconfig tokens. Once the token expires, the `kubectl` command will require the user to authenticate to Rancher.
_**Note:**_ Existing kubeconfig tokens won't be updated with the new TTL. Admins can [delete old kubeconfig tokens](#deleting-tokens).
1. Disable the `kubeconfig-generate-token` setting in the Rancher API view at `https://<Rancher-Server-IP>/v3/settings/kubeconfig-generate-token`. This setting instructs Rancher to no longer automatically generate a token when a user downloads a kubeconfig file. The kubeconfig file will instead provide a command to log in to Rancher.
2. Edit the setting and set the value to `false`.
3. Go to the `kubeconfig-token-ttl-minutes` setting in the Rancher API view at `https://<Rancher-Server-IP>/v3/settings/kubeconfig-token-ttl-minutes`. By default, `kubeconfig-token-ttl-minutes` is 960 (16 hours).
4. Edit the setting and set the value to the desired duration in minutes.
_**Note:**_ This value cannot exceed the maximum TTL for API tokens (`https://<Rancher-Server-IP>/v3/settings/auth-token-max-ttl-minutes`). In Rancher v2.4.6, `auth-token-max-ttl-minutes` is set to 1440 (24 hours) by default. Starting with Rancher v2.4.7, `auth-token-max-ttl-minutes` defaults to 0, allowing tokens to never expire, similar to v2.4.5.
@@ -0,0 +1,12 @@
---
title: Backups and Disaster Recovery
weight: 5
---
This section is devoted to protecting your data in a disaster scenario.
To protect yourself from a disaster scenario, you should create backups on a regular basis.
- [Backup](./backup)
- [Restore](./restore)
@@ -0,0 +1,20 @@
---
title: Backup
weight: 50
aliases:
- /rancher/v2.0-v2.4/en/installation/after-installation/
- /rancher/v2.0-v2.4/en/backups/
- /rancher/v2.0-v2.4/en/backups/backups
- /rancher/v2.0-v2.4/en/backups/legacy/backup
- /rancher/v2.0-v2.4/en/backups/v2.0.x-v2.4.x/backup/
---
This section contains information about how to create backups of your Rancher data and how to restore them in a disaster scenario.
- Rancher server backups:
- [Rancher installed on a K3s Kubernetes cluster](./k3s-backups)
- [Rancher installed on an RKE Kubernetes cluster](./rke-backups)
- [Rancher installed with Docker](./docker-backups)
For information on backing up [Rancher launched Kubernetes clusters,]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) refer to [this section.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/backing-up-etcd/)
@@ -0,0 +1,87 @@
---
title: Backing up Rancher Installed with Docker
shortTitle: Docker Installs
weight: 3
aliases:
- /rancher/v2.0-v2.4/en/installation/after-installation/single-node-backup-and-restoration/
- /rancher/v2.0-v2.4/en/installation/after-installation/single-node-backup-and-restoration/
- /rancher/v2.0-v2.4/en/backups/backups/single-node-backups/
- /rancher/v2.0-v2.4/en/backups/legacy/backup/single-node-backups/
- /rancher/v2.0-v2.4/en/backups/v2.0.x-v2.4.x/backup/docker-backups
---
After completing your Docker installation of Rancher, we recommend creating backups of it on a regular basis. Having a recent backup will let you recover quickly from an unexpected disaster.
### How to Read Placeholders
During the creation of your backup, you'll enter a series of commands, replacing placeholders with data from your environment. These placeholders are denoted with angled brackets and all capital letters (`<EXAMPLE>`). Here's an example of a command with a placeholder:
```
docker run \
--volumes-from rancher-data-<DATE> \
-v $PWD:/backup busybox tar pzcvf /backup/rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz /var/lib/rancher
```
In this command, `<DATE>` is a placeholder for the date that the data container and backup were created (for example, `9-27-18`).
### Obtaining Placeholder Data
Get the placeholder data by running:
```
docker ps
```
Write down or copy this information before starting the [procedure below](#creating-a-backup).
<sup>Terminal `docker ps` Command, Displaying Where to Find `<RANCHER_CONTAINER_TAG>` and `<RANCHER_CONTAINER_NAME>`</sup>
![Placeholder Reference]({{<baseurl>}}/img/rancher/placeholder-ref.png)
| Placeholder | Example | Description |
| -------------------------- | -------------------------- | --------------------------------------------------------- |
| `<RANCHER_CONTAINER_TAG>` | `v2.0.5` | The rancher/rancher image you pulled for initial install. |
| `<RANCHER_CONTAINER_NAME>` | `festive_mestorf` | The name of your Rancher container. |
| `<RANCHER_VERSION>` | `v2.0.5` | The version of Rancher that you're creating a backup for. |
| `<DATE>` | `9-27-18` | The date that the data container or backup was created. |
<br/>
You can obtain `<RANCHER_CONTAINER_TAG>` and `<RANCHER_CONTAINER_NAME>` by logging into your Rancher Server by remote connection and entering the command to view the containers that are running: `docker ps`. You can also view containers that are stopped with `docker ps -a`. Use these commands for help anytime while creating backups.
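As a convenience, the `docker ps` output can be narrowed to just the columns you need. The following sketch uses Docker's built-in `--format` flag; the placeholders are the same ones described above.
```
# Show only the container name and image (name:tag) columns
docker ps --format "table {{.Names}}\t{{.Image}}"
```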
### Creating a Backup
This procedure creates a backup that you can restore if Rancher encounters a disaster scenario.
1. Using a remote Terminal connection, log into the node running your Rancher Server.
1. Stop the container currently running Rancher Server. Replace `<RANCHER_CONTAINER_NAME>` with the [name of your Rancher container](#how-to-read-placeholders).
```
docker stop <RANCHER_CONTAINER_NAME>
```
1. <a id="backup"></a>Use the command below, replacing each placeholder, to create a data container from the Rancher container that you just stopped.
```
docker create --volumes-from <RANCHER_CONTAINER_NAME> --name rancher-data-<DATE> rancher/rancher:<RANCHER_CONTAINER_TAG>
```
1. <a id="tarball"></a>From the data container that you just created (`rancher-data-<DATE>`), create a backup tarball (`rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz`). Use the following command, replacing each placeholder.
```
docker run --volumes-from rancher-data-<DATE> -v $PWD:/backup:z busybox tar pzcvf /backup/rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz /var/lib/rancher
```
**Step Result:** A stream of commands runs on the screen.
1. Enter the `ls` command to confirm that the backup tarball was created. It will have a name similar to `rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz`.
1. Move your backup tarball to a safe location external to your Rancher Server. Then delete the `rancher-data-<DATE>` container from your Rancher Server.
1. Restart Rancher Server. Replace `<RANCHER_CONTAINER_NAME>` with the name of your Rancher container.
```
docker start <RANCHER_CONTAINER_NAME>
```
**Result:** A backup tarball of your Rancher Server data is created. See [Restoring Backups: Docker Installs]({{<baseurl>}}/rancher/v2.0-v2.4/en/backups/restorations/single-node-restoration) if you need to restore backup data.
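For reference, the procedure above can be collected into a single script. This is only a sketch: the variable values are examples that must be replaced with your own placeholder data, and you should still move the resulting tarball off the host and remove the data container afterward.
```
#!/bin/bash
# Sketch of the Docker-install backup steps above; replace the example values.
set -e
RANCHER_CONTAINER_NAME="festive_mestorf"   # from `docker ps`
RANCHER_CONTAINER_TAG="v2.0.5"             # from `docker ps`
RANCHER_VERSION="v2.0.5"
DATE="$(date +%F)"                         # any date format works

docker stop "${RANCHER_CONTAINER_NAME}"
docker create --volumes-from "${RANCHER_CONTAINER_NAME}" \
  --name "rancher-data-${DATE}" "rancher/rancher:${RANCHER_CONTAINER_TAG}"
docker run --volumes-from "rancher-data-${DATE}" -v "$PWD:/backup:z" busybox \
  tar pzcvf "/backup/rancher-data-backup-${RANCHER_VERSION}-${DATE}.tar.gz" /var/lib/rancher
docker start "${RANCHER_CONTAINER_NAME}"
```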
@@ -0,0 +1,33 @@
---
title: Backing up Rancher Installed on a K3s Kubernetes Cluster
shortTitle: K3s Installs
weight: 1
aliases:
- /rancher/v2.0-v2.4/en/backups/backups/k3s-backups
- /rancher/v2.0-v2.4/en/backups/backups/k8s-backups/k3s-backups
- /rancher/v2.0-v2.4/en/backups/legacy/backup/k8s-backups/k3s-backups/
- /rancher/v2.0-v2.4/en/backups/legacy/backups/k3s-backups
- /rancher/v2.0-v2.4/en/backups/legacy/backup/k3s-backups
- /rancher/v2.0-v2.4/en/backups/v2.0.x-v2.4.x/backup/k3s-backups
---
When Rancher is installed on a high-availability Kubernetes cluster, we recommend using an external database to store the cluster data.
The database administrator will need to back up the external database, or restore it from a snapshot or dump.
We recommend configuring the database to take recurring snapshots.
### K3s Kubernetes Cluster Data
One main advantage of this K3s architecture is that it allows an external datastore to hold the cluster data, allowing the K3s server nodes to be treated as ephemeral.
<figcaption>Architecture of a K3s Kubernetes Cluster Running the Rancher Management Server</figcaption>
![Architecture of a K3s Kubernetes Cluster Running the Rancher Management Server]({{<baseurl>}}/img/rancher/k3s-server-storage.svg)
### Creating Snapshots and Restoring Databases from Snapshots
For details on taking database snapshots and restoring your database from them, refer to the official database documentation:
- [Official MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/replication-snapshot-method.html)
- [Official PostgreSQL documentation](https://www.postgresql.org/docs/8.3/backup-dump.html)
- [Official etcd documentation](https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/recovery.md)
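As an illustration, a one-off logical backup of an external MySQL datastore might look like the sketch below. The host, user, and database name (`k3s`) are assumptions; follow the documentation above for the datastore you actually use.
```
# A sketch: dump an external MySQL datastore to a dated SQL file (values are assumptions)
mysqldump -h <DB_HOST> -u <DB_USER> -p --single-transaction --routines k3s \
  > k3s-datastore-backup-$(date +%F).sql
```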
@@ -0,0 +1,181 @@
---
title: Backing up Rancher Installed on an RKE Kubernetes Cluster
shortTitle: RKE Installs
weight: 2
aliases:
- /rancher/v2.0-v2.4/en/installation/after-installation/k8s-install-backup-and-restoration/
- /rancher/v2.0-v2.4/en/installation/backups-and-restoration/ha-backup-and-restoration/
- /rancher/v2.0-v2.4/en/backups/backups/ha-backups
- /rancher/v2.0-v2.4/en/backups/backups/k8s-backups/ha-backups
- /rancher/v2.0-v2.4/en/backups/legacy/backup/k8s-backups/ha-backups/
- /rancher/v2.0-v2.4/en/backups/legacy/backups/ha-backups
- /rancher/v2.0-v2.4/en/backups/legacy/backup/ha-backups
- /rancher/v2.0-v2.4/en/backups/v2.0.x-v2.4.x/backup/rke-backups
---
This section describes how to create backups of your high-availability Rancher install.
In an RKE installation, the cluster data is replicated on each of three etcd nodes in the cluster, providing redundancy and data duplication in case one of the nodes fails.
<figcaption>Cluster Data within an RKE Kubernetes Cluster Running the Rancher Management Server</figcaption>
![Architecture of an RKE Kubernetes cluster running the Rancher management server]({{<baseurl>}}/img/rancher/rke-server-storage.svg)
# Requirements
### RKE Version
The commands for taking `etcd` snapshots are only available in RKE v0.1.7 and later.
### RKE Config File
You'll need the RKE config file that you used for the Rancher install, `rancher-cluster.yml`. You created this file during your initial install. Place this file in the same directory as the RKE binary.
# Backup Outline
Backing up your high-availability Rancher cluster is a process that involves completing multiple tasks.
1. [Take Snapshots of the `etcd` Database](#1-take-snapshots-of-the-etcd-database)
Take snapshots of your current `etcd` database using Rancher Kubernetes Engine (RKE).
1. [Store Snapshot(s) Externally](#2-back-up-local-snapshots-to-a-safe-location)
After taking your snapshots, export them to a safe location that won't be affected if your cluster encounters issues.
# 1. Take Snapshots of the `etcd` Database
Take snapshots of your `etcd` database. You can use these snapshots later to recover from a disaster scenario. There are two ways to take snapshots: on a recurring schedule, or as a one-off. Each option is better suited to a specific use case. Read the short description below each link to know when to use each option.
- [Option A: Recurring Snapshots](#option-a-recurring-snapshots)
After you stand up a high-availability Rancher install, we recommend configuring RKE to automatically take recurring snapshots so that you always have a safe restore point available.
- [Option B: One-Time Snapshots](#option-b-one-time-snapshots)
We advise taking one-time snapshots before events like upgrades or restore of another snapshot.
### Option A: Recurring Snapshots
For all high-availability Rancher installs, we recommend taking recurring snapshots so that you always have a safe restore point available.
To take recurring snapshots, enable the `etcd-snapshot` service, which is a service that's included with RKE. This service runs in a service container alongside the `etcd` container. You can enable this service by adding some code to `rancher-cluster.yml`.
**To Enable Recurring Snapshots:**
The steps to enable recurring snapshots differ based on the version of RKE.
{{% tabs %}}
{{% tab "RKE v0.2.0+" %}}
1. Open `rancher-cluster.yml` with your favorite text editor.
2. Edit the code for the `etcd` service to enable recurring snapshots. Snapshots can be saved in an S3-compatible backend.
```
services:
etcd:
backup_config:
enabled: true # enables recurring etcd snapshots
interval_hours: 6 # time increment between snapshots
retention: 60 # time in days before snapshot purge
# Optional S3
s3backupconfig:
access_key: "myaccesskey"
secret_key: "myaccesssecret"
bucket_name: "my-backup-bucket"
folder: "folder-name" # Available as of v2.3.0
endpoint: "s3.eu-west-1.amazonaws.com"
region: "eu-west-1"
custom_ca: |-
-----BEGIN CERTIFICATE-----
$CERTIFICATE
-----END CERTIFICATE-----
```
3. Save and close `rancher-cluster.yml`.
4. Open **Terminal** and change directory to the location of the RKE binary. Your `rancher-cluster.yml` file must reside in the same directory.
5. Run the following command:
```
rke up --config rancher-cluster.yml
```
**Result:** RKE is configured to take recurring snapshots of `etcd` on all nodes running the `etcd` role. Snapshots are saved locally to the following directory: `/opt/rke/etcd-snapshots/`. If configured, the snapshots are also uploaded to your S3 compatible backend.
{{% /tab %}}
{{% tab "RKE v0.1.x" %}}
1. Open `rancher-cluster.yml` with your favorite text editor.
2. Edit the code for the `etcd` service to enable recurring snapshots.
```
services:
etcd:
snapshot: true # enables recurring etcd snapshots
creation: 6h0s # time increment between snapshots
retention: 24h # time increment before snapshot purge
```
3. Save and close `rancher-cluster.yml`.
4. Open **Terminal** and change directory to the location of the RKE binary. Your `rancher-cluster.yml` file must reside in the same directory.
5. Run the following command:
```
rke up --config rancher-cluster.yml
```
**Result:** RKE is configured to take recurring snapshots of `etcd` on all nodes running the `etcd` role. Snapshots are saved locally to the following directory: `/opt/rke/etcd-snapshots/`.
{{% /tab %}}
{{% /tabs %}}
### Option B: One-Time Snapshots
When you're about to upgrade Rancher or restore it to a previous snapshot, you should snapshot your live image so that you have a backup of `etcd` in its last known state.
**To Take a One-Time Local Snapshot:**
1. Open **Terminal** and change directory to the location of the RKE binary. Your `rancher-cluster.yml` file must reside in the same directory.
2. Enter the following command. Replace `<SNAPSHOT.db>` with any name that you want to use for the snapshot (e.g. `upgrade.db`).
```
rke etcd snapshot-save \
--name <SNAPSHOT.db> \
--config rancher-cluster.yml
```
**Result:** RKE takes a snapshot of `etcd` running on each `etcd` node. The file is saved to `/opt/rke/etcd-snapshots`.
**To Take a One-Time S3 Snapshot:**
_Available as of RKE v0.2.0_
1. Open **Terminal** and change directory to the location of the RKE binary. Your `rancher-cluster.yml` file must reside in the same directory.
2. Enter the following command. Replace `<SNAPSHOT.db>` with any name that you want to use for the snapshot (e.g. `upgrade.db`).
```shell
rke etcd snapshot-save \
--config rancher-cluster.yml \
--name snapshot-name \
--s3 \
--access-key S3_ACCESS_KEY \
--secret-key S3_SECRET_KEY \
--bucket-name s3-bucket-name \
--s3-endpoint s3.amazonaws.com \
--folder folder-name # Available as of v2.3.0
```
**Result:** RKE takes a snapshot of `etcd` running on each `etcd` node. The file is saved to `/opt/rke/etcd-snapshots`. It is also uploaded to the S3 compatible backend.
# 2. Back up Local Snapshots to a Safe Location
> **Note:** If you are using RKE v0.2.0 or later, you can enable saving the backups to an S3-compatible backend directly and skip this step.
After taking the `etcd` snapshots, save them to a safe location so that they're unaffected if your cluster experiences a disaster scenario. This location should be persistent.
In this documentation, as an example, we're using Amazon S3 as our safe location, and [S3cmd](http://s3tools.org/s3cmd) as our tool to create the backups. The backup location and tool that you use are ultimately your decision.
**Example:**
```
root@node:~# s3cmd mb s3://rke-etcd-snapshots
root@node:~# s3cmd put /opt/rke/etcd-snapshots/snapshot.db s3://rke-etcd-snapshots/
```
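If you are on RKE v0.1.x and cannot use the built-in S3 upload, one option is to schedule a recurring copy of the local snapshot directory. The following cron entry is only a sketch; the schedule and bucket name are assumptions.
```
# A sketch: sync local etcd snapshots to the bucket created above, daily at 01:30
30 1 * * * s3cmd sync /opt/rke/etcd-snapshots/ s3://rke-etcd-snapshots/
```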
@@ -0,0 +1,15 @@
---
title: Restore
weight: 1010
aliases:
- /rancher/v2.0-v2.4/en/backups/restorations
- /rancher/v2.0-v2.4/en/backups/legacy/restore
- /rancher/v2.0-v2.4/en/backups/v2.0.x-v2.4.x/restore
---
If you lose the data on your Rancher Server, you can restore it if you have backups stored in a safe location.
- [Restoring backups for Rancher installed with Docker](./docker-restores)
- [Restoring backups for Rancher installed on an RKE Kubernetes cluster](./rke-restore)
- [Restoring backups for Rancher installed on a K3s Kubernetes cluster](./k3s-restore)
If you are looking to restore your [Rancher launched Kubernetes cluster]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/), please refer to [this section]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/restoring-etcd/).
@@ -0,0 +1,73 @@
---
title: Restoring Backups—Docker Installs
shortTitle: Docker Installs
weight: 3
aliases:
- /rancher/v2.0-v2.4/en/installation/after-installation/single-node-backup-and-restoration/
- /rancher/v2.0-v2.4/en/backups/restorations/single-node-restoration
- /rancher/v2.0-v2.4/en/backups/v2.0.x-v2.4.x/restore/docker-restores
---
If you encounter a disaster scenario, you can restore your Rancher Server to your most recent backup.
## Before You Start
During restore of your backup, you'll enter a series of commands, filling placeholders with data from your environment. These placeholders are denoted with angled brackets and all capital letters (`<EXAMPLE>`). Here's an example of a command with a placeholder:
```
docker run --volumes-from <RANCHER_CONTAINER_NAME> -v $PWD:/backup \
busybox sh -c "rm /var/lib/rancher/* -rf && \
tar pzxvf /backup/rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz"
```
In this command, `<RANCHER_CONTAINER_NAME>` and `<RANCHER_VERSION>-<DATE>` are placeholders for values from your Rancher deployment.
Cross reference the image and reference table below to learn how to obtain this placeholder data. Write down or copy this information before starting the procedure below.
<sup>Terminal `docker ps` Command, Displaying Where to Find `<RANCHER_CONTAINER_TAG>` and `<RANCHER_CONTAINER_NAME>`</sup>
![Placeholder Reference]({{<baseurl>}}/img/rancher/placeholder-ref.png)
| Placeholder | Example | Description |
| -------------------------- | -------------------------- | --------------------------------------------------------- |
| `<RANCHER_CONTAINER_TAG>` | `v2.0.5` | The rancher/rancher image you pulled for initial install. |
| `<RANCHER_CONTAINER_NAME>` | `festive_mestorf` | The name of your Rancher container. |
| `<RANCHER_VERSION>` | `v2.0.5` | The version number for your Rancher backup. |
| `<DATE>` | `9-27-18` | The date that the data container or backup was created. |
<br/>
You can obtain `<RANCHER_CONTAINER_TAG>` and `<RANCHER_CONTAINER_NAME>` by logging into your Rancher Server by remote connection and entering the command to view the containers that are running: `docker ps`. You can also view containers that are stopped using a different command: `docker ps -a`. Use these commands for help anytime while restoring your backup.
## Restoring Backups
Using a [backup]({{<baseurl>}}/rancher/v2.0-v2.4/en/backups/backups/single-node-backups/) that you created earlier, restore Rancher to its last known healthy state.
1. Using a remote Terminal connection, log into the node running your Rancher Server.
1. Stop the container currently running Rancher Server. Replace `<RANCHER_CONTAINER_NAME>` with the name of your Rancher container.
```
docker stop <RANCHER_CONTAINER_NAME>
```
1. Move the backup tarball that you created during completion of [Creating Backups—Docker Installs]({{<baseurl>}}/rancher/v2.0-v2.4/en/backups/backups/single-node-backups/) onto your Rancher Server. Change to the directory that you moved it to. Enter `ls` to confirm that it's there.
If you followed the naming convention we suggested in [Creating Backups—Docker Installs]({{<baseurl>}}/rancher/v2.0-v2.4/en/backups/backups/single-node-backups/), it will have a name similar to `rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz`.
1. Enter the following command to delete your current state data and replace it with your backup data, replacing the placeholders. Don't forget to close the quotes.
>**Warning!** This command deletes all current state data from your Rancher Server container. Any changes saved after your backup tarball was created will be lost.
```
docker run --volumes-from <RANCHER_CONTAINER_NAME> -v $PWD:/backup \
busybox sh -c "rm /var/lib/rancher/* -rf && \
tar pzxvf /backup/rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz"
```
**Step Result:** A series of commands should run.
1. Restart your Rancher Server container, replacing the placeholder. It will restart using your backup data.
```
docker start <RANCHER_CONTAINER_NAME>
```
1. Wait a few moments and then open Rancher in a web browser. Confirm that the restore succeeded and that your data is restored.
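If you want to check from the command line before opening the browser, the following sketch shows one way to confirm that the container is running and inspect its logs; the container name is the same placeholder used above.
```
# Confirm the container is up and review its recent logs
docker ps --filter "name=<RANCHER_CONTAINER_NAME>"
docker logs --tail 50 <RANCHER_CONTAINER_NAME>
```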
@@ -0,0 +1,25 @@
---
title: Restoring Rancher Installed on a K3s Kubernetes Cluster
shortTitle: K3s Installs
weight: 1
aliases:
- /rancher/v2.0-v2.4/en/backups/restorations/k3s-restoration
- /rancher/v2.0-v2.4/en/backups/restorations/k8s-restore/k3s-restore
- /rancher/v2.0-v2.4/en/backups/legacy/restore/k8s-restore/k3s-restore/
- /rancher/v2.0-v2.4/en/backups/legacy/restore/k3s-restore
- /rancher/v2.0-v2.4/en/backups/v2.0.x-v2.4.x/restore/k3s-restore
---
When Rancher is installed on a high-availability Kubernetes cluster, we recommend using an external database to store the cluster data.
The database administrator will need to back up the external database, or restore it from a snapshot or dump.
We recommend configuring the database to take recurring snapshots.
### Creating Snapshots and Restoring Databases from Snapshots
For details on taking database snapshots and restoring your database from them, refer to the official database documentation:
- [Official MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/replication-snapshot-method.html)
- [Official PostgreSQL documentation](https://www.postgresql.org/docs/8.3/backup-dump.html)
- [Official etcd documentation](https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/recovery.md)
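As an illustration, restoring an external MySQL datastore from a previously created dump might look like the sketch below. The host, user, database name, and file name are assumptions; follow the documentation above for the datastore you actually use.
```
# A sketch: load a SQL dump back into an external MySQL datastore (values are assumptions)
mysql -h <DB_HOST> -u <DB_USER> -p k3s < k3s-datastore-backup-<DATE>.sql
```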
@@ -0,0 +1,140 @@
---
title: Restoring Backups—Kubernetes installs
shortTitle: RKE Installs
weight: 2
aliases:
- /rancher/v2.0-v2.4/en/installation/after-installation/ha-backup-and-restoration/
- /rancher/v2.0-v2.4/en/backups/restorations/ha-restoration
- /rancher/v2.0-v2.4/en/backups/restorations/k8s-restore/rke-restore
- /rancher/v2.0-v2.4/en/backups/legacy/restore/k8s-restore/rke-restore/
- /rancher/v2.0-v2.4/en/backups/legacy/restore/rke-restore
- /rancher/v2.0-v2.4/en/backups/v2.0.x-v2.4.x/restore/rke-restore
---
This procedure describes how to use RKE to restore a snapshot of the Rancher Kubernetes cluster.
This will restore the Kubernetes configuration and the Rancher database and state.
> **Note:** This document covers clusters set up with RKE >= v0.2.x, for older RKE versions refer to the [RKE Documentation]({{<baseurl>}}/rke/latest/en/etcd-snapshots/restoring-from-backup).
## Restore Outline
<!-- TOC -->
- [1. Preparation](#1-preparation)
- [2. Place Snapshot](#2-place-snapshot)
- [3. Configure RKE](#3-configure-rke)
- [4. Restore the Database and bring up the Cluster](#4-restore-the-database-and-bring-up-the-cluster)
<!-- /TOC -->
### 1. Preparation
It is advised that you run the restore from your local host or a jump box/bastion where your cluster yaml, rke statefile, and kubeconfig are stored. You will need [RKE]({{<baseurl>}}/rke/latest/en/installation/) and [kubectl]({{<baseurl>}}/rancher/v2.0-v2.4/en/faq/kubectl/) CLI utilities installed locally.
Prepare by creating 3 new nodes to be the target for the restored Rancher instance. We recommend that you start with fresh nodes and a clean state. For clarification on the requirements, review the [Installation Requirements](https://rancher.com/docs/rancher/v2.0-v2.4/en/installation/requirements/).
Alternatively you can re-use the existing nodes after clearing Kubernetes and Rancher configurations. This will destroy the data on these nodes. See [Node Cleanup]({{<baseurl>}}/rancher/v2.0-v2.4/en/faq/cleaning-cluster-nodes/) for the procedure.
You must restore each of your etcd nodes to the same snapshot. Copy the snapshot you're using from one of your nodes to the others before running the `etcd snapshot-restore` command.
> **IMPORTANT:** Before starting the restore make sure all the Kubernetes services on the old cluster nodes are stopped. We recommend powering off the nodes to be sure.
### 2. Place Snapshot
As of RKE v0.2.0, snapshots can be saved in an S3-compatible backend. To restore your cluster from a snapshot stored in an S3-compatible backend, you can skip this step and retrieve the snapshot in [4. Restore the Database and bring up the Cluster](#4-restore-the-database-and-bring-up-the-cluster). Otherwise, you will need to place the snapshot directly on one of the etcd nodes.
Pick one of the clean nodes that will have the etcd role assigned and place the zip-compressed snapshot file in `/opt/rke/etcd-snapshots` on that node.
> **Note:** Because of a current limitation in RKE, the restore process does not work correctly if `/opt/rke/etcd-snapshots` is a NFS share that is mounted on all nodes with the etcd role. The easiest options are to either keep `/opt/rke/etcd-snapshots` as a local folder during the restore process and only mount the NFS share there after it has been completed, or to only mount the NFS share to one node with an etcd role in the beginning.
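If the snapshot currently lives on a different machine, copying it to the chosen etcd node can be as simple as the sketch below; the user, host, and file name are placeholders.
```
# Copy the snapshot to the target etcd node's snapshot directory
scp /opt/rke/etcd-snapshots/<SNAPSHOT_FILE> <USER>@<ETCD_NODE_IP>:/opt/rke/etcd-snapshots/
```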
### 3. Configure RKE
Use your original `rancher-cluster.yml` and `rancher-cluster.rkestate` files. If they are not stored in a version control system, it is a good idea to back them up before making any changes.
```
cp rancher-cluster.yml rancher-cluster.yml.bak
cp rancher-cluster.rkestate rancher-cluster.rkestate.bak
```
If the replaced or cleaned nodes have been configured with new IP addresses, modify the `rancher-cluster.yml` file to ensure the `address` and optional `internal_address` fields reflect the new addresses.
> **IMPORTANT:** You should not rename the `rancher-cluster.yml` or `rancher-cluster.rkestate` files. It is important that the filenames match each other.
### 4. Restore the Database and bring up the Cluster
You will now use the RKE command-line tool with the `rancher-cluster.yml` and the `rancher-cluster.rkestate` configuration files to restore the etcd database and bring up the cluster on the new nodes.
> **Note:** Ensure your `rancher-cluster.rkestate` is present in the same directory as the `rancher-cluster.yml` file before starting the restore, as this file contains the certificate data for the cluster.
#### Restoring from a Local Snapshot
When restoring etcd from a local snapshot, the snapshot is assumed to be located on the target node in the directory `/opt/rke/etcd-snapshots`.
```
rke etcd snapshot-restore --name snapshot-name --config ./rancher-cluster.yml
```
> **Note:** The `--name` parameter expects the filename of the snapshot without the extension.
#### Restoring from a Snapshot in S3
_Available as of RKE v0.2.0_
When restoring etcd from a snapshot located in an S3 compatible backend, the command needs the S3 information in order to connect to the S3 backend and retrieve the snapshot.
```
$ rke etcd snapshot-restore --config ./rancher-cluster.yml --name snapshot-name \
--s3 --access-key S3_ACCESS_KEY --secret-key S3_SECRET_KEY \
--bucket-name s3-bucket-name --s3-endpoint s3.amazonaws.com \
--folder folder-name # Available as of v2.3.0
```
#### Options for `rke etcd snapshot-restore`
S3 specific options are only available for RKE v0.2.0+.
| Option | Description | S3 Specific |
| --- | --- | --- |
| `--name` value | Specify snapshot name | |
| `--config` value | Specify an alternate cluster YAML file (default: "cluster.yml") [$RKE_CONFIG] | |
| `--s3` | Enable restore from an S3-compatible backend | * |
| `--s3-endpoint` value | Specify S3 endpoint URL (default: "s3.amazonaws.com") | * |
| `--access-key` value | Specify S3 access key | * |
| `--secret-key` value | Specify S3 secret key | * |
| `--bucket-name` value | Specify S3 bucket name | * |
| `--folder` value | Specify S3 folder in the bucket _Available as of v2.3.0_ | * |
| `--region` value | Specify the S3 bucket location (optional) | * |
| `--ssh-agent-auth` | [Use SSH Agent Auth defined by SSH_AUTH_SOCK]({{<baseurl>}}/rke/latest/en/config-options/#ssh-agent) | |
| `--ignore-docker-version` | [Disable Docker version check]({{<baseurl>}}/rke/latest/en/config-options/#supported-docker-versions) | |
#### Testing the Cluster
Once RKE completes, it will have created a credentials file in the local directory. Configure `kubectl` to use the `kube_config_rancher-cluster.yml` credentials file and check on the state of the cluster. See [Installing and Configuring kubectl]({{<baseurl>}}/rancher/v2.0-v2.4/en/faq/kubectl/#configuration) for details.
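For example, a minimal check might look like the following sketch, run from the directory containing the credentials file.
```
# Point kubectl at the credentials file RKE generated, then verify the nodes
export KUBECONFIG=$PWD/kube_config_rancher-cluster.yml
kubectl get nodes
```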
#### Check Kubernetes Pods
Wait for the pods running in `kube-system`, `ingress-nginx` and the `rancher` pod in `cattle-system` to return to the `Running` state.
> **Note:** `cattle-cluster-agent` and `cattle-node-agent` pods will be in an `Error` or `CrashLoopBackOff` state until Rancher server is up and the DNS/Load Balancer have been pointed at the new cluster.
```
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-system cattle-cluster-agent-766585f6b-kj88m 0/1 Error 6 4m
cattle-system cattle-node-agent-wvhqm 0/1 Error 8 8m
cattle-system rancher-78947c8548-jzlsr 0/1 Running 1 4m
ingress-nginx default-http-backend-797c5bc547-f5ztd 1/1 Running 1 4m
ingress-nginx nginx-ingress-controller-ljvkf 1/1 Running 1 8m
kube-system canal-4pf9v 3/3 Running 3 8m
kube-system cert-manager-6b47fc5fc-jnrl5 1/1 Running 1 4m
kube-system kube-dns-7588d5b5f5-kgskt 3/3 Running 3 4m
kube-system kube-dns-autoscaler-5db9bbb766-s698d 1/1 Running 1 4m
kube-system metrics-server-97bc649d5-6w7zc 1/1 Running 1 4m
kube-system tiller-deploy-56c4cf647b-j4whh 1/1 Running 1 4m
```
#### Finishing Up
Rancher should now be running and available to manage your Kubernetes clusters.
> **IMPORTANT:** Remember to save your updated RKE config (`rancher-cluster.yml`), state file (`rancher-cluster.rkestate`), and `kubectl` credentials (`kube_config_rancher-cluster.yml`) in a safe place for future maintenance, for example in a version control system.
@@ -0,0 +1,75 @@
---
title: "Rolling back to v2.0.0-v2.1.5"
weight: 1
---
> Rolling back to Rancher v2.0-v2.1 is no longer supported. The instructions for rolling back to these versions are preserved here and are intended to be used only in cases where upgrading to Rancher v2.2+ is not feasible.
If you are rolling back to versions in either of these scenarios, you must follow some extra instructions in order to get your clusters working.
- Rolling back from v2.1.6+ to any version between v2.1.0 - v2.1.5 or v2.0.0 - v2.0.10.
- Rolling back from v2.0.11+ to any version between v2.0.0 - v2.0.10.
Because of the changes necessary to address [CVE-2018-20321](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-20321), special steps are necessary if the user wants to roll back to a previous version of Rancher where this vulnerability exists. The steps are as follows:
1. Record the `serviceAccountToken` for each cluster. To do this, save the following script on a machine with `kubectl` access to the Rancher management plane and execute it. You will need to run these commands on the machine where the Rancher container is running. Ensure `jq` is installed before running the command. The commands will vary depending on how you installed Rancher.
**Rancher Installed with Docker**
```
docker exec <NAME OF RANCHER CONTAINER> kubectl get clusters -o json | jq '[.items[] | select(any(.status.conditions[]; .type == "ServiceAccountMigrated")) | {name: .metadata.name, token: .status.serviceAccountToken}]' > tokens.json
```
**Rancher Installed on a Kubernetes Cluster**
```
kubectl get clusters -o json | jq '[.items[] | select(any(.status.conditions[]; .type == "ServiceAccountMigrated")) | {name: .metadata.name, token: .status.serviceAccountToken}]' > tokens.json
```
2. After executing the command, a `tokens.json` file will be created. **Important!** Back up this file in a safe place. You will need it to restore functionality to your clusters after rolling back Rancher. **If you lose this file, you may lose access to your clusters.**
3. Roll back Rancher following the [normal instructions]({{<baseurl>}}/rancher/v2.x/en/upgrades/rollbacks/).
4. Once Rancher comes back up, every cluster managed by Rancher (except for Imported clusters) will be in an `Unavailable` state.
5. Apply the backed up tokens based on how you installed Rancher.
**Rancher Installed with Docker**
Save the following script as `apply_tokens.sh` to the machine where the Rancher docker container is running. Also copy the `tokens.json` file created previously to the same directory as the script.
```
#!/bin/bash
# Patches each cluster with the serviceAccountToken recorded in tokens.json.
# Usage: ./apply_tokens.sh <DOCKER CONTAINER NAME>
set -e
tokens=$(jq .[] -c tokens.json)
for token in $tokens; do
    name=$(echo $token | jq -r .name)
    value=$(echo $token | jq -r .token)
    docker exec $1 kubectl patch --type=merge clusters $name -p "{\"status\": {\"serviceAccountToken\": \"$value\"}}"
done
```
Set the script to allow execution (`chmod +x apply_tokens.sh`) and execute the script as follows:
```
./apply_tokens.sh <DOCKER CONTAINER NAME>
```
After a few moments, the clusters will go from `Unavailable` back to `Available`.
**Rancher Installed on a Kubernetes Cluster**
Save the following script as `apply_tokens.sh` to a machine with kubectl access to the Rancher management plane. Also copy the `tokens.json` file created previously to the same directory as the script.
```
#!/bin/bash
# Patches each cluster with the serviceAccountToken recorded in tokens.json.
# Usage: ./apply_tokens.sh
set -e
tokens=$(jq .[] -c tokens.json)
for token in $tokens; do
    name=$(echo $token | jq -r .name)
    value=$(echo $token | jq -r .token)
    kubectl patch --type=merge clusters $name -p "{\"status\": {\"serviceAccountToken\": \"$value\"}}"
done
```
Set the script to allow execution (`chmod +x apply_tokens.sh`) and execute the script as follows:
```
./apply_tokens.sh
```
After a few moments the clusters will go from `Unavailable` back to `Available`.
6. Continue using Rancher as normal.
@@ -0,0 +1,20 @@
---
title: Best Practices Guide
weight: 4
---
The purpose of this section is to consolidate best practices for Rancher implementations. This also includes recommendations for related technologies, such as Kubernetes, Docker, containers, and more. The objective is to improve the outcome of a Rancher implementation using the operational experience of Rancher and its customers.
If you have any questions about how these might apply to your use case, please contact your Customer Success Manager or Support.
Use the navigation bar on the left to find the current best practices for managing and deploying the Rancher Server.
For more guidance on best practices, you can consult these resources:
- [Security]({{<baseurl>}}/rancher/v2.0-v2.4/en/security/)
- [Rancher Blog](https://rancher.com/blog/)
- [Articles about best practices on the Rancher blog](https://rancher.com/tags/best-practices/)
- [101 More Security Best Practices for Kubernetes](https://rancher.com/blog/2019/2019-01-17-101-more-kubernetes-security-best-practices/)
- [Rancher Forum](https://forums.rancher.com/)
- [Rancher Users Slack](https://slack.rancher.io/)
- [Rancher Labs YouTube Channel - Online Meetups, Demos, Training, and Webinars](https://www.youtube.com/channel/UCh5Xtp82q8wjijP8npkVTBA/featured)
@@ -0,0 +1,52 @@
---
title: Tips for Setting Up Containers
weight: 100
aliases:
- /rancher/v2.0-v2.4/en/best-practices/containers
- /rancher/v2.0-v2.4/en/best-practices/v2.0-v2.4/containers
---
Running well-built containers can greatly impact the overall performance and security of your environment.
Below are a few tips for setting up your containers.
For a more detailed discussion of security for containers, you can also refer to Rancher's [Guide to Container Security.](https://rancher.com/complete-guide-container-security)
### Use a Common Container OS
When possible, you should try to standardize on a common container base OS.
Smaller distributions such as Alpine and BusyBox reduce container image size and generally have a smaller attack/vulnerability surface.
Popular distributions such as Ubuntu, Fedora, and CentOS are more field-tested and offer more functionality.
### Start with a FROM scratch container
If your microservice is a standalone static binary, you should use a `FROM scratch` container.
The `FROM scratch` container is an [official Docker image](https://hub.docker.com/_/scratch) that is empty so that you can use it to design minimal images.
This will have the smallest attack surface and smallest image size.
### Run Container Processes as Unprivileged
When possible, use a non-privileged user when running processes within your container. While container runtimes provide isolation, vulnerabilities and attacks are still possible. Inadvertent or accidental host mounts can also be impacted if the container is running as root. For details on configuring a security context for a pod or container, refer to the [Kubernetes docs](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/).
### Define Resource Limits
Apply CPU and memory limits to your pods. This can help manage the resources on your worker nodes and avoid a malfunctioning microservice from impacting other microservices.
In standard Kubernetes, you can set resource limits at the namespace level. In Rancher, you can set resource limits at the project level and they will propagate to all the namespaces within the project. For details, refer to the Rancher docs.
When setting resource quotas, if you set anything related to CPU or Memory (i.e. limits or reservations) on a project or namespace, all containers will require a respective CPU or Memory field set during creation. To avoid setting these limits on each and every container during workload creation, a default container resource limit can be specified on the namespace.
The Kubernetes docs have more information on how resource limits can be set at the [container level](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) and the namespace level.
### Define Resource Requirements
You should apply CPU and memory requirements to your pods. This is crucial for informing the scheduler which type of compute node your pod needs to be placed on, and ensuring it does not over-provision that node. In Kubernetes, you can set a resource requirement by defining `resources.requests` in the resource requests field in a pod's container spec. For details, refer to the [Kubernetes docs](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container).
> **Note:** If you set a resource limit for the namespace that the pod is deployed in, and the container doesn't have a specific resource request, the pod will not be allowed to start. To avoid setting these fields on each and every container during workload creation, a default container resource limit can be specified on the namespace.
It is recommended to define resource requirements on the container level because otherwise, the scheduler makes assumptions that will likely not be helpful to your application when the cluster experiences load.
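As a minimal sketch of setting both requests and limits from the command line, the example below assumes a deployment named `my-api` in a namespace named `my-namespace`; the names and values are illustrative only.
```
# Set requests and limits on an existing deployment (names and values are assumptions)
kubectl -n my-namespace set resources deployment my-api \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=500m,memory=256Mi
```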
### Liveness and Readiness Probes
Set up liveness and readiness probes for your container. Unless your container completely crashes, Kubernetes will not know it's unhealthy unless you create an endpoint or mechanism that can report container status. Alternatively, make sure your container halts and crashes if unhealthy.
The Kubernetes docs show how to [configure liveness and readiness probes for containers.](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/)
@@ -0,0 +1,48 @@
---
title: Rancher Deployment Strategies
weight: 100
aliases:
- /rancher/v2.0-v2.4/en/best-practices/deployment-strategies
- /rancher/v2.0-v2.4/en/best-practices/v2.0-v2.4/deployment-strategies
---
There are two recommended deployment strategies. Each one has its own pros and cons. Read more about which one would fit best for your use case:
* [Hub and Spoke](#hub-and-spoke-strategy)
* [Regional](#regional-strategy)
# Hub and Spoke Strategy
---
In this deployment scenario, there is a single Rancher control plane managing Kubernetes clusters across the globe. The control plane would run on a high-availability Kubernetes cluster, and operations against clusters in distant regions would be impacted by network latency.
{{< img "/img/rancher/bpg/hub-and-spoke.png" "Hub and Spoke Deployment">}}
### Pros
* Environments could have nodes and network connectivity across regions.
* Single control plane interface to view/see all regions and environments.
* Kubernetes does not require Rancher to operate and can tolerate losing connectivity to the Rancher control plane.
### Cons
* Subject to network latencies.
* If the control plane goes out, global provisioning of new services is unavailable until it is restored. However, each Kubernetes cluster can continue to be managed individually.
# Regional Strategy
---
In the regional deployment model, a control plane is deployed in close proximity to the compute nodes.
{{< img "/img/rancher/bpg/regional.png" "Regional Deployment">}}
### Pros
* Rancher functionality in regions stays operational if a control plane in another region goes down.
* Network latency is greatly reduced, improving the performance of functionality in Rancher.
* Upgrades of the Rancher control plane can be done independently per region.
### Cons
* Overhead of managing multiple Rancher installations.
* Visibility across global Kubernetes clusters requires multiple interfaces/panes of glass.
* Deploying multi-cluster apps in Rancher requires repeating the process for each Rancher server.
@@ -0,0 +1,41 @@
---
title: Tips for Running Rancher
weight: 100
aliases:
- /rancher/v2.0-v2.4/en/best-practices/deployment-types
- /rancher/v2.0-v2.4/en/best-practices/v2.0-v2.4/deployment-types
---
A high-availability Kubernetes installation, defined as an installation of Rancher on a Kubernetes cluster with at least three nodes, should be used in any production installation of Rancher, as well as any installation deemed "important." Multiple Rancher instances running on multiple nodes ensure high availability that cannot be accomplished with a single node environment.
When you set up your high-availability Rancher installation, consider the following:
### Run Rancher on a Separate Cluster
Don't run other workloads or microservices in the Kubernetes cluster that Rancher is installed on.
### Don't Run Rancher on a Hosted Kubernetes Environment
When the Rancher server is installed on a Kubernetes cluster, it should not be run in a hosted Kubernetes environment such as Google's GKE, Amazon's EKS, or Microsoft's AKS. These hosted Kubernetes solutions do not expose etcd to a degree that is manageable for Rancher, and their customizations can interfere with Rancher operations.
It is strongly recommended to use hosted infrastructure such as Amazon's EC2 or Google's GCE instead. When you create a cluster using RKE on an infrastructure provider, you can configure the cluster to create etcd snapshots as a backup. You can then [use RKE]({{<baseurl>}}/rke/latest/en/etcd-snapshots/) or [Rancher]({{<baseurl>}}/rancher/v2.0-v2.4/en/backups/restorations/) to restore your cluster from one of these snapshots. In a hosted Kubernetes environment, this backup and restore functionality is not supported.
### Make sure nodes are configured correctly for Kubernetes ###
It's important to follow Kubernetes and etcd best practices when deploying your nodes, including disabling swap, double-checking that you have full network connectivity between all machines in the cluster, using unique hostnames, MAC addresses, and product_uuids for every node, checking that all correct ports are opened, and deploying with SSD-backed etcd. More details can be found in the [Kubernetes docs](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#before-you-begin) and [etcd's performance op guide](https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/performance.md).
### When using RKE: Backup the Statefile
RKE keeps a record of the cluster state in a file called `cluster.rkestate`. This file is important for the recovery of a cluster and/or the continued maintenance of the cluster through RKE. Because this file contains certificate material, we strongly recommend encrypting this file before backing it up. After each run of `rke up` you should back up the state file.
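One way to encrypt the state file before storing it elsewhere is simple symmetric encryption, as sketched below; the choice of tool and cipher is an assumption, not a requirement.
```
# Encrypt cluster.rkestate with a passphrase; this produces cluster.rkestate.gpg
gpg --symmetric --cipher-algo AES256 cluster.rkestate
```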
### Run All Nodes in the Cluster in the Same Datacenter
For best performance, run all three of your nodes in the same geographic datacenter. If you are running nodes in the cloud, such as AWS, run each node in a separate Availability Zone. For example, launch node 1 in us-west-2a, node 2 in us-west-2b, and node 3 in us-west-2c.
### Development and Production Environments Should be Similar
It's strongly recommended to have a "staging" or "pre-production" environment of the Kubernetes cluster that Rancher runs on. This environment should mirror your production environment as closely as possible in terms of software and hardware configuration.
### Monitor Your Clusters to Plan Capacity
The Rancher server's Kubernetes cluster should run within the [system and hardware requirements]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/requirements/) as closely as possible. The more you deviate from the system and hardware requirements, the more risk you take.
However, metrics-driven capacity planning analysis should be the ultimate guidance for scaling Rancher, because the published requirements take into account a variety of workload types.
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with Prometheus, a leading open-source monitoring solution, and Grafana, which lets you visualize the metrics from Prometheus.
After you [enable monitoring]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) in the cluster, you can set up [a notification channel]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/notifiers/) and [cluster alerts]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/alerts/) to let you know if your cluster is approaching its capacity. You can also use the Prometheus and Grafana monitoring framework to establish a baseline for key metrics as you scale.
@@ -0,0 +1,143 @@
---
title: Tips for Scaling, Security and Reliability
weight: 101
aliases:
- /rancher/v2.0-v2.4/en/best-practices/management
- /rancher/v2.0-v2.4/en/best-practices/v2.0-v2.4/management
---
Rancher allows you to set up numerous combinations of configurations. Some configurations are more appropriate for development and testing, while there are other best practices for production environments for maximum availability and fault tolerance. The following best practices should be followed for production.
- [Tips for Preventing and Handling Problems](#tips-for-preventing-and-handling-problems)
- [Network Topology](#network-topology)
- [Tips for Scaling and Reliability](#tips-for-scaling-and-reliability)
- [Tips for Security](#tips-for-security)
- [Tips for Multi-Tenant Clusters](#tips-for-multi-tenant-clusters)
- [Class of Service and Kubernetes Clusters](#class-of-service-and-kubernetes-clusters)
- [Network Security](#network-security)
# Tips for Preventing and Handling Problems
These tips can help you solve problems before they happen.
### Run Rancher on a Supported OS and Supported Docker Version
Rancher is container-based and can potentially run on any Linux-based operating system. However, only operating systems listed in the [requirements documentation]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/requirements/) should be used for running Rancher, along with a supported version of Docker. These versions have been most thoroughly tested and can be properly supported by the Rancher Support team.
### Upgrade Your Kubernetes Version
Keep your Kubernetes cluster up to date with a recent and supported version. Typically the Kubernetes community will support the current version and the previous three minor releases (for example, 1.14.x, 1.13.x, 1.12.x, and 1.11.x). After a new version is released, the third-oldest supported version reaches EOL (End of Life) status. Running on an EOL release can be a risk if security issues are found and patches are not available. The community typically makes minor releases every quarter (every three months).
Rancher's SLAs are not community-dependent, but as Kubernetes is community-driven software, the quality of experience will degrade as you get farther away from the community's supported target.
### Kill Pods Randomly During Testing
Run chaoskube or a similar mechanism to randomly kill pods in your test environment. This will test the resiliency of your infrastructure and the ability of Kubernetes to self-heal. It's not recommended to run this in your production environment.
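A very rough sketch of the idea, without installing anything, is to delete one random pod in a test namespace; the namespace name is an assumption, and tools like chaoskube automate this on a schedule.
```
# Delete one randomly chosen pod in the "test" namespace (sketch; test environments only)
kubectl -n test get pods -o name | shuf -n 1 | xargs kubectl -n test delete
```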
### Deploy Complicated Clusters with Terraform
Rancher's "Add Cluster" UI is preferable for getting started with Kubernetes cluster orchestration or for simple use cases. However, for more complex or demanding use cases, it is recommended to use a CLI/API driven approach. [Terraform](https://www.terraform.io/) is recommended as the tooling to implement this. When you use Terraform with version control and a CI/CD environment, you can have high assurances of consistency and reliability when deploying Kubernetes clusters. This approach also gives you the most customization options.
Rancher [maintains a Terraform provider](https://rancher.com/blog/2019/rancher-2-terraform-provider/) for working with Rancher 2.0 Kubernetes. It is called the [Rancher2 Provider.](https://www.terraform.io/docs/providers/rancher2/index.html)
### Upgrade Rancher in a Staging Environment
All upgrades, both patch and feature upgrades, should be first tested on a staging environment before production is upgraded. The more closely the staging environment mirrors production, the higher chance your production upgrade will be successful.
### Renew Certificates Before they Expire
Multiple people in your organization should set up calendar reminders for certificate renewal. Consider renewing the certificate two weeks to one month in advance. If you have multiple certificates to track, consider using [monitoring and alerting mechanisms]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/) to track certificate expiration.
Rancher-provisioned Kubernetes clusters will use certificates that expire in one year. Clusters provisioned by other means may have a longer or shorter expiration.
Certificates can be renewed for Rancher-provisioned clusters [through the Rancher user interface]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/certificate-rotation/).
### Enable Recurring Snapshots for Backing up and Restoring the Cluster
Make sure etcd recurring snapshots are enabled. Extend the snapshot retention to a period of time that meets your business needs. In the event of a catastrophic failure or deletion of data, this may be your only recourse for recovery. For details about configuring snapshots, refer to the [RKE documentation]({{<baseurl>}}/rke/latest/en/etcd-snapshots/) or the [Rancher documentation on backups]({{<baseurl>}}/rancher/v2.0-v2.4/en/backups/).
### Provision Clusters with Rancher
When possible, use Rancher to provision your Kubernetes cluster rather than importing a cluster. This will ensure the best compatibility and supportability.
### Use Stable and Supported Rancher Versions for Production
Do not upgrade production environments to alpha, beta, release candidate (rc), or "latest" versions. These early releases are often not stable and may not have a future upgrade path.
When installing or upgrading a non-production environment to an early release, anticipate problems such as features not working, data loss, outages, and inability to upgrade without a reinstall.
Make sure the feature version you are upgrading to is considered "stable" as determined by Rancher. Use the beta, release candidate, and "latest" versions in a testing, development, or demo environment to try out new features. Feature version upgrades, for example 2.1.x to 2.2.x, should be considered as and when they are released. Some bug fixes and most features are not back ported into older versions.
Keep in mind that Rancher eventually declares old versions End of Life, so you will want to upgrade at some point if you want to continue to receive patches.
For more detail on what happens during the Rancher product lifecycle, refer to the [Support Maintenance Terms](https://rancher.com/support-maintenance-terms/).
# Network Topology
These tips can help Rancher work more smoothly with your network.
### Use Low-latency Networks for Communication Within Clusters
Kubernetes clusters are best served by low-latency networks. This is especially true for the control plane components and etcd, where lots of coordination and leader election traffic occurs. Networking between the Rancher server and the Kubernetes clusters it manages is more tolerant of latency.
### Allow Rancher to Communicate Directly with Clusters
Limit the use of proxies or load balancers between the Rancher server and Kubernetes clusters. As Rancher maintains long-lived WebSocket connections, these intermediaries can interfere with the connection lifecycle, as they often weren't configured with this use case in mind.
# Tips for Scaling and Reliability
These tips can help you scale your cluster more easily.
### Use One Kubernetes Role Per Host
Separate the etcd, control plane, and worker roles onto different hosts. Don't assign multiple roles to the same host, such as a worker and control plane. This will give you maximum scalability.
### Run the Control Plane and etcd on Virtual Machines
Run your etcd and control plane nodes on virtual machines where you can scale vCPU and memory easily if needed in the future.
### Use at Least Three etcd Nodes
Provision three or five etcd nodes. Etcd requires a quorum (a majority of nodes) to elect a leader, so clusters with an even number of etcd nodes are not recommended. Three etcd nodes are generally sufficient for smaller clusters and five etcd nodes for large clusters.
### Use at Least Two Control Plane Nodes
Provision two or more control plane nodes. Some control plane components, such as the `kube-apiserver`, run in [active-active](https://www.jscape.com/blog/active-active-vs-active-passive-high-availability-cluster) mode and will give you more scalability. Other components such as kube-scheduler and kube-controller run in active-passive mode (leader elect) and give you more fault tolerance.
### Monitor Your Cluster
Closely monitor and scale your nodes as needed. You should [enable cluster monitoring]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) and use the Prometheus metrics and Grafana visualization options as a starting point.
# Tips for Security
Below are some basic tips for increasing security in Rancher. For more detailed information about securing your cluster, you can refer to these resources:
- Rancher's [security documentation and Kubernetes cluster hardening guide]({{<baseurl>}}/rancher/v2.0-v2.4/en/security/)
- [101 More Security Best Practices for Kubernetes](https://rancher.com/blog/2019/2019-01-17-101-more-kubernetes-security-best-practices/)
### Update Rancher with Security Patches
Keep your Rancher installation up to date with the latest patches. Patch updates have important software fixes and sometimes have security fixes. When patches with security fixes are released, customers with Rancher licenses are notified by e-mail. These updates are also posted on Rancher's [forum](https://forums.rancher.com/).
### Report Security Issues Directly to Rancher
If you believe you have uncovered a security-related problem in Rancher, please communicate this immediately and discretely to the Rancher team (security@rancher.com). Posting security issues on public forums such as Twitter, Rancher Slack, GitHub, etc. can potentially compromise security for all Rancher customers. Reporting security issues discretely allows Rancher to assess and mitigate the problem. Security patches are typically given high priority and released as quickly as possible.
### Only Upgrade One Component at a Time
In addition to Rancher software updates, closely monitor security fixes for related software, such as Docker, Linux, and any libraries used by your workloads. For production environments, try to avoid upgrading too many entities during a single maintenance window. Upgrading multiple components can make it difficult to root cause an issue in the event of a failure. As business requirements allow, upgrade one component at a time.
# Tips for Multi-Tenant Clusters
### Namespaces
Each tenant should have their own unique namespaces within the cluster. This avoids naming conflicts and allows resources to be visible only to their owner through the use of RBAC policies.
### Project Isolation
Use Rancher's Project Isolation to automatically generate Network Policy between Projects (sets of Namespaces). This further protects workloads from interference.
### Resource Limits
Enforce the use of sane resource limit definitions for every deployment in your cluster. This not only protects the owners of the deployment, but also protects neighboring resources belonging to other tenants. Remember, namespaces do not isolate at the node level, so over-consumption of resources on a node affects deployments in other namespaces. Admission controllers can be written to require resource limit definitions.
### Resource Requirements
Enforce use of resource requirement definitions for each deployment in your cluster. This enables the scheduler to appropriately schedule workloads. Otherwise you will eventually end up with over-provisioned nodes.
# Class of Service and Kubernetes Clusters
A class of service describes the expectations around cluster uptime, durability, and duration of maintenance windows. Typically, organizations group these characteristics into labels such as "dev" or "prod".
### Consider fault domains
Kubernetes clusters can span multiple classes of service, however it is important to consider the ability for one workload to affect another. Without proper deployment practices such as resource limits, requirements, etc, a deployment that is not behaving well has the potential to impact the health of the cluster. In a "dev" environment it is common for end-users to exercise less caution with deployments, thus increasing the chance of such behavior. Sharing this behavior with your production workload increases risk.
### Upgrade risks
Upgrades of Kubernetes are not without risk. The best way to predict the outcome of an upgrade is to try it on a cluster with a similar load and use case as your production cluster. This is where having non-production class of service clusters can be advantageous.
### Resource Efficiency
Clusters can be built with varying degrees of redundancy. In a class of service with low expectations for uptime, resources and cost can be conserved by building clusters without redundant Kubernetes control components. This approach may also free up more budget/resources to increase the redundancy at the production level.
# Network Security
In general, you can use network security best practices in your Rancher and Kubernetes clusters. Consider the following:
### Use a Firewall Between your Hosts and the Internet
Firewalls should be used between your hosts and the Internet (or corporate Intranet). This could be enterprise firewall appliances in a datacenter or SDN constructs in the cloud, such as VPCs, security groups, ingress, and egress rules. Try to limit inbound access only to ports and IP addresses that require it. Outbound access can be shut off (air gap) if the environment contains sensitive information that requires this restriction. If available, use firewalls with intrusion detection and DDoS prevention.
### Run Periodic Security Scans
Run security and penetration scans on your environment periodically. Even with well-designed infrastructure, a poorly designed microservice could compromise the entire environment.
@@ -0,0 +1,82 @@
---
title: Using the Rancher Command Line Interface
description: The Rancher CLI is a unified tool that you can use to interact with Rancher. With it, you can operate Rancher using a command line interface rather than the GUI
metaTitle: "Using the Rancher Command Line Interface "
metaDescription: "The Rancher CLI is a unified tool that you can use to interact with Rancher. With it, you can operate Rancher using a command line interface rather than the GUI"
weight: 21
aliases:
- /rancher/v2.0-v2.4/en/cluster-admin/cluster-access/cli
---
The Rancher CLI (Command Line Interface) is a unified tool that you can use to interact with Rancher. With this tool, you can operate Rancher using a command line rather than the GUI.
### Download Rancher CLI
The binary can be downloaded directly from the UI. The link can be found on the right-hand side of the footer in the UI. We have binaries for Windows, Mac, and Linux. You can also check the [releases page for our CLI](https://github.com/rancher/cli/releases) for direct downloads of the binary.
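For example, on a Linux workstation you might download and unpack a release archive as sketched below; the exact archive name and extracted directory layout depend on the release, so verify them on the releases page (`<VERSION>` is a placeholder):
```bash
# Download a release archive from https://github.com/rancher/cli/releases (<VERSION> is a placeholder)
wget https://github.com/rancher/cli/releases/download/<VERSION>/rancher-linux-amd64-<VERSION>.tar.gz
tar -xzf rancher-linux-amd64-<VERSION>.tar.gz

# Optionally move the extracted `rancher` binary onto your PATH
sudo mv ./rancher-<VERSION>/rancher /usr/local/bin/rancher
```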
### Requirements
After you download the Rancher CLI, you need to make a few configurations. Rancher CLI requires:
- Your [Rancher Server URL]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/server-url), which is used to connect to Rancher Server.
- An API Bearer Token, which is used to authenticate with Rancher. For more information about obtaining a Bearer Token, see [Creating an API Key]({{<baseurl>}}/rancher/v2.0-v2.4/en/user-settings/api-keys/).
### CLI Authentication
Before you can use Rancher CLI to control your Rancher Server, you must authenticate using an API Bearer Token. Log in using the following command (replace `<BEARER_TOKEN>` and `<SERVER_URL>` with your information):
```bash
$ ./rancher login https://<SERVER_URL> --token <BEARER_TOKEN>
```
If Rancher Server uses a self-signed certificate, Rancher CLI prompts you to continue with the connection.
### Project Selection
Before you can perform any commands, you must select a Rancher project to perform those commands against. To select a [project]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/projects-and-namespaces/) to work on, use the command `./rancher context switch`. When you enter this command, a list of available projects displays. Enter a number to choose your project.
**Example: `./rancher context switch` Output**
```
User:rancher-cli-directory user$ ./rancher context switch
NUMBER CLUSTER NAME PROJECT ID PROJECT NAME
1 cluster-2 c-7q96s:p-h4tmb project-2
2 cluster-2 c-7q96s:project-j6z6d Default
3 cluster-1 c-lchzv:p-xbpdt project-1
4 cluster-1 c-lchzv:project-s2mch Default
Select a Project:
```
After you enter a number, the console displays a message that you've changed projects.
```
INFO[0005] Setting new context to project project-1
INFO[0005] Saving config to /Users/markbishop/.rancher/cli2.json
```
### Commands
The following commands are available for use in Rancher CLI.
| Command | Result |
|---|---|
| `apps, [app]` | Performs operations on catalog applications (i.e., individual [Helm charts](https://docs.helm.sh/developing_charts/) or Rancher charts). |
| `catalog` | Performs operations on [catalogs]({{<baseurl>}}/rancher/v2.0-v2.4/en/catalog/). |
| `clusters, [cluster]` | Performs operations on your [clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/). |
| `context` | Switches between Rancher [projects]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/projects-and-namespaces/). For an example, see [Project Selection](#project-selection). |
| `inspect [OPTIONS] [RESOURCEID RESOURCENAME]` | Displays details about [Kubernetes resources](https://kubernetes.io/docs/reference/kubectl/cheatsheet/#resource-types) or Rancher resources (i.e.: [projects]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/projects-and-namespaces/) and [workloads]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/workloads/)). Specify resources by name or ID. |
| `kubectl` |Runs [kubectl commands](https://kubernetes.io/docs/reference/kubectl/overview/#operations). |
| `login, [l]` | Logs into a Rancher Server. For an example, see [CLI Authentication](#cli-authentication). |
| `namespaces, [namespace]` |Performs operations on namespaces. |
| `nodes, [node]` |Performs operations on nodes. |
| `projects, [project]` | Performs operations on [projects]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/projects-and-namespaces/). |
| `ps` | Displays [workloads]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/workloads) in a project. |
| `settings, [setting]` | Shows the current settings for your Rancher Server. |
| `ssh` | Connects to one of your cluster nodes using the SSH protocol. |
| `help, [h]` | Shows a list of commands or help for one command. |
### Rancher CLI Help
Once logged into Rancher Server using the CLI, enter `./rancher --help` for a list of commands.
All commands accept the `--help` flag, which documents each command's usage.
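For example, after logging in and selecting a project, a few common invocations look like the following sketch (output will vary by cluster):
```bash
# List the clusters managed by this Rancher server
./rancher clusters ls

# List workloads in the currently selected project
./rancher ps

# Run a kubectl command against the current cluster
./rancher kubectl get nodes

# Show help for a single command
./rancher clusters --help
```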
@@ -0,0 +1,39 @@
---
title: Cluster Administration
weight: 8
---
After you provision a cluster in Rancher, you can begin using powerful Kubernetes features to deploy and scale your containerized applications in development, testing, or production environments.
This page covers the following topics:
- [Switching between clusters](#switching-between-clusters)
- [Managing clusters in Rancher](#managing-clusters-in-rancher)
- [Configuring tools](#configuring-tools)
> This section assumes a basic familiarity with Docker and Kubernetes. For a brief explanation of how Kubernetes components work together, refer to the [concepts]({{<baseurl>}}/rancher/v2.0-v2.4/en/overview/concepts) page.
## Switching between Clusters
To switch between clusters, use the drop-down available in the navigation bar.
Alternatively, you can switch between projects and clusters directly in the navigation bar. Open the **Global** view and select **Clusters** from the main menu. Then select the name of the cluster you want to open.
## Managing Clusters in Rancher
After clusters have been [provisioned into Rancher]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/), [cluster owners]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#cluster-roles) will need to manage these clusters. There are many different options for managing your cluster.
{{% include file="/rancher/v2.0-v2.4/en/cluster-provisioning/cluster-capabilities-table" %}}
## Configuring Tools
Rancher contains a variety of tools that aren't included in Kubernetes to assist in your DevOps operations. Rancher can integrate with external services to help your clusters run more efficiently. Tools are divided into the following categories:
- Alerts
- Notifiers
- Logging
- Monitoring
- Istio Service Mesh
- OPA Gatekeeper
For more information, see [Tools]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/).
@@ -0,0 +1,220 @@
---
title: Backing up a Cluster
weight: 2045
---
_Available as of v2.2.0_
In the Rancher UI, etcd backup and recovery for [Rancher launched Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) can be easily performed.
Rancher recommends configuring recurrent `etcd` snapshots for all production clusters. Additionally, one-time snapshots can easily be taken as well.
Snapshots of the etcd database are taken and saved either [locally onto the etcd nodes](#local-backup-target) or to an [S3-compatible target](#s3-backup-target). The advantage of configuring S3 is that if all etcd nodes are lost, your snapshot is saved remotely and can be used to restore the cluster.
This section covers the following topics:
- [How snapshots work](#how-snapshots-work)
- [Configuring recurring snapshots](#configuring-recurring-snapshots)
- [One-time snapshots](#one-time-snapshots)
- [Snapshot backup targets](#snapshot-backup-targets)
- [Local backup target](#local-backup-target)
- [S3 backup target](#s3-backup-target)
- [Using a custom CA certificate for S3](#using-a-custom-ca-certificate-for-s3)
- [IAM Support for storing snapshots in S3](#iam-support-for-storing-snapshots-in-s3)
- [Viewing available snapshots](#viewing-available-snapshots)
- [Safe timestamps](#safe-timestamps)
- [Enabling snapshot features for clusters created before Rancher v2.2.0](#enabling-snapshot-features-for-clusters-created-before-rancher-v2-2-0)
# How Snapshots Work
{{% tabs %}}
{{% tab "Rancher v2.4.0+" %}}
### Snapshot Components
When Rancher creates a snapshot, it includes three components:
- The cluster data in etcd
- The Kubernetes version
- The cluster configuration in the form of the `cluster.yml`
Because the Kubernetes version is now included in the snapshot, it is possible to restore a cluster to a prior Kubernetes version.
The multiple components of the snapshot allow you to select from the following options if you need to restore a cluster from a snapshot:
- **Restore just the etcd contents:** This restore is similar to restoring to snapshots in Rancher before v2.4.0.
- **Restore etcd and Kubernetes version:** This option should be used if a Kubernetes upgrade is the reason that your cluster is failing, and you haven't made any cluster configuration changes.
- **Restore etcd, Kubernetes versions and cluster configuration:** This option should be used if you changed both the Kubernetes version and cluster configuration when upgrading.
It's always recommended to take a new snapshot before any upgrades.
### Generating the Snapshot from etcd Nodes
For each etcd node in the cluster, the etcd cluster health is checked. If the node reports that the etcd cluster is healthy, a snapshot is created from it and optionally uploaded to S3.
The snapshot is stored in `/opt/rke/etcd-snapshots`. If the directory is configured on the nodes as a shared mount, it will be overwritten. On S3, the snapshot that remains will always be from the last node that uploaded it, as all etcd nodes upload to the same target and later uploads overwrite earlier ones.
In the case when multiple etcd nodes exist, any created snapshot is created after the cluster has been health checked, so it can be considered a valid snapshot of the data in the etcd cluster.
### Snapshot Naming Conventions
The name of the snapshot is auto-generated. The `--name` option can be used to override the name of the snapshot when creating one-time snapshots with the RKE CLI.
When Rancher creates a snapshot of an RKE cluster, the snapshot name is based on the type (whether the snapshot is manual or recurring) and the target (whether the snapshot is saved locally or uploaded to S3). The naming convention is as follows:
- `m` stands for manual
- `r` stands for recurring
- `l` stands for local
- `s` stands for S3
Some example snapshot names are:
- c-9dmxz-rl-8b2cx
- c-9dmxz-ml-kr56m
- c-9dmxz-ms-t6bjb
- c-9dmxz-rs-8gxc8
### How Restoring from a Snapshot Works
On restore, the following process is used:
1. The snapshot is retrieved from S3, if S3 is configured.
2. The snapshot is unzipped (if zipped).
3. One of the etcd nodes in the cluster serves that snapshot file to the other nodes.
4. The other etcd nodes download the snapshot and validate the checksum so that they all use the same snapshot for the restore.
5. The cluster is restored and post-restore actions will be done in the cluster.
{{% /tab %}}
{{% tab "Rancher before v2.4.0" %}}
When Rancher creates a snapshot, only the etcd data is included in the snapshot.
Because the Kubernetes version is not included in the snapshot, there is no option to restore a cluster to a different Kubernetes version.
It's always recommended to take a new snapshot before any upgrades.
### Generating the Snapshot from etcd Nodes
For each etcd node in the cluster, the etcd cluster health is checked. If the node reports that the etcd cluster is healthy, a snapshot is created from it and optionally uploaded to S3.
The snapshot is stored in `/opt/rke/etcd-snapshots`. If the directory is configured on the nodes as a shared mount, it will be overwritten. On S3, the snapshot that remains will always be from the last node that uploaded it, as all etcd nodes upload to the same target and later uploads overwrite earlier ones.
In the case when multiple etcd nodes exist, any created snapshot is created after the cluster has been health checked, so it can be considered a valid snapshot of the data in the etcd cluster.
### Snapshot Naming Conventions
The name of the snapshot is auto-generated. The `--name` option can be used to override the name of the snapshot when creating one-time snapshots with the RKE CLI.
When Rancher creates a snapshot of an RKE cluster, the snapshot name is based on the type (whether the snapshot is manual or recurring) and the target (whether the snapshot is saved locally or uploaded to S3). The naming convention is as follows:
- `m` stands for manual
- `r` stands for recurring
- `l` stands for local
- `s` stands for S3
Some example snapshot names are:
- c-9dmxz-rl-8b2cx
- c-9dmxz-ml-kr56m
- c-9dmxz-ms-t6bjb
- c-9dmxz-rs-8gxc8
### How Restoring from a Snapshot Works
On restore, the following process is used:
1. The snapshot is retrieved from S3, if S3 is configured.
2. The snapshot is unzipped (if zipped).
3. One of the etcd nodes in the cluster serves that snapshot file to the other nodes.
4. The other etcd nodes download the snapshot and validate the checksum so that they all use the same snapshot for the restore.
5. The cluster is restored and post-restore actions will be done in the cluster.
{{% /tab %}}
{{% /tabs %}}
# Configuring Recurring Snapshots
Select how often you want recurring snapshots to be taken as well as how many snapshots to keep. The amount of time is measured in hours. With timestamped snapshots, the user has the ability to do a point-in-time recovery.
By default, [Rancher launched Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) are configured to take recurring snapshots (saved to local disk). To protect against local disk failure, using the [S3 Target](#s3-backup-target) or replicating the path on disk is advised.
During cluster provisioning or editing the cluster, the configuration for snapshots can be found in the advanced section for **Cluster Options**. Click on **Show advanced options**.
In the **Advanced Cluster Options** section, there are several options available to configure:
| Option | Description | Default Value|
| --- | ---| --- |
| etcd Snapshot Backup Target | Select where you want the snapshots to be saved. Options are either local or in S3 | local|
|Recurring etcd Snapshot Enabled| Enable/Disable recurring snapshots | Yes|
| Recurring etcd Snapshot Creation Period | Time in hours between recurring snapshots| 12 hours |
| Recurring etcd Snapshot Retention Count | Number of snapshots to retain| 6 |
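These UI options correspond to the etcd `backup_config` section of the cluster configuration, which can also be edited through **Edit as YAML**. A minimal sketch, assuming the standard RKE field names and mirroring the defaults above:
```yml
rancher_kubernetes_engine_config:
  services:
    etcd:
      backup_config:
        enabled: true        # Recurring etcd Snapshot Enabled
        interval_hours: 12   # Recurring etcd Snapshot Creation Period
        retention: 6         # Recurring etcd Snapshot Retention Count
```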
# One-Time Snapshots
In addition to recurring snapshots, you may want to take a "one-time" snapshot. For example, before upgrading the Kubernetes version of a cluster, it's best to back up the state of the cluster to protect against upgrade failure.
1. In the **Global** view, navigate to the cluster that you want to take a one-time snapshot of.
2. Click **&#8942; > Snapshot Now**.
**Result:** Based on your [snapshot backup target](#snapshot-backup-targets), a one-time snapshot will be taken and saved in the selected backup target.
# Snapshot Backup Targets
Rancher supports two different backup targets:
* [Local Target](#local-backup-target)
* [S3 Target](#s3-backup-target)
### Local Backup Target
By default, the `local` backup target is selected. The benefit of this option is that no external configuration is required. Snapshots are automatically saved locally to the etcd nodes in [Rancher launched Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) in `/opt/rke/etcd-snapshots`. All recurring snapshots are taken at configured intervals. The downside of using the `local` backup target is that if there is a total disaster and _all_ etcd nodes are lost, there is no way to restore the cluster.
### S3 Backup Target
The `S3` backup target allows users to configure an S3-compatible backend to store the snapshots. The primary benefit of this option is that if the cluster loses all the etcd nodes, the cluster can still be restored because the snapshots are stored externally. Rancher recommends external targets like `S3`; however, the additional configuration effort required should be considered.
| Option | Description | Required|
|---|---|---|
|S3 Bucket Name| S3 bucket name where backups will be stored| *|
|S3 Region|S3 region for the backup bucket| |
|S3 Region Endpoint|S3 region endpoint for the backup bucket|* |
|S3 Access Key|S3 access key with permission to access the backup bucket|*|
|S3 Secret Key|S3 secret key with permission to access the backup bucket|*|
| Custom CA Certificate | A custom certificate used to access private S3 backends. _Available as of v2.2.5_ ||
### Using a custom CA certificate for S3
_Available as of v2.2.5_
The backup snapshot can be stored on a custom `S3` backend such as [minio](https://min.io/). If the S3 backend uses a self-signed or custom certificate, provide the custom certificate using the `Custom CA Certificate` option to connect to the S3 backend.
### IAM Support for Storing Snapshots in S3
The `S3` backup target supports using IAM authentication to the AWS API in addition to using API credentials. An IAM role gives temporary permissions that an application can use when making API calls to S3 storage. To use IAM authentication, the following requirements must be met:
- The cluster etcd nodes must have an instance role that has read/write access to the designated backup bucket.
- The cluster etcd nodes must have network access to the specified S3 endpoint.
- The Rancher Server worker node(s) must have an instance role that has read/write access to the designated backup bucket.
- The Rancher Server worker node(s) must have network access to the specified S3 endpoint.
To give an application access to S3, refer to the AWS documentation on [Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances.](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html)
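As a hedged example, an instance-role policy granting read/write access to a single backup bucket might look like the following; the bucket name is a placeholder, and you should confirm the exact actions your S3-compatible backend requires:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-etcd-backup-bucket",
        "arn:aws:s3:::my-etcd-backup-bucket/*"
      ]
    }
  ]
}
```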
# Viewing Available Snapshots
The list of all available snapshots for the cluster is available in the Rancher UI.
1. In the **Global** view, navigate to the cluster whose snapshots you want to view.
2. Click **Tools > Snapshots** from the navigation bar to view the list of saved snapshots. These snapshots include a timestamp of when they were created.
# Safe Timestamps
_Available as of v2.3.0_
As of v2.2.6, snapshot files are timestamped to simplify processing the files using external tools and scripts, but in some S3 compatible backends, these timestamps were unusable. As of Rancher v2.3.0, the option `safe_timestamp` is added to support compatible file names. When this flag is set to `true`, all special characters in the snapshot filename timestamp are replaced.
This option is not available directly in the UI, and is only available through the `Edit as Yaml` interface.
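In the **Edit as Yaml** view, the flag sits alongside the other etcd backup settings. A minimal sketch, assuming the standard RKE `backup_config` layout:
```yml
rancher_kubernetes_engine_config:
  services:
    etcd:
      backup_config:
        enabled: true
        safe_timestamp: true   # replaces special characters in snapshot filename timestamps
```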
# Enabling Snapshot Features for Clusters Created Before Rancher v2.2.0
If you have any Rancher launched Kubernetes clusters that were created before v2.2.0, after upgrading Rancher, you must [edit the cluster]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/editing-clusters/) and _save_ it, in order to enable the updated snapshot features. Even if you were already creating snapshots before v2.2.0, you must do this step as the older snapshots will not be available to use to [back up and restore etcd through the UI]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/restoring-etcd/).
@@ -0,0 +1,87 @@
---
title: Certificate Rotation
weight: 2040
---
> **Warning:** Rotating Kubernetes certificates may result in your cluster being temporarily unavailable as components are restarted. For production environments, it's recommended to perform this action during a maintenance window.
By default, Kubernetes clusters require certificates, and Rancher launched Kubernetes clusters automatically generate certificates for the Kubernetes components. It is important to rotate these certificates before they expire, as well as if a certificate is compromised. After the certificates are rotated, the Kubernetes components are automatically restarted.
Certificates can be rotated for the following services:
- etcd
- kubelet
- kube-apiserver
- kube-proxy
- kube-scheduler
- kube-controller-manager
### Certificate Rotation in Rancher v2.2.x
_Available as of v2.2.0_
Rancher launched Kubernetes clusters have the ability to rotate the auto-generated certificates through the UI.
1. In the **Global** view, navigate to the cluster for which you want to rotate certificates.
2. Select **&#8942; > Rotate Certificates**.
3. Select which certificates you want to rotate.
* Rotate all Service certificates (keep the same CA)
* Rotate an individual service and choose one of the services from the drop-down menu
4. Click **Save**.
**Results:** The selected certificates will be rotated and the related services will be restarted to start using the new certificate.
> **Note:** Even though the RKE CLI can use custom certificates for the Kubernetes cluster components, Rancher currently doesn't allow the ability to upload these in Rancher Launched Kubernetes clusters.
### Certificate Rotation in Rancher v2.1.x and v2.0.x
_Available as of v2.0.14 and v2.1.9_
Rancher launched Kubernetes clusters have the ability to rotate the auto-generated certificates through the API.
1. In the **Global** view, navigate to the cluster for which you want to rotate certificates.
2. Select the **&#8942; > View in API**.
3. Click on **RotateCertificates**.
4. Click on **Show Request**.
5. Click on **Send Request**.
**Results:** All Kubernetes certificates will be rotated.
### Rotating Expired Certificates After Upgrading Older Rancher Versions
If you are upgrading from Rancher v2.0.13 or earlier, or v2.1.8 or earlier, and your clusters have expired certificates, some manual steps are required to complete the certificate rotation.
1. For the `controlplane` and `etcd` nodes, log in to each corresponding host and check if the certificate `kube-apiserver-requestheader-ca.pem` is in the following directory:
```
cd /etc/kubernetes/.tmp
```
If the certificate is not in the directory, perform the following commands:
```
cp kube-ca.pem kube-apiserver-requestheader-ca.pem
cp kube-ca-key.pem kube-apiserver-requestheader-ca-key.pem
cp kube-apiserver.pem kube-apiserver-proxy-client.pem
cp kube-apiserver-key.pem kube-apiserver-proxy-client-key.pem
```
If the `.tmp` directory does not exist, you can copy the entire SSL certificate to `.tmp`:
```
cp -r /etc/kubernetes/ssl /etc/kubernetes/.tmp
```
1. Rotate the certificates. For Rancher v2.0.x and v2.1.x, use the [Rancher API.](#certificate-rotation-in-rancher-v2-1-x-and-v2-0-x) For Rancher 2.2.x, [use the UI.](#certificate-rotation-in-rancher-v2-2-x)
1. After the command is finished, check if the `worker` nodes are Active. If not, log in to each `worker` node and restart the kubelet and kube-proxy (see the sketch below).
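On RKE-provisioned nodes, these components typically run as Docker containers, so restarting them can look like the following sketch (container names assume the standard RKE layout):
```bash
# On each affected worker node, restart the kubelet and kube-proxy containers
docker restart kubelet kube-proxy
```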
@@ -0,0 +1,279 @@
---
title: Removing Kubernetes Components from Nodes
description: Learn about cluster cleanup when removing nodes from your Rancher-launched Kubernetes cluster. What is removed, how to do it manually
weight: 2055
---
This section describes how to disconnect a node from a Rancher-launched Kubernetes cluster and remove all of the Kubernetes components from the node. This process allows you to use the node for other purposes.
When you use Rancher to install Kubernetes on new nodes in an infrastructure provider, resources (containers/virtual network interfaces) and configuration items (certificates/configuration files) are created.
When removing nodes from your Rancher launched Kubernetes cluster (provided that they are in an `Active` state), those resources are automatically cleaned up, and the only action needed is to restart the node. If a node has become unreachable and the automatic cleanup process cannot be used, this section describes the steps that need to be executed before the node can be added to a cluster again.
## What Gets Removed?
When cleaning nodes provisioned using Rancher, the following components are deleted based on the type of cluster node you're removing.
| Removed Component | [Nodes Hosted by Infrastructure Provider][1] | [Custom Nodes][2] | [Hosted Cluster][3] | [Imported Nodes][4] |
| ------------------------------------------------------------------------------ | --------------- | ----------------- | ------------------- | ------------------- |
| The Rancher deployment namespace (`cattle-system` by default) | ✓ | ✓ | ✓ | ✓ |
| `serviceAccount`, `clusterRoles`, and `clusterRoleBindings` labeled by Rancher | ✓ | ✓ | ✓ | ✓ |
| Labels, Annotations, and Finalizers | ✓ | ✓ | ✓ | ✓ |
| Rancher Deployment | ✓ | ✓ | ✓ | |
| Machines, clusters, projects, and user custom resource definitions (CRDs) | ✓ | ✓ | ✓ | |
| All resources created under the `management.cattle.io` API Group | ✓ | ✓ | ✓ | |
| All CRDs created by Rancher v2.x | ✓ | ✓ | ✓ | |
[1]: {{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/
[2]: {{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/custom-nodes/
[3]: {{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/
[4]: {{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/imported-clusters/
## Removing a Node from a Cluster by Rancher UI
When the node is in `Active` state, removing the node from a cluster will trigger a process to clean up the node. Please restart the node after the automatic cleanup process is done to make sure any non-persistent data is properly removed.
**To restart a node:**
```
# using reboot
$ sudo reboot
# using shutdown
$ sudo shutdown -r now
```
## Removing Rancher Components from a Cluster Manually
When an unreachable node is removed from the cluster, the automatic cleanup process can't be triggered. Follow the steps below to manually remove the Rancher components.
>**Warning:** The commands listed below will remove data from the node. Make sure you have created a backup of files you want to keep before executing any of the commands as data will be lost.
### Removing Rancher Components from Imported Clusters
For imported clusters, the process for removing Rancher is a little different. You have the option of simply deleting the cluster in the Rancher UI, or you can run a script that removes Rancher components from the nodes. Both options make the same deletions.
After the imported cluster is detached from Rancher, the cluster's workloads will be unaffected and you can access the cluster using the same methods that you did before the cluster was imported into Rancher.
{{% tabs %}}
{{% tab "By UI / API" %}}
>**Warning:** This process will remove data from your cluster. Make sure you have created a backup of files you want to keep before executing the command, as data will be lost.
After you initiate the removal of an imported cluster using the Rancher UI (or API), the following events occur.
1. Rancher creates a `serviceAccount` that it uses to remove the Rancher components from the cluster. This account is assigned the [clusterRole](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#role-and-clusterrole) and [clusterRoleBinding](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-and-clusterrolebinding) permissions, which are required to remove the Rancher components.
1. Using the `serviceAccount`, Rancher schedules and runs a [job](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/) that cleans the Rancher components off of the cluster. This job also references the `serviceAccount` and its roles as dependencies, so the job deletes them before its completion.
1. Rancher is removed from the cluster. However, the cluster persists, running the native version of Kubernetes.
**Result:** All components listed for imported clusters in [What Gets Removed?](#what-gets-removed) are deleted.
{{% /tab %}}
{{% tab "By Script" %}}
Rather than cleaning imported cluster nodes using the Rancher UI, you can run a script instead. This functionality is available since `v2.1.0`.
>**Prerequisite:**
>
>Install [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
1. Open a web browser, navigate to [GitHub](https://github.com/rancher/rancher/blob/master/cleanup/user-cluster.sh), and download `user-cluster.sh`.
1. Make the script executable by running the following command from the same directory as `user-cluster.sh`:
```
chmod +x user-cluster.sh
```
1. **Air Gap Environments Only:** Open `user-cluster.sh` and replace `yaml_url` with the URL in `user-cluster.yml`.
If you don't have an air gap environment, skip this step.
1. From the same directory, run the script and provide the `rancher/rancher-agent` image tag (`<RANCHER_VERSION>`), which should match the version of Rancher used to manage the cluster:
>**Tip:**
>
>Add the `-dry-run` flag to preview the script's outcome without making changes.
```
./user-cluster.sh rancher/rancher-agent:<RANCHER_VERSION>
```
**Result:** The script runs. All components listed for imported clusters in [What Gets Removed?](#what-gets-removed) are deleted.
{{% /tab %}}
{{% /tabs %}}
### Windows Nodes
To clean up a Windows node, you can run a cleanup script located in `c:\etc\rancher`. The script deletes Kubernetes generated resources and the execution binary. It also drops the firewall rules and network settings.
To run the script, use this command in PowerShell:
```
pushd c:\etc\rancher
.\cleanup.ps1
popd
```
**Result:** The node is reset and can be re-added to a Kubernetes cluster.
### Docker Containers, Images, and Volumes
Depending on the role you assigned to the node, it will contain containers for Kubernetes components, overlay networking, DNS, the ingress controller, and the Rancher agent, as well as any pods you created that were scheduled to this node.
**To clean all Docker containers, images and volumes:**
```
docker rm -f $(docker ps -qa)
docker rmi -f $(docker images -q)
docker volume rm $(docker volume ls -q)
```
### Mounts
Kubernetes components and secrets leave behind mounts on the system that need to be unmounted.
Mounts |
--------|
`/var/lib/kubelet/pods/XXX` (miscellaneous mounts) |
`/var/lib/kubelet` |
`/var/lib/rancher` |
**To unmount all mounts:**
```
for mount in $(mount | grep tmpfs | grep '/var/lib/kubelet' | awk '{ print $3 }') /var/lib/kubelet /var/lib/rancher; do umount $mount; done
```
### Directories and Files
The following directories are used when adding a node to a cluster, and should be removed. You can remove a directory using `rm -rf /directory_name`.
>**Note:** Depending on the role you assigned to the node, some of the directories will or won't be present on the node.
Directories |
--------|
`/etc/ceph` |
`/etc/cni` |
`/etc/kubernetes` |
`/opt/cni` |
`/opt/rke` |
`/run/secrets/kubernetes.io` |
`/run/calico` |
`/run/flannel` |
`/var/lib/calico` |
`/var/lib/etcd` |
`/var/lib/cni` |
`/var/lib/kubelet` |
`/var/lib/rancher/rke/log` |
`/var/log/containers` |
`/var/log/kube-audit` |
`/var/log/pods` |
`/var/run/calico` |
**To clean the directories:**
```
rm -rf /etc/ceph \
/etc/cni \
/etc/kubernetes \
/opt/cni \
/opt/rke \
/run/secrets/kubernetes.io \
/run/calico \
/run/flannel \
/var/lib/calico \
/var/lib/etcd \
/var/lib/cni \
/var/lib/kubelet \
/var/lib/rancher/rke/log \
/var/log/containers \
/var/log/kube-audit \
/var/log/pods \
/var/run/calico
```
### Network Interfaces and Iptables
The remaining two components that are changed/configured are (virtual) network interfaces and iptables rules. Both are non-persistent to the node, meaning that they will be cleared after a restart of the node. To remove these components, a restart is recommended.
**To restart a node:**
```
# using reboot
$ sudo reboot
# using shutdown
$ sudo shutdown -r now
```
If you want to know more about (virtual) network interfaces or iptables rules, see the specific subjects below.
### Network Interfaces
>**Note:** Depending on the network provider configured for the cluster the node was part of, some of the interfaces will or won't be present on the node.
Interfaces |
--------|
`flannel.1` |
`cni0` |
`tunl0` |
`caliXXXXXXXXXXX` (random interface names) |
`vethXXXXXXXX` (random interface names) |
**To list all interfaces:**
```
# Using ip
ip address show
# Using ifconfig
ifconfig -a
```
**To remove an interface:**
```
ip link delete interface_name
```
### Iptables
>**Note:** Depending on the network provider configured for the cluster the node was part of, some of the chains will or won't be present on the node.
Iptables rules are used to route traffic from and to containers. The created rules are not persistent, so restarting the node will restore iptables to its original state.
Chains |
--------|
`cali-failsafe-in` |
`cali-failsafe-out` |
`cali-fip-dnat` |
`cali-fip-snat` |
`cali-from-hep-forward` |
`cali-from-host-endpoint` |
`cali-from-wl-dispatch` |
`cali-fw-caliXXXXXXXXXXX` (random chain names) |
`cali-nat-outgoing` |
`cali-pri-kns.NAMESPACE` (chain per namespace) |
`cali-pro-kns.NAMESPACE` (chain per namespace) |
`cali-to-hep-forward` |
`cali-to-host-endpoint` |
`cali-to-wl-dispatch` |
`cali-tw-caliXXXXXXXXXXX` (random chain names) |
`cali-wl-to-host` |
`KUBE-EXTERNAL-SERVICES` |
`KUBE-FIREWALL` |
`KUBE-MARK-DROP` |
`KUBE-MARK-MASQ` |
`KUBE-NODEPORTS` |
`KUBE-SEP-XXXXXXXXXXXXXXXX` (random chain names) |
`KUBE-SERVICES` |
`KUBE-SVC-XXXXXXXXXXXXXXXX` (random chain names) |
**To list all iptables rules:**
```
iptables -L -t nat
iptables -L -t mangle
iptables -L
```
@@ -0,0 +1,101 @@
---
title: Cloning Clusters
weight: 2035
aliases:
- /rancher/v2.0-v2.4/en/cluster-provisioning/cloning-clusters/
---
If you have a cluster in Rancher that you want to use as a template for creating similar clusters, you can use Rancher CLI to clone the cluster's configuration, edit it, and then use it to quickly launch the cloned cluster.
Duplication of imported clusters is not supported.
| Cluster Type | Cloneable? |
|----------------------------------|---------------|
| [Nodes Hosted by Infrastructure Provider]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/) | ✓ |
| [Hosted Kubernetes Providers]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/) | ✓ |
| [Custom Cluster]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/custom-nodes) | ✓ |
| [Imported Cluster]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/imported-clusters/) | |
> **Warning:** During the process of duplicating a cluster, you will edit a config file full of cluster settings. However, we recommend editing only values explicitly listed in this document, as cluster duplication is designed for simple cluster copying, _not_ wide scale configuration changes. Editing other values may invalidate the config file, which will lead to cluster deployment failure.
## Prerequisites
Download and install [Rancher CLI]({{<baseurl>}}/rancher/v2.0-v2.4/en/cli). Remember to [create an API bearer token]({{<baseurl>}}/rancher/v2.0-v2.4/en/user-settings/api-keys) if necessary.
## 1. Export Cluster Config
Begin by using Rancher CLI to export the configuration for the cluster that you want to clone.
1. Open Terminal and change your directory to the location of the Rancher CLI binary, `rancher`.
1. Enter the following command to list the clusters managed by Rancher.
./rancher cluster ls
1. Find the cluster that you want to clone, and copy either its resource `ID` or `NAME` to your clipboard. From this point on, we'll refer to the resource `ID` or `NAME` as `<RESOURCE_ID>`, which is used as a placeholder in the next step.
1. Enter the following command to export the configuration for your cluster.
./rancher clusters export <RESOURCE_ID>
**Step Result:** The YAML for a cloned cluster prints to Terminal.
1. Copy the YAML to your clipboard and paste it in a new file. Save the file as `cluster-template.yml` (or any other name, as long as it has a `.yml` extension).
## 2. Modify Cluster Config
Use your favorite text editor to modify the cluster configuration in `cluster-template.yml` for your cloned cluster.
> **Note:** As of Rancher v2.3.0, cluster configuration directives must be nested under the `rancher_kubernetes_engine_config` directive in `cluster.yml`. For more information, refer to the section on [the config file structure in Rancher v2.3.0+.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/options/#config-file-structure-in-rancher-v2-3-0)
1. Open `cluster-template.yml` (or whatever you named your config) in your favorite text editor.
>**Warning:** Only edit the cluster config values explicitly called out below. Many of the values listed in this file are used to provision your cloned cluster, and editing their values may break the provisioning process.
1. As depicted in the example below, at the `<CLUSTER_NAME>` placeholder, replace your original cluster's name with a unique name (`<CLUSTER_NAME>`). If your cloned cluster has a duplicate name, the cluster will not provision successfully.
```yml
Version: v3
clusters:
<CLUSTER_NAME>: # ENTER UNIQUE NAME
dockerRootDir: /var/lib/docker
enableNetworkPolicy: false
rancherKubernetesEngineConfig:
addonJobTimeout: 30
authentication:
strategy: x509
authorization: {}
bastionHost: {}
cloudProvider: {}
ignoreDockerVersion: true
```
1. For each `nodePools` section, replace the original nodepool name with a unique name at the `<NODEPOOL_NAME>` placeholder. If your cloned cluster has a duplicate nodepool name, the cluster will not provision successfully.
```yml
nodePools:
<NODEPOOL_NAME>:
clusterId: do
controlPlane: true
etcd: true
hostnamePrefix: mark-do
nodeTemplateId: do
quantity: 1
worker: true
```
1. When you're done, save and close the configuration.
## 3. Launch Cloned Cluster
Move `cluster-template.yml` into the same directory as the Rancher CLI binary. Then run this command:
./rancher up --file cluster-template.yml
**Result:** Your cloned cluster begins provisioning. Enter `./rancher cluster ls` to confirm. You can also log into the Rancher UI and open the **Global** view to watch your provisioning cluster's progress.
@@ -0,0 +1,32 @@
---
title: Cluster Access
weight: 1
---
This section is about what tools can be used to access clusters managed by Rancher.
For information on how to give users permission to access a cluster, see the section on [adding users to clusters.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/cluster-access/cluster-members/)
For more information on roles-based access control, see [this section.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/)
For information on how to set up an authentication system, see [this section.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/)
### Rancher UI
Rancher provides an intuitive user interface for interacting with your clusters. All options available in the UI use the Rancher API. Therefore any action possible in the UI is also possible in the Rancher CLI or Rancher API.
### kubectl
You can use the Kubernetes command-line tool, [kubectl](https://kubernetes.io/docs/reference/kubectl/overview/), to manage your clusters. You have two options for using kubectl:
- **Rancher kubectl shell:** Interact with your clusters by launching a kubectl shell available in the Rancher UI. This option requires no configuration actions on your part. For more information, see [Accessing Clusters with kubectl Shell]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/kubectl/).
- **Terminal remote connection:** You can also interact with your clusters by installing [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your local desktop and then copying the cluster's kubeconfig file to your local `~/.kube/config` directory. For more information, see [Accessing Clusters with kubectl and a kubeconfig File](./kubectl/).
### Rancher CLI
You can control your clusters by downloading Rancher's own command-line interface, [Rancher CLI]({{<baseurl>}}/rancher/v2.0-v2.4/en/cli/). This CLI tool can interact directly with different clusters and projects or pass them `kubectl` commands.
### Rancher API
Finally, you can interact with your clusters over the Rancher API. Before you use the API, you must obtain an [API key]({{<baseurl>}}/rancher/v2.0-v2.4/en/user-settings/api-keys/). To view the different resource fields and actions for an API object, open the API UI, which can be accessed by clicking on **View in API** for any Rancher UI object.
@@ -0,0 +1,48 @@
---
title: How the Authorized Cluster Endpoint Works
weight: 2015
---
This section describes how the kubectl CLI, the kubeconfig file, and the authorized cluster endpoint work together to allow you to access a downstream Kubernetes cluster directly, without authenticating through the Rancher server. It is intended to provide background information and context to the instructions for [how to set up kubectl to directly access a cluster.](../kubectl/#authenticating-directly-with-a-downstream-cluster)
### About the kubeconfig File
The _kubeconfig file_ is a file used to configure access to Kubernetes when used in conjunction with the kubectl command line tool (or other clients).
This kubeconfig file and its contents are specific to the cluster you are viewing. It can be downloaded from the cluster view in Rancher. You will need a separate kubeconfig file for each cluster that you have access to in Rancher.
After you download the kubeconfig file, you will be able to use the kubeconfig file and its Kubernetes [contexts](https://kubernetes.io/docs/reference/kubectl/cheatsheet/#kubectl-context-and-configuration) to access your downstream cluster.
_Available as of v2.4.6_
If admins have [enforced TTL on kubeconfig tokens]({{<baseurl>}}/rancher/v2.0-v2.4/en/api/api-tokens/#setting-ttl-on-kubeconfig-tokens), the kubeconfig file requires the [Rancher CLI](../cli) to be present in your PATH.
### Two Authentication Methods for RKE Clusters
If the cluster is not an [RKE cluster,]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) the kubeconfig file allows you to access the cluster in only one way: it lets you authenticate with the Rancher server, and then Rancher allows you to run kubectl commands on the cluster.
For RKE clusters, the kubeconfig file allows you to be authenticated in two ways:
- **Through the Rancher server authentication proxy:** Rancher's authentication proxy validates your identity, then connects you to the downstream cluster that you want to access.
- **Directly with the downstream cluster's API server:** RKE clusters have an authorized cluster endpoint enabled by default. This endpoint allows you to access your downstream Kubernetes cluster with the kubectl CLI and a kubeconfig file. In this scenario, the downstream cluster's Kubernetes API server authenticates you by calling a webhook (the `kube-api-auth` microservice) that Rancher set up.
This second method, the capability to connect directly to the cluster's Kubernetes API server, is important because it lets you access your downstream cluster if you can't connect to Rancher.
To use the authorized cluster endpoint, you will need to configure kubectl to use the extra kubectl context in the kubeconfig file that Rancher generates for you when the RKE cluster is created. This file can be downloaded from the cluster view in the Rancher UI, and the instructions for configuring kubectl are on [this page.](../kubectl/#authenticating-directly-with-a-downstream-cluster)
These methods of communicating with downstream Kubernetes clusters are also explained in the [architecture page]({{<baseurl>}}/rancher/v2.0-v2.4/en/overview/architecture/#communicating-with-downstream-user-clusters) in the larger context of explaining how Rancher works and how Rancher communicates with downstream clusters.
### About the kube-api-auth Authentication Webhook
The `kube-api-auth` microservice is deployed to provide the user authentication functionality for the [authorized cluster endpoint,]({{<baseurl>}}/rancher/v2.0-v2.4/en/overview/architecture/#4-authorized-cluster-endpoint) which is only available for [RKE clusters.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) When you access the user cluster using `kubectl`, the cluster's Kubernetes API server authenticates you by using the `kube-api-auth` service as a webhook.
During cluster provisioning, the file `/etc/kubernetes/kube-api-authn-webhook.yaml` is deployed and `kube-apiserver` is configured with `--authentication-token-webhook-config-file=/etc/kubernetes/kube-api-authn-webhook.yaml`. This configures the `kube-apiserver` to query `http://127.0.0.1:6440/v1/authenticate` to determine authentication for bearer tokens.
The scheduling rules for `kube-api-auth` are listed below:
_Applies to v2.3.0 and higher_
| Component | nodeAffinity nodeSelectorTerms | nodeSelector | Tolerations |
| -------------------- | ------------------------------------------ | ------------ | ------------------------------------------------------------------------------ |
| kube-api-auth | `beta.kubernetes.io/os:NotIn:windows`<br/>`node-role.kubernetes.io/controlplane:In:"true"` | none | `operator:Exists` |
@@ -0,0 +1,57 @@
---
title: Adding Users to Clusters
weight: 2020
aliases:
- /rancher/v2.0-v2.4/en/tasks/clusters/adding-managing-cluster-members/
- /rancher/v2.0-v2.4/en/k8s-in-rancher/cluster-members/
- /rancher/v2.0-v2.4/en/cluster-admin/cluster-members
---
If you want to provide a user with access and permissions to _all_ projects, nodes, and resources within a cluster, assign the user a cluster membership.
>**Tip:** Want to provide a user with access to a _specific_ project within a cluster? See [Adding Project Members]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/projects-and-namespaces/project-members/) instead.
There are two contexts where you can add cluster members:
- Adding Members to a New Cluster
You can add members to a cluster as you create it (recommended if possible).
- [Adding Members to an Existing Cluster](#editing-cluster-membership)
You can always add members to a cluster after a cluster is provisioned.
## Editing Cluster Membership
Cluster administrators can edit the membership for a cluster, controlling which Rancher users can access the cluster and what features they can use.
1. From the **Global** view, open the cluster that you want to add members to.
2. From the main menu, select **Members**. Then click **Add Member**.
3. Search for the user or group that you want to add to the cluster.
If external authentication is configured:
- Rancher returns users from your [external authentication]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/) source as you type.
>**Using AD but can't find your users?**
>There may be an issue with your search attribute configuration. See [Configuring Active Directory Authentication: Step 5]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/ad/).
- A drop-down allows you to add groups instead of individual users. The drop-down only lists groups that you, the logged-in user, are part of.
>**Note:** If you are logged in as a local user, external users do not display in your search results. For more information, see [External Authentication Configuration and Principal Users]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/authentication/#external-authentication-configuration-and-principal-users).
4. Assign the user or group **Cluster** roles.
[What are Cluster Roles?]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/)
>**Tip:** For Custom Roles, you can modify the list of individual roles available for assignment.
>
> - To add roles to the list, [Add a Custom Role]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/default-custom-roles/).
> - To remove roles from the list, [Lock/Unlock Roles]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/locked-roles).
**Result:** The chosen users are added to the cluster.
- To revoke cluster membership, select the user and click **Delete**. This action deletes membership, not the user.
- To modify a user's roles in the cluster, delete them from the cluster, and then re-add them with modified roles.
@@ -0,0 +1,109 @@
---
title: "Access a Cluster with Kubectl and kubeconfig"
description: "Learn how you can access and manage your Kubernetes clusters using kubectl with kubectl Shell or with kubectl CLI and kubeconfig file. A kubeconfig file is used to configure access to Kubernetes. When you create a cluster with Rancher, it automatically creates a kubeconfig for your cluster."
weight: 2010
aliases:
- /rancher/v2.0-v2.4/en/k8s-in-rancher/kubectl/
- /rancher/v2.0-v2.4/en/cluster-admin/kubectl
- /rancher/v2.0-v2.4/en/concepts/clusters/kubeconfig-files/
- /rancher/v2.0-v2.4/en/k8s-in-rancher/kubeconfig/
- /rancher/2.x/en/cluster-admin/kubeconfig
---
This section describes how to manipulate your downstream Kubernetes cluster with kubectl from the Rancher UI or from your workstation.
For more information on using kubectl, see [Kubernetes Documentation: Overview of kubectl](https://kubernetes.io/docs/reference/kubectl/overview/).
- [Accessing clusters with kubectl shell in the Rancher UI](#accessing-clusters-with-kubectl-shell-in-the-rancher-ui)
- [Accessing clusters with kubectl from your workstation](#accessing-clusters-with-kubectl-from-your-workstation)
- [Note on Resources created using kubectl](#note-on-resources-created-using-kubectl)
- [Authenticating Directly with a Downstream Cluster](#authenticating-directly-with-a-downstream-cluster)
- [Connecting Directly to Clusters with FQDN Defined](#connecting-directly-to-clusters-with-fqdn-defined)
- [Connecting Directly to Clusters without FQDN Defined](#connecting-directly-to-clusters-without-fqdn-defined)
### Accessing Clusters with kubectl Shell in the Rancher UI
You can access and manage your clusters by logging into Rancher and opening the kubectl shell in the UI. No further configuration is necessary.
1. From the **Global** view, open the cluster that you want to access with kubectl.
2. Click **Launch kubectl**. Use the window that opens to interact with your Kubernetes cluster.
### Accessing Clusters with kubectl from Your Workstation
This section describes how to download your cluster's kubeconfig file, launch kubectl from your workstation, and access your downstream cluster.
This alternative method of accessing the cluster allows you to authenticate with Rancher and manage your cluster without using the Rancher UI.
> **Prerequisites:** These instructions assume that you have already created a Kubernetes cluster, and that kubectl is installed on your workstation. For help installing kubectl, refer to the official [Kubernetes documentation.](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
1. Log into Rancher. From the **Global** view, open the cluster that you want to access with kubectl.
1. Click **Kubeconfig File**.
1. Copy the contents displayed to your clipboard.
1. Paste the contents into a new file on your local computer. Move the file to `~/.kube/config`. Note: The default location that kubectl uses for the kubeconfig file is `~/.kube/config`, but you can use any directory and specify it using the `--kubeconfig` flag, as in this command:
```
kubectl --kubeconfig /custom/path/kube.config get pods
```
1. From your workstation, launch kubectl. Use it to interact with your Kubernetes cluster.
### Note on Resources Created Using kubectl
Rancher will discover and show resources created by `kubectl`. However, these resources might not have all the necessary annotations on discovery. If an operation (for instance, scaling the workload) is done to the resource using the Rancher UI/API, this may trigger recreation of the resources due to the missing annotations. This should only happen the first time an operation is done to the discovered resource.
# Authenticating Directly with a Downstream Cluster
This section is intended to help you set up an alternative method to access an [RKE cluster.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters)
This method is only available for RKE clusters that have the [authorized cluster endpoint]({{<baseurl>}}/rancher/v2.0-v2.4/en/overview/architecture/#4-authorized-cluster-endpoint) enabled. When Rancher creates this RKE cluster, it generates a kubeconfig file that includes additional kubectl context(s) for accessing your cluster. This additional context allows you to use kubectl to authenticate with the downstream cluster without authenticating through Rancher. For a longer explanation of how the authorized cluster endpoint works, refer to [this page.](../ace)
As a best practice, we recommend setting up this method to access your RKE cluster, so that you can still access the cluster even if you can't connect to Rancher.
> **Prerequisites:** The following steps assume that you have created a Kubernetes cluster and followed the steps to [connect to your cluster with kubectl from your workstation.](#accessing-clusters-with-kubectl-from-your-workstation)
To find the name of the context(s) in your downloaded kubeconfig file, run:
```
kubectl config get-contexts --kubeconfig /custom/path/kube.config
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* my-cluster my-cluster user-46tmn
my-cluster-controlplane-1 my-cluster-controlplane-1 user-46tmn
```
In this example, when you use `kubectl` with the first context, `my-cluster`, you will be authenticated through the Rancher server.
With the second context, `my-cluster-controlplane-1`, you would authenticate with the authorized cluster endpoint, communicating with a downstream RKE cluster directly.
We recommend using a load balancer with the authorized cluster endpoint. For details, refer to the [recommended architecture section.]({{<baseurl>}}/rancher/v2.0-v2.4/en/overview/architecture-recommendations/#architecture-for-an-authorized-cluster-endpoint)
Now that you have the name of the context needed to authenticate directly with the cluster, you can pass the name of the context in as an option when running kubectl commands. The commands will differ depending on whether your cluster has an FQDN defined. Examples are provided in the sections below.
When `kubectl` works normally, it confirms that you can access your cluster while bypassing Rancher's authentication proxy.
### Connecting Directly to Clusters with FQDN Defined
If an FQDN is defined for the cluster, a single context referencing the FQDN will be created. The context will be named `<CLUSTER_NAME>-fqdn`. When you want to use `kubectl` to access this cluster without Rancher, you will need to use this context.
Assuming the kubeconfig file is located at `~/.kube/config`:
```
kubectl --context <CLUSTER_NAME>-fqdn get nodes
```
Directly referencing the location of the kubeconfig file:
```
kubectl --kubeconfig /custom/path/kube.config --context <CLUSTER_NAME>-fqdn get pods
```
### Connecting Directly to Clusters without FQDN Defined
If there is no FQDN defined for the cluster, extra contexts will be created referencing the IP address of each node in the control plane. Each context will be named `<CLUSTER_NAME>-<NODE_NAME>`. When you want to use `kubectl` to access this cluster without Rancher, you will need to use this context.
Assuming the kubeconfig file is located at `~/.kube/config`:
```
kubectl --context <CLUSTER_NAME>-<NODE_NAME> get nodes
```
Directly referencing the location of the kubeconfig file:
```
kubectl --kubeconfig /custom/path/kube.config --context <CLUSTER_NAME>-<NODE_NAME> get pods
```
@@ -0,0 +1,25 @@
---
title: Cluster Autoscaler
weight: 1
---
In this section, you'll learn how to install and use the [Kubernetes cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/) on Rancher custom clusters using AWS EC2 Auto Scaling Groups.
The cluster autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:
* There are pods that failed to run in the cluster due to insufficient resources.
* There are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
To prevent your pod from being evicted, set a `priorityClassName: system-cluster-critical` property on your pod spec.
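For example, a pod (or pod template) spec might set it as in this sketch with a placeholder image; note that on some Kubernetes versions the `system-cluster-critical` class may only be allowed in certain namespaces such as `kube-system`:
```yml
apiVersion: v1
kind: Pod
metadata:
  name: critical-workload                       # placeholder name
spec:
  priorityClassName: system-cluster-critical    # tells the autoscaler not to evict this pod
  containers:
  - name: app
    image: example/app:1.0                      # placeholder image
```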
Cluster Autoscaler is designed to run on Kubernetes master nodes. It can run in the `kube-system` namespace. Cluster Autoscaler doesn't scale down nodes with non-mirrored `kube-system` pods running on them.
It's possible to run a customized deployment of Cluster Autoscaler on worker nodes, but extra care needs to be taken to ensure that Cluster Autoscaler remains up and running.
# Cloud Providers
Cluster Autoscaler provides support for various cloud providers. For more information, go to [cluster-autoscaler supported cloud providers.](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#deployment)
### Setting up Cluster Autoscaler on Amazon Cloud Provider
For details on running the cluster autoscaler on Amazon cloud provider, refer to [this page.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/cluster-autoscaler/amazon)
@@ -0,0 +1,580 @@
---
title: Cluster Autoscaler with AWS EC2 Auto Scaling Groups
weight: 1
---
This guide will show you how to install and use [Kubernetes cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/) on Rancher custom clusters using AWS EC2 Auto Scaling Groups.
We are going to install a Rancher RKE custom cluster with a fixed number of nodes with the etcd and controlplane roles, and a variable number of nodes with the worker role, managed by `cluster-autoscaler`.
- [Prerequisites](#prerequisites)
- [1. Create a Custom Cluster](#1-create-a-custom-cluster)
- [2. Configure the Cloud Provider](#2-configure-the-cloud-provider)
- [3. Deploy Nodes](#3-deploy-nodes)
- [4. Install cluster-autoscaler](#4-install-cluster-autoscaler)
- [Parameters](#parameters)
- [Deployment](#deployment)
- [Testing](#testing)
- [Generating Load](#generating-load)
- [Checking Scale](#checking-scale)
# Prerequisites
These elements are required to follow this guide:
* The Rancher server is up and running
* You have an AWS EC2 user with proper permissions to create virtual machines, auto scaling groups, and IAM profiles and roles
### 1. Create a Custom Cluster
On the Rancher server, create a custom Kubernetes v1.18.x cluster. Be sure that the cloud_provider name is set to `amazonec2`. Once the cluster is created, we need to get the following:
* clusterID: `c-xxxxx` will be used on EC2 `kubernetes.io/cluster/<clusterID>` instance tag
* clusterName: will be used on EC2 `k8s.io/cluster-autoscaler/<clusterName>` instance tag
* nodeCommand: will be added to the EC2 instance user_data to join new nodes to the cluster
```sh
sudo docker run -d --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:<RANCHER_VERSION> --server https://<RANCHER_URL> --token <RANCHER_TOKEN> --ca-checksum <RANCHER_CHECKSUM> <roles>
```
### 2. Configure the Cloud Provider
On AWS EC2, we should create a few objects to configure our system. We've defined three distinct groups and IAM profiles to configure on AWS.
1. Autoscaling group: Nodes that will be part of the EC2 Auto Scaling Group (ASG). The ASG will be used by `cluster-autoscaler` to scale up and down.
* IAM profile: Required by the Kubernetes nodes where cluster-autoscaler will be running; it is recommended to run cluster-autoscaler on the master nodes. This profile is called `K8sAutoscalerProfile`.
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"autoscaling:DescribeTags",
"autoscaling:DescribeLaunchConfigurations",
"ec2:DescribeLaunchTemplateVersions"
],
"Resource": [
"*"
]
}
]
}
```
2. Master group: Nodes that will be part of the Kubernetes etcd and/or control planes. These nodes will not be part of the ASG.
* IAM profile: Required by the Kubernetes cloud_provider integration. Optionally, `AWS_ACCESS_KEY` and `AWS_SECRET_KEY` can be used instead; see [using AWS credentials.](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#using-aws-credentials) This profile is called `K8sMasterProfile`.
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeTags",
"ec2:DescribeInstances",
"ec2:DescribeRegions",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVolumes",
"ec2:CreateSecurityGroup",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:ModifyInstanceAttribute",
"ec2:ModifyVolume",
"ec2:AttachVolume",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateRoute",
"ec2:DeleteRoute",
"ec2:DeleteSecurityGroup",
"ec2:DeleteVolume",
"ec2:DetachVolume",
"ec2:RevokeSecurityGroupIngress",
"ec2:DescribeVpcs",
"elasticloadbalancing:AddTags",
"elasticloadbalancing:AttachLoadBalancerToSubnets",
"elasticloadbalancing:ApplySecurityGroupsToLoadBalancer",
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:CreateLoadBalancerPolicy",
"elasticloadbalancing:CreateLoadBalancerListeners",
"elasticloadbalancing:ConfigureHealthCheck",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:DeleteLoadBalancerListeners",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:DetachLoadBalancerFromSubnets",
"elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"elasticloadbalancing:RegisterInstancesWithLoadBalancer",
"elasticloadbalancing:SetLoadBalancerPoliciesForBackendServer",
"elasticloadbalancing:AddTags",
"elasticloadbalancing:CreateListener",
"elasticloadbalancing:CreateTargetGroup",
"elasticloadbalancing:DeleteListener",
"elasticloadbalancing:DeleteTargetGroup",
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeLoadBalancerPolicies",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:ModifyListener",
"elasticloadbalancing:ModifyTargetGroup",
"elasticloadbalancing:RegisterTargets",
"elasticloadbalancing:SetLoadBalancerPoliciesOfListener",
"iam:CreateServiceLinkedRole",
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:BatchGetImage",
"kms:DescribeKey"
],
"Resource": [
"*"
]
}
]
}
```
* IAM role: `K8sMasterRole: [K8sMasterProfile,K8sAutoscalerProfile]`
* Security group: `K8sMasterSg` More info at [RKE ports (custom nodes tab)]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/requirements/ports/#downstream-kubernetes-cluster-nodes)
* Tags:
`kubernetes.io/cluster/<clusterID>: owned`
* User data: `K8sMasterUserData` Ubuntu 18.04 (ami-0e11cbb34015ff725), installs Docker and adds the etcd+controlplane node to the Kubernetes cluster
```sh
#!/bin/bash -x
cat <<EOF > /etc/sysctl.d/90-kubelet.conf
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
kernel.keys.root_maxkeys = 1000000
kernel.keys.root_maxbytes = 25000000
EOF
sysctl -p /etc/sysctl.d/90-kubelet.conf
curl -sL https://releases.rancher.com/install-docker/19.03.sh | sh
sudo usermod -aG docker ubuntu
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
PRIVATE_IP=$(curl -H "X-aws-ec2-metadata-token: ${TOKEN}" -s http://169.254.169.254/latest/meta-data/local-ipv4)
PUBLIC_IP=$(curl -H "X-aws-ec2-metadata-token: ${TOKEN}" -s http://169.254.169.254/latest/meta-data/public-ipv4)
K8S_ROLES="--etcd --controlplane"
sudo docker run -d --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:<RANCHER_VERSION> --server https://<RANCHER_URL> --token <RANCHER_TOKEN> --ca-checksum <RANCHER_CA_CHECKSUM> --address ${PUBLIC_IP} --internal-address ${PRIVATE_IP} ${K8S_ROLES}
```
3. Worker group: Nodes that will be part of the k8s worker plane. Worker nodes will be scaled by cluster-autoscaler using the ASG.
* IAM profile: Provides cloud_provider worker integration.
This profile is called `K8sWorkerProfile`.
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeRegions",
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:BatchGetImage"
],
"Resource": "*"
}
]
}
```
* IAM role: `K8sWorkerRole: [K8sWorkerProfile]`
* Security group: `K8sWorkerSg` More info at [RKE ports (custom nodes tab)]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/requirements/ports/#downstream-kubernetes-cluster-nodes)
* Tags:
* `kubernetes.io/cluster/<clusterID>: owned`
* `k8s.io/cluster-autoscaler/<clusterName>: true`
* `k8s.io/cluster-autoscaler/enabled: true`
* User data: `K8sWorkerUserData` Ubuntu 18.04 (ami-0e11cbb34015ff725), installs Docker and adds the worker node to the Kubernetes cluster
```sh
#!/bin/bash -x
cat <<EOF > /etc/sysctl.d/90-kubelet.conf
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
kernel.keys.root_maxkeys = 1000000
kernel.keys.root_maxbytes = 25000000
EOF
sysctl -p /etc/sysctl.d/90-kubelet.conf
curl -sL https://releases.rancher.com/install-docker/19.03.sh | sh
sudo usermod -aG docker ubuntu
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
PRIVATE_IP=$(curl -H "X-aws-ec2-metadata-token: ${TOKEN}" -s http://169.254.169.254/latest/meta-data/local-ipv4)
PUBLIC_IP=$(curl -H "X-aws-ec2-metadata-token: ${TOKEN}" -s http://169.254.169.254/latest/meta-data/public-ipv4)
K8S_ROLES="--worker"
sudo docker run -d --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:<RANCHER_VERSION> --server https://<RANCHER_URL> --token <RANCHER_TOKEN> --ca-checksum <RANCHER_CA_CHECKSUM> --address ${PUBLIC_IP} --internal-address ${PRIVATE_IP} ${K8S_ROLES}
```
More info is at [RKE clusters on AWS]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/cloud-providers/amazon/) and [Cluster Autoscaler on AWS.](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md)
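The IAM policy documents above can be created in the AWS console or with the AWS CLI. A hedged sketch, assuming the autoscaler policy JSON has been saved locally as `K8sAutoscalerProfile.json` (the file name is illustrative):
```sh
# create the IAM policy from the JSON document shown in step 2
aws iam create-policy \
  --policy-name K8sAutoscalerProfile \
  --policy-document file://K8sAutoscalerProfile.json
```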
### 3. Deploy Nodes
Once we've configured AWS, let's create VMs to bootstrap our cluster:
* master (etcd+controlplane): Depending on your needs, deploy three master instances of the appropriate size. More info is at [the recommendations for production-ready clusters.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/production/)
* IAM role: `K8sMasterRole`
* Security group: `K8sMasterSg`
* Tags:
* `kubernetes.io/cluster/<clusterID>: owned`
* User data: `K8sMasterUserData`
* worker: Define an ASG on EC2 with the following settings (an AWS CLI sketch is shown after this list):
* Name: `K8sWorkerAsg`
* IAM role: `K8sWorkerRole`
* Security group: `K8sWorkerSg`
* Tags:
* `kubernetes.io/cluster/<clusterID>: owned`
* `k8s.io/cluster-autoscaler/<clusterName>: true`
* `k8s.io/cluster-autoscaler/enabled: true`
* User data: `K8sWorkerUserData`
* Instances:
* minimum: 2
* desired: 2
* maximum: 10
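As an alternative to the EC2 console, the worker ASG can be created with the AWS CLI. A minimal sketch, assuming a pre-created launch template named `K8sWorkerLaunchTemplate` that bundles the AMI, `K8sWorkerRole`, `K8sWorkerSg` and `K8sWorkerUserData` settings (the launch template name and subnet IDs are illustrative):
```sh
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name K8sWorkerAsg \
  --launch-template LaunchTemplateName=K8sWorkerLaunchTemplate \
  --min-size 2 --desired-capacity 2 --max-size 10 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222" \
  --tags "Key=kubernetes.io/cluster/<clusterID>,Value=owned,PropagateAtLaunch=true" \
         "Key=k8s.io/cluster-autoscaler/<clusterName>,Value=true,PropagateAtLaunch=true" \
         "Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true"
```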
Once the VMs are deployed, you should have a Rancher custom cluster up and running with three master and two worker nodes.
### 4. Install Cluster-autoscaler
At this point, we should have a Rancher cluster up and running. We are going to install cluster-autoscaler on the master nodes, in the `kube-system` namespace, following the cluster-autoscaler recommendations.
#### Parameters
This table shows cluster-autoscaler parameters for fine tuning:
| Parameter | Default | Description |
|---|---|---|
|cluster-name|-|Autoscaled cluster name, if available|
|address|:8085|The address to expose Prometheus metrics|
|kubernetes|-|Kubernetes master location. Leave blank for default|
|kubeconfig|-|Path to kubeconfig file with authorization and master location information|
|cloud-config|-|The path to the cloud provider configuration file. Empty string for no configuration file|
|namespace|"kube-system"|Namespace in which cluster-autoscaler runs|
|scale-down-enabled|true|Should CA scale down the cluster|
|scale-down-delay-after-add|"10m"|How long after scale up that scale down evaluation resumes|
|scale-down-delay-after-delete|0|How long after node deletion that scale down evaluation resumes, defaults to scanInterval|
|scale-down-delay-after-failure|"3m"|How long after scale down failure that scale down evaluation resumes|
|scale-down-unneeded-time|"10m"|How long a node should be unneeded before it is eligible for scale down|
|scale-down-unready-time|"20m"|How long an unready node should be unneeded before it is eligible for scale down|
|scale-down-utilization-threshold|0.5|Sum of cpu or memory of all pods running on the node divided by node's corresponding allocatable resource, below which a node can be considered for scale down|
|scale-down-gpu-utilization-threshold|0.5|Sum of gpu requests of all pods running on the node divided by node's allocatable resource, below which a node can be considered for scale down|
|scale-down-non-empty-candidates-count|30|Maximum number of non empty nodes considered in one iteration as candidates for scale down with drain|
|scale-down-candidates-pool-ratio|0.1|A ratio of nodes that are considered as additional non empty candidates for scale down when some candidates from previous iteration are no longer valid|
|scale-down-candidates-pool-min-count|50|Minimum number of nodes that are considered as additional non empty candidates for scale down when some candidates from previous iteration are no longer valid|
|node-deletion-delay-timeout|"2m"|Maximum time CA waits for removing delay-deletion.cluster-autoscaler.kubernetes.io/ annotations before deleting the node|
|scan-interval|"10s"|How often cluster is reevaluated for scale up or down|
|max-nodes-total|0|Maximum number of nodes in all node groups. Cluster autoscaler will not grow the cluster beyond this number|
|cores-total|"0:320000"|Minimum and maximum number of cores in cluster, in the format <min>:<max>. Cluster autoscaler will not scale the cluster beyond these numbers|
|memory-total|"0:6400000"|Minimum and maximum number of gigabytes of memory in cluster, in the format <min>:<max>. Cluster autoscaler will not scale the cluster beyond these numbers|
|cloud-provider|-|Cloud provider type|
|max-bulk-soft-taint-count|10|Maximum number of nodes that can be tainted/untainted PreferNoSchedule at the same time. Set to 0 to turn off such tainting|
|max-bulk-soft-taint-time|"3s"|Maximum duration of tainting/untainting nodes as PreferNoSchedule at the same time|
|max-empty-bulk-delete|10|Maximum number of empty nodes that can be deleted at the same time|
|max-graceful-termination-sec|600|Maximum number of seconds CA waits for pod termination when trying to scale down a node|
|max-total-unready-percentage|45|Maximum percentage of unready nodes in the cluster. After this is exceeded, CA halts operations|
|ok-total-unready-count|3|Number of allowed unready nodes, irrespective of max-total-unready-percentage|
|scale-up-from-zero|true|Should CA scale up when there are 0 ready nodes|
|max-node-provision-time|"15m"|Maximum time CA waits for node to be provisioned|
|nodes|-|Sets min, max size and other configuration data for a node group in a format accepted by the cloud provider. Can be used multiple times. Format: <min>:<max>:<other...>|
|node-group-auto-discovery|-|One or more definition(s) of node group auto-discovery. A definition is expressed `<name of discoverer>:[<key>[=<value>]]`|
|estimator|"binpacking"|Type of resource estimator to be used in scale up. Available values: ["binpacking"]|
|expander|"random"|Type of node group expander to be used in scale up. Available values: `["random","most-pods","least-waste","price","priority"]`|
|ignore-daemonsets-utilization|false|Should CA ignore DaemonSet pods when calculating resource utilization for scaling down|
|ignore-mirror-pods-utilization|false|Should CA ignore Mirror pods when calculating resource utilization for scaling down|
|write-status-configmap|true|Should CA write status information to a configmap|
|max-inactivity|"10m"|Maximum time from last recorded autoscaler activity before automatic restart|
|max-failing-time|"15m"|Maximum time from last recorded successful autoscaler run before automatic restart|
|balance-similar-node-groups|false|Detect similar node groups and balance the number of nodes between them|
|node-autoprovisioning-enabled|false|Should CA autoprovision node groups when needed|
|max-autoprovisioned-node-group-count|15|The maximum number of autoprovisioned groups in the cluster|
|unremovable-node-recheck-timeout|"5m"|The timeout before we check again a node that couldn't be removed before|
|expendable-pods-priority-cutoff|-10|Pods with priority below cutoff will be expendable. They can be killed without any consideration during scale down and they don't cause scale up. Pods with null priority (PodPriority disabled) are non expendable|
|regional|false|Cluster is regional|
|new-pod-scale-up-delay|"0s"|Pods less than this old will not be considered for scale-up|
|ignore-taint|-|Specifies a taint to ignore in node templates when considering to scale a node group|
|balancing-ignore-label|-|Specifies a label to ignore in addition to the basic and cloud-provider set of labels when comparing if two node groups are similar|
|aws-use-static-instance-list|false|Should CA fetch instance types in runtime or use a static list. AWS only|
|profiling|false|Is debug/pprof endpoint enabled|
#### Deployment
Based on the [cluster-autoscaler-run-on-master.yaml](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-run-on-master.yaml) example, we've created our own `cluster-autoscaler-deployment.yaml` to use the preferred [auto-discovery setup](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#auto-discovery-setup), updating the tolerations, nodeSelector, image version and command config:
```yml
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
name: cluster-autoscaler
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cluster-autoscaler
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
rules:
- apiGroups: [""]
resources: ["events", "endpoints"]
verbs: ["create", "patch"]
- apiGroups: [""]
resources: ["pods/eviction"]
verbs: ["create"]
- apiGroups: [""]
resources: ["pods/status"]
verbs: ["update"]
- apiGroups: [""]
resources: ["endpoints"]
resourceNames: ["cluster-autoscaler"]
verbs: ["get", "update"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["watch", "list", "get", "update"]
- apiGroups: [""]
resources:
- "pods"
- "services"
- "replicationcontrollers"
- "persistentvolumeclaims"
- "persistentvolumes"
verbs: ["watch", "list", "get"]
- apiGroups: ["extensions"]
resources: ["replicasets", "daemonsets"]
verbs: ["watch", "list", "get"]
- apiGroups: ["policy"]
resources: ["poddisruptionbudgets"]
verbs: ["watch", "list"]
- apiGroups: ["apps"]
resources: ["statefulsets", "replicasets", "daemonsets"]
verbs: ["watch", "list", "get"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses", "csinodes"]
verbs: ["watch", "list", "get"]
- apiGroups: ["batch", "extensions"]
resources: ["jobs"]
verbs: ["get", "list", "watch", "patch"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
resourceNames: ["cluster-autoscaler"]
resources: ["leases"]
verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: cluster-autoscaler
namespace: kube-system
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["create","list","watch"]
- apiGroups: [""]
resources: ["configmaps"]
resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
verbs: ["delete", "get", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cluster-autoscaler
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-autoscaler
subjects:
- kind: ServiceAccount
name: cluster-autoscaler
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: cluster-autoscaler
namespace: kube-system
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: cluster-autoscaler
subjects:
- kind: ServiceAccount
name: cluster-autoscaler
namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
namespace: kube-system
labels:
app: cluster-autoscaler
spec:
replicas: 1
selector:
matchLabels:
app: cluster-autoscaler
template:
metadata:
labels:
app: cluster-autoscaler
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '8085'
spec:
serviceAccountName: cluster-autoscaler
tolerations:
- effect: NoSchedule
operator: "Equal"
value: "true"
key: node-role.kubernetes.io/controlplane
nodeSelector:
node-role.kubernetes.io/controlplane: "true"
containers:
- image: eu.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.18.1
name: cluster-autoscaler
resources:
limits:
cpu: 100m
memory: 300Mi
requests:
cpu: 100m
memory: 300Mi
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<clusterName>
volumeMounts:
- name: ssl-certs
mountPath: /etc/ssl/certs/ca-certificates.crt
readOnly: true
imagePullPolicy: "Always"
volumes:
- name: ssl-certs
hostPath:
path: "/etc/ssl/certs/ca-certificates.crt"
```
Once the manifest file is prepared, deploy it in the Kubernetes cluster (Rancher UI can be used instead):
```sh
kubectl -n kube-system apply -f cluster-autoscaler-deployment.yaml
```
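Assuming the manifest above was applied unchanged, you can verify that the deployment rolled out and inspect its logs with commands such as:
```sh
kubectl -n kube-system rollout status deployment/cluster-autoscaler
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=20
```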
**Note:** The cluster-autoscaler deployment can also be set up using [manual configuration.](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#manual-configuration)
# Testing
At this point, we should have cluster-autoscaler up and running in our Rancher custom cluster. Cluster-autoscaler should manage the `K8sWorkerAsg` ASG, scaling it up and down between 2 and 10 nodes when one of the following conditions is true:
* There are pods that failed to run in the cluster due to insufficient resources. In this case, the cluster is scaled up.
* There are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes. In this case, the cluster is scaled down.
### Generating Load
We've prepared a `test-deployment.yaml` just to generate load on the Kubernetes cluster and see whether cluster-autoscaler is working properly. The test deployment requests 1000m CPU and 1024Mi memory per replica, with three replicas. Adjust the requested resources and/or the replica count to make sure you exhaust the Kubernetes cluster resources:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: hello-world
name: hello-world
spec:
replicas: 3
selector:
matchLabels:
app: hello-world
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
labels:
app: hello-world
spec:
containers:
- image: rancher/hello-world
imagePullPolicy: Always
name: hello-world
ports:
- containerPort: 80
protocol: TCP
resources:
limits:
cpu: 1000m
memory: 1024Mi
requests:
cpu: 1000m
memory: 1024Mi
```
Once the test deployment is prepared, deploy it in the Kubernetes cluster default namespace (Rancher UI can be used instead):
```
kubectl -n default apply -f test-deployment.yaml
```
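To push the cluster past its remaining capacity, or to relax it again, you can simply adjust the replica count. A hedged example, using the deployment name from the manifest above:
```sh
# increase replicas until pods become unschedulable and a scale up is triggered
kubectl -n default scale deployment hello-world --replicas=10
# check where the pods were scheduled (or whether they are Pending)
kubectl -n default get pods -o wide
```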
### Checking Scale
Once the Kubernetes cluster resources are exhausted, cluster-autoscaler should add worker nodes so that the pods that failed to be scheduled can run. It should keep scaling up until all pods become scheduled. You should see the new nodes in the ASG and in the Kubernetes cluster. Check the logs on the `kube-system` cluster-autoscaler pod.
Once scale up has been verified, let's check scale down. To do so, reduce the replica count on the test deployment until you free enough Kubernetes cluster resources to trigger a scale down. You should see nodes disappear from the ASG and from the Kubernetes cluster. Check the logs on the `kube-system` cluster-autoscaler pod.
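A few commands that can help observe the scaling activity; the label selector assumes the cluster-autoscaler deployment manifest shown earlier:
```sh
# watch worker nodes join or leave the cluster
kubectl get nodes -w
# follow the cluster-autoscaler logs for scale-up/scale-down decisions
kubectl -n kube-system logs -l app=cluster-autoscaler -f --tail=100
```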
@@ -0,0 +1,68 @@
---
title: Cluster Configuration
weight: 2025
---
After you provision a Kubernetes cluster using Rancher, you can still edit options and settings for the cluster.
For information on editing cluster membership, go to [this page.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/cluster-access/cluster-members)
- [Cluster Management Capabilities by Cluster Type](#cluster-management-capabilities-by-cluster-type)
- [Editing Clusters in the Rancher UI](#editing-clusters-in-the-rancher-ui)
- [Editing Clusters with YAML](#editing-clusters-with-yaml)
- [Updating ingress-nginx](#updating-ingress-nginx)
### Cluster Management Capabilities by Cluster Type
The options and settings available for an existing cluster change based on the method that you used to provision it. For example, only clusters [provisioned by RKE]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) have **Cluster Options** available for editing.
The following table summarizes the options and settings available for each cluster type:
{{% include file="/rancher/v2.0-v2.4/en/cluster-provisioning/cluster-capabilities-table" %}}
### Editing Clusters in the Rancher UI
To edit your cluster, open the **Global** view, make sure the **Clusters** tab is selected, and then select **&#8942; > Edit** for the cluster that you want to edit.
In [clusters launched by RKE]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/), you can edit any of the remaining options that follow.
Note that these options are not available for imported clusters or hosted Kubernetes clusters.
Option | Description |
---------|----------|
Kubernetes Version | The version of Kubernetes installed on each cluster node. For more detail, see [Upgrading Kubernetes]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/upgrading-kubernetes). |
Network Provider | The container networking interface (CNI) that powers networking for your cluster.<br/><br/>**Note:** You can only choose this option while provisioning your cluster. It cannot be edited later. |
Project Network Isolation | As of Rancher v2.0.7, if you're using the Canal network provider, you can choose whether to enable or disable inter-project communication. |
Nginx Ingress | If you want to publish your applications in a high-availability configuration, and you're hosting your nodes with a cloud-provider that doesn't have a native load-balancing feature, enable this option to use Nginx ingress within the cluster. |
Metrics Server Monitoring | Each cloud provider capable of launching a cluster using RKE can collect metrics and monitor your cluster nodes. Enable this option to view your node metrics from your cloud provider's portal. |
Pod Security Policy Support | Enables [pod security policies]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/pod-security-policies/) for the cluster. After enabling this option, choose a policy using the **Default Pod Security Policy** drop-down. |
Docker version on nodes | Configures whether nodes are allowed to run versions of Docker that Rancher doesn't officially support. If you choose to require a [supported Docker version]({{<baseurl>}}/rancher/v2.0-v2.4/en/installation/options/rke-add-on/layer-7-lb/), Rancher will stop pods from running on nodes that don't have a supported Docker version installed. |
Docker Root Directory | The directory on your cluster nodes where you've installed Docker. If you install Docker on your nodes to a non-default directory, update this path. |
Default Pod Security Policy | If you enable **Pod Security Policy Support**, use this drop-down to choose the pod security policy that's applied to the cluster. |
Cloud Provider | If you're using a cloud provider to host cluster nodes launched by RKE, enable [this option]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/options/cloud-providers/) so that you can use the cloud provider's native features. If you want to store persistent data for your cloud-hosted cluster, this option is required. |
### Editing Clusters with YAML
Instead of using the Rancher UI to choose Kubernetes options for the cluster, advanced users can create an RKE config file. Using a config file allows you to set any of the options available in an RKE installation, except for system_images configuration, by specifying them in YAML.
- To edit an RKE config file directly from the Rancher UI, click **Edit as YAML**.
- To read from an existing RKE file, click **Read from File**.
![image]({{<baseurl>}}/img/rancher/cluster-options-yaml.png)
For an example of RKE config file syntax, see the [RKE documentation]({{<baseurl>}}/rke/latest/en/example-yamls/).
For the complete reference of configurable options for RKE Kubernetes clusters in YAML, see the [RKE documentation.]({{<baseurl>}}/rke/latest/en/config-options/)
In Rancher v2.0.0-v2.2.x, the config file is identical to the [cluster config file for the Rancher Kubernetes Engine]({{<baseurl>}}/rke/latest/en/config-options/), which is the tool Rancher uses to provision clusters. In Rancher v2.3.0, the RKE information is still included in the config file, but it is separated from other options, so that the RKE cluster config options are nested under the `rancher_kubernetes_engine_config` directive. For more information, see the [cluster configuration reference.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/options)
>**Note:** In Rancher v2.0.5 and v2.0.6, the names of services in the Config File (YAML) should contain underscores only: `kube_api` and `kube_controller`.
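As an illustration only, a fragment of a Rancher v2.3+ cluster config might look roughly like the following; the field values are placeholders rather than recommendations:
```yaml
docker_root_dir: /var/lib/docker
enable_network_policy: false
rancher_kubernetes_engine_config:
  services:
    kube_api:
      service_cluster_ip_range: 10.43.0.0/16
    kube_controller:
      cluster_cidr: 10.42.0.0/16
```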
### Updating ingress-nginx
Clusters that were created before Kubernetes 1.16 will have an `ingress-nginx` `updateStrategy` of `OnDelete`. Clusters that were created with Kubernetes 1.16 or newer will have `RollingUpdate`.
If the `updateStrategy` of `ingress-nginx` is `OnDelete`, you will need to delete these pods to get the correct version for your deployment.
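For example, assuming the default RKE deployment of the ingress controller (the namespace and label may differ in your cluster), the pods could be deleted with a command along these lines, and the DaemonSet will recreate them at the updated version:
```
kubectl -n ingress-nginx delete pod -l app=ingress-nginx
```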
@@ -0,0 +1,226 @@
---
title: Nodes and Node Pools
weight: 2030
---
After you launch a Kubernetes cluster in Rancher, you can manage individual nodes from the cluster's **Node** tab. Depending on the [option used]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/) to provision the cluster, there are different node options available.
> If you want to manage the _cluster_ and not individual nodes, see [Editing Clusters]({{< baseurl >}}/rancher/v2.0-v2.4/en/k8s-in-rancher/editing-clusters).
This section covers the following topics:
- [Node options available for each cluster creation option](#node-options-available-for-each-cluster-creation-option)
- [Nodes hosted by an infrastructure provider](#nodes-hosted-by-an-infrastructure-provider)
- [Nodes provisioned by hosted Kubernetes providers](#nodes-provisioned-by-hosted-kubernetes-providers)
- [Imported nodes](#imported-nodes)
- [Managing and editing individual nodes](#managing-and-editing-individual-nodes)
- [Viewing a node in the Rancher API](#viewing-a-node-in-the-rancher-api)
- [Deleting a node](#deleting-a-node)
- [Scaling nodes](#scaling-nodes)
- [SSH into a node hosted by an infrastructure provider](#ssh-into-a-node-hosted-by-an-infrastructure-provider)
- [Cordoning a node](#cordoning-a-node)
- [Draining a node](#draining-a-node)
- [Aggressive and safe draining options](#aggressive-and-safe-draining-options)
- [Grace period](#grace-period)
- [Timeout](#timeout)
- [Drained and cordoned state](#drained-and-cordoned-state)
- [Labeling a node to be ignored by Rancher](#labeling-a-node-to-be-ignored-by-rancher)
# Node Options Available for Each Cluster Creation Option
The following table lists which node options are available for each type of cluster in Rancher. Click the links in the **Option** column for more detailed information about each feature.
| Option | [Nodes Hosted by an Infrastructure Provider][1] | [Custom Node][2] | [Hosted Cluster][3] | [Imported Nodes][4] | Description |
| ------------------------------------------------ | ------------------------------------------------ | ---------------- | ------------------- | ------------------- | ------------------------------------------------------------------ |
| [Cordon](#cordoning-a-node) | ✓ | ✓ | ✓ | | Marks the node as unschedulable. |
| [Drain](#draining-a-node) | ✓ | ✓ | ✓ | | Marks the node as unschedulable _and_ evicts all pods. |
| [Edit](#managing-and-editing-individual-nodes) | ✓ | ✓ | ✓ | | Enter a custom name, description, label, or taints for a node. |
| [View API](#viewing-a-node-in-the-rancher-api) | ✓ | ✓ | ✓ | | View API data. |
| [Delete](#deleting-a-node) | ✓ | ✓ | | | Deletes defective nodes from the cluster. |
| [Download Keys](#ssh-into-a-node-hosted-by-an-infrastructure-provider) | ✓ | | | | Download the SSH key in order to SSH into the node. |
| [Node Scaling](#scaling-nodes) | ✓ | | | | Scale the number of nodes in the node pool up or down. |
[1]: {{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/
[2]: {{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/custom-nodes/
[3]: {{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/
[4]: {{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/imported-clusters/
### Nodes Hosted by an Infrastructure Provider
Node pools are available when you provision Rancher-launched Kubernetes clusters on nodes that are [hosted in an infrastructure provider.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/)
Clusters provisioned using [one of the node pool options]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/#node-pools) can be scaled up or down if the node pool is edited.
A node pool can also automatically maintain the node scale that's set during the initial cluster provisioning if [node auto-replace is enabled.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/#about-node-auto-replace) This scale determines the number of active nodes that Rancher maintains for the cluster.
Rancher uses [node templates]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/#node-templates) to replace nodes in the node pool. Each node template uses cloud provider credentials to allow Rancher to set up the node in the infrastructure provider.
### Nodes Provisioned by Hosted Kubernetes Providers
Options for managing nodes [hosted by a Kubernetes provider]({{<baseurl >}}/rancher/v2.0-v2.4/en/cluster-provisioning/hosted-kubernetes-clusters/) are somewhat limited in Rancher. Rather than using the Rancher UI to make edits such as scaling the number of nodes up or down, edit the cluster directly.
### Imported Nodes
Although you can deploy workloads to an [imported cluster]({{< baseurl >}}/rancher/v2.0-v2.4/en/cluster-provisioning/imported-clusters/) using Rancher, you cannot manage individual cluster nodes. All management of imported cluster nodes must take place outside of Rancher.
# Managing and Editing Individual Nodes
Editing a node lets you:
* Change its name
* Change its description
* Add [labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/)
* Add/Remove [taints](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/)
To manage individual nodes, browse to the cluster that you want to manage and then select **Nodes** from the main menu. You can open the options menu for a node by clicking its **&#8942;** icon (**...**).
# Viewing a Node in the Rancher API
Select this option to view the node's [API endpoints]({{< baseurl >}}/rancher/v2.0-v2.4/en/api/).
# Deleting a Node
Use **Delete** to remove defective nodes from the cloud provider.
When you delete a defective node, Rancher can automatically replace it with an identically provisioned node if the node is in a node pool and [node auto-replace is enabled.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/#about-node-auto-replace)
>**Tip:** If your cluster is hosted by an infrastructure provider, and you want to scale your cluster down instead of deleting a defective node, [scale down](#scaling-nodes) rather than delete.
# Scaling Nodes
For nodes hosted by an infrastructure provider, you can scale the number of nodes in each [node pool]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/#node-pools) by using the scale controls. This option isn't available for other cluster types.
# SSH into a Node Hosted by an Infrastructure Provider
For [nodes hosted by an infrastructure provider]({{< baseurl >}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/), you have the option of downloading its SSH key so that you can connect to it remotely from your desktop.
1. From the cluster hosted by an infrastructure provider, select **Nodes** from the main menu.
1. Find the node that you want to remote into. Select **&#8942; > Download Keys**.
**Step Result:** A ZIP file containing files used for SSH is downloaded.
1. Extract the ZIP file to any location.
1. Open Terminal. Change your location to the extracted ZIP file.
1. Enter the following command:
```
ssh -i id_rsa root@<IP_OF_HOST>
```
# Cordoning a Node
_Cordoning_ a node marks it as unschedulable. This feature is useful for performing short tasks on the node during small maintenance windows, like reboots, upgrades, or decommissions. When you're done, power back on and make the node schedulable again by uncordoning it.
# Draining a Node
_Draining_ is the process of first cordoning the node, and then evicting all its pods. This feature is useful for performing node maintenance (like kernel upgrades or hardware maintenance). It prevents new pods from deploying to the node while redistributing existing pods so that users don't experience service interruption.
- For pods with a replica set, the pod is replaced by a new pod that will be scheduled to a new node. Additionally, if the pod is part of a service, then clients will automatically be redirected to the new pod.
- For pods with no replica set, you need to bring up a new copy of the pod, and assuming it is not part of a service, redirect clients to it.
You can drain nodes that are in either a `cordoned` or `active` state. When you drain a node, the node is cordoned, the node is evaluated against the conditions it must meet to be drained, and then (if it meets those conditions) its pods are evicted.
However, you can override the draining conditions when you initiate the drain. You're also given an opportunity to set a grace period and a timeout value.
### Aggressive and Safe Draining Options
The node draining options are different based on your version of Rancher.
{{% tabs %}}
{{% tab "Rancher v2.2.x+" %}}
There are two drain modes: aggressive and safe.
- **Aggressive Mode**
In this mode, pods won't get rescheduled to a new node, even if they do not have a controller. Kubernetes expects you to have your own logic that handles the deletion of these pods.
Kubernetes also expects the implementation to decide what to do with pods using emptyDir. If a pod uses emptyDir to store local data, you might not be able to safely delete it, since the data in the emptyDir will be deleted once the pod is removed from the node. Choosing aggressive mode will delete these pods.
- **Safe Mode**
If a node has standalone pods or ephemeral data it will be cordoned but not drained.
{{% /tab %}}
{{% tab "Rancher before v2.2.x" %}}
The following list describes each drain option:
- **Even if there are pods not managed by a ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet**
These types of pods won't get rescheduled to a new node, since they do not have a controller. Kubernetes expects you to have your own logic that handles the deletion of these pods. Kubernetes forces you to choose this option (which will delete/evict these pods) or drain won't proceed.
- **Even if there are DaemonSet-managed pods**
Similar to above, if you have any daemonsets, drain would proceed only if this option is selected. Even when this option is on, pods won't be deleted since they'll immediately be replaced. On startup, Rancher currently has a few daemonsets running by default in the system, so this option is turned on by default.
- **Even if there are pods using emptyDir**
If a pod uses emptyDir to store local data, you might not be able to safely delete it, since the data in the emptyDir will be deleted once the pod is removed from the node. Similar to the first option, Kubernetes expects the implementation to decide what to do with these pods. Choosing this option will delete these pods.
{{% /tab %}}
{{% /tabs %}}
### Grace Period
The timeout given to each pod for cleaning things up, so they have a chance to exit gracefully. For example, pods might need to finish any outstanding requests, roll back transactions, or save state to external storage. If negative, the default value specified in the pod will be used.
### Timeout
The amount of time drain should continue to wait before giving up.
>**Kubernetes Known Issue:** The [timeout setting](https://github.com/kubernetes/kubernetes/pull/64378) was not enforced while draining a node before Kubernetes 1.12.
### Drained and Cordoned State
If there's any error related to user input, the node enters a `cordoned` state because the drain failed. You can either correct the input and attempt to drain the node again, or you can abort by uncordoning the node.
If the drain continues without error, the node enters a `draining` state. You'll have the option to stop the drain when the node is in this state, which will stop the drain process and change the node's state to `cordoned`.
Once drain successfully completes, the node will be in a state of `drained`. You can then power off or delete the node.
>**Want to know more about cordon and drain?** See the [Kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/cluster-management/#maintenance-on-a-node).
# Labeling a Node to be Ignored by Rancher
_Available as of 2.3.3_
Some solutions, such as F5's BIG-IP integration, may require creating a node that is never registered to a cluster.
Since the node will never finish registering, it will always be shown as unhealthy in the Rancher UI.
In that case, you may want to label the node to be ignored by Rancher so that Rancher only shows nodes as unhealthy when they are actually failing.
You can label nodes to be ignored by using a setting in the Rancher UI, or by using `kubectl`.
> **Note:** There is an [open issue](https://github.com/rancher/rancher/issues/24172) in which nodes labeled to be ignored can get stuck in an updating state.
### Labeling Nodes to be Ignored with the Rancher UI
To add a node that is ignored by Rancher,
1. From the **Global** view, click the **Settings** tab.
1. Go to the `ignore-node-name` setting and click **&#8942; > Edit.**
1. Enter a name that Rancher will use to ignore nodes. All nodes with this name will be ignored.
1. Click **Save.**
**Result:** Rancher will not wait to register nodes with this name. In the UI, the node will be displayed with a grayed-out status. The node is still part of the cluster and can be listed with `kubectl`.
If the setting is changed afterward, the ignored nodes will continue to be hidden.
### Labeling Nodes to be Ignored with kubectl
To add a node that will be ignored by Rancher, use `kubectl` to create a node that has the following label:
```
cattle.rancher.io/node-status: ignore
```
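If the node object already exists in the cluster, the same label can also be applied with a standard `kubectl` command, for example (where `<NODE_NAME>` is the node you want Rancher to ignore):
```
kubectl label nodes <NODE_NAME> cattle.rancher.io/node-status=ignore
```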
**Result:** If you add the node to a cluster, Rancher will not attempt to sync with this node. The node can still be part of the cluster and can be listed with `kubectl`.
If the label is added before the node is added to the cluster, the node will not be shown in the Rancher UI.
If the label is added after the node is added to a Rancher cluster, the node will not be removed from the UI.
If you delete the node from the Rancher server using the Rancher UI or API, the node will not be removed from the cluster if the `nodeName` is listed in the Rancher settings under `ignore-node-name`.
@@ -0,0 +1,30 @@
---
title: Adding a Pod Security Policy
weight: 80
---
> **Prerequisite:** The options below are available only for clusters that are [launched using RKE.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/)
When your cluster is running pods with security-sensitive configurations, assign it a [pod security policy]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/pod-security-policies/), which is a set of rules that monitors the conditions and settings in your pods. If a pod doesn't meet the rules specified in your policy, the policy stops it from running.
You can assign a pod security policy when you provision a cluster. However, if you need to relax or restrict security for your pods later, you can update the policy while editing your cluster.
1. From the **Global** view, find the cluster to which you want to apply a pod security policy. Select **&#8942; > Edit**.
2. Expand **Cluster Options**.
3. From **Pod Security Policy Support**, select **Enabled**.
>**Note:** This option is only available for clusters [provisioned by RKE]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/).
4. From the **Default Pod Security Policy** drop-down, select the policy you want to apply to the cluster.
Rancher ships with [policies]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/pod-security-policies/#default-pod-security-policies) of `restricted` and `unrestricted`, although you can [create custom policies]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/pod-security-policies/#default-pod-security-policies) as well.
5. Click **Save**.
**Result:** The pod security policy is applied to the cluster and any projects within the cluster.
>**Note:** Workloads already running before assignment of a pod security policy are grandfathered in. Even if they don't meet your pod security policy, workloads running before assignment of the policy continue to run.
>
>To check if a running workload passes your pod security policy, clone or upgrade it.
@@ -0,0 +1,203 @@
---
title: Projects and Kubernetes Namespaces with Rancher
description: Rancher Projects ease the administrative burden of your cluster and support multi-tenancy. Learn to create projects and divide projects into Kubernetes namespaces
weight: 2032
aliases:
- /rancher/v2.0-v2.4/en/concepts/projects/
- /rancher/v2.0-v2.4/en/tasks/projects/
- /rancher/v2.0-v2.4/en/tasks/projects/create-project/
- /rancher/v2.0-v2.4/en/tasks/projects/create-project/
---
A namespace is a Kubernetes concept that allows a virtual cluster within a cluster, which is useful for dividing the cluster into separate "virtual clusters" that each have their own access control and resource quotas.
A project is a group of namespaces, and it is a concept introduced by Rancher. Projects allow you to manage multiple namespaces as a group and perform Kubernetes operations in them. You can use projects to support multi-tenancy, so that a team can access a project within a cluster without having access to other projects in the same cluster.
This section describes how projects and namespaces work with Rancher. It covers the following topics:
- [About namespaces](#about-namespaces)
- [About projects](#about-projects)
- [The cluster's default project](#the-cluster-s-default-project)
- [The system project](#the-system-project)
- [Project authorization](#project-authorization)
- [Pod security policies](#pod-security-policies)
- [Creating projects](#creating-projects)
- [Switching between clusters and projects](#switching-between-clusters-and-projects)
# About Namespaces
A namespace is a concept introduced by Kubernetes. According to the [official Kubernetes documentation on namespaces,](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
> Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces. [...] Namespaces are intended for use in environments with many users spread across multiple teams, or projects. For clusters with a few to tens of users, you should not need to create or think about namespaces at all.
Namespaces provide the following functionality:
- **Providing a scope for names:** Names of resources need to be unique within a namespace, but not across namespaces. Namespaces cannot be nested inside one another, and each Kubernetes resource can only be in one namespace.
- **Resource quotas:** Namespaces provide a way to divide cluster resources between multiple users.
You can assign resources at the project level so that each namespace in the project can use them. You can also bypass this inheritance by assigning resources explicitly to a namespace.
You can assign the following resources directly to namespaces:
- [Workloads]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/workloads/)
- [Load Balancers/Ingress]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/load-balancers-and-ingress/)
- [Service Discovery Records]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/service-discovery/)
- [Persistent Volume Claims]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/volumes-and-storage/persistent-volume-claims/)
- [Certificates]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/certificates/)
- [ConfigMaps]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/configmaps/)
- [Registries]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/registries/)
- [Secrets]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/secrets/)
To manage permissions in a vanilla Kubernetes cluster, cluster admins configure role-based access policies for each namespace. With Rancher, user permissions are assigned on the project level instead, and permissions are automatically inherited by any namespace owned by the particular project.
For more information on creating and moving namespaces, see [Namespaces]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/namespaces/).
### Role-based access control issues with namespaces and kubectl
Because projects are a concept introduced by Rancher, kubectl does not have the capability to restrict the creation of namespaces to a project the creator has access to.
This means that when standard users with project-scoped permissions create a namespace with `kubectl`, it may be unusable because `kubectl` doesn't require the new namespace to be scoped within a certain project.
If your permissions are restricted to the project level, it is better to [create a namespace through Rancher]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/namespaces/) to ensure that you will have permission to access the namespace.
If a standard user is a project owner, the user will be able to create namespaces within that project. The Rancher UI will prevent that user from creating namespaces outside the scope of the projects they have access to.
# About Projects
In terms of hierarchy:
- Clusters contain projects
- Projects contain namespaces
You can use projects to support multi-tenancy, so that a team can access a project within a cluster without having access to other projects in the same cluster.
In the base version of Kubernetes, features like role-based access rights or cluster resources are assigned to individual namespaces. A project allows you to save time by giving an individual or a team access to multiple namespaces simultaneously.
You can use projects to perform actions such as:
- Assign users to a group of namespaces (i.e., [project membership]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/projects-and-namespaces/project-members)).
- Assign users specific roles in a project. A role can be owner, member, read-only, or [custom]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/default-custom-roles/).
- Assign resources to the project.
- Assign Pod Security Policies.
When you create a cluster, two projects are automatically created within it:
- [Default Project](#the-cluster-s-default-project)
- [System Project](#the-system-project)
### The Cluster's Default Project
When you provision a cluster with Rancher, it automatically creates a `default` project for the cluster. This is a project you can use to get started with your cluster, but you can always delete it and replace it with projects that have more descriptive names.
If you don't have a need for more than the default namespace, you also do not need more than the **Default** project in Rancher.
If you require another level of organization beyond the **Default** project, you can create more projects in Rancher to isolate namespaces, applications and resources.
### The System Project
_Available as of v2.0.7_
When troubleshooting, you can view the `system` project to check if important namespaces in the Kubernetes system are working properly. This easily accessible project saves you from troubleshooting individual system namespace containers.
To open it, open the **Global** menu, and then select the `system` project for your cluster.
The `system` project:
- Is automatically created when you provision a cluster.
- Lists all namespaces that exist in `v3/settings/system-namespaces`, if they exist.
- Allows you to add more namespaces or move its namespaces to other projects.
- Cannot be deleted because it's required for cluster operations.
>**Note:** In clusters where both:
>
> - The Canal network plug-in is in use.
> - The Project Network Isolation option is enabled.
>
>The `system` project overrides the Project Network Isolation option so that it can communicate with other projects, collect logs, and check health.
# Project Authorization
Standard users are only authorized for project access in two situations:
- An administrator, cluster owner or cluster member explicitly adds the standard user to the project's **Members** tab.
- Standard users can access projects that they create themselves.
# Pod Security Policies
Rancher extends Kubernetes to allow the application of [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) at the [project level]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/pod-security-policies) in addition to the [cluster level.](../pod-security-policy) However, as a best practice, we recommend applying Pod Security Policies at the cluster level.
# Creating Projects
This section describes how to create a new project with a name and with optional pod security policy, members, and resource quotas.
1. [Name a new project.](#1-name-a-new-project)
2. [Optional: Select a pod security policy.](#2-optional-select-a-pod-security-policy)
3. [Recommended: Add project members.](#3-recommended-add-project-members)
4. [Optional: Add resource quotas.](#4-optional-add-resource-quotas)
### 1. Name a New Project
1. From the **Global** view, choose **Clusters** from the main menu. From the **Clusters** page, open the cluster from which you want to create a project.
1. From the main menu, choose **Projects/Namespaces**. Then click **Add Project**.
1. Enter a **Project Name**.
### 2. Optional: Select a Pod Security Policy
This option is only available if you've already created a Pod Security Policy. For instruction, see [Creating Pod Security Policies]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/pod-security-policies/).
Assigning a PSP to a project will:
- Override the cluster's default PSP.
- Apply the PSP to the project.
- Apply the PSP to any namespaces you add to the project later.
### 3. Recommended: Add Project Members
Use the **Members** section to provide other users with project access and roles.
By default, your user is added as the project `Owner`.
>**Notes on Permissions:**
>
>- Users assigned the `Owner` or `Member` role for a project automatically inherit the `namespace creation` role. However, this role is a [Kubernetes ClusterRole](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#role-and-clusterrole), meaning its scope extends to all projects in the cluster. Therefore, users explicitly assigned the `Owner` or `Member` role for a project can create namespaces in other projects they're assigned to, even with only the `Read Only` role assigned.
>- Choose `Custom` to create a custom role on the fly: [Custom Project Roles]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#custom-project-roles).
To add members:
1. Click **Add Member**.
1. From the **Name** combo box, search for a user or group that you want to assign project access. Note: You can only search for groups if external authentication is enabled.
1. From the **Role** drop-down, choose a role. For more information, refer to the [documentation on project roles.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/)
### 4. Optional: Add Resource Quotas
_Available as of v2.1.0_
Resource quotas limit the resources that a project (and its namespaces) can consume. For more information, see [Resource Quotas]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/projects-and-namespaces/resource-quotas).
To add a resource quota,
1. Click **Add Quota**.
1. Select a Resource Type. For more information, see [Resource Quotas.]({{<baseurl>}}/rancher/v2.0-v2.4/en/k8s-in-rancher/projects-and-namespaces/resource-quotas/)
1. Enter values for the **Project Limit** and the **Namespace Default Limit**.
1. **Optional:** Specify a **Container Default Resource Limit**, which will be applied to every container started in the project. This parameter is recommended if you have CPU or memory limits set by the resource quota. It can be overridden at the individual namespace or container level. For more information, see [Container Default Resource Limit]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/resource-quotas/). Note: This option is available as of v2.2.0.
1. Click **Create**.
**Result:** Your project is created. You can view it from the cluster's **Projects/Namespaces** view.
| Field | Description |
| ----------------------- | -------------------------------------------------------------------------------------------------------- |
| Project Limit | The overall resource limit for the project. |
| Namespace Default Limit | The default resource limit available for each namespace. This limit is propagated to each namespace in the project when created. The combined limit of all project namespaces shouldn't exceed the project limit. |
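For example (an illustrative scenario, not a default), if the project limit for CPU is 2,000 milli CPUs and the namespace default limit is 500 milli CPUs, each new namespace in the project starts with a quota of 500 milli CPUs, and the combined namespace quotas should stay within the 2,000 milli CPU project limit, which allows at most four namespaces at the default limit.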
# Switching between Clusters and Projects
To switch between clusters and projects, use the **Global** drop-down available in the main menu.
![Global Menu]({{<baseurl>}}/img/rancher/global-menu.png)
Alternatively, you can switch between projects and clusters using the main menu.
- To switch between clusters, open the **Global** view and select **Clusters** from the main menu. Then open a cluster.
- To switch between projects, open a cluster, and then select **Projects/Namespaces** from the main menu. Select the link for the project that you want to open.
@@ -0,0 +1,113 @@
---
title: Restoring a Cluster from Backup
weight: 2050
---
_Available as of v2.2.0_
etcd backup and recovery for [Rancher launched Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/) can be easily performed. Snapshots of the etcd database are taken and saved either locally onto the etcd nodes or to an S3-compatible target. The advantage of configuring S3 is that if all etcd nodes are lost, your snapshot is saved remotely and can be used to restore the cluster.
Rancher recommends enabling the [ability to set up recurring snapshots of etcd]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/backing-up-etcd/#configuring-recurring-snapshots), but [one-time snapshots]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/backing-up-etcd/#one-time-snapshots) can easily be taken as well. Rancher allows restore from [saved snapshots](#restoring-a-cluster-from-a-snapshot) or if you don't have any snapshots, you can still [restore etcd](#recovering-etcd-without-a-snapshot).
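As a rough sketch only (the interval, retention, credentials, and bucket details below are placeholders, and the same settings can be configured through the Rancher UI instead), recurring snapshots to S3 for an RKE cluster are driven by the etcd `backup_config` in the cluster configuration:
```yaml
services:
  etcd:
    backup_config:
      enabled: true
      interval_hours: 12      # take a snapshot every 12 hours
      retention: 6            # keep the 6 most recent snapshots
      s3backupconfig:
        access_key: "<access-key>"
        secret_key: "<secret-key>"
        bucket_name: "<bucket-name>"
        region: "<region>"
        endpoint: "s3.amazonaws.com"
```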
As of Rancher v2.4.0, clusters can also be restored to a prior Kubernetes version and cluster configuration.
This section covers the following topics:
- [Viewing Available Snapshots](#viewing-available-snapshots)
- [Restoring a Cluster from a Snapshot](#restoring-a-cluster-from-a-snapshot)
- [Recovering etcd without a Snapshot](#recovering-etcd-without-a-snapshot)
- [Enabling snapshot features for clusters created before Rancher v2.2.0](#enabling-snapshot-features-for-clusters-created-before-rancher-v2-2-0)
## Viewing Available Snapshots
You can view the list of all available snapshots for the cluster.
1. In the **Global** view, navigate to the cluster whose snapshots you want to view.
2. Click **Tools > Snapshots** from the navigation bar to view the list of saved snapshots. These snapshots include a timestamp of when they were created.
## Restoring a Cluster from a Snapshot
If your Kubernetes cluster is broken, you can restore the cluster from a snapshot.
Restores changed in Rancher v2.4.0.
{{% tabs %}}
{{% tab "Rancher v2.4.0+" %}}
Snapshots are composed of the cluster data in etcd, the Kubernetes version, and the cluster configuration in the `cluster.yml`. These components allow you to select from the following options when restoring a cluster from a snapshot:
- **Restore just the etcd contents:** This restore is similar to restoring to snapshots in Rancher before v2.4.0.
- **Restore etcd and Kubernetes version:** This option should be used if a Kubernetes upgrade is the reason that your cluster is failing, and you haven't made any cluster configuration changes.
- **Restore etcd, Kubernetes versions and cluster configuration:** This option should be used if you changed both the Kubernetes version and cluster configuration when upgrading.
When rolling back to a prior Kubernetes version, the [upgrade strategy options]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/upgrading-kubernetes/#configuring-the-upgrade-strategy) are ignored. Worker nodes are not cordoned or drained before being reverted to the older Kubernetes version, so that an unhealthy cluster can be more quickly restored to a healthy state.
> **Prerequisite:** To restore snapshots from S3, the cluster needs to be configured to [take recurring snapshots on S3.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/backing-up-etcd/#configuring-recurring-snapshots)
1. In the **Global** view, navigate to the cluster that you want to restore from a snapshot.
2. Click the **&#8942; > Restore Snapshot**.
3. Select the snapshot that you want to use for restoring your cluster from the dropdown of available snapshots.
4. In the **Restoration Type** field, choose one of the restore options described above.
5. Click **Save**.
**Result:** The cluster will go into `updating` state and the process of restoring the `etcd` nodes from the snapshot will start. The cluster is restored when it returns to an `active` state.
{{% /tab %}}
{{% tab "Rancher before v2.4.0" %}}
> **Prerequisites:**
>
> - Make sure your etcd nodes are healthy. If you are restoring a cluster with unavailable etcd nodes, it's recommended that all etcd nodes are removed from Rancher before attempting to restore. For clusters in which Rancher used node pools to provision [nodes in an infrastructure provider]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/node-pools/), new etcd nodes will automatically be created. For [custom clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/custom-nodes/), please ensure that you add new etcd nodes to the cluster.
> - To restore snapshots from S3, the cluster needs to be configured to [take recurring snapshots on S3.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/backing-up-etcd/#configuring-recurring-snapshots)
1. In the **Global** view, navigate to the cluster that you want to restore from a snapshot.
2. Click the **&#8942; > Restore Snapshot**.
3. Select the snapshot that you want to use for restoring your cluster from the dropdown of available snapshots.
4. Click **Save**.
**Result:** The cluster will go into `updating` state and the process of restoring the `etcd` nodes from the snapshot will start. The cluster is restored when it returns to an `active` state.
{{% /tab %}}
{{% /tabs %}}
## Recovering etcd without a Snapshot
If the group of etcd nodes loses quorum, the Kubernetes cluster will report a failure because no operations, e.g. deploying workloads, can be executed in the Kubernetes cluster. The cluster should have three etcd nodes to prevent a loss of quorum. If you want to recover your set of etcd nodes, follow these instructions:
1. Keep only one etcd node in the cluster by removing all other etcd nodes.
2. On the single remaining etcd node, run the following command:
```
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock assaflavie/runlike etcd
```
This command outputs the command that was used to launch etcd. Save this command for use later.
3. Stop the currently running etcd container and rename it to `etcd-old`.
```
$ docker stop etcd
$ docker rename etcd etcd-old
```
4. Take the saved command from Step 2 and revise it:
- If you originally had more than 1 etcd node, then you need to change `--initial-cluster` to only contain the node that remains.
- Add `--force-new-cluster` to the end of the command.
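For illustration only, the revised command might look like the following sketch. The image tag, volume mounts, node name, and IP address below are hypothetical placeholders; everything except the `--initial-cluster` change and the added `--force-new-cluster` flag must come from your own `runlike` output.
```
$ docker run --name=etcd --restart=unless-stopped --net=host -d \
    -v /var/lib/etcd:/var/lib/rancher/etcd/:z \
    -v /etc/kubernetes:/etc/kubernetes:z \
    rancher/coreos-etcd:v3.4.3-rancher1 \
    /usr/local/bin/etcd \
    --name=etcd-node1 \
    --initial-cluster=etcd-node1=https://172.16.0.10:2380 \
    --initial-advertise-peer-urls=https://172.16.0.10:2380 \
    --listen-peer-urls=https://0.0.0.0:2380 \
    --advertise-client-urls=https://172.16.0.10:2379 \
    --listen-client-urls=https://0.0.0.0:2379 \
    --force-new-cluster
```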
5. Run the revised command.
6. After the single node is up and running, Rancher recommends adding additional etcd nodes to your cluster. If you have a [custom cluster]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/custom-nodes) and you want to reuse an old node, you must [clean up the nodes]({{<baseurl>}}/rancher/v2.0-v2.4/en/faq/cleaning-cluster-nodes/) before attempting to add them back into a cluster.
# Enabling Snapshot Features for Clusters Created Before Rancher v2.2.0
If you have any Rancher launched Kubernetes clusters that were created before v2.2.0, after upgrading Rancher, you must [edit the cluster]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/editing-clusters/) and _save_ it, in order to enable the updated snapshot features. Even if you were already creating snapshots before v2.2.0, you must do this step as the older snapshots will not be available to use to [back up and restore etcd through the UI]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/restoring-etcd/).
@@ -0,0 +1,71 @@
---
title: Tools for Logging, Monitoring, and More
weight: 2033
aliases:
- /rancher/v2.0-v2.4/en/tools/notifiers-and-alerts/
---
Rancher contains a variety of tools that aren't included in Kubernetes to assist in your DevOps operations. Rancher can integrate with external services to help your clusters run more efficiently. Tools are divided into the following categories:
<!-- TOC -->
- [Logging](#logging)
- [Monitoring](#monitoring)
- [Alerts](#alerts)
- [Notifiers](#notifiers)
- [Istio](#istio)
- [OPA Gatekeeper](#opa-gatekeeper)
- [CIS Scans](#cis-scans)
<!-- /TOC -->
# Logging
Logging is helpful because it allows you to:
- Capture and analyze the state of your cluster
- Look for trends in your environment
- Save your logs to a safe location outside of your cluster
- Stay informed of events like a container crashing, a pod eviction, or a node dying
- More easily debug and troubleshoot problems
Rancher can integrate with Elasticsearch, Splunk, Kafka, syslog, and Fluentd.
Refer to the logging documentation [here.](./cluster-logging)
# Monitoring
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution.
For details, refer to [Monitoring.](./cluster-monitoring)
# Alerts
After monitoring is enabled, you can set up alerts and notifiers that provide the mechanism to receive them.
Alerts are rules that trigger notifications. Before you can receive alerts, you must configure one or more notifiers in Rancher. The scope for alerts can be set at either the cluster or project level.
For details, refer to [Alerts.](./cluster-alerts)
# Notifiers
Notifiers are services that inform you of alert events. You can configure notifiers to send alert notifications to staff best suited to take corrective action. Notifications can be sent with Slack, email, PagerDuty, WeChat, and webhooks.
For details, refer to [Notifiers.](./notifiers)
# Istio
_Available as of v2.3_
[Istio](https://istio.io/) is an open-source tool that makes it easier for DevOps teams to observe, control, troubleshoot, and secure the traffic within a complex network of microservices.
Refer to the Istio documentation [here.](./istio)
# OPA Gatekeeper
[OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper) is an open-source project that provides integration between OPA and Kubernetes to provide policy control via admission controller webhooks. For details on how to enable Gatekeeper in Rancher, refer to the [OPA Gatekeeper section.](./opa-gatekeeper)
# CIS Scans
Rancher can run a security scan to check whether Kubernetes is deployed according to security best practices as defined in the CIS Kubernetes Benchmark.
Refer to the CIS scan documentation [here.](./cis-scans)
@@ -0,0 +1,155 @@
---
title: CIS Scans
weight: 18
aliases:
- /rancher/v2.0-v2.4/en/cis-scans/legacy
- /rancher/v2.0-v2.4/en/cis-scans
---
_Available as of v2.4.0_
- [Prerequisites](#prerequisites)
- [Running a scan](#running-a-scan)
- [Scheduling recurring scans](#scheduling-recurring-scans)
- [Skipping tests](#skipping-tests)
- [Setting alerts](#setting-alerts)
- [Deleting a report](#deleting-a-report)
- [Downloading a report](#downloading-a-report)
- [List of skipped and not applicable tests](#list-of-skipped-and-not-applicable-tests)
# Prerequisites
To run security scans on a cluster and access the generated reports, you must be an [Administrator]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) or [Cluster Owner.]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/)
Rancher can only run security scans on clusters that were created with RKE, which includes custom clusters and clusters that Rancher created in an infrastructure provider such as Amazon EC2 or GCE. Imported clusters and clusters in hosted Kubernetes providers can't be scanned by Rancher.
The security scan cannot run in a cluster that has Windows nodes.
You will only be able to see the CIS scan reports for clusters that you have access to.
# Running a Scan
1. From the cluster view in Rancher, click **Tools > CIS Scans.**
1. Click **Run Scan.**
1. Choose a CIS scan profile.
**Result:** A report is generated and displayed in the **CIS Scans** page. To see details of the report, click the report's name.
# Scheduling Recurring Scans
Recurring scans can be scheduled to run on any RKE Kubernetes cluster.
To enable recurring scans, edit the advanced options in the cluster configuration during cluster creation or after the cluster has been created.
To schedule scans for an existing cluster:
1. Go to the cluster view in Rancher.
1. Click **Tools > CIS Scans.**
1. Click **Add Schedule.** This takes you to the section of the cluster editing page that is applicable to configuring a schedule for CIS scans. (This section can also be reached by going to the cluster view, clicking **&#8942; > Edit,** and going to the **Advanced Options.**)
1. In the **CIS Scan Enabled** field, click **Yes.**
1. In the **CIS Scan Profile** field, choose a **Permissive** or **Hardened** profile. The corresponding CIS Benchmark version is included in the profile name. Note: Any skipped tests [defined in a separate ConfigMap](#skipping-tests) will be skipped regardless of whether a **Permissive** or **Hardened** profile is selected. When selecting the permissive profile, you can see which tests were skipped by Rancher (tests that are skipped by default for RKE clusters) and which tests were skipped by a Rancher user. In the hardened test profile, the only skipped tests are those skipped by users.
1. In the **CIS Scan Interval (cron)** field, enter a [cron expression](https://en.wikipedia.org/wiki/Cron#CRON_expression) to define how often the cluster will be scanned.
1. In the **CIS Scan Report Retention** field, enter the number of past reports that should be kept.
**Result:** The security scan will run and generate reports at the scheduled intervals.
The test schedule can be configured in the `cluster.yml`:
```yaml
scheduled_cluster_scan:
    enabled: true
    scan_config:
        cis_scan_config:
            override_benchmark_version: rke-cis-1.4
            profile: permissive
    schedule_config:
        cron_schedule: 0 0 * * *
        retention: 24
```
# Skipping Tests
You can define a set of tests that will be skipped by the CIS scan when the next report is generated.
These tests will be skipped for subsequent CIS scans, including both manually triggered and scheduled scans, and the tests will be skipped with any profile.
The skipped tests will be listed alongside the test profile name in the cluster configuration options when a test profile is selected for a recurring cluster scan. The skipped tests will also be shown every time a scan is triggered manually from the Rancher UI by clicking **Run Scan.** The display of skipped tests allows you to know ahead of time which tests will be run in each scan.
To skip tests, you will need to define them in a Kubernetes ConfigMap resource. Each skipped CIS scan test is listed in the ConfigMap alongside the version of the CIS benchmark that the test belongs to.
To skip tests by editing a ConfigMap resource,
1. Create a `security-scan` namespace.
1. Create a ConfigMap named `security-scan-cfg`.
1. Enter the skip information under the key `config.json` in the following format:
```json
{
"skip": {
"rke-cis-1.4": [
"1.1.1",
"1.2.2"
]
}
}
```
In the example above, the CIS benchmark version is specified alongside the tests to be skipped for that version.
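As a minimal sketch (assuming you are creating the resources with `kubectl` rather than the UI, and reusing the benchmark version and test numbers from the example above), the namespace and ConfigMap could look like this:
```yaml
# Namespace expected by the CIS scan for the skip configuration
apiVersion: v1
kind: Namespace
metadata:
  name: security-scan
---
# ConfigMap holding the skipped tests under the config.json key
apiVersion: v1
kind: ConfigMap
metadata:
  name: security-scan-cfg
  namespace: security-scan
data:
  config.json: |
    {
      "skip": {
        "rke-cis-1.4": [
          "1.1.1",
          "1.2.2"
        ]
      }
    }
```
Apply it with `kubectl apply -f <filename>` using a kubeconfig for the cluster that you want to scan.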
**Result:** These tests will be skipped on subsequent scans that use the defined CIS Benchmark version.
# Setting Alerts
Rancher provides a set of alerts for cluster scans, which are not configured with notifiers by default:
- A manual cluster scan was completed
- A manual cluster scan has failures
- A scheduled cluster scan was completed
- A scheduled cluster scan has failures
> **Prerequisite:** You need to configure a [notifier]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/notifiers/) before configuring, sending, or receiving alerts.
To activate an existing alert for a CIS scan result,
1. From the cluster view in Rancher, click **Tools > Alerts.**
1. Go to the section called **A set of alerts for cluster scans.**
1. Go to the alert you want to activate and click **&#8942; > Activate.**
1. Go to the alert rule group **A set of alerts for cluster scans** and click **&#8942; > Edit.**
1. Scroll down to the **Alert** section. In the **To** field, select the notifier that you would like to use for sending alert notifications.
1. Optional: To limit the frequency of the notifications, click on **Show advanced options** and configure the time interval of the alerts.
1. Click **Save.**
**Result:** Notifications will be triggered when a scan is run on a cluster and the conditions of the active alerts are satisfied.
To create a new alert,
1. Go to the cluster view and click **Tools > CIS Scans.**
1. Click **Add Alert.**
1. Fill out the form.
1. Enter a name for the alert.
1. In the **Is** field, set the alert to be triggered when a scan is completed or when a scan has a failure.
1. In the **Send a** field, set the alert as a **Critical,** **Warning,** or **Info** alert level.
1. Choose a [notifier]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/notifiers/) for the alert.
**Result:** The alert is created and activated. Notifications will be triggered when a scan is run on a cluster and the conditions of the active alerts are satisfied.
For more information about alerts, refer to [this page.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/alerts/)
# Deleting a Report
1. From the cluster view in Rancher, click **Tools > CIS Scans.**
1. Go to the report that should be deleted.
1. Click the **&#8942; > Delete.**
1. Click **Delete.**
# Downloading a Report
1. From the cluster view in Rancher, click **Tools > CIS Scans.**
1. Go to the report that you want to download. Click **&#8942; > Download.**
**Result:** The report is downloaded in CSV format.
# List of Skipped and Not Applicable Tests
For a list of skipped and not applicable tests, refer to <a href="{{<baseurl>}}/rancher/v2.0-v2.4/en/cis-scans/legacy/skipped-tests" target="_blank">this page.</a>
@@ -0,0 +1,109 @@
---
title: Skipped and Not Applicable Tests
weight: 1
aliases:
- /rancher/v2.0-v2.4/en/cis-scans/legacy/skipped-tests
- /rancher/v2.0-v2.4/en/cis-scans/v2.4/skipped-tests
- /rancher/v2.0-v2.4/en/cis-scans/skipped-tests
---
This section lists the tests that are skipped in the permissive test profile for RKE.
All the tests that are skipped and not applicable on this page will be counted as Not Applicable in the generated report. The skipped test count will only mention the user-defined skipped tests. This allows user-skipped tests to be distinguished from the tests that are skipped by default in the RKE permissive test profile.
- [CIS Benchmark v1.5](#cis-benchmark-v1-5)
- [CIS Benchmark v1.4](#cis-benchmark-v1-4)
# CIS Benchmark v1.5
### CIS Benchmark v1.5 Skipped Tests
| Number | Description | Reason for Skipping |
| ---------- | ------------- | --------- |
| 1.1.12 | Ensure that the etcd data directory ownership is set to etcd:etcd (Scored) | A system service account is required for etcd data directory ownership. Refer to Rancher's hardening guide for more details on how to configure this ownership. |
| 1.2.6 | Ensure that the --kubelet-certificate-authority argument is set as appropriate (Scored) | When generating serving certificates, functionality could break in conjunction with hostname overrides which are required for certain cloud providers. |
| 1.2.16 | Ensure that the admission control plugin PodSecurityPolicy is set (Scored) | Enabling Pod Security Policy can cause applications to unexpectedly fail. |
| 1.2.33 | Ensure that the --encryption-provider-config argument is set as appropriate (Not Scored) | Enabling encryption changes how data can be recovered as data is encrypted. |
| 1.2.34 | Ensure that encryption providers are appropriately configured (Not Scored) | Enabling encryption changes how data can be recovered as data is encrypted. |
| 4.2.6 | Ensure that the --protect-kernel-defaults argument is set to true (Scored) | System level configurations are required before provisioning the cluster in order for this argument to be set to true. |
| 4.2.10 | Ensure that the--tls-cert-file and --tls-private-key-file arguments are set as appropriate (Scored) | When generating serving certificates, functionality could break in conjunction with hostname overrides which are required for certain cloud providers. |
| 5.1.5 | Ensure that default service accounts are not actively used. (Scored) | Kubernetes provides default service accounts to be used. |
| 5.2.2 | Minimize the admission of containers wishing to share the host process ID namespace (Scored) | Enabling Pod Security Policy can cause applications to unexpectedly fail. |
| 5.2.3 | Minimize the admission of containers wishing to share the host IPC namespace (Scored) | Enabling Pod Security Policy can cause applications to unexpectedly fail. |
| 5.2.4 | Minimize the admission of containers wishing to share the host network namespace (Scored) | Enabling Pod Security Policy can cause applications to unexpectedly fail. |
| 5.2.5 | Minimize the admission of containers with allowPrivilegeEscalation (Scored) | Enabling Pod Security Policy can cause applications to unexpectedly fail. |
| 5.3.2 | Ensure that all Namespaces have Network Policies defined (Scored) | Enabling Network Policies can prevent certain applications from communicating with each other. |
| 5.6.4 | The default namespace should not be used (Scored) | Kubernetes provides a default namespace. |
### CIS Benchmark v1.5 Not Applicable Tests
| Number | Description | Reason for being not applicable |
| ---------- | ------------- | --------- |
| 1.1.1 | Ensure that the API server pod specification file permissions are set to 644 or more restrictive (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for kube-apiserver. All configuration is passed in as arguments at container run time. |
| 1.1.2 | Ensure that the API server pod specification file ownership is set to root:root (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for kube-apiserver. All configuration is passed in as arguments at container run time. |
| 1.1.3 | Ensure that the controller manager pod specification file permissions are set to 644 or more restrictive (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for controller-manager. All configuration is passed in as arguments at container run time. |
| 1.1.4 | Ensure that the controller manager pod specification file ownership is set to root:root (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for controller-manager. All configuration is passed in as arguments at container run time. |
| 1.1.5 | Ensure that the scheduler pod specification file permissions are set to 644 or more restrictive (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for scheduler. All configuration is passed in as arguments at container run time. |
| 1.1.6 | Ensure that the scheduler pod specification file ownership is set to root:root (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for scheduler. All configuration is passed in as arguments at container run time. |
| 1.1.7 | Ensure that the etcd pod specification file permissions are set to 644 or more restrictive (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for etcd. All configuration is passed in as arguments at container run time. |
| 1.1.8 | Ensure that the etcd pod specification file ownership is set to root:root (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for etcd. All configuration is passed in as arguments at container run time. |
| 1.1.13 | Ensure that the admin.conf file permissions are set to 644 or more restrictive (Scored) | Clusters provisioned by RKE do not store the default Kubernetes kubeconfig credentials file on the nodes. |
| 1.1.14 | Ensure that the admin.conf file ownership is set to root:root (Scored) | Clusters provisioned by RKE do not store the default Kubernetes kubeconfig credentials file on the nodes. |
| 1.1.15 | Ensure that the scheduler.conf file permissions are set to 644 or more restrictive (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for scheduler. All configuration is passed in as arguments at container run time. |
| 1.1.16 | Ensure that the scheduler.conf file ownership is set to root:root (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for scheduler. All configuration is passed in as arguments at container run time. |
| 1.1.17 | Ensure that the controller-manager.conf file permissions are set to 644 or more restrictive (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for controller-manager. All configuration is passed in as arguments at container run time. |
| 1.1.18 | Ensure that the controller-manager.conf file ownership is set to root:root (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for controller-manager. All configuration is passed in as arguments at container run time. |
| 1.3.6 | Ensure that the RotateKubeletServerCertificate argument is set to true (Scored) | Clusters provisioned by RKE handle certificate rotation directly through RKE. |
| 4.1.1 | Ensure that the kubelet service file permissions are set to 644 or more restrictive (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for the kubelet service. All configuration is passed in as arguments at container run time. |
| 4.1.2 | Ensure that the kubelet service file ownership is set to root:root (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for the kubelet service. All configuration is passed in as arguments at container run time. |
| 4.1.9 | Ensure that the kubelet configuration file has permissions set to 644 or more restrictive (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for the kubelet. All configuration is passed in as arguments at container run time. |
| 4.1.10 | Ensure that the kubelet configuration file ownership is set to root:root (Scored) | Clusters provisioned by RKE don't require or maintain a configuration file for the kubelet. All configuration is passed in as arguments at container run time. |
| 4.2.12 | Ensure that the RotateKubeletServerCertificate argument is set to true (Scored) | Clusters provisioned by RKE handle certificate rotation directly through RKE. |
# CIS Benchmark v1.4
The skipped and not applicable tests for CIS Benchmark v1.4 are as follows:
### CIS Benchmark v1.4 Skipped Tests
Number | Description | Reason for Skipping
---|---|---
1.1.11 | "Ensure that the admission control plugin AlwaysPullImages is set (Scored)" | Enabling AlwaysPullImages can use significant bandwidth.
1.1.21 | "Ensure that the --kubelet-certificate-authority argument is set as appropriate (Scored)" | When generating serving certificates, functionality could break in conjunction with hostname overrides which are required for certain cloud providers.
1.1.24 | "Ensure that the admission control plugin PodSecurityPolicy is set (Scored)" | Enabling Pod Security Policy can cause applications to unexpectedly fail.
1.1.34 | "Ensure that the --encryption-provider-config argument is set as appropriate (Scored)" | Enabling encryption changes how data can be recovered as data is encrypted.
1.1.35 | "Ensure that the encryption provider is set to aescbc (Scored)" | Enabling encryption changes how data can be recovered as data is encrypted.
1.1.36 | "Ensure that the admission control plugin EventRateLimit is set (Scored)" | EventRateLimit needs to be tuned depending on the cluster.
1.2.2 | "Ensure that the --address argument is set to 127.0.0.1 (Scored)" | Adding this argument prevents Rancher's monitoring tool to collect metrics on the scheduler.
1.3.7 | "Ensure that the --address argument is set to 127.0.0.1 (Scored)" | Adding this argument prevents Rancher's monitoring tool to collect metrics on the controller manager.
1.4.12 | "Ensure that the etcd data directory ownership is set to etcd:etcd (Scored)" | A system service account is required for etcd data directory ownership. Refer to Rancher's hardening guide for more details on how to configure this ownership.
1.7.2 | "Do not admit containers wishing to share the host process ID namespace (Scored)" | Enabling Pod Security Policy can cause applications to unexpectedly fail.
1.7.3 | "Do not admit containers wishing to share the host IPC namespace (Scored)" | Enabling Pod Security Policy can cause applications to unexpectedly fail.
1.7.4 | "Do not admit containers wishing to share the host network namespace (Scored)" | Enabling Pod Security Policy can cause applications to unexpectedly fail.
1.7.5 | " Do not admit containers with allowPrivilegeEscalation (Scored)" | Enabling Pod Security Policy can cause applications to unexpectedly fail.
2.1.6 | "Ensure that the --protect-kernel-defaults argument is set to true (Scored)" | System level configurations are required before provisioning the cluster in order for this argument to be set to true.
2.1.10 | "Ensure that the --tls-cert-file and --tls-private-key-file arguments are set as appropriate (Scored)" | When generating serving certificates, functionality could break in conjunction with hostname overrides which are required for certain cloud providers.
### CIS Benchmark v1.4 Not Applicable Tests
Number | Description | Reason for being not applicable
---|---|---
1.1.9 | "Ensure that the --repair-malformed-updates argument is set to false (Scored)" | The argument --repair-malformed-updates has been removed as of Kubernetes version 1.14
1.3.6 | "Ensure that the RotateKubeletServerCertificate argument is set to true" | Cluster provisioned by RKE handles certificate rotation directly through RKE.
1.4.1 | "Ensure that the API server pod specification file permissions are set to 644 or more restrictive (Scored)" | Cluster provisioned by RKE doesn't require or maintain a configuration file for kube-apiserver.
1.4.2 | "Ensure that the API server pod specification file ownership is set to root:root (Scored)" | Cluster provisioned by RKE doesn't require or maintain a configuration file for kube-apiserver.
1.4.3 | "Ensure that the controller manager pod specification file permissions are set to 644 or more restrictive (Scored)" | Cluster provisioned by RKE doesn't require or maintain a configuration file for controller-manager.
1.4.4 | "Ensure that the controller manager pod specification file ownership is set to root:root (Scored)" | Cluster provisioned by RKE doesn't require or maintain a configuration file for controller-manager.
1.4.5 | "Ensure that the scheduler pod specification file permissions are set to 644 or more restrictive (Scored)" | Cluster provisioned by RKE doesn't require or maintain a configuration file for scheduler.
1.4.6 | "Ensure that the scheduler pod specification file ownership is set to root:root (Scored)" | Cluster provisioned by RKE doesn't require or maintain a configuration file for scheduler.
1.4.7 | "Ensure that the etcd pod specification file permissions are set to 644 or more restrictive (Scored)" | Cluster provisioned by RKE doesn't require or maintain a configuration file for etcd.
1.4.8 | "Ensure that the etcd pod specification file ownership is set to root:root (Scored)" | Cluster provisioned by RKE doesn't require or maintain a configuration file for etcd.
1.4.13 | "Ensure that the admin.conf file permissions are set to 644 or more restrictive (Scored)" | Cluster provisioned by RKE does not store the kubernetes default kubeconfig credentials file on the nodes.
1.4.14 | "Ensure that the admin.conf file ownership is set to root:root (Scored)" | Cluster provisioned by RKE does not store the kubernetes default kubeconfig credentials file on the nodes.
2.1.8 | "Ensure that the --hostname-override argument is not set (Scored)" | Clusters provisioned by RKE clusters and most cloud providers require hostnames.
2.1.12 | "Ensure that the --rotate-certificates argument is not set to false (Scored)" | Cluster provisioned by RKE handles certificate rotation directly through RKE.
2.1.13 | "Ensure that the RotateKubeletServerCertificate argument is set to true (Scored)" | Cluster provisioned by RKE handles certificate rotation directly through RKE.
2.2.3 | "Ensure that the kubelet service file permissions are set to 644 or more restrictive (Scored)" | Cluster provisioned by RKE doesnt require or maintain a configuration file for the kubelet service.
2.2.4 | "Ensure that the kubelet service file ownership is set to root:root (Scored)" | Cluster provisioned by RKE doesnt require or maintain a configuration file for the kubelet service.
2.2.9 | "Ensure that the kubelet configuration file ownership is set to root:root (Scored)" | RKE doesnt require or maintain a configuration file for the kubelet.
2.2.10 | "Ensure that the kubelet configuration file has permissions set to 644 or more restrictive (Scored)" | RKE doesnt require or maintain a configuration file for the kubelet.
@@ -0,0 +1,346 @@
---
title: Cluster Alerts
shortTitle: Alerts
weight: 2
aliases:
- /rancher/v2.0-v2.4/en/cluster-admin/tools/alerts
- /rancher/v2.0-v2.4/en/monitoring-alerting/legacy/alerts/cluster-alerts
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-alerts
---
To keep your clusters and applications healthy and driving your organizational productivity forward, you need to stay informed of events occurring in your clusters and projects, both planned and unplanned. When an event occurs, your alert is triggered, and you are sent a notification. You can then, if necessary, follow up with corrective actions.
This section covers the following topics:
- [About Alerts](#about-alerts)
- [Alert Event Examples](#alert-event-examples)
- [Alerts Triggered by Prometheus Queries](#alerts-triggered-by-prometheus-queries)
- [Urgency Levels](#urgency-levels)
- [Scope of Alerts](#scope-of-alerts)
- [Managing Cluster Alerts](#managing-cluster-alerts)
- [Adding Cluster Alerts](#adding-cluster-alerts)
- [Cluster Alert Configuration](#cluster-alert-configuration)
- [System Service Alerts](#system-service-alerts)
- [Resource Event Alerts](#resource-event-alerts)
- [Node Alerts](#node-alerts)
- [Node Selector Alerts](#node-selector-alerts)
- [CIS Scan Alerts](#cis-scan-alerts)
- [Metric Expression Alerts](#metric-expression-alerts)
# About Alerts
Notifiers and alerts are built on top of the [Prometheus Alertmanager](https://prometheus.io/docs/alerting/alertmanager/). Leveraging these tools, Rancher can notify [cluster owners]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#cluster-roles) and [project owners]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#project-roles) of events they need to address.
Before you can receive alerts, you must configure one or more notifiers in Rancher.
When you create a cluster, some alert rules are predefined. You can receive these alerts if you configure a [notifier]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/notifiers) for them.
For details about what triggers the predefined alerts, refer to the [documentation on default alerts.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/alerts/default-alerts)
### Alert Event Examples
Some examples of alert events are:
- A Kubernetes master component entering an unhealthy state.
- A node or workload error occurring.
- A scheduled deployment taking place as planned.
- A node's hardware resources becoming overstressed.
### Alerts Triggered by Prometheus Queries
When you edit an alert rule, you will have the opportunity to configure the alert to be triggered based on a Prometheus expression. For examples of expressions, refer to [this page.]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/expression/)
Monitoring must be [enabled]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/) before you can trigger alerts with custom Prometheus queries or expressions.
### Urgency Levels
You can set an urgency level for each alert. This urgency appears in the notification you receive, helping you to prioritize your response actions. For example, if you have an alert configured to inform you of a routine deployment, no action is required. These alerts can be assigned a low priority level. However, if a deployment fails, it can critically impact your organization, and you need to react quickly. Assign these alerts a high priority level.
### Scope of Alerts
The scope for alerts can be set at either the cluster level or [project level]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/alerts/).
At the cluster level, Rancher monitors components in your Kubernetes cluster, and sends you alerts related to:
- The state of your nodes.
- The system services that manage your Kubernetes cluster.
- The resource events from specific system services.
- Prometheus expressions crossing defined thresholds.
### Managing Cluster Alerts
After you set up cluster alerts, you can manage each alert object. To manage alerts, browse to the cluster containing the alerts, and then select **Tools > Alerts**. You can:
- Deactivate/Reactivate alerts
- Edit alert settings
- Delete unnecessary alerts
- Mute firing alerts
- Unmute muted alerts
# Adding Cluster Alerts
As a [cluster owner]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#cluster-roles), you can configure Rancher to send you alerts for cluster events.
>**Prerequisite:** Before you can receive cluster alerts, you must [add a notifier]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/notifiers/).
1. From the **Global** view, navigate to the cluster that you want to configure cluster alerts for. Select **Tools > Alerts**. Then click **Add Alert Group**.
1. Enter a **Name** for the alert group that describes its purpose. You can use a group to collect alert rules with related purposes.
1. Based on the type of alert you want to create, refer to the [cluster alert configuration section.](#cluster-alert-configuration)
1. Continue adding more **Alert Rules** to the group.
1. Finally, choose the [notifiers]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/notifiers/) to send the alerts to.
- You can set up multiple notifiers.
- You can change notifier recipients on the fly.
1. Click **Create.**
**Result:** Your alert is configured. A notification is sent when the alert is triggered.
# Cluster Alert Configuration
- [System Service Alerts](#system-service-alerts)
- [Resource Event Alerts](#resource-event-alerts)
- [Node Alerts](#node-alerts)
- [Node Selector Alerts](#node-selector-alerts)
- [CIS Scan Alerts](#cis-scan-alerts)
- [Metric Expression Alerts](#metric-expression-alerts)
# System Service Alerts
This alert type monitors for events that affect one of the Kubernetes master components, regardless of the node they occur on.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Select the **System Services** option, and then select an option from the dropdown:
- [controller-manager](https://kubernetes.io/docs/concepts/overview/components/#kube-controller-manager)
- [etcd](https://kubernetes.io/docs/concepts/overview/components/#etcd)
- [scheduler](https://kubernetes.io/docs/concepts/overview/components/#kube-scheduler)
### Is
The alert will be triggered when the selected Kubernetes master component is unhealthy.
### Send a
Select the urgency level of the alert. The options are:
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level based on the importance of the service and how many nodes fill the role within your cluster. For example, if you're making an alert for the `etcd` service, select **Critical**. If you're making an alert for redundant schedulers, **Warning** is more appropriate.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group of alerts that have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending a given alert that has already been sent. Defaults to 1 hour.
# Resource Event Alerts
This alert type monitors for specific events that are thrown from a resource type.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Choose the type of resource event that triggers an alert. The options are:
- **Normal**: triggers an alert when any standard resource event occurs.
- **Warning**: triggers an alert when unexpected resource events occur.
From the **Choose a Resource** drop-down, select the resource type that you want to trigger the alert.
- [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/)
- [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)
- [Node](https://kubernetes.io/docs/concepts/architecture/nodes/)
- [Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/)
- [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert by considering factors such as how often the event occurs or its importance. For example:
- If you set a normal alert for pods, you're likely to receive alerts often, and individual pods usually self-heal, so select an urgency of **Info**.
- If you set a warning alert for StatefulSets, it's very likely to impact operations, so select an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group of alerts that have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending a given alert that has already been sent. Defaults to 1 hour.
# Node Alerts
This alert type monitors for events that occur on a specific node.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Select the **Node** option, and then make a selection from the **Choose a Node** drop-down.
### Is
Choose an event to trigger the alert.
- **Not Ready**: Sends you an alert when the node is unresponsive.
- **CPU usage over**: Sends you an alert when the node's CPU usage rises above an entered percentage of its processing allocation.
- **Mem usage over**: Sends you an alert when the node's memory usage rises above an entered percentage of its memory allocation.
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when a node's CPU usage rises above 60% warrants an urgency of **Info**, but a node that is **Not Ready** warrants an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group of alerts that have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending a given alert that has already been sent. Defaults to 1 hour.
# Node Selector Alerts
This alert type monitors for events that occur on any node marked with a label. For more information, see the Kubernetes documentation for [Labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/).
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Select the **Node Selector** option, and then click **Add Selector** to enter a key value pair for a label. This label should be applied to one or more of your nodes. Add as many selectors as you'd like.
### Is
Choose an event to trigger the alert.
- **Not Ready**: Sends you an alert when selected nodes are unresponsive.
- **CPU usage over**: Sends you an alert when the selected nodes' CPU usage rises above an entered percentage of their processing allocation.
- **Mem usage over**: Sends you an alert when the selected nodes' memory usage rises above an entered percentage of their memory allocation.
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when a node's CPU usage rises above 60% warrants an urgency of **Info**, but a node that is **Not Ready** warrants an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group of alerts that have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending a given alert that has already been sent. Defaults to 1 hour.
# CIS Scan Alerts
_Available as of v2.4.0_
This alert type is triggered based on the results of a CIS scan.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Select **CIS Scan.**
### Is
Choose an event to trigger the alert:
- Completed Scan
- Has Failure
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert based on its impact on operations. For example, a completed scan with no failures may only warrant an urgency of **Info**, while a scan with failures may warrant an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group of alerts that have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending a given alert that has already been sent. Defaults to 1 hour.
# Metric Expression Alerts
This alert type is triggered based on the result of a Prometheus expression query. It is available after you enable monitoring.
Each of the below sections corresponds to a part of the alert rule configuration section in the Rancher UI.
### When a
Enter or select an **Expression**. The dropdown shows the available metrics from Prometheus, including:
- [**Node**](https://github.com/prometheus/node_exporter)
- [**Container**](https://github.com/google/cadvisor)
- [**ETCD**](https://etcd.io/docs/v3.4.0/op-guide/monitoring/)
- [**Kubernetes Components**](https://github.com/kubernetes/metrics)
- [**Kubernetes Resources**](https://github.com/kubernetes/kube-state-metrics)
- [**Fluentd**](https://docs.fluentd.org/v1.0/articles/monitoring-prometheus) (supported by [Logging]({{<baseurl>}}/rancher/v2.0-v2.4//en/cluster-admin/tools/logging))
- [**Cluster Level Grafana**](http://docs.grafana.org/administration/metrics/)
- **Cluster Level Prometheus**
### Is
Choose a comparison:
- **Equal**: Trigger the alert when the expression value is equal to the threshold.
- **Not Equal**: Trigger the alert when the expression value is not equal to the threshold.
- **Greater Than**: Trigger the alert when the expression value is greater than the threshold.
- **Less Than**: Trigger the alert when the expression value is less than the threshold.
- **Greater or Equal**: Trigger the alert when the expression value is greater than or equal to the threshold.
- **Less or Equal**: Trigger the alert when the expression value is less than or equal to the threshold.
If applicable, choose a comparison value or a threshold for the alert to be triggered.
### For
Select a duration. The alert is triggered when the expression value crosses the threshold for longer than the configured duration.
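Putting the fields together, a hypothetical rule for sustained high load could be configured as follows. The expression is the one quoted in the urgency example below; the comparison value and duration are illustrative choices, not defaults.
```
# Expression (PromQL): cluster-wide 5-minute load average per CPU
sum(node_load5) / count(node_cpu_seconds_total{mode="system"})

# Is: Greater Than, with a comparison value of 1
# For: 3 minutes
```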
### Send a
Select the urgency level of the alert.
- **Critical**: Most urgent
- **Warning**: Normal urgency
- **Info**: Least urgent
Select the urgency level of the alert based on its impact on operations. For example, an alert triggered when the node load expression ```sum(node_load5) / count(node_cpu_seconds_total{mode="system"})``` rises above 0.6 warrants an urgency of **Info**, but a value above 1 warrants an urgency of **Critical**.
### Advanced Options
By default, the below options will apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
- **Group Wait Time**: How long to buffer alerts of the same group before sending the initial notification. Defaults to 30 seconds.
- **Group Interval Time**: How long to wait before sending an alert that has been added to a group of alerts that have already fired. Defaults to 30 seconds.
- **Repeat Wait Time**: How long to wait before re-sending a given alert that has already been sent. Defaults to 1 hour.
@@ -0,0 +1,60 @@
---
title: Default Alerts for Cluster Monitoring
weight: 1
aliases:
- /rancher/v2.0-v2.4/en/cluster-admin/tools/alerts/default-alerts
- /rancher/v2.0-v2.4/en/monitoring-alerting/legacy/alerts/cluster-alerts/default-alerts
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-alerts/default-alerts
---
When you create a cluster, some alert rules are predefined. These alerts notify you about signs that the cluster could be unhealthy. You can receive these alerts if you configure a [notifier]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/notifiers) for them.
Several of the alerts use Prometheus expressions as the metric that triggers the alert. For more information on how expressions work, you can refer to the Rancher [documentation about Prometheus expressions]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression/) or the Prometheus [documentation about querying metrics](https://prometheus.io/docs/prometheus/latest/querying/basics/).
# Alerts for etcd
Etcd is the key-value store that contains the state of the Kubernetes cluster. Rancher provides default alerts if the built-in monitoring detects a potential problem with etcd. You don't have to enable monitoring to receive these alerts.
A leader is the node that handles all client requests that need cluster consensus. For more information, you can refer to this [explanation of how etcd works.](https://rancher.com/blog/2019/2019-01-29-what-is-etcd/#how-does-etcd-work)
The leader of the cluster can change in response to certain events. It is normal for the leader to change, but too many changes can indicate a problem with the network or a high CPU load. With longer latencies, the default etcd configuration may cause frequent heartbeat timeouts, which trigger a new leader election.
| Alert | Explanation |
|-------|-------------|
| A high number of leader changes within the etcd cluster are happening | A warning alert is triggered when the leader changes more than three times in one hour. |
| Database usage close to the quota 500M | A warning alert is triggered when the size of etcd exceeds 500M.|
| Etcd is unavailable | A critical alert is triggered when etcd becomes unavailable. |
| Etcd member has no leader | A critical alert is triggered when the etcd cluster does not have a leader for at least three minutes. |
# Alerts for Kubernetes Components
Rancher provides alerts when core Kubernetes system components become unhealthy.
Controllers update Kubernetes resources based on changes in etcd. The [controller manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/) monitors the cluster desired state through the Kubernetes API server and makes the necessary changes to the current state to reach the desired state.
The [scheduler](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/) service is a core component of Kubernetes. It is responsible for scheduling cluster workloads to nodes, based on various configurations, metrics, resource requirements and workload-specific requirements.
| Alert | Explanation |
|-------|-------------|
| Controller Manager is unavailable | A critical alert is triggered when the cluster's controller-manager becomes unavailable. |
| Scheduler is unavailable | A critical alert is triggered when the cluster's scheduler becomes unavailable. |
# Alerts for Events
Kubernetes events are objects that provide insight into what is happening inside a cluster, such as what decisions were made by the scheduler or why some pods were evicted from the node. In the Rancher UI, from the project view, you can see events for each workload.
| Alert | Explanation |
|-------|-------------|
| Get warning deployment event | A warning alert is triggered when a warning event happens on a deployment. |
# Alerts for Nodes
Alerts can be triggered based on node metrics. Each computing resource in a Kubernetes cluster is called a node. Nodes can be either bare-metal servers or virtual machines.
| Alert | Explanation |
|-------|-------------|
| High CPU load | A warning alert is triggered if the node uses more than 100 percent of the node's available CPU seconds for at least three minutes. |
| High node memory utilization | A warning alert is triggered if the node uses more than 80 percent of its available memory for at least three minutes. |
| Node disk is running full within 24 hours | A critical alert is triggered if the disk space on the node is expected to run out in the next 24 hours based on the disk growth over the last 6 hours. |
# Project-level Alerts
When you enable monitoring for the project, some project-level alerts are provided. For details, refer to the [section on project-level alerts.]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/alerts/)
@@ -0,0 +1,126 @@
---
title: Cluster Logging
shortTitle: Logging
description: Rancher integrates with popular logging services. Learn the requirements and benefits of integrating with logging services, and enable logging on your cluster.
metaDescription: "Rancher integrates with popular logging services. Learn the requirements and benefits of integrating with logging services, and enable logging on your cluster."
weight: 3
aliases:
- /rancher/v2.0-v2.4/en/tasks/logging/
- /rancher/v2.0-v2.4/en/cluster-admin/tools/logging
- /rancher/v2.0-v2.4/en/logging/legacy/cluster-logging
- /rancher/v2.0-v2.4/en/logging/v2.0.x-v2.4.x/cluster-logging/
---
Logging is helpful because it allows you to:
- Capture and analyze the state of your cluster
- Look for trends in your environment
- Save your logs to a safe location outside of your cluster
- Stay informed of events like a container crashing, a pod eviction, or a node dying
- More easily debug and troubleshoot problems
Rancher supports integration with the following services:
- Elasticsearch
- Splunk
- Kafka
- Syslog
- Fluentd
This section covers the following topics:
- [How logging integrations work](#how-logging-integrations-work)
- [Requirements](#requirements)
- [Logging scope](#logging-scope)
- [Enabling cluster logging](#enabling-cluster-logging)
# How Logging Integrations Work
Rancher can integrate with popular external services used for event streams, telemetry, or search. These services can log errors and warnings in your Kubernetes infrastructure to a stream.
These services collect container log events, which are saved to the `/var/log/containers` directory on each of your nodes. The service collects both standard and error events. You can then log into your services to review the events collected, leveraging each service's unique features.
When configuring Rancher to integrate with these services, you'll have to point Rancher toward the service's endpoint and provide authentication information.
Additionally, you'll have the opportunity to enter key-value pairs to filter the log events collected. The service will only collect events for containers marked with your configured key-value pairs.
>**Note:** You can only configure one logging service per cluster or per project.
# Requirements
The Docker daemon on each node in the cluster should be [configured](https://docs.docker.com/config/containers/logging/configure/) with the (default) log-driver: `json-file`. You can check the log-driver by running the following command:
```
$ docker info | grep 'Logging Driver'
Logging Driver: json-file
```
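If the log driver is set to something else, one way to change it is through the Docker daemon configuration file. The snippet below is only a sketch: it assumes `/etc/docker/daemon.json` contains no other settings you need to preserve, and the Docker daemon must be restarted afterwards.
```
# Sketch: set the default log driver to json-file.
# Merge this with any existing settings in /etc/docker/daemon.json before applying,
# then restart Docker (for example: sudo systemctl restart docker).
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "log-driver": "json-file"
}
EOF
```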
# Logging Scope
You can configure logging at either cluster level or project level.
- Cluster logging writes logs for every pod in the cluster, i.e. in all the projects. For [RKE clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters), it also writes logs for all the Kubernetes system components.
- [Project logging]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/logging/) writes logs for every pod in that particular project.
Logs that are sent to your logging service are from the following locations:
- Pod logs stored at `/var/log/containers`.
- Kubernetes system components logs stored at `/var/lib/rancher/rke/log/`.
# Enabling Cluster Logging
As an [administrator]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) or [cluster owner]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#cluster-roles), you can configure Rancher to send Kubernetes logs to a logging service.
1. From the **Global** view, navigate to the cluster that you want to configure cluster logging.
1. Select **Tools > Logging** in the navigation bar.
1. Select a logging service and enter the configuration. Refer to the specific service for detailed configuration. Rancher supports integration with the following services:
- [Elasticsearch]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/logging/elasticsearch/)
- [Splunk]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/logging/splunk/)
- [Kafka]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/logging/kafka/)
- [Syslog]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/logging/syslog/)
- [Fluentd]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/logging/fluentd/)
1. (Optional) Instead of using the UI to configure the logging services, you can enter custom advanced configurations by clicking on **Edit as File**, which is located above the logging targets. This link is only visible after you select a logging service.
- With the file editor, enter raw fluentd configuration for any logging service. Refer to the documentation for each logging service on how to set up the output configuration.
- [Elasticsearch Documentation](https://github.com/uken/fluent-plugin-elasticsearch)
- [Splunk Documentation](https://github.com/fluent/fluent-plugin-splunk)
- [Kafka Documentation](https://github.com/fluent/fluent-plugin-kafka)
- [Syslog Documentation](https://github.com/dlackty/fluent-plugin-remote_syslog)
- [Fluentd Documentation](https://docs.fluentd.org/v1.0/articles/out_forward)
- If the logging service is using TLS, you also need to complete the **SSL Configuration** form.
1. Provide the **Client Private Key** and **Client Certificate**. You can either copy and paste them or upload them by using the **Read from a file** button.
- You can use either a self-signed certificate or one provided by a certificate authority.
- You can generate a self-signed certificate using an openssl command. For example:
```
openssl req -x509 -newkey rsa:2048 -keyout myservice.key -out myservice.cert -days 365 -nodes -subj "/CN=myservice.example.com"
```
2. If you are using a self-signed certificate, provide the **CA Certificate PEM**.
1. (Optional) Complete the **Additional Logging Configuration** form.
1. **Optional:** Use the **Add Field** button to add custom log fields to your logging configuration. These fields are key-value pairs (such as `foo=bar`) that you can use to filter the logs from another system.
1. Enter a **Flush Interval**. This value determines how often [Fluentd](https://www.fluentd.org/) flushes data to the logging server. Intervals are measured in seconds.
1. **Include System Log**. The logs from pods in the system project and from RKE components will be sent to the target. Uncheck it to exclude the system logs.
1. Click **Test**. Rancher sends a test log to the service.
> **Note:** This button is replaced with _Dry Run_ if you are using the custom configuration editor. In this case, Rancher calls the fluentd dry run command to validate the configuration.
1. Click **Save**.
**Result:** Rancher is now configured to send logs to the selected service. Log into the logging service so that you can start viewing the logs.
## Related Links
[Logging Architecture](https://kubernetes.io/docs/concepts/cluster-administration/logging/)
@@ -0,0 +1,46 @@
---
title: Elasticsearch
weight: 200
aliases:
- /rancher/v2.0-v2.4/en/tools/logging/elasticsearch/
- /rancher/v2.0-v2.4/en/cluster-admin/tools/logging/elasticsearch
- /rancher/v2.0-v2.4/en/logging/legacy/cluster-logging/elasticsearch
- /rancher/v2.0-v2.4/en/logging/v2.0.x-v2.4.x/cluster-logging/elasticsearch
---
If your organization uses [Elasticsearch](https://www.elastic.co/), either on premise or in the cloud, you can configure Rancher to send it Kubernetes logs. Afterwards, you can log into your Elasticsearch deployment to view logs.
>**Prerequisites:** Configure an [Elasticsearch deployment](https://www.elastic.co/guide/en/cloud/saas-release/ec-create-deployment.html).
## Elasticsearch Deployment Configuration
1. In the **Endpoint** field, enter the IP address and port of your Elasticsearch instance. You can find this information from the dashboard of your Elasticsearch deployment.
* Elasticsearch usually uses port `9200` for HTTP and `9243` for HTTPS.
1. If you are using [X-Pack Security](https://www.elastic.co/guide/en/x-pack/current/xpack-introduction.html), enter your Elasticsearch **Username** and **Password** for authentication.
1. Enter an [Index Pattern](https://www.elastic.co/guide/en/kibana/current/index-patterns.html).
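Before saving the configuration, you may want to confirm that the endpoint is reachable and that the credentials work. A quick check with `curl` might look like the following; the URL, port, and credentials are placeholders:
```
# Hypothetical reachability check against an Elasticsearch endpoint.
curl -u elastic:yourpassword "https://elasticsearch.example.com:9243/_cluster/health?pretty"
```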
## SSL Configuration
If your instance of Elasticsearch uses SSL, your **Endpoint** will need to begin with `https://`. With the correct endpoint, the **SSL Configuration** form is enabled and ready to be completed.
1. Provide the **Client Private Key** and **Client Certificate**. You can either copy and paste them or upload them by using the **Read from a file** button.
- You can use either a self-signed certificate or one provided by a certificate authority.
- You can generate a self-signed certificate using an openssl command. For example:
```
openssl req -x509 -newkey rsa:2048 -keyout myservice.key -out myservice.cert -days 365 -nodes -subj "/CN=myservice.example.com"
```
1. Enter your **Client Key Password**.
1. Enter your **SSL Version**. The default version is `TLSv1_2`.
1. Select whether or not you want to verify your SSL.
* If you are using a self-signed certificate, select **Enabled - Input trusted server certificate**, provide the **CA Certificate PEM**. You can copy and paste the certificate or upload it using the **Read from a file** button.
* If you are using a certificate from a certificate authority, select **Enabled - Input trusted server certificate**. You do not need to provide a **CA Certificate PEM**.
@@ -0,0 +1,38 @@
---
title: Fluentd
weight: 600
aliases:
- /rancher/v2.0-v2.4/en/cluster-admin/tools/logging/fluentd
- /rancher/v2.0-v2.4/en/logging/legacy/cluster-logging/fluentd
- /rancher/v2.0-v2.4/en/logging/v2.0.x-v2.4.x/cluster-logging/fluentd
---
If your organization uses [Fluentd](https://www.fluentd.org/), you can configure Rancher to send it Kubernetes logs. Afterwards, you can log into your Fluentd server to view logs.
>**Prerequisites:** Configure Fluentd input forward to receive the event stream.
>
>See [Fluentd Documentation](https://docs.fluentd.org/v1.0/articles/in_forward) for details.
## Fluentd Configuration
You can add multiple Fluentd Servers. If you want to add additional Fluentd servers, click **Add Fluentd Server**. For each Fluentd server, complete the configuration information:
1. In the **Endpoint** field, enter the address and port of your Fluentd instance, e.g. `http://Fluentd-server:24224`.
1. Enter the **Shared Key** if your Fluentd Server is using a shared key for authentication.
1. Enter the **Username** and **Password** if your Fluentd Server is using username and password for authentication.
1. **Optional:** Enter the **Hostname** of the Fluentd server.
1. Enter the load balancing **Weight** of the Fluentd server. If the weight of one server is 20 and the other server is 30, events will be sent in a 2:3 ratio. If you do not enter a weight, the default weight is 60.
1. If this server is a standby server, check **Use as Standby Only**. Standby servers are used when all other servers are not available.
After adding all the Fluentd servers, you have the option to select **Enable Gzip Compression**. By default, this is enabled because the transferred payload size will be reduced.
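Before saving, you may want to confirm that each Fluentd server is reachable on its forward port. A simple TCP check with `nc` might look like this; the hostname and port are placeholders:
```
# Hypothetical connectivity check against a Fluentd forward endpoint.
nc -zv Fluentd-server 24224
```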
## SSL Configuration
If your Fluentd servers are using TLS, you need to select **Use TLS**. If you are using a self-signed certificate, provide the **CA Certificate PEM**. You can copy and paste the certificate or upload it using the **Read from a file** button.
>**Note:** Fluentd does not support self-signed certificates when client authentication is enabled.
@@ -0,0 +1,46 @@
---
title: Kafka
weight: 400
aliases:
- /rancher/v2.0-v2.4/en/tools/logging/kafka/
- /rancher/v2.0-v2.4/en/cluster-admin/tools/logging/kafka
- /rancher/v2.0-v2.4/en/logging/legacy/cluster-logging/kafka
- /rancher/v2.0-v2.4/en/logging/v2.0.x-v2.4.x/cluster-logging/kafka
---
If your organization uses [Kafka](https://kafka.apache.org/), you can configure Rancher to send it Kubernetes logs. Afterwards, you can log into your Kafka server to view logs.
>**Prerequisite:** You must have a Kafka server configured.
## Kafka Server Configuration
1. Select the type of **Endpoint** your Kafka server is using:
* **Zookeeper**: Enter the IP address and port. By default, Zookeeper uses port `2181`. Please note that a Zookeeper endpoint cannot enable TLS.
* **Broker**: Click on **Add Endpoint**. For each Kafka broker, enter the IP address and port. By default, Kafka brokers use port `9092`.
1. In the **Topic** field, enter the name of a Kafka [topic](https://kafka.apache.org/documentation/#basic_ops_add_topic) that your Kubernetes cluster submits logs to.
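If the topic does not exist yet, you can create it on the Kafka side before saving the configuration. The command below is only a sketch: the topic name, broker address, partition count, and replication factor are assumptions, and older Kafka versions use `--zookeeper` instead of `--bootstrap-server`.
```
# Hypothetical example of creating a topic for cluster logs.
kafka-topics.sh --create --topic kube-logs \
  --bootstrap-server kafka-broker:9092 \
  --partitions 3 --replication-factor 2
```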
## **Broker** Endpoint Type
### SSL Configuration
If your Kafka cluster is using SSL for the **Broker**, you need to complete the **SSL Configuration** form.
1. Provide the **Client Private Key** and **Client Certificate**. You can either copy and paste them or upload them by using the **Read from a file** button.
1. Provide the **CA Certificate PEM**. You can either copy and paste the certificate or upload it using the **Read from a file** button.
>**Note:** Kafka does not support self-signed certificates when client authentication is enabled.
### SASL configuration
If your Kafka cluster is using [SASL authentication](https://kafka.apache.org/documentation/#security_sasl) for the Broker, you need to complete the **SASL Configuration** form.
1. Enter the SASL **Username** and **Password**.
1. Select the **SASL Type** that your Kafka cluster is using.
* If your Kafka is using **Plain**, please ensure your Kafka cluster is using SSL.
* If your Kafka is using **Scram**, you need to select which **Scram Mechanism** Kafka is using.
@@ -0,0 +1,79 @@
---
title: Splunk
weight: 300
aliases:
- /rancher/v2.0-v2.4/en/tasks/logging/splunk/
- /rancher/v2.0-v2.4/en/tools/logging/splunk/
- /rancher/v2.0-v2.4/en/cluster-admin/tools/logging/splunk
- /rancher/v2.0-v2.4/en/logging/legacy/cluster-logging/splunk
- /rancher/v2.0-v2.4/en/logging/v2.0.x-v2.4.x/cluster-logging/splunk
---
If your organization uses [Splunk](https://www.splunk.com/), you can configure Rancher to send it Kubernetes logs. Afterwards, you can log into your Splunk server to view logs.
>**Prerequisites:**
>
>- Configure HTTP event collection for your Splunk Server (Splunk Enterprise or Splunk Cloud).
>- Either create a new token or copy an existing token.
>
>For more information, see [Splunk Documentation](http://docs.splunk.com/Documentation/Splunk/7.1.2/Data/UsetheHTTPEventCollector#About_Event_Collector_tokens).
## Splunk Configuration
1. In the **Endpoint** field, enter the IP address and port for your Splunk instance (e.g., `http://splunk-server:8088`).
* Splunk usually uses port `8088`. If you're using Splunk Cloud, you'll need to work with [Splunk support](https://www.splunk.com/en_us/support-and-services.html) to get an endpoint URL.
1. Enter the **Token** you obtained while completing the prerequisites (i.e., when you created a token in Splunk).
1. In the **Source** field, enter the name of the token as entered in Splunk.
1. **Optional:** Provide one or more [indexes](http://docs.splunk.com/Documentation/Splunk/7.1.2/Indexer/Aboutindexesandindexers) that are allowed for your token.
## SSL Configuration
If your instance of Splunk uses SSL, your **Endpoint** will need to begin with `https://`. With the correct endpoint, the **SSL Configuration** form is enabled and ready to be completed.
1. Provide the **Client Private Key** and **Client Certificate**. You can either copy and paste them or upload them by using the **Read from a file** button.
- You can use either a self-signed certificate or one provided by a certificate authority.
- You can generate a self-signed certificate using an openssl command. For example:
```
openssl req -x509 -newkey rsa:2048 -keyout myservice.key -out myservice.cert -days 365 -nodes -subj "/CN=myservice.example.com"
```
1. Enter your **Client Key Password**.
1. Select whether or not you want to verify your SSL.
* If you are using a self-signed certificate, select **Enabled - Input trusted server certificate**, provide the **CA Certificate PEM**. You can copy and paste the certificate or upload it using the **Read from a file** button.
* If you are using a certificate from a certificate authority, select **Enabled - Input trusted server certificate**. You do not need to provide a **CA Certificate PEM**.
## Viewing Logs
1. Log into your Splunk server.
1. Click on **Search & Reporting**. The number of **Indexed Events** listed should be increasing.
1. Click on **Data Summary** and select the **Sources** tab.
![View Logs]({{<baseurl>}}/img/rancher/splunk/splunk4.jpg)
1. To view the actual logs, click on the source that you declared earlier.
![View Logs]({{<baseurl>}}/img/rancher/splunk/splunk5.jpg)
## Troubleshooting
You can use curl to see if **HEC** is listening for HTTP event data.
```
$ curl http://splunk-server:8088/services/collector/event \
-H 'Authorization: Splunk 8da70994-b1b0-4a79-b154-bfaae8f93432' \
-d '{"event": "hello world"}'
```
If Splunk is configured correctly, you should receive JSON data with `success code 0`, which confirms that you can send logging data to HEC.
If you received an error, check your configuration in Splunk and Rancher.
@@ -0,0 +1,46 @@
---
title: Syslog
weight: 500
aliases:
- /rancher/v2.0-v2.4/en/tools/logging/syslog/
- /rancher/v2.0-v2.4/en/cluster-admin/tools/logging/syslog
- /rancher/v2.0-v2.4/en/logging/legacy/cluster-logging/syslog
- /rancher/v2.0-v2.4/en/logging/v2.0.x-v2.4.x/cluster-logging/syslog
---
If your organization uses [Syslog](https://tools.ietf.org/html/rfc5424), you can configure Rancher to send it Kubernetes logs. Afterwards, you can log into your Syslog server to view logs.
>**Prerequisite:** You must have a Syslog server configured.
If you are using rsyslog, please make sure your rsyslog authentication mode is `x509/name`.
## Syslog Server Configuration
1. In the **Endpoint** field, enter the IP address and port for your Syslog server. Additionally, in the dropdown, select the protocol that your Syslog server uses.
1. In the **Program** field, enter the name of the application sending logs to your Syslog server, e.g. `Rancher`.
1. If you are using a cloud logging service, e.g. [Sumologic](https://www.sumologic.com/), enter a **Token** that authenticates with your Syslog server. You will need to create this token in the cloud logging service.
1. Select a **Log Severity** for events that are logged to the Syslog server. For more information on each severity level, see the [Syslog protocol documentation](https://tools.ietf.org/html/rfc5424#page-11).
- Specifying a **Log Severity** does not filter logs by severity. To filter logs, use a parser on the Syslog server.
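Before saving, you may want to confirm that the Syslog server accepts messages on the configured endpoint. A quick test from a Linux host with the util-linux `logger` command might look like this; the server address, port, and protocol flag are placeholders:
```
# Hypothetical test message sent to a Syslog server over TCP.
logger --server syslog-server --port 514 --tcp "test message from Rancher logging"
```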
## Encryption Configuration
If your Syslog server is using **TCP** protocol and uses TLS, you need to select **Use TLS** and complete the **Encryption Configuration** form.
1. Provide the **Client Private Key** and **Client Certificate**. You can either copy and paste them or upload them by using the **Read from a file** button.
- You can use either a self-signed certificate or one provided by a certificate authority.
- You can generate a self-signed certificate using an openssl command. For example:
```
openssl req -x509 -newkey rsa:2048 -keyout myservice.key -out myservice.cert -days 365 -nodes -subj "/CN=myservice.example.com"
```
1. Select whether or not you want to verify your SSL.
* If you are using a self-signed certificate, select **Enabled - Input trusted server certificate**, provide the **CA Certificate PEM**. You can copy and paste the certificate or upload it using the **Read from a file** button.
* If you are using a certificate from a certificate authority, select **Enabled - Input trusted server certificate**. You do not need to provide a **CA Certificate PEM**.
@@ -0,0 +1,111 @@
---
title: Integrating Rancher and Prometheus for Cluster Monitoring
shortTitle: Monitoring
description: Prometheus lets you view metrics from your different Rancher and Kubernetes objects. Learn about the scope of monitoring and how to enable cluster monitoring
weight: 1
aliases:
- /rancher/v2.0-v2.4/en/project-admin/tools/monitoring
- /rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/monitoring/cluster-monitoring
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring
---
_Available as of v2.2.0_
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution.
This section covers the following topics:
- [About Prometheus](#about-prometheus)
- [Monitoring scope](#monitoring-scope)
- [Enabling cluster monitoring](#enabling-cluster-monitoring)
- [Resource consumption](#resource-consumption)
- [Resource consumption of Prometheus pods](#resource-consumption-of-prometheus-pods)
- [Resource consumption of other pods](#resource-consumption-of-other-pods)
# About Prometheus
You can configure monitoring at either the cluster level or the project level. This page describes how to enable monitoring for a cluster. For details on enabling monitoring for a project, refer to the [project administration section]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/monitoring/).
Prometheus provides a _time series_ of your data, which is, according to the [Prometheus documentation](https://prometheus.io/docs/concepts/data_model/):
>A stream of timestamped values belonging to the same metric and the same set of labeled dimensions, along with comprehensive statistics and metrics of the monitored cluster.
In other words, Prometheus lets you view metrics from your different Rancher and Kubernetes objects. Using timestamps, Prometheus lets you query and view these metrics in easy-to-read graphs and visuals, either through the Rancher UI or [Grafana](https://grafana.com/), which is an analytics viewing platform deployed along with Prometheus.
By viewing data that Prometheus scrapes from your cluster control plane, nodes, and deployments, you can stay on top of everything happening in your cluster. You can then use these analytics to better run your organization: stop system emergencies before they start, develop maintenance strategies, restore crashed servers, etc.
Multi-tenancy is supported in the form of cluster-level and project-level Prometheus instances.
# Monitoring Scope
Using Prometheus, you can monitor Rancher at both the cluster level and [project level]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/monitoring/). For each cluster and project that is enabled for monitoring, Rancher deploys a Prometheus server.
- Cluster monitoring allows you to view the health of your Kubernetes cluster. Prometheus collects metrics from the cluster components below, which you can view in graphs and charts.
- Kubernetes control plane
- etcd database
- All nodes (including workers)
- [Project monitoring]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/monitoring/) allows you to view the state of pods running in a given project. Prometheus collects metrics from the project's deployed HTTP and TCP/UDP workloads.
# Enabling Cluster Monitoring
As an [administrator]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/) or [cluster owner]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#cluster-roles), you can configure Rancher to deploy Prometheus to monitor your Kubernetes cluster.
> **Prerequisite:** Make sure that you are allowing traffic on port 9796 for each of your nodes because Prometheus will scrape metrics from here.
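For example, on nodes that use firewalld, opening the port might look like the sketch below; adjust this for your own firewall tooling or cloud security groups:
```
# Sketch: allow traffic to the node exporter port that Prometheus scrapes.
sudo firewall-cmd --permanent --add-port=9796/tcp
sudo firewall-cmd --reload
```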
1. From the **Global** view, navigate to the cluster that you want to configure cluster monitoring.
1. Select **Tools > Monitoring** in the navigation bar.
1. Select **Enable** to show the [Prometheus configuration options]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/prometheus/). Review the [resource consumption recommendations](#resource-consumption) to ensure that you have enough resources for Prometheus and for your worker nodes before enabling monitoring. Enter your desired configuration options.
1. Click **Save**.
**Result:** The Prometheus server will be deployed as well as two monitoring applications. The two monitoring applications, `cluster-monitoring` and `monitoring-operator`, are added as an [application]({{<baseurl>}}/rancher/v2.0-v2.4/en/catalog/apps/) to the cluster's `system` project. After the applications are `active`, you can start viewing [cluster metrics]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics/) through the Rancher dashboard or directly from Grafana.
> The default username and password for the Grafana instance will be `admin/admin`. However, Grafana dashboards are served via the Rancher authentication proxy, so only users who are currently authenticated into the Rancher server have access to the Grafana dashboard.
# Resource Consumption
When enabling cluster monitoring, you need to ensure that your worker nodes and the Prometheus pod have enough resources. The tables below provide a guide to how many resources will be consumed. In larger deployments, it is strongly advised that the monitoring infrastructure be placed on dedicated nodes in the cluster.
### Resource Consumption of Prometheus Pods
This table shows the resource consumption of the Prometheus pod based on the total number of nodes in the cluster. The node count includes the worker, control plane, and etcd nodes. Total disk space allocation should be approximated as `rate * retention`, where the retention period is set at the cluster level. When enabling cluster-level monitoring, you should adjust the CPU and memory limits and reservations accordingly.
Number of Cluster Nodes | CPU (milli CPU) | Memory | Disk
------------------------|-----|--------|------
5 | 500 | 650 MB | ~1 GB/Day
50| 2000 | 2 GB | ~5 GB/Day
256| 4000 | 6 GB | ~18 GB/Day
Additional pod resource requirements for cluster-level monitoring:
| Workload | Container | CPU - Request | Mem - Request | CPU - Limit | Mem - Limit | Configurable |
|---------------------|---------------------------------|---------------|---------------|-------------|-------------|--------------|
| Prometheus | prometheus | 750m | 750Mi | 1000m | 1000Mi | Y |
| | prometheus-proxy | 50m | 50Mi | 100m | 100Mi | Y |
| | prometheus-auth | 100m | 100Mi | 500m | 200Mi | Y |
| | prometheus-config-reloader | - | - | 50m | 50Mi | N |
| | rules-configmap-reloader | - | - | 100m | 25Mi | N |
| Grafana | grafana-init-plugin-json-copy | 50m | 50Mi | 50m | 50Mi | Y |
| | grafana-init-plugin-json-modify | 50m | 50Mi | 50m | 50Mi | Y |
| | grafana | 100m | 100Mi | 200m | 200Mi | Y |
| | grafana-proxy | 50m | 50Mi | 100m | 100Mi | Y |
| Kube-State Exporter | kube-state | 100m | 130Mi | 100m | 200Mi | Y |
| Node Exporter | exporter-node | 200m | 200Mi | 200m | 200Mi | Y |
| Operator | prometheus-operator | 100m | 50Mi | 200m | 100Mi | Y |
### Resource Consumption of Other Pods
Besides the Prometheus pod, other deployed components require additional resources on the worker nodes.
Pod | CPU (milli CPU) | Memory (MB)
----|-----------------|------------
Node Exporter (Per Node) | 100 | 30
Kube State Cluster Monitor | 100 | 130
Grafana | 100 | 150
Prometheus Cluster Monitoring Nginx | 50 | 50
@@ -0,0 +1,118 @@
---
title: Cluster Metrics
weight: 3
aliases:
- /rancher/v2.0-v2.4/en/project-admin/tools/monitoring/cluster-metrics
- /rancher/v2.0-v2.4/en/cluster-admin/tools/monitoring/cluster-metrics
- /rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/cluster-metrics
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/cluster-metrics
---
_Available as of v2.2.0_
Cluster metrics display the hardware utilization for all nodes in your cluster, regardless of their role. They give you global monitoring insight into the cluster.
Some of the biggest metrics to look out for:
- **CPU Utilization**
High load either indicates that your cluster is running efficiently or that you're running out of CPU resources.
- **Disk Utilization**
Be on the lookout for increased read and write rates on nodes nearing their disk capacity. This advice is especially true for etcd nodes, as running out of storage on an etcd node leads to cluster failure.
- **Memory Utilization**
Deltas in memory utilization usually indicate a memory leak.
- **Load Average**
Generally, you want your load average to match your number of logical CPUs for the cluster. For example, if your cluster has 8 logical CPUs, the ideal load average would be 8 as well. If your load average is well under the number of logical CPUs for the cluster, you may want to reduce cluster resources. On the other hand, if your average is over 8, your cluster may need more resources.
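As a rough sanity check on an individual node, you can compare the load averages against the number of logical CPUs directly from a shell on that node:
```
# Load averages (1, 5, and 15 minutes) followed by the number of logical CPUs.
cat /proc/loadavg
nproc
```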
## Finding Node Metrics
1. From the **Global** view, navigate to the cluster that you want to view metrics.
1. Select **Nodes** in the navigation bar.
1. Select a specific node and click on its name.
1. Click on **Node Metrics**.
[_Get expressions for Cluster Metrics_]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/expression/#cluster-metrics)
### Etcd Metrics
>**Note:** Only supported for [Rancher launched Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/).
Etcd metrics display the operations of the etcd database on each of your cluster nodes. After establishing a baseline of normal etcd operational metrics, observe them for abnormal deltas between metric refreshes, which indicate potential issues with etcd. Always address etcd issues immediately!
You should also pay attention to the text at the top of the etcd metrics, which displays leader election statistics. This text indicates if etcd currently has a leader, which is the etcd instance that coordinates the other etcd instances in your cluster. A large increase in leader changes implies etcd is unstable. If you notice a change in leader election statistics, you should investigate them for issues.
Some of the biggest metrics to look out for:
- **Etcd has a leader**
etcd is usually deployed on multiple nodes and elects a leader to coordinate its operations. If etcd does not have a leader, its operations are not being coordinated.
- **Number of leader changes**
If this statistic suddenly grows, it usually indicates network communication issues that constantly force the cluster to elect a new leader.
[_Get expressions for Etcd Metrics_]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/expression/#etcd-metrics)
### Kubernetes Components Metrics
Kubernetes components metrics display data about the cluster's individual Kubernetes components. Primarily, it displays information about connections and latency for each component: the API server, controller manager, scheduler, and ingress controller.
>**Note:** The metrics for the controller manager, scheduler and ingress controller are only supported for [Rancher launched Kubernetes clusters]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-provisioning/rke-clusters/).
When analyzing Kubernetes component metrics, don't be concerned about any single standalone metric in the charts and graphs that display. Rather, you should establish a baseline for metrics considered normal following a period of observation, e.g. the range of values that your components usually operate within and are considered normal. After you establish this baseline, be on the lookout for large deltas in the charts and graphs, as these big changes usually indicate a problem that you need to investigate.
Some of the more important component metrics to monitor are:
- **API Server Request Latency**
Increasing API response times indicate there's a generalized problem that requires investigation.
- **API Server Request Rate**
Rising API request rates usually coincide with increased API response times. Increased request rates also indicate a generalized problem requiring investigation.
- **Scheduler Preemption Attempts**
If you see a spike in scheduler preemptions, it's an indication that you're running out of hardware resources, as Kubernetes is recognizing it doesn't have enough resources to run all your pods and is prioritizing the more important ones.
- **Scheduling Failed Pods**
Failed pods can have a variety of causes, such as unbound persistent volume claims, exhausted hardware resources, non-responsive nodes, etc.
- **Ingress Controller Request Process Time**
How fast ingress is routing connections to your cluster services.
[_Get expressions for Kubernetes Component Metrics_]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression/#kubernetes-components-metrics)
## Rancher Logging Metrics
Although the Dashboard for a cluster primarily displays data sourced from Prometheus, it also displays information for cluster logging, provided that you have [configured Rancher to use a logging service]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/logging/).
[_Get expressions for Rancher Logging Metrics_]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/expression/#rancher-logging-metrics)
## Finding Workload Metrics
Workload metrics display the hardware utilization for a Kubernetes workload. You can also view metrics for [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [stateful sets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) and so on.
1. From the **Global** view, navigate to the project that you want to view workload metrics.
1. From the main navigation bar, choose **Resources > Workloads.** In versions before v2.3.0, choose **Workloads** on the main navigation bar.
1. Select a specific workload and click on its name.
1. In the **Pods** section, select a specific pod and click on its name.
- **View the Pod Metrics:** Click on **Pod Metrics**.
- **View the Container Metrics:** In the **Containers** section, select a specific container and click on its name. Click on **Container Metrics**.
[_Get expressions for Workload Metrics_]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/expression/#workload-metrics)
@@ -0,0 +1,492 @@
---
title: Prometheus Custom Metrics Adapter
weight: 5
aliases:
- /rancher/v2.0-v2.4/en/project-admin/tools/monitoring/custom-metrics
- /rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/custom-metrics
- /rancher/v2.0-v2.4/en/cluster-admin/tools/monitoring/custom-metrics/
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/custom-metrics
---
After you've enabled [cluster-level monitoring]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/), you can view the metrics data from Rancher. You can also deploy the Prometheus custom metrics adapter so that you can use the Horizontal Pod Autoscaler (HPA) with metrics stored in cluster monitoring.
## Deploy Prometheus Custom Metrics Adapter
We are going to use the [Prometheus custom metrics adapter](https://github.com/DirectXMan12/k8s-prometheus-adapter/releases/tag/v0.5.0), version v0.5.0, which is a good example of a [custom metrics server](https://github.com/kubernetes-incubator/custom-metrics-apiserver). You must be a *cluster owner* to execute the following steps.
- Get the service account that cluster monitoring is using. It is configured in the workload with ID `statefulset:cattle-prometheus:prometheus-cluster-monitoring`. If you didn't customize anything, the service account name should be `cluster-monitoring`.
- Grant permissions to that service account. You will need two kinds of permissions.
One is the `extension-apiserver-authentication-reader` role in `kube-system`, so you will need to create a `RoleBinding` in `kube-system`. This permission allows the adapter to read the API aggregation configuration from the config map in `kube-system`.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: custom-metrics-auth-reader
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
name: cluster-monitoring
namespace: cattle-prometheus
```
The other is the `system:auth-delegator` cluster role, so you will need to create a `ClusterRoleBinding`. This permission allows the adapter to perform subject access reviews.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: custom-metrics:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
name: cluster-monitoring
namespace: cattle-prometheus
```
- Create the configuration for the custom metrics adapter. The following is an example configuration; the configuration format is explained in detail in a later section on this page.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: adapter-config
namespace: cattle-prometheus
data:
config.yaml: |
rules:
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters: []
resources:
overrides:
namespace:
resource: namespace
pod_name:
resource: pod
name:
matches: ^container_(.*)_seconds_total$
as: ""
metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
- isNot: ^container_.*_seconds_total$
resources:
overrides:
namespace:
resource: namespace
pod_name:
resource: pod
name:
matches: ^container_(.*)_total$
as: ""
metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
- isNot: ^container_.*_total$
resources:
overrides:
namespace:
resource: namespace
pod_name:
resource: pod
name:
matches: ^container_(.*)$
as: ""
metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}) by (<<.GroupBy>>)
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
seriesFilters:
- isNot: .*_total$
resources:
template: <<.Resource>>
name:
matches: ""
as: ""
metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
seriesFilters:
- isNot: .*_seconds_total
resources:
template: <<.Resource>>
name:
matches: ^(.*)_total$
as: ""
metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
seriesFilters: []
resources:
template: <<.Resource>>
name:
matches: ^(.*)_seconds_total$
as: ""
metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
resourceRules:
cpu:
containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[1m])) by (<<.GroupBy>>)
resources:
overrides:
instance:
resource: node
namespace:
resource: namespace
pod_name:
resource: pod
containerLabel: container_name
memory:
containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
resources:
overrides:
instance:
resource: node
namespace:
resource: namespace
pod_name:
resource: pod
containerLabel: container_name
window: 1m
```
- Create HTTPS TLS certs for your API server. You can use the following command to create a self-signed cert.
```bash
openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out serving.crt -keyout serving.key -subj "/C=CN/CN=custom-metrics-apiserver.cattle-prometheus.svc.cluster.local"
# This creates serving.crt and serving.key in the current directory. Next, create a secret in the cattle-prometheus namespace.
kubectl create secret generic -n cattle-prometheus cm-adapter-serving-certs --from-file=serving.key=./serving.key --from-file=serving.crt=./serving.crt
```
- Then you can create the Prometheus custom metrics adapter. You will also need a service for this deployment. You can create both resources via **Import YAML** in Rancher. Create them in the `cattle-prometheus` namespace.
Here is the prometheus custom metrics adapter deployment.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: custom-metrics-apiserver
name: custom-metrics-apiserver
namespace: cattle-prometheus
spec:
replicas: 1
selector:
matchLabels:
app: custom-metrics-apiserver
template:
metadata:
labels:
app: custom-metrics-apiserver
name: custom-metrics-apiserver
spec:
serviceAccountName: cluster-monitoring
containers:
- name: custom-metrics-apiserver
image: directxman12/k8s-prometheus-adapter-amd64:v0.5.0
args:
- --secure-port=6443
- --tls-cert-file=/var/run/serving-cert/serving.crt
- --tls-private-key-file=/var/run/serving-cert/serving.key
- --logtostderr=true
- --prometheus-url=http://prometheus-operated/
- --metrics-relist-interval=1m
- --v=10
- --config=/etc/adapter/config.yaml
ports:
- containerPort: 6443
volumeMounts:
- mountPath: /var/run/serving-cert
name: volume-serving-cert
readOnly: true
- mountPath: /etc/adapter/
name: config
readOnly: true
- mountPath: /tmp
name: tmp-vol
volumes:
- name: volume-serving-cert
secret:
secretName: cm-adapter-serving-certs
- name: config
configMap:
name: adapter-config
- name: tmp-vol
emptyDir: {}
```
Here is the service of the deployment.
```yaml
apiVersion: v1
kind: Service
metadata:
name: custom-metrics-apiserver
namespace: cattle-prometheus
spec:
ports:
- port: 443
targetPort: 6443
selector:
app: custom-metrics-apiserver
```
- Create an API service for your custom metrics server.
```yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
name: v1beta1.custom.metrics.k8s.io
spec:
service:
name: custom-metrics-apiserver
namespace: cattle-prometheus
group: custom.metrics.k8s.io
version: v1beta1
insecureSkipTLSVerify: true
groupPriorityMinimum: 100
versionPriority: 100
```
- Then you can verify your custom metrics server with `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1`. If the API returns data, the metrics server has been set up successfully.
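As a further check, you can query an individual metric through the custom metrics API. The example below is a sketch that assumes the `memory_usage_bytes` metric used by the HPA example further down and a namespace named `default`:
```bash
# Hypothetical query for a pod-level custom metric exposed by the adapter.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/memory_usage_bytes"
```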
- You can now create an HPA that uses custom metrics. Here is an example HPA; you will need to create an nginx deployment in your namespace first (a minimal command for this is shown after the example).
```yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
name: nginx
spec:
scaleTargetRef:
# point the HPA at the nginx deployment you just created
apiVersion: apps/v1
kind: Deployment
name: nginx
# autoscale between 1 and 10 replicas
minReplicas: 1
maxReplicas: 10
metrics:
# use a "Pods" metric, which takes the average of the
# given metric across all pods controlled by the autoscaling target
- type: Pods
pods:
metricName: memory_usage_bytes
targetAverageValue: 5000000
```
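If you have not created the target deployment yet, one minimal way to do so is with `kubectl`; the deployment name must match the `name` in `scaleTargetRef`, and the namespace placeholder should be replaced with the namespace where you create the HPA:
```bash
# Create a minimal nginx deployment for the HPA above to target.
kubectl create deployment nginx --image=nginx -n your-namespace
```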
You should then see the nginx deployment scale up, which confirms that the HPA works with custom metrics.
## Configuration of the Prometheus Custom Metrics Adapter
> Refer to https://github.com/DirectXMan12/k8s-prometheus-adapter/blob/master/docs/config.md
The adapter determines which metrics to expose, and how to expose them,
through a set of "discovery" rules. Each rule is executed independently
(so make sure that your rules are mutually exclusive), and specifies each
of the steps the adapter needs to take to expose a metric in the API.
Each rule can be broken down into roughly four parts:
- *Discovery*, which specifies how the adapter should find all Prometheus
metrics for this rule.
- *Association*, which specifies how the adapter should determine which
Kubernetes resources a particular metric is associated with.
- *Naming*, which specifies how the adapter should expose the metric in
the custom metrics API.
- *Querying*, which specifies how a request for a particular metric on one
or more Kubernetes objects should be turned into a query to Prometheus.
A basic config with one rule might look like:
```yaml
rules:
# this rule matches cumulative cAdvisor metrics measured in seconds
- seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
resources:
# skip specifying generic resource<->label mappings, and just
# attach only pod and namespace resources by mapping label names to group-resources
overrides:
      namespace: {resource: "namespace"}
      pod_name: {resource: "pod"}
# specify that the `container_` and `_seconds_total` suffixes should be removed.
# this also introduces an implicit filter on metric family names
name:
# we use the value of the capture group implicitly as the API name
# we could also explicitly write `as: "$1"`
matches: "^container_(.*)_seconds_total$"
# specify how to construct a query to fetch samples for a given series
# This is a Go template where the `.Series` and `.LabelMatchers` string values
# are available, and the delimiters are `<<` and `>>` to avoid conflicts with
# the prometheus query language
metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)"
```
### Discovery
Discovery governs the process of finding the metrics that you want to
expose in the custom metrics API. There are two fields that factor into
discovery: `seriesQuery` and `seriesFilters`.
`seriesQuery` specifies Prometheus series query (as passed to the
`/api/v1/series` endpoint in Prometheus) to use to find some set of
Prometheus series. The adapter will strip the label values from this
series, and then use the resulting metric-name-label-names combinations
later on.
In many cases, `seriesQuery` will be sufficient to narrow down the list of
Prometheus series. However, sometimes (especially if two rules might
otherwise overlap), it's useful to do additional filtering on metric
names. In this case, `seriesFilters` can be used. After the list of
series is returned from `seriesQuery`, each series has its metric name
filtered through any specified filters.
Filters may be either:
- `is: <regex>`, which matches any series whose name matches the specified
regex.
- `isNot: <regex>`, which matches any series whose name does not match the
specified regex.
For example:
```yaml
# match all cAdvisor metrics that aren't measured in seconds
seriesQuery: '{__name__=~"^container_.*_total",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
isNot: "^container_.*_seconds_total"
```
### Association
Association governs the process of figuring out which Kubernetes resources
a particular metric could be attached to. The `resources` field controls
this process.
There are two ways to associate resources with a particular metric. In
both cases, the value of the label becomes the name of the particular
object.
One way is to specify that any label name that matches some particular
pattern refers to some group-resource based on the label name. This can
be done using the `template` field. The pattern is specified as a Go
template, with the `Group` and `Resource` fields representing group and
resource. You don't necessarily have to use the `Group` field (in which
case the group is guessed by the system). For instance:
```yaml
# any label `kube_<group>_<resource>` becomes <group>.<resource> in Kubernetes
resources:
template: "kube_<<.Group>>_<<.Resource>>"
```
The other way is to specify that some particular label represents some
particular Kubernetes resource. This can be done using the `overrides`
field. Each override maps a Prometheus label to a Kubernetes
group-resource. For instance:
```yaml
# the microservice label corresponds to the apps.deployment resource
resources:
overrides:
microservice: {group: "apps", resource: "deployment"}
```
These two can be combined, so you can specify both a template and some
individual overrides.
The resources mentioned can be any resource available in your Kubernetes
cluster, as long as you've got a corresponding label.
### Naming
Naming governs the process of converting a Prometheus metric name into
a metric in the custom metrics API, and vice versa. It's controlled by
the `name` field.
Naming is controlled by specifying a pattern to extract an API name from
a Prometheus name, and potentially a transformation on that extracted
value.
The pattern is specified in the `matches` field, and is just a regular
expression. If not specified, it defaults to `.*`.
The transformation is specified by the `as` field. You can use any
capture groups defined in the `matches` field. If the `matches` field
doesn't contain capture groups, the `as` field defaults to `$0`. If it
contains a single capture group, the `as` field defaults to `$1`.
Otherwise, it's an error not to specify the `as` field.
For example:
```yaml
# turn any metric named <name>_total into <name>_per_second
# e.g. http_requests_total becomes http_requests_per_second
name:
matches: "^(.*)_total$"
as: "${1}_per_second"
```
### Querying
Querying governs the process of actually fetching values for a particular
metric. It's controlled by the `metricsQuery` field.
The `metricsQuery` field is a Go template that gets turned into
a Prometheus query, using input from a particular call to the custom
metrics API. A given call to the custom metrics API is distilled down to
a metric name, a group-resource, and one or more objects of that
group-resource. These get turned into the following fields in the
template:
- `Series`: the metric name
- `LabelMatchers`: a comma-separated list of label matchers matching the
given objects. Currently, this is the label for the particular
group-resource, plus the label for namespace, if the group-resource is
namespaced.
- `GroupBy`: a comma-separated list of labels to group by. Currently,
this contains the group-resource label used in `LabelMatchers`.
For instance, suppose we had a series `http_requests_total` (exposed as
`http_requests_per_second` in the API) with labels `service`, `pod`,
`ingress`, `namespace`, and `verb`. The first four correspond to
Kubernetes resources. Then, if someone requested the metric
`pods/http_requests_per_second` for the pods `pod1` and `pod2` in the
`somens` namespace, we'd have:
- `Series: "http_requests_total"`
- `LabelMatchers: "pod=~\"pod1|pod2\",namespace=\"somens\""`
- `GroupBy`: `pod`
Additionally, there are two advanced fields that are "raw" forms of other
fields:
- `LabelValuesByName`: a map mapping the labels and values from the
`LabelMatchers` field. The values are pre-joined by `|`
(for use with the `=~` matcher in Prometheus).
- `GroupBySlice`: the slice form of `GroupBy`.
In general, you'll probably want to use the `Series`, `LabelMatchers`, and
`GroupBy` fields. The other two are for advanced usage.
The query is expected to return one value for each object requested. The
adapter will use the labels on the returned series to associate a given
series back to its corresponding object.
For example:
```yaml
# convert cumulative cAdvisor metrics into rates calculated over 2 minutes
metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)"
```
@@ -0,0 +1,435 @@
---
title: Prometheus Expressions
weight: 4
aliases:
- /rancher/v2.0-v2.4/en/project-admin/tools/monitoring/expression
- /rancher/v2.0-v2.4/en/cluster-admin/tools/monitoring/expression
- /rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/expression
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/expression
---
The PromQL expressions in this doc can be used to configure [alerts.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/alerts/)
> Before expressions can be used in alerts, monitoring must be enabled. For more information, refer to the documentation on enabling monitoring [at the cluster level]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) or [at the project level.]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/monitoring/)
For more information about querying Prometheus, refer to the official [Prometheus documentation.](https://prometheus.io/docs/prometheus/latest/querying/basics/)
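If you want to experiment with these expressions outside of the Rancher UI, one approach is to port-forward to the Prometheus service deployed by cluster monitoring and query its HTTP API directly. The sketch below assumes the `prometheus-operated` service in the `cattle-prometheus` namespace and uses the cluster CPU utilization summary expression from this page:
```
# Forward the Prometheus API to localhost, then run a query against it.
kubectl -n cattle-prometheus port-forward svc/prometheus-operated 9090:9090 &
curl -G "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])))'
```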
<!-- TOC -->
- [Cluster Metrics](#cluster-metrics)
- [Cluster CPU Utilization](#cluster-cpu-utilization)
- [Cluster Load Average](#cluster-load-average)
- [Cluster Memory Utilization](#cluster-memory-utilization)
- [Cluster Disk Utilization](#cluster-disk-utilization)
- [Cluster Disk I/O](#cluster-disk-i-o)
- [Cluster Network Packets](#cluster-network-packets)
- [Cluster Network I/O](#cluster-network-i-o)
- [Node Metrics](#node-metrics)
- [Node CPU Utilization](#node-cpu-utilization)
- [Node Load Average](#node-load-average)
- [Node Memory Utilization](#node-memory-utilization)
- [Node Disk Utilization](#node-disk-utilization)
- [Node Disk I/O](#node-disk-i-o)
- [Node Network Packets](#node-network-packets)
- [Node Network I/O](#node-network-i-o)
- [Etcd Metrics](#etcd-metrics)
- [Etcd Has a Leader](#etcd-has-a-leader)
- [Number of Times the Leader Changes](#number-of-times-the-leader-changes)
- [Number of Failed Proposals](#number-of-failed-proposals)
- [GRPC Client Traffic](#grpc-client-traffic)
- [Peer Traffic](#peer-traffic)
- [DB Size](#db-size)
- [Active Streams](#active-streams)
- [Raft Proposals](#raft-proposals)
- [RPC Rate](#rpc-rate)
- [Disk Operations](#disk-operations)
- [Disk Sync Duration](#disk-sync-duration)
- [Kubernetes Components Metrics](#kubernetes-components-metrics)
- [API Server Request Latency](#api-server-request-latency)
- [API Server Request Rate](#api-server-request-rate)
- [Scheduling Failed Pods](#scheduling-failed-pods)
- [Controller Manager Queue Depth](#controller-manager-queue-depth)
- [Scheduler E2E Scheduling Latency](#scheduler-e2e-scheduling-latency)
- [Scheduler Preemption Attempts](#scheduler-preemption-attempts)
- [Ingress Controller Connections](#ingress-controller-connections)
- [Ingress Controller Request Process Time](#ingress-controller-request-process-time)
- [Rancher Logging Metrics](#rancher-logging-metrics)
- [Fluentd Buffer Queue Rate](#fluentd-buffer-queue-rate)
- [Fluentd Input Rate](#fluentd-input-rate)
- [Fluentd Output Errors Rate](#fluentd-output-errors-rate)
- [Fluentd Output Rate](#fluentd-output-rate)
- [Workload Metrics](#workload-metrics)
- [Workload CPU Utilization](#workload-cpu-utilization)
- [Workload Memory Utilization](#workload-memory-utilization)
- [Workload Network Packets](#workload-network-packets)
- [Workload Network I/O](#workload-network-i-o)
- [Workload Disk I/O](#workload-disk-i-o)
- [Pod Metrics](#pod-metrics)
- [Pod CPU Utilization](#pod-cpu-utilization)
- [Pod Memory Utilization](#pod-memory-utilization)
- [Pod Network Packets](#pod-network-packets)
- [Pod Network I/O](#pod-network-i-o)
- [Pod Disk I/O](#pod-disk-i-o)
- [Container Metrics](#container-metrics)
- [Container CPU Utilization](#container-cpu-utilization)
- [Container Memory Utilization](#container-memory-utilization)
- [Container Disk I/O](#container-disk-i-o)
<!-- /TOC -->
# Cluster Metrics
### Cluster CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))` |
| Summary | `1 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])))` |
### Cluster Load Average
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>load1</td><td>`sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr><tr><td>load5</td><td>`sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr><tr><td>load15</td><td>`sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"}) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>load1</td><td>`sum(node_load1) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr><tr><td>load5</td><td>`sum(node_load5) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr><tr><td>load15</td><td>`sum(node_load15) by (instance) / count(node_cpu_seconds_total{mode="system"})`</td></tr></table> |
### Cluster Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - sum(node_memory_MemAvailable_bytes) by (instance) / sum(node_memory_MemTotal_bytes) by (instance)` |
| Summary | `1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes)` |
### Cluster Disk Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `(sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance) - sum(node_filesystem_free_bytes{device!="rootfs"}) by (instance)) / sum(node_filesystem_size_bytes{device!="rootfs"}) by (instance)` |
| Summary | `(sum(node_filesystem_size_bytes{device!="rootfs"}) - sum(node_filesystem_free_bytes{device!="rootfs"})) / sum(node_filesystem_size_bytes{device!="rootfs"})` |
### Cluster Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total[5m])) by (instance)`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total[5m]))`</td></tr></table> |
### Cluster Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr></table> |
| Summary | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr></table> |
### Cluster Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m])) by (instance)</code></td></tr></table> |
| Summary | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*"}[5m]))</code></td></tr></table> |
# Node Metrics
### Node CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `avg(irate(node_cpu_seconds_total{mode!="idle", instance=~"$instance"}[5m])) by (mode)` |
| Summary | `1 - (avg(irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])))` |
### Node Load Average
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>load1</td><td>`sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load5</td><td>`sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load15</td><td>`sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr></table> |
| Summary | <table><tr><td>load1</td><td>`sum(node_load1{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load5</td><td>`sum(node_load5{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr><tr><td>load15</td><td>`sum(node_load15{instance=~"$instance"}) / count(node_cpu_seconds_total{mode="system",instance=~"$instance"})`</td></tr></table> |
### Node Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"})` |
| Summary | `1 - sum(node_memory_MemAvailable_bytes{instance=~"$instance"}) / sum(node_memory_MemTotal_bytes{instance=~"$instance"})` |
### Node Disk Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `(sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"}) by (device)) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) by (device)` |
| Summary | `(sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"}) - sum(node_filesystem_free_bytes{device!="rootfs",instance=~"$instance"})) / sum(node_filesystem_size_bytes{device!="rootfs",instance=~"$instance"})` |
### Node Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(node_disk_read_bytes_total{instance=~"$instance"}[5m]))`</td></tr><tr><td>written</td><td>`sum(rate(node_disk_written_bytes_total{instance=~"$instance"}[5m]))`</td></tr></table> |
### Node Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr></table> |
| Summary | <table><tr><td>receive-dropped</td><td><code>sum(rate(node_network_receive_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>receive-errs</td><td><code>sum(rate(node_network_receive_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>receive-packets</td><td><code>sum(rate(node_network_receive_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-dropped</td><td><code>sum(rate(node_network_transmit_drop_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-errs</td><td><code>sum(rate(node_network_transmit_errs_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit-packets</td><td><code>sum(rate(node_network_transmit_packets_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr></table> |
### Node Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m])) by (device)</code></td></tr></table> |
| Summary | <table><tr><td>receive</td><td><code>sum(rate(node_network_receive_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr><tr><td>transmit</td><td><code>sum(rate(node_network_transmit_bytes_total{device!~"lo &#124; veth.* &#124; docker.* &#124; flannel.* &#124; cali.* &#124; cbr.*",instance=~"$instance"}[5m]))</code></td></tr></table> |
# Etcd Metrics
### Etcd Has a Leader
`max(etcd_server_has_leader)`
### Number of Times the Leader Changes
`max(etcd_server_leader_changes_seen_total)`
### Number of Failed Proposals
`sum(etcd_server_proposals_failed_total)`
### GRPC Client Traffic
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>in</td><td>`sum(rate(etcd_network_client_grpc_received_bytes_total[5m])) by (instance)`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_client_grpc_sent_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>in</td><td>`sum(rate(etcd_network_client_grpc_received_bytes_total[5m]))`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_client_grpc_sent_bytes_total[5m]))`</td></tr></table> |
### Peer Traffic
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>in</td><td>`sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>in</td><td>`sum(rate(etcd_network_peer_received_bytes_total[5m]))`</td></tr><tr><td>out</td><td>`sum(rate(etcd_network_peer_sent_bytes_total[5m]))`</td></tr></table> |
### DB Size
| Catalog | Expression |
| --- | --- |
| Detail | `sum(etcd_debugging_mvcc_db_total_size_in_bytes) by (instance)` |
| Summary | `sum(etcd_debugging_mvcc_db_total_size_in_bytes)` |
### Active Streams
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>lease-watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) by (instance)`</td></tr><tr><td>watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>lease-watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"})`</td></tr><tr><td>watch</td><td>`sum(grpc_server_started_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}) - sum(grpc_server_handled_total{grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"})`</td></tr></table> |
### Raft Proposals
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>applied</td><td>`sum(increase(etcd_server_proposals_applied_total[5m])) by (instance)`</td></tr><tr><td>committed</td><td>`sum(increase(etcd_server_proposals_committed_total[5m])) by (instance)`</td></tr><tr><td>pending</td><td>`sum(increase(etcd_server_proposals_pending[5m])) by (instance)`</td></tr><tr><td>failed</td><td>`sum(increase(etcd_server_proposals_failed_total[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>applied</td><td>`sum(increase(etcd_server_proposals_applied_total[5m]))`</td></tr><tr><td>committed</td><td>`sum(increase(etcd_server_proposals_committed_total[5m]))`</td></tr><tr><td>pending</td><td>`sum(increase(etcd_server_proposals_pending[5m]))`</td></tr><tr><td>failed</td><td>`sum(increase(etcd_server_proposals_failed_total[5m]))`</td></tr></table> |
### RPC Rate
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>total</td><td>`sum(rate(grpc_server_started_total{grpc_type="unary"}[5m])) by (instance)`</td></tr><tr><td>fail</td><td>`sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>total</td><td>`sum(rate(grpc_server_started_total{grpc_type="unary"}[5m]))`</td></tr><tr><td>fail</td><td>`sum(rate(grpc_server_handled_total{grpc_type="unary",grpc_code!="OK"}[5m]))`</td></tr></table> |
### Disk Operations
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>commit-called-by-backend</td><td>`sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m])) by (instance)`</td></tr><tr><td>fsync-called-by-wal</td><td>`sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m])) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>commit-called-by-backend</td><td>`sum(rate(etcd_disk_backend_commit_duration_seconds_sum[1m]))`</td></tr><tr><td>fsync-called-by-wal</td><td>`sum(rate(etcd_disk_wal_fsync_duration_seconds_sum[1m]))`</td></tr></table> |
### Disk Sync Duration
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>wal</td><td>`histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))`</td></tr><tr><td>db</td><td>`histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))`</td></tr></table> |
| Summary | <table><tr><td>wal</td><td>`sum(histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le)))`</td></tr><tr><td>db</td><td>`sum(histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le)))`</td></tr></table> |
# Kubernetes Components Metrics
### API Server Request Latency
| Catalog | Expression |
| --- | --- |
| Detail | `avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance, verb) /1e+06` |
| Summary | `avg(apiserver_request_latencies_sum / apiserver_request_latencies_count) by (instance) /1e+06` |
### API Server Request Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(apiserver_request_count[5m])) by (instance, code)` |
| Summary | `sum(rate(apiserver_request_count[5m])) by (instance)` |
### Scheduling Failed Pods
| Catalog | Expression |
| --- | --- |
| Detail | `sum(kube_pod_status_scheduled{condition="false"})` |
| Summary | `sum(kube_pod_status_scheduled{condition="false"})` |
### Controller Manager Queue Depth
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>volumes</td><td>`sum(volumes_depth) by (instance)`</td></tr><tr><td>deployment</td><td>`sum(deployment_depth) by (instance)`</td></tr><tr><td>replicaset</td><td>`sum(replicaset_depth) by (instance)`</td></tr><tr><td>service</td><td>`sum(service_depth) by (instance)`</td></tr><tr><td>serviceaccount</td><td>`sum(serviceaccount_depth) by (instance)`</td></tr><tr><td>endpoint</td><td>`sum(endpoint_depth) by (instance)`</td></tr><tr><td>daemonset</td><td>`sum(daemonset_depth) by (instance)`</td></tr><tr><td>statefulset</td><td>`sum(statefulset_depth) by (instance)`</td></tr><tr><td>replicationmanager</td><td>`sum(replicationmanager_depth) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>volumes</td><td>`sum(volumes_depth)`</td></tr><tr><td>deployment</td><td>`sum(deployment_depth)`</td></tr><tr><td>replicaset</td><td>`sum(replicaset_depth)`</td></tr><tr><td>service</td><td>`sum(service_depth)`</td></tr><tr><td>serviceaccount</td><td>`sum(serviceaccount_depth)`</td></tr><tr><td>endpoint</td><td>`sum(endpoint_depth)`</td></tr><tr><td>daemonset</td><td>`sum(daemonset_depth)`</td></tr><tr><td>statefulset</td><td>`sum(statefulset_depth)`</td></tr><tr><td>replicationmanager</td><td>`sum(replicationmanager_depth)`</td></tr></table> |
### Scheduler E2E Scheduling Latency
| Catalog | Expression |
| --- | --- |
| Detail | `histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06` |
| Summary | `sum(histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) by (le, instance)) / 1e+06)` |
### Scheduler Preemption Attempts
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(scheduler_total_preemption_attempts[5m])) by (instance)` |
| Summary | `sum(rate(scheduler_total_preemption_attempts[5m]))` |
### Ingress Controller Connections
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>reading</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="reading"}) by (instance)`</td></tr><tr><td>waiting</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="waiting"}) by (instance)`</td></tr><tr><td>writing</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="writing"}) by (instance)`</td></tr><tr><td>accepted</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m]))) by (instance)`</td></tr><tr><td>active</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m]))) by (instance)`</td></tr><tr><td>handled</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m]))) by (instance)`</td></tr></table> |
| Summary | <table><tr><td>reading</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="reading"})`</td></tr><tr><td>waiting</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="waiting"})`</td></tr><tr><td>writing</td><td>`sum(nginx_ingress_controller_nginx_process_connections{state="writing"})`</td></tr><tr><td>accepted</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="accepted"}[5m])))`</td></tr><tr><td>active</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="active"}[5m])))`</td></tr><tr><td>handled</td><td>`sum(ceil(increase(nginx_ingress_controller_nginx_process_connections_total{state="handled"}[5m])))`</td></tr></table> |
### Ingress Controller Request Process Time
| Catalog | Expression |
| --- | --- |
| Detail | `topk(10, histogram_quantile(0.95,sum by (le, host, path)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m]))))` |
| Summary | `topk(10, histogram_quantile(0.95,sum by (le, host)(rate(nginx_ingress_controller_request_duration_seconds_bucket{host!="_"}[5m]))))` |
# Rancher Logging Metrics
### Fluentd Buffer Queue Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_buffer_queue_length[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_output_status_buffer_queue_length[5m]))` |
### Fluentd Input Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_input_status_num_records_total[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_input_status_num_records_total[5m]))` |
### Fluentd Output Errors Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_num_errors[5m])) by (type)` |
| Summary | `sum(rate(fluentd_output_status_num_errors[5m]))` |
### Fluentd Output Rate
| Catalog | Expression |
| --- | --- |
| Detail | `sum(rate(fluentd_output_status_num_records_total[5m])) by (instance)` |
| Summary | `sum(rate(fluentd_output_status_num_records_total[5m]))` |
# Workload Metrics
### Workload CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""}) by (pod_name)` |
| Summary | `sum(container_memory_working_set_bytes{namespace="$namespace",pod_name=~"$podName", container_name!=""})` |
### Workload Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
### Workload Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m])) by (pod_name)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name=~"$podName",container_name!=""}[5m]))`</td></tr></table> |
# Pod Metrics
### Pod CPU Utilization
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m])) by (container_name)`</td></tr></table> |
| Summary | <table><tr><td>cfs throttled seconds</td><td>`sum(rate(container_cpu_cfs_throttled_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>usage seconds</td><td>`sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>system seconds</td><td>`sum(rate(container_cpu_system_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr><tr><td>user seconds</td><td>`sum(rate(container_cpu_user_seconds_total{container_name!="POD",namespace="$namespace",pod_name="$podName", container_name!=""}[5m]))`</td></tr></table> |
### Pod Memory Utilization
| Catalog | Expression |
| --- | --- |
| Detail | `sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""}) by (container_name)` |
| Summary | `sum(container_memory_working_set_bytes{container_name!="POD",namespace="$namespace",pod_name="$podName",container_name!=""})` |
### Pod Network Packets
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>receive-packets</td><td>`sum(rate(container_network_receive_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-dropped</td><td>`sum(rate(container_network_receive_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>receive-errors</td><td>`sum(rate(container_network_receive_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-packets</td><td>`sum(rate(container_network_transmit_packets_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-dropped</td><td>`sum(rate(container_network_transmit_packets_dropped_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit-errors</td><td>`sum(rate(container_network_transmit_errors_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
### Pod Network I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
| Summary | <table><tr><td>receive</td><td>`sum(rate(container_network_receive_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>transmit</td><td>`sum(rate(container_network_transmit_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
### Pod Disk I/O
| Catalog | Expression |
| --- | --- |
| Detail | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name)`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m])) by (container_name)`</td></tr></table> |
| Summary | <table><tr><td>read</td><td>`sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr><tr><td>write</td><td>`sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name!=""}[5m]))`</td></tr></table> |
# Container Metrics
### Container CPU Utilization
| Catalog | Expression |
| --- | --- |
| cfs throttled seconds | `sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| usage seconds | `sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| system seconds | `sum(rate(container_cpu_system_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| user seconds | `sum(rate(container_cpu_user_seconds_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
### Container Memory Utilization
`sum(container_memory_working_set_bytes{namespace="$namespace",pod_name="$podName",container_name="$containerName"})`
### Container Disk I/O
| Catalog | Expression |
| --- | --- |
| read | `sum(rate(container_fs_reads_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
| write | `sum(rate(container_fs_writes_bytes_total{namespace="$namespace",pod_name="$podName",container_name="$containerName"}[5m]))` |
@@ -0,0 +1,84 @@
---
title: Project Monitoring
weight: 2
aliases:
- /rancher/v2.0-v2.4/en/project-admin/tools/monitoring
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/monitoring/project-monitoring
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/project-monitoring
---
_Available as of v2.2.4_
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with [Prometheus](https://prometheus.io/), a leading open-source monitoring solution.
This section covers the following topics:
- [Monitoring scope](#monitoring-scope)
- [Permissions to configure project monitoring](#permissions-to-configure-project-monitoring)
- [Enabling project monitoring](#enabling-project-monitoring)
- [Project-level monitoring resource requirements](#project-level-monitoring-resource-requirements)
- [Project metrics](#project-metrics)
### Monitoring Scope
Using Prometheus, you can monitor Rancher at both the [cluster level]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) and project level. For each cluster and project that is enabled for monitoring, Rancher deploys a Prometheus server.
- [Cluster monitoring]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) allows you to view the health of your Kubernetes cluster. Prometheus collects metrics from the cluster components below, which you can view in graphs and charts.
- Kubernetes control plane
- etcd database
- All nodes (including workers)
- Project monitoring allows you to view the state of pods running in a given project. Prometheus collects metrics from the project's deployed HTTP and TCP/UDP workloads.
### Permissions to Configure Project Monitoring
Only [administrators]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/global-permissions/), [cluster owners or members]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#cluster-roles), or [project owners]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/#project-roles) can configure project level monitoring. Project members can only view monitoring metrics.
### Enabling Project Monitoring
> **Prerequisite:** Cluster monitoring must be [enabled.]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/)
1. Go to the project where monitoring should be enabled. Note: When cluster monitoring is enabled, monitoring is also enabled by default in the **System** project.
1. Select **Tools > Monitoring** in the navigation bar.
1. Select **Enable** to show the [Prometheus configuration options]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/prometheus/). Enter your desired configuration options.
1. Click **Save**.
### Project-Level Monitoring Resource Requirements
Container| CPU - Request | Mem - Request | CPU - Limit | Mem - Limit | Configurable
---------|---------------|---------------|-------------|-------------|-------------
Prometheus|750m| 750Mi | 1000m | 1000Mi | Yes
Grafana | 100m | 100Mi | 200m | 200Mi | No
**Result:** A single application, `project-monitoring`, is added as an [application]({{<baseurl>}}/rancher/v2.0-v2.4/en/catalog/apps/) to the project. After the application is `active`, you can start viewing project metrics through the [Rancher dashboard]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) or directly from Grafana.
> The default username and password for the Grafana instance will be `admin/admin`. However, Grafana dashboards are served via the Rancher authentication proxy, so only users who are currently authenticated into the Rancher server have access to the Grafana dashboard.
### Project Metrics
[Workload metrics]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/expression/#workload-metrics) are available for the project if monitoring is enabled at the [cluster level]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) and at the [project level.](#enabling-project-monitoring)
You can monitor custom metrics from any [exporter.](https://prometheus.io/docs/instrumenting/exporters/) You can also expose some custom endpoints on deployments without needing to configure Prometheus for your project.
> **Example:**
> A [Redis](https://redis.io/) application is deployed in the namespace `redis-app` in the project `Datacenter`. It is monitored via the [Redis exporter](https://github.com/oliver006/redis_exporter). After enabling project monitoring, you can edit the application to configure the <b>Advanced Options -> Custom Metrics</b> section. Enter the `Container Port` and `Path` and select the `Protocol`.
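Once the exporter endpoint is being scraped, its metrics are available to the project's Prometheus and Grafana alongside the built-in workload metrics. As a minimal sketch only (assuming the standard metric names exposed by the Redis exporter, such as `redis_up` and `redis_commands_processed_total`; these expressions are not part of Rancher's preset dashboards), you could query them like this:
```
# 1 if the exporter can reach the Redis instance, 0 otherwise
redis_up
# Commands processed per second over the last five minutes
sum(rate(redis_commands_processed_total[5m]))
```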
To access a project-level Grafana instance,
1. From the **Global** view, navigate to a cluster that has monitoring enabled.
1. Go to a project that has monitoring enabled.
1. From the project view, click **Apps.** In versions before v2.2.0, choose **Catalog Apps** on the main navigation bar.
1. Go to the `project-monitoring` application.
1. In the `project-monitoring` application, there are two `/index.html` links: one that leads to a Grafana instance and one that leads to a Prometheus instance. When you click the Grafana link, it will redirect you to a new webpage for Grafana, which shows metrics for the project.
1. You will be signed in to the Grafana instance automatically. The default username is `admin` and the default password is `admin`. For security, we recommend that you log out of Grafana, log back in with the `admin` password, and change your password.
**Result:** You will be logged in to the Grafana instance. After logging in, you can view the preset Grafana dashboards, which are imported via the [Grafana provisioning mechanism](http://docs.grafana.org/administration/provisioning/#dashboards), so you cannot modify them directly. For now, if you want to configure your own dashboards, clone the original and modify the new copy.
@@ -0,0 +1,111 @@
---
title: Prometheus Configuration
weight: 1
aliases:
- /rancher/v2.0-v2.4/en/project-admin/tools/monitoring/prometheus
- /rancher/v2.0-v2.4/en/cluster-admin/tools/monitoring/prometheus/
- /rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/prometheus
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/prometheus
---
_Available as of v2.2.0_
While configuring monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) or [project level]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/monitoring/), there are multiple options that can be configured.
- [Basic Configuration](#basic-configuration)
- [Advanced Options](#advanced-options)
- [Node Exporter](#node-exporter)
- [Persistent Storage](#persistent-storage)
- [Remote Storage](#remote-storage)
# Basic Configuration
Option | Description
-------|-------------
Data Retention | How long your Prometheus instance retains monitoring data scraped from Rancher objects before it's purged.
[Enable Node Exporter](#node-exporter) | Whether or not to deploy the node exporter.
Node Exporter Host Port | The host port on which data is exposed, i.e. data that Prometheus collects from your node hardware. Required if you have enabled the node exporter.
[Enable Persistent Storage](#persistent-storage) for Prometheus | Whether or not to configure storage for Prometheus so that metrics can be retained even if the Prometheus pod fails.
[Enable Persistent Storage](#persistent-storage) for Grafana | Whether or not to configure storage for Grafana so that the Grafana dashboards and configuration can be retained even if the Grafana pod fails.
Prometheus [CPU Limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) | CPU resource limit for the Prometheus pod.
Prometheus [CPU Reservation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) | CPU reservation for the Prometheus pod.
Prometheus [Memory Limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) | Memory resource limit for the Prometheus pod.
Prometheus [Memory Reservation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) | Memory resource requests for the Prometheus pod.
Selector | Ability to select the nodes that the Prometheus and Grafana pods are deployed to. To use this option, the nodes must have labels.
# Advanced Options
Since monitoring is an [application](https://github.com/rancher/system-charts/tree/dev/charts/rancher-monitoring) from the [Rancher catalog]({{<baseurl>}}/rancher/v2.0-v2.4/en/catalog/), it can be configured like any other catalog application, by passing in values to Helm.
> **Warning:** Any modification to the application without understanding the entire application can lead to catastrophic errors.
### Prometheus RemoteRead and RemoteWrite
_Available as of v2.4.0_
Prometheus RemoteRead and RemoteWrite can be configured as custom answers in the **Advanced Options** section.
For more information on remote endpoints and storage, refer to the [Prometheus documentation.](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage)
The Prometheus operator documentation contains the full [RemoteReadSpec](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#remotereadspec) and [RemoteWriteSpec.](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#remotewritespec)
An example configuration would be:
| Variable | Value |
|--------------|------------|
| `prometheus.remoteWrite[0].url` | `http://mytarget.com` |
### LivenessProbe and ReadinessProbe
_Available as of v2.4.0_
Prometheus LivenessProbe and ReadinessProbe can be configured as custom answers in the **Advanced Options** section.
The Kubernetes probe spec is [here.](https://v1-17.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#probe-v1-core)
Some example key-value pairs are:
| Variable | Value |
|--------------|------------|
| `prometheus.livenessProbe.timeoutSeconds` | 60 |
| `prometheus.readinessProbe.timeoutSeconds` | 60 |
# Node Exporter
The [node exporter](https://github.com/prometheus/node_exporter/blob/master/README.md) is a popular open source exporter that exposes hardware and \*NIX kernel OS metrics. It is designed to monitor the host system. However, running it in a container still has namespace issues, mostly around filesystem mount namespaces. In order to monitor actual network metrics for the container network, the node exporter must be deployed in `hostNetwork` mode.
When configuring Prometheus and enabling the node exporter, enter a host port in the **Node Exporter Host Port** field that will not conflict with existing applications. The host port you choose must be open to allow internal traffic between Prometheus and the node exporter.
>**Warning:** In order for Prometheus to collect the metrics of the node exporter, after enabling cluster monitoring, you must open the <b>Node Exporter Host Port</b> in the host firewall rules to allow intranet access. By default, `9796` is used as that host port.
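Once the node exporter is enabled and the host port is open, you can confirm that Prometheus is scraping it from the Prometheus UI. A minimal sketch (the exact scrape job names depend on how the monitoring application is configured, so these queries group by job rather than filtering on a specific one):
```
# Number of healthy targets per scrape job; compare with the expected number of nodes
sum(up) by (job)
# Node exporter series such as node_cpu_seconds_total should appear once scraping works
count(node_cpu_seconds_total) by (instance)
```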
# Persistent Storage
>**Prerequisite:** Configure one or more StorageClasses to use as [persistent storage]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/volumes-and-storage/) for your Prometheus or Grafana pod.
By default, when you enable Prometheus for either a cluster or project, all monitoring data that Prometheus collects is stored on its own pod. With local storage, if the Prometheus or Grafana pods fail, all the data is lost. Rancher recommends configuring external persistent storage for the cluster. With external persistent storage, if the Prometheus or Grafana pods fail, the new pods can recover using data from the persistent storage.
When enabling persistent storage for Prometheus or Grafana, specify the size of the persistent volume and select the StorageClass.
# Remote Storage
>**Prerequisite:** A remote storage endpoint must be available. The list of possible integrations is available [here.](https://prometheus.io/docs/operating/integrations/)
Using advanced options, remote storage integration for the Prometheus installation can be configured as follows:
```
prometheus.remoteWrite[0].url = http://remote1/push
prometheus.remoteWrite[0].remoteTimeout = 33s
prometheus.remoteWrite[1].url = http://remote2/push
prometheus.remoteRead[0].url = http://remote1/read
prometheus.remoteRead[0].proxyUrl = http://proxy.url
prometheus.remoteRead[0].bearerToken = token-value
prometheus.remoteRead[1].url = http://remote2/read
prometheus.remoteRead[1].remoteTimeout = 33s
prometheus.remoteRead[1].readRecent = true
```
Additional fields can be set based on the [RemoteReadSpec](https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#remotereadspec) and [RemoteWriteSpec.](https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#remotewritespec)
@@ -0,0 +1,65 @@
---
title: Viewing Metrics
weight: 2
aliases:
- /rancher/v2.0-v2.4/en/project-admin/tools/monitoring/viewing-metrics
- /rancher/v2.0-v2.4/en/cluster-admin/tools/monitoring/viewing-metrics
- /rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/viewing-metrics
- /rancher/v2.0-v2.4/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-monitoring/viewing-metrics
---
_Available as of v2.2.0_
After you've enabled monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) or [project level]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/monitoring/), you will want to start viewing the data being collected. There are multiple ways to view this data.
## Rancher Dashboard
>**Note:** This is only available if you've enabled monitoring at the [cluster level]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/). Project specific analytics must be viewed using the project's Grafana instance.
Rancher's dashboards are available at multiple locations:
- **Cluster Dashboard**: From the **Global** view, navigate to the cluster.
- **Node Metrics**: From the **Global** view, navigate to the cluster. Select **Nodes**. Find the individual node and click on its name. Click **Node Metrics.**
- **Workload Metrics**: From the **Global** view, navigate to the project. From the main navigation bar, choose **Resources > Workloads.** (In versions before v2.3.0, choose **Workloads** on the main navigation bar.) Find the individual workload and click on its name. Click **Workload Metrics.**
- **Pod Metrics**: From the **Global** view, navigate to the project. Select **Workloads > Workloads**. Find the individual workload and click on its name. Find the individual pod and click on its name. Click **Pod Metrics.**
- **Container Metrics**: From the **Global** view, navigate to the project. From the main navigation bar, choose **Resources > Workloads.** (In versions before v2.3.0, choose **Workloads** on the main navigation bar.) Find the individual workload and click on its name. Find the individual pod and click on its name. Find the individual container and click on its name. Click **Container Metrics.**
Prometheus metrics are displayed and are denoted with the Grafana icon. If you click on the icon, the metrics will open a new tab in Grafana.
Within each Prometheus metrics widget, there are several ways to customize your view.
- Toggle between two views:
- **Detail**: Displays graphs and charts that let you view each event in a Prometheus time series
- **Summary**: Displays events in a Prometheus time series that are outside the norm.
- Change the range of the time series that you're viewing to see a more refined or expansive data sample.
- Customize the data sample to display data between specific dates and times.
When analyzing these metrics, don't be concerned about any single standalone metric in the charts and graphs. Rather, you should establish a baseline for your metrics over the course of time, e.g. the range of values that your components usually operate within and are considered normal. After you establish the baseline, be on the lookout for any large deltas in the charts and graphs, as these big changes usually indicate a problem that you need to investigate.
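One simple way to establish such a baseline is to compare a metric's current value against its average over a longer window. A minimal PromQL sketch (using the node load metric that also backs the cluster load widget; the one-week window is an arbitrary illustrative choice):
```
# Current one-minute load average per node
node_load1
# The same metric averaged over the past week, as a rough baseline to compare against
avg_over_time(node_load1[1w])
```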
## Grafana
If you've enabled monitoring at either the [cluster level]({{<baseurl>}}/rancher/v2.0-v2.4/en/monitoring-alerting/legacy/monitoring/cluster-monitoring/) or [project level]({{<baseurl>}}/rancher/v2.0-v2.4/en/project-admin/tools/monitoring/), Rancher automatically creates a link to a Grafana instance. Use this link to view monitoring data.
Grafana allows you to query, visualize, alert on, and ultimately understand your cluster and workload data. For more information on Grafana and its capabilities, visit the [Grafana website](https://grafana.com/grafana).
### Authentication
Rancher determines which users can access the new Grafana instance, as well as the objects they can view within it, by validating them against the user's [cluster or project roles]({{<baseurl>}}/rancher/v2.0-v2.4/en/admin-settings/rbac/cluster-project-roles/). In other words, a user's access in Grafana mirrors their access in Rancher.
When you go to the Grafana instance, you will be logged in with the username `admin` and the password `admin`. If you log out and log in again, you will be prompted to change your password. You will only have access to the URL of the Grafana instance if you have access to view the corresponding metrics in Rancher. So for example, if your Rancher permissions are scoped to the project level, you won't be able to see the Grafana instance for cluster-level metrics.
### Accessing the Cluster-level Grafana Instance
1. From the **Global** view, navigate to a cluster that has monitoring enabled.
1. Go to the **System** project view. This project is where the cluster-level Grafana instance runs.
1. Click **Apps.** In versions before v2.2.0, choose **Catalog Apps** on the main navigation bar.
1. Go to the `cluster-monitoring` application.
1. In the `cluster-monitoring` application, there are two `/index.html` links: one that leads to a Grafana instance and one that leads to a Prometheus instance. When you click the Grafana link, it will redirect you to a new webpage for Grafana, which shows metrics for the cluster.
1. You will be signed in to the Grafana instance automatically. The default username is `admin` and the default password is `admin`. For security, we recommend that you log out of Grafana, log back in with the `admin` password, and change your password.
**Result:** You are logged in to the Grafana instance. After logging in, you can view the preset Grafana dashboards, which are imported via the [Grafana provisioning mechanism](http://docs.grafana.org/administration/provisioning/#dashboards), so you cannot modify them directly. For now, if you want to configure your own dashboards, clone the original and modify the new copy.
@@ -0,0 +1,92 @@
---
title: Istio
weight: 15
aliases:
- /rancher/v2.0-v2.4/en/dashboard/istio
- /rancher/v2.0-v2.4/en/project-admin/istio/configuring-resource-allocations/
- /rancher/v2.0-v2.4/en/cluster-admin/tools/istio/
- /rancher/v2.0-v2.4/en/project-admin/istio
- /rancher/v2.0-v2.4/en/istio/legacy/cluster-istio
---
_Available as of v2.3.0_
[Istio](https://istio.io/) is an open-source tool that makes it easier for DevOps teams to observe, control, troubleshoot, and secure the traffic within a complex network of microservices.
As a network of microservices changes and grows, the interactions between them can become more difficult to manage and understand. In such a situation, it is useful to have a service mesh as a separate infrastructure layer. Istio's service mesh lets you manipulate traffic between microservices without changing the microservices directly.
Our integration of Istio is designed so that a Rancher operator, such as an administrator or cluster owner, can deliver Istio to developers. Then developers can use Istio to enforce security policies, troubleshoot problems, or manage traffic for green/blue deployments, canary deployments, or A/B testing.
This service mesh provides features that include but are not limited to the following:
- Traffic management features
- Enhanced monitoring and tracing
- Service discovery and routing
- Secure connections and service-to-service authentication with mutual TLS
- Load balancing
- Automatic retries, backoff, and circuit breaking
After Istio is enabled in a cluster, you can leverage Istio's control plane functionality with `kubectl`.
Rancher's Istio integration comes with comprehensive visualization aids:
- **Trace the root cause of errors with Jaeger.** [Jaeger](https://www.jaegertracing.io/) is an open-source tool that provides a UI for a distributed tracing system, which is useful for root cause analysis and for determining what causes poor performance. Distributed tracing allows you to view an entire chain of calls, which might originate with a user request and traverse dozens of microservices.
- **Get the full picture of your microservice architecture with Kiali.** [Kiali](https://www.kiali.io/) provides a diagram that shows the services within a service mesh and how they are connected, including the traffic rates and latencies between them. You can check the health of the service mesh, or drill down to see the incoming and outgoing requests to a single component.
- **Gain insights from time series analytics with Grafana dashboards.** [Grafana](https://grafana.com/) is an analytics platform that allows you to query, visualize, alert on and understand the data gathered by Prometheus.
- **Write custom queries for time series data with the Prometheus UI.** [Prometheus](https://prometheus.io/) is a systems monitoring and alerting toolkit. Prometheus scrapes data from your cluster, which is then used by Grafana. A Prometheus UI is also integrated into Rancher, and lets you write custom queries for time series data and see the results in the UI.
Istio needs to be set up by a Rancher administrator or cluster administrator before it can be used in a project.
# Prerequisites
Before enabling Istio, we recommend that you confirm that your Rancher worker nodes have enough [CPU and memory]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/istio/resources) to run all of the components of Istio.
# Setup Guide
Refer to the [setup guide]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/istio/setup) for instructions on how to set up Istio and use it in a project.
# Disabling Istio
To remove Istio components from a cluster, namespace, or workload, refer to the section on [disabling Istio.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/istio/disabling-istio)
# Accessing Visualizations
> By default, only cluster owners have access to Jaeger and Kiali. For instructions on how to allow project members to access them, see [this section.]({{<baseurl>}}/rancher/v2.0-v2.4/en/cluster-admin/tools/istio/rbac/)
After Istio is set up in a cluster, Grafana, Prometheus, Jaeger, and Kiali are available in the Rancher UI.
Your access to the visualizations depends on your role. Grafana and Prometheus are only available for cluster owners. The Kiali and Jaeger UIs are available only to cluster owners by default, but cluster owners can allow project members to access them by editing the Istio settings. When you go to your project and click **Resources > Istio,** you can go to each UI for Kiali, Jaeger, Grafana, and Prometheus by clicking their icons in the top right corner of the page.
To see the visualizations, go to the cluster where Istio is set up and click **Tools > Istio.** You should see links to each UI at the top of the page.
You can also get to the visualization tools from the project view.
# Viewing the Kiali Traffic Graph
1. From the project view in Rancher, click **Resources > Istio.**
1. If you are a cluster owner, you can go to the **Traffic Graph** tab. This tab has the Kiali network visualization integrated into the UI.
# Viewing Traffic Metrics
Istio's monitoring features provide visibility into the performance of all your services.
1. From the project view in Rancher, click **Resources > Istio.**
1. Go to the **Traffic Metrics** tab. After traffic is generated in your cluster, you should be able to see metrics for **Success Rate, Request Volume, 4xx Response Count, Project 5xx Response Count** and **Request Duration.** Cluster owners can see all of the metrics, while project members can see a subset of the metrics.
# Architecture
Istio installs a service mesh that uses [Envoy](https://www.envoyproxy.io/learn/service-mesh) sidecar proxies to intercept traffic to each workload. These sidecars intercept and manage service-to-service communication, allowing fine-grained observation and control over traffic within the cluster.
Only workloads that have the Istio sidecar injected can be tracked and controlled by Istio.
Enabling Istio in Rancher enables monitoring in the cluster and enables Istio in all new namespaces that are created in the cluster. You need to manually enable Istio in preexisting namespaces.
When a namespace has Istio enabled, new workloads deployed in the namespace will automatically have the Istio sidecar. You need to manually enable Istio in preexisting workloads.
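Istio's standard mechanism for per-namespace auto-injection is a namespace label. The sketch below assumes that standard mechanism and uses hypothetical resource names; in Rancher you would normally use the UI toggle instead:

```
# Label an existing namespace so that new pods get the Istio sidecar injected
kubectl label namespace my-namespace istio-injection=enabled

# Redeploy an existing workload so it is re-created with the sidecar (requires a reasonably recent kubectl)
kubectl rollout restart deployment my-workload -n my-namespace
```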
For more information on the Istio sidecar, refer to the [Istio docs](https://istio.io/docs/setup/kubernetes/additional-setup/sidecar-injection/).
### Two Ingresses
By default, each Rancher-provisioned cluster has one NGINX ingress controller allowing traffic into the cluster. To allow Istio to receive external traffic, you need to enable the Istio ingress gateway for the cluster. The result is that your cluster will have two ingresses.
![In an Istio-enabled cluster, you can have two ingresses: the default NGINX ingress controller and the Istio ingress gateway.]({{<baseurl>}}/img/rancher/istio-ingress.svg)
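Once the Istio ingress gateway is enabled, traffic entering through it is typically routed with an Istio `Gateway` resource. The manifest below is a minimal sketch using placeholder names and hostnames, not a configuration that Rancher generates for you:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: example-gateway            # hypothetical name
  namespace: default               # hypothetical namespace
spec:
  selector:
    istio: ingressgateway          # selects the Istio ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "example.com"                # hypothetical hostname
```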
@@ -0,0 +1,31 @@
---
title: Disabling Istio
weight: 4
aliases:
- /rancher/v2.0-v2.4/en/cluster-admin/tools/istio/disabling-istio
- /rancher/v2.0-v2.4/en/istio/legacy/disabling-istio
- /rancher/v2.0-v2.4/en/istio/v2.3.x-v2.4.x/disabling-istio
---
This section describes how to disable Istio in a cluster, namespace, or workload.
# Disable Istio in a Cluster
To disable Istio,
1. From the **Global** view, navigate to the cluster that you want to disable Istio for.
1. Click **Tools > Istio.**
1. Click **Disable,** then click the red button again to confirm the disable action.
**Result:** The `cluster-istio` application in the cluster's `system` project gets removed. The Istio sidecar cannot be deployed on any workloads in the cluster.
# Disable Istio in a Namespace
1. In the Rancher UI, go to the project that has the namespace where you want to disable Istio.
1. On the **Workloads** tab, you will see a list of namespaces and the workloads deployed in them. Go to the namespace where you want to disable Istio and click **&#8942; > Disable Istio Auto Injection.**
**Result:** When workloads are deployed in this namespace, they will not have the Istio sidecar.
# Remove the Istio Sidecar from a Workload
Disable Istio in the namespace, then redeploy the workloads in it. They will be deployed without the Istio sidecar.
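For example, with a hypothetical deployment name and namespace, a redeploy can be triggered from the command line:

```
# Re-create the pods so they come up without the Istio sidecar (requires a reasonably recent kubectl)
kubectl rollout restart deployment my-workload -n my-namespace
```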
@@ -0,0 +1,62 @@
---
title: Role-based Access Control
weight: 3
aliases:
- /rancher/v2.0-v2.4/en/cluster-admin/tools/istio/rbac
- /rancher/v2.0-v2.4/en/istio/legacy/rbac
- /rancher/v2.0-v2.4/en/istio/v2.3.x-v2.4.x/rbac
---
This section describes the permissions required to access Istio features and how to configure access to the Kiali and Jaeger visualizations.
# Cluster-level Access
By default, only cluster administrators can:
- Enable Istio for the cluster
- Configure resource allocations for Istio
- View each UI for Prometheus, Grafana, Kiali, and Jaeger
# Project-level Access
After Istio is enabled in a cluster, project owners and members have permission to:
- Enable and disable Istio sidecar auto-injection for namespaces
- Add the Istio sidecar to workloads
- View the traffic metrics and traffic graph for the cluster
- View the Kiali and Jaeger visualizations if cluster administrators give access to project members
- Configure Istio's resources (such as the gateway, destination rules, or virtual services) with `kubectl`, as in the sketch below (this does not apply to read-only project members)
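As a sketch of what that looks like in practice (with hypothetical service and namespace names), a project member could apply a simple traffic rule such as the following, which adds automatic retries for calls to an in-mesh service:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews-route              # hypothetical name
  namespace: my-namespace          # hypothetical namespace
spec:
  hosts:
  - reviews                        # hypothetical in-mesh service
  http:
  - retries:
      attempts: 3
      perTryTimeout: 2s
    route:
    - destination:
        host: reviews
```

Applied with `kubectl apply -f`, this is the kind of change a project owner or member could make; a read-only project member could view but not apply it.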
# Access to Visualizations
By default, the Kiali and Jaeger visualizations are restricted to the cluster owner because the information in them could be sensitive.
**Jaeger** provides a UI for a distributed tracing system, which is useful for root cause analysis and for determining what causes poor performance.
**Kiali** provides a diagram that shows the services within a service mesh and how they are connected.
Rancher supports giving groups permission to access Kiali and Jaeger, but not individuals.
To configure who has permission to access the Kiali and Jaeger UI,
1. Go to the cluster view and click **Tools > Istio.**
1. Go to the **Member Access** section. If you want to restrict access to certain groups, choose **Allow cluster owner and specified members to access Kiali and Jaeger UI,** then search for the groups that you want to have access to Kiali and Jaeger. If you want all members to have access to the tools, click **Allow all members to access Kiali and Jaeger UI.**
1. Click **Save.**
**Result:** The access levels for Kiali and Jaeger have been updated.
# Summary of Default Permissions for Istio Users
| Permission | Cluster Administrators | Project Owners | Project Members | Read-only Project Members |
|------------------------------------------|----------------|----------------|-----------------|---------------------------|
| Enable and disable Istio for the cluster | ✓ | | | |
| Configure Istio resource limits | ✓ | | | |
| Control who has access to Kiali and the Jaeger UI | ✓ | | | |
| Enable and disable Istio for a namespace | ✓ | ✓ | ✓ | |
| Enable and disable Istio on workloads | ✓ | ✓ | ✓ | |
| Configure Istio with `kubectl` | ✓ | ✓ | ✓ | |
| View Prometheus UI and Grafana UI | ✓ | | | |
| View Kiali UI and Jaeger UI ([Configurable](#access-to-visualizations)) | ✓ | | | |
| View Istio project dashboard, including traffic metrics* | ✓ | ✓ | ✓ | ✓ |
\* By default, only the cluster owner will see the traffic graph. Project members will see only a subset of traffic metrics. Project members cannot see the traffic graph because it comes from Kiali, and access to Kiali is restricted to cluster owners by default.