Merge pull request #1916 from catherineluse/etcd-snapshot

Fix merge conflict error
This commit is contained in:
Adrian Goins
2019-10-21 22:06:51 -03:00
committed by GitHub
@@ -25,257 +25,7 @@ You can use RKE to [restore your cluster from backup]({{<baseurl>}}/rke/latest/e
# Example Scenarios
These [example scenarios]({{<baseurl>}}/rke/latest/en/etcd-snapshots/example-scenarios) for backup and restore are different based on your version of RKE.
### IAM Support for Storing Snapshots in S3
In addition to API access keys, RKE supports using IAM roles for S3 authentication. The cluster's etcd nodes must be assigned an IAM role with read/write access to the designated backup bucket on S3, and the nodes must have network access to the specified S3 endpoint.
To give an application access to S3, refer to the AWS documentation on [Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances.](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html)
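For example, with an IAM role attached to the etcd nodes, the recurring snapshot configuration can omit the static credentials. A minimal sketch, assuming the nodes already carry a role with read/write access to the bucket (the field names follow the recurring-snapshot example below):
```yaml
services:
  etcd:
    backup_config:
      interval_hours: 12
      retention: 6
      s3backupconfig:
        # No access_key/secret_key here: with an IAM role attached
        # to the etcd nodes, RKE authenticates to S3 via the role.
        bucket_name: s3-bucket-name
        region: ""
        endpoint: s3.amazonaws.com
```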
### Local One-Time Snapshot Example
```
$ rke etcd snapshot-save --config cluster.yml --name snapshot-name
```
The snapshot is saved in `/opt/rke/etcd-snapshots`.
### One-Time Snapshot Uploaded to S3 Example
_Available as of v0.2.0_
```
$ rke etcd snapshot-save --config cluster.yml --name snapshot-name \
--s3 --access-key S3_ACCESS_KEY --secret-key S3_SECRET_KEY \
--bucket-name s3-bucket-name --s3-endpoint s3.amazonaws.com
```
The snapshot is saved in `/opt/rke/etcd-snapshots` as well as uploaded to the S3 backend.
## Recurring Snapshots
To schedule automatic recurring etcd snapshots, enable the `etcd-snapshot` service with [extra configuration options for the etcd service](#options-for-the-etcd-snapshot-service). `etcd-snapshot` runs in a service container alongside the `etcd` container. By default, the `etcd-snapshot` service takes a snapshot on every node that has the `etcd` role and stores it on local disk in `/opt/rke/etcd-snapshots`. If you set up the [options for S3](#options-for-the-etcd-snapshot-service), the snapshots are also uploaded to the S3 backend.
Prior to v0.2.0, RKE also saves a backup of the certificates, a file named `pki.bundle.tar.gz`, in the same location. In those versions, both the snapshot and the pki bundle file are required for the restore process.
When a cluster is launched with the `etcd-snapshot` service enabled, you can view the `etcd-rolling-snapshots` logs to confirm backups are being created automatically.
```
$ docker logs etcd-rolling-snapshots
time="2018-05-04T18:39:16Z" level=info msg="Initializing Rolling Backups" creation=1m0s retention=24h0m0s
time="2018-05-04T18:40:16Z" level=info msg="Created backup" name="2018-05-04T18:40:16Z_etcd" runtime=108.332814ms
time="2018-05-04T18:41:16Z" level=info msg="Created backup" name="2018-05-04T18:41:16Z_etcd" runtime=92.880112ms
time="2018-05-04T18:42:16Z" level=info msg="Created backup" name="2018-05-04T18:42:16Z_etcd" runtime=83.67642ms
time="2018-05-04T18:43:16Z" level=info msg="Created backup" name="2018-05-04T18:43:16Z_etcd" runtime=86.298499ms
```
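You can also list the snapshot directory on an etcd node to confirm that snapshots are being written and rotated; a quick check, assuming the default local path:
```
$ ls -lh /opt/rke/etcd-snapshots
```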
### Options for the `etcd-snapshot` Service
Depending on your version of RKE, the options used to configure recurring snapshots may be different.
#### Available as of v0.2.0
|Option|Description| S3 Specific |
|---|---| --- |
|**interval_hours**| The duration in hours between recurring backups. This supersedes the older `creation` option and will override it if both are specified.| |
|**retention**| The number of snapshots to retain before rotation. This supersedes the older `retention` option and will override it if both are specified.| |
|**bucket_name**| S3 bucket name where backups will be stored| * |
|**access_key**| S3 access key with permission to access the backup bucket.| * |
|**secret_key** |S3 secret key with permission to access the backup bucket.| * |
|**region** |S3 region for the backup bucket. This is optional.| * |
|**endpoint** |S3 endpoint for the backup bucket.| * |
<br>
```yaml
services:
etcd:
backup_config:
interval_hours: 12
retention: 6
s3backupconfig:
access_key: S3_ACCESS_KEY
secret_key: S3_SECRET_KEY
bucket_name: s3-bucket-name
region: ""
endpoint: s3.amazonaws.com
```
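Because recurring snapshots are configured in `cluster.yml`, re-run `rke up` after changing these settings so the `etcd-snapshot` service picks them up:
```
$ rke up --config cluster.yml
```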
#### Prior to v0.2.0
|Option|Description|
|---|---|
|**Snapshot**|By default, the recurring snapshot service is disabled. To enable the service, you need to define it as part of `etcd` and set it to `true`.|
|**Creation**|By default, the snapshot service will take snapshots every 5 minutes (`5m0s`). You can change the time between snapshots as part of the `creation` directive for the `etcd` service.|
|**Retention**|By default, all snapshots are saved for 24 hours (`24h`) before being deleted and purged. You can change how long to store a snapshot as part of the `retention` directive for the `etcd` service.|
```yaml
services:
etcd:
snapshot: true
creation: 5m0s
retention: 24h
```
## Etcd Disaster Recovery
If there is a disaster with your Kubernetes cluster, you can use `rke etcd snapshot-restore` to recover your etcd. This command reverts etcd to a specific snapshot. RKE also removes the old `etcd` container before creating a new `etcd` cluster using the snapshot that you have chosen.
>**Warning:** Restoring an etcd snapshot deletes your current etcd cluster and replaces it with a new one. Before you run the `rke etcd snapshot-restore` command, you should back up any important data in your cluster.
The snapshot used to restore your etcd cluster can either be stored locally in `/opt/rke/etcd-snapshots` or in an S3-compatible backend. The S3 backend option is available as of v0.2.0.
### Options for `rke etcd snapshot-restore`
| Option | Description | S3 Specific |
| --- | --- | ---|
| `--name` value | Specify snapshot name | |
| `--config` value | Specify an alternate cluster YAML file (default: "cluster.yml") [$RKE_CONFIG] | |
| `--s3` | Use S3 as the snapshot backend | * |
| `--s3-endpoint` value | Specify S3 endpoint URL (default: "s3.amazonaws.com") | * |
| `--access-key` value | Specify S3 access key | * |
| `--secret-key` value | Specify S3 secret key | * |
| `--bucket-name` value | Specify S3 bucket name | * |
| `--region` value | Specify the S3 bucket location (optional) | * |
| `--ssh-agent-auth` | [Use SSH Agent Auth defined by SSH_AUTH_SOCK]({{< baseurl >}}/rke/latest/en/config-options/#ssh-agent) | |
| `--ignore-docker-version` | [Disable Docker version check]({{< baseurl >}}/rke/latest/en/config-options/#supported-docker-versions) | |
### Example of Restoring from a Local Snapshot
When restoring etcd from a local snapshot, the snapshot is assumed to be located in `/opt/rke/etcd-snapshots`. In versions prior to v0.2.0, the `pki.bundle.tar.gz` file is also expected to be in the same location. As of v0.2.0, this file is no longer needed because the way the [Kubernetes cluster state is stored]({{< baseurl >}}/rke/latest/en/installation/#kubernetes-cluster-state) has changed.
```
$ rke etcd snapshot-restore --config cluster.yml --name mysnapshot
```
### Example of Restoring from a Snapshot in S3
_Available as of v0.2.0_
> **Note:** Ensure your `cluster.rkestate` file is present before starting the restore, as it contains the certificate data for the cluster.
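A quick way to verify that the state file is in place, assuming you run RKE from the directory containing `cluster.yml`:
```
$ ls cluster.rkestate
```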
When restoring etcd from a snapshot located in S3, the command needs the S3 information in order to connect to the S3 backend and retrieve the snapshot.
```shell
$ rke etcd snapshot-restore --config cluster.yml --name snapshot-name \
--s3 --access-key S3_ACCESS_KEY --secret-key S3_SECRET_KEY \
--bucket-name s3-bucket-name --s3-endpoint s3.amazonaws.com
```
> **Note:** If you are restoring a cluster that had Rancher installed, the UI should start up after a few minutes; you don't need to re-run Helm.
### Example Scenario of Restoring from a Local Snapshot
In this example, the Kubernetes cluster was deployed on two AWS nodes.
| Name | IP | Role |
|:-----:|:--------:|:----------------------:|
| node1 | 10.0.0.1 | [controlplane, worker] |
| node2 | 10.0.0.2 | [etcd] |
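The corresponding `cluster.yml` for this setup might look like the following; a sketch, assuming the `ubuntu` user and the hostname overrides used later in this example:
```yaml
nodes:
  - address: 10.0.0.1
    hostname_override: node1
    user: ubuntu
    role:
      - controlplane
      - worker
  - address: 10.0.0.2
    hostname_override: node2
    user: ubuntu
    role:
      - etcd
```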
### Back Up the `etcd` Cluster
Take a local snapshot of the Kubernetes cluster. As of v0.2.0, you can also upload this snapshot directly to an S3 backend with the [S3 options](#options-for-rke-etcd-snapshot-save).
```
$ rke etcd snapshot-save --name snapshot.db --config cluster.yml
```
{{< img "/img/rke/rke-etcd-backup.png" "etcd snapshot">}}
### Store the Snapshot Externally in S3
As of v0.2.0, this step is no longer required, as RKE can upload and download snapshots automatically from S3 by adding the [S3 options](#options-for-rke-etcd-snapshot-save) when running the `rke etcd snapshot-save` command.
After taking the etcd snapshot on `node2`, we recommend saving this backup in a persistent place. One option is to save the backup and `pki.bundle.tar.gz` file to an S3 bucket or tape backup.
> **Note:** As of v0.2.0, the file **pki.bundle.tar.gz** is no longer required for the restore process.
```
# If you're using an AWS host and have the ability to connect to S3
root@node2:~# s3cmd mb s3://rke-etcd-backup
root@node2:~# s3cmd put /opt/rke/etcd-snapshots/snapshot.db /opt/rke/etcd-snapshots/pki.bundle.tar.gz s3://rke-etcd-backup/
```
### Place the Backup on a New Node
To simulate the failure, let's power down `node2`.
```
root@node2:~# poweroff
```
| Name | IP | Role |
|:-----:|:--------:|:----------------------:|
| node1 | 10.0.0.1 | [controlplane, worker] |
| ~~node2~~ | ~~10.0.0.2~~ | ~~[etcd]~~ |
| node3 | 10.0.0.3 | [etcd] |
Before restoring etcd and running `rke up`, we need to copy the backup saved in S3 to a new node, e.g. `node3`. As of v0.2.0, you can retrieve the snapshot directly from S3 when running the restore command, so this step is only for users who stored the snapshot externally without using the integrated S3 options.
```
# Make the snapshot directory
root@node3:~# mkdir -p /opt/rke/etcd-snapshots
# Get the Backup from S3
root@node3:~# s3cmd get s3://rke-etcd-backup/snapshot.db /opt/rke/etcd-snapshots/snapshot.db
# Get the pki bundle from S3, only needed prior to v0.2.0
root@node3:~# s3cmd get s3://rke-etcd-backup/pki.bundle.tar.gz /opt/rke/etcd-snapshots/pki.bundle.tar.gz
```
### Restore `etcd` on the New Node from the Backup
Before updating and restoring etcd, you will need to add the new node to the Kubernetes cluster with the `etcd` role. In `cluster.yml`, comment out the old node and add the new node.
```yaml
nodes:
- address: 10.0.0.1
hostname_override: node1
user: ubuntu
role:
- controlplane
- worker
# - address: 10.0.0.2
# hostname_override: node2
# user: ubuntu
# role:
# - etcd
- address: 10.0.0.3
hostname_override: node3
user: ubuntu
role:
- etcd
```
After the new node is added to `cluster.yml`, run `rke etcd snapshot-restore` to launch `etcd` from the backup. The snapshot and, in versions prior to v0.2.0, the `pki.bundle.tar.gz` file are expected to be saved at `/opt/rke/etcd-snapshots`.
As of v0.2.0, if you want to retrieve the snapshot directly from S3, add in the [S3 options](#options-for-rke-etcd-snapshot-restore).
> **Note:** As of v0.2.0, the **pki.bundle.tar.gz** file is no longer required for the restore process, as the certificates required for restore are preserved within the `cluster.rkestate` file.
```
$ rke etcd snapshot-restore --name snapshot.db --config cluster.yml
```
Finally, we need to restore cluster operations by running `rke up` again with the updated `cluster.yml`, which points the Kubernetes API at the new `etcd` node.
```
$ rke up --config cluster.yml
```
Confirm that your Kubernetes cluster is functional by checking its pods.
```
> kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-65899c769f-kcdpr 1/1 Running 0 17s
nginx-65899c769f-pc45c 1/1 Running 0 17s
nginx-65899c769f-qkhml 1/1 Running 0 17s
```
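You can also check that the expected nodes have registered, assuming the `kube_config_cluster.yml` kubeconfig generated by RKE:
```
> kubectl --kubeconfig kube_config_cluster.yml get nodes
```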
## Troubleshooting