Merge pull request #1405 from ChrisMcKee/master

additional detail to rancher ha (back/restore) and etcd snapshots
This commit is contained in:
Denise
2019-05-08 10:57:56 -07:00
committed by GitHub
3 changed files with 89 additions and 18 deletions
@@ -44,37 +44,54 @@ To take recurring snapshots, enable the `etcd-snapshot` service, which is a serv
**To Enable Recurring Snapshots:**
1. Open `rancher-cluster.yml` with your favorite text editor.
2. Add one of the following code blocks, depending on your RKE version, to the bottom of the file:
_Pre 0.2.0_
```yaml
services:
  etcd:
    snapshot: true # enables recurring etcd snapshots
    creation: 6h0s # time increment between snapshots
    retention: 24h # time increment before snapshot purge
```
_Post 0.2.0 (note: S3 backup is optional)_
```yaml
services:
  etcd:
    backup_config:
      enabled: true # enables recurring etcd snapshots
      interval_hours: 6 # time increment between snapshots
      retention: 60 # time in days before snapshot purge
      s3_backup_config: # optional
        access_key: "myaccesskey"
        secret_key: "myaccesssecret"
        bucket_name: "my-backup-bucket"
        endpoint: "s3.eu-west-1.amazonaws.com"
        region: "eu-west-1"
```
3. Edit the code according to your requirements.
4. Save and close `rancher-cluster.yml`.
5. Open **Terminal** and change directory to the location of the RKE binary. Your `rancher-cluster.yml` file must reside in the same directory.
6. Run the following command:
```
rke up --config rancher-cluster.yml
```
**Result:** RKE is configured to take recurring snapshots of `etcd` on all nodes running the `etcd` role. Snapshots are saved to the following directory: `/opt/rke/etcd-snapshots/`.
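You can sanity-check that recurring snapshots are actually being written by inspecting the snapshot directory on an `etcd` node. A minimal sketch, assuming the default directory and the 6-hour `creation` interval from the example config (the `check_snapshots` helper is illustrative, not part of RKE):

```shell
#!/bin/sh
# Illustrative helper: report whether the newest file in a snapshot
# directory is fresher than the configured interval (360 min = 6h).
check_snapshots() {
  dir="$1"
  newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no snapshots found in $dir"
    return 1
  fi
  # `find -mmin +360` prints the file only if it is older than 6 hours
  if [ -n "$(find "$dir/$newest" -mmin +360 2>/dev/null)" ]; then
    echo "stale: $newest"
    return 2
  fi
  echo "ok: $newest"
}

# On an etcd node:
# check_snapshots /opt/rke/etcd-snapshots
```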
#### Option B: One-Time Snapshots
When you're about to upgrade Rancher or restore it to a previous snapshot, you should snapshot your live image so that you have a backup of `etcd` in its last known state.
**To Take a One-Time Local Snapshot:**
1. Open **Terminal** and change directory to the location of the RKE binary. Your `rancher-cluster.yml` file must reside in the same directory.
@@ -86,7 +103,23 @@ When you're about to upgrade Rancher or restore it to a previous snapshot, you s
**Result:** RKE takes a snapshot of `etcd` running on each `etcd` node. The file is saved to `/opt/rke/etcd-snapshots`.
**To Take a One-Time S3 Snapshot:**
1. Open **Terminal** and change directory to the location of the RKE binary. Your `rancher-cluster.yml` file must reside in the same directory.
2. Enter the following command. Replace `snapshot-name` with any name that you want to use for the snapshot (e.g. `upgrade`), and substitute your own S3 credentials, bucket name, and endpoint.
```shell
rke etcd snapshot-save --config rancher-cluster.yml --name snapshot-name \
--s3 --access-key S3_ACCESS_KEY --secret-key S3_SECRET_KEY \
--bucket-name s3-bucket-name --s3-endpoint s3.amazonaws.com
```
*The snapshot is saved in `/opt/rke/etcd-snapshots` as well as uploaded to the S3 backend.*
### 2. Backup Local Snapshots to a Safe Location
> Note: This step is handled for you automatically when S3 backups are enabled.
After taking the `etcd` snapshots, save them to a safe location so that they're unaffected if your cluster experiences a disaster scenario. This location should be persistent.
@@ -98,3 +131,6 @@ In this documentation, as an example, we're using Amazon S3 as our safe location
root@node:~# s3cmd mb s3://rke-etcd-snapshots
root@node:~# s3cmd put /opt/rke/etcd-snapshots/snapshot.db s3://rke-etcd-snapshots/
```
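For pre-0.2.0 setups, where RKE does not upload snapshots for you, the `s3cmd put` step above can be scheduled rather than run by hand. A hedged sketch of a cron entry (the paths and bucket name are taken from the example above; adjust the schedule to match your snapshot `creation` interval):

```
# /etc/cron.d/etcd-snapshot-sync -- illustrative only
0 */6 * * * root s3cmd sync /opt/rke/etcd-snapshots/ s3://rke-etcd-snapshots/
```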
@@ -33,10 +33,38 @@ We recommend that you start with fresh nodes and a clean state. Alternatively yo
### 2. Place Snapshot and PKI Bundle
**Local Snapshots**
Pick one of the clean nodes. That node will be the "target node" for the initial restore. Place the snapshot and PKI certificate bundle files in the `/opt/rke/etcd-snapshots` directory on the "target node".
* Snapshot - `<snapshot>.db`
* PKI Bundle - `pki.bundle.tar.gz` *(pre-RKE 0.2.0 only; from 0.2.0 onwards you should have a `cluster.rkestate` file instead)*
***Continue to step 3***
**Remote Snapshots** (Rancher 2.1 / RKE 0.2.0 onwards)
With your `cluster.rkestate` file present, run the RKE restore from S3:
```shell
rke etcd snapshot-restore --config rancher-cluster-restore.yml \
--name snap-shot-name.db \
--s3 --access-key KEY --secret-key SECRET \
--bucket-name my-rancher-etcd-backup-bucket \
--s3-endpoint s3.amazonaws.com \
--region eu-west-2
```
Once the process has completed, if Rancher was installed via Helm, the UI will load (this can take a few minutes).
At this point the restoration is complete.
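Because the cluster components can take a few minutes to come up, the post-restore check can be scripted as a poll-until-ready loop. A sketch (the `wait_for` helper and the kubeconfig filename are illustrative, not part of RKE):

```shell
#!/bin/sh
# Illustrative: retry a command until it succeeds or attempts run out.
wait_for() {
  attempts="$1"; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out"
  return 1
}

# wait_for 60 kubectl --kubeconfig kube_config_rancher-cluster-restore.yml get nodes
```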
> Note: At this point it is a good idea to ensure your `kube_config_cluster.yml` and `cluster.rkestate` are backed up and preserved for any future maintenance.
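The two files from the note above can be bundled in one step before being copied off-site. A minimal sketch (the helper name and the `s3cmd` destination bucket are assumptions):

```shell
#!/bin/sh
# Illustrative helper (not part of RKE): bundle the kubeconfig and
# cluster state files, ready for off-site upload.
backup_cluster_state() {
  src_dir="$1"
  out_file="$2"
  tar -czf "$out_file" -C "$src_dir" \
    kube_config_cluster.yml cluster.rkestate
}

# backup_cluster_state . cluster-state-backup.tar.gz
# s3cmd put cluster-state-backup.tar.gz s3://my-backup-bucket/  # bucket name is an example
```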
### 3. Configure RKE
@@ -158,14 +158,18 @@ $ rke etcd snapshot-restore --config cluster.yml --name mysnapshot
_Available as of v0.2.0_
> **Note:** Ensure your `cluster.rkestate` is present before starting the restore, as this contains your certificate data for the cluster
When restoring etcd from a snapshot located in S3, the command needs the S3 information in order to connect to the S3 backend and retrieve the snapshot.
```shell
$ rke etcd snapshot-restore --config cluster.yml --name snapshot-name \
--s3 --access-key S3_ACCESS_KEY --secret-key S3_SECRET_KEY \
--bucket-name s3-bucket-name --s3-endpoint s3.amazonaws.com
```
## Example
> **Note:** If you are restoring a cluster that had Rancher installed, the UI should start up after a few minutes; you don't need to re-run Helm.
### Example Scenario of restoring from a Local Snapshot
In this example, the Kubernetes cluster was deployed on two AWS nodes.
@@ -185,7 +189,7 @@ $ rke etcd snapshot-save --name snapshot.db --config cluster.yml
![etcd snapshot]({{< baseurl >}}/img/rke/rke-etcd-backup.png)
### Store the Snapshot Externally in S3
As of v0.2.0, this step is no longer required, as RKE can upload and download snapshots automatically from S3 by adding in [S3 options](#options-for-rke-etcd-snapshot-save) when running the `rke etcd snapshot-save` command.
@@ -253,7 +257,7 @@ nodes:
After the new node is added to the `cluster.yml`, run `rke etcd snapshot-restore` to launch `etcd` from the backup. The snapshot and `pki.bundle.tar.gz` file are expected to be saved at `/opt/rke/etcd-snapshots`.
As of v0.2.0, if you want to directly retrieve the snapshot from S3, add in the [S3 options](#options-for-rke-etcd-snapshot-restore).
> **Note:** As of v0.2.0, the file **pki.bundle.tar.gz** is no longer required for the restore process, as the certificates needed for the restore are preserved within the `cluster.rkestate` file.
```
$ rke etcd snapshot-restore --name snapshot.db --config cluster.yml
@@ -294,3 +298,6 @@ docker container inspect rke-bundle-cert
```
The important thing to note is the mounts of the container and location of the **pki.bundle.tar.gz**.