Merge pull request #1576 from wjimenez5271/master

BPG merge into docs site
2026-05-20 11:55:12 +00:00 · 2019-07-15 22:13:33 -07:00
parent 33f97e3580 b855cf8edd
commit e6e95960b7
13 changed files with 133 additions and 0 deletions
@@ -0,0 +1,13 @@
+---
+title: Container Best Practices 
+weight: 100
+---
+
+Running well-built containers can greatly improve the overall performance and security of your environment. A few tips:
+
+- When possible, try to standardize on a common container base OS. Smaller distributions such as Alpine and BusyBox reduce container image size and generally have a smaller attack/vulnerability surface. Popular distributions such as Ubuntu, Fedora, and CentOS are more field tested and offer more functionality.
+- If your microservice is a standalone static binary, build its image `FROM scratch`. This gives the smallest attack surface and the smallest image size.
+- When possible, use a non-privileged user when running processes within your container. While container runtimes provide isolation, vulnerabilities and attacks are still possible. Inadvertent or accidental host mounts can also be impacted if the container is running as root.
+- Apply CPU and memory limits to your pods. This helps manage the resources on your worker nodes and prevents a malfunctioning microservice from impacting other microservices.
+- Also apply CPU and memory requests to your pods. Requests tell the scheduler how much capacity your pod needs, so that it can place the pod on a suitable node without overcommitting that node. Without them, the scheduler makes assumptions that will likely not help your application once the cluster experiences load.
+- Set up liveness and readiness probes for your container. Unless your container crashes completely, Kubernetes will not know it is unhealthy without an endpoint or mechanism that reports container status. Alternatively, make sure your container halts and exits when it becomes unhealthy.
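
The non-root, limits, requests, and probe tips above can be sketched together in one Pod manifest. The following is a minimal sketch, not a recommended configuration: it builds the manifest as a plain Python dictionary and prints it as JSON. The name `example-app`, the image `example/app:1.0`, port `8080`, the `/healthz` and `/ready` paths, the user ID, and all resource values are placeholders chosen for illustration.

```python
import json


def build_pod_manifest(name: str, image: str) -> dict:
    """Return a Pod manifest (as a plain dict) that applies the tips above.

    All numeric values, the port, and the probe paths are illustrative
    assumptions; tune them for your own workload.
    """
    container = {
        "name": name,
        "image": image,
        # Run the process as a non-privileged user.
        "securityContext": {"runAsNonRoot": True, "runAsUser": 1000},
        "resources": {
            # Requests inform scheduling; limits cap what the container may use.
            "requests": {"cpu": "250m", "memory": "128Mi"},
            "limits": {"cpu": "500m", "memory": "256Mi"},
        },
        # Liveness: restart the container when this check keeps failing.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": 8080},
            "initialDelaySeconds": 10,
            "periodSeconds": 10,
        },
        # Readiness: only route traffic to the pod while this check passes.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": 8080},
            "periodSeconds": 5,
        },
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [container]},
    }


if __name__ == "__main__":
    manifest = build_pod_manifest("example-app", "example/app:1.0")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`, assuming the placeholder image and endpoints are replaced with real ones.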
@@ -0,0 +1,21 @@
+---
+title: Rancher Deployment Strategies
+weight: 100
+---
+
+## Hub & Spoke
+---
+<img src="{{site.baseurl}}/src/img/bpg/hub-and-spoke.png" width="800" alt="Hub & Spoke deployment">
+
+In this deployment scenario, a single Rancher control plane manages Kubernetes clusters across the globe. The control plane would be run in an HA configuration, and operations against distant clusters would be affected by network latency.
+
+### Pros:
+
+* Environments could have nodes and network connectivity across regions.
+* Single control plane interface to view all regions and environments.
+* Kubernetes does not require Rancher to operate and can tolerate losing connectivity to the Rancher control plane.
+
+### Cons:
+
+* Subject to network latencies.
+* If the control plane goes down, global provisioning of new services is unavailable until it is restored. However, each Kubernetes cluster can continue to be managed individually.
@@ -0,0 +1,22 @@
+---
+title: Rancher Deployment Strategies
+weight: 100
+---
+
+## Regional
+---
+<img src="{{site.baseurl}}/src/img/bpg/regional.png" width="800" alt="Regional deployment">
+
+In the regional deployment model, a control plane is deployed in close proximity to the compute nodes.
+
+### Pros:
+
+* Rancher functionality in a region stays operational if a control plane in another region goes down.
+* Network latency is greatly reduced, improving the performance of functionality in Rancher.
+* Upgrades of the Rancher control plane can be done independently per region.
+
+### Cons:
+
+* Overhead of managing multiple Rancher installations.
+* Visibility across global Kubernetes clusters requires multiple interfaces/panes of glass. 
+* Deploying multi-cluster apps in Rancher requires repeating the process for each Rancher server. 
@@ -0,0 +1,11 @@
+---
+title: Deployment Types
+weight: 100
+---
+
+Production environments, and any installation deemed "important", should use a three-node installation. Having multiple Rancher instances running on multiple nodes ensures high availability that cannot be accomplished with a single-node environment. It's also strongly recommended to have a "staging" or "pre-production" Rancher HA environment that mirrors your production environment as closely as possible in terms of software and hardware configuration. Also consider the following points for your Rancher HA setup:
+ - For best performance, run all three of your nodes in the same geographic datacenter. If you are running nodes in the cloud, such as AWS, run each node in a separate Availability Zone. For example, launch node 1 in us-west-2a, node 2 in us-west-2b, and node 3 in us-west-2c.
+ - Don't run other workloads / microservices in your Rancher HA cluster.
+ - Run Rancher HA within the system and hardware requirements as closely as possible. The more you deviate from them, the more risk you take. However, metrics-driven capacity planning should be the ultimate guidance for scaling Rancher, because the published requirements take into account a variety of workload types. You can use the included Prometheus and Grafana monitoring framework to establish a baseline for key metrics as you scale.
+ - Don't run Rancher HA in a hosted Kubernetes environment such as GKE, EKS, or AKS. Rancher upgrades and rollbacks are not supported there because they depend on etcd snapshot support that hosted environments lack.
+
@@ -0,0 +1,18 @@
+---
+title: Best Practices Guide
+weight: 1000
+
+---
+
+# Best Practices Guide
+---
+
+The purpose of this site is to consolidate best practices for Rancher implementations. This also includes recommendations for related technologies, such as Kubernetes, Docker, containers, and more. The objective is to improve the outcome of a Rancher implementation using the operational experience of Rancher and its customers. If you have any questions about how these might apply to your use case, please contact your Customer Success Manager or Support.
+
+Use the navigation bar on the left to find the current best practices for managing and deploying Rancher Server.
+
+Additional resources that can be consulted are:
+
+<a href="https://www.rancher.com/" target="_blank">Rancher Homepage</a><br>
+<a href="https://docs.rancher.com/" target="_blank">Rancher Docs</a><br>
+<a href="https://forums.rancher.com/" target="_blank">Rancher Forum</a><br>
@@ -0,0 +1,48 @@
+---
+title: Rancher & Kubernetes Management
+weight: 100
+---
+
+## Rancher Operating System and Docker
+Rancher is container-based and can potentially run on any Linux-based operating system. However, only operating systems listed in the requirements documentation (see https://rancher.com/docs/rancher/v2.x/en/installation/requirements/) should be used, along with a supported version of Docker. These versions have been most thoroughly tested and can be properly supported by the Rancher Support team.
+
+## Kubernetes Clusters
+Rancher allows you to set up numerous combinations of configurations. Some are more appropriate for development and testing, while production environments call for configurations that maximize availability and fault tolerance. Follow these best practices for production:
+
+- Separate the etcd, control plane, and workers onto different hosts. Don't assign multiple roles to the same host, such as a worker and control plane. This will give you maximum scalability.
+- Provision 3 or 5 etcd nodes. etcd requires a quorum (a majority of members), so clusters with an even number of members are not recommended: an even-sized cluster tolerates no more failures than the next smaller odd-sized one. Three etcd nodes are generally sufficient for smaller clusters and five etcd nodes for large clusters.
+- Provision 2 or more control plane nodes. Some control plane components, such as kube-apiserver, run in active-active mode and give you more scalability. Other components, such as kube-scheduler and kube-controller-manager, run in active-passive mode (leader election) and give you more fault tolerance.
+- Run your etcd and control plane nodes on virtual machines where you can scale vCPU and memory easily if needed in the future.
+- Make sure etcd recurring snapshots are enabled. Extend the snapshot retention to a period of time that meets your business needs. In the event of a catastrophic failure or deletion of data, this may be your only recourse for recovery.
+- When possible, use Rancher to provision your Kubernetes cluster rather than importing a cluster. This will ensure the best compatibility and supportability.
+- Have multiple people in your organization set up calendar reminders for certificate renewal. Rancher-provisioned Kubernetes clusters use certificates that expire in one year. Clusters provisioned by other means may have a longer or shorter expiration. Certificates for Rancher-provisioned clusters can be renewed through the user interface. Consider renewing a certificate two weeks to one month in advance. If you have multiple certificates to track, consider using monitoring and alerting mechanisms to track certificate expiration.
+- Closely monitor and scale your nodes as needed. Use the included Prometheus and Grafana options (see “Monitoring” sections of documentation) as a starting point to achieve this.  
+- Keep your Kubernetes cluster up to date with a recent and supported version. The Kubernetes community typically supports the three most recent minor releases (for example, 1.14.x, 1.13.x, and 1.12.x). Once a new version is released, the oldest supported version reaches end of life (EOL). Running an EOL release is a risk if security issues are found and patches are not available. The community typically makes a minor release every quarter (3 months). Rancher's SLAs are not community dependent, but because Kubernetes is community-driven software, the quality of experience will degrade as you get farther away from the community's supported versions.
+- Run chaoskube or a similar mechanism to randomly kill pods in your test environment. This will test the resiliency of your infrastructure and the ability of Kubernetes to self-heal. It's not recommended to run this in your production environment.
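
The advice above to provision 3 or 5 etcd nodes follows from simple majority arithmetic. The sketch below is illustrative only and does not talk to etcd; the function names are hypothetical. It computes the quorum size and the number of member failures a cluster can survive for a given member count.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority remains available."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(
            f"{n} members: quorum {quorum(n)}, "
            f"tolerates {failure_tolerance(n)} failure(s)"
        )
```

A 4-member cluster tolerates one failure, the same as a 3-member cluster, while a 5-member cluster tolerates two. Adding a member to an odd-sized cluster therefore raises the quorum without improving fault tolerance.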
+
+## Kubernetes Orchestration
+- Rancher's "Add Cluster" UI is preferable for getting started with Kubernetes cluster orchestration or for simple use cases. For more complex or demanding use cases, a CLI/API-driven approach is preferred, and Terraform is the recommended tooling to implement it. Using Terraform with version control and a CI/CD environment gives you high assurance of consistency and reliability when deploying Kubernetes clusters. This approach also gives you the most customization options.
+- Rancher maintains a Terraform provider for working with Rancher 2.0 Kubernetes, called the rancher2 provider.
+
+## Network Topology
+- Kubernetes clusters are best served by low-latency networks. This is especially true for the control plane components and etcd, where a lot of coordination and leader election traffic occurs. Networking between the Rancher server and the Kubernetes clusters it manages is more tolerant of latency.
+- Limit the use of proxies or load balancers between the Rancher server and Kubernetes clusters. Because Rancher maintains a long-lived WebSocket connection, these intermediaries can interfere with the connection lifecycle, as they are often not configured with this use case in mind.
+
+
+### Rancher Software Updates
+Keep your Rancher installation up to date with the latest patches. Patch updates have important software fixes and sometimes have security fixes. When patches with security fixes are released, customers with Rancher licenses are notified by e-mail. These updates are also posted on Rancher's forum.
+
+If you believe you have uncovered a security-related problem in Rancher, please communicate this immediately and discreetly to the Rancher support team (rancher@support.com). Posting security issues on public forums such as Twitter, Rancher Slack, GitHub, etc. can potentially compromise security for all Rancher customers. Reporting security issues discreetly allows Rancher to assess and mitigate the problem. Security patches are typically given high priority and released as quickly as possible.
+
+Feature version upgrades, for example 2.1.x to 2.2.x, should also be considered as and when they are released. Not all bug fixes are backported to older versions, and most new features are not. Do not upgrade production environments to beta, release candidate, or "latest" versions. Make sure the feature version you are upgrading to is considered "stable" as determined by Rancher. Use the beta, release candidate, and "latest" versions in a testing, development, or demo environment to try out new features. Keep in mind that Rancher ends support for old versions (End of Life), so you will eventually need to upgrade if you want to continue to receive patches.
+
+All upgrades, both patch and feature upgrades, should first be tested in a staging environment before production is upgraded. The more closely the staging environment mirrors production, the higher the chance that your production upgrade will be successful. Notify Rancher support of your upgrade plans so they can be on full alert during your maintenance window in the event you need their assistance.
+
+Do not upgrade production environments to a release candidate (rc), alpha, or beta release. These early releases are often not stable and may not have a future upgrade path. When installing or upgrading a non-production environment to an early release, anticipate problems such as features not working, data loss, outages, and inability to upgrade without a reinstall.
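
The rule above can be enforced with a small pre-flight check in an upgrade script. The following is a minimal sketch under a stated assumption: release tags look like `vMAJOR.MINOR.PATCH` with an optional pre-release suffix such as `-rc1`, `-alpha2`, or `-beta1`. The function names are hypothetical. A clean version number does not by itself mean Rancher has designated the release "stable"; that still has to be confirmed separately.

```python
import re

# Assumed tag shape: optional "v", three numeric parts, optional "-suffix".
_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?$")


def is_prerelease_or_floating(tag: str) -> bool:
    """True for floating tags such as 'latest' and for rc/alpha/beta tags."""
    match = _TAG.match(tag)
    if match is None:
        # Anything that is not a pinned version (e.g. "latest") is rejected.
        return True
    return match.group(4) is not None


def _numbers(tag: str) -> tuple:
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"not a pinned version tag: {tag!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


def upgrade_kind(current: str, target: str) -> str:
    """Classify an upgrade as 'patch', 'feature', or 'other'."""
    cur, new = _numbers(current), _numbers(target)
    if new[:2] == cur[:2] and new[2] > cur[2]:
        return "patch"
    if new[0] == cur[0] and new[1] > cur[1]:
        return "feature"
    return "other"


if __name__ == "__main__":
    for tag in ("v2.2.4", "v2.3.0-rc1", "latest"):
        verdict = "reject" if is_prerelease_or_floating(tag) else "candidate"
        print(f"{tag}: {verdict} for production")
    print(upgrade_kind("v2.1.9", "v2.2.4"))
```

Rejecting unparsable tags by default keeps a floating tag such as `latest` from slipping into a production upgrade.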
+
+In addition to Rancher software updates, closely monitor security fixes for related software, such as Docker, Linux, and any libraries used by your workloads. For production environments, try to avoid upgrading too many entities during a single maintenance window. Upgrading multiple components can make it difficult to root cause an issue in the event of a failure. As business requirements allow, upgrade one component at a time.
+
+## Network Security
+
+In general, you can use network security best practices in your Rancher and Kubernetes clusters. Consider the following:
+
+- Firewalls should be used between your hosts and the Internet (or corporate intranet). These could be enterprise firewall appliances in a datacenter or SDN constructs in the cloud, such as VPCs, security groups, and ingress and egress rules. Try to limit inbound access to only the ports and IP addresses that require it. Outbound access can be shut off (air gap) if the environment holds sensitive information that requires this restriction. If available, use firewalls with intrusion detection and DDoS prevention.
+- Run security and penetration scans on your environment periodically. Even with well-designed infrastructure, a poorly designed microservice could compromise the entire environment.