Top Ten Challenges - Part 10: Cluster placement, 'stretch' clusters, HA clusters

(This is part 10 - and also the last part - of a series about challenges and considerations that I have come across with customers when deploying software into Kubernetes. You can find the intro post here.)
 
It seems appropriate to conclude the Top Ten list with this post, because, ultimately, after you have figured out how to best deploy your software, make it secure and performant, attach the right storage layer, and so forth, you must also think about how your Kubernetes cluster(s) - and the software running on them - should be laid out to meet your availability requirements, and your recovery objectives for when disaster strikes.

Kubernetes has a number of mechanisms built in that make it suitable for high availability, or rather, continuous availability. There is redundancy in all the basic services of a cluster; there are components that automatically monitor deployed software and let you configure if, when, and how your pods and containers are restarted, scaled, and removed; and all of it runs across a number of 'nodes' that provide fault tolerance and automatic failover in case a node fails. Needless to say, your software must be architected to work well in such an environment, too. If, for example, a web application stores relevant data in its heap, and relies on session affinity to make sure a user is always tied to the same server instance, then it's hard to make this app scalable and highly available, and running it in a Kubernetes cluster makes no difference there. So keep in mind that even if your cluster is configured to support certain SLAs, your software has to support those same SLAs, too, and that's not always the case. A comment I make often is: "A poorly architected application is still a poorly architected application if you run it in containers and Kubernetes."
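To make that concrete, here is a minimal sketch of what this self-healing looks like in a manifest: a Deployment with redundant replicas and a liveness probe. The names, image, and health endpoint are hypothetical; the mechanics are standard Kubernetes: the ReplicaSet controller keeps three copies running (rescheduling them if a node fails), and the kubelet restarts any container whose probe fails.

```yaml
# Minimal sketch; all names, the image, and the /healthz path are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3                  # redundant copies, rescheduled if a node fails
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.0
        ports:
        - containerPort: 8080
        livenessProbe:         # kubelet restarts the container if this fails
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 15
```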

But let's focus here on the ways to deploy and configure the cluster as a whole. For example, OpenShift always requires a minimum of three master nodes, and even though I don't believe it is a strict requirement, we never deploy fewer than three worker nodes. Three is the magic number to overcome the so-called split-brain syndrome: a quorum requires a majority, and with three members you can lose one and the remaining two still form a majority. This gives us the required level of redundancy in the control plane (represented by the master nodes) and also for any deployed application. Note that when it comes to the required number of pods for each component of an application, you only need a minimum of three if they are stateful; two suffice if they are stateless (because split-brain doesn't apply).
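As an illustration, a quorum-based stateful component (think of etcd, ZooKeeper, or a consensus-based database) would typically be deployed as a StatefulSet with three replicas; a sketch, with hypothetical names:

```yaml
# Hedged sketch: three replicas give a quorum-based service a majority
# (2 of 3) even when one member is down; a stateless Deployment could
# get away with replicas: 2. Names and image are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: quorum-store
spec:
  serviceName: quorum-store    # headless Service providing stable DNS names
  replicas: 3                  # a majority survives the loss of one member
  selector:
    matchLabels:
      app: quorum-store
  template:
    metadata:
      labels:
        app: quorum-store
    spec:
      containers:
      - name: store
        image: registry.example.com/quorum-store:1.0
```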

By default, Kubernetes will try to spread pods across nodes. For example, if you have a ReplicaSet that defines three pods and you have three worker nodes, Kubernetes will try to place one pod on each node. That only happens, of course, if each node has enough capacity left to host the pod. You can use additional node and pod affinity rules to define the expected behavior, giving you fine-grained control over which pods land where and under which circumstances. There are plenty of documents available that describe how this works, with many examples, and if you haven't done so yet, I recommend you become familiar with the concepts. (Maybe start with this one.) Placement also covers where pods are NOT to be placed (for example, you can define that two pods of the same type MUST NOT run on the same node), and how pods are "evicted" from a node. In fact, we have come across cases where pods refused to be evicted from a node (due to a PodDisruptionBudget), which then stopped a node from being drained as part of a version upgrade. Not good.
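To illustrate, the "two pods of the same type MUST NOT run on the same node" rule translates into a required pod anti-affinity, and a PodDisruptionBudget is the mechanism that can block, or, if configured with some headroom, safely throttle, evictions during a drain. A sketch with hypothetical names:

```yaml
# Pod template fragment: hard anti-affinity, so two pods labeled
# app=web-frontend (hypothetical) never land on the same node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: web-frontend
      topologyKey: kubernetes.io/hostname
---
# PodDisruptionBudget: with 3 replicas and maxUnavailable: 1, a drain can
# proceed one pod at a time. (Setting minAvailable equal to the replica
# count is what produces the "node refuses to drain" situation above.)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web-frontend
```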

Another consideration is that you may want to isolate pods of a certain kind to specific nodes in the cluster. Reasons for this could be that you have nodes with special processors that you want to reserve for a particular workload, or you may want to isolate a mission-critical workload on nodes that do not run any other workloads, to reduce the risk of other workloads negatively impacting your mission-critical app. A variation of this is a case I came across where pods that were part of an application had to run on the same node, because they were so 'chatty' that sufficient performance could only be achieved by running them close together, even at the cost of availability.
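Both patterns map onto standard scheduling primitives: taints and tolerations (plus a node selector) dedicate nodes to a workload, while a required pod affinity forces the 'chatty' pods onto the same node. A sketch, with hypothetical labels and taint keys:

```yaml
# Dedicating nodes: first taint them, e.g.
#   kubectl taint nodes gpu-node-1 dedicated=critical-app:NoSchedule
# then give the pods a matching toleration AND a node selector, so they
# land only on those nodes and nothing else lands next to them.
spec:
  nodeSelector:
    workload-class: critical-app     # hypothetical node label
  tolerations:
  - key: dedicated
    operator: Equal
    value: critical-app
    effect: NoSchedule
---
# Co-locating "chatty" pods: required pod affinity pins this pod to
# whichever node already runs a pod labeled app=chatty-backend.
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: chatty-backend
        topologyKey: kubernetes.io/hostname
```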

A question I have heard fairly often is whether a Kubernetes cluster can be run across multiple data centers. While it's technically possible, we usually do not recommend it, simply because of latency concerns when running in-cluster traffic across data center boundaries. However, if the data centers are in close proximity (say, within the same metro area) and/or have fast network connections between them, it may be perfectly acceptable to run a single cluster across more than one data center, and I have come across companies successfully doing that. We have started calling these clusters "stretch clusters", even though there seem to be different definitions of that term going around.

If a "stretch cluster" cannot be used, then you have to stand up multiple clusters, of course, and you have to make sure you (a) keep these clusters in sync with respect to applications that run in them; (b) configure your global load balancers appropriately; (c) configure the required data replication mechanisms. That last point is especially tricky when running an active-active setup, where the same application runs in two or more clusters (in two or more data centers) and you need to synchronously replicate data between the two. That may indeed be the weakest link in the chain, so to speak.
It seems to me that many of these conversations are no different from years past, when Kubernetes wasn't around. You need to determine what your expected SLAs are, how quickly you have to be able to recover from a disaster, how recent the data sync point has to be when such a disaster occurs, and then map that against your data centers, your network, and, again, your application architecture. Kubernetes makes none of that obsolete.

And by the way, if you run your Kubernetes cluster in the cloud, regardless of whether you use a managed Kubernetes service offered by the cloud provider or whether you stand it up yourself on the IaaS layer, you have to do very similar planning. An added feature of a public cloud, if you will, is the notion of an "availability zone" (AZ). They first appeared in AWS, but are now supported in some shape or form by every public cloud. I have seen articles refer to a cluster running across AZs as a "stretch cluster" (unlike the definition I mentioned above, where such a cluster spans data centers). And just like before, running a single cluster across multiple cloud "regions" is probably not a good idea, and as far as I know, none of the cloud-native Kubernetes services supports it to begin with. It's like running a single cluster across data centers that are far apart.
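Within a multi-AZ cluster, the zone each node belongs to is exposed through the well-known node label topology.kubernetes.io/zone, and you can ask the scheduler to spread replicas evenly across zones. A minimal sketch (a pod template fragment; the app label is hypothetical):

```yaml
# Spread replicas evenly across availability zones; with maxSkew: 1,
# no zone ends up with more than one pod more than any other zone.
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # or ScheduleAnyway for a soft rule
    labelSelector:
      matchLabels:
        app: web-frontend
```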

I would suggest that, as a rule of thumb, any cluster deployed into a public cloud should be deployed across multiple AZs, unless there are specific reasons against it. It simply increases the overall availability and stability of the system. One thing that may make it difficult is storage, namely when using storage software that does not support multi-AZ deployment. It seems to be the same thing every time: once you have your storage configuration sorted out, everything else becomes (almost) smooth sailing. :-)

(Photos by Kelly Sikkema and NASA on Unsplash.)

 
