Top Ten Challenges - Part 7: Storage

(This is part 7 of a series about challenges and considerations that I have come across with customers when deploying software into Kubernetes. You can find the intro post here.)

In an earlier draft of this post, I started out by trying to explain what a PV (Persistent Volume) is, what a PVC (Persistent Volume Claim) is, and how you can use a Storage Class to dynamically provision volumes. But that was getting too long, and I figured I could just point you to a good piece of existing documentation, in the unlikely case that you are not already familiar with the concepts. The Kubernetes documentation on Persistent Volumes gives a solid introduction to this topic, so I'll just reference it here and won't even try to do a better job.

Our software expects that storage can be provisioned dynamically, and it makes heavy use of PVCs and Storage Classes. In other words, nowhere in our documentation will you be asked to create Persistent Volumes manually. But this also means that storage classes have to exist that support the types of storage and the access modes our services require. We mostly use block and file storage, and we mostly use the RWO (ReadWriteOnce) and RWX (ReadWriteMany) access modes. Is that typical? I think so, but it obviously depends on the kind of application.
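To make this a bit more concrete, here is a minimal sketch of what such a claim looks like when generated from Python. The field names follow the standard Kubernetes PersistentVolumeClaim API, but the claim names, storage class names ("fast-block", "shared-file") and sizes are made up for illustration and are not taken from our product.

```python
import json

# Access modes defined by the Kubernetes PersistentVolumeClaim API.
VALID_ACCESS_MODES = {
    "ReadWriteOnce",     # RWO
    "ReadOnlyMany",      # ROX
    "ReadWriteMany",     # RWX
    "ReadWriteOncePod",  # RWOP
}


def build_pvc(name, storage_class, access_mode, size):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    if access_mode not in VALID_ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # The storage class decides which provisioner creates the volume.
            "storageClassName": storage_class,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claims: one block-style RWO volume, one file-style RWX volume.
db_claim = build_pvc("db-data", "fast-block", "ReadWriteOnce", "20Gi")
shared_claim = build_pvc("shared-files", "shared-file", "ReadWriteMany", "50Gi")

# JSON is accepted wherever a YAML manifest is, so this can be saved to a file
# and applied to a cluster.
print(json.dumps(db_claim, indent=2))
```

If the named storage class does not exist in the cluster, or its provisioner cannot serve the requested access mode, a claim like this will typically just sit in the Pending state, which is exactly why the storage class question comes up so early.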

So when discussing the deployment of our software into a customer environment, it is indeed one of the first questions we ask: do you have the storage provisioners and associated storage classes available that we require? Sometimes the answer is Yes, sometimes it is No. Either way, it requires a detailed conversation. There are questions about things like latency, throughput, and I/O rates, for which some of our services have very strict requirements, and we have to ensure those can be met. As mentioned above, we have services that require RWX access mode, and that is almost always a challenge, especially when running clusters in multiple availability zones (AZs). Virtually none of the clusters we work with (and we are typically working with Red Hat OpenShift) come with a storage class that supports RWX out of the box, so there is always extra work needed before we can deploy our software.

Related to this is the difference between file and block storage (and yes, there is object storage, too, of course, but I believe that none of the database services we leverage use it). A Google search on this leads to loads of material that explains the difference and the pros and cons of each one, but I think a rather simple formula is that block storage, while faster in many cases, really only supports RWO, whereas file storage can support RWX, too. And block storage solutions sometimes do not support replication across AZs. Well, file storage that is accessed in RWX mode also often does not support replication across AZs. See why a discussion about this is needed? :-) For what it's worth, we are trying to avoid the need for RWX access mode wherever possible.
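The simple formula above can be turned into a quick sanity check. What follows is only a sketch under my rule-of-thumb assumptions (block means RWO only, file means RWO or RWX); the class and service names are hypothetical, and whether a given class really replicates across AZs is something you have to confirm with whoever provides the storage.

```python
from dataclasses import dataclass

# Rule of thumb from the text, not a statement about any specific product.
SUPPORTED_MODES = {
    "block": {"RWO"},
    "file": {"RWO", "RWX"},
}


@dataclass(frozen=True)
class StorageClassInfo:
    name: str
    storage_type: str              # "block" or "file"
    replicates_across_zones: bool  # must be confirmed per environment


@dataclass(frozen=True)
class Requirement:
    service: str
    storage_type: str
    access_mode: str               # "RWO" or "RWX"
    needs_multi_zone: bool


def matching_classes(req, classes):
    """Names of the storage classes that can satisfy one requirement."""
    return [
        sc.name
        for sc in classes
        if sc.storage_type == req.storage_type
        and req.access_mode in SUPPORTED_MODES[sc.storage_type]
        and (sc.replicates_across_zones or not req.needs_multi_zone)
    ]


def find_gaps(requirements, classes):
    """Services for which no available storage class fits."""
    return [r.service for r in requirements if not matching_classes(r, classes)]


# Hypothetical cluster inventory and service requirements.
classes = [
    StorageClassInfo("fast-block", "block", replicates_across_zones=False),
    StorageClassInfo("shared-file", "file", replicates_across_zones=False),
]
requirements = [
    Requirement("database", "block", "RWO", needs_multi_zone=False),
    Requirement("shared-cache", "file", "RWX", needs_multi_zone=True),
]

print(find_gaps(requirements, classes))  # ['shared-cache']
```

In this made-up inventory the database is covered, but the RWX service that also needs to survive an AZ outage has no home, which is the kind of gap that tends to surface in these conversations.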

There are many different types of storage providers available, and I sort them into "native" and "add-on" solutions. To me, native solutions are those that take advantage of existing storage management in the environment they run in. For public clouds, those are services like EBS/EFS in AWS, or AzureDisk/AzureFiles in Azure, etc. For on-premises clusters, it can be NFS or, say, a vSphere volume. For those, only a thin layer of software is needed that connects the cluster to the storage system.

Add-on solutions, in contrast, run the storage management software inside the cluster. One example that we use frequently is Red Hat's OpenShift Data Foundation, which used to be called OpenShift Container Storage. It deploys the software needed to provision and manage storage into the cluster, although there is also a way to run it outside of the cluster that uses it, called "external mode". Other examples of containerized storage management services that I have come across are IBM's Spectrum Fusion and Portworx's Kubernetes Storage. There are probably many others that I am not familiar with.

At the end of the day, it comes down to assessing your storage options, which also depend on where you run your cluster (on-prem, off-prem, external storage or in-cluster), and mapping those to the storage types and access modes that your software requires. Then you have to make sure the chosen storage supports the availability and replication requirements of your production environments. It's not rocket science, in my opinion, but it's critically important, and you need to have a strategy for it before you can start deploying Kubernetes-based solutions in production.
