Top Ten Challenges - Part 8: Version lifecycle management

(This is part 8 of a series about challenges and considerations that I have come across with customers when deploying software into Kubernetes. You can find the intro post here.)

One overall concern I have - and which comes up in many discussions I have with customers - is the growing discrepancy between the pace at which technologies evolve and the ability of IT organizations to consume them. As a software vendor, we are asked to deliver functional and operational updates to our products at ever-increasing speed, and we, just like virtually every other software company, have started establishing a CI/CD model that delivers a constant stream of updates and upgrades to our customers. This creates a welcome need for maximum efficiency and automation in the delivery process, but it also has its share of challenges.

One conversation I remember from a while back started with the CISO organization of a company telling us that the resolution to certain detected vulnerabilities had to be deployed within days as per company policy, only to then hear from the operational team in that same company that they only applied upgrades and patches to their systems once a year, simply to reduce the risk of instability. This conflict of interest, so to speak, exists in many organizations: change is needed, but it also introduces increased risk. 

Using Kubernetes does not make this any better or worse, in my opinion. Kubernetes itself follows a fairly rapid release schedule, with a new (minor) version coming out about every three months. Commercial distributions, like Red Hat's OpenShift, generally follow that cadence, typically with a delay that allows for additional testing and whatever porting is needed. For example, at the time of this writing, the most recent Kubernetes version is 1.23, the most recent OpenShift version (4.9) is based on Kubernetes 1.22, and the currently available Amazon EKS service is based on Kubernetes 1.21.

For starters, this means that you cannot assume Kubernetes is the same everywhere. Kubernetes versions matter, primarily because each new version brings change with it, and sometimes those changes are incompatible. The two most common cases we have found are the removal of deprecated APIs and the promotion of an API, for example, from alpha to beta (which leads to an incompatible API change). This directly drives the need for us to align our software releases with the underlying platform, in our case, OpenShift. We use early nightly builds of each upcoming version to start testing against them as soon as possible and to determine what, if any, code changes might be needed.
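
To make the "removal of deprecated APIs" case a bit more concrete, here is a minimal sketch in Python of the kind of check you could run before moving to a newer Kubernetes version. The function names and the manifest dictionaries are hypothetical and only for illustration. The removal table holds just a handful of entries taken from the upstream Kubernetes deprecation guide, so it is nowhere near complete, and you should verify it against the documentation for your own distribution.

```python
# Hypothetical helper, not part of any real tool: flag manifests whose
# apiVersion/kind is no longer served by a target Kubernetes version.

# (apiVersion, kind) -> first Kubernetes 1.x minor version that no longer serves it.
# Illustrative subset only; check the deprecation guide for the full list.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): 22,
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"): 22,
    ("batch/v1beta1", "CronJob"): 25,
    ("policy/v1beta1", "PodSecurityPolicy"): 25,
}


def parse_minor(version):
    """Return the minor number of a version string such as '1.22' or 'v1.22.3'."""
    parts = version.lstrip("v").split(".")
    return int(parts[1])


def find_removed_apis(manifests, target_version):
    """Return (name, apiVersion, kind) for each manifest the target no longer serves."""
    target_minor = parse_minor(target_version)
    problems = []
    for manifest in manifests:
        key = (manifest["apiVersion"], manifest["kind"])
        removed_minor = REMOVED_IN.get(key)
        if removed_minor is not None and target_minor >= removed_minor:
            problems.append((manifest["metadata"]["name"], *key))
    return problems


if __name__ == "__main__":
    # Made-up manifests, already parsed into dictionaries.
    manifests = [
        {"apiVersion": "extensions/v1beta1", "kind": "Ingress",
         "metadata": {"name": "web"}},
        {"apiVersion": "batch/v1", "kind": "CronJob",
         "metadata": {"name": "nightly-report"}},
    ]
    for name, api_version, kind in find_removed_apis(manifests, "1.22"):
        print(f"{kind} '{name}' uses {api_version}, which is gone in the target version")
```

A check like this only catches removals you already know about, which is one reason why testing against early builds of the platform is still necessary.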

Besides making sure your software supports the right version of whatever flavor of Kubernetes you use, there is also the question of how and when to apply upgrades over time. The first question to ask there is how long the version of Kubernetes you are using is going to be supported. For example, Red Hat changed the version support policy for OpenShift not long ago and is now labeling every other minor release, beginning with 4.8, as an "Extended Update Support" (EUS) release, which means it is supported for 18 months. We have our own support policies at IBM, but since our software runs on OpenShift, we have to align accordingly. Of course, every Kubernetes provider might have a different approach, so make sure you read their respective release notes carefully.
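
As a rough illustration of how such a policy translates into dates you can plan around, here is a small Python sketch. It encodes only the two facts mentioned above (every other 4.x minor release starting with 4.8 is an EUS release, and EUS means 18 months of support). The function names are made up, the release date in the example is a placeholder rather than a real OpenShift release date, and the support window for non-EUS releases is deliberately left undefined because it is not covered here.

```python
from datetime import date


def is_eus_release(minor, first_eus_minor=8):
    """True for every other 4.x minor release, starting with 4.8."""
    return minor >= first_eus_minor and (minor - first_eus_minor) % 2 == 0


def add_months(start, months):
    """Return the date `months` later; the day is clamped to 28 to stay valid."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    return date(year, month_index + 1, min(start.day, 28))


def support_ends(release_date, minor, eus_months=18):
    """End of support for an EUS release, or None if the release is not EUS."""
    if not is_eus_release(minor):
        return None  # Non-EUS windows are not described in this post.
    return add_months(release_date, eus_months)


if __name__ == "__main__":
    placeholder_release_date = date(2021, 1, 15)  # NOT a real release date
    print(support_ends(placeholder_release_date, minor=8))
```

Your provider's lifecycle page remains the authoritative source for actual dates; the point is simply that these dates are worth tracking explicitly rather than discovering them when support runs out.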

The customers I talk to about this generally fall into two categories: 
(1) Those who are upgrading constantly and will consume new versions as soon as they become available. And this comes with the expectation, of course, that the software running on Kubernetes will do the same.
(2) Those who stay on a version for as long as possible, and only upgrade when the version they use falls out of support (and sometimes even past that). And guess what, that comes with the same expectation, namely that the software they use also stays supported for the duration.

This increases the number of Kubernetes (OpenShift) versions we have to support at any given point in time, and it also requires support for seamless upgrades between these versions. That, in turn, means that having heavily automated test, package and publish processes is critical for us. We also need to test every possible upgrade scenario and make sure each one can be "rolling". After all, one of the benefits of using Kubernetes and containers is that you get support for rolling upgrades, that is, for upgrades that do not incur any outage in service.
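
To get a feel for how quickly the number of upgrade scenarios grows, here is a minimal Python sketch that enumerates every "from" and "to" pair for a set of supported versions. The function names and version numbers are illustrative assumptions, and the sketch assumes that any older supported version may be upgraded to any newer one. Whether skip-level upgrades are actually allowed depends on your provider.

```python
from itertools import combinations


def version_key(version):
    """Sort key that compares versions numerically, so '4.10' sorts after '4.9'."""
    return tuple(int(part) for part in version.split("."))


def upgrade_scenarios(supported_versions):
    """Return every (from_version, to_version) pair where the target is newer."""
    ordered = sorted(supported_versions, key=version_key)
    return list(combinations(ordered, 2))


if __name__ == "__main__":
    # Illustrative list of supported platform versions.
    for source, target in upgrade_scenarios(["4.9", "4.7", "4.8", "4.6"]):
        print(f"test rolling upgrade {source} -> {target}")
```

With n supported versions this yields n * (n - 1) / 2 scenarios, so four versions already mean six upgrade paths to test, which is why doing this by hand does not scale.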

Kubernetes upgrades are typically done by draining one node after the other, in other words, moving the pods on that node to another node, and then upgrading and restarting each node one by one. Our software makes heavy use of operators for its lifecycle management, so we needed to come up with a strategy for upgrading both the operators and their respective operands, as well as the CRDs. There are two ways in which you can do this: either keep the versions of operators and their operands in sync, which means that when you upgrade the operator, it will also upgrade its operands, or manage their versions separately. The latter can be achieved by adding a version attribute to the CRD, for example, but the operator has to be aware of which versions (and which upgrades between them!) are valid. Since we are using the Operator Lifecycle Manager (OLM) framework to manage our operators, we also take advantage of the versioning support it comes with. While it is very powerful, it is also pretty complex, and it took us a while to come up with a solid set of guidelines that all of our development teams follow.
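
The point about the operator having to know which upgrades are valid can be sketched as follows. This is plain Python, not code from a real operator and not an OLM API: the table of allowed upgrades, the version numbers and the function name are all hypothetical. The idea is simply that when the requested operand version differs from the current one, the operator looks for a chain of upgrades it knows to be valid and refuses the change if there is none.

```python
from collections import deque

# Hypothetical table: operand version -> versions it can be upgraded to directly.
ALLOWED_UPGRADES = {
    "1.0": ["1.1"],
    "1.1": ["1.2", "2.0"],
    "1.2": ["2.0"],
    "2.0": [],
}


def find_upgrade_path(current, requested, allowed=ALLOWED_UPGRADES):
    """Breadth-first search for the shortest chain of valid upgrades, or None."""
    if current not in allowed or requested not in allowed:
        return None  # Unknown version: reject rather than guess.
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == requested:
            return path
        for next_version in allowed[path[-1]]:
            if next_version not in seen:
                seen.add(next_version)
                queue.append(path + [next_version])
    return None  # No valid chain exists, for example a downgrade.


if __name__ == "__main__":
    print(find_upgrade_path("1.0", "2.0"))  # chain of intermediate upgrades
    print(find_upgrade_path("2.0", "1.0"))  # None, downgrades are not in the table
```

Keeping the table explicit makes it easy to reject downgrades or unknown versions, and it doubles as the list of upgrade paths that need test coverage.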

A version strategy for Kubernetes as well as for all the software running on top of it is critically important. The nuances of what is compatible and what isn't, what is supported for what timeframe, and building the appropriate operational processes for upgrades can be, frankly, an annoying topic that has the potential to make your head spin, but it cannot be avoided.

(Photos by Hello I'm Nik and Bryan Goff on Unsplash.)
