Top Ten Challenges - Part 3: Cluster-admin and elevated privileges

(This is part 3 of a series about challenges and considerations that I have come across with customers when deploying software into Kubernetes. You can find the intro post here.)

I apologize in advance: this post will be a bit longer than the others, because this is probably the topic that comes up the most, certainly in the initial stages of a deployment project, and I would argue it is one of the most challenging.

It starts with questions about the code you run in the cluster: does it require any elevated privileges? Worse, do any of the containers need to run as root? If the answer is yes, you might have a problem right away, because most security teams will not allow such code to be deployed; it is simply not compliant with their rules.

As far as I know, we don't have any code that requires running as root, we have an internal rule prohibiting it, and we have an automated process in place that enforces the rule. I would suggest that whenever a piece of software has to have root access, this must be very well justified. Elevated privileges are a bit different, because there are indeed legitimate reasons why they may be needed. For example, access to the host's network may be appropriate for software that does network-specific things, or there may be a reason why a container must be able to directly mount file systems from the host.
I think a best practice is that these privileges are not given by default, are granted only on an exceptional basis with good justification, and that those exceptions are monitored explicitly. A comment I have heard often from customers' security teams is: "We may let you run some of your software with specific elevated privileges, but you have to tell us where and why."
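To make that concrete, here is a minimal sketch of what such a restrictive default looks like in a pod spec (the name and image are made up for illustration, not taken from any real product):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                # illustrative name
spec:
  securityContext:
    runAsNonRoot: true             # kubelet refuses to start containers as UID 0
  containers:
  - name: app
    image: registry.example.com/myorg/app:1.0   # placeholder image
    securityContext:
      allowPrivilegeEscalation: false   # no gaining privileges via setuid etc.
      capabilities:
        drop: ["ALL"]                   # drop all Linux capabilities by default
  # hostNetwork: true              # the kind of exception that needs explicit sign-off
```

Whether pods actually have to look like this is then enforced by admission control, which in OpenShift is exactly the job of the SCC mechanism described next.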

In OpenShift (which diverges slightly from standard Kubernetes in this regard), there is a mechanism called a Security Context Constraint, or SCC, which controls which privileges are granted to the containers running inside a pod. Every pod has an SCC associated with it, and the default is an SCC called "restricted", which allows, well, not a lot. Then there are other SCCs that come with the system and allow specific rights, for example one called "hostnetwork", or one called "anyuid". The highest-level SCC is called "privileged", and, again, that one should be used with caution! Oh, and you can create custom SCCs, but we prefer not to, because I have seen cases where custom SCCs are simply not allowed. Aren't those discussions with security teams always fun? :-)) And by the way, the related construct in 'plain' Kubernetes is the PodSecurityPolicy; note, however, that PSPs have been deprecated (as of Kubernetes 1.21) and are being replaced by Pod Security Admission.
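When one of the built-in SCCs really is needed, the usual pattern (rather than creating a custom SCC) is to grant a service account permission to "use" it via ordinary RBAC. A sketch of such a grant for the "anyuid" SCC, with invented namespace and account names:

```yaml
# Grants the service account "my-app-sa" in namespace "my-app" the right
# to use the "anyuid" SCC. All names here are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-anyuid-scc
  namespace: my-app
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["anyuid"]
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: use-anyuid-scc
  namespace: my-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: use-anyuid-scc
subjects:
- kind: ServiceAccount
  name: my-app-sa
  namespace: my-app
```

A nice side effect is that the exception is visible as an ordinary RBAC object, which is exactly the kind of traceability the security teams quoted above are asking for.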
 
Besides defining what a pod/container can and cannot do (and I am really only scratching the surface here), there is the concept of a "ServiceAccount", which effectively lets you define what a pod can do at the Kubernetes level, as in, which Kubernetes APIs it is allowed to use. Service accounts are bound to roles, and these roles (which can be scoped to a specific namespace or to the entire cluster) carry the details of which APIs the service account may use. I don't know about you, but it regularly makes my head spin, and I have to go back to the docs all the time to refresh my memory about it all. However, a few key questions always come up and are important: Does your application create its own service accounts and roles? If so, why, and what kind of access do they allow? And does any of your code require a role at cluster scope?
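As a refresher (for me as much as for anyone), here is roughly what that wiring looks like: a dedicated service account, a namespace-scoped role that only allows reading ConfigMaps, and the binding between the two. All names are invented:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa            # illustrative names throughout
  namespace: my-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role                   # Role = namespace-scoped; ClusterRole = cluster-wide
metadata:
  name: read-configmaps
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-sa-read-configmaps
  namespace: my-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-configmaps
subjects:
- kind: ServiceAccount
  name: my-app-sa
  namespace: my-app
```

A pod then picks this up by setting `serviceAccountName: my-app-sa` in its spec.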

That second point is critically important in my experience (and I will get back to this aspect in later posts in this series): in the majority of companies I have talked to, there are relatively few Kubernetes clusters in place, shared across many teams and run by a centralized ops team. And these centralized ops teams have allergic reactions whenever something is deployed at cluster scope, because it risks interfering with other applications running on the same cluster. Consequently, they insist that there can never be any cluster roles! It is driven by the desire to 'lock' teams into individual namespaces and not allow any resource in one namespace to access resources in any other namespace.
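For contrast, this is the kind of object that triggers those reactions: a cluster-scoped role (name invented) which, once attached via a ClusterRoleBinding, lets its holder read pods in every namespace, i.e. the workloads of every other team on the cluster:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole            # no namespace field: this applies cluster-wide
metadata:
  name: read-pods-everywhere # illustrative name
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
```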

 
We have a couple of cases where we need to create resources at the cluster level and therefore require cluster-admin type authority. One is an ImageContentSourcePolicy, which redirects requests for images from public image registries to local ones (almost none of our customers allow pulling images from public registries). The other is the "global pull secret", which is, as the name suggests, the secret containing image pull credentials that is used across the entire cluster. I believe that both of these are valid and reasonable exceptions to the rule that nothing should ever require running as cluster-admin.
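For reference, here is a sketch of such an ImageContentSourcePolicy, with placeholder registry names; note that it is a cluster-scoped resource, which is exactly why cluster-admin authority is needed to create it:

```yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy   # cluster-scoped, hence the need for cluster-admin
metadata:
  name: mirror-public-images
spec:
  repositoryDigestMirrors:
  - source: docker.io/myorg                   # placeholder public source
    mirrors:
    - registry.internal.example.com/myorg     # placeholder internal mirror
```

The global pull secret is similar in spirit: it lives in the openshift-config namespace, which application teams typically cannot write to, so updating it has to go through whoever holds cluster-admin.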

In summary, it makes sense to have a set of strict rules ("nothing can run as root", "everything has to run under the 'restricted' SCC", "nothing can have a cluster role", and "nothing must require cluster-admin privileges"), and then deal with well-defined exceptions on a case-by-case basis. And for this approach to work, the software you deploy must be transparent about its needs in this respect.
 
(Photos by FLY:D and Jason Dent on Unsplash)
