by boosh on 16 October 2019
DevOps engineers and developers using Kubernetes are used to the idea that containers should be ephemeral. Not having to babysit containers is what allows us to scale: containers can be created and deleted automatically in response to the traffic hitting the cluster. Some applications may use persistent volumes to store state, but things are generally simpler if state is stored outside the cluster, e.g. in hosted databases or storage systems like S3.
The thing is, why stop there? In this post, we’ll explain some of the benefits of making entire Kubernetes clusters ephemeral as well – able to be created and deleted on-demand with total automation.
Until now, creating clusters on-demand has been problematic. Sure, there are tools like Kops and Minikube, and hosted services like Amazon’s EKS, Google’s GKE and Azure’s AKS. The trouble is that just creating a pristine cluster isn’t enough. For it to be useful, you need to install the applications you care about onto it, and since these often have interdependencies, you need a way of installing things in the correct order. A further complication is that various applications (or the cluster itself) may require cloud infrastructure to be created first, such as DNS zones, load balancers, databases or other cloud services.
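To make the ordering problem concrete, here’s a minimal sketch in Python of deriving a safe installation order from declared interdependencies. The application names and dependency graph are hypothetical – a real tool would also model cloud infrastructure (DNS zones, load balancers, databases) as nodes in the same graph:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical applications mapped to the things each depends on.
dependencies = {
    "ingress-controller": set(),
    "database": set(),
    "cert-manager": {"ingress-controller"},
    "my-app": {"cert-manager", "database"},
}

# static_order() yields nodes so that every dependency appears
# before the application that needs it.
install_order = list(TopologicalSorter(dependencies).static_order())
print(install_order)
```

If someone introduces a circular dependency, `static_order()` raises a `CycleError` instead of silently producing a broken order – exactly the kind of check you want before provisioning anything.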
Given all this complexity, it’s quite common for organisations to opt for “long-lived” clusters – ones that someone sets up, possibly at the start of a project, customises and then babysits. These long-lived clusters will hopefully have updates applied to them, and the devops engineers responsible can only hope those updates go smoothly – they haven’t got much choice.
There are several major problems with using long-lived Kubernetes clusters:

- Upgrades are risky. If one goes wrong there’s no easy way back, so upgrades get postponed and clusters fall behind on patches.
- Configuration drifts as people apply manual tweaks over time, turning the cluster into a snowflake that nobody can reliably recreate.
- Because nobody has ever rebuilt the cluster from scratch, recovering from a disaster is slow and error-prone.
- Creating faithful replicas for development, testing or staging is difficult, so changes get tested against environments that don’t match production.
The problems above are serious. They sap a team’s energy by forcing members to spend more time firefighting issues than developing new features. Upgrades and releases become stressful and grind the team’s velocity to a halt. It’s all wasted effort, time and money for absolutely no benefit.
But if we think about it, these are really the same types of problems we freed ourselves from when we started using containers in the first place. Immutability provides stronger guarantees about a system, which, with proper testing, should lead to improved robustness. If we could create fully-functioning Kubernetes clusters whenever we wanted, we wouldn’t have any of the above problems. Many teams already automate the creation of cloud infrastructure with tools like Terraform, or by using the APIs of the various cloud providers, for exactly the reasons described above. So it’s time to adopt automation for the entire cluster.
Let’s imagine we could create Kubernetes clusters whenever we wanted to, sized according to what we wanted to use them for, and with the applications we wanted to work on or serve already installed. Here’s what we’d gain from that:

- Upgrades could be rehearsed on a throwaway cluster and rolled out by switching traffic to a freshly-created cluster, keeping the old one around until we’re confident.
- Every environment – dev, test, staging, production – could be created from the same definition, so they’d never drift apart.
- Disaster recovery would simply mean recreating the cluster from its definition.
- Short-lived clusters could be spun up for experiments or load tests and deleted afterwards, saving money.
Again, there are parallels with using containers in the first place: deployments become easier, it’d be simpler to roll back upgrades, and portability would increase – in this case the entire cluster becomes more portable across regions or instance sizes.
This is what Sugarkube allows you to do with only one or two commands. In fact, Sugarkube goes one step further and can ultimately allow you to treat the specific location of a cluster (e.g. remote EKS, local Minikube) as just a detail. This makes it simple to work locally (where iteration is fastest) and only go to the cloud when absolutely necessary. It can even simplify going multi-cloud.
Ephemeral clusters aren’t always the answer though. You may legitimately need to store large amounts of data in a cluster, e.g. for monitoring purposes, and migrating that data to a new cluster could be slow and costly. Even in those situations, ephemeral clusters can still help by letting you easily create a replica of the cluster in which to stage any upgrades before applying them to the main cluster.