One of the most significant challenges of running Kubernetes is that it is extremely difficult to set up storage for stateful applications while ensuring resiliency and application mobility.
The truth is that storage has fallen behind the rest of cloud infrastructure: traditional storage solutions are complex and extremely difficult to deploy on Kubernetes. The task is complex enough that many organizations avoid running stateful apps on K8s altogether, and those that do have resigned themselves to a high risk of downtime, data loss, and vendor lock-in. This article discusses what to consider before deploying a stateful application on Kubernetes and offers solutions to the common problems.
Stateless apps are a breeze to deploy with K8s because they don’t depend on persistent storage, which usually lives outside the container runtime or even the Kubernetes cluster. Stateful apps, however, have more complex data storage requirements that often turn into a DevOps nightmare.
The main challenges stemming from the lack of a fitting storage solution that works natively with K8s clusters are as follows:
Vendor lock-in: In the cloud-native world, storage is often synonymous with the services cloud vendors provide (EBS, Azure Disk, etc.). When you buy your storage along with the rest of your infrastructure from one vendor, lock-in is all but inevitable. Data gravity exacerbates the problem: the more data you accumulate in one location, the harder it is to move, and the more your applications are pulled toward where the data lives. Now, be honest: are you 100% sure that all of your infrastructure, all of your data, and all of your business will stay in AWS region us-west-2 forever? Something to think about.
Resiliency challenges: “Our business continuity plan is to choose the AWS region that fails the least.” Achieving availability and durability in the event of a node, cluster, network, or location failure is a significant challenge. Relying exclusively on a single cloud provider, or on a single location within a cloud provider, has substantial limitations when it comes to resiliency. Multi-region or multicloud resiliency plans are relegated to the realm of science fiction because of the sheer complexity of setting up such an infrastructure for stateful apps.
As a result, organizations accept certain levels of risk as inevitable. Resiliency plans often amount to running your business in the AWS or Google Cloud location that statistically fails the least and relying on availability zones, a solution that only works during fire drills. Needless to say, true resiliency remains out of reach.
Complexity: In the on-prem days, data storage was managed by a dedicated team of storage experts who configured and maintained specialized storage networks and equipment and made sure it was all available, resilient, and backed up. In the public cloud, however, the standard solutions on offer leave much to be desired, and anything beyond them requires significant expertise to set up and maintain. The problem is that the skill set required to build storage infrastructure differs significantly from what most DevOps engineers are trained to deal with. Not that they have time for it anyway.
The growing complexity, along with the greater consequences of failure, creates the need for more sophisticated approaches to resiliency, performance, and operations. But that sophistication quickly snowballs into yet more complexity. What we really need are solutions that are sophisticated yet accessible.
Where Does the Persistent Storage Reside?
There are many ways to attach storage to K8s. So what are your options for storing your application data?
- Inside the container. Data is lost whenever the container restarts, so this is not an appropriate solution for stateful applications and can only be used for scratch data. The data is also lost when the pod is rescheduled or the cluster is restarted, and new replicas created when scaling up to meet demand start with an empty filesystem.
- Host-attached. Kubernetes can run multiple replicas of containers, but storage attached to a host is limited by the resources available on that node. The default cloud-provided volumes, such as Amazon EBS or Azure Disk, are the most common choice. This setup is resilient to container outages, but data on a node’s local disk is lost when that node fails, and cloud block volumes can only be reattached to nodes in the same availability zone. Therefore this solution is also not a good fit for persistent data that must survive infrastructure failures. If the node (or, for cloud volumes, the zone) goes down, your data goes with it.
- Your own SDS solution running within the K8s cluster across multiple nodes (such as Portworx or OpenEBS). These products run the storage infrastructure inside the cluster itself and can replicate data across nodes. However, for the most part, they don’t solve application mobility: the ability to move your applications to a different geographical region or even a different cloud provider.
In terms of resiliency, all of this is confined to a single location in the world, so the data won’t survive if your location goes down. Some cluster-level setups allow you to have replicated data between clusters. But then you need to figure out how to configure networking between those clusters. Doing multi-region networking is difficult enough. Doing multicloud networking is literally hell.
- Outside the K8s cluster, using storage that is not integrated with Kubernetes but speaks industry-standard protocols such as iSCSI or NFS. This removes the dependence on container and pod lifecycles and is the best option for resilience, although even these solutions are mostly confined to a single geographical or cloud region. It also means managing remote storage, provisioning, and networking yourself, which often snowballs into an ops nightmare: these setups are difficult to build and maintain, and they require real storage expertise. The complexity is often prohibitive, leaving organizations to settle for high risk and poor resilience.
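To make the trade-offs in this list concrete, here is a minimal Python sketch that encodes each option by the largest failure it is described as surviving above. The option names and the four-level failure ordering are simplifying assumptions for illustration, not a formal model; real setups (for example, an SDS with cross-cluster replication) can move an option up a level.

```python
# Failure domains ordered from smallest to largest blast radius.
# This ordering is a simplifying assumption for illustration.
FAILURES = ["container", "node", "cluster", "region"]

# Largest failure each storage option survives, as characterized in the
# list above (None means data does not even survive a container restart).
SURVIVES_UP_TO = {
    "inside the container": None,
    "host-attached": "container",
    "in-cluster SDS": "node",
    "outside the cluster": "cluster",
}


def survives(option: str, failure: str) -> bool:
    """Return True if data stored with `option` outlives `failure`."""
    limit = SURVIVES_UP_TO[option]
    if limit is None:
        return False
    return FAILURES.index(failure) <= FAILURES.index(limit)


def simplest_option_for(failure: str):
    """First option, in the order listed above, that survives `failure`."""
    for option in SURVIVES_UP_TO:  # dicts preserve insertion order
        if survives(option, failure):
            return option
    return None  # no option in the list survives this failure
```

For example, `simplest_option_for("node")` returns `"in-cluster SDS"`, while `simplest_option_for("region")` returns `None`: none of the options survives the loss of a whole region, which is exactly the gap discussed in the rest of this article.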
So what are the considerations to take into account when setting up stateful applications?
Stateful Apps Storage Considerations for K8s
There are several considerations for the long-term storage of application data on K8s: the amount and type of data, resiliency, low-level technical configuration, maintenance responsibility, performance, and application mobility. Let’s have a closer look at each one in turn.
The Amount and Type of Data
The first is to understand precisely what data you have and what is required from your storage infrastructure. Some questions to guide you are as follows:
- How much data do you have? Do you have a relatively small amount of persistent data (less than 100 GB, for example) that is easy to copy between locations? Or are you dealing with large databases of 500 GB or more that will require downtime to move from place to place?
- Can the data be replicated? How complex will it be?
- Will I need replication to a remote region? In some cases, compliance requirements mandate three or more copies of your data in different geographical locations.
- What is the acceptable latency? For example, which cloud regions do I have in proximity to my primary region?
- Can data be replicated across a mix of different infrastructures? Can it be mounted by the container on any infrastructure?
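For the questions about data size and remote copies, a back-of-the-envelope calculation helps. The Python sketch below estimates how long a bulk copy takes over a given link and checks whether a set of replicas spans enough distinct regions. The 80% link-efficiency default and the three-region minimum are assumptions for illustration; substitute your own measured throughput and your actual compliance rules.

```python
def copy_time_hours(size_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours to copy `size_gb` gigabytes over a `link_gbps` link.

    `efficiency` is an assumed fraction of the nominal bandwidth that is
    actually achieved (protocol overhead, contention, and so on).
    """
    size_gigabits = size_gb * 8  # decimal units: 1 GB = 8 Gb
    seconds = size_gigabits / (link_gbps * efficiency)
    return seconds / 3600


def meets_copy_policy(replica_regions: list[str], min_regions: int = 3) -> bool:
    """True if the replicas live in at least `min_regions` distinct regions."""
    return len(set(replica_regions)) >= min_regions
```

At an assumed 80% of a 1 Gbps link, 100 GB takes roughly 17 minutes, while 500 GB takes roughly 1.4 hours, which is why the larger case usually means planned downtime or an online replication strategy. Likewise, `meets_copy_policy(["us-west-2", "us-west-2", "eu-central-1"])` returns `False`: three copies, but only two locations.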
Resiliency
The next thing to consider when running stateful applications on K8s is resiliency.
Stateful apps need to store data in a way that can tolerate occasional node, cluster, or even location failures. Some questions you must ask yourself when it comes to resiliency:
- How often is the data backed up, and how many copies exist?
- What kind of replication is optimal (synchronous/asynchronous, how often are the snapshots)? What kind of replication is even possible?
- What’s the recovery procedure, if necessary?
- What is the Recovery Point Objective (RPO) that meets the application’s needs? When your application fails, it will need to restart somewhere. Does it require all of its latest data when it restarts? In other words, can you afford to lose data, and if so, how much (measured in time)? Set up snapshots or synchronous/asynchronous replication accordingly.
- What is the Recovery Time Objective (RTO), that is, the maximum acceptable time for the application to recover?
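The RPO and RTO questions can be turned into a simple check. The Python sketch below models three protection modes under simplifying assumptions: synchronous replication loses nothing that was acknowledged, asynchronous replication loses up to its maximum replication lag, and periodic snapshots shipped off-site lose up to one interval plus the time it takes to copy a snapshot. It ignores logical corruption, which replicates just as faithfully as good data. `ProtectionPlan` and its fields are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProtectionPlan:
    mode: str                        # "sync", "async", or "snapshot"
    replication_lag_s: float = 0.0   # maximum observed lag for async replication
    snapshot_interval_s: float = 0.0
    snapshot_copy_s: float = 0.0     # time to ship one snapshot off-site
    restore_s: float = 0.0           # time to restart the app on the surviving copy


def worst_case_rpo_s(plan: ProtectionPlan) -> float:
    """Upper bound on data loss, in seconds, under this simplified model."""
    if plan.mode == "sync":
        return 0.0  # every acknowledged write already exists on the replica
    if plan.mode == "async":
        return plan.replication_lag_s
    if plan.mode == "snapshot":
        # Failing just before the next snapshot finishes copying leaves only
        # the previous off-site snapshot, taken interval + copy time ago.
        # (Assumes each copy finishes before the next snapshot starts.)
        return plan.snapshot_interval_s + plan.snapshot_copy_s
    raise ValueError(f"unknown mode: {plan.mode}")


def meets_objectives(plan: ProtectionPlan, rpo_s: float, rto_s: float) -> bool:
    """True if worst-case data loss and recovery time both fit the targets."""
    return worst_case_rpo_s(plan) <= rpo_s and plan.restore_s <= rto_s
```

For example, hourly snapshots that take 10 minutes to copy have a worst-case RPO of 4,200 seconds (70 minutes), so they fail a 15-minute RPO target no matter how fast the restore is. Synchronous replication brings the RPO to zero, but at a price in latency: every write waits for the remote acknowledgment, so write latency grows by at least one network round trip to the replica, which is why the proximity of regions matters.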
Some applications can tolerate some level of failure. But what to do when you can’t afford to lose any uptime or data at all?
Low-Level Tech Stuff that No One Likes to Deal With (Connectivity, Protocols)
In theory, you can configure multiple K8s clusters so that your data survives a failure. But this is a problem that requires storage engineering expertise: you must set up non-trivial networking configurations and deal with storage concepts you may never have heard of. Cloud-native DevOps teams don’t have the time or know-how to deal with storage at the level of bits and bytes. After all, the vast majority of cloud-natives have never touched a physical disk, let alone dealt with complex SDS setups. And quite frankly, we rather like it that way!
However, when it comes to stateful apps, these questions are fundamental.
- How do the containers access the storage?
- Is there an integration between the storage and K8s?
This means that you will still need an understanding of networking concepts and be able to manage persistent network connections manually. Fun!
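In Kubernetes, containers usually reach storage through a PersistentVolumeClaim (PVC): the pod references the claim, and the claim’s StorageClass determines which provisioner (typically a CSI driver) creates the actual volume. The Python sketch below builds a PVC and the matching pod volume fragments as plain dictionaries and prints them as JSON, which `kubectl apply -f` accepts. The claim name, the `replicated-ssd` StorageClass, and the mount path are hypothetical; the class must actually exist in your cluster.

```python
import json


def pvc_manifest(name: str, size: str, storage_class: str) -> dict:
    """A PersistentVolumeClaim requesting `size` from `storage_class`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume can be mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_volume(claim_name: str, mount_path: str) -> dict:
    """Fragments that mount the claim: `volumes` goes in the pod spec,
    `volumeMounts` goes in the container spec."""
    return {
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
        ],
        "volumeMounts": [{"name": "data", "mountPath": mount_path}],
    }


if __name__ == "__main__":
    claim = pvc_manifest("db-data", "100Gi", "replicated-ssd")
    print(json.dumps(claim, indent=2))
    print(json.dumps(pod_volume("db-data", "/var/lib/postgresql/data"), indent=2))
```

The claim only states what the application needs. Where the bytes actually live, and whether they are replicated across zones or regions, is decided entirely by the StorageClass and its provisioner, which is where the connectivity and protocol questions above come in.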
Maintenance
Stateful applications are more complex than their less-demanding stateless counterparts, so it makes sense that they also need dedicated resources and infrastructure that must be maintained and supported.
So here is the million-dollar question – who will be responsible for maintaining it?
- Who performs the ongoing management of the storage resources?
- Are you responsible for expansions to add capacity?
- Are there contractual implications if changes are required?
- Are you or your cloud storage vendor responsible for monitoring usage?
- How does a user know when they are close to their storage allocation limit?
- Disaster recovery (DR): what does a DR event cost?
- What level of service does your storage provider offer?
- How long will it take to recover from a disaster if one were to occur at your site?
- Who does the backups? And how?
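One of these questions, knowing when you are close to your storage allocation limit, is easy to automate no matter who ends up owning it. The Python sketch below classifies usage against warning and critical thresholds and projects the days remaining until a volume is full from an average daily growth rate. The 80% and 90% thresholds are assumed defaults; in practice the inputs would come from your monitoring system, for example the kubelet’s `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes` metrics.

```python
import math


def usage_status(used_gb: float, capacity_gb: float,
                 warn: float = 0.80, critical: float = 0.90) -> str:
    """Return "ok", "warning", or "critical" based on fractional usage."""
    ratio = used_gb / capacity_gb
    if ratio >= critical:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"


def days_until_full(used_gb: float, capacity_gb: float,
                    growth_gb_per_day: float) -> float:
    """Linear projection of days left; infinite if usage is flat or shrinking."""
    if growth_gb_per_day <= 0:
        return math.inf
    return max(capacity_gb - used_gb, 0.0) / growth_gb_per_day
```

A 1,000 GB volume holding 850 GB and growing by 5 GB a day is in the `"warning"` band with about 30 days left, enough time to expand capacity if someone is actually responsible for doing so.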
All these questions should be answered before proceeding with deploying stateful applications.
Performance
How fast do your applications access the data? The performance of stateful K8s workloads depends on many factors, including the following:
- The number of users the application has and the load each one generates
- How frequently data is accessed
- The I/O pattern of the application: Sequential reads? Random writes? Block sizes?
This is especially critical for applications that perform a large number of operations in a short amount of time.
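Two standard relationships turn the I/O-pattern questions into numbers: throughput equals IOPS multiplied by block size, and, by Little’s law, the IOPS a workload can achieve is bounded by its number of outstanding I/Os divided by the latency of each I/O. The Python sketch below applies both; the example figures are illustrative, not measurements of any particular disk or cloud volume.

```python
def throughput_mib_s(iops: float, block_kib: float) -> float:
    """Throughput in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_kib / 1024


def max_iops(queue_depth: int, latency_ms: float) -> float:
    """Little's law: completed I/Os per second = outstanding I/Os / latency."""
    return queue_depth * 1000 / latency_ms
```

A database issuing 3,000 random 16 KiB writes per second needs about 46.9 MiB/s. An application that waits for each write before issuing the next (queue depth 1) against storage with 2 ms latency tops out at 500 IOPS; if synchronous replication to a remote site adds, say, another 10 ms per write, the same application drops to about 83 IOPS.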
The Application Mobility Question
How do you move data between places, such as between regions or cloud providers?
You might think you don’t need to, but you do!
While it is simple to move applications between locations and cloud providers, the same cannot be said of their state. Kubernetes apps are highly mobile by nature, until persistent data is involved. So while stateless apps can be moved with a single command, application mobility is a significant challenge for stateful apps.
That means that stateful apps are often “married for life” to their location, and organizations have to make peace with that fact, even when business requirements change.
Application mobility, whether global or cross-cloud, is key to resiliency, to solving vendor lock-in, and to lowering costs.
Imagine the freedom of moving your state between locations or cloud providers with the click of a button. That means no more vendor and region lock-in, a real answer to the resiliency problem, and the ability to shift your workloads freely based on business requirements and pricing.
The Missing Piece of the Puzzle: Application Mobility
What happens if the data center where the storage is available fails? You’ll have to bring up the application in a different place, but how do you make sure that the data is available there?
What if another cloud provider offers you hundreds of thousands of dollars in credits, on the condition that you run your applications there? How do you move all the data without days or weeks of downtime? And what if the migration fails and you want to move back?
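A common way to move a large data set with little downtime is iterative pre-copy, the same idea used for live migration of virtual machines: copy everything while the application keeps running, then repeatedly copy only what changed during the previous pass, and stop the application for a brief final sync once the remaining delta is small. The Python sketch below estimates the number of rounds, the online copy time, and the cutover downtime, assuming a constant write rate and constant link bandwidth. It illustrates the general technique, not how any particular product works; the 1,000 MB cutover threshold is an arbitrary assumption.

```python
def precopy_estimate(size_mb: float, bandwidth_mb_s: float, write_mb_s: float,
                     cutover_mb: float = 1000.0, max_rounds: int = 30):
    """Return (rounds, online_copy_s, downtime_s), or None if it cannot converge.

    Assumes every byte written during a round must be re-copied in the next
    round (pessimistic: rewriting the same blocks would dirty less data).
    """
    if write_mb_s >= bandwidth_mb_s:
        return None  # data changes at least as fast as the link can copy it
    remaining = size_mb
    online_s = 0.0
    rounds = 0
    while remaining > cutover_mb and rounds < max_rounds:
        round_s = remaining / bandwidth_mb_s
        online_s += round_s
        remaining = write_mb_s * round_s  # data dirtied while this round ran
        rounds += 1
    # The application is stopped only while the final delta is copied.
    downtime_s = remaining / bandwidth_mb_s
    return rounds, online_s, downtime_s
```

For 1,000,000 MB of data over a 1,000 MB/s link with 100 MB/s of writes, `precopy_estimate(1_000_000, 1000, 100)` returns `(3, 1110.0, 1.0)`: about 18.5 minutes of copying while the application keeps serving traffic and roughly one second of downtime, compared with about 17 minutes of downtime for a plain stop-and-copy. Moving back after a failed migration is the same procedure in the opposite direction.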
Stateful application mobility is the key, but when storage is involved, things become far more complicated. Mobility requires storing your data outside the cluster and keeping it synchronized between multiple locations, and as I argued above, that is highly complex and prohibitively expensive for most organizations, whether the goal is moving state between locations or transitioning to a multicloud setup.
As a result, most organizations resign themselves to the risks associated with data gravity. Unfortunately, most K8s-enabled organizations see solving this challenge as a luxury they can’t afford and settle for a compromise: high levels of risk and inevitable vendor lock-in.
K8s Storage with Cross-Cloud Application Mobility Can Be Simple. Really.
Your application already lives in the cloud, but storage hasn’t kept up with the times. This disconnect causes a lot of complexity, frustration, and compromise.
To address these issues, we built StateHub, a Stateful Application Mobility Platform that lets users provision stateful applications without worrying about how storage is configured or deployed. StateHub offers a simple way to store your data outside the cluster and make it available to any cluster, anywhere in the world, at any time. The state becomes independent of any single cluster while remaining available to every cluster, which makes cross-region and cross-cloud application mobility not only possible but simple to implement.
Essentially, StateHub does for the application state what K8s did for application deployment in terms of reducing complexity. Configure your clusters to work with StateHub and forget about it forever. With StateHub, everything is taken care of automatically; you don’t need to know storage, you don’t need to understand networking, and you don’t need to worry about resiliency or application mobility anymore.
StateHub brings simplicity to the world of K8s storage, providing application mobility out of the box. It reduces time to market by providing a production-ready environment, pre-populated with everything needed to assure mobility and resiliency while protecting your data from data gravity and vendor dependency risks.