According to a 2020 CNCF survey, Kubernetes continues to lead the container surge and is becoming the de facto way to deploy applications:

“This year, 91% of respondents report using Kubernetes, 83% of them in production. This continues a steady increase from 78% last year and 58% in 2018.”

The reasons Kubernetes continues to lead the way are clear: declarative configuration that can be stored as code and reused, portable workloads that can be moved to a different cluster by simply applying a YAML manifest or a Helm chart, and self-healing application resiliency.

Those are very impressive capabilities... at least for stateless applications. But not all applications are stateless – most require data, and that’s a completely different story.

When it comes to stateful applications, that story is about storage, and it comes with all the wonderful challenges you wanted to avoid, such as:

  • Data resiliency: storage is often a single point of failure, and resilient storage architectures are complex.
  • Recovery point objective (RPO) and data consistency are hard to achieve, even with data protection mechanisms in place.
  • Networking and replication are not exactly a walk in the park; they require expertise and time.
  • High costs, since end-to-end data protection solutions are expensive.


The Challenges of Running Stateful Apps on K8s


When you run stateful applications on K8s, you are most likely to encounter these dilemmas:

  • What to choose?
    How do I deploy so that my storage is available on all clusters?
  • How to defend my data from failure?
    How does my stateful app provision and use the available storage? And how will the app use the data?
  • What can fail? What failures do I need to be protected from?
    Or in other words – how granular is the redundancy required by your application? Is it required for a node, availability zone, region, or cloud provider?
  • How much data loss can I afford? What’s my RPO?
    Does every transaction count? Or is recovering from a day-old backup/snapshot good enough?
  • How much am I willing to invest?
    Stronger data protection increases both complexity and cost.
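
To make the RPO question concrete, here is a minimal sketch of the worst-case data loss for any point-in-time scheme (backups or snapshots). The function names and the transfer_time parameter are hypothetical, introduced only for illustration. The model assumes a failure can hit just before the next snapshot becomes usable, so everything written since the previous usable snapshot is lost.

```python
from datetime import timedelta


def worst_case_data_loss(snapshot_interval, transfer_time=timedelta(0)):
    """Upper bound on lost writes for a point-in-time scheme.

    Assumption: a snapshot only counts once it has been fully copied to
    a safe location, so the worst case is one full interval plus the
    time that copy takes.
    """
    return snapshot_interval + transfer_time


def meets_rpo(rpo, snapshot_interval, transfer_time=timedelta(0)):
    """True if the worst-case loss stays within the target RPO."""
    return worst_case_data_loss(snapshot_interval, transfer_time) <= rpo


if __name__ == "__main__":
    # A daily snapshot against a one-hour RPO target.
    print(worst_case_data_loss(timedelta(hours=24)))
    print(meets_rpo(timedelta(hours=1), timedelta(hours=24)))
    # Snapshots every 15 minutes that take 30 minutes to copy elsewhere.
    print(meets_rpo(timedelta(hours=1), timedelta(minutes=15),
                    timedelta(minutes=30)))
```

If every transaction counts, the RPO target approaches zero and no snapshot interval satisfies this check. At that point the question shifts from snapshots to replication.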


Existing Solutions for Stateful Apps on K8s


So what are the existing options for running stateful K8s applications? Let’s review them:

Local Disks

Local disks are directly attached to the node. They will not protect your data from failure at the node, availability zone (AZ), region, or cloud provider level. If the node fails, the data is lost.

Cloud Storage

Public cloud block storage is confined to a single availability zone. It will protect your data from failure at the node level, but not at the AZ, region, or cloud provider level. If any of these fail, the data is gone.

Cloud Storage with Regional Snapshots

Cloud block storage with regional snapshots brings us back to the RPO question – how much data can you lose? Is a point-in-time solution with inherent data loss good enough for your data? Regional snapshots will let you resume operations after a node or AZ failure, but will not protect you from a region or cloud provider failure.

Cloud Storage with Snapshot Shipping

Can cloud storage with snapshot shipping save the day? Replicating snapshots to a different region protects data from a region failure, but it is still a point-in-time solution that carries the risk of data loss. It will cover region failure, but it is limited to a single cloud provider and comes with its share of complexity and associated costs.

Managed Service

By now some of you might be thinking, “I will just use a managed service.” It’s easier to let someone else deal with the storage and infrastructure, assuring data resiliency and minimizing overhead costs. The problem is that most managed services, like Aurora or RDS, are confined to a single region in their default configuration and exist outside of your Kubernetes cluster. If that region is gone, you can’t resume operations in a different region.

Replicated Database

So what if we run a database inside the K8s cluster that already has replication capabilities built in? That is a good way to get an end-to-end data resiliency architecture with data loss that won’t reach hours, but you still have to establish all the networking yourself, and it’s going to cost you a lot.

SDS with Replication

Another option is running software-defined storage solutions. Some solutions will provide asynchronous replication between clouds and regions, but you will still need to manage storage and establish networking. These solutions are costly and complicated and will not assure zero data loss.
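
The coverage claims made for the block-storage options above can be tabulated in a short sketch. The names DOMAINS, COVERAGE, options_surviving, and unprotected are hypothetical, chosen for illustration. Only the options whose coverage the text states per failure domain are included. Managed services, replicated databases, and software-defined storage are left out because their coverage depends on how they are set up.

```python
# Failure domains, ordered from the smallest blast radius to the largest.
DOMAINS = ("node", "az", "region", "cloud provider")

# Which failures each option survives, as described in the text above.
# Assumption: snapshot shipping also covers node and AZ failures, since
# the shipped copy lives in a different region.
COVERAGE = {
    "local disks": set(),
    "cloud block storage": {"node"},
    "cloud storage + regional snapshots": {"node", "az"},
    "cloud storage + snapshot shipping": {"node", "az", "region"},
}


def options_surviving(domain):
    """Return the options that keep data through a failure of `domain`."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown failure domain: {domain}")
    return [name for name, covered in COVERAGE.items() if domain in covered]


def unprotected(option):
    """Return the failure domains an option does not protect against."""
    return [d for d in DOMAINS if d not in COVERAGE[option]]


if __name__ == "__main__":
    for domain in DOMAINS:
        print(domain, "->", options_surviving(domain))
```

None of the four options listed here survives a cloud provider failure, and all of the snapshot-based ones are point-in-time, so the RPO question still applies to them.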

To summarize, if we take a look at all the options, we can see a clear tradeoff: the more data resiliency you want, the more cost and complexity you take on.

But why can’t you have it all?

[Figure: Solution comparison, Statehub]


The Answer – Statehub


We believe you can have it all, and the answer is simple if you keep data mobility in mind:

Data Resiliency = Data Mobility

If a location fails, you had better have an up-to-date replica of the data outside of that failed location. Current solutions were simply not designed with data mobility in mind. Add to that the complexity of networking and the fact that storage and replication are complicated, expensive, and not promoted by the public cloud vendors, and we can understand why the majority of stateful applications have yet to utilize Kubernetes.

Statehub’s continuous data protection (CDP) ensures your data is available wherever your application requires it, freeing your stateful applications from geographic or vendor constraints and making them available anywhere, anytime.