Replicating a Kafka cluster across regions or clouds raises a lot of issues due to high latency between the physical locations. Traditional approaches usually end up with a complex architecture and high latency between brokers.

In addition, egress traffic charges and the need to keep broker instances running at the secondary location make cross-region or multicloud replication very expensive. By using Statehub, you can simply move your Kafka cluster and dependent workloads between different locations with no additional complexity, and at a much lower and more predictable cost.

This guide walks you through deploying a stateful Kafka cluster on a Kubernetes cluster, using the latest version of the Strimzi operator to simplify Kafka management and Statehub as the data service that enables cross-region and multicloud business continuity. If your cluster fails, or the entire cloud region where it is deployed goes down, you’ll be able to start your Kafka processes on another cluster at a different location in just a few seconds, with all of the latest messages, offsets, and topics intact, and resume your entire operation.

 

Before You Begin

Before you begin, it might be useful to familiarize yourself with Statehub concepts such as Clusters, States, Volumes, and the Default State.

Prerequisites

The following is necessary to use Statehub with your Kubernetes clusters:

1. A Statehub account. Sign up here

2. A UNIX-compatible command-line interface (Linux or Mac terminal, WSL or Cygwin for Windows)

3. The `kubectl` command-line utility to control your clusters

Initial Setup

This guide assumes you have two Kubernetes clusters in two distinct cloud locations, similar to the topology below:

Step 1: Set Up the Statehub CLI

Go to Get-Statehub to download and install the Statehub CLI.

Setting up your Kubernetes clusters to use Statehub requires using the Statehub CLI.
After installation is complete, copy the link to your browser and go to the Statehub login page.

If you don’t already have a Statehub account, you can create one on the Statehub login page.

Once you’ve logged in to your Statehub account, you should be automatically redirected to the tokens page and prompted to create a token for your CLI. Click “Yes” to create a token.
Copy the token to the CLI prompt and press Enter.

Your Statehub CLI installation should now be configured.

 

WANT TO GET STARTED FOR FREE? GET A 100GB TIER TO TRY STATEHUB OUT

 

Step 2: Register Your Clusters with Statehub

Register the Clusters Using the Statehub CLI

The following command registers the cluster associated with your current context:

kubectl config use-context my-cluster
statehub register-cluster

To register another cluster, switch to its context, and then register it:

kubectl config use-context another-cluster
statehub register-cluster

For further information about the cluster registration process, go to Statehub – Cluster Registration.
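When several clusters need registering, the two commands above can be wrapped in a small loop. The following is a minimal sketch, not part of the Statehub CLI: `register_clusters` is a hypothetical helper name, and it assumes every argument is a valid kubectl context.

```shell
# Hypothetical helper: register each kubectl context given as an argument.
# It only chains the two commands shown above and stops at the first failure.
register_clusters() {
  for ctx in "$@"; do
    kubectl config use-context "$ctx" || return 1
    statehub register-cluster || return 1
  done
}
```

For example, `register_clusters my-cluster another-cluster` registers both clusters in turn. Note that it leaves your current context set to the last cluster in the list.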

💡
Please Note:
In certain scenarios, you might not have a second cluster in place at another location. If you want your data replicated to another location, so that you can create a cluster there on demand if needed, follow the procedure to add a location to a state.

To validate that the cluster registered properly, run:

kubectl get storageclass

The response should describe Statehub as the default Storage Class for your cluster, with csi.statehub.io as the data provisioner.
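If you prefer a scripted check over reading the table, the sketch below parses the human-readable output of the command above. It relies on kubectl marking the default class with `(default)` after its name; `statehub_is_default` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (and print the class name) only if the
# default storage class is provisioned by csi.statehub.io.
# In kubectl's table output a default class reads "<name> (default) <provisioner> ...",
# so the provisioner is the third whitespace-separated field.
statehub_is_default() {
  kubectl get storageclass --no-headers |
    awk '$2 == "(default)" && $3 == "csi.statehub.io" { found = 1; print $1 }
         END { exit !found }'
}
```

A typical use would be `statehub_is_default || echo "Statehub is not the default storage class"`.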

What We’ve Done

To use Statehub with your Kubernetes clusters, register them using the Statehub CLI. The Statehub CLI makes use of your Kubernetes configuration contexts to identify your clusters and is aware of the current context.

To register another cluster, switch to the appropriate context and run the above command again. You need to register all the clusters between which you want to move your stateful applications.

This operation will:

  1. Make Statehub aware of your clusters
  2. Identify your clusters’ location(s) and report them to Statehub
  3. Generate a cluster token for your cluster and save it in your cluster’s secrets, so that your cluster can access the Statehub REST API
  4. Install the Statehub Controller components and the Statehub CSI driver on your cluster
  5. Add the cluster’s location(s) to the default state, if the default state doesn’t span this location yet
  6. Configure the default state’s corresponding storage class (default.your-statehub-org-id.statehub.io) as the default storage class

Once you’ve registered all of your clusters, you should be able to start stateful applications and fail them over between your clusters.

At this point, your topology will be as follows:

 

NEED SOME ASSISTANCE? WE’D LOVE TO HELP OUT!

Step 3: Deploying the Strimzi Operator and Kafka Cluster

💡
Please note: In this guide, we’re deploying the latest version of the operator with 3 Kafka brokers and 3 ZooKeeper nodes across 2 namespaces: kafka for the operator controllers, and my-kafka-project for the cluster itself. You can use your preferred operator version, and change the number of brokers and ZooKeeper nodes in your cluster according to your needs.

Download the Strimzi Apache Kafka Operator

Download and extract the Strimzi operator file from :

strimzi-x.y.z.zip

Install the Needed Custom Resource Definitions

Create the kafka namespace for the Strimzi Cluster Operator:

kubectl create ns kafka

Modify the files to reference the namespace in which the operator is installed (in our case: kafka)

On Linux:

sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

On macOS:

sed -i '' 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

Create another namespace in which your Kafka cluster will be deployed:

kubectl create ns <your-cluster-namespace>

Edit the install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml file and configure the namespace environment variable as follows:

# ...
env:
- name: STRIMZI_NAMESPACE
  value: <your-cluster-namespace>
# ...

Deploy the Strimzi Apache Kafka Operator

Deploy the operator, CRDs, and role-bindings, granting the operator its needed permissions:
(update the namespaces according to your configuration)

kubectl create -f install/cluster-operator/ -n kafka
kubectl create -f install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml -n <your-cluster-namespace>
kubectl create -f install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml -n <your-cluster-namespace>
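Before moving on, it can help to confirm that the operator actually started. The sketch below waits for the operator Deployment to finish rolling out. It assumes the Deployment keeps the name `strimzi-cluster-operator` from the install files; `wait_for_operator` is a hypothetical helper name, and the 300s default timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until the Strimzi operator Deployment is rolled out.
# $1 = operator namespace (defaults to "kafka"), $2 = timeout (defaults to 300s).
wait_for_operator() {
  kubectl rollout status deployment/strimzi-cluster-operator \
    -n "${1:-kafka}" --timeout="${2:-300s}"
}
```

Calling `wait_for_operator` with no arguments checks the kafka namespace used in this guide.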


 

Step 4: Create and Deploy the Kafka Cluster and the Kafka Topic

Create and launch a Kafka cluster using the following command:

💡
Please note:
The cluster in our example is configured with persistent volumes. The storage configuration provisions 100 Gi of storage for every ZooKeeper node and broker (in our case 3 each).
In addition, we exposed Kafka with an external listener configured to use a NodePort.
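The manifest itself is not reproduced in this text. As a rough sketch of what a matching Kafka resource could look like, the helper below prints one with 3 brokers, 3 ZooKeeper nodes, 100Gi persistent claims, and a nodeport listener on port 9094. Everything in it is an assumption reconstructed from the description above: the `kafka.strimzi.io/v1beta2` API version, the listener name `external`, the default cluster name `my-cluster` (taken from the bootstrap address used later in this guide), and the helper name `render_kafka_manifest`. Newer Strimzi releases may no longer accept the zookeeper section, so check the reference for your operator version.

```shell
# Hypothetical helper: print a Kafka custom resource matching the description above.
# $1 = Kafka cluster name (defaults to "my-cluster").
render_kafka_manifest() {
  cat <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: ${1:-my-cluster}
spec:
  kafka:
    replicas: 3
    listeners:
      # External listener exposed through NodePort services.
      - name: external
        port: 9094
        type: nodeport
        tls: false
    storage:
      # No storage class is named, so the cluster default is used.
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
EOF
}
```

It could then be applied with `render_kafka_manifest my-cluster | kubectl apply -n <your-cluster-namespace> -f -`. Because no storage class is specified, the claims fall through to the default storage class, which is the Statehub class after registration.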

Wait for the Kafka cluster to be deployed. You can observe the process using the following command:

kubectl get pods -A -w

Once the cluster’s Entity Operator pods are ready, the cluster is marked as ready.
To validate that the cluster reports READY as True, run:

kubectl get kafka -n <your-cluster-namespace>
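For scripts, polling the table by eye can be replaced with `kubectl wait`, which blocks until the Kafka resource reports the Ready condition. This is a minimal sketch: `wait_for_kafka_ready` is a hypothetical helper name and the 300s default timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until the Kafka resource reports condition Ready.
# $1 = Kafka cluster name, $2 = namespace, $3 = timeout (defaults to 300s).
wait_for_kafka_ready() {
  kubectl wait "kafka/$1" --for=condition=Ready --timeout="${3:-300s}" -n "$2"
}
```

For example, `wait_for_kafka_ready my-cluster <your-cluster-namespace>` returns a non-zero status if the cluster is not ready within five minutes.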

What We’ve Done

As soon as Kubernetes tries to create a PVC with a Statehub storage class (i.e., default.your-statehub-org-id.statehub.io) as part of the Kafka cluster creation, Statehub creates a volume on the state corresponding to that storage class, replicated between all of the locations the state spans.

Since the owner of the state is the primary cluster, only it is allowed to access the volumes, and create new volumes on the state.
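To see whether the volumes were actually provisioned, you can list the claims in the cluster namespace along with their storage class and phase. The sketch below prints only the claims that are not yet Bound, so empty output means every claim has a volume. `unbound_pvcs` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<claim> <storage class> <phase>" for every PVC
# in namespace $1 whose phase is anything other than Bound.
unbound_pvcs() {
  kubectl get pvc -n "$1" --no-headers \
    -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName,PHASE:.status.phase |
    awk '$3 != "Bound" { print $1, $2, $3 }'
}
```

With the 3 brokers and 3 ZooKeeper nodes used in this guide, you would expect six claims in total, all of them Bound once the cluster is ready.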

Create a Kafka Topic Using the Following Command (Optional)
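The topic manifest is not reproduced in this text either. A minimal sketch of a KafkaTopic resource for the Strimzi Topic Operator might look like the following. The topic name `my-topic` and cluster name `my-cluster` match the producer and consumer commands later in this guide; the partition and replica counts, the `v1beta2` API version, and the helper name `render_topic_manifest` are assumptions.

```shell
# Hypothetical helper: print a KafkaTopic custom resource.
# $1 = topic name (defaults to "my-topic"), $2 = Kafka cluster name (defaults to "my-cluster").
render_topic_manifest() {
  cat <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: ${1:-my-topic}
  labels:
    # Tells the Topic Operator which Kafka cluster this topic belongs to.
    strimzi.io/cluster: ${2:-my-cluster}
spec:
  partitions: 3
  replicas: 3
EOF
}
```

It could be applied with `render_topic_manifest my-topic my-cluster | kubectl apply -n <your-cluster-namespace> -f -`.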

You now have a running Kafka cluster with a Kafka topic, ready to carry messages and events between producers and consumers.
Your topology will be as follows:

 


 

Step 5: Failover Between Clusters in Different Locations

This part demonstrates failing over between regions or cloud providers.

To demonstrate a failover to another location without data loss:

  • Launch a Kafka cluster on the primary location as we did in this guide. Run a producer and publish some messages to the topic.
  • Delete all of the application and Kafka resources from the primary Kubernetes cluster, including the producer. (This is optional, but good for testing. During an actual failure, there will be no one to shut everything down gracefully.)
  • Set the new location as the owner of the state.
  • Launch the Kafka cluster and topic. However, instead of a producer, launch a consumer for the same topic. This consumer will receive the messages published by the producer on the original cluster at the other location.
💡
Please note:
It’s an operator’s job to ensure that, for any given object, the actual state of the world (both the cluster state and potentially external state such as Kafka’s state) matches the desired state in the object. Because of that, it is important that you do not delete the Kafka topic resource before the Kafka cluster deletion is complete. Otherwise, the deletion of the topic resource will trigger an event, and the Strimzi operator will reconcile by deleting the topic inside your Kafka cluster to match.

Run a Producer and Simulate Messages Within a Kafka Topic

Following steps 1-4 of this guide, you have a running Kafka cluster with a Kafka topic.

Launch a Kafka producer pod using the following command:

kubectl run -n <your-cluster-namespace> -it --stdin --tty --rm --image=bitnami/kafka:3.0.0 --restart=Never kafka-producer -- kafka-console-producer.sh --broker-list my-cluster-kafka-external-bootstrap:9094 --topic my-topic

At the producer prompt, type your messages. Each line is sent as a separate message.

You can now delete the Kafka cluster resources and the Strimzi operator from the primary location using the following commands:

kubectl delete kafka <your_kafka_cluster_name> -n <your-cluster-namespace>
kubectl delete -f install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml -n <your-cluster-namespace>
kubectl delete -f install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml -n <your-cluster-namespace>
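Because the topic resource must not be removed before the cluster is gone, a script may want to wait until the Kafka pods have actually terminated. The sketch below deletes the Kafka resource and then polls for pods carrying the `strimzi.io/cluster` label. It assumes Strimzi applies that label to the cluster's pods; `delete_kafka_and_wait` is a hypothetical helper name and the 5-second poll interval is arbitrary.

```shell
# Hypothetical helper: delete the Kafka resource, then wait for its pods to go away.
# $1 = Kafka cluster name, $2 = namespace.
delete_kafka_and_wait() {
  kubectl delete kafka "$1" -n "$2" || return 1
  # Poll until no pod labelled with this cluster name is listed.
  # Caveat: a failing "kubectl get" also prints nothing and ends the loop.
  while kubectl get pods -n "$2" -l "strimzi.io/cluster=$1" --no-headers 2>/dev/null | grep -q .; do
    sleep 5
  done
}
```

Only after `delete_kafka_and_wait <your_kafka_cluster_name> <your-cluster-namespace>` returns would it be safe to remove the topic resource.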

Start Your Application on the Other Cluster:

  1. Choose the cluster on which you want to run your application and switch to its context:
kubectl config use-context another-cluster
  2. Make this cluster the owner of the default state by running the following command:
statehub set-owner default another-cluster
  3. Launch the Strimzi operator and the Kafka cluster in the new location by repeating steps 2-4 of this guide with the same configuration as the primary cluster.
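The context switch and ownership change can be combined into one step. The sketch below chains the two commands from the list above. It assumes the cluster is registered in Statehub under the same name as its kubectl context, as in this guide's example; `fail_over_to` is a hypothetical helper name.

```shell
# Hypothetical helper: switch to the target cluster and make it the state owner.
# $1 = kubectl context, assumed to equal the Statehub cluster name.
# $2 = state name (defaults to "default").
fail_over_to() {
  kubectl config use-context "$1" || return 1
  statehub set-owner "${2:-default}" "$1"
}
```

Running `fail_over_to another-cluster` is equivalent to the first two steps above. Redeploying the operator and the Kafka cluster remains a separate step.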

Run a Kafka Consumer

Once you have a running Kafka cluster and Kafka topic in the new location, launch a consumer to read the messages published by the producer at the primary location before the failover.

Launch a Kafka consumer pod to pull the messages, using the following command:

kubectl run -n <your-cluster-namespace> -it --stdin --tty --rm --image=bitnami/kafka:3.0.0 --restart=Never kafka-consumer -- kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-external-bootstrap:9094 --topic my-topic --from-beginning

You are now able to pull messages made by the producer at the primary location before the failover, with the consumer pod launched in the new location cluster.

Conclusion

While failures might happen anytime, there is a simple solution for preventing data loss and managing stateful data and applications, using Statehub’s features.
In this guide, we went through the steps of deploying an Apache Kafka cluster managed by the Strimzi operator on Kubernetes, with cross-region or multicloud business continuity using Statehub, giving you the freedom to run stateful Kafka clusters without worrying about data loss in case of a failure.

As always, feel free to ping us with any questions at support@statehub.io or book a demo with us.