Replicating a Kafka cluster across regions or clouds is difficult because of the high latency between the physical locations. Traditional approaches usually end up with a complex architecture and high inter-broker latency.
In addition, egress traffic charges and the need to keep broker instances running at the secondary location make cross-region or multicloud replication very expensive. With Statehub, you can simply move your Kafka cluster and its dependent workloads between locations with no additional complexity, and at a much lower and more predictable cost.
This guide walks you through deploying a stateful Kafka cluster on Kubernetes, using the latest version of the Strimzi operator to simplify Kafka management and Statehub as the data service enabling cross-region and multicloud business continuity. In the event of a failure of your cluster, or of the entire cloud region where it is deployed, you will be able to start your Kafka processes on another cluster at a different location in just a few seconds, with all of the latest messages, offsets, and topics intact, and resume your entire operation.
Before You Begin
The following is necessary to use Statehub with your Kubernetes clusters:
1. A Statehub account. Sign up here
2. A UNIX-compatible command-line interface (a Linux or Mac terminal, or WSL or Cygwin on Windows)
3. The `kubectl` command-line utility to control your clusters
This guide assumes you have two Kubernetes clusters in two distinct cloud locations, similar to the topology below:
Step 1: Set Up the Statehub CLI
Go to Get-Statehub to Download and Install the Statehub CLI
Setting up your Kubernetes clusters to use Statehub requires using the Statehub CLI.
After installation completes, copy the link the CLI prints into your browser to reach the Statehub login page.
If you don’t already have a Statehub account, you can create one on the Statehub login page.
Once you’ve logged in to your Statehub account, you should be automatically redirected to the tokens page and prompted to create a token for your CLI. Click “Yes” to create a token.
Copy the token to the CLI prompt and press Enter.
Your Statehub CLI installation should now be configured.
Step 2: Register Your Clusters with Statehub
Register the Clusters Using the Statehub CLI
The Statehub CLI registers the cluster associated with your current Kubernetes context. First, switch to the context of the cluster you want to register:
kubectl config use-context my-cluster
Then run the Statehub CLI's registration command. To register another cluster, switch to its context and register it the same way:
kubectl config use-context another-cluster
For further information about the cluster registration process, see Statehub – Cluster Registration.
To validate the cluster registered properly, run:
kubectl get storageclass
The response should describe Statehub as the default Storage Class for your cluster, with csi.statehub.io as the data provisioner.
What We’ve Done
To use Statehub with your Kubernetes clusters, register them using the Statehub CLI. The Statehub CLI makes use of your Kubernetes configuration contexts to identify your clusters and is aware of the current context.
To register another cluster, switch to the appropriate context and run the same command. You need to register every cluster between which you want to move your stateful applications.
This operation will:
- Make Statehub aware of your clusters
- Identify your clusters’ location(s) and report them to Statehub
- Generate a cluster token for your cluster and save it in your cluster’s secrets, so that your cluster can access the Statehub REST API.
- Install the Statehub Controller components and the Statehub CSI driver on your cluster
- Add the cluster’s location(s) to the default state, if the default state doesn’t span this location yet
- Configure the default state’s corresponding storage class (default.your-statehub-org-id.statehub.io) as the default storage class
Once you’ve registered all of your clusters, you should be able to start stateful applications and fail them over between your clusters.
At this point, your topology will be as follows:
Step 3: Deploy the Strimzi Operator and Kafka Cluster
Download the Strimzi Apache Kafka Operator
Install the Needed Custom Resource Definitions
Create a kafka namespace for the Strimzi Cluster Operator:
kubectl create ns kafka
Modify the installation files to reference the namespace in which the operator is installed (in our case, kafka).
On Linux, run:
sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml
On macOS, run:
sed -i '' 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml
Create another namespace in which your Kafka cluster will be deployed:
kubectl create ns <your-cluster-namespace>
Edit the install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml file and configure the namespace environment variable as follows:
# ...
env:
  - name: STRIMZI_NAMESPACE
    value: <your-cluster-namespace>
# ...
Deploy the Strimzi Apache Kafka Operator
Deploy the operator, CRDs, and role-bindings, granting the operator its needed permissions:
(update the namespaces according to your configuration)
kubectl create -f install/cluster-operator/ -n kafka
kubectl create -f install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml -n <your-cluster-namespace>
kubectl create -f install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml -n <your-cluster-namespace>
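Before moving on, it can help to confirm that the operator actually started. The deployment name below, `strimzi-cluster-operator`, is the default name used by Strimzi's installation manifests:

```shell
# Check that the operator deployment is up in the kafka namespace
kubectl get deployment strimzi-cluster-operator -n kafka

# Optionally, follow the operator logs while it starts
kubectl logs deployment/strimzi-cluster-operator -n kafka -f
```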
Step 4: Create and Deploy the Kafka Cluster and the Kafka Topic
Create and launch a Kafka cluster by applying a Strimzi Kafka custom resource in your cluster namespace.
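The manifest itself is not reproduced in this guide. A minimal sketch, assuming Strimzi's v1beta2 API and the names used elsewhere in this guide (a cluster called my-cluster with an external listener on port 9094), might look like the following; apply it with `kubectl apply -f kafka-cluster.yaml -n <your-cluster-namespace>`:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      # External listener matching the my-cluster-kafka-external-bootstrap:9094
      # address used by the producer and consumer later in this guide
      - name: external
        port: 9094
        type: nodeport
        tls: false
    storage:
      type: persistent-claim
      size: 10Gi
      # Keep the PVCs (and the replicated Statehub volumes behind them)
      # when the Kafka resource is deleted, so data survives a failover
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

Because the Statehub storage class is the cluster default, no storageClassName needs to be set here; the persistent-claim storage lands on replicated Statehub volumes automatically.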
Wait for the Kafka cluster to be deployed; the process can be observed using the following command:
kubectl get pods -A -w
Once the cluster's entity operator pods are ready, the cluster is marked as ready. To validate that the cluster reports Ready = True, run:
kubectl get kafka -n <your-cluster-namespace>
What We’ve Done
As soon as Kubernetes tries to create a PVC with a Statehub storage class (i.e., default.your-statehub-org-id.statehub.io) as part of the Kafka cluster creation, Statehub will create a volume on the state corresponding to the storage class, replicated between all of the locations the state spans.
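For illustration, a PVC as Strimzi might generate it for the first Kafka broker could look like this (the claim name follows Strimzi's data-&lt;cluster&gt;-kafka-&lt;n&gt; convention; the size is an example):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-cluster-kafka-0
spec:
  accessModes:
    - ReadWriteOnce
  # The Statehub storage class is what makes the backing volume
  # replicated across all locations the state spans
  storageClassName: default.your-statehub-org-id.statehub.io
  resources:
    requests:
      storage: 10Gi
```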
Since the primary cluster owns the state, only it is allowed to access the volumes and to create new volumes on the state.
Create a Kafka Topic (Optional)
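The topic manifest is not shown in this guide. With Strimzi, topics are created declaratively through the KafkaTopic custom resource; a minimal sketch matching the my-topic name used by the producer and consumer below, applied with `kubectl apply -f kafka-topic.yaml -n <your-cluster-namespace>`, might be:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    # Tells the Strimzi Topic Operator which Kafka cluster owns this topic
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
```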
You now have a running Kafka cluster with a Kafka topic ready for producers and consumers to exchange messages and events.
Your topology will be as follows:
Step 5: Failover Between Clusters in Different Locations
This section demonstrates failing over between regions or cloud providers.
To demonstrate a failover to another location without data loss:
- Launch a Kafka cluster on the primary location as we did in this guide. Run a producer and publish some messages to the topic.
- Delete all of the application and Kafka resources from the primary Kubernetes cluster, including the producer. (This is optional, but good for testing; during an actual failure, nothing will shut everything down gracefully.)
- Set the new location as the owner of the state.
- Launch the Kafka cluster and topic. However, instead of a producer, launch a consumer for the same topic. This consumer will receive the messages published by the producer on the original cluster at the other location.
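The steps above can be condensed into a single command sequence, using only commands that appear elsewhere in this guide (cluster names, and values in angle brackets, are placeholders for your own configuration):

```shell
# 1. (Optional) Tear down Kafka on the primary cluster to simulate a failure
kubectl config use-context my-cluster
kubectl delete kafka <your_kafka_cluster_name> -n <your-cluster-namespace>

# 2. Make the secondary cluster the owner of the default state
statehub set-owner default another-cluster

# 3. Redeploy the operator on the secondary cluster...
kubectl config use-context another-cluster
kubectl create -f install/cluster-operator/ -n kafka
# ...then re-create the Kafka cluster and topic as in steps 3-4
```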
Run a Producer and Simulate Messages Within a Kafka Topic
Following steps 1-4 of this guide, you have a running Kafka cluster with a Kafka topic.
Launch a Kafka producer pod using the following command:
kubectl run -n <your-cluster-namespace> -it --stdin --tty --rm --image=bitnami/kafka:3.0.0 --restart=Never kafka-producer -- kafka-console-producer.sh --broker-list my-cluster-kafka-external-bootstrap:9094 --topic my-topic
Inside the producer's command prompt, publish a few messages as you please.
Now you can delete the Kafka cluster resources and the Strimzi operator from the main location using the following commands:
kubectl delete kafka <your_kafka_cluster_name> -n <your-cluster-namespace>
kubectl delete -f install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml -n <your-cluster-namespace>
kubectl delete -f install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml -n <your-cluster-namespace>
Start Your Application on the Other Cluster:
- Choose the cluster on which you want to run your application and switch to its context:
kubectl config use-context another-cluster
- Make sure this cluster is the owner of the default state by running the following command:
statehub set-owner default another-cluster
- Launch the Strimzi operator and the Kafka cluster in the new location: repeat steps 2-4 of this guide with the same configuration as on the primary cluster.
Run a Kafka Consumer
Once you have a running Kafka cluster and Kafka topic in the new location, launch a consumer to consume the messages published by the producer at the primary location before the failover.
Launch a Kafka consumer pod to pull the messages, using the following command:
kubectl run -n <your-cluster-namespace> -it --stdin --tty --rm --image=bitnami/kafka:3.0.0 --restart=Never kafka-consumer -- kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-external-bootstrap:9094 --topic my-topic --from-beginning
You are now able to pull messages made by the producer at the primary location before the failover, with the consumer pod launched in the new location cluster.
Failures can happen at any time, but Statehub offers a simple way to prevent data loss and to manage stateful data and applications.
In this guide, we went through the steps of deploying an Apache Kafka cluster managed by the Strimzi operator on Kubernetes, with cross-region or multicloud business continuity provided by Statehub, giving you the freedom to run stateful Kafka clusters without worrying about data loss in case of a failure.