Kubernetes Leader Election

Using lease locks to facilitate leader election.

Distributed systems are becoming increasingly common: more and more companies rely on them to serve billions of requests and to roll out upgrades without downtime.

Handling failures gracefully is crucial for high availability in distributed systems. A leader election process ensures that if the current leader fails, one of the candidate replicas can be elected as the new leader.

Inspiration

I had explored the Raft Consensus Algorithm a few months back, and when I started exploring Kubernetes, a thought came to my mind:

“Kubernetes being a container orchestrator should have leader election implemented in its backend. πŸ€””

And this is where I found client-go. On further exploration, I got to know about the leaderelection package.

What is Leader Election?

Leader election involves selecting a leader from among the healthy candidates eligible for the election. Once the election is won, the leader continually “heartbeats” to renew its position as the leader, while the other candidates periodically make new attempts to become the leader. If the leader’s health check fails, the candidates start a re-election to choose the next leader among themselves. This ensures that a new leader is elected quickly if the current leader fails for some reason.
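
To make this concrete, here is a toy, single-process sketch of the loop each candidate runs. Everything in it (the in-memory lease, the timings, the simulated crash) is made up purely for illustration; in Kubernetes the in-memory lock is replaced by a shared lock object, as we'll see below.

package main

import (
	"fmt"
	"sync"
	"time"
)

// lease is a stand-in for the shared lock object.
type lease struct {
	mu        sync.Mutex
	holder    string
	renewedAt time.Time
}

const leaseDuration = 3 * time.Second

// campaign acquires the lease when it is free or expired, renews it while
// leading, and otherwise waits as a candidate. crashAfter simulates a failure.
func campaign(l *lease, id string, crashAfter time.Duration) {
	deadline := time.Now().Add(crashAfter)
	for time.Now().Before(deadline) {
		l.mu.Lock()
		if l.holder == id || l.holder == "" || time.Since(l.renewedAt) > leaseDuration {
			l.holder = id
			l.renewedAt = time.Now() // heartbeat: renew the lease
			fmt.Println(id, "is leading")
		} else {
			fmt.Println(id, "is a candidate; current leader:", l.holder)
		}
		l.mu.Unlock()
		time.Sleep(time.Second)
	}
	fmt.Println(id, "crashed")
}

func main() {
	l := &lease{}
	go campaign(l, "pod-1", 4*time.Second) // simulated crash after 4 seconds
	time.Sleep(100 * time.Millisecond)     // let pod-1 win the first election
	go campaign(l, "pod-2", 30*time.Second)
	time.Sleep(12 * time.Second) // pod-2 takes over once pod-1's lease expires
}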

Why Leader Election for Kubernetes?

When a Kubernetes controller is deployed as multiple instances, you want to prevent unexpected behavior such as several replicas reconciling the same resources at once. Running a leader election process ensures that a leader is elected amongst the replicas and that it is the only one actively reconciling the cluster. The idea is that the other instances remain inactive, but ready to take over if the leader instance fails.

How Kubernetes Leader Election works

Leader election in Kubernetes is simple. It begins with the creation of a lock object, in which the leader updates the current timestamp at regular intervals as a way of informing other replicas of its current status, i.e., its leadership.

This lock object holds details such as the renew deadline and the identity of the current leader. If the leader fails to update the timestamp within the given interval, it is assumed to have crashed. The candidate replicas then race to acquire leadership by updating the lock with their own identity. The pod that successfully acquires the lock becomes the new leader.
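
For reference, here is roughly what such a lock object (a Lease, in this post) looks like. The values below are made up for illustration; the leaderelection package creates and updates this object for us:

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: example-lease
  namespace: default
spec:
  holderIdentity: pod-1
  leaseDurationSeconds: 15
  acquireTime: "2021-11-03T10:35:07.000000Z"
  renewTime: "2021-11-03T10:39:26.000000Z"
  leaseTransitions: 1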

Here is a visualization:

Green pods are leaders, blue pods are candidates and red pods are crashed pods.

[Image: Pod 1 is the leader]
[Image: Pod 1 crashes and a new leader is elected]

Writing code

We’ll need to set up a local Kubernetes cluster first. I’ll be using kind instead of minikube, but feel free to choose your own tool.
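
If you're going with kind as well, a command along these lines should bring up a cluster (the cluster name is arbitrary):

kind create cluster --name k8sensus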

Here is the entire code:

// main.go

package main

import (
	"context"
	"flag"
	"fmt"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog"
)

// client is the Kubernetes clientset; it is initialized in main.
var client *clientset.Clientset

// createLease creates a new lease lock object.
func createLease(leaseName, podName, namespace string) *resourcelock.LeaseLock {
	fmt.Println("Creating lease using the following metadata:")
	fmt.Println("Lease Name: " + leaseName)
	fmt.Println("Pod Name: " + podName)
	fmt.Println("Namespace: " + namespace)

	return &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      leaseName,
			Namespace: namespace,
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: podName,
		},
	}
}

// elect runs leader election for this instance using the leaderelection API.
func elect(ctx context.Context, lock *resourcelock.LeaseLock, id string) {
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(c context.Context) {
				sampleTask()
			},
			OnStoppedLeading: func() {
				klog.Info("Evicted as leader: finding new leaders..")
			},
			OnNewLeader: func(identity string) {
				if identity == id {
					klog.Info("I'm the new leader! πŸ˜‹")
					return
				}
				klog.Info("New leader is: " + identity)
			},
		},
		ReleaseOnCancel: true,
	})
}

// sampleTask runs while this instance is the leader (called from OnStartedLeading).
func sampleTask() {
	for {
		klog.Info("k8sensus is running sample task.")
		time.Sleep(10 * time.Second)
	}
}

func main() {
	var leaseName string
	var leaseNamespace string
	var podName = os.Getenv("POD_NAME")

	flag.StringVar(&leaseName, "lease-name", "", "Lease Name (Lock Name)")
	flag.StringVar(&leaseNamespace, "lease-namespace", "default", "Lease Namespace")

	flag.Parse()

	// validate lease name
	if leaseName == "" {
		log.Fatalln("Lease Name not found. Provide a valid lease name through --lease-name.")
	}

	// validate lease namespace
	if leaseNamespace == "" {
		log.Fatalln("Lease Namespace not found. Provide a valid lease namespace through --lease-namespace.")
	}

	fmt.Println("πŸš’πŸ—οΈ k8sensus is running!")

	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalln("Failed to get kube config:", err)
	}

	// dies if a client cannot be created from the config
	client = clientset.NewForConfigOrDie(config)

	// create context
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// create a lease lock
	lock := createLease(leaseName, podName, leaseNamespace)

	// run leader election
	elect(ctx, lock, podName)
}

Inside main, we first parse the required flags and initialize the client declared above. The createLease function creates the lease lock, which is then passed to elect, the function that runs the leader election.

Inside elect, we can see that RunOrDie is being called. RunOrDie internally creates a LeaderElector, which calls the Run method.

The Run method is responsible for acquiring the lock, renewing the lease, and running the OnStartedLeading callback (which we defined in elect).
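
One note on the ReleaseOnCancel: true option we set in elect: it only takes effect if the context actually gets cancelled on shutdown. A common pattern (not part of the code above, just a sketch of a possible variant) is to cancel the context on SIGTERM/SIGINT, so that a terminating pod releases the lease immediately instead of making the candidates wait for it to expire:

// A possible variant of the context setup in main: cancel the election
// context on termination signals so the lease is released during a
// graceful shutdown.
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
go func() {
	<-sigCh
	klog.Info("Received termination signal, releasing lease..")
	cancel()
}()

This would also require importing os/signal and syscall.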

Before running the whole thing, we need a Dockerfile to build the image, which we'll then push to Docker Hub:

# Dockerfile

FROM golang:1.16-alpine as builder

WORKDIR /app

COPY go.mod go.mod
COPY go.sum go.sum

RUN go mod download

COPY main.go main.go

RUN CGO_ENABLED=0 GOOS=linux go build -o k8sensus main.go

FROM alpine
RUN apk --no-cache add ca-certificates

COPY --from=builder /app/k8sensus .

ENTRYPOINT ["./k8sensus"]
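
With the Dockerfile in place, building and pushing the image looks something like this (I'm using the image name referenced in the deployment manifest below; swap in your own Docker Hub repository):

docker build -t docker.io/burntcarrot/k8sensus .
docker push docker.io/burntcarrot/k8sensus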

After pushing the image to Docker Hub, we need to create a deployment file for deploying the leader election process to our local Kubernetes cluster.

But, here’s a catch:

The default Service Account does not have access to the coordination API.

So, we’ll need to create another account and set up RBAC accordingly:

# rbac.yaml

# The default ServiceAccount doesn't have access to the coordination API
# So, a new RBAC is created in this definition

apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  name: leaderelection-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: leaderelection-role
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: leaderelection-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: leaderelection-role
subjects:
  - kind: ServiceAccount
    name: leaderelection-sa

And, then use the account on the deployment:

# deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: k8sensus
  name: k8sensus
spec:
  replicas: 3
  selector:
    matchLabels:
      app: k8sensus
  template:
    metadata:
      labels:
        app: k8sensus
    spec:
      automountServiceAccountToken: true
      serviceAccountName: leaderelection-sa
      containers:
        - image: docker.io/burntcarrot/k8sensus
          name: k8sensus
          args:
            - --lease-name=k8sensus-lease
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name

Running the leader election process

First things first, apply both of the definitions:

kubectl apply -f rbac.yaml
kubectl apply -f deployment.yaml

After applying the definitions, check the pods:

❯ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
k8sensus-67798d9cf6-gqfjf   1/1     Running   0          2m2s
k8sensus-67798d9cf6-qwxj6   1/1     Running   0          2m2s
k8sensus-67798d9cf6-vtlx6   1/1     Running   0          15s

To simulate a leader failure, we need to delete the current leader pod. If you're not sure which pod currently holds the lease, one way to find out is to query the Lease's holder identity:
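
❯ kubectl get lease k8sensus-lease -o jsonpath='{.spec.holderIdentity}'

Then delete the leader pod: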

❯ kubectl delete pod k8sensus-67798d9cf6-n4sgl

At this point, the candidate pods race to acquire the lease lock and promote themselves to leader.
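
If you want to watch the handover as it happens, you can also watch the Lease object (kubectl shows the current holder for leases):

❯ kubectl get lease k8sensus-lease --watch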

Now, let's check the logs of one of the candidate pods:

❯ kubectl logs k8sensus-67798d9cf6-vtlx6
πŸš’πŸ—οΈ k8sensus is running!
Creating lease using the following metadata:
Lease Name: k8sensus-lease
Pod Name: k8sensus-67798d9cf6-vtlx6
Namespace: default
I1103 10:34:56.358804       1 leaderelection.go:248] attempting to acquire leader lease default/k8sensus-lease...
I1103 10:34:56.374900       1 main.go:60] New leader is: k8sensus-67798d9cf6-n4sgl
I1103 10:35:07.734547       1 main.go:60] New leader is: k8sensus-67798d9cf6-qwxj6

It’s working! A pod has been elected as the new leader. Let’s see the logs of the new leader pod:

❯ kubectl logs k8sensus-67798d9cf6-qwxj6
πŸš’πŸ—οΈ k8sensus is running!
Creating lease using the following metadata:
Lease Name: k8sensus-lease
Pod Name: k8sensus-67798d9cf6-qwxj6
Namespace: default
I1103 10:33:37.924516       1 leaderelection.go:248] attempting to acquire leader lease default/k8sensus-lease...
I1103 10:33:37.938187       1 main.go:60] New leader is: k8sensus-67798d9cf6-n4sgl
I1103 10:35:07.562421       1 leaderelection.go:258] successfully acquired lease default/k8sensus-lease
I1103 10:35:07.562627       1 main.go:57] I'm the new leader! πŸ˜‹
I1103 10:35:07.562845       1 main.go:70] k8sensus is running sample task.
I1103 10:35:17.563958       1 main.go:70] k8sensus is running sample task.
I1103 10:35:27.566011       1 main.go:70] k8sensus is running sample task.

As expected, the leader pod runs our sample task successfully!

Let’s inspect the lease lock object too:

❯ kubectl describe lease k8sensus-lease
Name:         k8sensus-lease
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  coordination.k8s.io/v1
Kind:         Lease
Metadata:
  Creation Timestamp:  2021-11-03T10:33:30Z
  Managed Fields:
    API Version:  coordination.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:acquireTime:
        f:holderIdentity:
        f:leaseDurationSeconds:
        f:leaseTransitions:
        f:renewTime:
    Manager:         k8sensus
    Operation:       Update
    Time:            2021-11-03T10:33:30Z
  Resource Version:  15947
  UID:               7b1aac4c-baa8-4c3d-a6c8-18deff2f0c2f
Spec:
  Acquire Time:            2021-11-03T10:35:07.497077Z
  Holder Identity:         k8sensus-67798d9cf6-qwxj6
  Lease Duration Seconds:  15
  Lease Transitions:       1
  Renew Time:              2021-11-03T10:39:26.731022Z
Events:                    <none>

Looking at the Holder Identity field under Spec, we can see that it matches the new leader's pod name, which confirms that the lease has been acquired by the new leader.

If you want to clean up the deployment created during this post, use:

❯ kubectl delete deployment k8sensus
deployment.apps "k8sensus" deleted

This was fun! I hope you learned more about Kubernetes and leader election from this post; it's a brief overview of my exploration of the Go client for Kubernetes.

I used Carlos Becker’s post as a reference while I was writing this post. Thank you Carlos!