ETCD Backup and Restore

In this post we’ll see how to perform a backup of the ETCD server and how to restore it afterwards. We are working with Minikube.

First of all, let's see how ETCD is configured by looking at the manifest file /etc/kubernetes/manifests/etcd.yaml.
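
You can read it without leaving your shell by running the command inside the Minikube VM:

$ minikube ssh -- sudo cat /etc/kubernetes/manifests/etcd.yaml

The relevant part looks like this: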

...
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.99.100:2379
    - --cert-file=/var/lib/minikube/certs/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/minikube/etcd
    - --initial-advertise-peer-urls=https://192.168.99.100:2380
    - --initial-cluster=minikube=https://192.168.99.100:2380
    - --key-file=/var/lib/minikube/certs/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.99.100:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.99.100:2380
    - --name=minikube
    - --peer-cert-file=/var/lib/minikube/certs/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/var/lib/minikube/certs/etcd/peer.key
    - --peer-trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
...

We need the following information:

  1. advertise-client-urls
  2. cert-file
  3. key-file
  4. trusted-ca-file
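
These map directly to the etcdctl flags --endpoints, --cert, --key and --cacert, respectively.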

Get those certificate files and store them in a local folder.
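
One way to copy them out of the Minikube VM is with scp, using Minikube's built-in SSH key. This is a sketch assuming the default docker user and a VM driver that exposes SSH; you may need sudo inside the VM first if the key files are root-only:

$ scp -i $(minikube ssh-key) docker@$(minikube ip):/var/lib/minikube/certs/etcd/ca.crt .
$ scp -i $(minikube ssh-key) docker@$(minikube ip):/var/lib/minikube/certs/etcd/server.crt .
$ scp -i $(minikube ssh-key) docker@$(minikube ip):/var/lib/minikube/certs/etcd/server.key .

Now try to connect to ETCD and list the members: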

$ export WORKDIR=${PWD}
$ ETCDCTL_API=3 etcdctl member list \
  --cacert=${WORKDIR}/ca.crt \
  --cert=${WORKDIR}/server.crt \
  --key=${WORKDIR}/server.key \
  --endpoints=https://192.168.99.100:2379
5d05948eea4c8c0a, started, minikube, https://192.168.99.100:2379, https://192.168.99.100:2379
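
The columns in the output are the member ID, status, name, peer URLs and client URLs.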

It’s important to have something deployed on the cluster; if you don’t, run these commands to create a couple of deployments (on recent kubectl versions, kubectl run no longer creates deployments, so we use kubectl create deployment):

$ kubectl create deployment nginx --image=nginx --replicas=3
$ kubectl create deployment redis --image=redis --replicas=2

After that, we’ll have two apps up and running:

$ kubectl get all
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6db489d4b7-c8qjt   1/1     Running   0          54m
pod/nginx-6db489d4b7-cstbc   1/1     Running   0          54m
pod/nginx-6db489d4b7-dtpc2   1/1     Running   0          54m
pod/redis-5c7c978f78-mzjc4   1/1     Running   0          54m
pod/redis-5c7c978f78-svs5v   1/1     Running   0          54m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   29d

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   3/3     3            3           54m
deployment.apps/redis   2/2     2            2           54m

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-6db489d4b7   3         3         3       54m
replicaset.apps/redis-5c7c978f78   2         2         2       54m

Perform the backup by executing:

$ ETCDCTL_API=3 etcdctl snapshot save \
  --cacert=${WORKDIR}/ca.crt \
  --cert=${WORKDIR}/server.crt \
  --key=${WORKDIR}/server.key \
  --endpoints=https://192.168.99.100:2379 \
  ${WORKDIR}/my-backup.db

This command saves a copy of the ETCD database in the my-backup.db file.
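
You can verify the snapshot before going any further with etcdctl’s built-in status subcommand:

$ ETCDCTL_API=3 etcdctl snapshot status ${WORKDIR}/my-backup.db --write-out=table

It prints the hash, revision, total keys and size of the snapshot.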

Now, we can destroy the apps we deployed in the step before:

$ kubectl delete deployment nginx redis

Let’s restore the database:

$ ETCDCTL_API=3 etcdctl snapshot restore \
  --cacert=${WORKDIR}/ca.crt \
  --cert=${WORKDIR}/server.crt \
  --key=${WORKDIR}/server.key \
  --endpoints=https://192.168.99.100:2379 \
  ${WORKDIR}/my-backup.db \
  --initial-cluster="minikube=https://192.168.99.100:2380" \
  --initial-cluster-token="etcd-cluster-1" \
  --initial-advertise-peer-urls="https://192.168.99.100:2380" \
  --name="minikube" \
  --data-dir="${WORKDIR}/etcd-restore"

Here we use values taken from the ETCD manifest file, such as the IPs and the member name (note that initial-cluster and initial-advertise-peer-urls use the peer port 2380, matching the manifest), and we add an initial-cluster-token for the new bootstrap and the data-dir where the database will be restored. The restore is a local operation on the snapshot file, so it doesn’t actually contact the server. The data-dir folder is created locally, which means we’ll have to copy it into the Minikube VM afterwards, over SSH for example. I’m using /var/lib/minikube/etcd-restore to keep these files, as sketched below.
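
A minimal sketch of that copy, again assuming the default docker user; we go through /tmp because /var/lib/minikube is usually only writable by root:

$ scp -r -i $(minikube ssh-key) ${WORKDIR}/etcd-restore docker@$(minikube ip):/tmp/
$ minikube ssh -- sudo mv /tmp/etcd-restore /var/lib/minikube/etcd-restore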

Once the database is restored, we need to tell Kubernetes to read the restored copy. To do that, we edit the ETCD manifest file:

...
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.99.100:2379
    - --cert-file=/var/lib/minikube/certs/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/minikube/etcd-restore
    - --initial-advertise-peer-urls=https://192.168.99.100:2380
    - --initial-cluster=minikube=https://192.168.99.100:2380
    - --key-file=/var/lib/minikube/certs/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.99.100:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.99.100:2380
    - --name=minikube
    - --peer-cert-file=/var/lib/minikube/certs/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/var/lib/minikube/certs/etcd/peer.key
    - --peer-trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
    - --initial-cluster-token=etcd-cluster-1
...
    volumeMounts:
    - mountPath: /var/lib/minikube/etcd-restore
      name: etcd-data
    - mountPath: /var/lib/minikube/certs/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/minikube/certs/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/minikube/etcd-restore
      type: DirectoryOrCreate
    name: etcd-data

We need to set initial-cluster-token to the token we chose before, and data-dir to point to the folder where we keep the restored ETCD files. The same path also has to be changed in the volumes and volumeMounts sections.

Since ETCD runs as a static pod, the kubelet restarts it automatically when the manifest changes, but kube-apiserver needs to be restarted so it picks up the restored data. From inside the Minikube VM:

$ docker ps | grep kube-apiserver
8255c0e35e0f        41ef50a5f06a           "kube-apiserver --ad…"   3 hours ago         Up 3 hours                              k8s_kube-apiserver_kube-apiserver-minikube_kube-system_bcfa63252833e5b041a29d7485a74d90_3
$ docker rm -f 8255c0e35e0f
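
The kubelet will recreate the kube-apiserver container. Once it’s back, verify that the deployments we deleted are there again:

$ kubectl get deployments nginx redis

You should see both of them with the same replica counts as before the backup.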

We’ll have some downtime at this point, but if we’re restoring a backup it’s probably because some disaster already happened.
