ETCD Backup and Restore

In this post we’ll see how to perform a backup of the ETCD server and how to restore it afterwards. We are working with Minikube.

First of all, let's see how ETCD is configured by looking at the manifest file /etc/kubernetes/manifests/etcd.yaml:

...
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.99.100:2379
    - --cert-file=/var/lib/minikube/certs/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/minikube/etcd
    - --initial-advertise-peer-urls=https://192.168.99.100:2380
    - --initial-cluster=minikube=https://192.168.99.100:2380
    - --key-file=/var/lib/minikube/certs/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.99.100:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.99.100:2380
    - --name=minikube
    - --peer-cert-file=/var/lib/minikube/certs/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/var/lib/minikube/certs/etcd/peer.key
    - --peer-trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
...

We need the following information:

  1. advertise-client-urls
  2. cert-file
  3. key-file
  4. trusted-ca-file

Get those certificate files and store them in a local folder.
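
One way to copy them out of the Minikube VM (just a sketch, assuming the certificate paths shown in the manifest above and that minikube ssh is available):

$ minikube ssh "sudo cat /var/lib/minikube/certs/etcd/ca.crt" > ca.crt
$ minikube ssh "sudo cat /var/lib/minikube/certs/etcd/server.crt" > server.crt
$ minikube ssh "sudo cat /var/lib/minikube/certs/etcd/server.key" > server.key

With the certificates in place, try to connect to ETCD and list the members: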

$ export WORKDIR=${PWD}
$ ETCDCTL_API=3 etcdctl member list \
  --cacert=${WORKDIR}/ca.crt \
  --cert=${WORKDIR}/server.crt \
  --key=${WORKDIR}/server.key \
  --endpoints=https://192.168.99.100:2379
5d05948eea4c8c0a, started, minikube, https://192.168.99.100:2379, https://192.168.99.100:2379

It's important to have something deployed on the cluster. If you don't, run these commands to create a couple of deployments:

$ kubectl run nginx --image nginx --replicas 3
$ kubectl run redis --image redis --replicas 2 

After that, we'll have two apps up and running:

$ kubectl get all
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6db489d4b7-c8qjt   1/1     Running   0          54m
pod/nginx-6db489d4b7-cstbc   1/1     Running   0          54m
pod/nginx-6db489d4b7-dtpc2   1/1     Running   0          54m
pod/redis-5c7c978f78-mzjc4   1/1     Running   0          54m
pod/redis-5c7c978f78-svs5v   1/1     Running   0          54m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   29d

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   3/3     3            3           54m
deployment.apps/redis   2/2     2            2           54m

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-6db489d4b7   3         3         3       54m
replicaset.apps/redis-5c7c978f78   2         2         2       54m

Perform the backup by executing:

$ ETCDCTL_API=3 etcdctl snapshot save \
  --cacert=${WORKDIR}/ca.crt \
  --cert=${WORKDIR}/server.crt \
  --key=${WORKDIR}/server.key \
  --endpoints=https://192.168.99.100:2379 \
  ${WORKDIR}/my-backup.db

This command saves a copy of the ETCD database in the my-backup.db file.
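
Optionally, we can sanity-check the snapshot with etcdctl itself:

$ ETCDCTL_API=3 etcdctl snapshot status ${WORKDIR}/my-backup.db --write-out=table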

Now, we can destroy the apps we deployed in the step before:

$ kubectl delete deployment nginx redis

Let's restore the database:

$ ETCDCTL_API=3 etcdctl snapshot restore \
  --cacert=${WORKDIR}/ca.crt \
  --cert=${WORKDIR}/server.crt \
  --key=${WORKDIR}/server.key \
  --endpoints=https://192.168.99.100:2379 \
  ${WORKDIR}/my-backup.db \
  --initial-cluster="minikube=https://192.168.99.100:2380" \
  --initial-cluster-token="etcd-cluster-1" \
  --initial-advertise-peer-urls="https://192.168.99.100:2380" \
  --name="minikube" \
  --data-dir="${WORKDIR}/etcd-restore"

Here we use values we got from the ETCD manifest file, like the IPs and the name, and we add the initial cluster token for the bootstrap and the data-dir where we're going to restore the database. This folder is local, so we'll have to copy it to Minikube afterwards; you can use SSH, for example. I'm using /var/lib/minikube/etcd-restore to keep these files.
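
A possible way to copy the folder into the Minikube VM (a sketch, using the SSH key and IP that Minikube itself provides):

$ scp -r -i $(minikube ssh-key) ${WORKDIR}/etcd-restore docker@$(minikube ip):/tmp/
$ minikube ssh "sudo mv /tmp/etcd-restore /var/lib/minikube/etcd-restore"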

Once we have the database restored, we need to tell Kubernetes to read the restored copy. To do that, we edit the ETCD manifest file:

...
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.99.100:2379
    - --cert-file=/var/lib/minikube/certs/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/minikube/etcd-restore
    - --initial-advertise-peer-urls=https://192.168.99.100:2380
    - --initial-cluster=minikube=https://192.168.99.100:2380
    - --key-file=/var/lib/minikube/certs/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.99.100:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.99.100:2380
    - --name=minikube
    - --peer-cert-file=/var/lib/minikube/certs/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/var/lib/minikube/certs/etcd/peer.key
    - --peer-trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
    - --initial-cluster-token=etcd-cluster-1
...
    volumeMounts:
    - mountPath: /var/lib/minikube/etcd-restore
      name: etcd-data
    - mountPath: /var/lib/minikube/certs/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/minikube/certs/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/minikube/etcd-restore
      type: DirectoryOrCreate
    name: etcd-data

We need to set initial-cluster-token to the token we chose before and point data-dir to the folder where we keep the restored ETCD files. Then change this path in the volumes and volumeMounts sections as well.

You can now check that everything is restored by running kubectl, but the kube-apiserver needs to be restarted first. From inside the Minikube VM:

$ docker ps | grep kube-apiserver
8255c0e35e0f        41ef50a5f06a           "kube-apiserver --ad…"   3 hours ago         Up 3 hours                              k8s_kube-apiserver_kube-apiserver-minikube_kube-system_bcfa63252833e5b041a29d7485a74d90_3
$ docker rm -f 8255c0e35e0f

We'll have some downtime at this point, but if we're restoring a backup, it's probably because some disaster has already happened.
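
Once the API server is back, a quick check should show the nginx and redis deployments we deleted earlier:

$ kubectl get deployments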


Docker Image Scanner for Vulnerabilities With Clair

I'm gonna tell you how to add a step to your CI pipeline to check whether the Docker image you're building contains vulnerabilities or not.

Pre-requisites

I assume you have Docker installed on your system. We're gonna use Jenkins, but it's optional.

Clair

Clair is an open source project for the static analysis of vulnerabilities in appc and docker containers. It’s been developed by CoreOS.

Vulnerabilities Database

Clair relies on Postgres to keep its database. We have two different options here: we can create a new Postgres database, which will be initialized when Clair starts, or we can use a database that's already provisioned.

If you want to try the first option, keep in mind it takes a while to feed the database (10~15 minutes, depending on the network).
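
For reference, a minimal sketch of that first option would be to run a stock Postgres image instead of the pre-populated one used below (credentials are illustrative, the ci network is created in the next step, and Clair's config.yaml would have to point at these credentials):

$ docker run --detach \
   --name clair-postgres \
   --net ci \
   --env POSTGRES_USER=postgres \
   --env POSTGRES_PASSWORD=password \
   postgres:11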

We're using the second option here. Go to a terminal and run these commands:

$ docker network create ci
$ docker volume create --name clair-postgres
$ docker run --detach \
   --name clair-postgres \
   --publish 5432:5432 \
   --net ci \
   --volume clair-postgres:/var/lib/postgresql/data \
   arminc/clair-db:latest

We've just created a new network so Clair can resolve the database by its name, a volume, and the database container. After a couple of seconds we can check that the database is up and ready:

$ docker logs --tail 1 clair-postgres 
2019-05-15 13:36:00.068 UTC [1] LOG:  database system is ready to accept connections

Clair Service

Now it’s time to run the service. But first, a little configuration is needed. We must set the database for Clair in the config file. Run the following command:

$ curl --silent https://raw.githubusercontent.com/nordri/config-files/master/clair/config-clair.yaml | sed "s/POSTGRES_NAME/clair-postgres/" > config.yaml

We're replacing the string POSTGRES_NAME with clair-postgres, which is the name we gave to the Postgres container. Now we can launch the Clair container by running:

$ docker run --detach \
  --name clair \
  --net ci \
  --publish 6060:6060 \
  --publish 6061:6061 \
  --volume ${PWD}/config.yaml:/config/config.yaml \
  quay.io/coreos/clair:latest -config /config/config.yaml

And that's it.
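
As a quick check, Clair exposes a health endpoint on the second port we published (assuming the default health address in the config):

$ curl --silent http://localhost:6061/health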

Launch a scan

It's time to check that everything is working properly. To run the Clair scanner I've built a container with the latest release. You can find it here and then build the image using this Dockerfile:

FROM debian:jessie

COPY clair-scanner_linux_amd64 /clair-scanner
RUN chmod +x /clair-scanner

I already did, so you can use this image: nordri/clair-scanner.

Let's check some images now. One of Clair's limitations is that it cannot scan remote images; all images must be local. To scan an image, just launch:

$ docker run -ti \
  --rm \
  --name clair-scanner \
  --net ci \
  -v /var/run/docker.sock:/var/run/docker.sock \
  nordri/clair-scanner:latest /bin/bash

Now we’re inside the container and we’re able to launch the scanner:

# export IP=$(ip r | tail -n1 | awk '{ print $9 }')
# /clair-scanner --ip ${IP} --clair=http://clair:6060 debian:jessie

If we launch it as is, we'll see an endless list of vulnerabilities, which is more noise than anything useful. So we can filter by severity using this flag:

-t, --threshold="Unknown"             CVE severity threshold. Valid values; 'Defcon1', 'Critical', 'High', 'Medium', 'Low', 'Negligible', 'Unknown'

We can choose to check only for Critical or higher; then the list is much shorter because lower-severity vulnerabilities are treated as approved and, more interestingly, the command exits with 0, which lets us plug this step into a CI pipeline.
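
For example, inside the scanner container, combining the threshold with the exit code (the image name is just an example):

# /clair-scanner --ip ${IP} --clair=http://clair:6060 --threshold="Critical" debian:jessie || echo "Critical vulnerabilities found"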

Jenkins Integration

If you have Jenkins running in your infrastructure, you'll probably be interested in checking the images you're delivering to your customers.

This simple Jenkinsfile can solve the problem:

node {

   docker.image('docker').inside('-v /var/run/docker.sock:/var/run/docker.sock') {
      
       stage ('Checkout') {
         checkout scm
       }

       stage ('Build Docker image') {
           // Build docker image
           // docker build... DOCKER_IMAGE
       }
   }

   docker.image('nordri/clair-scanner').inside('--net ci') {

       stage ('Security scanner') {
           sh '''
             IP=$(ip r | tail -n1 | awk '{ print $9 }')
             /clair-scanner --ip ${IP} --clair=http://clair:6060 --threshold="Critical" DOCKER_IMAGE
           '''
       }
   }
}

As we can see, we're using one container to build the image and another one to scan that image for vulnerabilities. We set the threshold to Critical so only really serious problems will make the pipeline fail.



Recovering Untagged Images From ECR

It happened today that our CI system created a new Docker image with a tag that was supposed to be used in production. So we were faced with a Docker image tagged like the production one that wasn't the production image at all, while the real production image sat there untagged. What should we do?

Actually, the process is quite simple once you know how to cope with it. First we have to log into the ECR service.

$ $(aws ecr get-login --no-include-email --region eu-west-1)

Login Succeeded

Then locate the sha256 and download the manifest.

$ MANIFEST=$(aws ecr batch-get-image --repository-name sandbox --image-ids imageDigest=sha256:e226e9aaa12beb32bfe65c571cb60605b2de13338866bc832bba0e39f6819365 --query 'images[].imageManifest' --output text)

You can find the sha256 in the image list; copy and paste the one that belongs to the untagged image.
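
If you prefer the CLI to the console, you can also list the untagged digests directly (repository name as in the example above):

$ aws ecr list-images --repository-name sandbox --filter tagStatus=UNTAGGED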

The manifest is basically the list of layers that make up the Docker image.
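
If you're curious, you can pretty-print it with any JSON formatter:

$ echo "$MANIFEST" | python -m json.tool | head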

Then tag the image:

$ aws ecr put-image --repository-name sandbox --image-tag backup --image-manifest "$MANIFEST"

Now you are able to pull the newly tagged image:

$ docker pull myarn.amazonaws.com/sandbox:backup



Azure Batch: Task in Containers

In today's post I'll be talking about how to send tasks to Microsoft Azure Batch that run in containers. The task I want to solve in this example is calculating whether a number is prime or not. This Python code does the work for us. Then I've written a Dockerfile adding that piece of code to the image. Now we're able to run the script from a Docker container.

Let's move on to Azure Batch. You need to create a Docker registry, where you'll push the Docker image, and an Azure Batch account.

Docker Registry

Log into your Azure account, go to All resources, click Add and look for Container Registry. Then click on Create. Fill in the information: the name of the registry, the resource group, a location close to you, enable Admin user to be able to push to the repo, and the SKU (choose Standard here).

Then click Create.

In a few seconds the registry will be ready. Go to the Dashboard and click on the registry name (the one you chose before). Click on Settings -> Access keys. Here are the credentials you'll need to manage the registry.

Batch Account

From All resources, look for Batch Service. Fill in the Account name and Location; Subscription and Resource group should already be set.

Click Review + create and then Create. In a few seconds the service should be ready.
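
If you prefer the Azure CLI over the portal, roughly equivalent commands look like this (resource names and location are illustrative):

$ az acr create --resource-group myResourceGroup --name pythonrepo --sku Standard --admin-enabled true
$ az batch account create --resource-group myResourceGroup --name mybatchaccount --location westeurope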

Building the Container

Clone the repo

git clone https://github.com/nordri/microsoft-azure

and build and push the container

cd batch-containers
docker build -t YOUR_REGISTRY_SERVER/YOUR_REGISTRY_NAME/YOUR_IMAGE_NAME .
# for example:
docker build -t pythonrepo.azurecr.io/pythonrepo/is_prime:latest .
# Check the image works:
docker run -ti --rm pythonrepo.azurecr.io/pythonrepo/is_prime python is_prime.py 7856
The number: 7856 is not prime
docker run -ti --rm pythonrepo.azurecr.io/pythonrepo/is_prime python is_prime.py 2237
The number: 2237 is prime
# login first
docker login pythonrepo.azurecr.io
Username: pythonrepo
Password: 
# Push
docker push pythonrepo.azurecr.io/pythonrepo/is_prime

Azure Batch

Now it's time to send the task to Azure Batch. To do this, I started from this Python script, which creates a pool, a job and three tasks to upload files to Azure Storage, and made some modifications to fit it to my needs.

Creating the Pool

I need my pool to be created using instances able to run containers

...
def create_pool(batch_service_client, pool_id):
    print('Creating pool [{}]...'.format(pool_id))

    image_ref_to_use = batch.models.ImageReference(
        publisher='microsoft-azure-batch',
        offer='ubuntu-server-container',
        sku='16-04-lts',
        version='latest'
    )

    # Specify a container registry
    # We got the credentials from config.py
    containerRegistry = batchmodels.ContainerRegistry(
        user_name=config._REGISTRY_USER_NAME, 
        password=config._REGISTRY_PASSWORD, 
        registry_server=config._REGISTRY_SERVER
    )

    # The instance will pull the images defined here
    container_conf = batchmodels.ContainerConfiguration(
        container_image_names=[config._DOCKER_IMAGE],
        container_registries=[containerRegistry]
    )

    new_pool = batch.models.PoolAddParameter(
        id=pool_id,
        virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
            image_reference=image_ref_to_use,
            container_configuration=container_conf,
            node_agent_sku_id='batch.node.ubuntu 16.04'),
        vm_size='STANDARD_A2',
        target_dedicated_nodes=1
    )

    batch_service_client.pool.add(new_pool)
...

The key is the ImageReference, where we set the instances to run an OS able to run Docker. You must also set the registry credentials and the default Docker image that will be pulled when the instance boots.

Creating the Task

I’ve also changed the Task for the same reason. This task is ready to launch a container in the instance.

...
def add_tasks(batch_service_client, job_id, task_id, number_to_test):
    print('Adding tasks to job [{}]...'.format(job_id))

    # This is the user who runs the command inside the container.
    # An unprivileged one.
    user = batchmodels.AutoUserSpecification(
        scope=batchmodels.AutoUserScope.task,
        elevation_level=batchmodels.ElevationLevel.non_admin
    )

    # This is the docker image we want to run
    task_container_settings = batchmodels.TaskContainerSettings(
        image_name=config._DOCKER_IMAGE,
        container_run_options='--rm'
    )
    
    # The container needs this argument to be executed
    task = batchmodels.TaskAddParameter(
        id=task_id,
        command_line='python /is_prime.py ' + str(number_to_test),
        container_settings=task_container_settings,
        user_identity=batchmodels.UserIdentity(auto_user=user)
    )

    batch_service_client.task.add(job_id, task)
...

You can see how I've defined the user inside the container as a non-admin user, the Docker image we want to use, and the arguments we need to pass on the command line. Remember, we launch the container like:

docker ... imagename python /is_prime.py number

Launching the Script

Configure

In order to launch the script we need to fill in some configuration. Open the config.py file and write in all the credentials needed. Remember, all the credentials are in the Access keys section.
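
If you'd rather grab those keys from the CLI than from the portal, something like this works (names are illustrative):

$ az acr credential show --name pythonrepo
$ az batch account keys list --name mybatchaccount --resource-group myResourceGroup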

Installing Dependencies

You need the Azure Python SDK installed to run the script:

pip install -r requirements.txt

Let’s go

Now we’re ready to launch the script:

python batch_containers.py 89
Sample start: 2018-11-10 10:11:11

Creating pool [ContainersPool]...
No handlers could be found for logger "msrest.pipeline.requests"
Creating job [ContainersJob]...
Adding tasks to job [ContainersJob]...
Monitoring all tasks for 'Completed' state, timeout in 0:30:00.....................................................................................................................................................................
  Success! All tasks reached the 'Completed' state within the specified timeout period.
Printing task output...
Task: ContainersTask
Standard output:
The number: 89 is prime

Standard error:


Sample end: 2018-11-10 10:14:31
Elapsed time: 0:03:20

Delete job? [Y/n] y
Delete pool? [Y/n] y

Press ENTER to exit...

If there's a problem with the script, we'll see the error in stderr.txt:

Sample start: 2018-11-10 11:29:56

Creating pool [ContainersPool]...
No handlers could be found for logger "msrest.pipeline.requests"
Creating job [ContainersJob]...
Adding tasks to job [ContainersJob]...
Monitoring all tasks for 'Completed' state, timeout in 0:30:00..................................................................................................................................................................
  Success! All tasks reached the 'Completed' state within the specified timeout period.
Printing task output...
Task: ContainersTask
Standard output:

Standard error:
usage: is_prime.py [-h] number
is_prime.py: error: argument number: invalid int value: 'o'


Sample end: 2018-11-10 11:33:10
Elapsed time: 0:03:14

Delete job? [Y/n] y
yDelete pool? [Y/n] y

Press ENTER to exit...

Remember to delete the resources at the end so that they don't keep incurring costs.

References

batch-python-quickstart
Run container applications on Azure Batch


Kubernetes Pipeline

Let's go over an easy way to build a continuous integration (CI) pipeline on Minikube.

Launch Minikube

If you don't have Minikube running on your system, start it:

$ minikube start --memory 4000 --cpus 2

Wait for a few minutes and you'll see something like:

Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Loading cached images from config file.

Installing Helm

Helm is The Kubernetes Package Manager; it helps you deploy services into Kubernetes.

$ wget https://storage.googleapis.com/kubernetes-helm/helm-v2.10.0-linux-amd64.tar.gz -O helm.tar.gz
$ tar zxf helm.tar.gz
$ sudo cp linux-amd64/helm /usr/local/bin/helm
$ sudo chmod +x /usr/local/bin/helm

Applying the RBAC policy

$ kubectl create -f https://raw.githubusercontent.com/nordri/kubernetes-experiments/master/Pipeline/ServiceAccount.yaml

and then initialize Helm:

helm init --service-account tiller

Checking

$ helm version
Client: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}

Deploying Jenkins

I’m using a custom values file for this chart. What I’m adjusting is:

AdminPassword: set to admin1234
ServiceType: set to NodePort (because this is Minikube)
In plugins:
– kubernetes:1.2
– workflow-aggregator:2.5
– workflow-job:2.17
– credentials-binding:1.15
– git:3.7.0

And then the deployment:

$ helm install --name jenkins -f jenkins-helm-values.yaml stable/jenkins

After a few minutes we'll be able to access Jenkins with:

$ minikube service jenkins

Configuring Jenkins

First, set up the credentials to access Docker Hub, where we'll push the Docker images. The only field you must keep is ID, because it's needed by the pipeline in a later step. Fill it in with your information:

Back on the Jenkins main screen, add a new item of type Pipeline.

And finally, configure the pipeline in the Pipeline section:

Save the changes and click on Build now.

And that’s it!

The pipeline

Let’s deep into the pipeline

The head

The pipeline starts by setting the worker id, so the pod has a different label on each execution.

Then comes the pod definition, where we define the containers that will run inside the pod. For this example we'll need:

  1. maven
  2. docker
  3. mysql, this one with environment variables
  4. java, also with environment variables

Then the volumes: we need the Docker socket in order to run Docker in Docker, and a folder to keep the artefacts downloaded from the Internet (it's a Maven project!) between executions, saving time and bandwidth.

Cloning the repo…

What we do here is clean the workspace and clone the repository. It's a Spring Boot application with MySQL.

Building…

We build the package using the Maven container.

Testing…

In this stage we launch our app inside the Java container and, after 30 seconds, we check whether it is online: a simple smoke test. We save the return value in RES to decide if it's OK or not; if not, we finish with a failure. As we defined all the containers at the beginning, there's a MySQL instance running inside the pod.
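
The check itself can be as simple as hitting the application with curl and looking at the HTTP status code; a sketch of that shape (endpoint and port are illustrative):

RES=$(curl --silent --output /dev/null --write-out "%{http_code}" http://localhost:8080/ || true)
if [ "${RES}" != "200" ]; then
  exit 1
fi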

Building & Uploading Docker images…

If the testing stage went OK, we can push the image to Docker Hub. To set the tag we use the commit ID cut to eight characters. To log into Docker Hub we use withCredentials, which takes a credential by its id and fills the environment variables.
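
The tagging step boils down to something like this (image and repository names are illustrative):

TAG=$(git rev-parse --short=8 HEAD)
docker build -t myuser/myapp:${TAG} .
docker push myuser/myapp:${TAG}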

References

Set Up a Jenkins CI/CD Pipeline with Kubernetes

Repository

GitHub
