How to build a Ceph backed Kubernetes cluster

Overview

In this tutorial, you will learn how to deploy a three-node Charmed Kubernetes cluster that uses Ceph storage. We will use Juju and MAAS to deploy our cluster.

What is Kubernetes?

Kubernetes clusters host containerised applications in a reliable and scalable way.
Designed with DevOps in mind, Kubernetes makes maintenance tasks such as upgrades and security patching simple.

What is Ceph?

Ceph is a software-defined storage solution that addresses the object, block, and file storage needs of modern data centres. It provides scalable, enterprise-grade storage while keeping CAPEX and OPEX in line with the cost of commodity disks.

What you’ll learn

  • How to deploy Charmed Kubernetes and Ceph with Juju
  • How to create Ceph pools to be used with Kubernetes with Juju
  • How to create PersistentVolumeClaims that use Ceph StorageClasses

What you’ll need

  • 3 nodes with at least 2 disks and 1 network interface
  • Access to a MAAS environment set up with the 3 nodes in the ‘Ready’ state
  • A Juju controller set up to use the above MAAS cloud
  • The kubectl client installed
  • The bundle.yaml saved to a file

Edit bundle.yaml to contain the correct OSD devices

Duration: 2:00

Before deploying our bundle.yaml, we must ensure that the ceph-osd charm is configured to use the correct OSD devices.

  ceph-osd:
    charm: cs:ceph-osd
    num_units: 3
    options:
      osd-devices: /dev/sdb /dev/sdc
      source: distro
    bindings:
      "": oam-space
    to:
    - 1001
    - 1002
    - 1003

Notice that the osd-devices configuration above matches the Available disks and partitions section of the node in the image below:
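If you need to confirm which disks a node actually exposes, or correct the device list after deployment, something like the following can help (device names vary per machine, so check before reusing these paths):

```shell
# List block devices on the first OSD node (names vary per machine)
juju ssh ceph-osd/0 -- lsblk

# Update the OSD device list after deployment if it was wrong
juju config ceph-osd osd-devices="/dev/sdb /dev/sdc"
```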

Deploy the bundle.yaml

Duration: 10:00

Deploy the bundle with:
$ juju deploy ./bundle.yaml

A successful deployment should look similar to the following juju status output:

$ juju status
Model  Controller            Cloud/Region          Version  SLA          Timestamp
k8s    orangebox100-default  OrangeBox100/default  2.8.6    unsupported  18:22:29-08:00

App                    Version  Status  Scale  Charm                  Store       Rev  OS      Notes
ceph-mon               15.2.7   active      3  ceph-mon               jujucharms   51  ubuntu  
ceph-osd               15.2.7   active      3  ceph-osd               jujucharms  306  ubuntu  
containerd             1.3.3    active      5  containerd             jujucharms   97  ubuntu  
easyrsa                3.0.1    active      1  easyrsa                jujucharms  339  ubuntu  
etcd                   3.4.5    active      3  etcd                   jujucharms  544  ubuntu  
flannel                0.11.0   active      5  flannel                jujucharms  513  ubuntu  
kubeapi-load-balancer  1.18.0   active      1  kubeapi-load-balancer  jujucharms  753  ubuntu  exposed
kubernetes-master      1.19.6   active      2  kubernetes-master      jujucharms  912  ubuntu  
kubernetes-worker      1.19.6   active      3  kubernetes-worker      jujucharms  713  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
ceph-mon/0                active    idle   0/lxd/0  172.27.100.168                  Unit is ready and clustered
ceph-mon/1*               active    idle   1/lxd/0  172.27.100.165                  Unit is ready and clustered
ceph-mon/2                active    idle   2/lxd/0  172.27.100.172                  Unit is ready and clustered
ceph-osd/0*               active    idle   0        172.27.100.107                  Unit is ready (2 OSD)
ceph-osd/1                active    idle   1        172.27.100.110                  Unit is ready (2 OSD)
ceph-osd/2                active    idle   2        172.27.100.105                  Unit is ready (2 OSD)
easyrsa/0*                active    idle   0/lxd/1  172.27.100.174                  Certificate Authority connected.
etcd/0                    active    idle   0/lxd/2  172.27.100.170  2379/tcp        Healthy with 3 known peers
etcd/1*                   active    idle   1/lxd/1  172.27.100.166  2379/tcp        Healthy with 3 known peers
etcd/2                    active    idle   2/lxd/1  172.27.100.173  2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle   0/lxd/3  172.27.100.169  443/tcp         Loadbalancer ready.
kubernetes-master/0*      active    idle   1/lxd/2  172.27.100.167  6443/tcp        Kubernetes master running.
  containerd/4            active    idle            172.27.100.167                  Container runtime available
  flannel/4               active    idle            172.27.100.167                  Flannel subnet 10.1.83.1/24
kubernetes-master/1       active    idle   2/lxd/2  172.27.100.171  6443/tcp        Kubernetes master running.
  containerd/3            active    idle            172.27.100.171                  Container runtime available
  flannel/3               active    idle            172.27.100.171                  Flannel subnet 10.1.35.1/24
kubernetes-worker/0*      active    idle   0        172.27.100.107  80/tcp,443/tcp  Kubernetes worker running.
  containerd/1            active    idle            172.27.100.107                  Container runtime available
  flannel/1               active    idle            172.27.100.107                  Flannel subnet 10.1.86.1/24
kubernetes-worker/1       active    idle   1        172.27.100.110  80/tcp,443/tcp  Kubernetes worker running.
  containerd/0*           active    idle            172.27.100.110                  Container runtime available
  flannel/0*              active    idle            172.27.100.110                  Flannel subnet 10.1.27.1/24
kubernetes-worker/2       active    idle   2        172.27.100.105  80/tcp,443/tcp  Kubernetes worker running.
  containerd/2            active    idle            172.27.100.105                  Container runtime available
  flannel/2               active    idle            172.27.100.105                  Flannel subnet 10.1.88.1/24

Machine  State    DNS             Inst id              Series  AZ       Message
0        started  172.27.100.107  node05ob100          focal   default  Deployed
0/lxd/0  started  172.27.100.168  juju-1be73e-0-lxd-0  focal   default  Container started
0/lxd/1  started  172.27.100.174  juju-1be73e-0-lxd-1  focal   default  Container started
0/lxd/2  started  172.27.100.170  juju-1be73e-0-lxd-2  focal   default  Container started
0/lxd/3  started  172.27.100.169  juju-1be73e-0-lxd-3  focal   default  Container started
1        started  172.27.100.110  node07ob100          focal   default  Deployed
1/lxd/0  started  172.27.100.165  juju-1be73e-1-lxd-0  focal   default  Container started
1/lxd/1  started  172.27.100.166  juju-1be73e-1-lxd-1  focal   default  Container started
1/lxd/2  started  172.27.100.167  juju-1be73e-1-lxd-2  focal   default  Container started
2        started  172.27.100.105  node06ob100          focal   default  Deployed
2/lxd/0  started  172.27.100.172  juju-1be73e-2-lxd-0  focal   default  Container started
2/lxd/1  started  172.27.100.173  juju-1be73e-2-lxd-1  focal   default  Container started
2/lxd/2  started  172.27.100.171  juju-1be73e-2-lxd-2  focal   default  Container started

The deployment should reach the above state in about 10 minutes, depending on your hardware.
Congrats, we have a Kubernetes cluster up and running at this point!
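While the deployment settles, a convenient way to follow progress is to poll juju status continuously:

```shell
# Refresh juju status every two seconds, preserving colour output
watch -c juju status --color
```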

Verify that Ceph StorageClasses were created

Duration: 2:00

Copy the kubeconfig file from a kubernetes-master node

$ mkdir -p ~/.kube
$ juju scp kubernetes-master/0:config ~/.kube/config
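With the kubeconfig in place, it is worth confirming that kubectl can reach the cluster before going further (node names and addresses will differ in your environment):

```shell
# Confirm API connectivity and that all nodes are Ready
kubectl cluster-info
kubectl get nodes
```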

Read Kubernetes StorageClasses

$ kubectl get sc
NAME                 PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ext4-pool            rbd.csi.ceph.com   Delete          Immediate           true                   5d12h
xfs-pool (default)   rbd.csi.ceph.com   Delete          Immediate           true                   5d12h

Great, our StorageClasses were set up as expected! Next, we need to create Ceph pools matching these StorageClasses so that we can use them with our Kubernetes workloads.
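To see which Ceph pool a StorageClass expects, inspect its parameters; in this deployment the pool name is expected to match the StorageClass name (the exact parameter layout may differ between CSI driver versions):

```shell
# Show the parameters of the default StorageClass, including the pool it maps to
kubectl describe sc xfs-pool
kubectl get sc xfs-pool -o yaml
```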

Create Ceph pools

Duration: 5:00

List Ceph pools

$ juju run-action --wait ceph-mon/leader list-pools
unit-ceph-mon-1:
  UnitId: ceph-mon/1
  id: "2"
  results:
    message: |
      1 device_health_metrics
  status: completed
  timing:
    completed: 2020-12-29 19:08:31 +0000 UTC
    enqueued: 2020-12-29 19:08:30 +0000 UTC
    started: 2020-12-29 19:08:30 +0000 UTC

Create the Ceph xfs-pool

$ juju run-action --wait ceph-mon/leader create-pool name=xfs-pool
unit-ceph-mon-1:
  UnitId: ceph-mon/1
  id: "5"
  results:
    Stderr: |
      pool 'xfs-pool' created
      set pool 2 size to 3
      set pool 2 target_size_ratio to 0.1
      enabled application 'unknown' on pool 'xfs-pool'
  status: completed
  timing:
    completed: 2020-12-29 19:42:26 +0000 UTC
    enqueued: 2020-12-29 19:42:19 +0000 UTC
    started: 2020-12-29 19:42:19 +0000 UTC

List the Ceph pools again to verify that the new pool was created:

$ juju run-action --wait ceph-mon/leader list-pools
unit-ceph-mon-1:
  UnitId: ceph-mon/1
  id: "9"
  results:
    message: |
      1 device_health_metrics
      2 xfs-pool
  status: completed
  timing:
    completed: 2020-12-29 19:50:14 +0000 UTC
    enqueued: 2020-12-29 19:50:13 +0000 UTC
    started: 2020-12-29 19:50:13 +0000 UTC 

Congratulations, we have created our new Ceph pool and are now ready to use it with Kubernetes!
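If you want to inspect the pool's replication size and utilisation directly, you can run Ceph's own tooling from a monitor unit (a sketch; output format varies by Ceph release):

```shell
# Inspect pool settings and utilisation from the ceph-mon leader
juju ssh ceph-mon/leader -- sudo ceph osd pool ls detail
juju ssh ceph-mon/leader -- sudo ceph df
```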

Verify Ceph backed PersistentVolumeClaim functionality

Duration: 5:00

Create a PersistentVolumeClaim

Use the following claim.json file to create a PersistentVolumeClaim:

$ cat claim.json
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
     "name": "myvol"
  },
  "spec": {
      "accessModes": [
          "ReadWriteOnce"
      ],
      "resources": {
          "requests": {
             "storage": "4Gi"
          }
      },
      "storageClassName": "ceph-xfs"
   }
}
$ kubectl apply -f claim.json

Check the status of the PersistentVolumeClaim

$ kubectl get pvc
NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myvol   Bound    pvc-8987ad38-3888-4dd8-94c1-39868792c37e   4Gi        RWO            xfs-pool       35m
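Behind the scenes, the CSI provisioner created a PersistentVolume to satisfy the claim. You can look it up via the claim's volumeName (the PV name is generated, so it will differ in your cluster):

```shell
# Find the PV that backs the claim and inspect it
PV=$(kubectl get pvc myvol -o jsonpath='{.spec.volumeName}')
kubectl describe pv "$PV"
```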

Create a ReplicationController that uses the Ceph backed PVC

Use the following pod.yaml file to create a ReplicationController:

$ cat pod.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: server
spec:
  replicas: 1
  selector:
      role: server
  template:
      metadata:
        labels:
          role: server
      spec:
        containers:
        - name: server
          image: nginx
          volumeMounts:
            - mountPath: /var/lib/www/html
              name: myvol
        volumes:
          - name: myvol
            persistentVolumeClaim:
              claimName: myvol
$ kubectl apply -f pod.yaml
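ReplicationControllers still work, but upstream Kubernetes now recommends Deployments for this role. An equivalent Deployment manifest (an untested sketch mirroring the pod.yaml above) would look like this:

```yaml
# Hypothetical Deployment equivalent of the ReplicationController above
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
spec:
  replicas: 1
  selector:
    matchLabels:
      role: server
  template:
    metadata:
      labels:
        role: server
    spec:
      containers:
      - name: server
        image: nginx
        volumeMounts:
        - mountPath: /var/lib/www/html
          name: myvol
      volumes:
      - name: myvol
        persistentVolumeClaim:
          claimName: myvol
```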

Check the status of the ReplicationController pod

$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
csi-rbdplugin-hsnjl                          3/3     Running   0          6d9h
csi-rbdplugin-md8zd                          3/3     Running   0          6d9h
csi-rbdplugin-nhc6t                          3/3     Running   0          6d9h
csi-rbdplugin-provisioner-549c6b54c6-2ts2x   6/6     Running   0          6d9h
csi-rbdplugin-provisioner-549c6b54c6-8f7v9   6/6     Running   0          6d9h
csi-rbdplugin-provisioner-549c6b54c6-l59nr   6/6     Running   1          6d9h
server-48g2s                                 1/1     Running   0          39m
$ kubectl describe pod server-48g2s 
Name:         server-48g2s
Namespace:    default
Priority:     0
Node:         node06ob100/172.27.100.105
...
Volumes:
  myvol:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  myvol
    ReadOnly:   false
  default-token-bptwd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bptwd
    Optional:    false
...
Events:
  Type    Reason                  Age   From                     Message
  ----    ------                  ----  ----                     -------
  Normal  Scheduled               41m   default-scheduler        Successfully assigned default/server-48g2s to node06ob100
  Normal  SuccessfulAttachVolume  41m   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8987ad38-3888-4dd8-94c1-39868792c37e"
  Normal  Pulling                 41m   kubelet                  Pulling image "nginx"
  Normal  Pulled                  41m   kubelet                  Successfully pulled image "nginx" in 7.936656052s
  Normal  Created                 41m   kubelet                  Created container server
  Normal  Started                 41m   kubelet                  Started container server

Log in to the container and check that the volume is mounted

$ kubectl exec -it server-48g2s -- bash
root@server-48g2s:/# 
root@server-48g2s:/# df -h
Filesystem      Size  Used Avail Use% Mounted on
...
/dev/rbd0       4.0G   33M  4.0G   1% /var/lib/www/html
root@server-48g2s:/# exit

Now our pod has an RBD mount!

Cleanup the ReplicationController

$ kubectl delete replicationcontrollers/server
replicationcontroller "server" deleted
$ kubectl get replicationcontrollers
No resources found.

:warning: To delete the myvol PVC, the server ReplicationController must be deleted first!
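With the ReplicationController gone, the claim (and, given the Delete reclaim policy, its backing RBD volume) can be removed as well:

```shell
# Delete the claim; the Delete reclaim policy removes the backing volume too
kubectl delete pvc myvol
```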

Wrap-up

Congratulations, you now have a highly available, multi-node Kubernetes cluster with Ceph-backed storage to orchestrate your containers.

ⓘ To test your understanding of this tutorial, complete the following steps by yourself:

  1. Create the ext4-pool
  2. Create a PersistentVolumeClaim that is backed by the ext4-pool
  3. Create a ReplicationController that uses the ext4-pool backed PVC