Deploying Ceph with Rook

Deploying Rook with microk8s and lxd

This document describes how to get a Ceph cluster up and running with a stack consisting of Rook (for Ceph deployment), microk8s (for the Kubernetes cluster) and lxd (for container management).

Let’s start by installing both microk8s and lxd:

sudo snap install lxd
sudo lxd init
sudo snap install microk8s --classic

For the microk8s snap, most of the commands are of the form: ‘microk8s.kubectl …’; as such, we’re going to add an alias to make things shorter:

alias kctl=microk8s.kubectl

We can also make most commands not require root access with the following (replace ‘ubuntu’ with your username):

sudo usermod -a -G microk8s ubuntu
sudo chown -f -R ubuntu ~/.kube
newgrp microk8s

A couple of configuration changes are needed on the microk8s side before getting started. These include enabling a few addons:

microk8s enable ha-cluster
microk8s enable dns
microk8s enable rbac

In addition, we need to look at the file /var/snap/microk8s/current/args/kube-apiserver and add the line --allow-privileged=true if it is not already present, then restart microk8s (microk8s.stop followed by microk8s.start) so that the change takes effect.
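
The check-and-add step can also be scripted. What follows is only a minimal sketch, assuming GNU sed and an args file that holds one option per line; the function name ensure_allow_privileged is a hypothetical helper, not something microk8s ships.

```shell
# Hypothetical helper: make sure an args file contains --allow-privileged=true.
# Usage: ensure_allow_privileged /path/to/args/file
ensure_allow_privileged() {
    local args_file="$1"
    if grep -q -- '^--allow-privileged=' "$args_file"; then
        # The option exists already: force its value to true.
        sed -i 's|^--allow-privileged=.*|--allow-privileged=true|' "$args_file"
    else
        # The option is missing: append it on its own line.
        printf '%s\n' '--allow-privileged=true' >> "$args_file"
    fi
}
```

It would be called from a root shell with /var/snap/microk8s/current/args/kube-apiserver as its argument, followed by a restart of microk8s.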

Troubleshooting microk8s

Getting microk8s to work properly can be tricky. Following the above steps may not be enough, depending on your environment. To check that microk8s is running correctly, we can use:

microk8s.status

We should see the message “microk8s is running”, alongside the list of enabled addons. If not, we can run microk8s.inspect and check for any obvious errors. A fairly frequent one is the memory cgroup not being enabled. To solve this, we have to edit the /etc/default/grub file and set the line that starts with GRUB_CMDLINE_LINUX= to:

GRUB_CMDLINE_LINUX="cgroup_enable=memory cgroup_memory=1 systemd.unified_cgroup_hierarchy=0"

Afterwards, we run

sudo update-grub

and then reboot the machine.
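
For those who prefer not to edit the grub file by hand, the change can be sketched as a small function. This assumes GNU sed; set_cgroup_cmdline is a hypothetical name, and the sketch overwrites whatever options were already on the GRUB_CMDLINE_LINUX line, so existing options would have to be merged in manually.

```shell
# Hypothetical helper: set GRUB_CMDLINE_LINUX in a grub defaults file.
# Note: this overwrites any options already present on that line.
set_cgroup_cmdline() {
    local grub_file="$1"
    local wanted='GRUB_CMDLINE_LINUX="cgroup_enable=memory cgroup_memory=1 systemd.unified_cgroup_hierarchy=0"'
    if grep -q '^GRUB_CMDLINE_LINUX=' "$grub_file"; then
        # Replace the existing line (GRUB_CMDLINE_LINUX_DEFAULT is not matched).
        sed -i "s|^GRUB_CMDLINE_LINUX=.*|${wanted}|" "$grub_file"
    else
        # No such line yet: append it.
        printf '%s\n' "$wanted" >> "$grub_file"
    fi
}
```

Trying it on a copy of /etc/default/grub first is a cheap way to confirm the result before touching the real file.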

If issues persist, a combination of microk8s.stop and microk8s.start can help. Refreshing the snap to a previous version is also a possibility.

Running a microk8s cluster on lxd

In order to create a cluster with lxd containers, we first need to set up a profile:

lxc profile create microk8s

And then edit the profile accordingly via the following:

wget https://raw.githubusercontent.com/ubuntu/microk8s/master/tests/lxc/microk8s.profile -O microk8s.profile
cat microk8s.profile | lxc profile edit microk8s
rm microk8s.profile

Now we’ll launch an lxd container:

lxc launch -p default -p microk8s ubuntu:22.04 rook-node1

On it, we’ll install microk8s much like we did on the host.
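
The installation inside the container can be driven from the host. This is a sketch only: install_microk8s_in_container is a hypothetical helper name, and it assumes snapd inside the freshly launched container has finished starting up.

```shell
# Hypothetical helper: install the microk8s snap inside an lxd container.
# lxc exec runs the command as root in the container, so no sudo is needed.
install_microk8s_in_container() {
    local container="$1"
    lxc exec "$container" -- snap install microk8s --classic
}
```

For the container launched above, the call would be install_microk8s_in_container rook-node1. If the host was installed from a specific snap channel, passing the same channel here helps keep the versions aligned.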

With all that set up, we go back to the host and run the following:

microk8s.add-node

This will yield something like the following:

From the node you wish to join to this cluster, run the following:

microk8s join 10.245.160.115:25000/507ba48fae3d5047fbbaba30be1fefd7/b88e57485a71

On the lxd container, we thus run the above command. If all goes well, running the following on the host:

kctl get nodes

Should output something like:

NAME         STATUS   ROLES    AGE   VERSION
rook-node1   Ready    <none>   45m   v1.23.13-2+ce91f4e8e36dcb
lmlg         Ready    <none>   53m   v1.23.13-2+ce91f4e8e36dcb
rook-node2   Ready    <none>   13s   v1.23.13-2+ce91f4e8e36dcb

If the above command only lists the host node, a common cause is that the nodes run different (incompatible) versions of microk8s. Running the same version everywhere, as in this example, makes things less error-prone.
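
A quick way to spot a version mismatch among the nodes that did join is to count the distinct values in the VERSION column. The function below is a hypothetical helper that only parses text; it assumes the column layout shown in the sample output.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# how many distinct values appear in the last (VERSION) column.
count_node_versions() {
    awk 'NR > 1 && NF > 0 { print $NF }' | sort -u | wc -l | tr -d ' '
}
```

Running microk8s.kubectl get nodes | count_node_versions should print 1 when every listed node reports the same version. A node that failed to join will not be listed at all, so for that case comparing the output of snap list microk8s on each machine is the more direct check.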

It should be noted that if the nodes are remote, or not readily accessible, we’ll most likely need to edit the /etc/hosts file or fiddle with DNS settings so that the whole cluster is reachable.
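
If the /etc/hosts route is taken, adding entries can be made repeatable with something like the following. This is a sketch: add_host_entry is a hypothetical helper, and the address used in the usage note is a documentation-only placeholder.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is
# already listed as a whole word on a non-comment line.
add_host_entry() {
    local hosts_file="$1" ip="$2" name="$3"
    if ! grep -v '^[[:space:]]*#' "$hosts_file" | grep -qwF -- "$name"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}
```

From a root shell, add_host_entry /etc/hosts 192.0.2.10 rook-node1 would add the entry once and do nothing on later runs; the real address of each node has to be substituted.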

(Optional) Using VMs with microk8s

If we’re running on a machine with enough resources to spare, we can dispense with lxd and instead use virtual machines to power our cluster. With multipass, it’s as simple as doing:

multipass launch --cpus 4 --mem 12G --disk 60G --name rook-node1 daily:22.04

And installing microk8s and joining the cluster as described above.

It should be pointed out that multipass typically launches VMs with a single block device, so if we want to use them as hosts for OSDs, we need to be able to add devices dynamically. Doing so is a somewhat lengthy process, so here’s a step-by-step guide:

First, we must make sure that libvirt is being used as the backend for multipass. To do so, we must install libvirt in our system, then run the following commands:

sudo snap connect multipass:libvirt
multipass stop --all
multipass set local.driver=libvirt

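Next, we need to create the needed pool and volumes. In the commands below, $my-pool-name, $my-pool-path and $my-vol-name are placeholders to be replaced by hand with real values (they are not valid shell variable names, since those cannot contain hyphens); adjust the capacity as needed: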

sudo virsh pool-define-as $my-pool-name --type=dir "--target=$my-pool-path"
sudo virsh pool-start $my-pool-name
sudo virsh vol-create-as --pool=$my-pool-name --name=$my-vol-name --capacity=35GB --format=raw

Finally, we attach the volume to the VM, replacing the name as needed:

sudo virsh attach-device --live $vm-name /dev/stdin <<EOF
<disk type='volume' device='disk'>
  <driver name='qemu' type='raw'/>
  <source pool='$my-pool-name' volume='$my-vol-name'/>
  <serial>myserial</serial>
  <target dev='vde'/>
</disk>
EOF

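Afterwards, the volume should show up inside the VM as a new block device, such as “/dev/vdc”, with the specified size. The name may differ from the ‘vde’ target given above, since libvirt treats the target as an ordering hint and the guest assigns the final device name.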
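
When several volumes have to be attached, generating the XML from arguments avoids editing the placeholders each time. The function below is a hypothetical sketch; it leaves out the optional serial element, and the pool, volume and VM names in the usage note are made up for illustration.

```shell
# Hypothetical helper: print libvirt disk XML for a raw volume in a pool.
# Arguments: pool name, volume name, target device hint (e.g. vde).
disk_xml() {
    local pool="$1" vol="$2" target="$3"
    cat <<EOF
<disk type='volume' device='disk'>
  <driver name='qemu' type='raw'/>
  <source pool='${pool}' volume='${vol}'/>
  <target dev='${target}'/>
</disk>
EOF
}
```

A possible invocation would be disk_xml osd-pool osd-vol1 vde | sudo virsh attach-device --live rook-node1 /dev/stdin, mirroring the heredoc form used above.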

Deploying Ceph with Rook

With our cluster in place, we’re ready to deploy Ceph. We’re going to check out the Rook repo and use their example files:

git clone --single-branch --branch v1.9.2 https://github.com/rook/rook.git
cd rook/deploy/examples

Before proceeding with the deployment, there are some changes we need to apply to the templates:
In the file operator.yaml, the line # ROOK_CSI_KUBELET_DIR_PATH: "/var/lib/kubelet" must be uncommented and edited. It has to point to the kubelet directory for microk8s, which should be /var/snap/microk8s/common/var/lib/kubelet.
If our cluster has fewer than 3 nodes, we also have to edit the cluster.yaml file. Specifically, we have to look for the option allowMultiplePerNode and set it to true.
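
Both template edits can be applied with sed. The following is a sketch under a few assumptions: GNU sed, the commented-out line appearing in operator.yaml in the form quoted above, and cluster.yaml spelling the option as allowMultiplePerNode: false. The function name patch_rook_examples is hypothetical.

```shell
# Hypothetical helper: apply the two template edits to the example files.
# Arguments: path to operator.yaml, path to cluster.yaml.
patch_rook_examples() {
    local operator_yaml="$1" cluster_yaml="$2"
    local kubelet_dir='/var/snap/microk8s/common/var/lib/kubelet'
    # Uncomment ROOK_CSI_KUBELET_DIR_PATH, keep its indentation, set the path.
    sed -i "s|^\( *\)# *ROOK_CSI_KUBELET_DIR_PATH:.*|\1ROOK_CSI_KUBELET_DIR_PATH: \"${kubelet_dir}\"|" "$operator_yaml"
    # Flip every "allowMultiplePerNode: false" to true
    # (only wanted for clusters with fewer than 3 nodes).
    sed -i 's|allowMultiplePerNode: false|allowMultiplePerNode: true|' "$cluster_yaml"
}
```

From rook/deploy/examples this would be run as patch_rook_examples operator.yaml cluster.yaml; a git diff afterwards shows exactly what changed.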

With those changes done, we proceed:

kctl create -f crds.yaml -f common.yaml -f operator.yaml

At this point, we need to wait until the operator pod is running before proceeding, i.e. until the command kctl get pods -n rook-ceph outputs something like:

NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-799cd7d684-mbnbq   1/1     Running   0          1m

Then we’re ready to move on. The next step involves deploying the cluster itself and waiting until all components are settled. To do this, we’ll issue the command kctl create -f cluster.yaml.

If all went well, we should eventually see something like the following:

NAME                                              READY   STATUS      RESTARTS   AGE
rook-ceph-operator-799cd7d684-mbnbq               1/1     Running     0          17m
csi-rbdplugin-8pz9g                               3/3     Running     0          15m
csi-cephfsplugin-2mbv2                            3/3     Running     0          15m
csi-cephfsplugin-provisioner-7577bb4d59-sqs8d     6/6     Running     0          15m
csi-rbdplugin-provisioner-847b498845-x6nnn        6/6     Running     0          15m
rook-ceph-mon-a-7884c5468c-s5s9f                  1/1     Running     0          14m
rook-ceph-mon-b-5fbf7659d5-xv987                  1/1     Running     0          13m
rook-ceph-mon-c-77d4cb8767-nq7gw                  1/1     Running     0          13m
rook-ceph-mgr-a-796d464bf9-9cm5t                  2/2     Running     0          12m
rook-ceph-mgr-b-5b9d46966c-88trq                  2/2     Running     0          12m
rook-ceph-crashcollector-eevee-58fb86b969-2rjz4   1/1     Running     0          11m
rook-ceph-osd-0-5fb584cb49-gvxh5                  1/1     Running     0          11m
rook-ceph-osd-1-bc69b64db-pxfvt                   1/1     Running     0          11m
rook-ceph-osd-2-qh81c75ec-oweys                   1/1     Running     0          11m
rook-ceph-osd-prepare-eevee-g62vn                 0/1     Completed   0          10m
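
Rather than eyeballing the list while waiting, the pods that have not settled yet can be filtered out. The function below is a hypothetical text filter that assumes the five-column layout shown above; it looks only at the STATUS column, not at the READY counts.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name of every pod whose STATUS is neither Running nor Completed.
list_unsettled_pods() {
    awk 'NR > 1 && NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1 }'
}
```

Running microk8s.kubectl get pods -n rook-ceph | list_unsettled_pods should print nothing once every component has come up.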

For verification, the Rook toolbox can be useful:

microk8s.kubectl create -f toolbox.yaml

# wait, then:

microk8s.kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
bash-4.4$ ceph -s
...
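
The same check can be done without an interactive shell, which is handy in scripts. This is a sketch: ceph_is_healthy is a hypothetical helper, and it assumes the toolbox deployment is named rook-ceph-tools in the rook-ceph namespace, as in the commands above.

```shell
# Hypothetical helper: run `ceph health` through the toolbox deployment and
# succeed only if the reported health starts with HEALTH_OK.
ceph_is_healthy() {
    local health
    health=$(microk8s kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health) || return 1
    case "$health" in
        HEALTH_OK*) return 0 ;;
        # Anything else is reported on stderr and treated as a failure.
        *) printf '%s\n' "$health" >&2; return 1 ;;
    esac
}
```

A freshly deployed cluster can report a warning state for a while, so a failing result right after deployment is not necessarily a problem.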

In order to run custom images with Rook, we can edit the rook-ceph-operator-config ConfigMap via the following command, where $ROOK_OPERATOR_NAMESPACE is the namespace the operator runs in (rook-ceph in this setup):

kubectl -n $ROOK_OPERATOR_NAMESPACE edit configmap rook-ceph-operator-config

In particular, we want to edit the following variables:

ROOK_CSI_CEPH_IMAGE
ROOK_CSI_REGISTRAR_IMAGE
ROOK_CSI_PROVISIONER_IMAGE
ROOK_CSI_ATTACHER_IMAGE
ROOK_CSI_RESIZER_IMAGE
ROOK_CSI_SNAPSHOTTER_IMAGE
ROOK_CSIADDONS_IMAGE

Note that the first one is the truly important one; the others are mostly used by add-ons. By modifying these variables, we can point them at a local Docker registry hosting an Ubuntu-based image.
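
As a non-interactive alternative to the edit command, the ConfigMap can be changed with kubectl patch. The sketch below only builds the merge-patch body; csi_image_patch is a hypothetical helper and the image reference in the usage note is a made-up example.

```shell
# Hypothetical helper: build a merge-patch body that sets ROOK_CSI_CEPH_IMAGE.
# The image reference is inserted as-is; it must not contain double quotes.
csi_image_patch() {
    printf '{"data":{"ROOK_CSI_CEPH_IMAGE":"%s"}}\n' "$1"
}
```

It could then be used as kubectl -n rook-ceph patch configmap rook-ceph-operator-config --type merge -p "$(csi_image_patch registry.example.com/cephcsi:custom)". The other variables listed above can be set the same way by extending the data object.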

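To deploy the cluster with a custom image from the start, rather than migrating a running cluster, you can update the cluster.yaml file to point at the desired image, where $img holds the image reference:

sed -i "s|image: rook/ceph:.*|image: $img|g" cluster.yaml

The pattern must match the image line that is actually present in your copy of cluster.yaml, so check that line first and adjust the pattern if it uses a different repository prefix.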

To change the image of a running cluster, kubectl patch works as well:

kubectl patch CephCluster -n rook-ceph my-cluster --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"some/image:latest\"}}}"