How to replicate charmed ceph resources using charm microceph

This guide demonstrates how to deploy the microceph charm alongside a charmed ceph cluster and use microceph to orchestrate replication for cephfs workloads. The tutorial does not cover RBD replication; however, the same setup can be used to replicate RBD resources as well.

Prerequisites

  • A charmed ceph cluster in a juju model called primary
  • Another juju model called secondary
  • LXD as the juju substrate

Primary Model

Let’s take a look at the primary charmed ceph cluster.

juju status --relations
Model    Controller  Cloud/Region           Version  SLA          Timestamp
primary  meicon      mystack/mystack        3.6.11   unsupported  07:36:46Z

App       Version  Status       Scale  Charm      Channel       Rev  Exposed  Message
ceph-mon  19.2.3   active           3  ceph-mon   squid/stable  372  no       Unit is ready and clustered
ceph-osd  19.2.3   active           3  ceph-osd   squid/stable  734  no       Unit is ready (1 OSD)

Unit         Workload     Agent  Machine  Public address  Ports  Message
ceph-mon/3   active       idle   7        10.149.4.169           Unit is ready and clustered
ceph-mon/4*  active       idle   8        10.149.4.146           Unit is ready and clustered
ceph-mon/5   active       idle   9        10.149.4.171           Unit is ready and clustered
ceph-osd/3   active       idle   10       10.149.4.226           Unit is ready (1 OSD)
ceph-osd/4   active       idle   11       10.149.4.168           Unit is ready (1 OSD)
ceph-osd/5*  active       idle   12       10.149.4.173           Unit is ready (1 OSD)

Machine  State    Address       Inst id                               Base          AZ                   Message
0        started  10.149.4.237  74e421b5-c40a-44f3-acd0-6ab361836e30  ubuntu@24.04  availability-zone-3  ACTIVE
1        started  10.149.4.169  0e76a1fc-571a-4d65-a216-997d70704f6a  ubuntu@24.04  availability-zone-1  ACTIVE
2        started  10.149.4.146  66530a31-2d10-4733-86b4-3b1a08f13085  ubuntu@24.04  availability-zone-3  ACTIVE
3        started  10.149.4.171  c2e02427-6771-48a0-aef5-d0c8812100a6  ubuntu@24.04  availability-zone-2  ACTIVE
10       started  10.149.4.226  64c0c104-1ef6-464a-bd2c-9af081f68a74  ubuntu@24.04  availability-zone-2  ACTIVE
11       started  10.149.4.168  feb9ab80-e405-426c-a591-801c4761b800  ubuntu@24.04  availability-zone-1  ACTIVE
12       started  10.149.4.173  09f449cb-21a2-43f7-b272-773ca59740fa  ubuntu@24.04  availability-zone-3  ACTIVE

Integration provider  Requirer       Interface  Type     Message
ceph-mon:mon          ceph-mon:mon   ceph       peer
ceph-mon:osd          ceph-osd:mon   ceph-osd   regular

The charmed ceph cluster is primarily composed of the ceph-mon and ceph-osd applications, along with possible workload applications (like ceph-radosgw, ceph-fs, etc.).

Deploy Primary MicroCeph Cluster

Now that the charmed ceph cluster is active and ready, we can move ahead with integrating the microceph charm into it.

To do this, we will deploy the microceph charm in the primary model with the charm config wait-to-adopt set to true. This prevents the microceph charm from bootstrapping a new ceph cluster.

Note:
The snap channel is set to squid/edge for a change that has yet to land on squid/stable.

juju deploy microceph primary --channel latest/edge -n 1 --config "wait-to-adopt=true" --config "snap-channel=squid/edge"

Wait for juju to provision the machine and initialise the agent. The juju status output should look something like this:

juju status --relations
Model    Controller  Cloud/Region           Version  SLA          Timestamp
primary  meicon      mystack/mystack        3.6.11   unsupported  07:36:46Z

App       Version  Status       Scale  Charm      Channel       Rev  Exposed  Message
ceph-mon  19.2.3   active           3  ceph-mon   squid/stable  372  no       Unit is ready and clustered
ceph-osd  19.2.3   active           3  ceph-osd   squid/stable  734  no       Unit is ready (1 OSD)
primary            maintenance      1  microceph  latest/edge   221  no       (bootstrap) Service not bootstrapped

Unit         Workload     Agent  Machine  Public address  Ports  Message
ceph-mon/3   active       idle   7        10.149.4.169           Unit is ready and clustered
ceph-mon/4*  active       idle   8        10.149.4.146           Unit is ready and clustered
ceph-mon/5   active       idle   9        10.149.4.171           Unit is ready and clustered
ceph-osd/3   active       idle   10       10.149.4.226           Unit is ready (1 OSD)
ceph-osd/4   active       idle   11       10.149.4.168           Unit is ready (1 OSD)
ceph-osd/5*  active       idle   12       10.149.4.173           Unit is ready (1 OSD)
primary/0*   maintenance  idle   6        10.149.4.237           (bootstrap) Service not bootstrapped

Machine  State    Address       Inst id                               Base          AZ                   Message
6        started  10.149.4.237  74e421b5-c40a-44f3-acd0-6ab361836e30  ubuntu@24.04  availability-zone-3  ACTIVE
7        started  10.149.4.169  0e76a1fc-571a-4d65-a216-997d70704f6a  ubuntu@24.04  availability-zone-1  ACTIVE
8        started  10.149.4.146  66530a31-2d10-4733-86b4-3b1a08f13085  ubuntu@24.04  availability-zone-3  ACTIVE
9        started  10.149.4.171  c2e02427-6771-48a0-aef5-d0c8812100a6  ubuntu@24.04  availability-zone-2  ACTIVE
10       started  10.149.4.226  64c0c104-1ef6-464a-bd2c-9af081f68a74  ubuntu@24.04  availability-zone-2  ACTIVE
11       started  10.149.4.168  feb9ab80-e405-426c-a591-801c4761b800  ubuntu@24.04  availability-zone-1  ACTIVE
12       started  10.149.4.173  09f449cb-21a2-43f7-b272-773ca59740fa  ubuntu@24.04  availability-zone-3  ACTIVE

Integration provider  Requirer       Interface  Type     Message
ceph-mon:mon          ceph-mon:mon   ceph       peer
ceph-mon:osd          ceph-osd:mon   ceph-osd   regular
primary:peers         primary:peers  ceph-peer  peer

The microceph charm's status informs the operator that the service bootstrap is still pending.

Adopt the charmed ceph cluster using microceph

The microceph charm can consume the admin endpoint provided by ceph-mon; this allows microceph to bootstrap against an existing ceph cluster. Proceed with integrating the ceph-mon application with microceph.

juju integrate ceph-mon:admin primary:adopt-ceph

The microceph application will first go to the blocked/executing state, indicating that it is waiting for the remote ceph cluster data to arrive.

juju status --relations
Model    Controller  Cloud/Region           Version  SLA          Timestamp
primary  meicon      mystack/mystack        3.6.11   unsupported  07:36:46Z

App       Version  Status   Scale  Charm      Channel       Rev  Exposed  Message
ceph-mon  19.2.3   active       3  ceph-mon   squid/stable  372  no       Unit is ready and clustered
ceph-osd  19.2.3   active       3  ceph-osd   squid/stable  734  no       Unit is ready (1 OSD)
primary            blocked      1  microceph  latest/edge   221  no       (workload) Waiting for fsid(None), mon_hosts(None) and admin_key(False) from adopt-ceph relation

Unit         Workload  Agent      Machine  Public address  Ports  Message
ceph-mon/3   active    executing  7        10.149.4.169           Unit is ready and clustered
ceph-mon/4*  active    executing  8        10.149.4.146           Unit is ready and clustered
ceph-mon/5   active    executing  9        10.149.4.171           Unit is ready and clustered
ceph-osd/3   active    idle       10       10.149.4.226           Unit is ready (1 OSD)
ceph-osd/4   active    idle       11       10.149.4.168           Unit is ready (1 OSD)
ceph-osd/5*  active    idle       12       10.149.4.173           Unit is ready (1 OSD)
primary/0*   blocked   executing  6        10.149.4.237           (workload) Waiting for fsid(None), mon_hosts(None) and admin_key(False) from adopt-ceph relation

Machine  State    Address       Inst id                               Base          AZ                   Message
6        started  10.149.4.237  74e421b5-c40a-44f3-acd0-6ab361836e30  ubuntu@24.04  availability-zone-3  ACTIVE
7        started  10.149.4.169  0e76a1fc-571a-4d65-a216-997d70704f6a  ubuntu@24.04  availability-zone-1  ACTIVE
8        started  10.149.4.146  66530a31-2d10-4733-86b4-3b1a08f13085  ubuntu@24.04  availability-zone-3  ACTIVE
9        started  10.149.4.171  c2e02427-6771-48a0-aef5-d0c8812100a6  ubuntu@24.04  availability-zone-2  ACTIVE
10       started  10.149.4.226  64c0c104-1ef6-464a-bd2c-9af081f68a74  ubuntu@24.04  availability-zone-2  ACTIVE
11       started  10.149.4.168  feb9ab80-e405-426c-a591-801c4761b800  ubuntu@24.04  availability-zone-1  ACTIVE
12       started  10.149.4.173  09f449cb-21a2-43f7-b272-773ca59740fa  ubuntu@24.04  availability-zone-3  ACTIVE

Integration provider  Requirer            Interface   Type     Message
ceph-mon:admin        primary:adopt-ceph  ceph-admin  regular
ceph-mon:mon          ceph-mon:mon        ceph        peer
ceph-mon:osd          ceph-osd:mon        ceph-osd    regular
primary:peers         primary:peers       ceph-peer   peer

Eventually, it will become active once the microceph bootstrap is complete. MicroCeph DOES NOT spawn any new ceph services.

juju status --relations
Model    Controller  Cloud/Region           Version  SLA          Timestamp
primary  meicon      mystack/mystack        3.6.11   unsupported  07:36:46Z

App       Version  Status  Scale  Charm      Channel       Rev  Exposed  Message
ceph-mon  19.2.3   active      3  ceph-mon   squid/stable  372  no       Unit is ready and clustered
ceph-osd  19.2.3   active      3  ceph-osd   squid/stable  734  no       Unit is ready (1 OSD)
primary            active      1  microceph  latest/edge   221  no       (workload) charm is ready

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-mon/3   active    idle   7        10.149.4.169           Unit is ready and clustered
ceph-mon/4*  active    idle   8        10.149.4.146           Unit is ready and clustered
ceph-mon/5   active    idle   9        10.149.4.171           Unit is ready and clustered
ceph-osd/3   active    idle   10       10.149.4.226           Unit is ready (1 OSD)
ceph-osd/4   active    idle   11       10.149.4.168           Unit is ready (1 OSD)
ceph-osd/5*  active    idle   12       10.149.4.173           Unit is ready (1 OSD)
primary/0*   active    idle   13       10.149.4.218           (workload) charm is ready

Machine  State    Address       Inst id                               Base          AZ                   Message
7        started  10.149.4.169  0e76a1fc-571a-4d65-a216-997d70704f6a  ubuntu@24.04  availability-zone-1  ACTIVE
8        started  10.149.4.146  66530a31-2d10-4733-86b4-3b1a08f13085  ubuntu@24.04  availability-zone-3  ACTIVE
9        started  10.149.4.171  c2e02427-6771-48a0-aef5-d0c8812100a6  ubuntu@24.04  availability-zone-2  ACTIVE
10       started  10.149.4.226  64c0c104-1ef6-464a-bd2c-9af081f68a74  ubuntu@24.04  availability-zone-2  ACTIVE
11       started  10.149.4.168  feb9ab80-e405-426c-a591-801c4761b800  ubuntu@24.04  availability-zone-1  ACTIVE
12       started  10.149.4.173  09f449cb-21a2-43f7-b272-773ca59740fa  ubuntu@24.04  availability-zone-3  ACTIVE
13       started  10.149.4.218  0abe03ac-35c3-427f-ae06-a4b20bbc6c60  ubuntu@24.04  availability-zone-1  ACTIVE

Integration provider  Requirer            Interface   Type     Message
ceph-mon:admin        primary:adopt-ceph  ceph-admin  regular
ceph-mon:mon          ceph-mon:mon        ceph        peer
ceph-mon:osd          ceph-osd:mon        ceph-osd    regular
primary:peers         primary:peers       ceph-peer   peer

What just happened?

The microceph charm bootstrapped the underlying microceph cluster using the existing ceph cluster; this allows microceph to orchestrate services and replication for it. This DOES NOT migrate the existing ceph cluster; rather, the microceph cluster is independently scalable (more units can be added as needed) and works alongside the charmed ceph cluster.

Fetch ceph status from the microceph unit to validate that adoption was successful.

juju ssh primary/0 -- sudo ceph -s
  cluster:
    id:     5ffa37d4-d988-11f0-ac8f-fa163e3872c2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum juju-9e6b53-primary-8,juju-9e6b53-primary-9,juju-9e6b53-primary-7 (age 91m)
    mgr: juju-9e6b53-primary-9(active, since 90m), standbys: juju-9e6b53-primary-8, juju-9e6b53-primary-7
    osd: 3 osds: 3 up (since 90m), 3 in (since 90m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   81 MiB used, 15 GiB / 15 GiB avail
    pgs:     1 active+clean

and compare it with the ceph cluster status fetched from the ceph-mon application.

juju ssh ceph-mon/0 -- sudo ceph -s
  cluster:
    id:     5ffa37d4-d988-11f0-ac8f-fa163e3872c2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum juju-9e6b53-primary-8,juju-9e6b53-primary-9,juju-9e6b53-primary-7 (age 94m)
    mgr: juju-9e6b53-primary-9(active, since 94m), standbys: juju-9e6b53-primary-8, juju-9e6b53-primary-7
    osd: 3 osds: 3 up (since 94m), 3 in (since 94m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   81 MiB used, 15 GiB / 15 GiB avail
    pgs:     1 active+clean

CephFS

(optional) Enable mds daemon on primary microceph

Skip this step if your charmed ceph cluster already has the ceph-fs application. Otherwise, use microceph to enable the mds daemon for the primary ceph cluster (which microceph has now adopted).

juju ssh primary/0 -- sudo microceph enable mds
juju ssh primary/0 -- sudo microceph status
MicroCeph deployment summary:
- juju-9e6b53-primary-13 (10.149.4.218)
  Services: mds
  Disks: 0

Create FS resources

Let’s go ahead and create a cephfs volume and subvolumes.

Each CephFS volume needs at least one mds daemon to serve it. It is recommended to have at least one standby mds daemon for higher availability.

juju ssh primary/0 -- sudo ceph fs volume create media
    Volume created successfully (no MDS daemons created)

CephFS has an abstraction of independent directory trees called subvolumes, which can be grouped into subvolumegroups to apply common policies. A subvolume can exist independently, without a subvolumegroup.

juju ssh primary/0 -- sudo ceph fs subvolume create media photos
juju ssh primary/0 -- sudo ceph fs subvolume ls media
[
    {
        "name": "photos"
    }
]

(optional) Mount subvolume

Now that we have a subvolume resource, we can mount it on any machine with appropriate ceph client access (say, admin) to the cluster.

juju ssh primary/0
sudo ceph fs subvolume getpath media photos
    /volumes/_nogroup/photos/0b873890-b0d9-4e77-b1b4-76995e3c2e1b
sudo mkdir -p /mnt/photos
sudo mount -t ceph :/volumes/_nogroup/photos/0b873890-b0d9-4e77-b1b4-76995e3c2e1b /mnt/photos -o name=admin,fs=media
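
The path returned by getpath follows a fixed pattern, /volumes/&lt;group&gt;/&lt;subvolume&gt;/&lt;uuid&gt;, where _nogroup stands in when the subvolume was created without a subvolumegroup. A quick local demonstration (using the sample path from above) of pulling the components apart:

```shell
# Anatomy of a subvolume path: /volumes/<group>/<subvolume>/<uuid>.
# "_nogroup" appears because the subvolume was created without a group.
subvol_path="/volumes/_nogroup/photos/0b873890-b0d9-4e77-b1b4-76995e3c2e1b"

# Split on "/"; the leading slash produces an empty first field.
IFS=/ read -r _ _ group name uuid <<< "$subvol_path"
echo "group=$group subvolume=$name uuid=$uuid"
```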

Replication

Deploy Secondary MicroCeph Cluster (as the replication target site)

juju add-model secondary
juju deploy microceph secondary --channel latest/edge -n 1 --storage "osd-standalone=loop,10G,3"
juju status
Model      Controller  Cloud/Region           Version  SLA          Timestamp
secondary  meicon      mystack/mystack        3.6.11   unsupported  09:54:19Z

App        Version  Status  Scale  Charm      Channel      Rev  Exposed  Message
secondary           active      1  microceph  latest/edge  221  no       (workload) charm is ready

Unit          Workload  Agent  Machine  Public address  Ports  Message
secondary/1*  active    idle   1        10.149.4.195           (workload) charm is ready

Machine  State    Address       Inst id                               Base          AZ                   Message
1        started  10.149.4.195  b53c0aab-97c7-41bb-8bcf-df3f64c892e6  ubuntu@24.04  availability-zone-2  ACTIVE

Relate Primary MicroCeph to Secondary MicroCeph Cluster

Set up remote integration between the two MicroCeph clusters:

  1. Create an offer from the primary cluster:
juju switch primary
juju offer primary:remote-provider
Application "primary" endpoints [remote-provider] available at "admin/primary.primary"
  2. Switch to the secondary model and consume the offer:
juju switch secondary
juju consume admin/primary.primary
  3. Create the remote integration relation:

Relating the two applications across models will put both of them into the blocked state.
This is because, for remote operations, each microceph site has to be assigned a unique name.

juju relate secondary:remote-requirer primary:remote-provider
juju status
Model      Controller  Cloud/Region           Version  SLA          Timestamp
secondary  meicon      mystack/mystack        3.6.11   unsupported  09:54:19Z

SAAS     Status   Store   URL
primary  blocked  ps6con  admin/primary.primary

App        Version  Status   Scale  Charm      Channel      Rev  Exposed  Message
secondary           blocked      1  microceph  latest/edge  221  no       (workload) config site-name not set

Unit          Workload  Agent  Machine  Public address  Ports  Message
secondary/1*  blocked   idle   1        10.149.4.195           (workload) config site-name not set

Machine  State    Address       Inst id                               Base          AZ                   Message
1        started  10.149.4.195  b53c0aab-97c7-41bb-8bcf-df3f64c892e6  ubuntu@24.04  availability-zone-2  ACTIVE
  4. Configure site-name for both clusters:

In order to proceed, set the site-name config for both applications (primary and secondary):

juju switch primary
juju config primary site-name=primary
juju switch secondary
juju config secondary site-name=secondary
...
juju status
Model      Controller  Cloud/Region           Version  SLA          Timestamp
secondary  meicon      mystack/mystack        3.6.11   unsupported  09:54:19Z

SAAS     Status  Store   URL
primary  active  ps6con  admin/primary.primary

App        Version  Status  Scale  Charm      Channel      Rev  Exposed  Message
secondary           active      1  microceph  latest/edge  221  no

Unit          Workload  Agent  Machine  Public address  Ports  Message
secondary/1*  active    idle   1        10.149.4.195

Machine  State    Address       Inst id                               Base          AZ                   Message
1        started  10.149.4.195  b53c0aab-97c7-41bb-8bcf-df3f64c892e6  ubuntu@24.04  availability-zone-2  ACTIVE
  5. Validate the remotes configured on the primary cluster:

You can list the configured remotes using:

juju ssh primary/0 -- sudo microceph remote list
 ID  REMOTE NAME  LOCAL NAME
  1  secondary    primary

Similarly on the secondary site:

juju switch secondary
juju ssh secondary/0 -- sudo microceph remote list
 ID  REMOTE NAME  LOCAL NAME
  1  primary      secondary

Side-Quest (workaround) for missing mgr CLI

The command ceph fs mirror ls was missing in the upstream squid release and will be fixed in the 19.2.4 point release (once available in the Ubuntu 24.04 distro). This was (temporarily) resolved in the MicroCeph snap but not in the ceph packages; hence we need to make sure the active mgr daemon is from the microceph cluster and not from the charmed ceph cluster.

The workaround is simple: on the primary cluster, fail the current mgr daemon until the microceph-managed mgr daemon is the active one. In this case, juju-9e6b53-primary-13 is the host managed by microceph.

juju ssh primary/0
ubuntu@juju-9e6b53-primary-13:~$ sudo microceph enable mgr
ubuntu@juju-9e6b53-primary-13:~$ sudo ceph -s
  cluster:
    id:     5ffa37d4-d988-11f0-ac8f-fa163e3872c2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum juju-9e6b53-primary-8,juju-9e6b53-primary-9,juju-9e6b53-primary-7 (age 2h)
    mgr: juju-9e6b53-primary-9(active, since 2s), standbys: juju-9e6b53-primary-13, juju-9e6b53-primary-8
    mds: 1/1 daemons up
    osd: 3 osds: 3 up (since 2h), 3 in (since 2h)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 145 pgs
    objects: 25 objects, 622 KiB
    usage:   135 MiB used, 15 GiB / 15 GiB avail
    pgs:     145 active+clean

ubuntu@juju-9e6b53-primary-13:~$ sudo ceph mgr fail
ubuntu@juju-9e6b53-primary-13:~$ sudo ceph -s
  cluster:
    id:     5ffa37d4-d988-11f0-ac8f-fa163e3872c2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum juju-9e6b53-primary-8,juju-9e6b53-primary-9,juju-9e6b53-primary-7 (age 2h)
    mgr: juju-9e6b53-primary-13(active, since 1.8937s), standbys: juju-9e6b53-primary-8, juju-9e6b53-primary-7
    mds: 1/1 daemons up
    osd: 3 osds: 3 up (since 2h), 3 in (since 2h)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 145 pgs
    objects: 25 objects, 622 KiB
    usage:   135 MiB used, 15 GiB / 15 GiB avail
    pgs:     145 active+clean
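
Rather than eyeballing ceph -s after every ceph mgr fail, the active mgr name can be read from the JSON emitted by ceph mgr dump. The sketch below demonstrates the extraction on a sample snippet (an assumed, abbreviated shape of the real output); on the unit, you would pipe sudo ceph mgr dump instead of the here-string and repeat sudo ceph mgr fail until the microceph-managed host is reported as active.

```shell
# Abbreviated sample of `ceph mgr dump` output; the real command returns a
# much larger JSON document carrying the same "active_name" key.
sample='{"active_name": "juju-9e6b53-primary-13", "available": true}'

# Pull out the active mgr name with plain sed, so no extra tools are needed.
active=$(printf '%s' "$sample" | sed -n 's/.*"active_name": *"\([^"]*\)".*/\1/p')
echo "active mgr: $active"
```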

Enable replication for the CephFS resource

We have the primary cluster (charmed ceph fronted by microceph) and secondary cluster (microceph) ready.
The next thing we need to do is bring up the cephfs mirror daemon on the primary cluster.

juju ssh primary/0 -- sudo microceph enable cephfs-mirror

Let’s use the subvolume list command on the primary cluster:

juju ssh primary/0 -- sudo ceph fs subvolume ls media
[
    {
        "name": "photos"
    }
]

At this point, the secondary cluster should not have any volume resource. Let’s go ahead and create it.
This resource will be used as the target volume for replication.

juju switch secondary
juju ssh secondary/0 -- sudo ceph fs volume create media
    Volume created successfully (no MDS daemons created)
juju ssh secondary/0 -- sudo ceph fs subvolume ls media
[]

Enable replication for that resource.

juju ssh primary/0 -- sudo microceph replication enable cephfs --remote secondary --volume media --subvolume photos
juju ssh primary/0 -- sudo microceph replication list cephfs
+--------+--------------------------+-----------+
| VOLUME | RESOURCE                 | TYPE      |
+--------+--------------------------+-----------+
| media  | /volumes/_nogroup/photos | subvolume |
+--------+--------------------------+-----------+

Validate CephFS subvolume resource on secondary

juju switch secondary
juju ssh secondary/0 -- sudo ceph fs subvolume ls media
[
    {
        "name": "photos"
    }
]
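
The same check can be scripted: replication is healthy when the secondary reports the same subvolumes as the primary. A minimal sketch on sample data (in practice, capture each list with juju ssh &lt;unit&gt; -- sudo ceph fs subvolume ls media):

```shell
# Sample captures standing in for the real `juju ssh ... ceph fs subvolume ls`
# invocations on each site.
primary_list='[{"name": "photos"}]'
secondary_list='[{"name": "photos"}]'

# Any difference between the two listings means replication has not yet
# caught up (or has failed) for some subvolume.
if [ "$primary_list" = "$secondary_list" ]; then
    echo "subvolume lists match"
else
    echo "subvolume lists differ" >&2
fi
```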

What’s next

Buy yourself a beer; it was a long, multi-step tutorial. More importantly, now that you have a configured multi-site setup available, you can experiment with the microceph replication family of CLIs. And let us know about their strengths and shortcomings.
