This guide describes the procedure for removing an OSD from a Ceph cluster.

This article provides guidance for removing an OSD with the legacy charms. For the latest charms in the quincy/stable channel, a new `remove-disk` action enables a simpler procedure; there is a Quincy version of this page available.
The following instructions are written for Juju version 3.0 and above. If you are using a release of Juju prior to 3.0, you should:

- Use `juju run-action` instead of `juju run` to run action scripts on the unit.
- Use `juju run` instead of `juju exec` to execute remote commands on the unit.
- Before removing an OSD unit, we first need to ensure that the cluster is healthy:

  ```
  juju ssh ceph-mon/leader sudo ceph status
  ```
- Identify the target OSD.

  Check the OSD tree to map OSDs to their host machines:

  ```
  juju ssh ceph-mon/leader sudo ceph osd tree
  ```

  Sample output:

  ```
  ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
  -1         0.09357  root default
  -5         0.03119      host finer-shrew
   2    hdd  0.03119          osd.2         up       1.00000  1.00000
  ...
  ```

  Assume that we want to remove `osd.2`. As shown in the output, it is hosted on the machine `finer-shrew`.

  Check which unit is deployed on this machine:

  ```
  juju status
  ```

  Sample output:

  ```
  ...
  Unit         Workload  Agent  Machine  Public address  Ports  Message
  ...
  ceph-osd/1*  blocked   idle   1        192.168.122.48         No block devices detected using current configuration
  ...
  Machine  State    DNS             Inst id      Series  AZ       Message
  ...
  1        started  192.168.122.48  finer-shrew  focal   default  Deployed
  ...
  ```

  In this case, `ceph-osd/1` is the unit we want to remove. The target OSD can therefore be identified by the following properties:

  ```
  OSD_UNIT=ceph-osd/1
  OSD=osd.2
  OSD_ID=2
  ```
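The OSD-to-host mapping can also be extracted programmatically. The following sketch parses a captured `ceph osd tree` sample (the one shown above) with awk; on a live cluster you would pipe in the real output from `juju ssh ceph-mon/leader sudo ceph osd tree` instead.

```shell
#!/bin/sh
# Sketch: look up the host bucket that contains a given OSD by parsing
# `ceph osd tree` output. A captured sample is parsed here; feed in live
# output on a real cluster.
OSD_ID=2

tree_output='ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-5         0.03119      host finer-shrew
 2    hdd  0.03119          osd.2        up       1.00000  1.00000'

# Bucket rows have no CLASS column, so "host <name>" lands in fields 3-4;
# OSD rows do have a CLASS column, so the "osd.N" name is field 4.
host=$(printf '%s\n' "$tree_output" | awk -v osd="osd.$OSD_ID" '
    $3 == "host" { current = $4 }   # remember the enclosing host bucket
    $4 == osd    { print current }  # emit it when we reach the target OSD
')
echo "$host"
```

The same field positions apply to any OSD in the default CRUSH layout; deeper hierarchies (racks, rows) would need an extra rule per bucket type.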
- Take the target OSD out of the cluster and check cluster health again:

  ```
  juju run $OSD_UNIT osd-out osds=$OSD_ID
  juju ssh ceph-mon/leader sudo ceph status
  ```

  Note: Generally, taking an OSD `out` of the cluster triggers weight rebalancing that migrates placement groups (PGs) off the target OSD. However, in special cases some PGs may become stuck in the `inactive+remapped` state. If you encounter this problem, you can rejoin the OSD to the cluster and manually reweight it to 0:

  ```
  juju run $OSD_UNIT osd-in osds=$OSD_ID
  juju run ceph-mon/leader change-osd-weight osd=$OSD_ID weight=0
  ```

  After the reweighting, the PGs will be migrated to other available OSDs, so you can safely proceed to the next step without marking the target OSD `out`. For more information, please refer to the Ceph documentation.
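Whether rebalancing has finished draining the target OSD can be checked from the PGS column of `ceph osd df`. A minimal sketch, parsing a trimmed illustrative sample of that output (the real numbers will differ); it assumes PGS is the second-to-last column, as in recent Ceph releases:

```shell
#!/bin/sh
# Sketch: check that the target OSD holds zero placement groups before
# destroying it. Parses a trimmed illustrative `ceph osd df` sample; on a
# live cluster, capture `juju ssh ceph-mon/leader sudo ceph osd df` instead.
OSD_ID=2

df_output='ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP  META    AVAIL   %USE  VAR   PGS  STATUS
 2    hdd  0.03119   0        32 GiB  21 MiB   1.4 MiB  0 B   19 MiB  32 GiB  0.06  1.00    0  up'

# PGS is the second-to-last column, so count fields from the right to stay
# robust against values like "32 GiB" that contain spaces.
pgs=$(printf '%s\n' "$df_output" | awk -v id="$OSD_ID" '$1 == id { print $(NF-1) }')
if [ "$pgs" = "0" ]; then
    echo "osd.$OSD_ID is fully drained"
fi
```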
- Before stopping and destroying the target OSD, make sure it is safe to do so:

  ```
  juju exec --unit ceph-mon/leader -- ceph osd ok-to-stop $OSD_ID
  juju exec --unit ceph-mon/leader -- ceph osd safe-to-destroy $OSD_ID
  ```

  You should only proceed to the next steps if both checks pass.
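While the cluster is still rebalancing, these checks may fail transiently, so it can be convenient to poll them in a loop. A minimal sketch; `check` is a stub standing in for the real remote call (e.g. on Juju 3.x, `juju exec --unit ceph-mon/leader -- ceph osd ok-to-stop $OSD_ID`, which the sketch assumes exits non-zero while the OSD is not yet safe to stop):

```shell
#!/bin/sh
# Sketch: poll until the monitor reports the target OSD safe to stop.
# `check` is a stub standing in for the real remote call; replace its body
# with the actual juju/ceph invocation on a live cluster.
OSD_ID=2
attempts=0

check() {
    attempts=$((attempts + 1))
    # Stubbed so the sketch is runnable: pretend the third poll succeeds.
    [ "$attempts" -ge 3 ]
}

until check; do
    sleep 1   # on a real cluster, poll less aggressively (e.g. every 30s)
done
echo "osd.$OSD_ID became safe after $attempts checks"
```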
- Stop the OSD daemon:

  ```
  juju run $OSD_UNIT stop osds=$OSD_ID
  ```
- Confirm that the target OSD is down:

  ```
  juju ssh ceph-mon/leader sudo ceph osd tree down
  ```

  Sample output:

  ```
  ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
  -1         0.06238  root default
  -5         0            host finer-shrew
   2    hdd  0                osd.2         down     0        1.00000
  ```
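This confirmation can also be scripted by pulling the STATUS column for the target OSD out of the tree listing. A sketch over a captured sample; live output would come from `juju ssh ceph-mon/leader sudo ceph osd tree down`:

```shell
#!/bin/sh
# Sketch: verify that the target OSD reports "down" in `ceph osd tree down`.
# Parses a captured sample; pipe in live output on a real cluster.
OSD_ID=2

tree_output='ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         0.06238  root default
-5         0            host finer-shrew
 2    hdd  0                osd.2        down     0        1.00000'

# OSD rows include a CLASS column, so NAME is field 4 and STATUS field 5.
state=$(printf '%s\n' "$tree_output" | awk -v osd="osd.$OSD_ID" '$4 == osd { print $5 }')
if [ "$state" = "down" ]; then
    echo "osd.$OSD_ID is down"
else
    echo "osd.$OSD_ID is still $state" >&2
fi
```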
- Purge the OSD:

  ```
  juju run ceph-mon/leader purge-osd osd=$OSD_ID i-really-mean-it=yes
  ```

  This action removes the OSD from the cluster map and the OSD map, and also removes its authentication key.
- (Optional) If the unit hosting the target OSD has no other active OSDs attached and you would like to delete it, you can do so by running:

  ```
  juju remove-unit $OSD_UNIT
  ```

  Note: Take extra caution with this step. If there are active OSDs on the unit, removing it will produce unexpected errors.
- Ensure the cluster is in a healthy state after being scaled down:

  ```
  juju ssh ceph-mon/leader sudo ceph status
  ```
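For unattended runs, this final health check can be turned into a pass/fail test by extracting the `health:` field from the `ceph status` output. A minimal sketch over a trimmed sample; on a live cluster, capture the output of `juju ssh ceph-mon/leader sudo ceph status`:

```shell
#!/bin/sh
# Sketch: turn the final `ceph status` check into a pass/fail test.
# Parses a trimmed sample; capture live output on a real cluster.
status_output='  cluster:
    health: HEALTH_OK'

health=$(printf '%s\n' "$status_output" | awk '$1 == "health:" { print $2 }')
if [ "$health" = "HEALTH_OK" ]; then
    echo "cluster is healthy"
else
    echo "cluster reports ${health:-no health line}" >&2
    exit 1
fi
```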