This guide describes the procedure for removing an OSD from a Ceph cluster.
This article provides guidance for removing an OSD with legacy charms. For the latest charms in the quincy/stable channel, a remove-disk action has been introduced that enables a simpler procedure; a Quincy version of this page is available.
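As a rough sketch only, the newer procedure reduces to a single action run against the ceph-osd unit. The osd-ids parameter name below is an assumption and may differ between charm revisions, so check the charm's action schema (for example with juju actions ceph-osd) before relying on it:
juju run-action --wait ceph-osd/1 remove-disk osd-ids=osd.2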
-
Before removing an OSD unit, we first need to ensure that the cluster is healthy:
juju ssh ceph-mon/leader sudo ceph status
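Optionally, for a condensed view, the health subcommand prints just the overall status string; anything other than HEALTH_OK warrants investigation before proceeding:
juju ssh ceph-mon/leader sudo ceph health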
-
Identify the target OSD
Check the OSD tree to map OSDs to their host machines:
juju ssh ceph-mon/leader sudo ceph osd tree
Sample output:
ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-5         0.03119      host finer-shrew
 2  hdd    0.03119          osd.2         up      1.00000   1.00000
...
Assume that we want to remove osd.2. As shown in the output, it is hosted on the machine finer-shrew. Check which unit is deployed on this machine:
juju status
Sample output:
...
Unit         Workload  Agent  Machine  Public address  Ports  Message
...
ceph-osd/1*  blocked   idle   1        192.168.122.48         No block devices detected using current configuration
...
Machine  State    DNS             Inst id      Series  AZ       Message
...
1        started  192.168.122.48  finer-shrew  focal   default  Deployed
...
In this case, ceph-osd/1 is the unit we want to remove. Therefore, the target OSD can be identified by the following properties:
OSD_UNIT=ceph-osd/1
OSD=osd.2
OSD_ID=2
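Optionally, the OSD-to-host mapping can be cross-checked from the cluster side, since ceph osd find reports the host and CRUSH location of a given OSD:
juju ssh ceph-mon/leader sudo ceph osd find $OSD_ID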
-
Take the target OSD out of the cluster and check cluster health again:
juju run-action --wait $OSD_UNIT osd-out osds=$OSD_ID
juju ssh ceph-mon/leader sudo ceph status
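Marking the OSD out triggers data migration, which can take some time on a busy cluster. Optionally, you can monitor progress until all PGs report active+clean, for example with:
juju ssh ceph-mon/leader sudo ceph pg stat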
Note: Generally, taking an OSD out of the cluster will trigger weight rebalancing that migrates placement groups (PGs) off the target OSD. However, in special cases some PGs may become stuck in the inactive+remapped state. When encountering such a problem, you can rejoin the OSD to the cluster and manually reweight it to 0:
juju run-action --wait $OSD_UNIT osd-in osds=$OSD_ID
juju run-action --wait ceph-mon/leader change-osd-weight osd=$OSD_ID weight=0
After the reweighting, the PGs will be migrated to other available OSDs. Hence, you can safely proceed to the next step without marking the target OSD out. For more information, please refer to the Ceph documentation.
-
Before stopping and destroying the target OSD, we need to make sure it is safe to do so:
juju run --unit ceph-mon/leader "ceph osd ok-to-stop $OSD_ID"
juju run --unit ceph-mon/leader "ceph osd safe-to-destroy $OSD_ID"
You should only proceed to the next steps if both checks pass.
-
Stop the OSD daemon.
juju run-action --wait $OSD_UNIT stop osds=$OSD_ID
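You can also confirm locally on the unit that the daemon has stopped; this assumes the standard ceph-osd@<id> systemd unit naming used by the Ubuntu Ceph packages:
juju ssh $OSD_UNIT sudo systemctl status ceph-osd@$OSD_ID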
-
Confirm that the target OSD is down:
juju ssh ceph-mon/leader sudo ceph osd tree down
Sample output:
ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         0.06238  root default
-5         0            host finer-shrew
 2  hdd    0                osd.2         down    0         1.00000
-
Purge the OSD:
juju run-action --wait ceph-mon/leader purge-osd osd=$OSD_ID i-really-mean-it=yes
This action removes the OSD from the CRUSH map and the OSD map. It also removes its authentication key.
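To verify the removal, list the OSD tree again and confirm that osd.2 no longer appears:
juju ssh ceph-mon/leader sudo ceph osd tree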
-
(Optional) If the unit hosting the target OSD has no other active OSDs attached (see the check below) and you would like to delete it, you can do so by running:
juju remove-unit $OSD_UNIT
Note: This step should be taken with extra caution. If there are active OSDs on the unit, removing it will produce unexpected errors.
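One way to check whether other OSDs are still provisioned on the machine is to list them on the unit with ceph-volume (assuming LVM-backed OSDs, the common case for these charms):
juju ssh $OSD_UNIT sudo ceph-volume lvm list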
-
Ensure the cluster is in a healthy state after being scaled down:
juju ssh ceph-mon/leader sudo ceph status