Erasure coded pools

Erasure coded pools use a technique that provides resiliency comparable to that of replicated pools while requiring less space to achieve it (i.e. the storage overhead is reduced). An object is split into data “chunks”, and parity “chunks” are then generated from them. All chunks are distributed amongst a user-configurable number of OSDs.

Note:

Erasure coded pools require more memory and CPU cycles than replicated pools do. They are supported only from Ceph Luminous onwards.

Erasure coding is enabled by modifying a charm’s pool type. This only affects the pool(s) associated with that charm:

pool-type: erasure-coded
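
For example, the pool type can be set at deploy time with an inline option. A minimal sketch, using the glance charm (one of the charms listed below):

juju deploy glance --config pool-type=erasure-coded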

The Juju charms that currently support erasure coded pools are:

  • ceph-fs
  • ceph-radosgw
  • cinder-ceph
  • glance
  • nova-compute

These charms expose many configuration options pertinent to erasure coding, but the two most widely used are:

ec-profile-k: 1
ec-profile-m: 2

The values given above are the defaults.
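
For example, these options can be collected in a configuration file and applied at deploy time. A minimal sketch (a hypothetical glance.yaml for the glance charm, shown here with the default profile values; in practice they would be tuned as discussed below):

glance:
  pool-type: erasure-coded
  ec-profile-k: 1
  ec-profile-m: 2

juju deploy glance --config ./glance.yaml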

Note:

The full list of options related to erasure coding is covered in the Ceph erasure coding section of the OpenStack Charms Deployment Guide.

Option ec-profile-k gives the number of equal-sized data chunks an object is split into, and option ec-profile-m gives the number of parity chunks that are generated from them. Due to the way recovery works, ec-profile-m is also the number of OSD failures that a cluster can sustain without incurring data loss. The sum of ec-profile-k and ec-profile-m is the number of OSDs that an object’s chunks will be distributed amongst.
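
As an illustration (not a default profile), consider:

ec-profile-k: 3
ec-profile-m: 2

Each object is split into three data chunks and two parity chunks are generated; the five chunks are written to five different OSDs, and any two of those OSDs can fail without data loss.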

Important:

The sum of ec-profile-k and ec-profile-m must never surpass the number of OSDs in the cluster.
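
The number of OSDs in a running cluster can be verified from a monitor node. A minimal sketch, assuming the ceph-mon charm is deployed under the application name ‘ceph-mon’:

juju ssh ceph-mon/leader sudo ceph osd stat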

Unlike the ‘replicated’ pool type, erasure coded pools require some configuration and judgement before they can be used.

Consider the following essential options and their respective default values:

pool-type: erasure-coded
ec-profile-k: 1
ec-profile-m: 2

A cluster thus configured will effectively distribute an object amongst three OSDs and be able to sustain the loss of two of them. The original object is not split (k=1), so the two parity chunks are simply distributed to two other OSDs (m=2). The space required is 300% of the original data (i.e. a storage overhead of 200%). This is the same storage overhead as a replicated pool with three replicas, yet such an erasure coded cluster also acquires the processing overhead, in terms of CPU, memory, and network traffic, that is inherent to erasure coding.

Note:

All data and parity chunks for any given object are of the same size. The data chunks are reassembled to reconstruct the object, and the parity chunks are used to rebuild any data chunks that are lost.
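
Since all chunks are of equal size, the general relationship between the profile values and the space consumed is:

raw space consumed = object size × (k + m) / k
storage overhead   = m / k

With k=1 and m=2 this gives 3 × the object size, i.e. the 200% overhead described above.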

Now consider the following:

pool-type: erasure-coded
ec-profile-k: 10
ec-profile-m: 4

A cluster thus configured will effectively distribute an object amongst 14 OSDs and be able to sustain the loss of four of them, with a storage overhead of just 40%.
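
Applying the same relationship as above: raw space consumed is (10 + 4) / 10 = 140% of the original data, which corresponds to the 40% overhead.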

The challenge is to strike a balance between these factors:

  • number of available OSDs
  • number of OSD failures that can be sustained without data loss (larger is better)
  • storage overhead (smaller is better)
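
As an illustration of this trade-off (hypothetical profiles, assuming twelve OSDs are available): a profile of k=9 and m=3 uses all twelve OSDs, tolerates three OSD failures, and has a storage overhead of 3/9 ≈ 33%, whereas a profile of k=4 and m=2 spreads each object over only six OSDs, tolerates two failures, and has an overhead of 2/4 = 50%.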

The documentation of each of the above charms provides more detailed information on how erasure coded Ceph pools pertain to that charm. Also see Ceph erasure coding in the OpenStack Charms Deployment Guide for in-depth information on how erasure coding applies to charms in general.