Microcluster/Microcloud Quorum Recovery

Project Microcluster/Microcloud
Status Drafting
Author(s) @whershberger
Approver(s) @tomp @maria-seralessandri @masnax
Release LTS
Internal ID LX080

Abstract

This specification describes a mechanism to perform cluster recovery in the
event of dqlite quorum loss for Microcluster and Microcluster-based projects:

  • A recovery API to be provided by microcluster
  • The recovery CLI to be used in Microcloud

Rationale

The dqlite database provided by Microcluster ensures fault tolerance and high availability using Raft. The protocol requires a quorum of cluster members (a majority of voters) to read from or write to the database. If a majority of voters fail catastrophically, the database on any remaining members becomes unavailable. A mechanism is needed to recover database functionality on the remaining cluster members.
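To make the majority rule concrete: with n voters, a quorum is ⌊n/2⌋ + 1 voters, so a three-voter cluster survives the loss of one voter but not two. The Go sketch uses hypothetical helpers (`quorumSize`, `hasQuorum`) that are not part of Microcluster.

```go
package main

import "fmt"

// quorumSize returns the number of voters that must be reachable for a Raft
// cluster with the given number of voters to commit writes: a strict majority.
func quorumSize(voters int) int {
	return voters/2 + 1
}

// hasQuorum reports whether the surviving voters still form a majority of
// the configured voters.
func hasQuorum(voters, survivingVoters int) bool {
	return survivingVoters >= quorumSize(voters)
}

func main() {
	// A 3-voter cluster tolerates the loss of one voter but not two.
	for _, surviving := range []int{3, 2, 1} {
		fmt.Printf("3 voters, %d surviving: quorum=%v\n", surviving, hasQuorum(3, surviving))
	}
}
```

Only the last case (one of three voters surviving) lacks quorum, which is the situation this specification addresses.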

Changing the IP addresses of any or all nodes during recovery is not covered by this version of the mechanism; it is listed under Future work.

Constraints

Per canonical/lxd#13524, the cluster recovery process must be performed on exactly one member, after which the dqlite database directory must be copied from the recovered member to all other surviving members. This is because dqlite’s recovery process forces consensus by circumventing the usual Raft mechanism; running it independently on several members could leave them with divergent logs, so copying the database is necessary to keep the remaining cluster members consistent.

dqlite also requires that the database be shut down on all members during recovery. While it may be possible to ensure this with Microcluster still running, doing so would require exposing additional API endpoints over the network to coordinate and control the database state; recovery is therefore performed while Microcluster is stopped on every member.

Microcluster Specification

The following public structures/methods will be created:

// microcluster/app.go

// LocalMember describes a cluster member as seen from the local node.
type LocalMember struct {
	DqliteID uint64 // dqlite node ID
	Address  string // dqlite address, host:port
	Role     string // "voter", "stand-by" or "spare"

	Name string // Microcluster member name
}

// GetLocalClusterMembers returns this member's local view of the cluster
// membership, for use while the database is unavailable.
func (m *MicroCluster) GetLocalClusterMembers() ([]LocalMember, error)

// RecoverFromQuorumLoss can be used to recover database access when a quorum of
// members is lost and cannot be recovered (e.g. hardware failure).
// This function requires that:
//   - The database is stopped on all cluster members
//   - The current member has the most up-to-date raft log (usually the member
//     which was most recently the leader)
//
// members is the desired cluster membership, e.g. the result of
// GetLocalClusterMembers with unrecoverable members set to the "spare" role.
//
// RecoverFromQuorumLoss takes a database backup before attempting the
// recovery operation.
//
// RecoverFromQuorumLoss must be invoked _exactly once_ for the entire cluster.
// It creates a gzip-compressed tarball at
// path.Join(m.FileSystem.StateDir, "recovery_db.tar.gz"), which the user must
// manually copy to the state dir of every other cluster member.
//
// On start, Microcluster automatically checks for and loads the recovery
// tarball, taking a database backup before the load.
func (m *MicroCluster) RecoverFromQuorumLoss(members []LocalMember) error

Upgrade handling

Microcloud Specification

API changes

None

CLI changes

The following commands will be added to microcloud:

  • microcloud cluster edit
$ microcloud cluster edit
You should only run this command if:
 - A quorum of cluster members is permanently lost
 - You are *absolutely* sure all microcloud instances are stopped (sudo snap stop microcloud)
 - This instance has the most up to date database

Do you want to proceed? (yes/no): 

After confirmation, the local cluster membership is opened in the user's editor:

# Member roles can be modified. Unrecoverable nodes should be
# given the role `spare`.
#
# `voter` - Voting member of the database. A majority of voters is a quorum.
# `stand-by` - Non-voting member of the database; can be promoted to voter.
# `spare` - Not a member of the database.
#
# The edit is aborted if:
# - the number of members changes
# - the name of any member changes
# - the id of any member changes
# - the address of any member changes
# - no changes are made
members:
- name: c1
  role: voter
  address: 192.0.2.101:9443
  id: 5908841199928984794
- name: c2
  role: voter
  address: 192.0.2.102:9443
  id: 14814744052722818096
- name: c3
  role: voter
  address: 192.0.2.103:9443
  id: 7765948834043589852

If no changes are made:

Cluster edit aborted; no changes made

If changes were made:

Cluster changes applied; new database state saved to /var/snap/microcloud/common/recovery_db.tar.gz

*Before* starting any cluster member, copy /var/snap/microcloud/common/recovery_db.tar.gz to the same path (/var/snap/microcloud/common/recovery_db.tar.gz) on all remaining cluster members.

Microcloud will load this file during startup.

Database changes

None

Upgrade handling

None

Future work

  1. It would be useful to expose a CLI command that reports the latest dqlite segment on each member (as is done in LXD). The current criteria, “This instance has the most up to date database” and “most recently the leader”, are imprecise and leave the choice of recovery member to the operator's judgement. Such a change should be straightforward.
  2. Implement the ability to change member addresses during recovery. This requires:
    a. An update of the trust store (partially implemented)
    b. An update of daemon.yaml
    c. An update of the internal_cluster_members table after the database is accessible (à la patches in LXD?)
    d. An update of the dqlite YAML files info.yaml and cluster.yaml?

Updated the spec per my discussion with Max today:

  • Add dqlite node IDs to the trust store
  • Remove microcloud cluster import and automatically load the recovery db from the state dir on startup
  • Take automatic database backups before executing dqlite recovery and before importing the recovery db

I still need to confirm whether snap confinement allows microcloud to write to /var/snap/microcloud/common, as Max indicated that it would not.

Updated the spec following implementation:

  • Do not add dqlite IDs to the trust store
  • Move support for changing member addresses to “Future work” instead of integrating it into the spec.
  • Replace xz with gz, as gzip is available in the golang stdlib