Secure cluster join in MicroCloud

Project MicroCloud
Status Drafting
Author(s) @jpelizaeus
Approver(s) @tomp @maria-seralessandri
Release 1.x / 2.x
Internal ID LX073

Abstract

Allow all members of a MicroCloud to securely join the cluster. Give both the joining member and the cluster the ability to verify their peer,
and avoid transferring critical information like API secrets across the network.

Rationale

In its current version MicroCloud offers a convenient approach to bootstrap clusters built using LXD, MicroCeph and MicroOVN. By controlling the entire process, MicroCloud can generate join tokens for each of the services and distribute them across the cluster members. This allows forming additional clusters for each of the services without additional manual intervention.

As MicroCloud itself offers many additional settings to configure its behavior, support for a preseed configuration file has been added. This allows an administrator to skip the setup dialog and apply the configuration for an entire MicroCloud in a single step.

Having this one-step configuration mechanism requires MicroCloud to make decisions on the administrator's behalf. One of them is to accept additional cluster members that have been selected or configured (using preseed) by the administrator. As there is no additional step involved to ensure the integrity of either one of the joining peers, this can lead to security risks, because the network is currently considered a trusted party.

Specification

This specification supersedes the existing cluster join mechanism with proactive tasks that have to be executed on each cluster member individually to ensure integrity.
Furthermore, the necessary modifications to MicroCluster are mapped out to separate the changes specifically required for MicroCloud from the ones that can be added to MicroCluster directly. This helps downstream importers of MicroCluster (e.g. MicroCeph, MicroOVN and MicroK8s) to also make use of the updated join mechanism.

Existing Mechanism

In the latest release of MicroCloud the cluster join mechanism largely depends on mDNS to discover peers, share relevant connection details and bootstrap the final cluster. Therefore, on each node of the cluster a microcloudd daemon is running that both broadcasts its own set of details onto the network and receives details sent by others on the same network.
This ultimately allows constructing a picture of the available resources so that MicroCloud can offer the administrator a straightforward question-and-answer input dialog:

MicroCloud mDNS

The details broadcasted on the network by each of the daemons consist of the following:

  • The current version of the mDNS broadcast/lookup format (currently 1.0)
  • The hostname of the node
  • The address of the node’s MicroCloud API endpoint
  • The node's network interface over which the broadcast was sent
  • A list of services (MicroCloud, LXD, MicroCeph and MicroOVN) present on this node
  • An authentication secret that can be used to access this node's API endpoint using the X-MicroCloud-Auth header

Using this information the initial microcloudd requests further details on network interfaces and storage disks from the local and peer LXD servers for later selection by the administrator.

Discovery

To begin, MicroCloud requires three independent microcloudd daemons to be running on a shared network. As a node running microcloudd could potentially have more than a single network interface, each microcloudd broadcasts its details on every network interface of the underlying node that has an IP address configured.
An administrator will then pick any of the microcloudd daemons to be the initial one that will be used to bootstrap the MicroCloud. It's the responsibility of this microcloudd to listen only to broadcasts on the network(s) selected by the administrator:

Select an address for MicroCloud's internal traffic:
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +----------------------------------------+--------+
       |                ADDRESS                 | IFACE  |
       +----------------------------------------+--------+
> [x]  | 10.237.170.93                          | enp5s0 |
  [ ]  | fd42:e287:8e5c:b221:216:3eff:fe6f:45cb | enp5s0 |
       +----------------------------------------+--------+
...
Limit search for other MicroCloud servers to 10.237.170.93/24? (yes/no) [default=yes]:

After selecting the network(s) on which MicroCloud should discover any potential peers, MicroCloud prompts the user with a list of peers that have been discovered on the respective network interface:

Scanning for eligible servers ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------+----------------+
       | NAME | IFACE  |      ADDR      |
       +------+--------+----------------+
> [x]  | m3   | enp5s0 | 10.237.170.140 |
  [x]  | m2   | enp5s0 | 10.237.170.61  |
       +------+--------+----------------+

After this the administrator is guided through multiple dialogs allowing further configuration of the final MicroCloud in regards to storage and networking.

As a last step MicroCloud instructs its peers to form a cluster for each of the services (MicroCloud, LXD, MicroCeph and MicroOVN). In order to allow access from the initial MicroCloud node to every other node in the cluster, each node has broadcast an API secret that is now used to invoke RPC requests on the cluster's nodes by setting the X-MicroCloud-Auth header.
As part of those requests MicroCloud initiates the cluster forming process for the various services.

Cluster forming

MicroCloud uses MicroCluster under the hood to form a cluster of nodes. The mechanism on the MicroCluster side currently relies on the join token being considered a secret. In addition, a node joining using this secret is automatically trusted to be the one for which the secret has been created.
The joining node, however, checks that the fingerprint of the cluster's certificate (embedded into the token) matches the one returned by the cluster API when issuing the join request:

MicroCluster forming

The response to the join request contains the cluster's certificate and key and a list of certificates from all the nodes that have already joined the cluster. This list is used to extend the truststore of the newly joined node.
The node's own certificate is added to the truststore automatically.

After adding the other peers' certificates to the node's truststore, it will start its API and join the already existing dqlite cluster
using mutual TLS with the certificates that have been obtained during the join process.

Updated Mechanism

As the current mechanism fully trusts the local LAN, every microcloudd broadcasts an authentication secret and trusts the broadcasts received from other peers in the network.
This can lead to man-in-the-middle (MITM) attacks, as the broadcasts themselves aren't protected and could potentially be read and modified by somebody sitting on the same network.

By removing the authentication secret from the broadcast message, the initial microcloudd can no longer talk to its peers, as there no longer is any trust relationship. This breaks the disk and network discovery as well as the final cluster forming, as they currently make use of this secret.

Instead, this communication could already make use of the mutual TLS that currently gets established during the final cluster forming. By moving the exchange of trust right after the discovery of peers and extending it with a proactive human verification option, it can be ensured
that the nodes in the cluster are the ones they pretend to be over mDNS.
Instead of forming the MicroCloud's MicroCluster at the end to establish the base for mTLS, a new temporary trust store is built up during the cluster forming. This store can be used for any follow-up tasks to discover the required information and to finally form the clusters for MicroCloud, LXD, MicroCeph and MicroOVN.

To allow exchanging the public keys for the temporary trust store, the already existing mDNS protocol is extended with an HMAC to avoid broadcasting a secret across the network while still allowing the sender and receiver to validate and trust the received payloads.

Discovery option A (reduced mDNS)


MicroCluster discovery option A

Every microcloudd broadcasts a reduced list of details on the network including its own certificate’s fingerprint:

type ServerInfo struct {
    // The current version of the mDNS broadcast/lookup format
    Version     string
    // The hostname of the node
    Name        string
    // The address of the node's MicroCloud API endpoint
    Address     string
    // The interface used on the sending side
    Interface   string
    // A list of services (e.g. LXD, MicroCeph, MicroOVN) present on this node
    Services    []types.ServiceType
    // The node's certificate fingerprint
    Fingerprint string
}
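As an illustration of how such a fingerprint could be derived, the following sketch hashes the DER-encoded certificate with SHA-256, the common format for TLS certificate comparison. The helper name and the throwaway self-signed certificate are illustrative, not part of the specification:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/hex"
	"fmt"
	"math/big"
	"time"
)

// certFingerprint returns the hex-encoded SHA-256 digest of the
// DER-encoded certificate; both sides can render this digest for
// the administrator to compare out of band.
func certFingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Generate a throwaway self-signed certificate to stand in for
	// the node's cluster certificate.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "m3"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	fmt.Println("fingerprint:", certFingerprint(der))
}
```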

The initial questions to pick a network for discovery are kept as is:

Select an address for MicroCloud's internal traffic:
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +----------------------------------------+--------+
       |                ADDRESS                 | IFACE  |
       +----------------------------------------+--------+
> [x]  | 10.60.35.134                           | enp5s0 |
  [ ]  | fd42:8a02:2309:cea5:216:3eff:fe3e:e5c0 | enp5s0 |
       +----------------------------------------+--------+
...
Limit search for other MicroCloud servers to 10.60.35.134/24? (yes/no) [default=yes]:

The following dialog gets updated, as microcloudd now has to create join tokens for each of the peers that have been discovered. As such tokens should only be created if the peer is a legitimate one, the administrator can now check the provided fingerprints for validity and compare them against the actual certificate of the respective peer:

Scanning for eligible servers ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------+--------------+-------------+
       | NAME | IFACE  |     ADDR     | Fingerprint |
       +------+--------+--------------+-------------+
> [x]  | m3   | enp5s0 | 10.60.35.113 | aabbccdd... |
  [ ]  | m2   | enp5s0 | 10.60.35.242 | eeff0011... |
       +------+--------+--------------+-------------+

Based on the selection microcloudd creates a join token for each of them.

To allow joining the peer interactively, the dialog gets updated with the join command the administrator has to run on the peer:

The join token for m3 (aabbccdd...) has been created.
Run the following command on m3 to join this MicroCloud:

microcloud join aabbccddeeff001122334455...

As the initial microcloudd can no longer run the RPC request using the authentication secret, the administrator now has to enter the peer and run the new microcloud join command:

microcloud join [token]

The token itself contains the address of the initial microcloudd’s API endpoint to initiate the join process. As there isn’t yet any trust relation from the joining peer to the initial microcloudd, we cannot validate the certificate of the remote side against the certificates in the peer’s truststore. Since the token contains the fingerprint of the originating microcloudd, the joining peer can compare the fingerprints to make sure it doesn’t join a malicious MicroCloud.
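The exact token format is not spelled out here; as an illustrative assumption, it could be a base64-encoded JSON document carrying the initiator's address and certificate fingerprint, so the joining peer can both reach the initial microcloudd and pin its certificate. The struct and helper names below are hypothetical:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// joinToken is a hypothetical token payload; the real MicroCloud
// token format may differ.
type joinToken struct {
	// Address of the initial microcloudd's API endpoint.
	Address string `json:"address"`
	// Fingerprint of the initial microcloudd's certificate, so the
	// joining peer can verify it isn't joining a malicious cluster.
	Fingerprint string `json:"fingerprint"`
}

// encodeToken serializes the token for copy-pasting into
// `microcloud join <token>`.
func encodeToken(t joinToken) (string, error) {
	data, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(data), nil
}

// decodeToken is the inverse, run on the joining peer.
func decodeToken(s string) (joinToken, error) {
	var t joinToken
	data, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return t, err
	}
	err = json.Unmarshal(data, &t)
	return t, err
}

func main() {
	token, err := encodeToken(joinToken{Address: "10.60.35.134:9443", Fingerprint: "aabbccdd"})
	if err != nil {
		panic(err)
	}
	decoded, err := decodeToken(token)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded.Address, decoded.Fingerprint)
}
```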

On the initial microcloudd the administrator could get prompted with an updated dialog that now requires approving the join request from the peer:

Allow m3 (aabbccdd...) to join the MicroCloud? (yes/no) [default=yes]:

As checking the fingerprint could have already been done as part of selecting the peer after the mDNS discovery, this step should be considered optional, as it doesn't provide much additional security.

If the peer is accepted, the initial microcloudd will join this node into the underlying MicroCluster to establish mutual TLS. Using this trust the microcloudd can request further information from the peer and initiate the forming of the other clusters used for LXD, MicroCeph and MicroOVN.

Afterwards the dialog again gets updated to the initial screen displaying the remaining peers to repeat the process:

Scanning for eligible servers ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------+--------------+-------------+
       | NAME | IFACE  |     ADDR     | Fingerprint |
       +------+--------+--------------+-------------+
> [x]  | m2   | enp5s0 | 10.60.35.242 | eeff0011... |
       +------+--------+--------------+-------------+

When adding a new peer later the exact same process is repeated.

Discovery option B (without mDNS)


MicroCluster discovery option B

This option works entirely without mDNS, as the decision which peer should join the MicroCloud is fully up to the administrator.
Instead, the administrator is asked to provide a general join token. This token contains the address of the initial microcloudd alongside the secret. This ensures that an administrator doesn't have to enter the address of the already existing cluster when joining a peer.

By default, if no secret is provided, the initial microcloudd will create a randomly generated secret that can be used in the
subsequent steps to join all of the peers. There isn't a specific secret for each peer:

MicroCloud generates a secure join secret by default.
Would you like to use the default join secret: (yes/no) [default=yes]: no

If the administrator selects no, the next question will ask to enter the join secret:

Specify a general join secret:

The next dialog will then show the same join command as for option A. Note that this is a general join command that can be reused multiple times for each of the peers that the administrator wants to add to the MicroCloud:

The join token has been created.
Run the following command on all the nodes to join this MicroCloud:

microcloud join aabbccddeeff001122334455...

Press any key to continue...

As in option A the administrator now has to enter the peer(s) and run the new microcloud join command:

microcloud join [token]

The command blocks until the request is approved on the cluster side.

After pressing any key on the initial node the dialog gets updated with a table listing all the pending cluster join requests so that the administrator can allow access for each of them. The table updates regularly to represent the latest list of join requests:

Scanning join requests ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------------+-------------+
       | NAME |     ADDR     | Fingerprint |
       +------+--------------+-------------+
> [x]  | m1   | 10.60.35.241 | aabbccdd... |
  [ ]  | m2   | 10.60.35.242 | eeff0011... |
  [x]  | m3   | 10.60.35.243 | 22334455... |
       +------+--------------+-------------+

Based on the provided fingerprint the administrator can verify that the peer is the actual one that they want to join into the cluster.
Another approach would be to use the same dialog updates as in option A, prompting for each new peer as soon as the join request comes in on the initial microcloudd.

Discovery option B2 (session level join token)


Like option B, this option works completely without requiring mDNS. The essential difference compared to option B is the introduction of a session in which the join token is valid. This allows keeping the secret in memory until the point where the cluster forming is complete, after which the secret gets discarded on all ends.

A new session is started by the initial microcloudd after running the microcloud init command. As in option B, a random secret will be generated by default, but the administrator can also provide a custom one.
The joining side learns about the secret after initiating the join process using microcloud join <token>. The command blocks until the request is approved, and the local microcloudd will discard the secret after having successfully joined the cluster.
If there aren't any more peers that the administrator wants to add to the MicroCloud, the initial microcloudd discards the secret too and continues on with the remainder of the configuration.

When adding additional peers after the initial cluster forming, the microcloud add command runs through the same secret dialog again, asking the administrator to either provide a new secret manually or use a randomly auto-generated one.
The process on the peer is the same as described above.

Discovery option C (HMAC and mDNS)


MicroCluster discovery option C

In this option mDNS is kept, and the message which gets broadcast is stripped of the secret and extended with a hash-based message authentication code (HMAC). This MAC allows the receiving side to validate the broadcast message, as it can compute the same MAC using the given message and a shared secret that is only known to trusted peers of the MicroCloud:

type ServerInfoHMAC struct {
    // MAC generated from HMAC(secret, Info)
    HMAC string
    // The existing ServerInfo struct
    Info ServerInfo
}

type ServerInfo struct {
    // The current version of the mDNS broadcast/lookup format
    Version     string
    // The hostname of the node
    Name        string
    // The address of the node's MicroCloud API endpoint
    Address     string
    // The interface used on the sending side
    Interface   string
    // A list of services (e.g. LXD, MicroCeph, MicroOVN) present on this node
    Services    []types.ServiceType
    // The node's certificate fingerprint
    Fingerprint string
}
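A sketch of how this MAC could be computed and validated with Go's crypto/hmac. Serializing ServerInfo as JSON before signing is an assumption (any deterministic encoding would do), and Services is reduced to plain strings here:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// ServerInfo mirrors the broadcast payload above (with Services
// reduced to plain strings for this sketch).
type ServerInfo struct {
	Version     string
	Name        string
	Address     string
	Interface   string
	Services    []string
	Fingerprint string
}

// sign computes HMAC-SHA256 over the JSON-serialized info using the
// shared secret.
func sign(secret []byte, info ServerInfo) (string, error) {
	payload, err := json.Marshal(info)
	if err != nil {
		return "", err
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil)), nil
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret []byte, info ServerInfo, received string) bool {
	expected, err := sign(secret, info)
	if err != nil {
		return false
	}
	return hmac.Equal([]byte(expected), []byte(received))
}

func main() {
	secret := []byte("xyz")
	info := ServerInfo{Version: "1.0", Name: "m3", Address: "10.60.35.113", Interface: "enp5s0", Fingerprint: "aabbccdd"}
	mac, err := sign(secret, info)
	if err != nil {
		panic(err)
	}
	fmt.Println("valid:", verify(secret, info, mac))
}
```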

In order to distribute such a secret, every microcloudd has to be made aware of it. Therefore another command, microcloud config set core.secret xyz, is added, which has to be executed on each of the peers that will form the MicroCloud cluster.

Using this secret both sides can validate each other, as messages exchanged between them always contain the MAC for validation on the receiving side. This is a direct replacement for the X-MicroCloud-Auth header, as the initial microcloudd can set the MAC on the Authorization header when trying to run RPC requests on the peers to collect network and disk details and to form the other clusters.

The question and answer dialog is kept as is. As in option A the fingerprints of the discovered peers are shown to allow comparing those with the ones on the actual peers.

On the initial microcloudd the administrator could get prompted with an updated dialog that now requires approving the join request from the peer. This is equivalent to option A and optional as it wouldn’t really improve the overall security.

When forming all the other services' MicroClusters, the requests can be allowed automatically. For this, MicroCloud can check if the join request originates from a host that was already manually validated when forming the MicroCloud's MicroCluster.

Discovery option C2 (session level join token)


Like option C, this option keeps mDNS in order to learn about potential peers and to exchange the initial set of information. The essential difference compared to option C is the introduction of a session in which the HMAC secret is valid. This allows keeping the secret in memory until the point where the cluster forming is complete, after which the secret gets discarded on all ends.

For option C2 there isn't any need for a configuration system that allows setting and unsetting a secret in MicroCloud. Instead, the same question-and-answer dialog as in B2 can be used to either use an auto-generated random secret or let the administrator specify a different one.
On the joining side the session can be started by running the microcloud join <token> command, which will set the HMAC secret for its broadcasts.
The command blocks until the initial microcloudd instructs this peer to join the existing cluster.
From this point onwards the secret is discarded on the joining side.
If there aren't any more peers that the administrator wants to add to the MicroCloud, the initial microcloudd discards the secret too and continues on with the remainder of the configuration.

Adding additional peers works the same as in option B2.

Discovery option C3 (KDF, HMAC and mDNS)

:star2: This option is currently considered to be the most preferred one.

Discovery option C3

As with all the other options in the C* group, option C3 relies on HMAC to sign the messages exchanged between the initial microcloudd and potential peers so that each side can verify any received contents. For added security a key derivation function (KDF) is used together with a salt, which allows deriving a stronger secret when computing the message's HMAC.

The overall idea is to allow establishing an mTLS connection as soon as possible so that both ends can talk via an encrypted channel to exchange further information. Therefore the discovery ensures that both sides exchange their public keys rather quickly, so that when a new HTTPS connection gets opened up from one end to the other, we can rely on TLS to perform a proper exchange and to set up a secure session.

In case of preseed, the steps that require human interaction are skipped and only the HMAC comparison is performed on both ends. Additionally, there is no random password generated on the initial microcloudd. Instead, the password has to be generated by the administrator and injected accordingly when running either microcloud init or microcloud join.

The initial public key exchange is depicted in the next six steps, which are also shown in the graphic above.

Startup

An administrator starts the cluster forming process by reading a randomly generated password displayed by the initial microcloudd and setting it on any of the potential candidates. The password itself is a concatenation of strings which have been selected randomly from a given word list (e.g. the EFF wordlist for random passphrases). There are various approaches for the word lists. One of them might be picking a list that only contains words which have a unique three-character prefix (see example list), so that an administrator only has to type in the first three characters of each word and the remainder of the characters can be completed using auto-completion.

The length of the password is based on the number of words selected from the word list. It takes around n^k/2 guesses on average to crack the password, where n is the length of the overall word list and k the number of words chosen from the list. Picking between 4 and 6 words from a list with a length of around 5000 should be sufficient. The password is displayed on the initial microcloudd and has to be typed in on any other microcloudd that should join the cluster.

Using this password the candidate microcloudd can derive a key using a random salt of an appropriate length. We have chosen argon2, but other KDFs (HKDF, scrypt) might work too:

import (
    "crypto/rand"

    "golang.org/x/crypto/argon2"
)

// Nonce S, which is a salt for password hashing applications.
// It may have any length from 8 to 2^(32)-1 bytes;
// 16 bytes is recommended for password hashing.
// The salt must be unique for each password.
// See https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-argon2-03#section-3.1
salt := make([]byte, 16)
if _, err := rand.Read(salt); err != nil {
    return err
}

// The draft RFC recommends time=1, and memory=64*1024 is a sensible number.
// If using that amount of memory (64 MB) is not possible in some contexts,
// then the time parameter can be increased to compensate.
// The number of threads can be adjusted to the number of available CPUs.
// See https://pkg.go.dev/golang.org/x/crypto/argon2
//
// func IDKey(password, salt []byte, time, memory uint32, threads uint8, keyLen uint32) []byte
key := argon2.IDKey([]byte("some words from the password list"), salt, 1, 64*1024, 4, 32)

As each candidate microcloudd will pick another random salt, the initial microcloudd can derive the key only after receiving the first broadcast which also includes the salt. This is covered in the next section.

For the purpose of human validation, the respective local microcloudd will also print its fingerprint on startup, either when running microcloud init or microcloud join, for visual comparison on the other end.

Broadcast candidate

As both ends need to be aware of each other, the candidate microcloudd has to broadcast its intent to join an existing MicroCloud cluster. This intent (ServerInfoSigned in the next section) is sent via mDNS and the body contains at least the following information:

  • The local public key
  • The version of MicroCloud
  • The name of the node
  • The local address of the API
  • The random salt
  • The HMAC

The value of the HMAC field is created by taking the contents of ServerInfo (see the section below) and creating the MAC by using the key which was computed by the KDF.

Authenticate candidate

After receiving a broadcast from any potential candidate, the initial microcloudd first has to validate the contents of the mDNS payload. This is required to filter out candidates that don't have the same version of MicroCloud, as well as the ones that have sent a payload with an HMAC that cannot be reproduced on the receiving side. The latter have to be treated with extra care, as this could be the result of a MITM attack.

Using the salt and the random password that has been generated by the initial microcloudd during the startup, the exact same key can now be derived using the same KDF as on the candidate side. Now the HMAC over ServerInfo can be computed using the key and compared to the one that was sent over the wire as part of ServerInfoSigned. If the HMACs match, the administrator has the possibility to approve the join request from the candidate. Candidates with invalid versions or non-matching HMACs are “greyed out” and cannot be selected. A reason for this is provided in the Note column:

Scanning for eligible servers ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------+----------------+-------------+--------------+
       | NAME | IFACE  |      ADDR      | Fingerprint |     Note     |
       +------+--------+----------------+-------------+--------------+
> [x]  | m3   | enp5s0 | 10.237.170.140 | aabbccddeef |              |
       | m2   | enp5s0 | 10.237.170.61  | ff001122334 | Invalid HMAC |
       +------+--------+----------------+-------------+--------------+

After the candidate is accepted, its public key gets added to the local microcloudd’s temporary trust store (bound to its address) which allows for certificate validation of new mTLS connections during the remainder of the cluster forming. Now when opening up a new mTLS connection from the initial microcloudd to the candidate, the certificate provided from the other end has to match the one which is tracked in the local temporary trust store.

Authenticate cluster

The last step of the discovery allows the candidate to also verify that it is joining the right cluster. As the initial microcloudd already knows the address of the candidate (from the mDNS payload), a new HTTPS request is made to the API of the candidate. Like the mDNS payload from before, the request's body contains:

  • The local public key
  • The version of MicroCloud
  • The name of the node

The HMAC of the request body is sent alongside the request in the Authorization header. In the case of the mDNS broadcast, the HMAC had to be part of the message itself, as there isn't any concept of headers.
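Carrying the MAC in a header rather than the body could be sketched as follows with net/http; the hex encoding of the header value and the endpoint URL are assumptions:

```go
package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
)

// bodyMAC computes HMAC-SHA256 over the raw request body.
func bodyMAC(key, body []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// newSignedRequest builds the HTTPS request to the candidate and
// carries the MAC in the Authorization header instead of the body.
func newSignedRequest(key []byte, url string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", bodyMAC(key, body))
	return req, nil
}

// verifyMAC is what the candidate runs after reading the body.
func verifyMAC(key, body []byte, header string) bool {
	return hmac.Equal([]byte(bodyMAC(key, body)), []byte(header))
}

func main() {
	key := []byte("derived-key")
	body := []byte(`{"name":"m1","version":"1.0"}`)
	req, err := newSignedRequest(key, "https://10.60.35.113:9443/join", body)
	if err != nil {
		panic(err)
	}
	fmt.Println("verified:", verifyMAC(key, body, req.Header.Get("Authorization")))
}
```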

After receiving the request, the candidate now computes the HMAC itself using the key from before and the contents of the request's body. If the HMAC doesn't match, this might be an indication of a MITM attack. As the protocol doesn't foresee multiple clusters contacting the same candidate, such a mismatch is ignored, but an appropriate warning message is logged to the candidate's daemon log. If the version doesn't match, the request should be ignored too; the initial microcloudd should never have contacted the candidate in the first place if the versions don't match.

If both the version and HMAC matches, the administrator is asked to approve the request from the cluster:

Scanning for response ...

Would you like to join m1 (fingerprint): (yes/no) [default=yes]:

After the cluster is accepted, its public key gets added to the local microcloudd’s temporary trust store (bound to its address) which allows for certificate validation of new mTLS connections during the remainder of the cluster forming. Now if the candidate receives a new mTLS connection from the initial microcloudd, it can verify the provided public key based on the entry in its local trust store.

The response of the HTTPS request indicates a successful pairing and marks the end of the discovery/authentication protocol. This also marks the end of the session and discards the random password on each end.

Cluster forming

The initial microcloudd can now use mTLS to retrieve further information from the candidate, and both ends can validate the other side based on their temporary trust store entries. Furthermore, join tokens are created on the initial microcloudd for each of the services (LXD, MicroCeph and MicroOVN). Those tokens are then sent through the encrypted channel in order to form each service's MicroCluster.

Cleanup

During the cleanup stage both ends discard their temporary trust store, as the services' MicroClusters are formed and the trust is established in each MicroCluster's own truststore.

Daemon and API changes

This section only describes the required changes for discovery option C3.

MicroCloud

A new API extension is added that indicates the change in how MicroCloud performs the discovery/authentication.

mDNS broadcast

The mDNS payload is extended with the following information. See the previous sections for more detailed explanations:

type ServerInfoSigned struct {
    // MAC generated from HMAC(key, ServerInfo)
    HMAC string
    // The existing ServerInfo struct
    ServerInfo
}

type ServerInfo struct {
    // The current version of the mDNS broadcast/lookup format
    Version     string
    // The hostname of the node
    Name        string
    // The address of the node's MicroCloud API endpoint
    Address     string
    // The interface used on the sending side
    Interface   string
    // A list of services (e.g. LXD, MicroCeph, MicroOVN) present on this node
    Services    []types.ServiceType
    // The node's public certificate for mTLS
    Certificate string
    // The random salt
    Salt        string
}

Temporary trust store

MicroCloud will maintain a temporary trust store on both ends that gets filled with the public key of the respective peer. This temporary trust store has to be made available to MicroCluster so that the custom API endpoints of MicroCloud can use this temporary store instead.

There is an open proposal in the MicroCluster repo (#120) which makes the authentication handler public so that an importer of MicroCluster (like MicroCloud) can inject its own trust store information for every custom API endpoint. A more detailed description of the specifics can be found here.

In any case, the X-MicroCloud-Auth header is removed, as a secret is no longer being broadcast.
Requests to any peers of the MicroCloud have to be made using an mTLS connection which can be trusted on both ends using the temporary trust store.
This is only possible if both ends have successfully finished the discovery/authentication protocol.

Joining existing services

As part of #259 MicroCloud gains support for reusing existing MicroCeph and MicroOVN clusters. The process relies on one of the microcloudd instances within the existing cluster being able to create a join token on the peer, which allows joining the already existing remote cluster(s).

By using discovery option C3 this concept wouldn’t be blocked as microcloudd would continue to use the same paths of communication to reuse the existing clusters.

MicroCluster

To have as little impact as possible on other active importers of MicroCluster (e.g. MicroCeph, MicroOVN), the modifications for the temporary trust store won’t affect any of the existing setups. It’s the choice of the importer to make use of this added functionality using the temporary trust store.

CLI changes

This section only describes the required changes for discovery option C3.

MicroCloud

Join command

A new command microcloud join is added to allow a peer to join a MicroCloud.
This command is also the starting point after which the candidate's microcloudd starts to broadcast its information via mDNS.
The command prompts the administrator to enter the random password displayed by the initial microcloudd.
The command blocks until the request has been approved on both sides.

Session timeout

In addition, a new --session-timeout flag is added to both the init and join subcommands. It bounds the session's lifetime so that after the timeout both ends discard their temporary trust stores and forget the random password. Afterwards a new discovery/authentication session has to be started by running the init and join subcommands again on both ends.
The default session timeout is ten minutes.

UX

To approve the requests on both ends, the dialogs displaying the information have to be extended to allow “greying out” invalid requests, displaying the peer's fingerprint, and showing a notification in case a peer cannot be selected.

Preseed

When using preseed, both the microcloud init and microcloud join commands can load the random password from the preseed file that is already passed in via stdin.
As the actual preseed information only has to be set on the initial microcloudd using microcloud init, the joiner only loads the random password from the file passed via stdin.

Database changes

No database changes expected.

Packaging changes

No packaging changes expected.

Open points for discussion

  • Which word lists are preferred and how many words should we pick
    • there might also be license obligations tied to the list
  • What are the preferred input parameters for the KDF
    • For argon2 we can pick
      • length of the password
      • length of the salt
      • number of passes over the memory (time)
      • number of threads
      • key length
  • In case of preseed, do we need to use a KDF or can the administrator pick a long enough secret to be used directly for the HMAC
  • Are there other potential MITM scenarios we haven’t yet addressed in this design
  • Do we require a timeout after which the discovery/authentication exits automatically and discards any secrets, salts and the temporary trust store
    • How long should this timeout be (10 minutes?)

Great write-up, thanks for this!

I believe @sdeziel1 mentioned there might be some issues with keyboard-and-mouse systems and copying strings from one system to another becomes very difficult. I’m not sure if this is something that greatly affects current/future MicroCloud users.

The problem is, if we don’t manually input a secret string/join token at some point, then there’s no way to verify whether an mDNS payload actually came from a genuine system. Since we can’t trust the local network, then we must expect any bad actor can just listen for the payload and broadcast the same thing, and we could mistakenly trust the spoofed server instead.

All that said, I do like option C the best because it doesn’t break the flow of the initialization process, and the secret can be selected by the user so the keyboard-and-mouse case is less of a problem.

Thanks for jogging my memory. By those keyboard-and-mouse systems, I was referring to KVM (Keyboard, Video, Mouse) consoles sometimes present in server racks. Those give you console access to each of the servers in the rack but you cannot copy-n-paste between them. This means we should aim for easily (repeatedly) typed input.

(Still not done reading this spec so more feedback to come later).

To add on to what I was thinking for option C, maybe microcloud init could look a bit like this?


   Scanning for eligible servers ...
   Please enter the following on any systems you want to join the cluster.

     microcloud cluster verify adjective-noun

   Space to select; enter to confirm; type to filter results.
   Up/down to move; right to select all verified; left to select none.
          +---------+--------+---------------+------------+
          |  NAME   | IFACE  |     ADDR      |   STATUS   |
          +---------+--------+---------------+------------+
   > [x]  | micro3  | enp5s0 | 203.0.113.171 |  verified  |
     [x]  | micro4  | enp5s0 | 203.0.113.172 |  verified  |
     [ ]  | micro2  | enp5s0 | 203.0.113.170 | unverified |
     [ ]  | micro5  | enp5s0 | 203.0.113.173 | unverified |
          +---------+--------+---------------+------------+

So how this would work is like follows:

  • when all MicroCloud daemons start, they continuously listen for mDNS payloads
  • first node runs microcloud init and broadcasts that it is looking to form a cluster
  • when other nodes receive this payload, they then broadcast basic information (the same info as today, but without the X-MicroCloud-Auth secret included).
  • the first node now consumes the minimal payloads from the other systems. At the same time, it generates a human-readable secret that must be entered on every joining system before we can proceed by running microcloud cluster verify <secret>. This secret is only displayed locally on the first node.
  • when other nodes run microcloud cluster verify <secret>, they change the payload they are broadcasting to instead be hashed with the secret, and include any other sensitive information that we need to set up the Authorization request header for requests going from the first node to joiners. The second “sensitive” payload can actually be sent directly via HTTP back to the first node as well, so we don’t even need to broadcast the hashed payload over the local network.

The table of systems that the first node finds from mDNS lookup will have a column STATUS that reports the verification status of any particular node. If it is still broadcasting the raw minimal payload, it will be considered unverified. If it’s broadcasting a payload that is hashed with the secret, it will be considered verified. The table will only allow selecting verified systems.

This way, KVM systems like @sdeziel1 mentioned can easily input the verification as it is consistent and human-readable, and the user doesn’t expose any sensitive information openly over the local network. As well, the user gets immediate feedback on the first node about when it can actually proceed with the initialization.


For the preseed, you’ve mentioned preloading the certificates on each system but that is itself a form of user interaction per system so I’m not sure if it makes a difference if we just use microcloud cluster verify <secret> on each system prior to running the preseed. I’m not sure if we need two separate verification mechanisms here.


Using another API endpoint

To prevent long-running requests, the existing public POST /cluster/1.0/cluster endpoint could return right after validating the token and marking the new peer as pending.
By adding a new GET /cluster/1.0/cluster/{member} endpoint, the joining side can poll regularly until the join request has been approved by an administrator.

For a bit of background on the PENDING cluster status in microcluster, that actually means that at least some nodes in the cluster will not yet accept requests originating from the PENDING node, as they do not yet have a truststore entry, which is used for API authentication. At the moment, newly joined nodes will remain pending until the next heartbeat synchronizes the truststore across all nodes. However, due to go-dqlite issues causing extremely resource-intensive heartbeats, the heartbeats occur over very long intervals. I have been working on instantly distributing truststore entries to all nodes in the cluster when a new node joins to work around the heartbeat issue, so that might pose some problems for this approach.


So some key points that we need to address:

  • Is one “secret” per initialization process enough, or should we have a unique “secret” per joining node?

  • How long should the “secret” live? The proposal in the spec involves generating the “secret” before calling microcloud init, but if instead we automatically generate it when running microcloud init then we can control its lifecycle more thoroughly. If it’s generated by running microcloud init, then it should expire if the initialization is cancelled, but how should we inform other nodes that the initialization was cancelled, assuming they have already been verified by the user?

Thank you Julian for the spec. Option C seems to me the most viable as well. We should find a solution that doesn’t exclude automation, since for large clusters the administrator cannot add manually all nodes one by one. For option C I have a few comments:

  1. Additional information is needed on how the secret is stored securely on the nodes with microcloud config set secret
  2. Some expiration for the secret is needed; it cannot last forever, and it is also the same for all the nodes of the cluster. If a node later needs to be added to the cluster, a different secret should be used and not the initial one

For Option A, I don't believe the spec as it currently stands explains how one can manually verify that the cert fingerprint of a joining node matches the expected one. We would need a way for the joining node to locally display its fingerprint, right?

Option B sounds similar to LXD’s long-lived trust password model, which allows setting a shared password that lets joining members add their certificate to the cluster’s trust store.
It is not recommended anymore; short-lived per-member join tokens were added because leaking the shared password presents a security issue.

https://documentation.ubuntu.com/lxd/en/latest/authentication/#authentication-trust-pw

The difference is that the existing cluster has to verify the joiner in this case, whereas that doesn’t need to occur with LXD’s trust password model.

This is similar to option B, except that apparently this secret is long-lived whereas in option B it was only valid during the microcloud init call and wasn’t persisted to the database. Is this correct?

This option seems even more similar to LXD’s not-recommended long-lived shared join password approach, except it does still require confirmation from the existing cluster.

@maria-seralessandri agreed, but as I understand it, there are a couple of “flavours” of automation available to us.

  1. Retain the option for MicroCloud to deploy itself from pre-seed files without interaction. This may not be possible given the requirement for confirming each side of the join process.
  2. Arbitrated deployment by way of something like Juju. @sdeziel1 mentioned the other day that if MicroCloud itself cannot deploy itself in an automated manner, it may still be possible to provide automated deployments if using something like Juju which can replicate the manual verification steps required.

If I understood it right your suggestion adds the following on top of C:

  • The initial MicroCloud daemon broadcasts its intent to form a cluster. This will cause the peers to start broadcasting
  • The peers respond either with a “plain” or “hashed” broadcast depending on whether the administrator already executed the microcloud cluster verify <secret> command on the peer to set the secret
  • The initial MicroCloud marks a peer as verified (trusted) if it receives a hashed broadcast

What would be the benefit of letting the initial MicroCloud daemon tell its local network that it wants to form a cluster if an admin anyway has to go to each of the nodes to enter the secret?
The only point I can think of is letting the peer validate that it doesn’t join a malicious cluster. But we already have this functionality in MicroCluster, as the token contains the cluster’s fingerprint, which can be validated by the joining side.

I like the idea of generating the secret in MicroCloud directly so that we have more control over it. When installing the snap on each of the nodes the secret will of course be different, but with this approach it only needs to be changed on the joining side and can be left untouched on the initial MicroCloud, which reduces the number of required steps.

That is a good point. The certificate gets created automatically in MicroCluster’s state directory but AFAIK there is currently no straightforward command to retrieve this information on the joining side.

@jpelizaeus

@masnax and I were discussing in our 1:1 the possibility of avoiding needing microcloud config set core.secret xyz and persisting the secret to disk.

And instead having microcloud init generate a per-invocation secret, which would then be passed as an argument (or entered interactively) to a microcloud join <secret> command that would block until the join completed. Joining members would only start broadcasting while microcloud join was running.

And when the microcloud [init|join] commands end, they would forget their invocation secret.

Thanks, I see, so that would be option B without a persistent token? As today, the token/secret could contain the join information (address), which would make mDNS obsolete. And the join request from the peer to the initial MicroCloud daemon could then already be an HTTPS request using the secret/HMAC for verification purposes.

What would be the benefit of letting the initial MicroCloud daemon tell its local network that it wants to form a cluster if an admin anyway has to go to each of the nodes to enter the secret?

The initial MicroCloud still needs to somehow become aware of the joining nodes, and the joining nodes need to become aware of the initial node. Otherwise we would have to manually type in addresses as part of microcloud cluster verify <secret> <init node's address>.

Today, as soon as a MicroCloud snap is installed, it begins advertising its address and services over the local network. I’m proposing making this less noisy by instead making all nodes listen by default, and only trigger the advertisement after microcloud init has been executed somewhere on the local network.

Without this, all nodes (including the initial node, because it didn’t have to be the initial node) would have to perpetually broadcast their intent anyway, or we drop mDNS entirely and the user specifies the init node’s address directly to each joiner.

By keeping mDNS, we can still maintain some determination of compatibility of all nodes when running microcloud init, before going to each node and verifying them. We can see right away which nodes can even join the cluster, or have the same services, and we don’t have to log into each one first.

I see what you mean. But if we use a secret/join token that embeds the address of the MicroCloud (as is currently done for MicroCluster), the administrator doesn’t have to manually type in the address on the joining side and the MicroCloud doesn’t need to broadcast it to potential peers.
However, this collides with what @sdeziel1 wrote in regards to the KVM consoles, as such a secret wouldn’t be easy to type in these environments.

In regards to the service discovery, we can potentially perform this as part of the initial join request from the peer to the MicroCloud by extending the request body.

@tomp @masnax I have extended the spec with options B2 and C2 to address your feedback on using short lived secrets within a so called session so that we don’t have to persist anything to disk.

If we consider the LAN to be untrusted/hostile, we then need a solution that is resistant to MiTM. This problem space has years of research and many failed attempts along the way so I think we should use something tried and true. Here are some existing solutions I’m aware of:

The Bluetooth one seems particularly attractive for securing an unprotected mDNS conversation, doing all the heavy lifting of copying certs and keys around.

So I hope I’ve got the flow correct here:

B2 interactive setup:

  • First node runs microcloud init

    • A token including the secret and the address of the init node is generated

    • The init node waits for joiners to contact it. The user chooses when to continue through the setup.

  • All nodes must run microcloud join <token>

    • The joiner reaches out to the init node over HTTPS, with info about joining the cluster.
  • Through the setup, until the nodes are clustered, they are trusted by the init node using the token.

Potential Issues with B2

  • Cumbersome for KVM setups due to having to copy the encoded secret to each joiner.

  • If microcloud init is aborted, the joiners will continue to perpetually trust the invalid token. But if the tokens have an expiry, they can expire before the user has finished the interactive setup. We would need to poll the init node from the joiners regularly.

  • We need some way to handle mismatches of installed services on each node. Currently, MicroCloud’s mDNS record will filter out any nodes that don’t have the same set of services, or offer the user the option to skip that service. This would have to be implemented on each joiner instead when we call microcloud join <token>. Or we would have to be stricter about which service combinations are required.

C2 interactive setup:

  • The first node runs microcloud init

    • It broadcasts its intent to form a cluster over mDNS.

    • A plaintext, human readable password is generated.

    • The init node begins looking up eligible systems over mDNS and displays them to the user.

  • The joiners enter microcloud join <password>

    • Each joiner generates a payload authenticated with an HMAC keyed by the password, and broadcasts it over mDNS.
  • The init node receives the mDNS payloads, and verifies them with the password.

    • The user makes a final confirmation of which nodes they want in the cluster
  • Through the setup, until the nodes are clustered, they are trusted by the init node using the password.

Potential issues with C2

  • C2 actually handles the issues from B2 rather well:

    • The passwords are human-readable so KVMs have a viable solution.

    • Because the init node is broadcasting its intent to form a cluster, the password will only be trusted as long as the broadcast is ongoing.

    • Because we have some minimal information from each node prior to running microcloud join <password>, it’s easier to spot issues and config mismatches before logging into every joiner.

  • Biggest issue I see is that it includes mDNS so it’s a more complex system than B2

Preseed

  • In both B2 and C2, we would have to make some compromises for preseed authentication:
    • In the case of B2, it’s not enough for the user to specify the secret because the init node’s address and available services must also be encoded. This means the user will have to run microcloud init --preseed first, which will then print out the token that the user must use in microcloud cluster join <token> on each joiner.

    • For C2, we could either do the same as above, or the user can supply their own password directly in the preseed file.

The EFF publishes a few lists of words that are easy and quick to memorize/type/autocomplete. Those are meant to be used in Diceware-type passwords, but I think they would make a good basis for the “authenticated exchange” PSK validation we intend on doing.

https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases

The short word list sounds interesting and should be easily embedded into the daemon.

In Bluetooth SSP those verification numbers are actually derived from a hash computed on both ends, based on information exchanged during the pairing.

In the case of the “Numeric Comparison protocol” (check section 7.2.1 of https://www.bluetooth.com/wp-content/uploads/Files/Specification/HTML/Core-54/out/en/br-edr-controller/security-specification.html#UUID-045cba38-3e1c-51b9-a02f-75356c6829c1), the numeric verification value is created by taking the last 32 bits of the hash function g(...)'s output (see section 7.7.2 in the same link) modulo 10⁶, to always get six digits.

I like the idea but it’s not an arbitrary string. Instead it’s based upon information provided by both ends with enough randomness.
