Secure cluster join in MicroCloud

Project LXD
Status Drafting
Author(s) @jpelizaeus
Approver(s)
Release
Internal ID LX073

Abstract

Allow all members of a MicroCloud to securely join the cluster. Give both the joining member and the cluster the ability to verify their peer,
and do not transfer critical information like API secrets across the network.

Rationale

In its current version, MicroCloud offers a convenient approach to bootstrapping clusters built using LXD, MicroCeph and MicroOVN. By controlling the entire process, MicroCloud can generate join tokens for each of the services and distribute them across the cluster members. This allows forming the additional clusters for each of the services without further manual intervention.

As MicroCloud itself offers many additional settings to configure its behavior, support for a preseed configuration file has been added. This allows an administrator to skip the setup dialog and apply the configuration for an entire MicroCloud in one step.

Having this one-step configuration mechanism requires MicroCloud to make decisions on the administrator's behalf. One of them is to accept additional cluster members that have been selected or configured (using preseed) by the administrator. As there is no additional step involved to ensure the integrity of either one of the joining peers, this can lead to security risks, because the network is currently considered a trusted party.

Specification

This specification supersedes the existing cluster join mechanism with proactive tasks that have to be executed on each cluster member individually to ensure integrity.
Furthermore, the necessary modifications to MicroCluster are mapped out, to separate the changes specifically required for MicroCloud from the ones that can be added to MicroCluster directly. This helps downstream importers of MicroCluster (e.g. MicroCeph, MicroOVN and MicroK8s) to also make use of the updated join mechanism.

Existing Mechanism

In the latest release of MicroCloud, the cluster join mechanism depends largely on mDNS to discover peers, share relevant connection details and bootstrap the final cluster. Therefore, on each node of the cluster a microcloudd daemon is running that both broadcasts its own set of details onto the network and receives details sent by others in the same network.
This ultimately allows constructing a picture of the available resources, so that MicroCloud can offer the administrator a straightforward question-and-answer dialog:

[Figure: MicroCloud mDNS]

The details broadcasted on the network by each of the daemons consist of the following:

  • The current version of the mDNS broadcast/lookup format (currently 1.0)
  • The hostname of the node
  • The address of the node’s MicroCloud API endpoint
  • The node's network interface over which the broadcast was sent
  • A list of services (MicroCloud, LXD, MicroCeph and MicroOVN) present on this node
  • An authentication secret that can be used to access this node's API endpoint using the X-MicroCloud-Auth header

Using this information, the initial microcloudd requests further details on network interfaces and storage disks from the local and peer LXD servers for later selection by the administrator.
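
For illustration, the broadcast payload can be thought of as the following struct. The field names below are assumptions derived from the list above, not the actual source; note the secret, which the updated mechanism later removes:

type ServerInfo struct {
    // The current version of the mDNS broadcast/lookup format
    Version    string
    // The hostname of the node
    Name       string
    // The address of the node's MicroCloud API endpoint
    Address    string
    // The interface used on the sending side
    Interface  string
    // A list of services present on this node
    Services   []types.ServiceType
    // The secret used for the X-MicroCloud-Auth header (assumed field name)
    AuthSecret string
}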

Discovery

At the beginning, MicroCloud requires three independent microcloudd daemons to be running on a shared network. As a node running microcloudd could potentially have more than a single network interface, each microcloudd broadcasts its details on every network interface of the underlying node that has an IP address configured.
An administrator will then pick one of the microcloudd daemons to be the initial one used to bootstrap the MicroCloud. It's the responsibility of this microcloudd to listen only to broadcasts on the network(s) selected by the administrator:

Select an address for MicroCloud's internal traffic:
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +----------------------------------------+--------+
       |                ADDRESS                 | IFACE  |
       +----------------------------------------+--------+
> [x]  | 10.237.170.93                          | enp5s0 |
  [ ]  | fd42:e287:8e5c:b221:216:3eff:fe6f:45cb | enp5s0 |
       +----------------------------------------+--------+
...
Limit search for other MicroCloud servers to 10.237.170.93/24? (yes/no) [default=yes]:

After selecting the network(s) on which MicroCloud should discover any potential peers, MicroCloud prompts the user with a list of peers that have been discovered on the respective network interface:

Scanning for eligible servers ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------+----------------+
       | NAME | IFACE  |      ADDR      |
       +------+--------+----------------+
> [x]  | m3   | enp5s0 | 10.237.170.140 |
  [x]  | m2   | enp5s0 | 10.237.170.61  |
       +------+--------+----------------+

After this, the administrator is guided through multiple dialogs allowing further configuration of the final MicroCloud with regard to storage and networking.

As a last step, MicroCloud instructs its peers to form a cluster for each of the services (MicroCloud, LXD, MicroCeph and MicroOVN). To allow access from the initial MicroCloud node to every other node in the cluster, each node has broadcast an API secret that is now used to invoke RPC requests on the cluster's nodes by setting the X-MicroCloud-Auth header.
As part of those requests, MicroCloud initiates the cluster forming process for the various services.
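
For illustration, the way the initial node authenticates such an RPC request today could be sketched as follows; the endpoint path and helper shape are assumptions, only the header name is taken from the text above:

import (
    "bytes"
    "net/http"
)

// callPeer sketches an RPC request from the initial node to a peer. The
// authSecret was previously received via the peer's mDNS broadcast.
func callPeer(client *http.Client, peerURL, authSecret string, body []byte) (*http.Response, error) {
    // Placeholder path; the real MicroCloud API routes may differ.
    req, err := http.NewRequest(http.MethodPost, peerURL+"/1.0/services", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }

    // The broadcasted secret authenticates the request on the peer.
    req.Header.Set("X-MicroCloud-Auth", authSecret)

    return client.Do(req)
}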

Cluster forming

MicroCloud uses MicroCluster under the hood to form a cluster of nodes. The mechanism on the MicroCluster side currently relies on the join token being kept secret. In addition, a node joining with this secret is automatically trusted to be the one for which the secret was created.
The joining node, however, checks whether the fingerprint of the cluster's certificate (embedded in the token) matches the one returned by the cluster API when issuing the join request:

[Figure: MicroCluster forming]

The response to the join request contains the cluster's certificate and key and a list of certificates from all the nodes that have already joined the cluster. This list is used to extend the truststore of the newly joined node.
The node's own certificate is added to the truststore automatically.

After adding the other peers' certificates to its truststore, the node starts its API and joins the existing dqlite cluster
using mutual TLS with the certificates obtained during the join process.
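
The resulting trust model can be illustrated with a simplified TLS configuration; this is a sketch of the general idea, not MicroCluster's actual implementation (which performs its own certificate verification):

import (
    "crypto/tls"
    "crypto/x509"
)

// mutualTLSConfig sketches the post-join trust model: present the node's own
// keypair and only accept peers whose certificates ended up in the truststore
// during the join process.
func mutualTLSConfig(cert tls.Certificate, truststore *x509.CertPool) *tls.Config {
    return &tls.Config{
        Certificates: []tls.Certificate{cert},
        ClientAuth:   tls.RequireAndVerifyClientCert,
        ClientCAs:    truststore, // verify incoming peers
        RootCAs:      truststore, // verify the remote side when dialing
    }
}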

Updated Mechanism

As the current mechanism fully trusts the local LAN, every microcloudd broadcasts an authentication secret and trusts the broadcasts received from other peers in the network.
This can lead to man-in-the-middle attacks, as the broadcasts themselves aren't protected and could potentially be read and modified by somebody sitting in the same network.

By removing the authentication secret from the broadcast message, a microcloudd can no longer talk to its peers, as there is no longer any trust relationship. This breaks the disk and network discovery as well as the final cluster forming, as they currently make use of this secret.

Instead, this communication could already make use of the mutual TLS that currently gets established during the final cluster forming. By moving the exchange of trust to right after the discovery of peers and replacing the implicit trust with a proactive human verification option, it can be ensured
that the nodes in the cluster are the ones they pretend to be over mDNS.
Instead of forming the MicroCloud's MicroCluster at the end, this is already done as part of the discovery, so that the established mutual TLS can be used for the follow-up tasks: discovering the required information and forming the remaining clusters for LXD, MicroCeph and MicroOVN.

Another option could be extending the existing mDNS protocol with an HMAC, to avoid broadcasting a secret across the network while still allowing the sender and receiver to trust each other.

Discovery option A (reduced mDNS)

[Figure: MicroCluster discovery option A]

Every microcloudd broadcasts a reduced list of details on the network including its own certificate’s fingerprint:

type ServerInfo struct {
    // The current version of the mDNS broadcast/lookup format
    Version     string
    // The hostname of the node
    Name        string
    // The address of the node's MicroCloud API endpoint
    Address     string
    // The interface used on the sending side
    Interface   string
    // A list of services (e.g. LXD, MicroCeph, MicroOVN) present on this node
    Services    []types.ServiceType
    // The node's certificate fingerprint
    Fingerprint string
}

The initial questions to pick a network for discovery are kept as is:

Select an address for MicroCloud's internal traffic:
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +----------------------------------------+--------+
       |                ADDRESS                 | IFACE  |
       +----------------------------------------+--------+
> [x]  | 10.60.35.134                           | enp5s0 |
  [ ]  | fd42:8a02:2309:cea5:216:3eff:fe3e:e5c0 | enp5s0 |
       +----------------------------------------+--------+
...
Limit search for other MicroCloud servers to 10.60.35.134/24? (yes/no) [default=yes]:

The following dialog gets updated, as microcloudd now has to create join tokens for each of the peers that have been discovered. As such tokens should only be created for legitimate peers, the administrator can now check the provided fingerprints for validity and compare them against the actual certificate of the respective peer:

Scanning for eligible servers ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------+--------------+-------------+
       | NAME | IFACE  |     ADDR     | Fingerprint |
       +------+--------+--------------+-------------+
> [x]  | m3   | enp5s0 | 10.60.35.113 | aabbccdd... |
  [ ]  | m2   | enp5s0 | 10.60.35.242 | eeff0011... |
       +------+--------+--------------+-------------+

Based on the selection microcloudd creates a join token for each of them.

To allow joining the peer interactively, the dialog gets updated with the join command the administrator has to run on the peer:

The join token for m3 (aabbccdd...) has been created.
Run the following command on m3 to join this MicroCloud:

microcloud join aabbccddeeff001122334455...

As the initial microcloudd can no longer run RPC requests using the authentication secret, the administrator now has to enter the peer and run the new microcloud join command:

microcloud join [token]

The token itself contains the address of the initial microcloudd's API endpoint to initiate the join process. As there isn't yet any trust relationship from the joining peer to the initial microcloudd, the certificate of the remote side cannot be validated against the certificates in the peer's truststore. Since the token contains the fingerprint of the originating microcloudd, the joining peer can compare the fingerprints to make sure it doesn't join a malicious MicroCloud.
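
For illustration, the check on the joining side could look like the sketch below. The token layout (base64-encoded JSON carrying address, fingerprint and secret) is an assumption for this example, not MicroCluster's actual wire format:

import (
    "crypto/sha256"
    "crypto/x509"
    "encoding/base64"
    "encoding/hex"
    "encoding/json"
    "fmt"
)

// joinToken is an assumed token layout for illustration only.
type joinToken struct {
    Address     string `json:"address"`     // API endpoint of the initial microcloudd
    Fingerprint string `json:"fingerprint"` // fingerprint of the initial node's certificate
    Secret      string `json:"secret"`
}

// verifyInitiator decodes the token and compares the embedded fingerprint
// against the certificate actually presented by the remote side.
func verifyInitiator(token string, remoteCert *x509.Certificate) (*joinToken, error) {
    raw, err := base64.StdEncoding.DecodeString(token)
    if err != nil {
        return nil, err
    }

    t := joinToken{}
    if err := json.Unmarshal(raw, &t); err != nil {
        return nil, err
    }

    sum := sha256.Sum256(remoteCert.Raw)
    if hex.EncodeToString(sum[:]) != t.Fingerprint {
        return nil, fmt.Errorf("remote certificate fingerprint mismatch")
    }

    return &t, nil
}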

On the initial microcloudd, the administrator could be prompted with an updated dialog that now requires approving the join request from the peer:

Allow m3 (aabbccdd...) to join the MicroCloud? (yes/no) [default=yes]:

As checking the fingerprint could already have been done when selecting the peer after the mDNS discovery, this step should be considered optional, as it doesn't provide any additional security.

If the peer is accepted, the initial microcloudd joins this node into the underlying MicroCluster to establish mutual TLS. Using this trust, the microcloudd can request further information from the peer and initiate the forming of the other clusters used for LXD, MicroCeph and MicroOVN.

Afterwards the dialog again gets updated to the initial screen displaying the remaining peers to repeat the process:

Scanning for eligible servers ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------+--------------+-------------+
       | NAME | IFACE  |     ADDR     | Fingerprint |
       +------+--------+--------------+-------------+
> [x]  | m2   | enp5s0 | 10.60.35.242 | eeff0011... |
       +------+--------+--------------+-------------+

When adding a new peer later the exact same process is repeated.

Discovery option B (without mDNS)

[Figure: MicroCluster discovery option B]

This option works entirely without mDNS, as the decision which peer should join the MicroCloud is fully up to the administrator.
Instead, the administrator is asked to provide a general join token. This token contains the address of the initial microcloudd alongside the secret. This ensures that an administrator doesn't have to enter the address of the already existing cluster when joining a peer.

By default, if no secret is provided, the initial microcloudd will create a randomly generated secret that can be used in the
subsequent steps to join all of the peers. There isn't a specific secret for each peer:

MicroCloud generates a secure join secret by default.
Would you like to use the default join secret? (yes/no) [default=yes]: no

If the administrator selects no, the next question will ask to enter the join secret:

Specify a general join secret:
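
If the administrator keeps the default, the auto-generated secret could be produced roughly like the following sketch; the length and hex encoding are illustrative assumptions, not the actual implementation:

import (
    "crypto/rand"
    "encoding/hex"
)

// generateJoinSecret sketches how the default secret could be produced when
// the administrator keeps the auto-generated one.
func generateJoinSecret() (string, error) {
    buf := make([]byte, 32)
    if _, err := rand.Read(buf); err != nil {
        return "", err
    }

    return hex.EncodeToString(buf), nil
}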

The next dialog will then show the same join command as for option A. Note that this is a general join command that can be reused multiple times for each of the peers that the administrator wants to add to the MicroCloud:

The join token has been created.
Run the following command on all the nodes to join this MicroCloud:

microcloud join aabbccddeeff001122334455...

Press any key to continue...

As in option A the administrator now has to enter the peer(s) and run the new microcloud join command:

microcloud join [token]

The command blocks until the request has been approved on the cluster side.

After pressing any key on the initial node, the dialog gets updated with a table listing all pending cluster join requests so that the administrator can allow access for each of them. The table updates regularly to reflect the latest list of join requests:

Scanning join requests ...
Space to select; enter to confirm; type to filter results.
Up/down to move; right to select all; left to select none.
       +------+--------------+-------------+
       | NAME |     ADDR     | Fingerprint |
       +------+--------------+-------------+
> [x]  | m1   | 10.60.35.241 | aabbccdd... |
  [ ]  | m2   | 10.60.35.242 | eeff0011... |
  [x]  | m3   | 10.60.35.243 | 22334455... |
       +------+--------------+-------------+

Based on the provided fingerprint, the administrator can verify that the peer is the actual one they want to join into the cluster.
Another approach would be to use the same dialog updates as in option A, prompting for each new peer as soon as the join request comes in on the initial microcloudd.

Discovery option B2 (session level join token)

⭐ This option (alongside C2) is currently considered to be the most preferred one.

Like option B, this one works entirely without requiring mDNS. The essential difference compared to option B is the introduction of a session in which the join token is valid. This allows keeping the secret in memory until the cluster forming is complete, after which the secret gets discarded on all ends.

A new session is started by the initial microcloudd after running the microcloud init command. As in option B, by default a random secret will be generated, but the administrator can also provide a custom one.
The joining side learns about the secret after initiating the join process using microcloud join <token>. The command blocks until the request has been approved, and the local microcloudd will discard the secret after having successfully joined the cluster.
If there aren't any more peers that the administrator wants to add to the MicroCloud, the initial microcloudd discards the secret too and continues with the remainder of the configuration.

When adding additional peers after the initial cluster forming, the microcloud add command runs through the same secret dialog again, asking the administrator to either provide a new secret manually or to use a randomly generated one.
The process on the peer is the same as described above.

Discovery option C (HMAC and mDNS)

[Figure: MicroCluster discovery option C]

In this option mDNS is kept, and the message which gets broadcasted is stripped of the secret and extended with a hash-based message authentication code (HMAC). This MAC allows the receiving side to validate the broadcast message, as it can compute the same MAC using the given message and a shared secret that is only known to trusted peers of the MicroCloud:

type ServerInfoHMAC struct {
    // MAC generated from HMAC(secret, Info)
    HMAC string
    // The existing ServerInfo struct
    Info ServerInfo
}

type ServerInfo struct {
    // The current version of the mDNS broadcast/lookup format
    Version     string
    // The hostname of the node
    Name        string
    // The address of the node's MicroCloud API endpoint
    Address     string
    // The interface used on the sending side
    Interface   string
    // A list of services (e.g. LXD, MicroCeph, MicroOVN) present on this node
    Services    []types.ServiceType
    // The node's certificate fingerprint
    Fingerprint string
}

In order to make use of such a MAC, every microcloudd has to be made aware of the shared secret. Therefore another command, microcloud config set core.secret xyz, is added, which has to be executed on each of the peers that will form the MicroCloud cluster.

Using this secret, both sides can validate each other, as messages exchanged between them always contain the MAC for validation on the receiving side. This is a direct replacement for the X-MicroCloud-Auth header, as the initial microcloudd can set the MAC on the Authorization header when running RPC requests on the peers to collect network and disk details and to form the other clusters.
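
To make the mechanism concrete, a sketch of computing and verifying such a MAC over the broadcast payload is shown below; treating JSON as the canonical encoding is an assumption for this example:

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// computeHMAC serializes the ServerInfo payload and authenticates it with
// the shared secret.
func computeHMAC(secret string, info ServerInfo) (string, error) {
    payload, err := json.Marshal(info)
    if err != nil {
        return "", err
    }

    mac := hmac.New(sha256.New, []byte(secret))
    mac.Write(payload)
    return hex.EncodeToString(mac.Sum(nil)), nil
}

// verifyHMAC recomputes the MAC on the receiving side and compares it in
// constant time, as required for MAC verification.
func verifyHMAC(secret string, msg ServerInfoHMAC) (bool, error) {
    expected, err := computeHMAC(secret, msg.Info)
    if err != nil {
        return false, err
    }

    return hmac.Equal([]byte(expected), []byte(msg.HMAC)), nil
}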

The question-and-answer dialog is kept as is. As in option A, the fingerprints of the discovered peers are shown to allow comparing them with the ones on the actual peers.

On the initial microcloudd, the administrator could be prompted with an updated dialog that requires approving the join request from the peer. This is equivalent to option A and optional, as it wouldn't really improve the overall security.

When forming all the other services' MicroClusters, the requests can be approved automatically. For this, MicroCloud can check whether the join request originates from a host that has already been manually validated when forming the MicroCloud's MicroCluster.

Discovery option C2 (session level join token)

⭐ This option (alongside B2) is currently considered to be the most preferred one.

Like option C, this one keeps mDNS in order to learn about potential peers and to exchange the initial set of information. The essential difference compared to option C is the introduction of a session in which the HMAC secret is valid. This allows keeping the secret in memory until the cluster forming is complete, after which the secret gets discarded on all ends.

For option C2 there isn't any need for a configuration system that allows setting and unsetting a secret in MicroCloud. Instead, the same question-and-answer dialog as in B2 can be used to either use an auto-generated random secret or let the administrator specify a different one.
On the joining side, the session is started by running the microcloud join <token> command, which sets the HMAC secret for the node's broadcasts.
The command blocks until the initial microcloudd instructs this peer to join the existing cluster.
At this point the secret is discarded on the joining side.
If there aren't any more peers that the administrator wants to add to the MicroCloud, the initial microcloudd discards the secret too and continues with the remainder of the configuration.

Adding additional peers works the same as in option B2.

Daemon and API changes

MicroCloud

For both options A and B, the forming of the MicroCloud's MicroCluster is already done as part of the discovery. This allows setting up mutual TLS across the cluster's peers right from the beginning. This trust relationship is then used to request hardware details from all of the peers and to initiate RPC requests to form the remainder of the clusters (LXD, MicroCeph and MicroOVN). This has the drawback that errors during the question-and-answer dialog will require a reset of the cluster to start from scratch.

For discovery option B, microcloudd no longer requires mDNS to exchange information, so those bits will be removed from the code.

In any case, the X-MicroCloud-Auth header is removed, as a secret is no longer being broadcast.
Requests to any peers of the MicroCloud have to be made using the authentication methods described earlier.

Service cluster forming

With the changes proposed for the discovery, MicroCloud performs human validation of the peers that want to join the cluster. This information is cached so that it can be used in a later step when MicroCloud forms the actual MicroClusters.

The cached information can only be used to approve incoming join requests when forming the MicroCloud's MicroCluster. Follow-up cluster creations (for the other services) are auto-approved if they originate from the same host. This can be checked by having the peer's MicroCloud pass its own certificate in the cluster join requests of the services.

When running MicroCluster's JoinCluster() on the joining peer using the provided token, MicroCloud passes its own MicroCluster's certificate (or fingerprint) in the function's initConfig map[string]string parameter so that the receiving end can check whether the certificate matches the one that was trusted earlier as part of the human validation.
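
A sketch of what this could look like on the joining side follows; the JoinCluster signature and the "fingerprint" key are assumptions for illustration, not MicroCluster's actual API:

import (
    "context"
    "crypto/sha256"
    "crypto/x509"
    "encoding/hex"
)

// joinWithFingerprint passes the node's own certificate fingerprint through
// the initConfig parameter so the receiving end can match it against the
// certificate trusted during human validation. Signature assumed.
func joinWithFingerprint(ctx context.Context, m *MicroCluster, name, address, token string, ownCert *x509.Certificate) error {
    sum := sha256.Sum256(ownCert.Raw)

    return m.JoinCluster(ctx, name, address, token, map[string]string{
        "fingerprint": hex.EncodeToString(sum[:]), // assumed key name
    })
}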

The required logic can be implemented by downstream projects like MicroCloud using the new functions shown in the next section.

Joining existing services

As part of #259, MicroCloud grew support to reuse existing MicroCeph and MicroOVN clusters. The process relies on one of the microcloudd daemons within the existing cluster being able to create a join token on the peer that allows joining the already existing remote cluster(s).

Discovery option C wouldn't block this concept, as microcloudd would continue to use the same paths of communication to reuse the existing clusters.

MicroCluster

In order to address our use case (making the overall MicroCloud join process more secure) and additional requests from other teams (e.g. #96), MicroCluster has to grow support for validating cluster join requests on the receiving (existing cluster) side as well.

By doing the heavy lifting on MicroCluster and exporting the new functionality, we can make use of it ourselves in MicroCloud and can additionally offer the same concepts to other projects like MicroK8s.

The workflow of joining a MicroCluster is kept up to the point where the existing cluster side has to proactively grant the join request. A downstream
project (like MicroCloud) should be able to intercept the authentication to forward the decision making to the administrator.
There are two options for how this could be achieved, shown in the next sections.

Using another hook

Add another hook called OnClusterJoin that is triggered when the MicroCluster receives a new join request on
its public POST /cluster/1.0/cluster endpoint.
This new hook receives the ingress request and has to return either true or false to indicate whether the join request is allowed:

type Hooks struct {
    ...

    OnClusterJoin func(s *state.State, r *http.Request) (bool, error)
}

A downstream project could then inject any type of logic that enables human interaction to approve the join request.
However, this might lead to long-running requests, as it depends on how fast the hook returns its decision.
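
A downstream wiring of the hook could then look roughly like this; promptAdmin and peerFingerprint are hypothetical helpers standing in for MicroCloud's dialog logic, not part of any existing API:

// Sketch only: promptAdmin and peerFingerprint are hypothetical helpers.
hooks := Hooks{
    OnClusterJoin: func(s *state.State, r *http.Request) (bool, error) {
        fingerprint, err := peerFingerprint(r)
        if err != nil {
            return false, err
        }

        // Blocks until the administrator answers; the join request stays
        // open for as long as the decision takes.
        return promptAdmin(fingerprint), nil
    },
}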

Using another API endpoint

To prevent long-running requests, the existing public POST /cluster/1.0/cluster endpoint could return right after validating the token and marking the new peer as pending.
By adding a new GET /cluster/1.0/cluster/{member} endpoint, the joining side can poll regularly until the join request has been allowed by an administrator.

This endpoint would essentially return what is currently returned by POST /cluster/1.0/cluster after successfully joining the cluster:

  • The cluster’s key pair and
  • a list of the already existing cluster members including their certificates

Instead of TokenResponse the response could be of type ClusterResponse embedding ClusterMemberLocal which should have an additional status field:

type ClusterResponse struct {
    ClusterMemberLocal
    ClusterCert    types.X509Certificate `json:"cluster_cert" yaml:"cluster_cert"`
    ClusterKey     string                `json:"cluster_key" yaml:"cluster_key"`
    ClusterMembers []ClusterMemberLocal  `json:"cluster_members" yaml:"cluster_members"`
}

type ClusterMemberLocal struct {
    Name        string                `json:"name" yaml:"name"`
    Address     types.AddrPort        `json:"address" yaml:"address"`
    Certificate types.X509Certificate `json:"certificate" yaml:"certificate"`
    Status      string                `json:"status" yaml:"status"`
}

If the join request is still pending, the API does not yet return the cluster’s information including certificate, key and a list of the already existing members.

Authentication on this API endpoint is performed via mutual TLS using the certificate of the peer that was recorded as part of the initial join request to POST /cluster/1.0/cluster. This way, only the peer itself can request its own status.
As the peer's certificate isn't yet in the truststore of the cluster, there has to be special handling to authenticate the polling requests based
on the peer's certificate in the database.
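
The joining side could then poll along the lines of the following sketch; the HTTP client is assumed to be configured for the mutual TLS described above, and the response decoding is simplified for illustration:

import (
    "context"
    "encoding/json"
    "net/http"
    "time"
)

// waitForApproval polls the proposed endpoint until the join request has
// been approved by an administrator.
func waitForApproval(ctx context.Context, client *http.Client, baseURL, name string) (*ClusterResponse, error) {
    ticker := time.NewTicker(2 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return nil, ctx.Err()
        case <-ticker.C:
        }

        resp, err := client.Get(baseURL + "/cluster/1.0/cluster/" + name)
        if err != nil {
            return nil, err
        }

        member := ClusterResponse{}
        err = json.NewDecoder(resp.Body).Decode(&member)
        resp.Body.Close()
        if err != nil {
            return nil, err
        }

        // Keep polling while the join request hasn't been approved yet.
        if member.Status == "pending" {
            continue
        }

        return &member, nil
    }
}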

An administrator can use the PUT /cluster/1.0/cluster/{name} endpoint to update the status of a peer by accepting the join request.
The endpoint currently doesn't expect any request body.
Additionally, the endpoint has to be extended, as calling it currently re-execs the MicroCloud daemon with a fresh state directory.

The MicroCluster type has to grow support for two additional methods that allow listing and approving pending cluster join requests:

func (m *MicroCluster) ListJoinRequests(ctx context.Context) ([]internalTypes.ClusterMemberLocal, error)
func (m *MicroCluster) ApproveJoinRequest(ctx context.Context, clusterMember string) error

Using those methods, downstream projects like MicroCloud can implement their own logic for approving join requests.
As the ClusterMemberLocal type includes the peer's certificate, a downstream project can easily retrieve the fingerprint for easier validation of the join request.
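
For illustration, a downstream approval loop using these methods could look like the sketch below; adminApproves is a hypothetical prompt helper, and the certificate is assumed to expose its raw DER bytes:

import (
    "context"
    "crypto/sha256"
    "fmt"
)

// approvePending lists pending join requests and lets the administrator
// decide per peer, based on the certificate fingerprint.
func approvePending(ctx context.Context, m *MicroCluster) error {
    pending, err := m.ListJoinRequests(ctx)
    if err != nil {
        return err
    }

    for _, member := range pending {
        sum := sha256.Sum256(member.Certificate.Raw)
        fmt.Printf("Allow %s (%x...) to join? ", member.Name, sum[:4])

        if adminApproves() { // hypothetical prompt helper
            err = m.ApproveJoinRequest(ctx, member.Name)
            if err != nil {
                return err
            }
        }
    }

    return nil
}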

Preseed changes

Currently, MicroCloud allows unattended installations using a preseed file from stdin when running the initialization via microcloud init --preseed.
Enforcing human validation of the joining peers breaks this concept, as there isn't any option for human interaction when running an automatic deployment.

To overcome this constraint, the MicroCloud's MicroCluster certificate on the peer could be pre-provisioned in the state directory, so that MicroCloud does not create a certificate and key by itself but instead picks up the one that already exists.
This makes the certificate used in the join process known in advance, so that the fingerprint of the joining node can be set as part of the preseed file when adding a new system:

systems:
- name: m1
  fingerprint: aabbccddee...
  ...
- name: m2
  fingerprint: ff11002233...
  ...

When the join request from the peer is received, the initial microcloudd can then decide, based on the information from the request and the fingerprint defined in the preseed file, whether the peer is allowed to join the cluster.

This approach would be valid for all discovery options A, B and C.
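
For illustration, the decision could boil down to a comparison like the following sketch; the preseed lookup shape is an assumption:

import (
    "crypto/sha256"
    "crypto/x509"
    "encoding/hex"
)

// allowJoin checks an incoming join request against the fingerprints defined
// in the preseed file's systems section.
func allowJoin(preseed map[string]string, name string, cert *x509.Certificate) bool {
    expected, ok := preseed[name]
    if !ok {
        return false
    }

    sum := sha256.Sum256(cert.Raw)
    return hex.EncodeToString(sum[:]) == expected
}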

CLI changes

MicroCloud

For discovery options A and B, a new command microcloud join gets added to allow a peer to join a MicroCloud.
This new command blocks until the request has been approved on the receiver side.

For discovery option C the administrator has to set a shared secret to use for HMAC.
As this has to be done on all of the MicroCloud peers, another set of commands is added to manage a local MicroCloud’s configuration:

  • microcloud config set: To set configuration keys like core.secret
  • microcloud config unset: To unset configuration keys

To prevent leaking the secret, there doesn't have to be an option to show the configuration, as the secret can always be unset or updated.

Database changes

MicroCluster

In order to track join requests, we have to indicate a pending state so that an administrator can approve individual requests made to an existing cluster.
For this, we could potentially reuse the role column in the internal_cluster_members table and mark non-approved join requests as pending.

Another option is adding a new status column which would be used specifically for this use case.
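
For the second option, the schema change could be as small as the following sketch; the default value and the migration mechanism are assumptions:

// Sketch of the schema extension for the status column; existing members
// would default to an approved state (assumed value).
const addStatusColumn = `
ALTER TABLE internal_cluster_members ADD COLUMN status TEXT NOT NULL DEFAULT 'approved';
`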

Packaging changes

No packaging changes expected.

Great write-up, thanks for this!

I believe @sdeziel1 mentioned there might be some issues with keyboard-and-mouse systems, where copying strings from one system to another becomes very difficult. I’m not sure if this is something that greatly affects current/future MicroCloud users.

The problem is, if we don’t manually input a secret string/join token at some point, then there’s no way to verify whether an mDNS payload actually came from a genuine system. Since we can’t trust the local network, we must expect that any bad actor can just listen for the payload and broadcast the same thing, and we could mistakenly trust the spoofed server instead.

All that said, I do like option C the best because it doesn’t break the flow of the initialization process, and the secret can be selected by the user so the keyboard-and-mouse case is less of a problem.

Thanks for jogging my memory. By those keyboard-and-mouse systems, I was referring to KVM (Keyboard, Video, Mouse) consoles sometimes present in server racks. Those give you console access to each of the servers in the rack but you cannot copy-n-paste between them. This means we should aim for easily (repeatedly) typed input.

(Still not done reading this spec so more feedback to come later).

To add on to what I was thinking for option C, maybe microcloud init could look a bit like this?


   Scanning for eligible servers ...
   Please enter the following on any systems you want to join the cluster.

     microcloud cluster verify adjective-noun

   Space to select; enter to confirm; type to filter results.
   Up/down to move; right to select all verified; left to select none.
          +---------+--------+---------------+------------+
          |  NAME   | IFACE  |     ADDR      |   STATUS   |
          +---------+--------+---------------+------------+
   > [x]  | micro3  | enp5s0 | 203.0.113.171 |  verified  |
     [x]  | micro4  | enp5s0 | 203.0.113.172 |  verified  |
     [ ]  | micro2  | enp5s0 | 203.0.113.170 | unverified |
     [ ]  | micro5  | enp5s0 | 203.0.113.173 | unverified |
          +---------+--------+---------------+------------+

So how this would work is as follows:

  • when all MicroCloud daemons start, they continuously listen for mDNS payloads
  • first node runs microcloud init and broadcasts that it is looking to form a cluster
  • when other nodes receive this payload, they then broadcast basic information (the same info as today, but without the X-MicroCloud-Auth secret included).
  • the first node now consumes the minimal payloads from the other systems. At the same time, it generates a human-readable secret that must be entered on every joining system before we can proceed by running microcloud cluster verify <secret>. This secret is only displayed locally on the first node.
  • when other nodes run microcloud cluster verify <secret>, they change the payload they are broadcasting to instead be hashed with the secret, and include any other sensitive information that we need to set up the Authorization request header for requests going from the first node to joiners. The second “sensitive” payload can actually be sent directly via HTTP back to the first node as well, so we don’t even need to broadcast the hashed payload over the local network.

The table of systems that the first node finds from mDNS lookup will have a column STATUS that reports the verification status of any particular node. If it is still broadcasting the raw minimal payload, it will be considered unverified. If it’s broadcasting a payload that is hashed with the secret, it will be considered verified. The table will only allow selecting verified systems.

This way, KVM systems like @sdeziel1 mentioned can easily input the verification as it is consistent and human-readable, and the user doesn’t expose any sensitive information openly over the local network. As well, the user gets immediate feedback on the first node about when it can actually proceed with the initialization.


For the preseed, you’ve mentioned preloading the certificates on each system but that is itself a form of user interaction per system so I’m not sure if it makes a difference if we just use microcloud cluster verify <secret> on each system prior to running the preseed. I’m not sure if we need two separate verification mechanisms here.


Using another API endpoint

To prevent having long running requests, the existing public POST /cluster/1.0/cluster endpoint could return right after validating the token and marking the new peer as pending.
By adding a new GET /cluster/1.0/cluster/{member} endpoint the joining side can perform regular polling until the join request got allowed by an administrator.

For a bit of background on the PENDING cluster status in microcluster, that actually means that at least some nodes in the cluster will not yet accept requests originating from the PENDING node, as they do not yet have a truststore entry, which is used for API authentication. At the moment, newly joined nodes will remain pending until the next heartbeat synchronizes the truststore across all nodes. However, due to go-dqlite issues causing extremely resource-intensive heartbeats, the heartbeats occur over very long intervals. I have been working on instantly distributing truststore entries to all nodes in the cluster when a new node joins to work around the heartbeat issue, so that might pose some problems for this approach.


So some key points that we need to address:

  • Is one “secret” per initialization process enough, or should we have a unique “secret” per joining node?

  • How long should the “secret” live? The proposal in the spec involves generating the “secret” before calling microcloud init, but if instead we automatically generate it when running microcloud init then we can control its lifecycle more thoroughly. If it’s generated by running microcloud init, then it should expire if the initialization is cancelled, but how should we inform other nodes that the initialization was cancelled, assuming they have already been verified by the user?

Thank you Julian for the spec. Option C seems to me the most viable as well. We should find a solution that doesn’t exclude automation, since for large clusters the administrator cannot manually add all nodes one by one. For option C I have a few comments:

  1. Additional information is needed on how the secret is stored securely on the nodes with microcloud config set secret
  2. Some expiration for the secret is needed; it cannot last forever, and it is also the same for all the nodes of the cluster. If at a later point a node needs to be added to the cluster, a different secret should be used and not the initial one

For Option A, I don’t believe the spec as it currently stands explains how one can manually verify that the cert fingerprint of a joining node matches the expected one. We would need a way for the joining node to locally display its fingerprint, right?

Option B sounds similar to LXD’s long-lived trust password model, which allows setting a shared password that lets joining members add their certificate to the cluster’s truststore.
It is not recommended anymore, which is why short-lived per-member join tokens were added: to avoid the leaking of the shared password becoming a security issue.

https://documentation.ubuntu.com/lxd/en/latest/authentication/#authentication-trust-pw

The difference is that the existing cluster has to verify the joiner in this case, whereas that doesn’t need to occur with LXD’s trust password model.

This is similar to option B, except that apparently this secret is long-lived whereas in option B it was only valid during the microcloud init call and wasn’t persisted to the database. Is this correct?

This option seems even more similar to LXD’s not-recommended long-lived shared join password approach, except it does still require confirmation from the existing cluster.

@maria-seralessandri agreed, but as I understand it, there are a couple of “flavours” of automation available to us.

  1. Retain the option for MicroCloud to deploy itself from pre-seed files without interaction. This may not be possible given the requirement for confirming each side of the join process.
  2. Arbitrated deployment by way of something like Juju. @sdeziel1 mentioned the other day that if MicroCloud cannot deploy itself in an automated manner, it may still be possible to provide automated deployments using something like Juju, which can replicate the manual verification steps required.

If I understood it right your suggestion adds the following on top of C:

  • The initial MicroCloud daemon broadcasts its intent to form a cluster. This will cause the peers to start broadcasting
  • The peers respond either with a “plain” or “hashed” broadcast depending on whether the administrator already executed the microcloud cluster verify <secret> command on the peer to set the secret
  • The initial MicroCloud marks a peer as verified (trusted) if it receives a hashed broadcast

What would be the benefit of letting the initial MicroCloud daemon tell its local network that it wants to form a cluster, if an admin has to go to each of the nodes to enter the secret anyway?
The only point I can think of is letting the peer validate that it doesn’t join a malicious cluster. But we already have this functionality in MicroCluster, as the token contains the cluster’s fingerprint, which can be validated by the joining side.

I like the idea of generating the secret in MicroCloud directly so that we have more control over it. When installing the snap on each of the nodes the secret will of course be different, but with this approach it only needs to be changed on the joining side and can be left untouched on the initial MicroCloud, which reduces the number of required steps.

That is a good point. The certificate gets created automatically in MicroCluster’s state directory but AFAIK there is currently no straightforward command to retrieve this information on the joining side.

@jpelizaeus

@masnax and I were discussing in our 1:1 the possibility of avoiding needing microcloud config set core.secret xyz and persisting the secret to disk.

And instead having microcloud init generate a per-invocation secret, that would then be used as an argument (or interactively) for a microcloud join <secret> command which would block until the join was completed. Joining members would only start broadcasting while microcloud join was running.

And when the microcloud [init|join] commands end, they would forget their invocation secret.

Thanks, I see. So that would be option B without having a persistent token? As today, the token/secret could contain the join information (address), which would make mDNS obsolete. And the join request from the peer to the initial MicroCloud daemon could then already be an HTTPS request using the secret/HMAC for verification purposes.

What would be the benefit of letting the initial MicroCloud daemon tell its local network that it wants to form a cluster, if an admin has to go to each of the nodes to enter the secret anyway?

The initial MicroCloud still needs to somehow become aware of the joining nodes, and the joining nodes need to become aware of the initial node. Otherwise we would have to manually type in addresses as part of microcloud cluster verify <secret> <init node's address>.

Today, as soon as a MicroCloud snap is installed, it begins advertising its address and services over the local network. I’m proposing making this less noisy by instead making all nodes listen by default, and only trigger the advertisement after microcloud init has been executed somewhere on the local network.

Without this, all nodes (including the initial node, because it didn’t have to be the initial node) would have to perpetually broadcast their intent anyway, or we drop mDNS entirely and the user specifies the init node’s address directly to each joiner.

By keeping mDNS, we can still determine the compatibility of all nodes when running microcloud init, before going to each node and verifying them. We can see right away which nodes can even join the cluster, or have the same services, and we don’t have to log into each one first.

I see what you mean. But if we use a secret/join token that embeds the address of the MicroCloud (like it is currently done for MicroCluster), the administrator doesn’t have to manually type it in on the joining side and the MicroCloud doesn’t need to broadcast it to potential peers.
However this collides with what @sdeziel1 wrote in regards to the KVM consoles as such a secret wouldn’t be easy to type in these environments.

In regards to the service discovery, we can potentially perform this as part of the initial join request from the peer to the MicroCloud by extending the request body.

@tomp @masnax I have extended the spec with options B2 and C2 to address your feedback on using short lived secrets within a so called session so that we don’t have to persist anything to disk.

If we consider the LAN to be untrusted/hostile, we then need a solution that is resistant to MITM. This problem space has years of research and many failed attempts along the way, so I think we should use something tried and true. Here are some existing solutions I’m aware of:

The Bluetooth one seems particularly attractive for securing an unprotected mDNS conversation, doing all the heavy lifting of copying certs and keys around.

So I hope I’ve got the flow correct here:

B2 interactive setup:

  • First node runs microcloud init

    • A token including the secret and the address of the init node is generated

    • The init node waits for joiners to contact it. The user chooses when to continue through the setup.

  • All nodes must run microcloud join <token>

    • The joiner reaches out to the init node over HTTPS, with info about joining the cluster.
  • Through the setup, until the nodes are clustered, they are trusted by the init node using the token.

Potential Issues with B2

  • Cumbersome for KVM setups due to having to copy the encoded secret to each joiner.

  • If microcloud init is aborted, the joiners will continue to perpetually trust the invalid token. But if the tokens have an expiry, they can expire before the user has finished the interactive setup. We would need to poll the init node from the joiners regularly.

  • We need some way to handle mismatch of installed services on each node. Currently, MicroCloud’s mDNS record will filter out any nodes that don’t have the same set of services, or offer to the user if they want to skip that service. This would have to be implemented on each joiner instead when we call microcloud join <token>. Or we would have to be stricter about which service combinations are required.

C2 interactive setup:

  • The first node runs microcloud init

    • It broadcasts its intent to form a cluster over mDNS.

    • A plaintext, human readable password is generated.

    • The init node begins looking up eligible systems over mDNS and displays them to the user.

  • The joiners enter microcloud join <password>

    • Each joiner generates an HMAC payload encoded with the password, and broadcasts it over mDNS.
  • The init node receives the mDNS payloads, and decodes them with the password.

    • The user makes a final confirmation of which nodes they want in the cluster
  • Through the setup, until the nodes are clustered, they are trusted by the init node using the password.

Potential issues with C2

  • C2 actually handles the issues from B2 rather well:

    • The passwords are human-readable so KVMs have a viable solution.

    • Because the init node is broadcasting its intent to form a cluster, the password will only be trusted as long as the broadcast is ongoing.

    • Because we have some minimal information from each node prior to running microcloud join <password>, it’s easier to spot issues and config mismatches before logging into every joiner.

  • Biggest issue I see is that it includes mDNS so it’s a more complex system than B2

Preseed

  • In both B2 and C2, we would have to make some compromises for preseed authentication:
    • In the case of B2, it’s not enough for the user to specify the secret because the init node’s address and available services must also be encoded. This means the user will have to run microcloud init --preseed first, which will then print out the token that the user must use in microcloud cluster join <token> on each joiner.

    • For C2, we could either do the same as above, or the user can supply their own password directly in the preseed file.

The EFF publishes a few lists of words that are easy and quick to memorize/type/autocomplete. Those are meant to be used in Diceware-type passwords, but I think they would make a good basis for the “authenticated exchange” PSK validation we intend on doing.

https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases

The short word list sounds interesting and should be easily embedded into the daemon.

In Bluetooth SSP those verification numbers are actually derived from a hash that gets computed on both ends based on information that gets exchanged during the pairing.

In the case of the “Numeric Comparison protocol” (check section 7.2.1 of https://www.bluetooth.com/wp-content/uploads/Files/Specification/HTML/Core-54/out/en/br-edr-controller/security-specification.html#UUID-045cba38-3e1c-51b9-a02f-75356c6829c1), the numeric verification number is created by taking the last 32 bits of the hash function g(...)'s output (see section 7.7.2 in the same link) and reducing them modulo 10⁶ to always get six digits.

I like the idea, but it’s not an arbitrary string. Instead, it’s based upon information provided by both ends with enough randomness.
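
For illustration, deriving such a six-digit value from a shared hash could look like this sketch (loosely following the Numeric Comparison idea; this is not the actual Bluetooth g() function):

import (
    "encoding/binary"
)

// verificationNumber takes the last 32 bits of a hash output (hash must be
// at least 4 bytes long) and reduces them modulo 10^6 so both ends can
// display the same six-digit value.
func verificationNumber(hash []byte) uint32 {
    v := binary.BigEndian.Uint32(hash[len(hash)-4:])
    return v % 1000000
}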

1 Like