TLS Fine-grained authorization

markylaing · October 3, 2024, 11:34am

Project	LXD
Status	Completed
Author(s)	@markylaing
Approver(s)	@tomp @mionaalex @egelinas @maria-seralessandri
Release	6.2
Internal ID	LX088

Abstract

This specification details the addition of fine-grained authorization for TLS identities in LXD. The current status of authorization in LXD will be outlined, and the challenges in adding fine-grained authorization for TLS identities identified. A solution will be proposed that will include:

Adding a new identity type denoting a certificate whose permissions can only be managed via groups (and cannot be restricted/unrestricted or assigned to projects).
A new mechanism for adding trusted TLS identities.
Expansion of /1.0/auth APIs to include creation of TLS identities, and deletion of TLS and OIDC identities.

Rationale

The specification for Identity and Access Management in LXD has now been implemented for OIDC authenticated identities. However, it is not yet possible to grant fine-grained permissions to TLS identities, and TLS clients can only be restricted on a per-project basis (see Restricted TLS certificates). Furthermore, the addition of the /1.0/auth/identities API and lxc auth identity command presents an issue for usability and documentation. With the pre-existing /1.0/certificates API and lxc config trust command, there are now two ways of interacting with the TLS certificates. Additionally, there are now two ways to restrict user access to resources in LXD (via groups and permissions, or via restricted certificates and their projects).

Specification

Fine-grained authorization for TLS clients

The implementation of the initial specification for fine-grained access management accounted for fine-grained access control for TLS clients. Currently, certificate identities are prevented from being added to groups by a check in the PUT and PATCH handlers of /1.0/auth/identities/{authenticationMethod}/{nameOrIdentifier}. We could remove this check to allow existing client certificate identities to be added to groups, but group membership would have no effect because authorization for identities of this kind is still performed by the TLS authorization driver, which checks only the restricted status of the certificate and its project list. Instead, we will add a new identity type:

IdentityTypeCertificateClient = "Client certificate"

Unlike other client certificate identity types, which include “(restricted)” or “(unrestricted)” in their names, this identity type carries no qualifier. It is “restricted” by default because it will not have any permissions unless added to a group. As we want this to be the new default, there will not be a “(fine-grained)” qualifier.

The restrictions in PUT and PATCH methods on /1.0/auth/identities/{authenticationMethod}/{nameOrIdentifier} will be changed to allow identities of this type to be added to groups. The OpenFGA authorizer will also be updated such that it does not delegate permission handling to the TLS authorizer for identities of this type. Permission management should then work identically for TLS certificates as it does for OIDC clients, with the exception that identity provider groups will not be considered[^1].
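The delegation rule described above can be sketched as a small dispatch on identity type. This is only an illustration: the "Client certificate" name comes from this specification, while the other type names, the helper functions, and the "openfga"/"tls" labels are assumptions made for the example.

```go
package main

import "fmt"

// Identity type names. "Client certificate" is taken from the specification;
// the other three values are assumptions used only for illustration.
const (
	typeClient             = "Client certificate"
	typeClientRestricted   = "Client certificate (restricted)"
	typeClientUnrestricted = "Client certificate (unrestricted)"
	typeOIDCClient         = "OIDC client"
)

// isFineGrained reports whether permissions for an identity of this type are
// granted exclusively through group membership. Hypothetical helper.
func isFineGrained(identityType string) bool {
	return identityType == typeClient || identityType == typeOIDCClient
}

// authorizerFor names the component that evaluates permission checks for the
// identity type: the OpenFGA authorizer itself, or the TLS authorizer that it
// delegates to for legacy certificates.
func authorizerFor(identityType string) string {
	if isFineGrained(identityType) {
		return "openfga"
	}

	return "tls"
}

func main() {
	types := []string{typeClient, typeClientRestricted, typeClientUnrestricted, typeOIDCClient}
	for _, t := range types {
		fmt.Printf("%q: authorizer=%s, may join groups=%t\n", t, authorizerFor(t), isFineGrained(t))
	}
}
```

The same predicate decides both questions raised in this section: whether the PUT and PATCH handlers accept a group list, and whether permission checks bypass the TLS authorizer.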

Creation of fine-grained TLS identities

The existing mechanism for creating a client certificate is to run lxc config trust add <certificate.crt>. If a certificate file is provided, the client sends a POST request to /1.0/certificates with the base64 encoded raw bytes of the certificate in the certificate field of the request body. If the client has sufficient permissions, the identity will be created.

If a certificate is not provided, the client sends a POST request to /1.0/certificates with the token field set to true. If the client has sufficient permissions, the server will create a “certificate add” operation, and return a “certificate add” token to the client. Both the certificate add token and certificate add operation contain the same random secret. The certificate add operation additionally contains the request body that was sent when the token was created. This is used later, when the token is redeemed, to ensure that the new client will have the “restricted” property and project list specified by the privileged client.

The “certificate add” token is sent out-of-band to another client, who can use the token to create a certificate. When the untrusted or unprivileged client uses the token (on lxc remote add), the client sends a final POST request to /1.0/certificates containing the token. If an operation is found with a matching secret, a certificate (either base64 encoded in the request body, or sent by the client during the TLS handshake) is created with privileges matching those of the original request.

The initial design for certificate add tokens made use of long-lived operations so that the feature could be backported to an LTS release (with no schema changes). This procedure has worked well so far, but there are a few outstanding issues:

If the cluster member containing the operation is restarted, the operation metadata is lost and the token will no longer work (regardless of its expiry time).
If the initial token request contains a list of projects and one or more of those projects is deleted before the token is used, the token will be invalidated. (Creating the certificate will fail due to a foreign key constraint violation).

To retrofit this process for use with the creation of fine-grained TLS identities, we would need to conditionally modify the contents of the certificate add operation metadata (e.g. to include a list of groups rather than a list of projects). Given the caveats of the operation-based approach and the risk that comes with modifying it, a new method for creating fine-grained TLS identities will be used.

A new POST /1.0/auth/identities/tls endpoint will be added, which will accept the following request body:

// IdentitiesTLSPost contains required information for the creation of a TLS identity.
//
// swagger:model
//
// API extension: access_management_tls.
type IdentitiesTLSPost struct {
	// Name associated with the identity
	// Example: foo
	Name string `json:"name" yaml:"name"`

	// Trust token (used to add an untrusted client)
	// Example: blah
	TrustToken string `json:"trust_token" yaml:"trust_token"`

	// Whether to create a certificate add token
	// Example: true
	Token bool `json:"token" yaml:"token"`

	// The base64 encoded public certificate of the identity
	Certificate string `json:"certificate" yaml:"certificate"`

	// Groups is the list of groups for which the identity is a member.
	// Example: ["foo", "bar"]
	Groups []string `json:"groups" yaml:"groups"`
}

As with the certificates API, if the caller has can_create_identities on server, they may set the certificate field to the base64 encoded x509 certificate of the new client. This will create the new identity immediately, and refresh the identity cache on all cluster members so that the new identity is able to authenticate. If the caller instead sets the token field to true, a new identity will be created with the following type:

IdentityTypeCertificateClientPending = "Client certificate (pending)"

The metadata for identities of this type will contain a random secret and an expiration time. The identifier for identities of this type will be a v4 UUID. The server will return a base64 encoded certificate add token:

// IdentityTokenTLS contains a token that can be used by an untrusted client to gain trust with a LXD system.
//
// swagger:model
//
// API extension: access_management_tls.
type IdentityTokenTLS struct {
	// TrustToken is a base64 encoded CertificateAddToken.
	TrustToken string `json:"trust_token" yaml:"trust_token"`
}
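To make the token request and response concrete, the sketch below serializes the request body and parses the returned token using the two types defined above. The helper names tokenRequest and parseTokenResponse are hypothetical, the yaml tags are omitted for brevity, and the response is shown without the usual LXD response envelope.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IdentitiesTLSPost mirrors the request body defined in this specification.
type IdentitiesTLSPost struct {
	Name        string   `json:"name"`
	TrustToken  string   `json:"trust_token"`
	Token       bool     `json:"token"`
	Certificate string   `json:"certificate"`
	Groups      []string `json:"groups"`
}

// IdentityTokenTLS mirrors the response type defined in this specification.
type IdentityTokenTLS struct {
	TrustToken string `json:"trust_token"`
}

// tokenRequest builds the body a privileged caller would send to ask for a
// token instead of supplying a certificate. Hypothetical helper.
func tokenRequest(name string, groups []string) ([]byte, error) {
	return json.Marshal(IdentitiesTLSPost{Name: name, Token: true, Groups: groups})
}

// parseTokenResponse extracts the trust token from the response metadata.
// Hypothetical helper.
func parseTokenResponse(metadata []byte) (string, error) {
	var resp IdentityTokenTLS

	err := json.Unmarshal(metadata, &resp)
	if err != nil {
		return "", err
	}

	return resp.TrustToken, nil
}

func main() {
	body, err := tokenRequest("me", []string{"administrators"})
	if err != nil {
		panic(err)
	}

	fmt.Println(string(body))

	// The token value here is a placeholder, not a real certificate add token.
	token, err := parseTokenResponse([]byte(`{"trust_token":"ZXhhbXBsZQ=="}`))
	if err != nil {
		panic(err)
	}

	fmt.Println(token)
}
```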

An additional field in the certificate add token will indicate to the client that this token is for use with /1.0/auth/identities/tls:

type CertificateAddToken struct {
	/* Existing fields ... */

	// Type indicates the identity type that this token
	// was issued for.
	// Example: Client certificate
	//
	// API extension: access_management_tls
	Type string `json:"type" yaml:"type"`
}

When an untrusted client uses the issued token to add a LXD remote, the client will conditionally send the token to the new endpoint based on the contents of the Type field and the access_management_tls API extension.
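A minimal sketch of that client-side choice follows. It assumes that legacy tokens leave Type empty and that any non-empty Type selects the new endpoint; the specification does not state the exact comparison, and the function names are hypothetical.

```go
package main

import "fmt"

const (
	identitiesTLSEndpoint = "/1.0/auth/identities/tls"
	certificatesEndpoint  = "/1.0/certificates"
	extensionTLS          = "access_management_tls"
)

// CertificateAddToken holds only the field this sketch needs.
type CertificateAddToken struct {
	Type string `json:"type"`
}

// hasExtension reports whether the server advertises the named API extension.
func hasExtension(extensions []string, name string) bool {
	for _, ext := range extensions {
		if ext == name {
			return true
		}
	}

	return false
}

// endpointFor picks the endpoint a client should redeem the token against.
// Assumption: a token without a Type is a legacy certificate add token.
func endpointFor(token CertificateAddToken, serverExtensions []string) string {
	if token.Type != "" && hasExtension(serverExtensions, extensionTLS) {
		return identitiesTLSEndpoint
	}

	return certificatesEndpoint
}

func main() {
	extensions := []string{"access_management", extensionTLS}

	fmt.Println(endpointFor(CertificateAddToken{Type: "Client certificate"}, extensions))
	fmt.Println(endpointFor(CertificateAddToken{}, extensions))
}
```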

Clients that have not yet updated will continue to send their tokens to the /1.0/certificates endpoint. To return a useful error in this case, the server will check the validity of tokens sent to this endpoint against both certificate add operations and pending TLS identities. If a pending TLS identity is found, a 400 Bad Request error will be returned containing a message that the client needs to be updated.
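The double lookup on the legacy endpoint could look roughly like this. The in-memory secret sets, the tokenKind type, and the error message text are all assumptions for the sake of the example; only the 400 Bad Request status for a pending identity comes from the text above.

```go
package main

import (
	"fmt"
	"net/http"
)

// tokenKind says where a token secret was found. Hypothetical type.
type tokenKind int

const (
	tokenUnknown         tokenKind = iota // no match anywhere
	tokenOperation                        // legacy certificate add operation
	tokenPendingIdentity                  // pending fine-grained TLS identity
)

// classifyToken checks the secret against both sources, operations first.
func classifyToken(secret string, operationSecrets, pendingSecrets map[string]bool) tokenKind {
	switch {
	case operationSecrets[secret]:
		return tokenOperation
	case pendingSecrets[secret]:
		return tokenPendingIdentity
	default:
		return tokenUnknown
	}
}

// legacyEndpointError returns the error that POST /1.0/certificates would send
// for a token that belongs to a pending TLS identity. ok is false for any
// other kind of token, which this sketch leaves to the existing handling.
func legacyEndpointError(kind tokenKind) (status int, message string, ok bool) {
	if kind != tokenPendingIdentity {
		return 0, "", false
	}

	return http.StatusBadRequest, "token was issued for a TLS identity; please update your client", true
}

func main() {
	operations := map[string]bool{"op-secret": true}
	pending := map[string]bool{"pending-secret": true}

	status, message, ok := legacyEndpointError(classifyToken("pending-secret", operations, pending))
	fmt.Println(status, message, ok)
}
```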

On receipt of a request to /1.0/auth/identities/tls from an untrusted caller, the handler will query for a pending identity whose secret matches the secret in the trust_token field. If a pending identity is found that has not expired, it will be upgraded to a “Client certificate”. The fingerprint of the certificate that was sent during the TLS handshake will be used as the identifier for the identity, and the PEM encoded certificate will be set in the identity metadata (for mTLS). The identity cache will be updated on all members so that the new identity is able to authenticate.
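The upgrade step can be sketched against an in-memory list of identities. The record layout and function name are hypothetical, the fingerprint is assumed to be the hex-encoded SHA-256 digest of the certificate's raw bytes, and the sketch stores those raw bytes rather than the PEM encoding that the specification calls for.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

const (
	typeClient  = "Client certificate"
	typePending = "Client certificate (pending)"
)

// identity is a simplified, hypothetical record; the real schema differs.
type identity struct {
	Type        string
	Identifier  string    // v4 UUID while pending, certificate fingerprint afterwards
	Secret      string    // random secret embedded in the trust token
	Expiry      time.Time // a pending identity is unusable at or after this time
	Certificate []byte
}

var errInvalidToken = errors.New("no unexpired pending identity matches the token")

// activate upgrades the pending identity whose secret matches the token.
func activate(ids []*identity, secret string, cert []byte, now time.Time) (*identity, error) {
	if secret == "" {
		return nil, errInvalidToken
	}

	for _, id := range ids {
		if id.Type != typePending {
			continue
		}

		// Constant-time comparison avoids leaking the secret through timing.
		if subtle.ConstantTimeCompare([]byte(id.Secret), []byte(secret)) != 1 {
			continue
		}

		if !now.Before(id.Expiry) {
			return nil, errInvalidToken
		}

		sum := sha256.Sum256(cert)
		id.Type = typeClient
		id.Identifier = hex.EncodeToString(sum[:])
		id.Certificate = cert
		id.Secret = "" // the token is single use

		return id, nil
	}

	return nil, errInvalidToken
}

func main() {
	now := time.Now()
	ids := []*identity{{Type: typePending, Identifier: "placeholder-uuid", Secret: "s3cret", Expiry: now.Add(time.Hour)}}

	id, err := activate(ids, "s3cret", []byte("certificate bytes"), now)
	if err != nil {
		panic(err)
	}

	fmt.Println(id.Type, id.Identifier)

	// A second attempt fails because the identity is no longer pending.
	_, err = activate(ids, "s3cret", []byte("certificate bytes"), now)
	fmt.Println(err)
}
```

Because the identity is no longer of the pending type after the first call, presenting the same token again finds no match.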

A new cluster task will be added to remove pending TLS identities that have passed their expiry.
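The core of such a cleanup task is a filter over identities, sketched here with a hypothetical in-memory record. Scheduling and database access are left out, and the function name is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

const typePending = "Client certificate (pending)"

// identity is a simplified, hypothetical record.
type identity struct {
	Name   string
	Type   string
	Expiry time.Time // only meaningful for pending identities
}

// pruneExpired returns the identities that remain once pending identities
// that have reached their expiry are removed. Other types are never removed.
func pruneExpired(ids []identity, now time.Time) []identity {
	kept := make([]identity, 0, len(ids))
	for _, id := range ids {
		if id.Type == typePending && !now.Before(id.Expiry) {
			continue
		}

		kept = append(kept, id)
	}

	return kept
}

func main() {
	now := time.Now()
	ids := []identity{
		{Name: "stale", Type: typePending, Expiry: now.Add(-time.Minute)},
		{Name: "fresh", Type: typePending, Expiry: now.Add(time.Hour)},
		{Name: "active", Type: "Client certificate"},
	}

	for _, id := range pruneExpired(ids, now) {
		fmt.Println(id.Name)
	}
}
```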

This approach has a number of advantages over the previous operation-based mechanism:

Pending TLS identities can be edited. If an administrator issues a token but sets an incorrect group, the pending identity can be edited directly (previously the token would need to be revoked and a new token issued).
Revoking a token means deleting the TLS identity.
If a group is deleted before the token is used, the token is still valid.
Metadata is persisted to the database, so the token will be valid if the issuing member is restarted.

Deletion of identities

Two new endpoints will be added: DELETE /1.0/auth/identities/tls/{nameOrIdentifier}, and DELETE /1.0/auth/identities/oidc/{nameOrIdentifier}. Where a name is provided, it must be unique. Deleting a TLS identity will revoke trust entirely. Deleting an OIDC identity will remove the identity from all groups, but will not revoke trust since this is managed by the external IdP.
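One way to read the uniqueness requirement on names is shown below: an exact identifier match always wins, and a name resolves only when exactly one identity of that authentication method carries it. The record layout, error values, and lookup order are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// identity is a simplified, hypothetical record.
type identity struct {
	AuthMethod string // "tls" or "oidc"
	Identifier string // certificate fingerprint or OIDC identifier
	Name       string
}

var (
	errNotFound  = errors.New("identity not found")
	errAmbiguous = errors.New("more than one identity has this name; use the identifier instead")
)

// resolve finds the identity addressed by {authenticationMethod}/{nameOrIdentifier}.
func resolve(ids []identity, authMethod string, nameOrIdentifier string) (identity, error) {
	var byName []identity

	for _, id := range ids {
		if id.AuthMethod != authMethod {
			continue
		}

		if id.Identifier == nameOrIdentifier {
			return id, nil
		}

		if id.Name == nameOrIdentifier {
			byName = append(byName, id)
		}
	}

	switch len(byName) {
	case 0:
		return identity{}, errNotFound
	case 1:
		return byName[0], nil
	default:
		return identity{}, errAmbiguous
	}
}

func main() {
	ids := []identity{
		{AuthMethod: "tls", Identifier: "fingerprint-1", Name: "ci"},
		{AuthMethod: "tls", Identifier: "fingerprint-2", Name: "ci"},
		{AuthMethod: "tls", Identifier: "fingerprint-3", Name: "me"},
	}

	_, err := resolve(ids, "tls", "ci")
	fmt.Println(err)

	id, err := resolve(ids, "tls", "me")
	fmt.Println(id.Identifier, err)
}
```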

Handling overlap of certificate and identity APIs

With the addition of creation and deletion of TLS identities to the /1.0/auth API, there is a risk that users will conflate the two APIs or be confused about which one to use. Additionally, the lxc config trust command and certificate APIs are used extensively by orchestration tools, including our own Terraform provider.

To mitigate risk, the /1.0/certificates API will not be deprecated. The long-term plan will be for this endpoint to be used exclusively to list server certificates and manage the cluster join process. In the meantime, it will continue to be used for managing the legacy certificate types (client, metrics, server). New fine-grained TLS identities will not be returned by the /1.0/certificates API, and therefore not viewable via lxc config trust list. The /1.0/auth/identities API will continue to show all identities in the system, but only fine-grained TLS identities, pending TLS identities, and OIDC identities will be editable.

In our authorization model, we currently have both certificate and identity types defined. This means that overlapping permissions can be granted on a certificate. For example, if can_view is granted against a certificate, then the certificate would be viewable in the certificates API, but not in the identities API. To fix this, the database definition of the entity types will be modified such that the certificate entity type applies to legacy certificate types, and the identity entity type applies to fine-grained/pending TLS, and OIDC identities. Access management in the identities API will be updated to account for this.
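The intended split can be expressed as a mapping from identity type to entity type. Only "Client certificate" and "Client certificate (pending)" are names from this specification; the remaining type names and the function itself are assumptions for illustration.

```go
package main

import "fmt"

// entityTypeFor returns the entity type against which permissions such as
// can_view would be granted for an identity of the given type. Fine-grained
// TLS, pending TLS, and OIDC identities map to "identity"; everything else is
// treated as a legacy certificate.
func entityTypeFor(identityType string) string {
	switch identityType {
	case "Client certificate", "Client certificate (pending)", "OIDC client":
		return "identity"
	default:
		return "certificate"
	}
}

func main() {
	types := []string{
		"Client certificate",
		"Client certificate (pending)",
		"OIDC client",
		"Client certificate (restricted)",
		"Metrics certificate (unrestricted)",
	}

	for _, t := range types {
		fmt.Printf("%q -> %s\n", t, entityTypeFor(t))
	}
}
```

With a single entity type per identity type, a permission granted on one entity type can no longer produce different visibility in the two APIs.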

Eventually, we will deprecate the “restricted” and “projects” fields of a certificate. When we do this, we can release a tool that will migrate legacy client certificates to fine-grained TLS identities[^2].

Client certificate rotation

A client may want to change the certificate that they use to authenticate against LXD. This may be a periodic rotation, or perhaps their private key was compromised. This is currently possible in the certificates API with a call to PUT /1.0/certificates/{fingerprint}. If the certificate is restricted, then only the certificate field may be changed.

To have feature parity with the certificates API, the identities API must also allow this for fine-grained TLS identities. To do this, a new field will be added to the request body for PUT /1.0/auth/identities/{authenticationMethod}/{nameOrIdentifier}:

type IdentityPut struct {
	/* Existing fields ... */

	// TLSCertificate is a base64 encoded x509 certificate.
	//
	// API extension: identity_management
	TLSCertificate string `json:"tls_certificate" yaml:"tls_certificate"`
}

When calling the endpoint, the client must either have can_edit on the identity, or they must be authenticated via TLS as the identity whose certificate is being updated. If the identity is updating themselves and does not have can_edit, then they may only provide the tls_certificate field.
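The rule above reduces to a short predicate. The function name and the idea of passing the list of fields present in the request are assumptions; the specification does not describe how the handler detects which fields were provided.

```go
package main

import "fmt"

// mayUpdate applies the rule for PUT on a fine-grained TLS identity.
// hasCanEdit: the caller holds can_edit on the identity.
// isSelf: the caller authenticated via TLS as the identity being updated.
// providedFields: JSON field names present in the request body.
func mayUpdate(hasCanEdit bool, isSelf bool, providedFields []string) bool {
	if hasCanEdit {
		return true
	}

	if !isSelf {
		return false
	}

	// Without can_edit, an identity may only rotate its own certificate.
	for _, field := range providedFields {
		if field != "tls_certificate" {
			return false
		}
	}

	return true
}

func main() {
	fmt.Println(mayUpdate(false, true, []string{"tls_certificate"}))
	fmt.Println(mayUpdate(false, true, []string{"tls_certificate", "groups"}))
	fmt.Println(mayUpdate(true, false, []string{"groups"}))
}
```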

In order for the certificate to be modifiable via the CLI, we will need to return it in the API response for identities. To do this, we will add a field to the Identity API type:

type Identity struct {
	/* Existing fields ... */

	// TLSCertificate is a PEM encoded x509 certificate.
	//
	// API extension: identity_management
	TLSCertificate string `json:"tls_certificate" yaml:"tls_certificate"`
}

Lastly, we need to allow a user to delete their own identity. We will allow this by adding a contextual tuple when querying the embedded OpenFGA server. This is the same mechanism by which all identities are able to view themselves.

User story on first use

Currently, gaining access to LXD over HTTPS involves three commands (or fewer if the HTTPS API was enabled during lxd init):

# LXD Host machine
$ lxc config set core.https_address=:8443
$ lxc config trust add --name me
# Client machine
$ lxc remote add my-remote <token>

Fine-grained TLS identities have no permissions by default, so the equivalent steps would be:

# LXD host machine
$ lxc config set core.https_address=:8443
$ lxc auth group create administrators
$ lxc auth group permission add administrators server admin
$ lxc auth identity create tls/me --group administrators
# Client machine
$ lxc remote add my-remote <token>

To keep an equivalent number of steps, we can have an administrators group predefined in LXD that is always present and cannot be deleted. Then on first setup we will have:

# LXD host machine
$ lxc config set core.https_address=:8443
$ lxc auth identity create tls/me --group administrators
# Client machine
$ lxc remote add my-remote <token>

API Changes

POST /1.0/auth/identities/tls
DELETE /1.0/auth/identities/{authenticationMethod}/{nameOrIdentifier}
PUT /1.0/auth/identities/{authenticationMethod}/{nameOrIdentifier}: Additional field for certificate update.

The /1.0/certificates API will be unchanged.

CLI Changes

lxc auth identity create [<remote>:]tls/<name> [<certificate_file>] [[--group <name>]]. Creates a fine-grained TLS identity (either directly if a PEM encoded certificate file is given, or indirectly via a token).
lxc auth identity delete [<remote>:]<authentication_method>/<name_or_identifier>. Deletes the identity.

Database Changes

None.

Upgrade Handling

As mentioned above, token validation on POST /1.0/certificates will additionally check if the given token relates to a pending TLS identity and return an error to indicate that the client needs to update their CLI.

Future Work

A tool for migrating existing client certificates to fine-grained TLS identities, plus migration and improved handling of metrics certificates.
Deprecation of restricted and projects fields in certificates.
Improved PKI integration, perhaps to allow Identity Provider Group mappings for TLS identities.

[^1]: If client certificates are managed via PKI, we could consider extracting Identity Provider Groups from a certificate attribute assertion value. This is outside the scope of the specification.

[^2]: This would be a good opportunity to migrate metrics certificates as well. We should reconsider metrics certificates in relation to the service_account type in the authorization model.