Introducing the Deployment API

gabrielmougard · September 22, 2023, 2:27pm


Project	LXD
Status	Drafting
Author(s)	@gabrielmougard
Approver(s)
Release
Internal ID

Abstract:

This document outlines an approach for building self-sustaining software applications that adapt to changing operational conditions by using a Deployments API.

The crux of this proposal revolves around creating a specialized, bidirectional Deployments API, serving as a universal conduit for seamless interaction between platforms, such as LXD, and application workloads, like Microk8s.

This Deployments API is pivotal, concentrating on providing a standardized, platform-agnostic interaction mechanism that enables automatic scaling, instigated either by the platform or the application workloads.

The proposed Deployments API has two components: a Platform API (LXD in our case) and a Governor facing API. Together they give customized control over instance allocation and let each side respond to scaling requests and offers. Let us define a few important terms for a better understanding:

A Governor is a component that has special access to the platform’s deployments API, allowing it to control the quantity of instances in a deployment. It is responsible for managing the scaling of instances, responding to requests and demands from the platform for scaling, and ensuring optimal allocation of resources. In short, it helps in maintaining and adjusting the deployment size as per the needs, acting as a mediator between the application workloads and the platform they run on.
In the context of a deployment, managed by a Governor, let’s also define the Deployment Shape (or simply, Shape). A shape is a predefined template within a deployment on a platform. It defines the configuration for instances that are spawned within the deployment. Each Shape specifies the setup for the instances, tailoring them to fit specific needs within the application workload. A deployment can contain multiple shapes, allowing for a variety of instance configurations within the same deployment to meet different requirements of the application services. In essence, a Shape outlines how instances in a deployment should be configured and managed.

In this document, we’ll only focus on the Platform specific management API and the Governor facing API.

Rationale:

Modern software frameworks demand solutions that are adaptable and able to meet varied and dynamic needs. As an example, an e-commerce platform will require:

Web Servers: To host the website, handle user requests, and serve content.
Database Servers: To store and manage product information, user data, and transaction records.
Caching Servers: To cache frequent queries and improve data retrieval performance.
Background Workers: To process tasks like order processing, email notifications, and other asynchronous jobs.
Search Servers: To provide fast and accurate search results to the users.
Load Balancers: To distribute incoming network traffic across multiple servers to ensure no single server is overwhelmed with too much traffic.
Security Services: To manage authentication, authorization, and protect against malicious attacks.

Each of these components could have distinct scaling needs, availability requirements, resource consumptions, and operating environments, making the overall workload multifaceted.

The Deployments API aims at managing such a scenario. We’d initially grant a cloud substrate (or platform) like LXD the autonomy and adaptability to react to these environmental variables and demands.

Then, the Deployments API will be an input for governing software, to maximize resource allocation efficiency, foster scalability, and ease the integration of varied applications within dynamic deployment ecosystems.

Specification

We can describe the deployment workflow as follows:

An admin would typically create the boundaries of a deployment: the deployment itself (called dep1 in our example) and its shapes, each with its own scaling needs and instance template.
An admin would then create a key / certificate pair that would be added to LXD’s config trust store (lxc config trust add <my-cert.crt> --type=deployments), get the fingerprint of the newly inserted certificate and create the deployment key using this fingerprint (lxc deployment key create dep1 dep1-key1 <fingerprint> --role rw)
Then, a governor program living outside of LXD can authenticate over TLS with the generated key pair, and LXD will then detect its role on the deployment.
A governor living inside an LXD instance communicates through the devlxd vsock and authenticates itself for the requested deployment with a JWT signed using the same key pair.
A webhook endpoint (if specified when creating the deployment) can optionally be used so that the governor can listen for platform specific metrics. The goal is to enable the governor to scale based on resource usage (opportunistic scaling) in order to optimize system utilization.
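
As a rough illustration of the external governor case above, the following Go sketch builds an HTTP client that presents the deployment key pair as a TLS client certificate. The file names, the server address and the helper names (loadGovernorClient, shapeInstancesURL) are assumptions made for this example only. Server certificate verification is left at Go's defaults, which a real governor would likely need to adapt (for example when the LXD server uses a self-signed certificate).

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// loadGovernorClient loads the key pair whose certificate was added to LXD's
// trust store and returns an HTTP client that presents it on every request.
func loadGovernorClient(certFile, keyFile string) (*http.Client, error) {
	keyPair, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("loading deployment key pair: %w", err)
	}

	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{keyPair},
				MinVersion:   tls.VersionTLS12,
			},
		},
	}, nil
}

// shapeInstancesURL builds the URL listing the instances of a shape.
func shapeInstancesURL(server, deployment, shape string) string {
	return fmt.Sprintf("%s/1.0/deployments/%s/shapes/%s/instances",
		server, url.PathEscape(deployment), url.PathEscape(shape))
}

func main() {
	// Hypothetical file names and server address.
	client, err := loadGovernorClient("my-cert.crt", "my-cert.key")
	if err != nil {
		fmt.Println(err)
		return
	}

	resp, err := client.Get(shapeInstancesURL("https://lxd.example:8443", "dep1", "shape1"))
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()

	fmt.Println(resp.Status)
}
```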

Database design

First, we’ll need to introduce new SQL schemas to store the deployment related records, here is a proposal:

The deployments table holds the Deployment records. Each deployment has 0 or more config key/value pairs (stored in the deployments_config table), 0 or more access tokens (stored in the deployments_keys table) and 0 or more deployment shapes (stored in the deployments_shapes table).
- An access token can be useful to restrict the use of a particular deployment through authz and RBAC rules, for example.
A Deployment Shape record holds the scale information (i.e., the scale boundaries; the current scale is simply derived from the number of associated records in deployments_shapes_instances) and an instance template for the instances that will be spun up as part of the shape (the relation between a shape and its instances is stored in the deployments_shapes_instances table). This instance_template could be a copy of an existing instance/snapshot configuration, but it could also be built using an instance profile. Each shape record also has 0 or more config key/value pairs (stored in the deployments_shapes_config table).
When an instance is created / deleted as part of a deployment shape, the associated LXD controller will ensure that this operation is allowed (i.e., the number of instances after this operation fits within the scale boundaries) before actually creating / deleting the instance and updating its database record in deployments_shapes_instances.
Some edge cases to take into consideration:
- An instance that is already part of a deployment shape can also be deleted through the normal DELETE 1.0/instances/{instanceName} endpoint, but the aforementioned scale check will be enforced as well. In case of a scale error, the user must adapt the scale parameter of the shape (decreasing the scaling_minimum value of the shape) containing the instance before deleting it.
- We might want to introduce a force query API parameter (with its CLI equivalent) to forcefully delete a deployment shape: it will then delete all the underlying instances before deleting their parent shape. The same could apply to a deployment: we’ll then need to delete all the instances of all the shapes contained in the deployment before removing the deployment database record (the cascade delete relation will then automatically remove the associated records).
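
The scale check described above can be sketched in Go as follows. This is a minimal illustration, not the LXD implementation: the shapeScale type and the checkScale helper are hypothetical names, and the current size is assumed to have already been counted from deployments_shapes_instances.

```go
package main

import (
	"errors"
	"fmt"
)

// errScaleBounds is returned when an operation would break the boundaries.
var errScaleBounds = errors.New("operation would leave the shape outside its scale boundaries")

// shapeScale mirrors the scaling fields of a deployment shape (hypothetical type).
type shapeScale struct {
	Minimum int // scaling_minimum
	Maximum int // scaling_maximum
	Current int // number of records in deployments_shapes_instances
}

// checkScale reports whether adding delta instances (negative to remove)
// keeps the shape within [Minimum, Maximum].
func checkScale(s shapeScale, delta int) error {
	next := s.Current + delta
	if next < s.Minimum || next > s.Maximum {
		return fmt.Errorf("%w: %d not in [%d, %d]", errScaleBounds, next, s.Minimum, s.Maximum)
	}

	return nil
}

func main() {
	s := shapeScale{Minimum: 1, Maximum: 3, Current: 3}

	fmt.Println(checkScale(s, +1)) // refused: the shape is already at scaling_maximum
	fmt.Println(checkScale(s, -1)) // allowed: 2 instances is within the boundaries
}
```

The same check would apply to the normal DELETE 1.0/instances/{instanceName} endpoint, with a delta of -1.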

Here are some sequence diagrams of common scenarios that illustrate how the management and the governor facing API will interact with LXD, its database and a to-be-defined Governor application:

[Management] Creating a Deployment

[Sequence diagram: create-deployment]

[Management] Creating a Deployment Shape

[Sequence diagram: create-deployment-shape]

[Governor] Reconciling the Governor with a desired Deployment

[Management] Creating a Deployment Key to manage access

For the governor facing API, it is important to stress that all calls to the platform must be authenticated using the deployment key, because a governor can live outside the actual LXD cluster/server. If no deployment key exists for a deployment, then a governor won’t be able to communicate with it.

API Struct changes

Deployment related

// DeploymentPost represents the fields required to rename a LXD deployment
type DeploymentPost struct {
	// The name for the deployment
	// Example: myapp
	Name string `json:"name" yaml:"name"`
}

// DeploymentPut represents the modifiable fields of a LXD deployment
type DeploymentPut struct {
	// Description of the deployment
	// Example: My new app and its required services
	Description string `json:"description" yaml:"description"`

	// Deployment configuration map (refer to doc/deployments.md)
	// Example: {"user.mykey": "foo"}
	Config map[string]string `json:"config" yaml:"config"`

	// Governor webhook URL for provider triggered scaling requests
	// Example: https://n.n.n.n/scale
	GovernorWebhookURL string `json:"governor_webhook_url" yaml:"governor_webhook_url"`
}

// Deployment used for displaying a LXD Deployment
type Deployment struct {
	DeploymentPost `json:",inline" yaml:",inline"`
	DeploymentPut  `json:",inline" yaml:",inline"`

	// DeploymentShapes keyed by name
	// Example: k8s-kubelet (map key)
	DeploymentShapes map[string]DeploymentShape `json:"deployment_shapes" yaml:"deployment_shapes"`

	// List of URLs of objects using this deployment
	// Read only: true
	// Example: ["/1.0/instances/c1", "/1.0/instances/c2"]
	UsedBy []string `json:"used_by" yaml:"used_by"`
}
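
To show how these embedded structs flatten into a single JSON object, here is a small Go sketch that encodes a creation request. The deploymentsPost type and the encodeCreateRequest helper are assumptions introduced for this example; the field definitions are copied from the structs above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeploymentPost and DeploymentPut reproduce the structs from this specification.
type DeploymentPost struct {
	Name string `json:"name" yaml:"name"`
}

type DeploymentPut struct {
	Description        string            `json:"description" yaml:"description"`
	Config             map[string]string `json:"config" yaml:"config"`
	GovernorWebhookURL string            `json:"governor_webhook_url" yaml:"governor_webhook_url"`
}

// deploymentsPost is a hypothetical creation payload combining both structs.
// encoding/json flattens embedded structs, so all fields end up at the top level.
type deploymentsPost struct {
	DeploymentPost `json:",inline" yaml:",inline"`
	DeploymentPut  `json:",inline" yaml:",inline"`
}

// encodeCreateRequest returns the JSON body for creating a deployment.
func encodeCreateRequest(name, description string) (string, error) {
	body, err := json.Marshal(deploymentsPost{
		DeploymentPost: DeploymentPost{Name: name},
		DeploymentPut: DeploymentPut{
			Description: description,
			Config:      map[string]string{},
		},
	})
	if err != nil {
		return "", err
	}

	return string(body), nil
}

func main() {
	body, err := encodeCreateRequest("dep1", "My new app")
	if err != nil {
		fmt.Println(err)
		return
	}

	// Prints: {"name":"dep1","description":"My new app","config":{},"governor_webhook_url":""}
	fmt.Println(body)
}
```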

Deployment Key related

// DeploymentKeysPost represents the fields required to create a LXD Deployment key
type DeploymentKeysPost struct {
	DeploymentKeyPost `json:",inline" yaml:",inline"`
	DeploymentKeyPut  `json:",inline" yaml:",inline"`

	CertificateFingerprint string `json:"certificate_fingerprint" yaml:"certificate_fingerprint"`
}

// DeploymentKeyPost represents the fields required to rename a LXD Deployment key
type DeploymentKeyPost struct {
	// The name for the deployment key
	Name string `json:"name" yaml:"name"`
}

// DeploymentKeyPut represents the modifiable fields of a deployment key
type DeploymentKeyPut struct {
	// Description of the deployment key
	// Example: Deployment key for myapp
	Description string `json:"description" yaml:"description"`

	// The role granted by this key on the deployment.
	// Could be either "admin" or "read-only"
	Role string `json:"role" yaml:"role"`
}
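
One possible way to enforce the key role is sketched below in Go. The mapping is an assumption for illustration: "read-only" keys are limited to GET requests (get and list), while "admin" keys may also create, update and delete. The roleAllows helper is a hypothetical name, not part of LXD.

```go
package main

import (
	"fmt"
	"net/http"
)

const (
	roleAdmin    = "admin"
	roleReadOnly = "read-only"
)

// roleAllows reports whether a deployment key with the given role may send
// a request using the given HTTP method. Unknown roles are always refused.
func roleAllows(role, method string) bool {
	switch role {
	case roleAdmin:
		switch method {
		case http.MethodGet, http.MethodPost, http.MethodPut, http.MethodPatch, http.MethodDelete:
			return true
		}

		return false
	case roleReadOnly:
		return method == http.MethodGet
	default:
		return false
	}
}

func main() {
	fmt.Println(roleAllows(roleReadOnly, http.MethodGet))    // true
	fmt.Println(roleAllows(roleReadOnly, http.MethodDelete)) // false
	fmt.Println(roleAllows(roleAdmin, http.MethodDelete))    // true
}
```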

Deployment Shape related

// DeploymentShapesPost represents the fields required to create a LXD shape
type DeploymentShapesPost struct {
	DeploymentShapePost `json:",inline" yaml:",inline"`
	DeploymentShapePut  `json:",inline" yaml:",inline"`
}

// DeploymentShapePost represents the fields required to rename a LXD shape
type DeploymentShapePost struct {
	// The name for the shape
	// Example: myapp
	Name string `json:"name" yaml:"name"`
}

// DeploymentShapePut represents the modifiable fields of a shape template
type DeploymentShapePut struct {
	// Description of the shape
	// Example: Web servers
	Description string `json:"description" yaml:"description"`

	// Shape configuration map
	// Example: {"user.mykey": "foo"}
	Config map[string]string `json:"config" yaml:"config"`

	// Instance definition to use for instances in this set
	InstanceTemplate InstancesPost `json:"instance_template" yaml:"instance_template"`

	// Maximum allowed size of instance set
	ScalingMaximum int `yaml:"scaling_maximum" json:"scaling_maximum"`

	// Minimum allowed size of instance set
	ScalingMinimum int `yaml:"scaling_minimum" json:"scaling_minimum"`
}

// DeploymentShape represents the fields of a shape template
type DeploymentShape struct {
	DeploymentShapePost `json:",inline" yaml:",inline"`
	DeploymentShapePut  `json:",inline" yaml:",inline"`

	// Current size of instance set
	ScalingCurrent int `yaml:"scaling_current" json:"scaling_current"`
}

Deployment Instance related

// DeploymentInstancesPost represents the fields required to create an instance in an existing LXD deployment shape.
type DeploymentInstancesPost struct {
	// The shape name in which to create the instance
	// Example: k8s-kubelet
	DeploymentShapeName string `json:"shape_name" yaml:"shape_name"`

	// The instance name to use
	// Example: k8s-kubelet01
	InstanceName string `json:"instance_name" yaml:"instance_name"`
}

Management API endpoint changes:

We’ll list the Deployment API REST endpoints. Some of them will be annotated with the MANAGEMENT label (meaning that its use is targeted toward the platform admin) and some others with GOVERNOR (meaning that it will be called from the governor application)

deployments related:

GET /1.0/deployments
- List the deployments (filtering with projects and deployment names available). No recursion URL parameter will return a list of the deployment URLs, recursion=1 will return the list of the deployment API objects.
POST /1.0/deployments
- Add a deployment
DELETE /1.0/deployments/{deploymentName}
- Delete a deployment
GET /1.0/deployments/{deploymentName}
- Get the specific details of a deployment
PUT /1.0/deployments/{deploymentName}
- Update a deployment (the update replaces the existing deployment with the new one)
PATCH /1.0/deployments/{deploymentName}
- Update a deployment (the update merges the existing deployment’s config with the new one)
POST /1.0/deployments/{deploymentName}
- Rename a deployment
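
The difference between the PUT and PATCH endpoints above can be illustrated on the config map alone. The Go sketch below is a simplified model under that assumption; replaceConfig and mergeConfig are hypothetical helper names and do not cover the other deployment fields.

```go
package main

import "fmt"

// replaceConfig models PUT: the stored config is discarded and the update
// becomes the new config.
func replaceConfig(_ map[string]string, update map[string]string) map[string]string {
	out := make(map[string]string, len(update))
	for k, v := range update {
		out[k] = v
	}

	return out
}

// mergeConfig models PATCH: keys present in the update overwrite the stored
// values, while all other stored keys are kept.
func mergeConfig(existing, update map[string]string) map[string]string {
	out := make(map[string]string, len(existing)+len(update))
	for k, v := range existing {
		out[k] = v
	}

	for k, v := range update {
		out[k] = v
	}

	return out
}

func main() {
	existing := map[string]string{"user.a": "1", "user.b": "2"}
	update := map[string]string{"user.b": "3"}

	fmt.Println(len(replaceConfig(existing, update))) // 1: only user.b remains
	fmt.Println(len(mergeConfig(existing, update)))   // 2: user.a is kept, user.b is updated
}
```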

deployment_keys related:

GET /1.0/deployments/{deploymentName}/keys
- List a deployment’s keys. No recursion URL parameter will return a list of the deployment keys URLs, recursion=1 will return the list of the deployment key API objects.
POST /1.0/deployments/{deploymentName}/keys
- Add a deployment key
DELETE /1.0/deployments/{deploymentName}/keys/{keyName}
- Delete a deployment key
GET /1.0/deployments/{deploymentName}/keys/{keyName}
- Get the specific details of a deployment key
POST /1.0/deployments/{deploymentName}/keys/{keyName}
- Rename a deployment key

deployment_shapes related:

GET /1.0/deployments/{deploymentName}/shapes
- List the shapes for a given deployment. No recursion URL parameter will return a list of the shape URLs for a given deployment, recursion=1 will return the list of the shape API objects for a given deployment.
POST /1.0/deployments/{deploymentName}/shapes
- Creates a new shape within a deployment.
DELETE /1.0/deployments/{deploymentName}/shapes/{shapeName}
- Delete a shape.
GET /1.0/deployments/{deploymentName}/shapes/{shapeName}
- Gets a specific shape.
PUT /1.0/deployments/{deploymentName}/shapes/{shapeName}
- Updates a specific shape.
POST /1.0/deployments/{deploymentName}/shapes/{shapeName}
- Rename a specific shape.

Governor-facing API endpoint changes:

deployment_shape_instances related:

GET /1.0/deployments/{deploymentName}/shapes/{shapeName}/instances
- List instances in a shape. No recursion URL parameter will return a list of the instance URLs, recursion=1 will return the list of the instance API objects.
POST /1.0/deployments/{deploymentName}/shapes/{shapeName}/instances/{instanceName}
- Creates a new instance (with the name instanceName) within this shape. If the shape’s current size allows one more instance, the instance is launched using the shape’s instance_template. If the request would exceed the maximum size (scaling_maximum), it fails.
DELETE /1.0/deployments/{deploymentName}/shapes/{shapeName}/instances/{instanceName}
- Deletes an existing instance (with the name instanceName) within this shape. If the shape’s current size can shrink by one, the instance is deleted. If the request would bring the size below the minimum (scaling_minimum), it fails.
PUT /1.0/deployments/{deploymentName}/shapes/{shapeName}/instances/{instanceName}/state
- Update the state of an instance (start / stop / etc.) within this shape.
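
To make the intended use of these endpoints more concrete, here is a hedged Go sketch of a governor reconciling a shape towards a desired size. Everything in it is an assumption for illustration: the platform interface stands in for the list, create and delete calls above, fakePlatform is an in-memory stand-in for LXD that enforces the scale boundaries, and the instance naming scheme is arbitrary and does not handle name collisions.

```go
package main

import "fmt"

// platform is a hypothetical abstraction over the governor-facing endpoints.
type platform interface {
	ListInstances(shape string) ([]string, error)
	CreateInstance(shape, name string) error
	DeleteInstance(shape, name string) error
}

// fakePlatform is an in-memory stand-in for LXD that holds a single shape.
type fakePlatform struct {
	min, max  int
	instances []string
}

func (p *fakePlatform) ListInstances(string) ([]string, error) {
	return append([]string(nil), p.instances...), nil
}

func (p *fakePlatform) CreateInstance(_, name string) error {
	if len(p.instances)+1 > p.max {
		return fmt.Errorf("scaling_maximum (%d) reached", p.max)
	}

	p.instances = append(p.instances, name)

	return nil
}

func (p *fakePlatform) DeleteInstance(_, name string) error {
	if len(p.instances)-1 < p.min {
		return fmt.Errorf("scaling_minimum (%d) reached", p.min)
	}

	for i, n := range p.instances {
		if n == name {
			p.instances = append(p.instances[:i], p.instances[i+1:]...)
			return nil
		}
	}

	return fmt.Errorf("instance %q not found", name)
}

// reconcile creates or deletes instances one at a time until the shape holds
// the desired number of instances, stopping at the first refusal.
func reconcile(p platform, shape string, desired int) error {
	current, err := p.ListInstances(shape)
	if err != nil {
		return err
	}

	for n := len(current); n < desired; n++ {
		err := p.CreateInstance(shape, fmt.Sprintf("%s%02d", shape, n+1))
		if err != nil {
			return err
		}
	}

	for n := len(current); n > desired; n-- {
		err := p.DeleteInstance(shape, current[n-1])
		if err != nil {
			return err
		}
	}

	return nil
}

func main() {
	p := &fakePlatform{min: 1, max: 3, instances: []string{"web01"}}

	fmt.Println(reconcile(p, "web", 3), p.instances) // scales up to three instances
	fmt.Println(reconcile(p, "web", 5))              // refused once scaling_maximum is reached
}
```

Going one instance at a time keeps each request individually checked by the platform, which matches the per-instance create and delete endpoints listed above.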

CLI Changes:

We will introduce the following CLI changes:

deployment
├─ list [--format]
├─ show [<remote>:]<deployment_name>
├─ get [<remote>:]<deployment_name> <key/property> [--property]
├─ set [<remote>:]<deployment_name> (<key1>=<value1> <key2>=<value2> ...) [--property]
├─ unset [<remote>:]<deployment_name> (<key1/property1> <key2/property2> ...) [--property]
├─ create [<remote>:]<new_deployment_name> 
├─ edit [<remote>:]<deployment_name>
├─ rename [<remote>:]<deployment_name> <new_deployment_name>
├─ delete [<remote>:]<deployment_name>
├─ key
     ├─ list [<remote>:]<deployment_name>
     ├─ show [<remote>:]<deployment_name> <key_name>
     ├─ get [<remote>:]<deployment_name> <key_name> <key/property> [--property]
     ├─ create [<remote>:]<deployment_name> <new_key_name> <certificate_full_fingerprint> [--role]
     ├─ rename [<remote>:]<deployment_name> <key_name> <new_key_name>
     ├─ delete [<remote>:]<deployment_name> <key_name>
├─ shape
     ├─ list [--format]
     ├─ show [<remote>:]<deployment_name> <shape_name>
     ├─ create [<remote>:]<deployment_name> <new_shape_name> [--from-profile <instance_profile>, --from-image <remote:alias>, --scaling-min <min>, --scaling-max <max>, --vm]
     ├─ get [<remote>:]<deployment_name> <shape_name> <key/property> [--property]
     ├─ set [<remote>:]<deployment_name> <shape_name> (<key1>=<value1> <key2>=<value2> ...) [--property]
     ├─ unset [<remote>:]<deployment_name> <shape_name> (<key1/property1> <key2/property2> ...) [--property]
     ├─ edit [<remote>:]<deployment_name> <shape_name>
     ├─ rename [<remote>:]<deployment_name> <shape_name> <new_shape_name>
     ├─ delete [<remote>:]<deployment_name> <shape_name>
     ├─ instance
           ├─ launch [<remote>:]<deployment_name> <shape_name> <instance_name>
           ├─ delete [<remote>:]<deployment_name> <shape_name> <instance_name>
           ├─ list [<remote>:]<deployment_name> <shape_name> [--format]

link to WIP PR: https://github.com/canonical/lxd/pull/12284

tomp · September 26, 2023, 12:33pm

@gabrielmougard please can you also include the governor facing API in this specification too as we discussed in our 1:1 the other day?

gabrielmougard · September 27, 2023, 7:48am

@tomp I tried to distinguish the governor facing and the management API. I added a bunch of diagrams to ease the reader’s understanding, as there are many communication flows here…

I also had a question regarding the Deployment shape (@dinmusic you can also give feedback on this):

Let’s imagine a scenario where we have some instances deployed within a shape (therefore complying with the instance_template of their shape). Now, we decide to update this shape and, more particularly, the instance_template field. What would happen to the deployed instances? There would clearly be a discrepancy between their configurations and the new instance_template… Should we just forbid updating the shape’s instance_template field if there are running instances in the shape, or could we do something smarter and somehow “convert” the running instances to match their new template? Maybe “rebalance” is a better term: when changing the root node of a tree (the template), we need to rebalance the leaf nodes (the instances) accordingly.

dinmusic · September 27, 2023, 3:21pm

My understanding of shapes suggests that every instance of a particular shape should be identical. Therefore, when modifications are made to the instance template, they should be either applied to all associated instances or not applied at all. However, applying a “potentially dangerous” change could be done sequentially, to ensure that a certain number of (healthy) instances is always running.

tomp · September 27, 2023, 3:36pm

Yes, either that or we don’t apply the changes to existing instances, and only use them for new instances of that shape. That would allow the governor to recycle old instances out and rebuild using the newly modified shape.

This would potentially allow individual instances to be modified after creation to diverge from its initial shape.

Also, because the shape’s InstanceTemplate is an InstancesPost, it may reference profile(s) from the project, and changes to those profiles will be applied immediately (the same as for any other instance).

I think for now at least modifications to the shape should not be applied to existing instances of that shape. This way we can still support modifying individual instances if needed.

gabrielmougard · September 28, 2023, 10:23am

@tomp I was working on the Deployment Key and I realized that the concept of role (I guess either admin (create, delete, update, get, list) or read-only (get, list)) makes sense at the deployment shape level, because that’s where the instances will live and that’s where the scale logic is. Plus, I suppose these keys will be used by the governors (which can only do instance level operations within a shape anyway). So, instead of

type DeploymentKeyPut struct {
	// Description of the deployment key
	Description string `json:"description" yaml:"description"`

	// Role can be "control" or empty string
	Role string `json:"role" yaml:"role"`

	// Access key
	AccessKey string `json:"access_key" yaml:"access_key"`
}

Could we have the following ?

type DeploymentKeyPut struct {
	// Description of the deployment key
	Description string `json:"description" yaml:"description"`

	// The roles for a shape (keyed by shape name) within
	// this deployment. Could be either "admin" or "read-only"
	// Example: {"shape1": "admin", "shape2": "read-only"}
	Roles map[string]string `json:"roles" yaml:"roles"`

	// Access key
	AccessKey string `json:"access_key" yaml:"access_key"`
}

Doing so would give us more granularity per deployment. But then, what happens if the Roles map is empty? Shall we just return an error, or set a default read-only role for all the shapes within that deployment?

tomp · September 28, 2023, 10:29am

Aside from granularity, what is the use case for this you’re thinking of?

gabrielmougard · September 28, 2023, 10:32am

@tomp These keys will be given to governors so that we can regulate their access. A governor only operates on instances (scaling up / down within a shape and possibly getting information about the instances within the shapes of its deployment), and instances are shape related, not deployment related. I don’t know if this makes sense…

The use case I’m thinking of is the following:

An admin creates a deployment (dep1) and 3 shapes within this deployment (dep1/shape1, dep1/shape2, dep1/shape3) with each having different needs (instance template, etc.)
This admin is maybe unsure that a certain shape (let’s say dep1/shape3) will behave correctly for an end user (unsure about the template config, or just doesn’t want the governor to manage the instance scaling within it but would still like to fetch information from it), but would like to experiment with scaling on this particular shape himself.
Then he could create two deployment keys: one for the end user (dep1/key-end-user), and one for him (dep1/key-experiment) that have the following respective properties:

{
   "name":"key-end-user",
   "roles": {"shape1": "admin", "shape2": "admin", "shape3": "read-only"}
}

and

{
   "name":"key-experiment",
   "roles": {"shape1": "admin", "shape2": "admin", "shape3": "admin"}
}

Then they can share the same governor that targets the same deployment but manages instances within shape3 differently.

tomp · September 28, 2023, 11:06am

Hrm, I’m not sure this complexity is necessary. There should only be one active governor per deployment. So if that governor is given a key with admin permissions it should be able to manage instances for all shapes.

If you want to test scaling for a particular shape you could create a test deployment that is isolated from the production deployment, which would be safer anyway.

Also, if you create new shapes in a deployment you then have to remember to also update the keys’ shape roles too.

If we need something like that in the future we could extend the keys struct to have a Shapes []string field that restricts the key’s access to particular shapes.

yosu-cadilla · September 30, 2023, 2:34pm

May I suggest starting by defining the logical and philosophical reasons for this API, along with its objectives?
Having that outline would not only help inspire more people but also make it simpler to make decisions going forward.

tomp · October 4, 2023, 7:18am

@gabrielmougard Please can you split the governor-facing API route definitions from the management API route definitions (into different sub-sections) to make it easier for the reader. I think this will assist people who are looking to implement just the governor-facing API in their products (even if in the case of LXD these API routes on the external side will be merged together).

Also as you make changes to the proof-of-concept implementation please can you keep this specification in sync so it reflects what is implemented.

Cheers

hifron · October 17, 2023, 6:33pm

Typical approach should be that some setup requires some workload with some backup workload already running, but scalable on another possibilities and with admin notifications and metrics and maybe hard limitations, but without much configs which depends on services like Canonical MetalAsAService maas.io. But that also mean a lot of costs and some prewarp are maybe necessary when cold start is not so a option, but warp speed also not accessible and then cold start should be from consumer point a problem in the network and not problem in the costs for such service enterprise…

If with MicroStack there are some services running, its also problem of such services and should not be accessible outside such service endpoint even not to admin requests if not applicable…

There is a lot of interest in this area and I am not an expert on this, but long standing projects like camel.apache.org, which also work with a lot of things, look promising in some way…