Recreating container export archive structure without exporting data - a.k.a export index.yaml

pmarini · July 15, 2024, 9:15pm

I’m setting up a backup script for LXD containers with a fixed volume structure, using Proxmox Backup Server (PBS).

In order to leverage PBS deduplication capabilities, the idea is to give the proxmox-backup-client command line an exploded folder, not an archived one.

So the flow is the following (for the container and all its volumes):

1. make a snapshot
2. mount the snapshot
3. send the content below the mount point to the Proxmox server
4. local cleanup
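As a rough illustration of this flow for a single ZFS dataset, the sketch below only builds the command lines rather than running them. The dataset name, snapshot name, mount point and repository string are hypothetical placeholders, and the exact `proxmox-backup-client` arguments should be checked against your own setup.

```python
def backup_plan(dataset, snapshot, mountpoint, repository, archive="root.pxar"):
    """Return the commands (as argument lists) to back up one ZFS dataset.

    All argument values are placeholders chosen by the caller; nothing is
    executed here, so the plan can be reviewed or logged first.
    """
    snap = f"{dataset}@{snapshot}"
    return [
        # 1. make a snapshot
        ["zfs", "snapshot", snap],
        # 2. mount the snapshot (ZFS snapshots are read-only)
        ["mount", "-t", "zfs", snap, mountpoint],
        # 3. send the content below the mount point to PBS as an exploded tree
        ["proxmox-backup-client", "backup", f"{archive}:{mountpoint}",
         "--repository", repository],
        # 4. local cleanup
        ["umount", mountpoint],
        ["zfs", "destroy", snap],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in backup_plan("tank/lxd/containers/c1", "pbs-backup",
                           "/mnt/pbs/c1", "backup@pbs@pbs.example.com:store1"):
        print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)`, with the cleanup steps placed in a `finally` block so that a failed upload does not leave the snapshot mounted.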

At restore time, I’d create a tar archive from the imported content on the target server and pass it to lxc import. However, my archive is missing the index.yaml file and I don’t know how to generate it (lxc config show -e does not give the same output). Basically I’m in the same situation as this user.
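The restore-side packing could look roughly like the sketch below. It assumes that the archive expected by lxc import places index.yaml at `backup/index.yaml` and the container files under `backup/container/`, which is how a non-optimized export appears to be laid out; verify this against an archive produced by lxc export on your LXD version. The function name and paths are hypothetical.

```python
import tarfile
from pathlib import Path


def build_import_tarball(content_dir, index_yaml, out_path):
    """Pack restored content plus an index.yaml into a tar archive.

    Assumed layout (check against a real `lxc export` archive):
      backup/index.yaml
      backup/container/...   <- everything found below content_dir
    """
    with tarfile.open(str(out_path), "w") as tar:
        tar.add(str(index_yaml), arcname="backup/index.yaml")
        # Directories are added recursively by default.
        tar.add(str(content_dir), arcname="backup/container")
    return Path(out_path)
```

File ownership and permissions are taken from the files on disk, so this would normally run as root on content restored with its original UIDs and GIDs.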

Is it possible to create index.yaml with a command or a combination of commands? The other option is to do an export, untar the archive and then continue from step 3 above. However, this is not feasible when the volume is used to store documents and can grow to large sizes (up to hundreds of GiB).

I’m using ZFS as storage backend.

pedro-rib · July 16, 2024, 3:18am

Unfortunately we don’t support a straightforward way of generating an index.yaml for an instance. For reference, this is the logic behind the index.yaml generation.

But with some effort and using yq you could build an index.yaml with the information provided by:

lxc config show instName --expanded for the config.container field
lxc storage show poolName for the config.pool field
lxc storage volume show default container/instName for the config.volume field
lxc profile show profileName for the config.profiles field (run it once for each profile added to the container)
The other fields can be hardcoded or derived from these commands’ outputs.

Hope this helps. If we start supporting a simpler way of building an index.yaml I will notify here.

pmarini · July 16, 2024, 7:26am

Hello @pedro-rib, thanks a lot for the quick reply!

It would be great to have it, thinking about a flag for the export command, something like --metadata-only.

And regarding the suggested workaround, I did some tests and here are the results:

first 6 lines must be hardcoded, as you mention.
config.container is replicated with lxc config show instName up to config.container.config.description
created_at must be hardcoded
config.container.expanded_config is produced with lxc config show instName --expanded, adding the expanded_ prefix manually where needed. However, it produces some fields (ephemeral, profiles, stateful, description) that are not present in index.yaml and misses some fields (name, status, status_code, last_used_at, location, type, project) that are present in index.yaml. I guess these must be hardcoded as well.
config.volume, config.pool and config.profiles can be replicated as you mention with the respective show command. The only caveat is that the used_by field is empty in index.yaml while it’s not in the output of show. Is there a way to remove it from the output, or should it be done by manipulating the content with yq?

pedro-rib · July 16, 2024, 12:20pm

Thanks for the suggestion on the --metadata-only flag.
As for the created_at field, I believe you could reuse the same field from lxc storage volume show default container/instName. The status field can be derived from the volatile.last_state.power field in lxc config show instName --expanded (the status_code is 102 if the instance is stopped and 103 if it is running).
Now for the actual question, I would probably just empty it with yq, something like echo "$with_usedby" | yq eval ".config.volume.used_by = []" -
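For anyone assembling the file in Python rather than with yq, the two points above could be sketched as follows. The status codes are the ones quoted in this reply; the "RUNNING"/"STOPPED" values for volatile.last_state.power and the "Running"/"Stopped" status strings are assumptions to check against your own lxc config show output, and the function names are made up for illustration.

```python
# Assumed mapping: power state -> (status, status_code).
# 102 (stopped) and 103 (running) are the codes quoted in the reply above.
STATUS_BY_POWER = {
    "STOPPED": ("Stopped", 102),
    "RUNNING": ("Running", 103),
}


def derive_status(expanded_config):
    """Derive (status, status_code) from an instance's expanded config map."""
    # Assumption: an instance with no recorded power state is treated as stopped.
    power = expanded_config.get("volatile.last_state.power", "STOPPED")
    if power not in STATUS_BY_POWER:
        raise ValueError(f"unexpected power state: {power!r}")
    return STATUS_BY_POWER[power]


def clear_used_by(section):
    """Return a copy of a pool/volume/profile section with used_by emptied."""
    cleaned = dict(section)
    cleaned["used_by"] = []
    return cleaned
```

`derive_status` takes the config map from lxc config show instName --expanded once it has been parsed from YAML, and `clear_used_by` does the same job as the yq expression without modifying its input.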

pmarini · August 27, 2024, 6:45pm

For anyone facing the same issue, it may be useful to check the Python script I wrote to generate a valid index.yaml file:

Create a script called create_index_yaml.py:

##########################
# This script allows you to generate a valid
# index.yaml file to be able to import an LXD instance 
# in another server
# See discussion in:
# https://discourse.ubuntu.com/t/recreating-container-export-archive-structure-without-exporting-data-a-k-a-export-index-yaml/46430/2
##########################
import subprocess
import sys

import yaml

instance_name = sys.argv[1]
instance_pool = sys.argv[2]
instance_project = sys.argv[3]


def lxc_show(*args):
    # Run an lxc subcommand and parse its YAML output into Python objects.
    return yaml.safe_load(subprocess.check_output(["lxc", *args]))


################
## Volume section
################
show_volume_output = lxc_show(
    "storage", "volume", "show", instance_pool,
    "container/%s" % instance_name, "--project", instance_project)

# used_by is empty in an exported index.yaml
show_volume_output["used_by"] = []

# The created_at field, as written by the command, causes a formatting error at
# import time. Its usefulness is not clear at this point, so it is removed. If it
# turns out to be important, it should be formatted like the other timestamps.
show_volume_output.pop("created_at", None)

##############
## Container config section
##############
show_config_output = lxc_show(
    "config", "show", instance_name, "--project", instance_project)

show_config_expanded_output = lxc_show(
    "config", "show", instance_name, "--project", instance_project, "--expanded")

##############
## Pool section
##############
show_pool_output = lxc_show("storage", "show", instance_pool)
show_pool_output["used_by"] = []

##############
## Profiles section
##############
profiles = []

for profile in show_config_output["profiles"]:
    show_profile_output = lxc_show("profile", "show", profile)
    show_profile_output["used_by"] = []
    profiles.append(show_profile_output)

#################
## Form the file
#################
index_out = {}
index_out["name"] = instance_name
index_out["backend"] = show_pool_output["driver"]
index_out["pool"] = instance_pool
index_out["optimized"] = False
index_out["header"] = False
index_out["config"] = {}
index_out["config"]["container"] = show_config_output
index_out["config"]["container"]["name"] = instance_name
index_out["config"]["container"]["expanded_config"] = show_config_expanded_output["config"]
index_out["config"]["container"]["expanded_devices"] = show_config_expanded_output["devices"]
index_out["config"]["pool"] = show_pool_output
index_out["config"]["profiles"] = profiles
index_out["config"]["volume"] = show_volume_output

print(yaml.dump(index_out))

Call it from the shell:

INSTANCE_NAME=mycontainer
INSTANCE_POOL_NAME=mypool
INSTANCE_PROJECT=myproject

python create_index_yaml.py "${INSTANCE_NAME}" "${INSTANCE_POOL_NAME}" "${INSTANCE_PROJECT}" > index.yaml
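Before packing the generated index.yaml into an archive, a quick structural check may catch a missing section early. The sketch below is a hypothetical helper, not part of LXD: it only verifies the keys that the script above writes and that the used_by lists are empty, and it takes the already parsed document (for example the result of yaml.safe_load) as a plain dict.

```python
def check_index(index):
    """Return a list of problems found in a parsed index.yaml document.

    Only the structure written by create_index_yaml.py is checked; an empty
    list means nothing obvious is missing, not that lxc import will succeed.
    """
    problems = []
    for key in ("name", "backend", "pool", "config"):
        if key not in index:
            problems.append(f"missing top-level key: {key}")

    config = index.get("config") or {}
    for key in ("container", "pool", "profiles", "volume"):
        if key not in config:
            problems.append(f"missing config.{key}")

    # used_by is expected to be empty in every section that carries it.
    sections = [("config.pool", config.get("pool")),
                ("config.volume", config.get("volume"))]
    sections += [(f"config.profiles[{i}]", profile)
                 for i, profile in enumerate(config.get("profiles") or [])]
    for label, section in sections:
        if isinstance(section, dict) and section.get("used_by"):
            problems.append(f"{label}.used_by is not empty")
    return problems
```

A wrapper could load index.yaml, call `check_index`, print each problem and exit with a non-zero status if the list is not empty.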