Fail instance creation due to fingerprint issue (LXD 5.0.3)

Well yes and no.

The fix we landed to support the new Incus metadata from the Linux Containers image server was causing problems for multi-format Ubuntu remotes.

This is why I reverted it yesterday.

But now this seems to be causing problems with the Linux containers remote.

I’ll see if that remote is still using duplicate metadata keys (for LXD and Incus), and if so I’ll drop Incus support from the 5.0 LTS series so the duplicate metadata entries don’t cause this issue.

As it stands, I think the fix that is rolling out is causing this issue while fixing another one.

@tomp :pray: Thank you so much. We have customers running production apps on LXD and we don’t want things to break. Your work is much appreciated.

So I ran snap refresh on my test clusters and, with the new revision, they are basically broken now.

I can’t even boot an instance from the API. I hope this can be fixed soon, as it would cause us a lot of problems: our business relies on booting up containers and would fail without it.

This would be catastrophic for us.

@tomp we’re also getting our own image server up and running, so problems like this can be solved from our side in the future.

However, that will take time, so we would greatly appreciate it if this problem could be fixed in the meantime.

Can you confirm it’s just the images: remote that is not working right?

Is ubuntu: working OK?

Have you tried doing a snap list lxd --all and then doing sudo snap revert lxd --revision {revision}?

If you switch back to the previous LXD 5.0.3 revision you should be able to launch from images: again until we figure this one out.
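For anyone following along, those two steps can be sketched roughly as below. This is only a sketch: the `previous_lxd_revision` helper is hypothetical, and it assumes that `snap list lxd --all` marks inactive revisions with `disabled` in the last (Notes) column and that the last such row is the revision you want to go back to.

```shell
#!/bin/sh
# Hypothetical helper: read `snap list lxd --all` output on stdin and print
# the revision number (3rd column) of the last row whose Notes column
# (last field) contains "disabled". Prints nothing if there is no such row.
previous_lxd_revision() {
    awk '$1 == "lxd" && $NF ~ /disabled/ { rev = $3 }
         END { if (rev != "") print rev }'
}

# Usage on a real host (commented out so the sketch has no side effects):
#   rev="$(snap list lxd --all | previous_lxd_revision)"
#   [ -n "$rev" ] && sudo snap revert lxd --revision "$rev"
```

Check the revision it prints against the snap list output before reverting, especially if more than one disabled revision is listed.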

Yes

images: is broken
ubuntu: is working

I’ll try and revert the revision

This should fix it

https://github.com/canonical/lxd/pull/12847

Will roll it out ASAP.

We use an automated clustering setup, so while a manual revert can work on our own setup, it won’t work for customers’ setups, which rely on automation.

Thank you so much! Really appreciate the prompt response to fixing this issue.

Yes I understand, sorry about this.

So this saga started about 5 months ago when we were informed that the images.linuxcontainers.org image server was going to drop support for the lxd.tar.gz metadata file and instead start serving up an incus.tar.gz file in a very similar format that LXD could still consume. We were told it wasn’t going to happen immediately, which gave us time to land support for this in https://github.com/canonical/lxd/pull/12260 before the old file was removed.

The fix worked well for LXD consuming from images.linuxcontainers.org but started to cause problems with LXD consuming from other Canonical remotes that offered up both combined and uncombined image formats. This wasn’t observed on the ubuntu: remote but on an internal remote used for snap core image builds.

So with the release of LXD 5.0.3 people started noticing this problem and it was fixed in https://github.com/canonical/lxd/pull/12834 (the problem had been present from LXD 5.18 onwards but wasn’t noticed).

However, as we have now seen, this has reintroduced the issue with consuming the images: remote because it’s still serving up both LXD and Incus variants of the metadata file.
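To picture why duplicate metadata entries end in a "More than one match" error, here is a toy lookup. It is not LXD’s actual code, and `resolve_fingerprint` is a hypothetical name; it simply assumes that each metadata variant yields its own image entry carrying the same fingerprint.

```shell
#!/bin/sh
# Simplified illustration only, NOT LXD's real lookup logic.
# Reads one fingerprint per line on stdin and resolves the (partial)
# fingerprint given as $1. Fails when zero or several entries match.
resolve_fingerprint() {
    prefix="$1"
    list="$(cat)"
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    count="$(printf '%s\n' "$list" | grep -c "^${prefix}" || true)"
    case "$count" in
        0) echo "No match for \"${prefix}\"" >&2; return 1 ;;
        1) printf '%s\n' "$list" | grep "^${prefix}"; return 0 ;;
        *) echo "More than one match for the provided partial fingerprint \"${prefix}\"" >&2
           return 2 ;;
    esac
}

# Usage: the same (made-up) fingerprint listed twice, once per metadata
# variant, is ambiguous even when the full fingerprint is supplied:
#   printf '%s\n' abc123 abc123 | resolve_fingerprint abc123
```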

Since then we’ve learned that LXD users will be blocked from consuming the images.linuxcontainers.org server, and as of now LXD 5.20 cannot consume the images: remote anymore and so it has been removed entirely (along with support for Incus metadata files).

But as LXD 5.0 LTS can still consume the images: remote for a few more months, the Incus fix was left in, in case images.linuxcontainers.org had stopped (or was going to stop) providing the LXD metadata file.

But as it has not, I’ll revert that change too and restore it to just checking for the LXD metadata file (https://github.com/canonical/lxd/pull/12847).

Then from 2024/05/01 all versions of LXD will lose access to images.linuxcontainers.org entirely as per https://discuss.linuxcontainers.org/t/important-notice-for-lxd-users-image-server/18479

Thank you @tomp for the rundown on the whole situation.

We’ve been following the situation with images.linuxcontainers.org; it is one reason why we’re also working on our own image server, so we can be in control of our own destiny.

We also hope to make our image server customizable and available to the community. We understand that losing access to the image server has already started causing problems. I guess this thread is just one of those issues.

We look forward to being a part of the solution for the LXD / Incus community.

The fix is building to go into 5.0/candidate now:

https://launchpad.net/~canonical-lxd/+snap/lxd-5.0-candidate

We can then test it and, if happy, roll it out to 5.0/stable.

How would I go about testing this?

I have a few test clusters I can try this with.

Once it’s in 5.0/candidate for amd64 I’ll let you know, then you can do snap refresh lxd --channel=5.0/candidate on each cluster member in your test cluster.

It will refresh on to revision 5.0.3-ffb17cf which includes the fix.
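If you have several members to refresh, a loop along these lines may save some typing. Treat it as a sketch under assumptions: the member hostnames are made up, `refresh_command` and `refresh_members` are hypothetical helper names, and it assumes you can reach each member over ssh with sudo rights. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Hypothetical member hostnames; replace with your own cluster members.
MEMBERS="${MEMBERS:-lxd-node1 lxd-node2 lxd-node3}"
CHANNEL="${CHANNEL:-5.0/candidate}"

# Print the refresh command for the channel given as $1.
refresh_command() {
    printf 'sudo snap refresh lxd --channel=%s --cohort="+"\n' "$1"
}

# With DRY_RUN=1 (the default) only print the ssh invocations;
# with DRY_RUN=0 actually run the refresh on each member in turn.
refresh_members() {
    for host in $MEMBERS; do
        cmd="$(refresh_command "$CHANNEL")"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh $host $cmd"
        else
            ssh "$host" "$cmd"
        fi
    done
}

# Usage:
#   refresh_members              # dry run, prints the commands
#   DRY_RUN=0 refresh_members    # really refresh every member
```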

5.0/candidate for amd64 now has 5.0.3-ffb17cf and is working for me:

snap refresh lxd --channel=5.0/stable --cohort="+"
2024-02-09T09:41:53Z INFO Waiting for "snap.lxd.daemon.service" to stop.
lxd (5.0/stable) 5.0.3-9b310f5 from Canonical✓ refreshed

lxc image list
+-------+-------------+--------+-------------+--------------+------+------+-------------+
| ALIAS | FINGERPRINT | PUBLIC | DESCRIPTION | ARCHITECTURE | TYPE | SIZE | UPLOAD DATE |
+-------+-------------+--------+-------------+--------------+------+------+-------------+

lxc launch images:ubuntu/jammy c1
Creating c1
Error: Failed instance creation: Failed getting remote image info: Failed getting image: More than one match for the provided partial fingerprint "1d5a5e1420abf1c7ec25a56a6c1d645bd4456d1dd0b19ba92eadd5fb62b4e1e8"

snap refresh lxd --channel=5.0/candidate --cohort="+"
2024-02-09T09:42:25Z INFO Waiting for "snap.lxd.daemon.service" to stop.
lxd (5.0/candidate) 5.0.3-ffb17cf from Canonical✓ refreshed
lxc launch images:ubuntu/jammy c1
Creating c1
Starting c1                                   

I just tested it, by running snap refresh lxd --channel=5.0/candidate --cohort="+"

Everything seems to be working! Thank you @tomp

Also wanted to ask what --cohort="+" does.

@tomp so Canonical refused to cooperate on images.linuxcontainers.org and also refused to make an alternative remote, so we are forced to use only Ubuntu images with LXD?

Or is there another easy way to use Debian and other images? If not, what are my options?

Thank you for any advice.

The --cohort="+" flag should be used with clusters so that, when the snap store is doing a phased rollout of a snap revision, all cluster members receive the same revision.

https://documentation.ubuntu.com/lxd/en/latest/howto/snap/#keep-cluster-members-in-sync
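One way to double-check that the members really did end up on the same revision is sketched below. The `members_in_sync` helper is hypothetical, and so are the hostnames in the usage comment; it assumes you collect one "host revision" pair per member, for example from the Rev column of snap list lxd.

```shell
#!/bin/sh
# Hypothetical helper: read "host revision" pairs on stdin (one per line)
# and succeed only if every host reports the same revision.
# Empty input counts as "not in sync".
members_in_sync() {
    awk '{ seen[$2] = 1 }
         END { n = 0; for (r in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Usage (hostnames are made up; Rev is the 3rd column of `snap list lxd`):
#   for h in lxd-node1 lxd-node2 lxd-node3; do
#       printf '%s %s\n' "$h" "$(ssh "$h" snap list lxd | awk 'NR == 2 { print $3 }')"
#   done | members_in_sync && echo "all members on the same revision"
```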

Great, I’ll start the stable rollout.

On the contrary, we were planning to work on helping to maintain the recipes that generated images on images.linuxcontainers.org, but ultimately the license change is what triggered the block for LXD users.

We are planning to introduce a new image server for non-Ubuntu images, but as you can imagine it takes time to get the infrastructure and systems set up. Please see An update on the licence change and community image server

Working on 5.0/stable now:

snap install lxd --channel=5.0/stable --cohort=+
lxd (5.0/stable) 5.0.3-ffb17cf from Canonical✓ installed
lxd init --auto
lxc launch images:ubuntu/jammy c1
Creating c1
Starting c1                                   