Storage issues after recover

Once you recover an LXD storage pool, it is set up with the path you entered at the recovery step. In my case, I entered the path as /dev/sdb1. After a reboot the disk now shows up as /dev/sdc1 and I can’t access any of my files or change the storage source. How can I fix this?

$ lxc storage show usb-disk1
config:
  source: /dev/sdb1
description: ""
name: usb-disk1
driver: btrfs
used_by:
- /1.0/storage-pools/usb-disk1/volumes/custom/media
status: Created
locations:
- none
$ lsblk
loop0    7:0    0  73,9M  1 loop /snap/core22/864
loop1    7:1    0 152,1M  1 loop /snap/lxd/26200
loop2    7:2    0 105,8M  1 loop /snap/core/16202
sda      8:0    0 119,2G  0 disk 
├─sda1   8:1    0   953M  0 part /boot/efi
└─sda2   8:2    0 118,3G  0 part /var/log
sdc      8:32   0 931,5G  0 disk 
└─sdc1   8:33   0 931,5G  0 part 
zram0  253:0    0   3,7G  0 disk [SWAP]

Hi @victoitor, I found this post from last year that explains the steps you have to take to manually move instances and volumes or edit the database directly. But be careful since you can easily corrupt your installation of LXD:



I’d be interested to understand what the scenario was that meant that the source of the pool changed.

And did this occur before the recover or afterwards, or are you saying the pool was recovered with a configuration different from the one you requested?

This is a very common issue when using external USB drives: they appear as more or less random /dev/sdX devices, depending on the order in which they were plugged in, or whether they dropped off the bus due to power issues, etc. The solution is to never use /dev/sdX paths, but the symlinks in the /dev/disk/by-* directories instead, for example:

# ls -al /dev/disk/by-id/ata-ST4000VN006-3CW104_ZW601QEZ
lrwxrwxrwx 1 root root 9 Sep 18 15:39 /dev/disk/by-id/ata-ST4000VN006-3CW104_ZW601QEZ -> ../../sdb

This is guaranteed to always resolve to the same physical disk, since serial numbers never change, at least in theory.

The same applies when creating ZFS pools: never use /dev/sdX devices directly.
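Finding a stable identifier for the disk is straightforward with lsblk or the /dev/disk/by-* symlinks. A minimal sketch of building a by-uuid path (the UUID below is a made-up placeholder, not one from this thread):

```shell
# List each partition with its stable identifiers (real lsblk command,
# shown commented so this sketch is safe to paste anywhere):
#   lsblk -o NAME,SIZE,UUID,PARTUUID

# Build a stable path from the filesystem UUID instead of using /dev/sdX.
# Placeholder UUID; substitute the one lsblk reports for your partition.
UUID="0f1e2d3c-4b5a-6978-8899-aabbccddeeff"
SRC="/dev/disk/by-uuid/${UUID}"
echo "$SRC"
```

Entering a path like this when LXD asks for the pool source avoids the unstable /dev/sdX naming entirely.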

It is just as Aleks mentioned: it is an external USB drive, so the device name is inconsistent. In particular, I think it was disconnected and then reconnected while the system was running, so it got a new name.

I created the storage pool on one machine, moved the drive to another machine, and used lxd recover to get the storage volume back. The main issue I see (and thought was fixed by now) is that when you first create a storage pool in LXD, you can point to it with a /dev/sdX device and it will be registered with a stable /dev/disk/by-* path. But when you recover a pool and point to it with /dev/sdX, the source is registered exactly as entered, unlike when you first create the pool. This difference in behaviour between creating and recovering a pool should be looked into, as it feels like a bug.
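One way to see the difference is to check what LXD actually recorded as the pool source in each case. A hedged sketch, using the pool name from this thread; the run wrapper just echoes the command so the sketch is safe to paste (drop it to execute for real):

```shell
run() { echo "+ $*"; }   # safety wrapper: prints the command instead of running it

# Show the recorded source of the pool:
run lxc storage get usb-disk1 source
# After a normal `lxc storage create`, this typically prints a stable
# /dev/disk/by-* path; after `lxd recover`, it prints the literal string
# that was entered, e.g. /dev/sdb1.
```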

If you have reproducer steps for that issue, please can you log it here

I reproduced the entire issue using LXD VMs, so it’s quite simple to reproduce, and it can be found here.

The situation I’m in right now is that the storage is configured incorrectly because of the behaviour I describe in that issue. Whenever the drive’s /dev/sdX name changes, I lose access to the storage and need to reboot the system and hope the drive comes back with the right name. It should also be noted that there is a lot of content in this storage pool and I have no other storage big enough to move it to. Furthermore, it’s insane to think the only possible solution to this bug is to move the entire contents of a storage drive to another one (which, again, can be huge). There should be a way to fix this misconfiguration without having to move massive amounts of data. Something is definitely odd with the LXD recovery tool.


Since I had already created a full test environment for reporting the bug, I actually found a workaround, which I could test safely in that environment.

When the storage is unavailable, I can delete the volume and then the storage pool. I first disconnected the disk on the real server to make sure that deleting the volume and the pool would only remove the database records, keeping my data “safe”. I then reconnected the disk and ran the recovery process again, this time identifying the disk by its UUID. Quite a (dangerous) workaround, but at least the pool is now registered with the correct configuration. I hope this bug gets fixed, since it’s quite nasty for those who hit it, like me.
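For anyone else hitting this, the steps above can be sketched as follows. This is an assumption-laden sketch, not commands verified against this thread: the pool name usb-disk1 and volume name media come from the earlier lxc storage show output, and the run wrapper echoes each command instead of executing it, so nothing here is destructive as written.

```shell
run() { echo "+ $*"; }   # safety wrapper: prints commands instead of running them

# 0. Physically disconnect the disk first, so the deletes below can only
#    touch LXD's database records, not the data on the disk itself.

# 1. Delete the custom volume record, then the pool record:
run lxc storage volume delete usb-disk1 media
run lxc storage delete usb-disk1

# 2. Reconnect the disk, then re-run the interactive recovery wizard and
#    enter a stable /dev/disk/by-uuid/<UUID> path as the pool source
#    instead of a /dev/sdX name:
run lxd recover
```

Deleting pool records while the disk holding the data is unplugged is exactly the “dangerous” part of the workaround; double-check the disk really is disconnected before removing the wrapper and running the deletes for real.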
