LXD Cluster: Permission denied when launching VMs on a dir storage pool hosted on NFS share

Problem:
On an LXD cluster, I’m getting "Permission denied" when launching VMs on a dir storage pool hosted on an NFS share.

LXD Versions: 5.0/stable, 5.21/stable, Ubuntu 22.04

Repro steps:

  • Mount an NFS share on /lxd on each of the 4 nodes
sudo mount -t nfs -o rw,nosuid,noacl,nodev,tcp,intr,hard,rsize=1048576,wsize=1048576 <host>:<share> /lxd
  • Create a separate directory in the NFS share for each node
mkdir -p /lxd/240
mkdir -p /lxd/241
mkdir -p /lxd/242
mkdir -p /lxd/243
  • Create LXD dir pool against /lxd
lxc storage create nfspool dir --target <node1>
lxc storage create nfspool dir --target <node2>
lxc storage create nfspool dir --target <node3>

lxc storage set nfspool source=/lxd/240 --target <node1>
lxc storage set nfspool source=/lxd/241 --target <node2>
lxc storage set nfspool source=/lxd/242 --target <node3>
  • Push the pool from PENDING to CREATED state
lxc storage create nfspool dir
  • We now see LXD has created the pool directory structure
sudo find /lxd/
/lxd/
/lxd/240
/lxd/240/buckets
/lxd/240/custom
/lxd/240/custom-snapshots
/lxd/240/images
/lxd/240/containers
[continued]
  • Verify pool state
lxc storage ls
+---------+--------+-------------+---------+---------+
|  NAME   | DRIVER | DESCRIPTION | USED BY |  STATE  |
+---------+--------+-------------+---------+---------+
| local   | lvm    |             | 22      | CREATED |
+---------+--------+-------------+---------+---------+
| nfspool | dir    |             | 0       | CREATED |
+---------+--------+-------------+---------+---------+
  • Try to launch an instance using this dir/nfs pool – fails
$ lxc launch  --vm testvm -s nfspool
Creating testvm
Error: Failed instance creation: Failed creating instance from image: Failed converting image to raw at "/var/snap/lxd/common/lxd/storage-pools/nfspool/virtual-machines/testvm/root.img": Failed to run: nice -n19 qemu-img convert -f qcow2 -O raw -T none /var/snap/lxd/common/lxd/images/ce0fb8befe9bd3c653925163e9a5971db96c3e892502fe4e98cbb963b33310c2.rootfs /var/snap/lxd/common/lxd/storage-pools/nfspool/virtual-machines/testvm/root.img: exit status 1 (qemu-img: /var/snap/lxd/common/lxd/storage-pools/nfspool/virtual-machines/testvm/root.img: error while converting raw: Could not create '/var/snap/lxd/common/lxd/storage-pools/nfspool/virtual-machines/testvm/root.img': Permission denied)

The strange part is that LXD does create directories on the NFS share during initial pool creation,
so it clearly has some write access to the share, yet launching VMs still fails.
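One way to separate plain filesystem permissions from anything else is a quick write probe run directly on the node. This is only a sketch: `probe_write` is a hypothetical helper name, and it assumes the pool source for this node is `/lxd/240` as in the steps above.

```shell
# Try to create and then delete a scratch file in DIR, and report the outcome.
# Prints "ok: DIR" on success, "denied: DIR" (and returns 1) on failure.
probe_write() {
    dir="$1"
    probe="$dir/.write-probe.$$"
    # The subshell keeps a failed redirection from aborting the calling shell.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "ok: $dir"
    else
        echo "denied: $dir"
        return 1
    fi
}
```

From a root shell on the node, `probe_write /lxd/240` shows whether root can write to the share at all. If that reports ok while the launch still fails, the denial is probably coming from somewhere other than the NFS export permissions.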

@Dag Would you be able to check whether there are AppArmor denials accompanying those ‘Permission denied’ errors?
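For reference, a minimal way to look for such denials is to filter the kernel log for `apparmor="DENIED"` records. The helper name `apparmor_denials` is made up for this sketch; the `dmesg` and `journalctl` invocations in the comments are the usual places to look and typically need root.

```shell
# Keep only AppArmor denial records from the log text piped in on stdin.
# "|| true" makes an empty result (no denials) a normal exit, not an error.
apparmor_denials() {
    grep 'apparmor="DENIED"' || true
}

# Example use on the node where the launch was attempted:
#   sudo dmesg -T | apparmor_denials
#   sudo journalctl -k --since "1 hour ago" | apparmor_denials
```

Lines with `apparmor="STATUS"` (profile loads and the like) are filtered out, so any output is worth comparing against the timestamp of the failed launch.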

root@lxd0:~# date
Fri May 17 16:51:42 UTC 2024
  • Attempt VM creation specifically on node lxd0 to force logging locally
# lxc launch ubuntu:jammy --vm testvm -s nfspool --target lxd0
Creating testvm
Error: Failed instance creation: Failed creating instance from image: Error opening directory: open /var/snap/lxd/common/lxd/storage-pools/nfspool/virtual-machines/testvm: permission denied
  • The last few lines of dmesg are unrelated and predate the test above
[Fri May 17 15:34:07 2024] audit: type=1400 audit(1715960048.452:115): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdfan0_</var/snap/lxd/common/lxd>" pid=2388 comm="apparmor_parser"
[Fri May 17 15:34:07 2024] audit: type=1400 audit(1715960048.456:116): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_forkdns-lxdfan0_</var/snap/lxd/common/lxd>" pid=2390 comm="apparmor_parser"
[Fri May 17 16:47:03 2024] FS-Cache: Loaded
[Fri May 17 16:47:03 2024] FS-Cache: Netfs 'nfs' registered for caching
[Fri May 17 16:47:03 2024] nfs: Deprecated parameter 'intr'

I didn’t see any AppArmor or audit-related entries in journald either.

After some digging, I think I may have found the cause of the issue. Looking deeper into the AppArmor denials, it seems AppArmor is treating local writes to this NFS share as remote socket connections. This apparently shouldn’t happen, and it was addressed by an AppArmor update and a kernel change, per the last set of bug notes here:

With this in mind, I installed the HWE kernel and rebooted, since 6.x kernel versions were noted to include this fix. On 22.04, I ran:

sudo apt update && sudo apt install linux-generic-hwe-22.04
reboot

After that, I was able to launch VMs from the NFS pool.
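To confirm a node actually booted into the newer kernel after the reboot, something like the following can be run on each cluster member. It is a rough sketch: `kernel_is_6x_or_newer` is a hypothetical helper that only compares the major version reported by `uname -r`, based on the note above that 6.x kernels carry the fix.

```shell
# Succeed when the given kernel release string has a major version of 6 or higher.
kernel_is_6x_or_newer() {
    release="$1"
    major="${release%%.*}"   # text before the first dot, e.g. "6" from "6.5.0-35-generic"
    [ "$major" -ge 6 ]
}

if kernel_is_6x_or_newer "$(uname -r)"; then
    echo "running $(uname -r): 6.x or newer"
else
    echo "running $(uname -r): older than 6.x, HWE kernel not booted"
fi
```

If a node still reports an older kernel, it has most likely not been rebooted yet or booted a previous entry.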

@Dag ah thank you very much, I didn’t know the situation improved/was fixed with the 6.x kernel. Good to know!