Consider changing default storage driver

Hello! I was hoping to start some discussion around the default storage driver presented when one runs lxd init out of the gate, which is zfs. I assume this choice was made because it’s generally “optimized” and doesn’t require extra hardware or configuration to set up, which I can understand. However, in my (and others’) testing, this storage driver is slow when one runs their first lxc command after (re)booting a system - sometimes taking 10+ seconds to return output. After that, it’s fine. Since this behavior was similar to how snaps used to be very slow to start initially, it was just being chalked up to that. That said, that issue with snaps has been worked on pretty extensively [link][link] (not with lxd specifically, but more generally), so after some testing with other storage drivers, we found that zfs was actually what was causing lxd to be so slow initially.

Since this tends to give off a bad impression of both snaps and lxd in general for both newcomers and regular users (reinforcing notions of poor performance), I’m wondering if it might be worth considering setting the default storage driver to dir? I understand that it’s pretty basic compared to other drivers, but I might argue that’s sort of the point of a quickstart setup - low bar to entry, minimal fuss. Indeed, if one runs out of space on their default zfs pool (which is pretty small), they’re required to research how to expand it, whereas with dir it’s directly related to their partition’s available disk space - easy to understand, easy to relate to their system, easy to interact with. lxd is quick out of the gate with dir, shedding the impression that either lxd or the snap ecosystem is to blame for that initial slowness. If folks need more advanced storage usage, they’ll be reading the docs anyway and can choose the driver that best suits their needs.

Thoughts?

On my desktop system (Ubuntu 22.04) I have two 2TB NVMe SSDs.

Both were formatted with BTRFS.
Ubuntu is installed on NVMe1, which is where the LXD snap is installed.
I have a 3rd 2TB SSD that I use for the “default” LXD storage pool.

For that 3rd SSD, I used the Disks utility to format it with the “No Filesystem” option.

When I do: $ sudo lxd init
and it gets to the storage question, “my” default is set to BTRFS by “lxd init”.

NOTE - I did not have to select BTRFS.

The “lxd init” then formats the “default pool” to BTRFS.

So ZFS is not always the automatic “default”.

Just pick BTRFS as your lxd init storage driver… or you can pick whichever of the listed storage drivers you want to use.

You’re correct, ZFS is not always the default, but defaulting to btrfs only occurs when the storage pool is already on a btrfs filesystem. The issue there is that btrfs is not the default recommendation for a filesystem when installing Ubuntu. Our guided partitioning tool uses ext4, and starting in 24.04 the guided partitioning tool will start offering root on ZFS again.

This is an issue that is going to affect users who use the guided partitioning tool in the installer, which could be a significant number of users. By defaulting to zfs-backed storage on non-btrfs filesystems, users might get a mistaken impression of the performance of LXD and thus of snap. It’s worth considering the opinions of users who may not necessarily wander from the defaults provided by the guided install process.

We can say “We’ve fixed many of snap’s issues and startup speed is no longer an issue”, but as it currently stands, LXD sets the default storage backend to ZFS on systems where the user opted for the default guided partition layout. As a result, those users end up with the impression that we are simply ignoring issues they still see in practice. Even though the cause isn’t snap itself, it may still be contributing to the poor reception of snap by the community at large.

Hi,

Please can you share your test methodology and results?

It’d be great to have some reproducer steps and examples of what you’re seeing here.
Is this in a server environment with instances starting on boot or on the desktop?

So that we can isolate why LXD start-up time is impacted by using ZFS and potentially improve it.

As for changing the defaults, lxd init --auto already uses dir by default to get up and running quickly for basic/failsafe setups. There’s also the --storage-backend option, which can be used to select a different storage driver when using --auto.
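
For example, something along these lines (a quick sketch; exact behaviour may vary between LXD versions):

$ lxd init --auto                           # non-interactive setup, dir driver by default
$ lxd init --auto --storage-backend=btrfs   # same, but with an explicit driver choice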

The interactive lxd init on the other hand will try and offer the best (most well featured) storage driver available on the system, but it is the user’s choice which one they pick.

It’s interesting you’ve picked up on the point around users’ speed experiences, because although defaulting to dir may improve LXD’s start-up times (hopefully we can do this for zfs too), it would then have a knock-on effect on all the storage volume operations they go on to perform in LXD (such as creation, snapshots, copying, moving, migration etc).

So we may be risking shifting the speed issues to a different part of the experience rather than improving them overall.

Indeed, if one runs out of space on their default zfs pool (which is pretty small), they’re required to research how to expand it, whereas with dir it’s directly related to their partition’s available disk space - easy to understand, easy to relate to their system, easy to interact with.

This I agree is a potential problem. My understanding (and it could be wrong) was that the reason lxd init was originally proposing the more feature-rich storage driver with a loop device was to provide a quick way to get up and running with LXD whilst still demonstrating its rich storage features.

However, I agree that by automatically picking a size for the underlying loop device based on the system’s free space, it does mean that users may have to learn how to grow their loop-backed storage pool sooner or later (this, by the way, is now supported using lxc storage set <pool> size=).
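
For example, growing a loop-backed pool later would look roughly like this (pool name and size are illustrative):

$ lxc storage set default size=50GiB   # grow the loop file backing the pool
$ lxc storage info default             # check the pool's new total space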

I suspect that if we did change the default to dir (or btrfs on relevant systems, as today) then it would need to be combined with changes to lxd init to better guide the user on the choices they are making (i.e. using zfs with a dedicated block device is often preferable to using dir - even assuming the start-up times can’t be improved). Perhaps by first asking if the user has a dedicated block device to use for storage - and then changing the driver order based on the answer to that question.
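
As a rough illustration of the dedicated-block-device route (device path, pool name and instance name are made up for the example):

$ lxc storage create fast zfs source=/dev/nvme1n1   # ZFS pool on a whole dedicated disk
$ lxc launch ubuntu:22.04 c1 --storage fast         # place an instance on that pool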

I don’t believe that just changing the default to dir would be the right thing to do in all circumstances.

@mionaalex do you have any thoughts around this topic?

Hi, sure! Nothing too crazily scientific - on my laptop running lxd with zfs storage, after a fresh boot I just time the first lxc command I run:

:~$ time lxc list
...<snip>...
real	0m10.319s
user	0m0.058s
sys	0m0.923s

Subsequent commands run quickly. I tested this on another VM, and a few colleagues confirmed the same behavior. Switching my test VM’s storage driver to dir did not produce the same initial-command delay. I also have another laptop with lvm as the storage backend, and that too does not experience the delay.

Please could you try with a fresh LXD install with a ZFS pool but no instances created on it.

Then repeat your lxc list test - this will help to confirm whether it’s a start-up issue (i.e. mounting the pool) or an issue with getting info about the instances in the list.
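
Something along these lines should do it on a throwaway test VM (a sketch; the purge step is destructive):

$ sudo snap remove --purge lxd && sudo snap install lxd   # fresh LXD install
$ lxd init --auto --storage-backend=zfs                   # ZFS pool, no instances created
$ sudo reboot
$ time lxc list                                           # first LXD command after boot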

Ta

I initially suspected that zpool import was the part taking time, but manual testing showed it wasn’t.

I just did a quick test using LXD with / without a zpool and the time to an empty lxc ls is the same.

In both tests, I used empty zpools (freshly created ones).
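
For reference, one way to time the zpool import step in isolation (pool name is illustrative; stop LXD first if it’s the pool LXD manages):

$ sudo snap stop lxd              # make sure the pool isn't busy
$ sudo zpool export tank          # detach the pool
$ time sudo zpool import tank     # time just the import step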

Yeah, I think we should be clearer on what the cause of the slowdown is before we take any drastic action.

Please can a GitHub issue be opened about this.

Thanks