Failed to start dqlite server: entries count in preamble is zero

Hello,

I was posting about recovering my data in another thread, but I’ve now reverted to the snap snapshot of my data from before I uninstalled and reinstalled the snap. Perhaps it will be easier to fix my issues here than to recover my data.

$ lxc list
Error: LXD unix socket "/var/snap/lxd/common/lxd/unix.socket" not found: Please check LXD is running
$ sudo tail -20 /var/snap/lxd/common/lxd/logs/lxd.log
time="2024-04-30T22:15:39+01:00" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
time="2024-04-30T22:15:39+01:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
time="2024-04-30T22:15:40+01:00" level=error msg="Failed to start the daemon" err="Failed to start dqlite server: raft_start(): io: load closed segment 0000000000295980-0000000000295980: entries batch 23 starting at byte 251936: entries count in preamble is zero"
$ snap info lxd
name:      lxd
summary:   LXD - container and VM manager
publisher: Canonical✓
store-url: https://snapcraft.io/lxd
contact:   https://github.com/canonical/lxd/issues
license:   AGPL-3.0
description: |
  LXD is a system container and virtual machine manager.
  
  It offers a simple CLI and REST API to manage local or remote instances,
  uses an image based workflow and support for a variety of advanced features.
  
  Images are available for all Ubuntu releases and architectures as well
  as for a wide number of other Linux distributions. Existing
  integrations with many deployment and operation tools, makes it work
  just like a public cloud, except everything is under your control.
  
  LXD containers are lightweight, secure by default and a great
  alternative to virtual machines when running Linux on Linux.
  
  LXD virtual machines are modern and secure, using UEFI and secure-boot
  by default and a great choice when a different kernel or operating
  system is needed.
  
  With clustering, up to 50 LXD servers can be easily joined and managed
  together with the same tools and APIs and without needing any external
  dependencies.
  
  
  Supported configuration options for the snap (snap set lxd [<key>=<value>...]):
  
    - ceph.builtin: Use snap-specific Ceph configuration [default=false]
    - ceph.external: Use the system's ceph tools (ignores ceph.builtin) [default=false]
    - criu.enable: Enable experimental live-migration support [default=false]
    - daemon.debug: Increase logging to debug level [default=false]
    - daemon.group: Set group of users that have full control over LXD [default=lxd]
    - daemon.user.group: Set group of users that have restricted LXD access [default=lxd]
    - daemon.preseed: Pass a YAML configuration to `lxd init` on initial start
    - daemon.syslog: Send LXD log events to syslog [default=false]
    - daemon.verbose: Increase logging to verbose level [default=false]
    - lvm.external: Use the system's LVM tools [default=false]
    - lxcfs.pidfd: Start per-container process tracking [default=false]
    - lxcfs.loadavg: Start tracking per-container load average [default=false]
    - lxcfs.cfs: Consider CPU shares for CPU usage [default=false]
    - lxcfs.debug: Increase logging to debug level [default=false]
    - openvswitch.builtin: Run a snap-specific OVS daemon [default=false]
    - openvswitch.external: Use the system's OVS tools (ignores openvswitch.builtin) [default=false]
    - ovn.builtin: Use snap-specific OVN configuration [default=false]
    - ui.enable: Enable the web interface [default=false]
  
  For system-wide configuration of the CLI, place your configuration in
  /var/snap/lxd/common/global-conf/ (config.yml and servercerts)
commands:
  - lxd.buginfo
  - lxd.check-kernel
  - lxd.lxc
  - lxd
services:
  lxd.activate:    oneshot, enabled, inactive
  lxd.daemon:      simple, enabled, active
  lxd.user-daemon: simple, enabled, inactive
snap-id:      J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:     5.21/stable
refresh-date: yesterday at 22:16 BST
channels:
  5.21/stable:      5.21.1-d46c406 2024-04-29 (28460) 108MB -
  5.21/candidate:   5.21.1-d46c406 2024-04-26 (28460) 108MB -
  5.21/beta:        ↑                                       
  5.21/edge:        git-f1fea03    2024-04-29 (28503) 108MB -
  latest/stable:    5.21.1-2d13beb 2024-04-30 (28463) 107MB -
  latest/candidate: 5.21.1-2d13beb 2024-04-26 (28463) 107MB -
  latest/beta:      ↑                                       
  latest/edge:      git-89828eb    2024-04-30 (28526) 107MB -
  5.20/stable:      5.20-f3dd836   2024-02-09 (27049) 155MB -
  5.20/candidate:   ↑                                       
  5.20/beta:        ↑                                       
  5.20/edge:        ↑                                       
  5.19/stable:      5.19-8635f82   2024-01-29 (26200) 159MB -
  5.19/candidate:   ↑                                       
  5.19/beta:        ↑                                       
  5.19/edge:        ↑                                       
  5.0/stable:       5.0.3-d921d2e  2024-04-23 (28373)  91MB -
  5.0/candidate:    5.0.3-5e9b586  2024-04-26 (28461)  91MB -
  5.0/beta:         ↑                                       
  5.0/edge:         git-8cd0db9    2024-04-24 (28440) 117MB -
  4.0/stable:       4.0.9-a29c6f1  2022-12-04 (24061)  96MB -
  4.0/candidate:    4.0.9-a29c6f1  2022-12-02 (24061)  96MB -
  4.0/beta:         ↑                                       
  4.0/edge:         git-407205d    2022-11-22 (23988)  96MB -
  3.0/stable:       3.0.4          2019-10-10 (11348)  55MB -
  3.0/candidate:    3.0.4          2019-10-10 (11348)  55MB -
  3.0/beta:         ↑                                       
  3.0/edge:         git-81b81b9    2019-10-10 (11362)  55MB -
installed:          5.21.1-d46c406            (28460) 108MB -

Any help or pointers in the right direction would be appreciated!

Kind regards,

Aaron

Hi @colemiller, would you mind taking a look at this one?

We’ve had a few reports of this lately, so we would really appreciate some time being spent figuring out what is going on with dqlite to cause this.

If it's an issue with the way LXD uses dqlite, we can make modifications as needed.

@aaron-whitehouse, @colemiller has kindly been updating the LXD docs with some dqlite debugging steps here: doc: Add a page about dqlite troubleshooting by cole-miller · Pull Request #13346 · canonical/lxd · GitHub

It's likely that the fix is to remove the extraneous segment file, as described in the document in that PR.

Please can you provide the output of

sudo ls /var/snap/lxd/common/lxd/database/global/ -la

Please see Ubuntu Pastebin

Thanks. You already have a backup of the database in the snapshot you saved with snap, but please also take a backup of /var/snap/lxd/common/lxd/database/global/ to be safe.
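As a rough illustration of that backup step, here is a minimal shell sketch. The function name backup_db_dir, the DB_DIR override and the `<dir>.backup-<timestamp>` naming are hypothetical choices for this example, not something LXD or snap provides. It assumes GNU cp (for -a) and should be run from a root shell, ideally while LXD is stopped (here it is not running anyway).

```shell
#!/bin/sh
# Illustrative sketch: copy the dqlite database directory before editing it.
# DB_DIR defaults to the path used in this thread; override it to test elsewhere.
DB_DIR="${DB_DIR:-/var/snap/lxd/common/lxd/database/global}"

backup_db_dir() {
  # Hypothetical naming scheme: <dir>.backup-YYYYmmdd-HHMMSS beside the original.
  dest="${DB_DIR}.backup-$(date +%Y%m%d-%H%M%S)"
  # Refuse to reuse an existing destination (cp would nest the copy inside it).
  if [ -e "$dest" ]; then
    echo "backup destination already exists: $dest" >&2
    return 1
  fi
  # -a copies recursively and keeps ownership, permissions and timestamps.
  cp -a "$DB_DIR" "$dest" || return 1
  # Print the backup path so the caller knows where the copy went.
  echo "$dest"
}
```

Calling backup_db_dir prints the path of the new copy, which can simply be copied back over the original directory if anything goes wrong.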

It appears that this dqlite segment file is corrupt:

-rw------- 1 root root  253952 Apr 30 22:06 0000000000295980-0000000000295980
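The segment name comes straight from the "load closed segment ..." error in the lxd.log output quoted at the top of the thread. A small sketch for pulling it out automatically follows; it assumes the error keeps exactly that wording, and the function name failing_segment and the LOG override are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the name of the closed segment dqlite failed to load.
# Assumes the log line contains "load closed segment <first>-<last>:",
# as in the lxd.log excerpt quoted in this thread.
LOG="${LOG:-/var/snap/lxd/common/lxd/logs/lxd.log}"

failing_segment() {
  # grep -o keeps only the matching text; tail keeps the most recent failure;
  # awk prints the fourth word, which is the segment file name.
  grep -o 'load closed segment [0-9]*-[0-9]*' "$LOG" | tail -n 1 | awk '{ print $4 }'
}
```

Run it as root, since the log file is not world-readable on a default snap install.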

However, from @colemiller's document:

If the last (highest-numbered) closed segment is corrupt, try deleting it. (Deleting closed segments before the last one will create a gap and generally prevent dqlite from starting.)
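To make the "gap" idea concrete: closed segments are named `<first>-<last>` (as in the directory listing above), so in a healthy log each segment should start at the index right after the previous one ends. The sketch below checks only that property among closed segments; it ignores open segments and snapshots, and the function name check_segment_gaps is hypothetical, not part of dqlite or LXD.

```shell
#!/bin/sh
# Sketch: report gaps between consecutive closed dqlite segments.
# Assumption: closed segments are the files named <first>-<last> with
# zero-padded indexes, so a plain C-locale sort orders them by index.
DB_DIR="${DB_DIR:-/var/snap/lxd/common/lxd/database/global}"

check_segment_gaps() {
  # awk converts "0000000000295980" to the decimal number 295980 (no octal surprises).
  ls "$DB_DIR" | grep -E '^[0-9]+-[0-9]+$' | LC_ALL=C sort | awk -F- '
    NR > 1 && $1 + 0 != prev + 1 {
      printf "gap: previous segment ends at %d, next starts at %d\n", prev, $1 + 0
      gaps++
    }
    { prev = $2 + 0 }
    END { exit (gaps > 0) }'
}
```

It exits non-zero if any gap is found, so it can be run (as root) before and after removing segments as a quick sanity check.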

So we likely want to delete not just the problem segment file, but also the segments after it, of which there are two:

-rw------- 1 root root   16496 Apr 30 22:05 0000000000295981-0000000000295981
-rw------- 1 root root   16496 Apr 30 22:05 0000000000295982-0000000000295982
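Selecting "the corrupt segment plus every later one" can be sketched as a filter on the first index in each closed segment's name. This assumes the `<first>-<last>` naming seen in the listing above; segments_from is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list closed segments whose first index is at or after a given
# index, i.e. the corrupt segment and everything after it. Removing only
# this tail avoids leaving a gap in the middle of the log.
DB_DIR="${DB_DIR:-/var/snap/lxd/common/lxd/database/global}"

segments_from() {
  start="$1"
  # Keep only <digits>-<digits> names (skips metadata, open and snapshot files),
  # then compare numerically so the zero padding does not matter.
  ls "$DB_DIR" | grep -E '^[0-9]+-[0-9]+$' | LC_ALL=C sort |
    awk -F- -v start="$start" '$1 + 0 >= start + 0'
}
```

For this thread, running segments_from 0000000000295980 as root should print the three files listed above.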

So let's try running sudo snap stop lxd, then removing 0000000000295980-0000000000295980, 0000000000295981-0000000000295981 and 0000000000295982-0000000000295982 from /var/snap/lxd/common/lxd/database/global/.

Then run sudo snap start lxd.
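A variation on the removal step is sketched below: it moves the named segments into a separate directory instead of deleting them, so they can be put back if LXD still fails to start. This swaps the thread's plain removal for a move; quarantine_segments and QUARANTINE_DIR are hypothetical names, and it must only be run while LXD is stopped.

```shell
#!/bin/sh
# Sketch: move closed-segment files out of the database directory into a
# quarantine directory rather than deleting them. Run only while LXD is
# stopped (sudo snap stop lxd), from a root shell.
DB_DIR="${DB_DIR:-/var/snap/lxd/common/lxd/database/global}"

quarantine_segments() {
  # Hypothetical quarantine location outside the database directory.
  dest="${QUARANTINE_DIR:-${DB_DIR}.quarantine}"
  mkdir -p "$dest" || return 1
  for seg in "$@"; do
    # Refuse anything that is not a <digits>-<digits> name, e.g. metadata1.
    if ! printf '%s\n' "$seg" | grep -Eq '^[0-9]+-[0-9]+$'; then
      echo "skipping unexpected name: $seg" >&2
      continue
    fi
    mv "$DB_DIR/$seg" "$dest/" || return 1
  done
}
```

For the case in this thread, that would be sudo snap stop lxd, then (as root) quarantine_segments 0000000000295980-0000000000295980 0000000000295981-0000000000295981 0000000000295982-0000000000295982, then sudo snap start lxd.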

Wonderful! All seems to be working again. Many thanks.
