Can I set the LXD database leader manually?

equator8848 · November 4, 2024, 5:15am

Hi, I have an LXD cluster whose members sometimes crash due to OOM. While the cluster database leader is changing, all cluster members hang. If I could set the database leader manually, I would place it on a stable cluster member, which would make my cluster more robust.

whershberger · November 4, 2024, 3:21pm

Hi, have you tried lxd cluster edit [1]? Make sure you’re looking at the right version of the docs for your version of LXD, as cluster recovery has seen some changes recently.

[1] https://documentation.ubuntu.com/lxd/en/stable-5.21/howto/cluster_recover/

equator8848 · November 4, 2024, 4:28pm

So cool, let me try it. Thank you!

equator8848 · November 18, 2024, 1:17pm

Hi, I tried it and found that it only works on a stopped cluster. My cluster is running, and I cannot stop it because users are using the instances.

Aleks · November 18, 2024, 2:57pm

Try running systemctl reload snap.lxd.daemon on the current leader (it will not restart your containers, don’t worry), repeating until the leader ends up where you want it.

For example:

[root@lxd11 ~]# lxc cluster list --format csv
lxd3,https://lxd3:8443,database,x86_64,default,,ONLINE,Fully operational
lxd4,https://lxd4:8443,database-standby,x86_64,default,,ONLINE,Fully operational
lxd10,https://lxd10:8443,"database-leader,database",x86_64,default,,ONLINE,Fully operational
lxd11,https://lxd11:8443,database,x86_64,default,,ONLINE,Fully operational
lxd12,https://lxd12:8443,database-standby,x86_64,default,,ONLINE,Fully operational
lxd13,https://lxd13:8443,,x86_64,default,,ONLINE,Fully operational

[root@lxd11 ~]# ssh lxd10 systemctl reload snap.lxd.daemon
[root@lxd11 ~]# lxc cluster list --format csv
lxd3,https://lxd3:8443,"database-leader,database",x86_64,default,,ONLINE,Fully operational
lxd4,https://lxd4:8443,database,x86_64,default,,ONLINE,Fully operational
lxd10,https://lxd10:8443,,x86_64,default,,ONLINE,Fully operational
lxd11,https://lxd11:8443,database,x86_64,default,,ONLINE,Fully operational
lxd12,https://lxd12:8443,database-standby,x86_64,default,,ONLINE,Fully operational
lxd13,https://lxd13:8443,,x86_64,default,,ONLINE,Fully operational

[root@lxd11 ~]# ssh lxd3 systemctl reload snap.lxd.daemon
[root@lxd11 ~]# lxc cluster list --format csv
lxd3,https://lxd3:8443,,x86_64,default,,ONLINE,Fully operational
lxd4,https://lxd4:8443,database,x86_64,default,,ONLINE,Fully operational
lxd10,https://lxd10:8443,database-standby,x86_64,default,,ONLINE,Fully operational
lxd11,https://lxd11:8443,"database-leader,database",x86_64,default,,ONLINE,Fully operational
lxd12,https://lxd12:8443,database,x86_64,default,,ONLINE,Fully operational
lxd13,https://lxd13:8443,database-standby,x86_64,default,,ONLINE,Fully operational

etc… etc…
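A minimal sketch of the loop described above, assuming a POSIX shell, the lxc client on the machine you run it from, and root ssh access to every member (as in the example). It relies on the member name being the first field of lxc cluster list --format csv and on the leader's row being the only one containing database-leader. The function names find_db_leader and move_db_leader, the default of 5 reloads and the 30-second pause are hypothetical choices, not part of LXD.

```shell
#!/bin/sh
# Hypothetical helpers sketching the "reload the leader until it moves" approach.
# Assumes the lxc client on this host and root ssh access to every cluster member.

# Read `lxc cluster list --format csv` output on stdin and print the name of the
# member whose roles include database-leader. The name is the first CSV field and
# never contains a comma, so splitting on commas is safe for that field even
# though the quoted roles field may contain commas.
find_db_leader() {
    awk -F, '/database-leader/ { print $1; exit }'
}

# Reload LXD on the current leader until the member named in $1 holds the leader
# role, giving up after $2 reloads (default 5, an arbitrary choice).
move_db_leader() {
    target="$1"
    max_tries="${2:-5}"
    i=0
    while :; do
        leader="$(lxc cluster list --format csv | find_db_leader)"
        if [ -z "$leader" ]; then
            echo "no database leader found; is the cluster responding?" >&2
            return 1
        fi
        if [ "$leader" = "$target" ]; then
            echo "database leader is now $leader"
            return 0
        fi
        if [ "$i" -ge "$max_tries" ]; then
            echo "leader did not move to $target after $max_tries reloads" >&2
            return 1
        fi
        i=$((i + 1))
        echo "reload $i/$max_tries: leader is $leader, reloading LXD there"
        ssh "$leader" systemctl reload snap.lxd.daemon
        sleep 30  # arbitrary pause to let the cluster settle on a new leader
    done
}
```

After sourcing the file, something like move_db_leader lxd11 would keep reloading until lxd11 is the leader or the retry budget runs out. The reply below reports that a reload did not move the leader on that cluster, so treat this as an experiment rather than a guaranteed method.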

equator8848 · November 18, 2024, 3:17pm

Hi Aleks, when I run sudo lxd cluster edit, I get the error "Error: The LXD daemon is running, please stop it first.". When I run systemctl reload snap.lxd.daemon, the leader does not change.

But I found a way to change the leader without stopping the cluster: when the leader has changed and the cluster hangs, I run systemctl restart snap.lxd.daemon on the node named client (client is just a client member). client then becomes the new database leader, and the cluster recovers soon after.

[Screenshot: cluster state after running systemctl restart snap.lxd.daemon on node client]

tomp · November 21, 2024, 9:44am

You cannot set the automatic cluster member roles manually:

Automatic roles are assigned by LXD itself and cannot be modified by the user.

https://documentation.ubuntu.com/lxd/en/latest/explanation/clustering/#member-roles

However, you may be able to influence LXD’s role placement by putting the problem server in its own failure domain and the other members together in a separate failure domain, so that if a member goes down, its role is taken over by another member within the same failure domain.

For example, if a cluster member that currently has the database role gets shut down, LXD tries to assign its database role to another cluster member in the same failure domain, if one is available.

https://documentation.ubuntu.com/lxd/en/latest/explanation/clustering/#failure-domains
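A sketch of how one might assign failure domains from the shell, under two assumptions worth checking on your LXD version: that lxc cluster show <member> prints the member as YAML with a top-level failure_domain: key (the FAILURE DOMAIN column, shown as default in the listings above), and that lxc cluster edit <member> accepts the edited YAML on stdin, as other lxc edit commands do. The helper names rewrite_failure_domain and set_failure_domain, and the domain names used below, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for assigning a cluster member to a failure domain.
# Assumes `lxc cluster show` emits YAML with a top-level failure_domain: key and
# that `lxc cluster edit` reads replacement YAML from stdin.

# Filter member YAML on stdin, replacing the top-level failure_domain value with $1.
# The domain name must not contain "/" or "&", which sed would interpret.
rewrite_failure_domain() {
    sed "s/^failure_domain:.*/failure_domain: $1/"
}

# Move member $1 into failure domain $2.
set_failure_domain() {
    lxc cluster show "$1" | rewrite_failure_domain "$2" | lxc cluster edit "$1"
}
```

For example, set_failure_domain lxd13 unstable would isolate a crash-prone member (lxd13 is only an illustration), and running set_failure_domain "$m" stable for each remaining member would group the others; lxc cluster list should then show the new values in the FAILURE DOMAIN column.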

But in general, I think the priority here should be to fix the OOM scenario.

equator8848 · December 21, 2024, 11:49am

Hi tomp, thank you for replying.
I had missed the “Failure domains” feature; it sounds great.
I will try putting the database role on isolated, stable nodes.