CTDB: Create a 3-node NFS HA backed by a Clustered Filesystem

CTDB NFS HIGH AVAILABILITY


The intent of this document is NOT to demonstrate how to install a clustered filesystem environment, but to rely on an already installed one in order to make the NFS service highly available to several NFS clients.

This document only applies with the following bug fix in place: LP: #722201. You can either use the PPA provided in that bug, for the package it provides, or wait for the bug to be marked Fix Released for the Ubuntu version you’re using.


SUMMARY

  • 2 x clustered filesystem servers (could be GlusterFS servers)
  • 3 x nodes as clients to this Clustered Filesystem
  • The 3 NFS nodes will act as clients to Clustered FS
  • The 3 NFS nodes will act as servers for NFS service
  • The NFS service will be High Available among all 3 nodes
  • NFS clients can be DNS load balanced among the 3 NFS servers
  • If one NFS server goes down, another NFS server will take over on its behalf
  • All NFS servers will serve the SAME FILESYSTEM
  • CTDB lock file MUST reside in a clustered filesystem directory

Notes:

  1. To make things simpler, I’m using 3 LXC containers and a “bind mount” shared between them. This shared bind mount serves as a shared filesystem among my “nodes” (containers); see the sketch after this list.

  2. Make sure you have the shared (clustered) filesystem mounted on all NFS server nodes at the same mount point. This is the directory you will export over NFS to all NFS clients.

  3. DO USE a different network for each of these:
    a) cluster filesystem network
    b) public NFS network
    c) private CTDB network

  4. Make sure you place the CTDB lock file in the CLUSTERED FILESYSTEM. CTDB depends on an underlying filesystem, shared among all nodes, in order to guarantee there is no split brain among its nodes.
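
For reference, here is a minimal sketch of the “simplified” lab from note 1, assuming LXD and a hypothetical host directory /srv/ctdb-shared (the container names, Ubuntu release and paths are illustrative; in the examples later in this document the shared area corresponds to /home/inaddy):

# on the LXD host: create the 3 "nodes"
lxc launch ubuntu:20.04 ctdb01
lxc launch ubuntu:20.04 ctdb02
lxc launch ubuntu:20.04 ctdb03

# share a single host directory with all containers, at the same path,
# to emulate the clustered filesystem (a bind mount)
mkdir -p /srv/ctdb-shared
for node in ctdb01 ctdb02 ctdb03; do
    lxc config device add ${node} shared disk source=/srv/ctdb-shared path=/home/inaddy
done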


INSTRUCTIONS

Install ctdb in all nodes:

(c)inaddy@ctdb01:~$ sudo apt-get install ctdb nfs-kernel-server quota

(c)inaddy@ctdb01:~$ dpkg -L ctdb | grep "examples/nfs-kernel-server"
/usr/share/doc/ctdb/examples/nfs-kernel-server
/usr/share/doc/ctdb/examples/nfs-kernel-server/99-nfs-static-ports.conf
/usr/share/doc/ctdb/examples/nfs-kernel-server/enable-nfs.sh.gz
/usr/share/doc/ctdb/examples/nfs-kernel-server/nfs-common
/usr/share/doc/ctdb/examples/nfs-kernel-server/nfs-kernel-server
/usr/share/doc/ctdb/examples/nfs-kernel-server/services

Configure your /etc/hosts in each CTDB NFS server node:

(c)inaddy@ctdb01:~$ cat /etc/hosts
## /etc/hosts

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

172.16.0.200 ctdb01.private
172.16.0.201 ctdb02.private
172.16.0.202 ctdb03.private

192.168.0.1 ctdb01.public
192.168.0.2 ctdb02.public
192.168.0.3 ctdb03.public

## end of file

Make sure to run the helper script in all 3 NFS server nodes:

(c)inaddy@ctdb01:~$ gunzip -c /usr/share/doc/ctdb/examples/nfs-kernel-server/enable-nfs.sh.gz > enable-nfs.sh
(c)inaddy@ctdb01:~$ chmod +x ./enable-nfs.sh
(c)inaddy@ctdb01:~$ sudo ./enable-nfs.sh

This script will enable CTDB NFS HA by changing the following files:

(1) /etc/default/nfs-common                   ( replace )
(2) /etc/default/nfs-kernel-server            ( replace )
(3) /etc/services                             ( append  )
(4) /etc/sysctl.d/99-nfs-static-ports.conf    ( create  )
(5) /usr/lib/systemd/scripts/nfs-utils_env.sh ( modify  )

and disabling the following services:

(1) rpcbind
(2) nfs-kernel-server
(3) rpc.rquotad

Obs:
  - replaced files keep previous versions as file.prevctdb
  - dependant services will also be stopped

Do you agree with this change ? (N/y) => y
checking requirements...
requirements okay!

backing up /etc/default/nfs-common
backing up /etc/default/nfs-kernel-server
backing up /etc/services
backing up /usr/lib/systemd/scripts/nfs-utils_env.sh

stopping ctdb.service...
stopping quota.service...
stopping nfs-kernel-server.service...
stopping rpcbind.service...
stopping rpcbind.socket...
stopping rpcbind.target...

disabling ctdb.service...
disabling quota.service...
disabling nfs-kernel-server.service...
disabling rpcbind.service...
disabling rpcbind.socket...
disabling rpcbind.target...

replacing /etc/default/nfs-common...
replacing /etc/default/nfs-kernel-server...
replacing /etc/sysctl.d/99-nfs-static-ports.conf...

appending /usr/share/doc/ctdb/examples/nfs-kernel-server//services to /etc/services...

What is the FQDN for the public IP address of this host ?
> ctdb01.public
placing hostname ctdb01.public into /etc/default/nfs-common...
placing hostname ctdb01.public into /etc/default/nfs-kernel-server...

appending NFS_HOSTNAME to /usr/lib/systemd/scripts/nfs-utils_env.sh...
executing /usr/lib/systemd/scripts/nfs-utils_env.sh...

refreshing sysctl...

Finished! Make sure to configure properly:

    - /etc/exports (containing the clustered fs to be exported)
    - /etc/ctdb/nodes (containing all your node private IPs)
    - /etc/ctdb/public_addressess (containing public addresses)

A log file can be found at:

     - /tmp/enable-ctdb-nfs.14080.log

Remember:

- to place a recovery lock in /etc/ctdb/ctdb.conf:
    ...
    [cluster]
    recovery lock = /clustered.filesystem/.reclock
    ...

And, make sure you enable ctdb service again:

    - systemctl enable ctdb.service
    - systemctl start ctdb.service

Enjoy!

Notes:

Because of the following bug:

Bug #50093 “Some sysctls are ignored on boot” (procps package in Ubuntu)

reboots won’t load the needed sysctl.d/* NFS parameters. Until that is fixed, please run:

(c)inaddy@ctdb01:~$ sudo sysctl --system
...
* Applying /etc/sysctl.d/99-nfs-static-ports.conf ...
fs.nfs.nlm_tcpport = 32768
fs.nfs.nlm_udpport = 32768
...

in all nodes, after installation. Also regenerate the initramfs after adding the “nfsd” line to the /etc/modules file:

(c)inaddy@ctdb01:~$ cat /etc/modules
## /etc/modules
sunrpc
nfsd
## end of file

(c)inaddy@ctdb01:~$ sudo update-initramfs -k all -u
update-initramfs: Generating /boot/initrd.img-5.0.0-16-generic

to make sure that, on the next reboot, the sysctl parameters will be loaded at boot time.

Rebooting all nodes at this phase, after this workaround, might be a good idea.
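
After the reboot, a quick way to confirm the static NFS ports were applied (the values below come from the 99-nfs-static-ports.conf file shown earlier; they only show up once the nfsd/lockd modules are loaded, which the /etc/modules change above guarantees):

(c)inaddy@ctdb01:~$ sysctl fs.nfs.nlm_tcpport fs.nfs.nlm_udpport
fs.nfs.nlm_tcpport = 32768
fs.nfs.nlm_udpport = 32768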

Make sure you’re exporting the clustered shared filesystem in all nodes:

(c)inaddy@ctdb01:~$ cat /etc/exports
/home/inaddy/work *(rw,no_root_squash,sync,no_subtree_check,fsid=1234)

Note: Do not forget to set an “fsid” for each export. It must be the same on all nodes, so NFS file handles remain valid when a public IP address moves to another server.

Let’s now edit the /etc/ctdb/nodes file:

(c)inaddy@ctdb01:~$ cat /etc/ctdb/nodes
172.16.0.200
172.16.0.201
172.16.0.202

in ALL nodes.

And edit the /etc/ctdb/public_addresses file:

(c)inaddy@ctdb01:~$ cat /etc/ctdb/public_addresses
192.168.0.1/24 eth1
192.168.0.2/24 eth1
192.168.0.3/24 eth1

in ALL nodes as well.

One last thing: Make sure you have the CTDB lock in a clustered filesystem,
shared among ALL the CTDB NFS nodes:

(c)inaddy@ctdb01:~$ cat /etc/ctdb/ctdb.conf
# See ctdb.conf(5) for documentation
#
# See ctdb-script.options(5) for documentation about event script
# options

[logging]
        # Enable logging to syslog
        location = syslog

        # Default log level
        log level = NOTICE

[cluster]
        # Shared recovery lock file to avoid split brain.  Daemon
        # default is no recovery lock.  Do NOT run CTDB without a
        # recovery lock file unless you know exactly what you are
        # doing.
        #
        # Please see the RECOVERY LOCK section in ctdb(7) for more
        # details.
        #
        recovery lock = /home/inaddy/.reclock

And, voilà. Let’s start the cluster before enabling the NFS resource: enable and start ctdb.service on every cluster node.


FIRST NFS HA NODE

After you have run the helper script, as described above, it’s time for the node-specific setup:

(c)inaddy@ctdb01:~$ sudo vi /etc/exports
(c)inaddy@ctdb01:~$ cat /etc/exports
/home/inaddy/work *(rw,no_root_squash,sync,no_subtree_check,fsid=1234)

(c)inaddy@ctdb01:~$ sudo vi /etc/ctdb/nodes
(c)inaddy@ctdb01:~$ sudo cat /etc/ctdb/nodes
172.16.0.200
172.16.0.201
172.16.0.202

(c)inaddy@ctdb01:~$ sudo vi /etc/ctdb/public_addresses
(c)inaddy@ctdb01:~$ sudo cat /etc/ctdb/public_addresses
192.168.0.1/24 eth1
192.168.0.2/24 eth1
192.168.0.3/24 eth1

(c)inaddy@ctdb01:~$ sudo vi /etc/ctdb/ctdb.conf 
(c)inaddy@ctdb01:~$ sudo cat /etc/ctdb/ctdb.conf
## /etc/ctdb/ctdb.conf
[logging]
        location = syslog
        log level = NOTICE
[cluster]
        recovery lock = /home/inaddy/.reclock
## end of file

(c)inaddy@ctdb01:~$ sudo scp /etc/ctdb/ctdb.conf root@ctdb02:/etc/ctdb/ctdb.conf
(c)inaddy@ctdb01:~$ sudo scp /etc/ctdb/ctdb.conf root@ctdb03:/etc/ctdb/ctdb.conf

(c)inaddy@ctdb01:~$ sudo scp /etc/ctdb/nodes root@ctdb02:/etc/ctdb/nodes
(c)inaddy@ctdb01:~$ sudo scp /etc/ctdb/nodes root@ctdb03:/etc/ctdb/nodes

(c)inaddy@ctdb01:~$ sudo scp /etc/ctdb/public_addresses root@ctdb02:/etc/ctdb/public_addresses
(c)inaddy@ctdb01:~$ sudo scp /etc/ctdb/public_addresses root@ctdb03:/etc/ctdb/public_addresses
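
To confirm the copies really are identical on the three nodes, one quick optional check (this assumes onnode can already ssh to the addresses listed in /etc/ctdb/nodes; otherwise just run md5sum on each node by hand):

(c)inaddy@ctdb01:~$ onnode all "md5sum /etc/ctdb/ctdb.conf /etc/ctdb/nodes /etc/ctdb/public_addresses"

The checksums printed for every node should match.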

Enable the service only in this node for now:

(c)inaddy@ctdb01:~$ systemctl enable ctdb.service
Synchronizing state of ctdb.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable ctdb
Created symlink /etc/systemd/system/multi-user.target.wants/ctdb.service → /etc/systemd/system/ctdb.service.

(c)inaddy@ctdb01:~$ systemctl start ctdb.service

(c)inaddy@ctdb01:~$ systemctl status ctdb.service
● ctdb.service - CTDB
   Loaded: loaded (/etc/systemd/system/ctdb.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-06-28 02:41:31 UTC; 5s ago
     Docs: man:ctdbd(1)
           man:ctdb(7)
  Process: 14851 ExecStart=/usr/sbin/ctdbd_wrapper start (code=exited, status=0/SUCCESS)
 Main PID: 14853 (ctdbd)
    Tasks: 4 (limit: 4915)
   Memory: 6.7M
   CGroup: /system.slice/ctdb.service
           ├─14853 /usr/sbin/ctdbd
           ├─14855 /usr/lib/x86_64-linux-gnu/ctdb/ctdb-eventd -P 14853 -S 13
           ├─14906 /usr/sbin/ctdbd
           └─14913 /usr/lib/x86_64-linux-gnu/ctdb/ctdb_mutex_fcntl_helper /home/inaddy/.reclock

Note: with the default ctdb.conf, the CTDB daemon logs to syslog, so its messages are visible in the systemd journal. You can use the following commands to check the service status:

(c)inaddy@ctdb01:~$ journalctl -f -u ctdb
-- Logs begin at Sun 2019-05-05 20:59:07 UTC. --
Jun 28 02:50:36 ctdb01 ctdbd[16551]: Takeover of IP 192.168.0.2/24 on interface eth1
Jun 28 02:50:36 ctdb01 ctdb-eventd[16553]: 60.nfs: Reconfiguring service "nfs-kernel-server"...
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: Takeover run completed successfully
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: ../../ctdb/server/ctdb_recoverd.c:1489 Recovery complete
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: Resetting ban count to 0 for all nodes
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: Just finished a recovery. New recoveries will now be suppressed for the rerecovery timeout (10 seconds)
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: Disabling recoveries for 10 seconds
Jun 28 02:50:39 ctdb01 ctdbd[16551]: Starting traverse on DB ctdb.tdb (id 4127)
Jun 28 02:50:39 ctdb01 ctdbd[16551]: Ending traverse on DB ctdb.tdb (id 4127), records 0
Jun 28 02:50:46 ctdb01 ctdb-recoverd[16606]: Reenabling recoveries after timeout

which works like a “tail -f”, or just check the service status:

(c)inaddy@ctdb01:~$ systemctl status ctdb
    ● ctdb.service - CTDB
       Loaded: loaded (/etc/systemd/system/ctdb.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2019-06-28 02:47:58 UTC; 8h ago
         Docs: man:ctdbd(1)
               man:ctdb(7)
      Process: 16549 ExecStart=/usr/sbin/ctdbd_wrapper start (code=exited, status=0/SUCCESS)
     Main PID: 16551 (ctdbd)
        Tasks: 6 (limit: 4915)
       Memory: 161.6M
       CGroup: /system.slice/ctdb.service
               ├─16551 /usr/sbin/ctdbd
               ├─16553 /usr/lib/x86_64-linux-gnu/ctdb/ctdb-eventd -P 16551 -S 13
               ├─16606 /usr/sbin/ctdbd
               ├─16613 /usr/lib/x86_64-linux-gnu/ctdb/ctdb_mutex_fcntl_helper /home/inaddy/.reclock
               ├─17350 rpc.statd
               └─17377 rpc.rquotad

Jun 28 02:50:36 ctdb01 ctdbd[16551]: Takeover of IP 192.168.0.2/24 on interface eth1
Jun 28 02:50:36 ctdb01 ctdb-eventd[16553]: 60.nfs: Reconfiguring service "nfs-kernel-server"...
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: Takeover run completed successfully
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: ../../ctdb/server/ctdb_recoverd.c:1489 Recovery complete
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: Resetting ban count to 0 for all nodes
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: Just finished a recovery. New recoveries will now be suppressed for the rere
Jun 28 02:50:36 ctdb01 ctdb-recoverd[16606]: Disabling recoveries for 10 seconds
Jun 28 02:50:39 ctdb01 ctdbd[16551]: Starting traverse on DB ctdb.tdb (id 4127)
Jun 28 02:50:39 ctdb01 ctdbd[16551]: Ending traverse on DB ctdb.tdb (id 4127), records 0
Jun 28 02:50:46 ctdb01 ctdb-recoverd[16606]: Reenabling recoveries after timeout

After a correct initialization, you can check the status of the cluster being created using the following command:

(c)inaddy@ctdb01:~$ ctdb status
Number of nodes:3
pnn:0 172.16.0.200     OK (THIS NODE)
pnn:1 172.16.0.201     BANNED|UNHEALTHY|INACTIVE
pnn:2 172.16.0.202     BANNED|UNHEALTHY|INACTIVE
Generation:1206361316
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0 

Wait until the node you are configuring becomes OK.

Note: for now, only the “resource manager” and the virtual IPs are working. You can check that your public addresses are set on the interface you configured in /etc/ctdb/public_addresses by issuing the “ip addr” command, as shown below.
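
For example (illustrative; while this is the only healthy node, it should be holding all three public addresses):

(c)inaddy@ctdb01:~$ ip addr show dev eth1 | grep "inet 192.168.0"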

SECOND AND THIRD NFS HA NODES

After you have run the helper script, as described in the first part of this document, it’s time for the node-specific steps on the second and third nodes:

(c)inaddy@ctdb02:~$ systemctl enable ctdb.service
Synchronizing state of ctdb.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable ctdb
Created symlink /etc/systemd/system/multi-user.target.wants/ctdb.service → /lib/systemd/system/ctdb.service.
(c)inaddy@ctdb02:~$ systemctl start ctdb.service
(c)inaddy@ctdb02:~$ systemctl status ctdb

Give it a few seconds to talk to the first node and check current status:

(c)inaddy@ctdb02:~$ ctdb status
Number of nodes:3
pnn:0 172.16.0.200     UNHEALTHY
pnn:1 172.16.0.201     UNHEALTHY (THIS NODE)
pnn:2 172.16.0.202     DISCONNECTED|BANNED|UNHEALTHY|INACTIVE
Generation:1574308138
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:RECOVERY (1)
Recovery master:0
(c)inaddy@ctdb02:~$ ctdb status
Number of nodes:3
pnn:0 172.16.0.200     OK
pnn:1 172.16.0.201     UNHEALTHY (THIS NODE)
pnn:2 172.16.0.202     BANNED|UNHEALTHY|INACTIVE
Generation:2017818905
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0
(c)inaddy@ctdb02:~$ ctdb status
Number of nodes:3
pnn:0 172.16.0.200     OK
pnn:1 172.16.0.201     OK (THIS NODE)
pnn:2 172.16.0.202     BANNED|UNHEALTHY|INACTIVE
Generation:2017818905
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0

Do the same on the third node until it is also flagged as OK in the “ctdb status” output.
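
Once the third node has joined and finished recovery, “ctdb status” on any node should report something along these lines (illustrative):

(c)inaddy@ctdb03:~$ ctdb status
Number of nodes:3
pnn:0 172.16.0.200     OK
pnn:1 172.16.0.201     OK
pnn:2 172.16.0.202     OK (THIS NODE)
Recovery mode:NORMAL (0)
Recovery master:0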


CONFIGURE THE NFS SERVICE AS HIGH AVAILABLE

Execute the following commands in the first node:

(c)inaddy@ctdb01:~$ onnode -p all systemctl stop ctdb 
(c)inaddy@ctdb01:~$ onnode -p all "ctdb event script enable legacy 60.nfs"
(c)inaddy@ctdb01:~$ onnode -p all "ctdb event script enable legacy 06.nfs"

This will stop the CTDB service in all nodes and enable the NFS resources in all of them. After doing this, start the CTDB service again node by node:

(c)inaddy@ctdb01:~$ onnode -p 0 "systemctl start ctdb"
(c)inaddy@ctdb01:~$ onnode -p 1 "systemctl start ctdb"
(c)inaddy@ctdb01:~$ onnode -p 2 "systemctl start ctdb" 

Give it some time between one node and the next, so the nodes can negotiate and bring up the NFS services.

(c)inaddy@ctdb01:~$ onnode -p all "systemctl status ctdb"

Will tell you whether the CTDB service is activated in all nodes.

(c)inaddy@ctdb01:~$ onnode -p all "systemctl status nfs-kernel-server"

Will tell you whether the NFS service is active in all nodes.
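
As an extra, optional, check you can ask mountd for the export list through any of the public names (showmount is part of nfs-common, so it can be run from a client or from one of the nodes):

(c)inaddy@ctdbclient01:~$ showmount -e ctdb01.public

The export list should contain /home/inaddy/work, whichever public name you query.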

Note: because you have configured 3 public IP addresses in /etc/ctdb/public_addresses, and configured eth1 to be the public interface, you can now make sure your public IPs are being advertised. Since we have 3 nodes up, we should have one public IP address on each node.

You can check IP addresses with:

(c)inaddy@ctdb01:~$ onnode -p all "ip addr show eth1 | grep 192.168.0"
[172.16.0.201]     inet 192.168.0.3/24 brd 192.168.0.255 scope global eth1
[172.16.0.200]     inet 192.168.0.1/24 brd 192.168.0.255 scope global eth1
[172.16.0.202]     inet 192.168.0.2/24 brd 192.168.0.255 scope global eth1

Note: you can’t guarantee on which node each public IP address will be set, and it does not matter either: as long as the NFS service is running wherever the public IP is located, you are good.
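
If you want to see exactly which node currently hosts each public address, CTDB can tell you directly:

(c)inaddy@ctdb01:~$ ctdb ip

This lists every public address together with the PNN of the node currently serving it.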


ACCESSING THE NFS SERVICE

In this final part, since you have the CTDB cluster running, and the NFS service also running in all nodes, you are able to mount the clustered/shared filesystem from ANY of the NFS server nodes. In case one of the NFS servers fails, its virtual IP address will be moved to another NFS node and your clients won’t suffer major consequences (apart from some I/O errors on files that were already open, errors that can be recovered from).
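
A simple sketch to exercise that failover, assuming you pick ctdb02 as the node to “fail”:

(c)inaddy@ctdb02:~$ systemctl stop ctdb.service

(c)inaddy@ctdb01:~$ ctdb status
(c)inaddy@ctdb01:~$ ctdb ip

(c)inaddy@ctdb02:~$ systemctl start ctdb.service

While ctdb02 is down, “ctdb status” on a surviving node should flag it as DISCONNECTED and “ctdb ip” should show its public address taken over by another node; starting the service again brings it back into the cluster.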

In order to balance the load among your NFS servers, you can load balance the NFS mounts through DNS by pointing the same FQDN at all 3 public IP addresses, as sketched below.
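
A minimal BIND-style round-robin sketch, assuming a hypothetical zone with a record named “nfs” (your DNS setup will certainly differ):

; one name, three A records: clients get the addresses in rotating order
nfs    IN    A    192.168.0.1
nfs    IN    A    192.168.0.2
nfs    IN    A    192.168.0.3

Clients can then all mount the same FQDN (for example nfs.yourdomain:/home/inaddy/work) and still land on different NFS servers.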

(c)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdb01.public:/home/inaddy/work /mnt/samedir01
(c)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdb02.public:/home/inaddy/work /mnt/samedir02
(c)inaddy@ctdbclient01:~$ sudo mount -t nfs -o vers=3 ctdb03.public:/home/inaddy/work /mnt/samedir03

Notes:

  • Contents of samedir01, samedir02 and samedir03 have to be exactly the same since you are mounting the SAME shared filesystem through different NFS servers.
  • You CANNOT use NFSv4 with CTDB yet. The files provided with this wrapper script, and the examples, make sure that NFSv4 is disabled.