Ubuntu HA - Cluster in Microsoft Azure Cloud

In this tutorial you will learn how to:

  • Access Microsoft Azure Shell
  • Create a new Resource Group,
  • … a new Availability Set,
  • … a new Subnet and VM Interfaces,
  • … assign a public IP address to an Interface,
  • … Customize (w/ cloud-init) and Install an Ubuntu VM;
  • Create a Pacemaker Cluster
  • Configure Microsoft Azure fencing agent
  • Configure a MySQL database in HA mode in Pacemaker
  • Configure a Lighttpd server in HA mode in Pacemaker
  • Test a failover and the fencing agent

Initial Step

To issue Azure CLI commands you can either use the Azure Shell (at https://shell.azure.com) or install, by hand, a small virtual machine that serves both as a gateway to your about-to-be-created cluster and as a place to install the Azure CLI.

I’ll use the Azure Shell to manage my Azure Cloud resources through the CLI, since I find it much faster than issuing commands from elsewhere, and a small VM called “gateway” to access my environment from the Internet.

[ DRAWING ]

Accessing Azure Shell

When first accessing Azure Shell, it will ask you to “Create storage” because it needs “Azure file share” to persist files. You can accept the default values there.

After connecting to the service, you can choose between PowerShell and Bash using the drop-down button in the upper-left corner. I prefer Bash. You will find yourself connected to a small Ubuntu container that provides the “az” command.

Create a Resource Group

First make sure you’re logged in by issuing:

$ az account show
{
  "environmentName": "AzureCloud",
  "homeTenantId": "xxxx-e8f6-40ae-8875-da47c934f1c1",
  "id": "xxxx-0262-4e54-9bd9-8b7458eec86b",
  "isDefault": true,
  "managedByTenants": [],
  "name": "rafaeldtinoco@ubuntu.com",
  "state": "Enabled",
  "tenantId": "xxxxxxxxx-e8f6-40ae-8875-da47c934f1c1",
  "user": {
    "name": "rafaeldtinoco@gmail.com",
    "type": "user"
  }
}

and then use the following command to create a resource group in a region:

$ az group create --name focal-ha --location eastus
{
  "id": "/subscriptions/xxxx/resourceGroups/focal-ha",
  "location": "eastus",
  "managedBy": null,
  "name": "focal-ha",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}

Create a virtual subnet

First we create a virtual network (named focal-ha-subnet) with prefix 10.250.0.0/16, to be used by our resource group. The subnets are carved out of it in the next steps.

$ az network vnet create --resource-group focal-ha --name focal-ha-subnet --address-prefixes 10.250.0.0/16
{
  "newVNet": {
    "addressSpace": {
      "addressPrefixes": [
        "10.250.0.0/16"
      ]
    },
    "bgpCommunities": null,
    "ddosProtectionPlan": null,
    "dhcpOptions": {
      "dnsServers": []
    },
    "enableDdosProtection": false,
    "enableVmProtection": false,
    "etag": "W/\"xxxx-35dd-46c1-ba11-c8a5d7114ce0\"",
    "id": "xxxx/virtualNetworks/focal-ha-subnet",
    "ipAllocations": null,
    "location": "eastus",
    "name": "focal-ha-subnet",
    "provisioningState": "Succeeded",
    "resourceGroup": "focal-ha",
    "resourceGuid": "xxxx-633d-43cd-bcb6-1c2a1c1c62f8",
    "subnets": [],
    "tags": {},
    "type": "Microsoft.Network/virtualNetworks",
    "virtualNetworkPeerings": []
  }
}

Then we add a subnet called “public” with prefix: 10.250.92.0/24. This will be our cluster “public network”:

$ az network vnet subnet create --address-prefixes "10.250.92.0/24" --name public --resource-group focal-ha --vnet-name focal-ha-subnet
{
  "addressPrefix": "10.250.92.0/24",
  "addressPrefixes": null,
  "delegations": [],
  "etag": "W/\"xxxx-ba56-4255-9ff2-ec9c9273bc75\"",
  "id": "xxxx/virtualNetworks/focal-ha-subnet/subnets/public",
  "ipAllocations": null,
  "ipConfigurationProfiles": null,
  "ipConfigurations": null,
  "name": "public",
  "natGateway": null,
  "networkSecurityGroup": null,
  "privateEndpointNetworkPolicies": "Enabled",
  "privateEndpoints": null,
  "privateLinkServiceNetworkPolicies": "Enabled",
  "provisioningState": "Succeeded",
  "purpose": null,
  "resourceGroup": "focal-ha",
  "resourceNavigationLinks": null,
  "routeTable": null,
  "serviceAssociationLinks": null,
  "serviceEndpointPolicies": null,
  "serviceEndpoints": null,
  "type": "Microsoft.Network/virtualNetworks/subnets"
}

Later we create the “private” cluster network with prefix: 10.250.96.0/24:

$ az network vnet subnet create --address-prefixes "10.250.96.0/24" --name private --resource-group focal-ha --vnet-name focal-ha-subnet
{
  "addressPrefix": "10.250.96.0/24",
  "addressPrefixes": null,
  "delegations": [],
  "etag": "W/\"xxxx-b72f-4cf4-9168-2da758ee0023\"",
  "id": "xxxx/virtualNetworks/focal-ha-subnet/subnets/private",
  "ipAllocations": null,
  "ipConfigurationProfiles": null,
  "ipConfigurations": null,
  "name": "private",
  "natGateway": null,
  "networkSecurityGroup": null,
  "privateEndpointNetworkPolicies": "Enabled",
  "privateEndpoints": null,
  "privateLinkServiceNetworkPolicies": "Enabled",
  "provisioningState": "Succeeded",
  "purpose": null,
  "resourceGroup": "focal-ha",
  "resourceNavigationLinks": null,
  "routeTable": null,
  "serviceAssociationLinks": null,
  "serviceEndpointPolicies": null,
  "serviceEndpoints": null,
  "type": "Microsoft.Network/virtualNetworks/subnets"
}

Create a public IP address

Since I’m going to create a gateway virtual machine to access the environment, I can create a single public IP address (IPv4 in my case).

$ az network public-ip create --resource-group focal-ha --name public-ip
{
  "publicIp": {
    "ddosSettings": null,
    "dnsSettings": null,
    "etag": "W/\"xxxx-3b31-4fe5-9197-ab14b676f823\"",
    "id": "xxxx/publicIPAddresses/public-ip",
    "idleTimeoutInMinutes": 4,
    "ipAddress": null,
    "ipConfiguration": null,
    "ipTags": [],
    "location": "eastus",
    "name": "public-ip",
    "provisioningState": "Succeeded",
    "publicIpAddressVersion": "IPv4",
    "publicIpAllocationMethod": "Dynamic",
    "publicIpPrefix": null,
    "resourceGroup": "focal-ha",
    "resourceGuid": "xxxx-b585-4350-b149-6f03ff44d571",
    "sku": {
      "name": "Basic"
    },
    "tags": null,
    "type": "Microsoft.Network/publicIPAddresses",
    "zones": null
  }
}

Create an Availability Set

The next step is to create an availability set. An availability set serves to make sure that the VMs composing our cluster aren’t in the same fault (or update) domains. Mathematically speaking, a cluster should always have an odd number of votes so there is never a split brain scenario. In our case we’re going to have 3 nodes, and, because of that, 3 possible quorum votes. This means that our cluster has to have at least 2 nodes on-line in order to be quorate (have its resources on-line in a clean state).

For this reason, and following the Microsoft Azure HA recommendations, we’re creating an availability set with 3 different fault (and update) domains.

$ az vm availability-set create --resource-group focal-ha --name focal-ha-avail --platform-fault-domain-count 3 --platform-update-domain-count 3
{
  "id": "xxxx/availabilitySets/focal-ha-avail",
  "location": "eastus",
  "name": "focal-ha-avail",
  "platformFaultDomainCount": 3,
  "platformUpdateDomainCount": 3,
  "proximityPlacementGroup": null,
  "resourceGroup": "focal-ha",
  "sku": {
    "capacity": null,
    "name": "Aligned",
    "tier": null
  },
  "statuses": null,
  "tags": {},
  "type": "Microsoft.Compute/availabilitySets",
  "virtualMachines": []
}
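
The quorum arithmetic behind the 3-node choice can be sketched in shell. This is only an illustration: quorum_needed is a hypothetical helper (it is not part of Pacemaker, Corosync or the Azure CLI) and it assumes every node carries exactly one vote.

```shell
# quorum_needed N: smallest number of votes that forms a majority
# in a cluster with N votes (assumption: one vote per node).
quorum_needed() {
  echo $(( $1 / 2 + 1 ))
}

# Show, for a few cluster sizes, how many node failures can be tolerated.
for n in 2 3 4 5; do
  q=$(quorum_needed "$n")
  echo "$n nodes: quorum=$q, tolerated failures=$(( n - q ))"
done
```

With 3 nodes the cluster stays quorate after losing one node. A fourth node would not raise that number, which is one reason odd cluster sizes are preferred.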

Choosing an Image to install your VMs

Fortunately, the Ubuntu image behind Azure Shell already ships the “jq” tool, so we can filter the list of images returned by the “az vm image list --all” command:

$ az vm image list --all --offer "Ubuntu" | jq -re '.[] | select(.publisher == "Canonical" and .offer == "UbuntuServer") | select (.version | contains("18.04.202009"))'

{
  "offer": "UbuntuServer",
  "publisher": "Canonical",
  "sku": "18.04-DAILY-LTS",
  "urn": "Canonical:UbuntuServer:18.04-DAILY-LTS:18.04.202009010",
  "version": "18.04.202009010"
}
{
  "offer": "UbuntuServer",
  "publisher": "Canonical",
  "sku": "18.04-LTS",
  "urn": "Canonical:UbuntuServer:18.04-LTS:18.04.202009010",
  "version": "18.04.202009010"
}
{
  "offer": "UbuntuServer",
  "publisher": "Canonical",
  "sku": "18_04-daily-lts-gen2",
  "urn": "Canonical:UbuntuServer:18_04-daily-lts-gen2:18.04.202009010",
  "version": "18.04.202009010"
}
{
  "offer": "UbuntuServer",
  "publisher": "Canonical",
  "sku": "18_04-lts-gen2",
  "urn": "Canonical:UbuntuServer:18_04-lts-gen2:18.04.202009010",
  "version": "18.04.202009010"
}

Judging by my jq filters and the output above, the latest “UbuntuServer” “LTS” image that is not a “daily” image is the last one. When creating the VM you can either use the “urn” “Canonical:UbuntuServer:18_04-lts-gen2:18.04.202009010” or simply “--image UbuntuLTS”.

Note: By now you have realized that the LTS version available in Azure is still 18.04 (even though 20.04 has already been released). That is okay: let’s create our initial environment with 18.04 and upgrade to 20.04 before installing the cluster.

Feeding the VMs with a cloud-init yaml file

One of the best things when provisioning an environment is to have it ready, or as close as possible to being ready, for its purpose. Cloud-init allows us to customize the images the way we want in a single step.

Let’s create a file called cloud-config.yaml, using the nano or vim editors available in Azure Shell, with the following content:

#cloud-config
package_upgrade: true
packages:
  - man
  - manpages
  - hello
  - locales
  - less
  - vim
  - uuid
  - bash-completion
  - sudo
  - rsync
  - bridge-utils
  - net-tools
  - vlan
  - ncurses-term
  - iputils-arping
  - iputils-ping
  - iputils-tracepath
  - traceroute
  - mtr-tiny
  - tcpdump
  - dnsutils
  - ssh-import-id
  - openssh-server
  - openssh-client
  - software-properties-common
  - build-essential
  - linux-headers-generic
write_files:
  - path: /etc/ssh/sshd_config
    content: |
      Port 22
      AddressFamily any
      SyslogFacility AUTH
      LogLevel INFO
      PermitRootLogin yes
      PubkeyAuthentication yes
      PasswordAuthentication yes
      ChallengeResponseAuthentication no
      GSSAPIAuthentication no
      HostbasedAuthentication no
      PermitEmptyPasswords no
      UsePAM yes
      IgnoreUserKnownHosts yes
      IgnoreRhosts yes
      X11Forwarding yes
      X11DisplayOffset 10
      X11UseLocalhost yes
      PermitTTY yes
      PrintMotd no
      TCPKeepAlive yes
      ClientAliveInterval 5
      PermitTunnel yes
      Banner none
      AcceptEnv LANG LC_* EDITOR PAGER SYSTEMD_EDITOR
      Subsystem     sftp /usr/lib/openssh/sftp-server
  - path: /etc/ssh/ssh_config
    content: |
      Host *
        ForwardAgent no
        ForwardX11 no
        PasswordAuthentication yes
        CheckHostIP no
        AddressFamily any
        SendEnv LANG LC_* EDITOR PAGER
        StrictHostKeyChecking no
        HashKnownHosts yes
  - path: /etc/sudoers
    content: |
        Defaults env_keep += "LANG LANGUAGE LINGUAS LC_* _XKB_CHARSET"
        Defaults env_keep += "HOME EDITOR SYSTEMD_EDITOR PAGER"
        Defaults env_keep += "XMODIFIERS GTK_IM_MODULE QT_IM_MODULE QT_IM_SWITCHER"
        Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        Defaults logfile=/var/log/sudo.log,loglinelen=0
        Defaults !syslog, !pam_session
        root ALL=(ALL) NOPASSWD: ALL
        %wheel ALL=(ALL) NOPASSWD: ALL
        %sudo ALL=(ALL) NOPASSWD: ALL
        rafaeldtinoco ALL=(ALL) NOPASSWD: ALL
runcmd:
  - systemctl disable --now snapd.service
  - systemctl disable --now unattended-upgrades
  - systemctl stop systemd-remount-fs
  - systemctl reset-failed
  - passwd -d root
  - passwd -d rafaeldtinoco
  - echo "debconf debconf/priority select low" | sudo debconf-set-selections
  - DEBIAN_FRONTEND=noninteractive dpkg-reconfigure debconf
  - DEBIAN_FRONTEND=noninteractive apt-get update -y
  - DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y
  - DEBIAN_FRONTEND=noninteractive apt-get autoremove -y
  - DEBIAN_FRONTEND=noninteractive apt-get autoclean -y
  - systemctl disable apt-daily-upgrade.timer
  - systemctl disable apt-daily.timer
  - systemctl disable accounts-daemon.service
  - systemctl disable motd-news.timer
  - systemctl disable irqbalance.service
  - systemctl disable rsync.service
  - systemctl disable ebtables.service
  - systemctl disable pollinate.service
  - systemctl disable ufw.service
  - systemctl disable apparmor.service
  - systemctl disable apport-autoreport.path
  - systemctl disable apport-forward.socket
  - systemctl disable iscsi.service
  - systemctl disable open-iscsi.service
  - systemctl disable iscsid.socket
  - systemctl disable multipathd.socket
  - systemctl disable multipath-tools.service
  - systemctl disable multipathd.service
  - systemctl disable lvm2-monitor.service
  - systemctl disable lvm2-lvmpolld.socket
  - systemctl disable lvm2-lvmetad.socket
apt:
  preserve_sources_list: false
  primary:
    - arches: [default]
      uri: http://us.archive.ubuntu.com/ubuntu
  sources_list: |
    deb $MIRROR $RELEASE main restricted universe multiverse
    deb $MIRROR $RELEASE-updates main restricted universe multiverse
    deb $MIRROR $RELEASE-proposed main restricted universe multiverse
    deb-src $MIRROR $RELEASE main restricted universe multiverse
    deb-src $MIRROR $RELEASE-updates main restricted universe multiverse
    deb-src $MIRROR $RELEASE-proposed main restricted universe multiverse
  conf: |
    Dpkg::Options {
      "--force-confdef";
      "--force-confold";
    };
  sources:
    debug.list:
      source: |
        # deb http://ddebs.ubuntu.com $RELEASE main restricted universe multiverse
        # deb http://ddebs.ubuntu.com $RELEASE-updates main restricted universe multiverse
        # deb http://ddebs.ubuntu.com $RELEASE-proposed main restricted universe multiverse
      keyid: C8CAB6595FDFF622

Note: There is a hardcoded “rafaeldtinoco” in the /etc/sudoers file created by this cloud-init yaml file, and another one in the “passwd -d” command under runcmd. Make sure to change both to the username you’re using on your Microsoft Azure VMs. Also, feel free to customize this file to suit your needs (just make sure it works first =).
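
Swapping the hardcoded username can be done with a small sed wrapper. This is a minimal sketch: set_cloud_init_user is a hypothetical helper name, and it assumes the new username contains no “/” or “&” characters (both are special to sed).

```shell
# set_cloud_init_user FILE NEWUSER: replace every occurrence of the
# tutorial's hardcoded username in FILE with NEWUSER.
# Assumption: NEWUSER has no "/" or "&" characters.
set_cloud_init_user() {
  file=$1
  newuser=$2
  tmp=$(mktemp) || return 1
  # Write to a temporary file first so a failed sed never truncates FILE,
  # then copy the result back so FILE keeps its permissions.
  sed "s/rafaeldtinoco/$newuser/g" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example, using the file name passed to --custom-data later on:
# set_cloud_init_user ./cloud-config.yaml myuser
```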

Choosing your VM size

To list the VM sizes available in the “eastus” location, here filtered to the ones whose name contains “B1”, you would run:

$ az vm list-sizes -l eastus | jq -re '.[] | select (.name | contains("B1"))'
{
  "maxDataDiskCount": 2,
  "memoryInMb": 512,
  "name": "Standard_B1ls",
  "numberOfCores": 1,
  "osDiskSizeInMb": 1047552,
  "resourceDiskSizeInMb": 4096
}
{
  "maxDataDiskCount": 2,
  "memoryInMb": 2048,
  "name": "Standard_B1ms",
  "numberOfCores": 1,
  "osDiskSizeInMb": 1047552,
  "resourceDiskSizeInMb": 4096
}
{
  "maxDataDiskCount": 2,
  "memoryInMb": 1024,
  "name": "Standard_B1s",
  "numberOfCores": 1,
  "osDiskSizeInMb": 1047552,
  "resourceDiskSizeInMb": 4096
}
{
  "maxDataDiskCount": 16,
  "memoryInMb": 49152,
  "name": "Standard_B12ms",
  "numberOfCores": 12,
  "osDiskSizeInMb": 1047552,
  "resourceDiskSizeInMb": 98304
}
{
  "maxDataDiskCount": 32,
  "memoryInMb": 65536,
  "name": "Standard_B16ms",
  "numberOfCores": 16,
  "osDiskSizeInMb": 1047552,
  "resourceDiskSizeInMb": 131072
}
{
  "maxDataDiskCount": 8,
  "memoryInMb": 480000,
  "name": "Standard_HB120rs_v2",
  "numberOfCores": 120,
  "osDiskSizeInMb": 1047552,
  "resourceDiskSizeInMb": 960000
}

I’ll be using the “Standard_B1ms” size of “eastus” location for my examples.

Create the network interfaces to be used

Usually you could simply create a virtual machine and let Azure create its network interfaces automatically. That would be just fine. Since we’re installing an HA cluster, we would like to have better control over what is being deployed and how.

In our cluster, when creating the focal-ha-subnet virtual network, we created 2 subnets:

- public (10.250.92.0/24)
- private (10.250.96.0/24)

So now we want to create 1 interface per subnet for each node to be provisioned, plus a public interface for the gateway.

Creating the network interfaces (NICs) gwpublic, vm01public, vm02public, vm03public:

$ az network nic create --resource-group focal-ha --vnet-name focal-ha-subnet --subnet public --name gwpublic

$ az network nic create --resource-group focal-ha --vnet-name focal-ha-subnet --subnet public --name vm03public

$ az network nic create --resource-group focal-ha --vnet-name focal-ha-subnet --subnet public --name vm02public

$ az network nic create --resource-group focal-ha --vnet-name focal-ha-subnet --subnet public --name vm01public
{
  "NewNIC": {
    "dnsSettings": {
      "appliedDnsServers": [],
      "dnsServers": [],
      "internalDnsNameLabel": null,
      "internalDomainNameSuffix": "xxxx.bx.internal.cloudapp.net",
      "internalFqdn": null
    },
    "enableAcceleratedNetworking": false,
    "enableIpForwarding": false,
    "etag": "W/\"xxxx-955b-4169-828e-ee954fd8fda0\"",
    "hostedWorkloads": [],
    "id": "xxxx/networkInterfaces/vm01public",
    "ipConfigurations": [
      {
        "applicationGatewayBackendAddressPools": null,
        "applicationSecurityGroups": null,
        "etag": "W/\"xxxx-955b-4169-828e-ee954fd8fda0\"",
        "id": "xxxx/vm01public/ipConfigurations/ipconfig1",
        "loadBalancerBackendAddressPools": null,
        "loadBalancerInboundNatRules": null,
        "name": "ipconfig1",
        "primary": true,
        "privateIpAddress": "10.250.92.4",
        "privateIpAddressVersion": "IPv4",
        "privateIpAllocationMethod": "Dynamic",
        "privateLinkConnectionProperties": null,
        "provisioningState": "Succeeded",
        "publicIPAddresses": null,
        "resourceGroup": "focal-ha",
        "subnet": {
          "addressPrefix": null,
          "addressPrefixes": null,
          "delegations": null,
          "etag": null,
          "id": "xxxx",
          "ipAllocations": null,
          "ipConfigurationProfiles": null,
          "ipConfigurations": null,
          "name": null,
          "natGateway": null,
          "networkSecurityGroup": null,
          "privateEndpointNetworkPolicies": null,
          "privateEndpoints": null,
          "privateLinkServiceNetworkPolicies": null,
          "provisioningState": null,
          "purpose": null,
          "resourceGroup": "focal-ha",
          "resourceNavigationLinks": null,
          "routeTable": null,
          "serviceAssociationLinks": null,
          "serviceEndpointPolicies": null,
          "serviceEndpoints": null
        },
        "type": "Microsoft.Network/networkInterfaces/ipConfigurations",
        "virtualNetworkTaps": null
      }
    ],
    "location": "eastus",
    "macAddress": null,
    "name": "vm01public",
    "networkSecurityGroup": null,
    "primary": null,
    "privateEndpoint": null,
    "provisioningState": "Succeeded",
    "resourceGroup": "focal-ha",
    "resourceGuid": "xxxx-c8b2-45ee-a41a-86735ac6ddb5",
    "tags": null,
    "tapConfigurations": [],
    "type": "Microsoft.Network/networkInterfaces",
    "virtualMachine": null
  }
}

Creating the network interfaces (NICs) vm01private, vm02private, vm03private:

$ az network nic create --resource-group focal-ha --vnet-name focal-ha-subnet --subnet private --name vm03private

$ az network nic create --resource-group focal-ha --vnet-name focal-ha-subnet --subnet private --name vm02private

$ az network nic create --resource-group focal-ha --vnet-name focal-ha-subnet --subnet private --name vm01private

Note: This private network will serve as the heartbeat network for corosync. It should be used only for that: putting other traffic (or at least heavy traffic) on it is not recommended, as it would cause delays/jitter in the cluster intercommunication (yes, in practice this is out of our control since this is a virtual network, but it’s still a valid recommendation).
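
The seven NIC creation commands follow a regular pattern, so they can also be generated with a loop. The sketch below only prints the commands so they can be reviewed first; nic_create_commands is a hypothetical helper name, while the resource names are the ones used in this tutorial.

```shell
# Print (do not run) one "az network nic create" command per NIC:
# a public NIC for the gateway, plus a public and a private NIC per node.
nic_create_commands() {
  base="az network nic create --resource-group focal-ha --vnet-name focal-ha-subnet"
  echo "$base --subnet public --name gwpublic"
  for node in vm01 vm02 vm03; do
    for subnet in public private; do
      echo "$base --subnet $subnet --name $node$subnet"
    done
  done
}

nic_create_commands
```

Once the printed list looks right, it can be executed from Azure Shell with “nic_create_commands | sh”.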

Assign IP addresses to the cluster nodes interfaces

As a reminder: in our cluster, when creating the focal-ha-subnet virtual network, we created 2 subnets:

- public (10.250.92.0/24)
- private (10.250.96.0/24)

Let’s first learn how to list all the network interfaces we have created so far:

$ az network nic list | jq -re '.[].name'
gwpublic
vm01private
vm01public
vm02private
vm02public
vm03private
vm03public

As you can see, the NICs we created default to Dynamic IP addressing. Taking vm01public as an example:

$ az network nic ip-config list --nic-name vm01public --resource-group focal-ha
…
    "name": "ipconfig1",
    "primary": true,
    "privateIpAddress": "10.250.92.4",
    "privateIpAddressVersion": "IPv4",
    "privateIpAllocationMethod": "Dynamic",
    "privateLinkConnectionProperties": null,
    "provisioningState": "Succeeded",
    "publicIpAddress": null,
    "resourceGroup": "focal-ha",
…

Since we’re dealing with an HA cluster, we don’t want its IP addresses changing from time to time. Disabling dynamic (DHCP) allocation is also desirable, especially for the private network, since lease changes could interfere with the Corosync Token Ring (the cluster private network).

Let’s configure the NICs’ IP addresses, beginning with vm01. If I try to delete the existing ip-config from the NIC, Azure won’t allow it:

$ az network nic ip-config delete --nic-name vm01private --resource-group focal-ha --name ipconfig1
Network interface vm01private must have one or more IP configurations.

So the trick is to first add the IP configuration we want to each network interface, and only then delete the existing (DHCP) configuration.

Beginning with the public subnet of the focal-ha-subnet virtual network:

$ az network nic ip-config create --name vm01public --resource-group focal-ha --nic-name vm01public --make-primary --private-ip-address 10.250.92.11 --private-ip-address-version IPv4 --subnet public --vnet-name focal-ha-subnet

$ az network nic ip-config create --name vm02public --resource-group focal-ha --nic-name vm02public --make-primary --private-ip-address 10.250.92.12 --private-ip-address-version IPv4 --subnet public --vnet-name focal-ha-subnet

$ az network nic ip-config create --name vm03public --resource-group focal-ha --nic-name vm03public --make-primary --private-ip-address 10.250.92.13 --private-ip-address-version IPv4 --subnet public --vnet-name focal-ha-subnet
{
  "applicationGatewayBackendAddressPools": null,
  "applicationSecurityGroups": null,
  "etag": "W/\"xxxx-ccb4-4240-9943-05603a77793e\"",
  "id": "xxxx/vm03public/ipConfigurations/vm03public",
  "loadBalancerBackendAddressPools": null,
  "loadBalancerInboundNatRules": null,
  "name": "vm03public",
  "primary": true,
  "privateIpAddress": "10.250.92.13",
  "privateIpAddressVersion": "IPv4",
  "privateIpAllocationMethod": "Static",
  "privateLinkConnectionProperties": null,
  "provisioningState": "Succeeded",
  "publicIpAddress": null,
  "resourceGroup": "focal-ha",
  "subnet": {
…
    "id": "/xxxx/virtualNetworks/focal-ha-subnet/subnets/public",
…
  },
  "type": "Microsoft.Network/networkInterfaces/ipConfigurations",
  "virtualNetworkTaps": null
}

And then the private subnet of the focal-ha-subnet virtual network:

$ az network nic ip-config create --name vm01private --resource-group focal-ha --nic-name vm01private --make-primary --private-ip-address 10.250.96.11 --private-ip-address-version IPv4 --subnet private --vnet-name focal-ha-subnet

$ az network nic ip-config create --name vm02private --resource-group focal-ha --nic-name vm02private --make-primary --private-ip-address 10.250.96.12 --private-ip-address-version IPv4 --subnet private --vnet-name focal-ha-subnet

$ az network nic ip-config create --name vm03private --resource-group focal-ha --nic-name vm03private --make-primary --private-ip-address 10.250.96.13 --private-ip-address-version IPv4 --subnet private --vnet-name focal-ha-subnet

Let’s not forget that our gateway virtual machine also needs an IP in the public subnet:

$ az network nic ip-config create --name gwpublic --resource-group focal-ha --nic-name gwpublic --make-primary --private-ip-address 10.250.92.254 --private-ip-address-version IPv4 --subnet public --vnet-name focal-ha-subnet

Okay so the situation now is this: we have the following Network interfaces:

- gwpublic: 2 ip configurations: gwpublic (10.250.92.254) and ipconfig1 (dynamic)
- vm01public: 2 ip configurations: vm01public (10.250.92.11) and ipconfig1 (dynamic)
- vm02public: 2 ip configurations: vm02public (10.250.92.12) and ipconfig1 (dynamic)
- vm03public: 2 ip configurations: vm03public (10.250.92.13) and ipconfig1 (dynamic)
- vm01private: 2 ip configurations: vm01private (10.250.96.11) and ipconfig1 (dynamic)
- vm02private: 2 ip configurations: vm02private (10.250.96.12) and ipconfig1 (dynamic)
- vm03private: 2 ip configurations: vm03private (10.250.96.13) and ipconfig1 (dynamic)

Now we can just delete the ipconfig1 IP configuration from all the NICs:

$ for int in gwpublic vm01public vm02public vm03public vm01private vm02private vm03private; do az network nic ip-config delete --name ipconfig1 --resource-group focal-ha --nic-name $int; done
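
The static addresses above follow a simple scheme: node N gets 10.250.92.1N on the public subnet and 10.250.96.1N on the private one. A minimal sketch of that scheme, assuming at most 9 nodes and using a hypothetical static_ip helper, prints the matching ip-config commands for review:

```shell
# static_ip SUBNET N: address of node N (1-9) on SUBNET, following the
# tutorial's scheme: 10.250.92.1N on public, 10.250.96.1N on private.
static_ip() {
  case $1 in
    public)  net=10.250.92 ;;
    private) net=10.250.96 ;;
    *) echo "unknown subnet: $1" >&2; return 1 ;;
  esac
  echo "$net.1$2"
}

# Print the ip-config command for every node NIC (nothing is executed).
for n in 1 2 3; do
  for subnet in public private; do
    nic="vm0$n$subnet"
    echo "az network nic ip-config create --name $nic --resource-group focal-ha" \
         "--nic-name $nic --make-primary --private-ip-address $(static_ip "$subnet" "$n")" \
         "--private-ip-address-version IPv4 --subnet $subnet --vnet-name focal-ha-subnet"
  done
done
```

The gateway NIC (10.250.92.254) does not follow the scheme and is left out on purpose.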

Assign a Public IP address to the gateway public interface

Remember that we created a Public IP address? We can check it by running:

$ az network public-ip list --resource-group focal-ha
[
  {
… 
    "location": "eastus",
    "name": "public-ip",
    "provisioningState": "Succeeded",
    "publicIpAddressVersion": "IPv4",
    "publicIpAllocationMethod": "Dynamic",
    "publicIpPrefix": null,
    "resourceGroup": "focal-ha",
    "resourceGuid": "xxxx-b585-4350-b149-6f03ff44d571",
    "sku": {
      "name": "Basic"
    },
    "tags": null,
    "type": "Microsoft.Network/publicIPAddresses",
    "zones": null
  }
]

Let’s assign this Public IP address to the interface “gwpublic” we created:

$ az network nic ip-config update --name gwpublic --nic-name gwpublic --resource-group focal-ha --public-ip-address public-ip
{
  "applicationGatewayBackendAddressPools": null,
  "applicationSecurityGroups": null,
  "etag": "W/\"xxxx\"",
  "id": "xxxx/gwpublic/ipConfigurations/gwpublic",
  "loadBalancerBackendAddressPools": null,
  "loadBalancerInboundNatRules": null,
  "name": "gwpublic",
  "primary": true,
  "privateIpAddress": "10.250.92.254",
  "privateIpAddressVersion": "IPv4",
  "privateIpAllocationMethod": "Static",
  "privateLinkConnectionProperties": null,
  "provisioningState": "Succeeded",
  "publicIpAddress": {
    "ddosSettings": null,
    "dnsSettings": null,
    "etag": null,
    "id": "xxxx/publicIPAddresses/public-ip",
    "idleTimeoutInMinutes": null,
    "ipAddress": null,
    "ipConfiguration": null,
    "ipTags": null,
    "location": null,
    "name": null,
    "provisioningState": null,
    "publicIpAddressVersion": null,
    "publicIpAllocationMethod": null,
    "publicIpPrefix": null,
    "resourceGroup": "focal-ha",
    "resourceGuid": null,
    "sku": null,
    "tags": null,
    "type": null,
    "zones": null
  },
  "resourceGroup": "focal-ha",
  "subnet": {
    "addressPrefix": null,
    "addressPrefixes": null,
    "delegations": null,
    "etag": null,
    "id": "xxxx/focal-ha-subnet/subnets/public",
    "ipAllocations": null,
    "ipConfigurationProfiles": null,
    "ipConfigurations": null,
    "name": null,
    "natGateway": null,
    "networkSecurityGroup": null,
    "privateEndpointNetworkPolicies": null,
    "privateEndpoints": null,
    "privateLinkServiceNetworkPolicies": null,
    "provisioningState": null,
    "purpose": null,
    "resourceGroup": "focal-ha",
    "resourceNavigationLinks": null,
    "routeTable": null,
    "serviceAssociationLinks": null,
    "serviceEndpointPolicies": null,
    "serviceEndpoints": null
  },
  "type": "Microsoft.Network/networkInterfaces/ipConfigurations",
  "virtualNetworkTaps": null
}

Since we also want to access this environment easily, we can give it a DNS alias:

$ az network public-ip update --dns-name gw-focal-ha --resource-group focal-ha --name public-ip
{
  "ddosSettings": null,
  "dnsSettings": {
    "domainNameLabel": "gw-focal-ha",
    "fqdn": "gw-focal-ha.eastus.cloudapp.azure.com",
    "reverseFqdn": null
  },
…
}

Now our to-be-created gateway virtual machine can be accessed by doing (from elsewhere, since Azure Cloud Shell does not allow SSH connections):

$ ssh rafaeldtinoco@gw-focal-ha.eastus.cloudapp.azure.com

Create an Ubuntu gateway VM first

Perfect. After configuring all the needed network interfaces, subnets, IP addresses and DNS aliases, we are ready to start creating our virtual machines. First let’s create the virtual machine that will serve as an access gateway to our cluster:

$ az vm create --name gateway --resource-group focal-ha --admin-username rafaeldtinoco --authentication-type ssh --custom-data ./cloud-config.yaml --generate-ssh-keys --image "Canonical:UbuntuServer:18_04-lts-gen2:18.04.202009010" --nics gwpublic --size Standard_B1ms --os-disk-name gwosdisk --ssh-key-values ~/.ssh/id_rsa.pub
{
  "fqdns": "gw-focal-ha.eastus.cloudapp.azure.com",
  "id": "/subscriptions/6e9c2b6f-0262-4e54-9bd9-8b7458eec86b/resourceGroups/focal-ha/providers/Microsoft.Compute/virtualMachines/gateway",
  "location": "eastus",
  "macAddress": "00-0D-3A-9E-72-5B",
  "powerState": "VM running",
  "privateIpAddress": "10.250.92.254",
  "publicIpAddress": "40.114.80.30",
  "resourceGroup": "focal-ha",
  "zones": ""
}

Some things to notice here:

- I'm creating the default user with my own username and setting the authentication type to ssh. The --admin-username and --ssh-key-values options are what allow me to ssh into the newly created virtual machine; the cloud-init yaml file given as an example above adds a passwordless sudoers entry for that same username.
- Cloud-init installs all the packages I want by default on top of the Ubuntu Server cloud image, writes some config files, and disables some services.
- In the VM creation command I'm able to specify the NICs that we previously created.
- It will take some time for all the cloud-init commands to finish. You can check whether the VM is ready by issuing:

$ ssh rafaeldtinoco@gw-focal-ha.eastus.cloudapp.azure.com
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1023-azure x86_64)
…

$ sudo cloud-init status
status: done
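
Instead of checking by hand, the readiness check can be retried until it succeeds. This is a minimal sketch of a generic retry helper: wait_until is a hypothetical name, and the 30 tries / 10 seconds in the example are arbitrary values, not recommendations.

```shell
# wait_until TRIES DELAY CMD...: rerun CMD until it succeeds or TRIES
# attempts have been made, sleeping DELAY seconds between attempts.
# Returns 0 on success, 1 if every attempt failed.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    tries=$(( tries - 1 ))
    sleep "$delay"
  done
  return 1
}

# Example (run from a machine that can ssh into the gateway):
# wait_until 30 10 ssh rafaeldtinoco@gw-focal-ha.eastus.cloudapp.azure.com \
#   'sudo cloud-init status | grep -q "status: done"'
```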

Create all the cluster nodes inside the Availability Set

Once you’re happy with your newly created gateway VM, you can start creating the cluster VMs. The only difference is that we have to place these new machines in the availability set we created before. This is important to avoid availability issues: you don’t want 2 of your 3 nodes powered off by a sudden Azure maintenance event, for example.

$ az vm create --name vm01 --resource-group focal-ha --availability-set focal-ha-avail --admin-username rafaeldtinoco --authentication-type ssh --custom-data ./cloud-config.yaml --generate-ssh-keys --image "Canonical:UbuntuServer:18_04-lts-gen2:18.04.202009010" --nics vm01public --size Standard_B1ms --os-disk-name vm01osdisk --ssh-key-values ~/.ssh/id_rsa.pub
{
  "fqdns": "",
  "id": "/subscriptions/6e9c2b6f-0262-4e54-9bd9-8b7458eec86b/resourceGroups/focal-ha/providers/Microsoft.Compute/virtualMachines/vm01",
  "location": "eastus",
  "macAddress": "00-0D-3A-9E-39-0B",
  "powerState": "VM running",
  "privateIpAddress": "10.250.92.11",
  "publicIpAddress": "",
  "resourceGroup": "focal-ha",
  "zones": ""
}

$ az vm create --name vm02 --resource-group focal-ha --availability-set focal-ha-avail --admin-username rafaeldtinoco --authentication-type ssh --custom-data ./cloud-config.yaml --generate-ssh-keys --image "Canonical:UbuntuServer:18_04-lts-gen2:18.04.202009010" --nics vm02public --size Standard_B1ms --os-disk-name vm02osdisk --ssh-key-values ~/.ssh/id_rsa.pub

$ az vm create --name vm03 --resource-group focal-ha --availability-set focal-ha-avail --admin-username rafaeldtinoco --authentication-type ssh --custom-data ./cloud-config.yaml --generate-ssh-keys --image "Canonical:UbuntuServer:18_04-lts-gen2:18.04.202009010" --nics vm03public --size Standard_B1ms --os-disk-name vm03osdisk --ssh-key-values ~/.ssh/id_rsa.pub

Accessing the virtual machines

After ssh’ing into gw-focal-ha.eastus.cloudapp.azure.com you can edit gateway’s /etc/hosts file to something like:

#----
127.0.0.1 localhost
255.255.255.255 broadcasthost
#-
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
#-
10.250.92.11 vm01.public
10.250.92.12 vm02.public
10.250.92.13 vm03.public
10.250.96.11 vm01.private vm01
10.250.96.12 vm02.private vm02
10.250.96.13 vm03.private vm03

This lets you reach the machines easily by name.
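The host entries above follow a regular pattern, so they can also be generated. This is a minimal sketch, assuming the addressing used in this tutorial (10.250.92.x for the public subnet, 10.250.96.x for the private one, node N at address 10+N). The function name print_cluster_hosts is made up for illustration.

```shell
# Hypothetical helper: print /etc/hosts entries for the first N cluster nodes.
# Assumes node N lives at 10.250.92.(10+N) and 10.250.96.(10+N).
print_cluster_hosts() {
    count=$1
    n=1
    while [ "$n" -le "$count" ]; do
        printf '10.250.92.%d vm%02d.public\n' "$((10 + n))" "$n"
        n=$((n + 1))
    done
    n=1
    while [ "$n" -le "$count" ]; do
        printf '10.250.96.%d vm%02d.private vm%02d\n' "$((10 + n))" "$n" "$n"
        n=$((n + 1))
    done
}

print_cluster_hosts 3
```

Appending the result to the gateway’s file is then a matter of print_cluster_hosts 3 | sudo tee -a /etc/hosts.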

Now, as you may have noticed, we haven’t added the private network interfaces to our cluster machines yet. They will serve us as the cluster interconnect network (for the messaging layer, called Corosync).

Let’s do it now… but first, we have to stop and deallocate the 3 virtual machines:

$ az vm stop --resource-group focal-ha --name vm01
$ az vm stop --resource-group focal-ha --name vm02
$ az vm stop --resource-group focal-ha --name vm03

$ az vm deallocate --resource-group focal-ha --name vm01
$ az vm deallocate --resource-group focal-ha --name vm02
$ az vm deallocate --resource-group focal-ha --name vm03

Now we’re ready:

$ az vm nic add --nics vm01private --resource-group focal-ha --vm-name vm01 
[
  {
    "id": "/subscriptions/6e9c2b6f-0262-4e54-9bd9-8b7458eec86b/resourceGroups/focal-ha/providers/Microsoft.Network/networkInterfaces/vm01public",
    "primary": true,
    "resourceGroup": "focal-ha"
  },
  {
    "id": "/subscriptions/6e9c2b6f-0262-4e54-9bd9-8b7458eec86b/resourceGroups/focal-ha/providers/Microsoft.Network/networkInterfaces/vm01private",
    "primary": false,
    "resourceGroup": "focal-ha"
  }
]

$ az vm nic add --nics vm02private --resource-group focal-ha --vm-name vm02
$ az vm nic add --nics vm03private --resource-group focal-ha --vm-name vm03

And we can start all 3 virtual machines again:

$ az vm start --resource-group focal-ha --name vm01
$ az vm start --resource-group focal-ha --name vm02
$ az vm start --resource-group focal-ha --name vm03
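The whole sequence above (stop, deallocate, attach the private NIC, start) is the same for every node. Here is a minimal dry-run sketch that prints the commands per VM instead of running them. The function name print_nic_attach_cmds is hypothetical, and unlike the steps above it handles one VM completely before moving to the next.

```shell
# Hypothetical helper: print the stop/deallocate/nic add/start sequence
# for each cluster VM, using the names from this tutorial.
print_nic_attach_cmds() {
    rg=focal-ha
    for vm in vm01 vm02 vm03; do
        echo "az vm stop --resource-group $rg --name $vm"
        echo "az vm deallocate --resource-group $rg --name $vm"
        echo "az vm nic add --nics ${vm}private --resource-group $rg --vm-name $vm"
        echo "az vm start --resource-group $rg --name $vm"
    done
}

print_nic_attach_cmds
```

Printing first makes it easy to double check the NIC names before anything is powered off.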

Upgrading Bionic to Focal

At the time I wrote this tutorial, there were no 20.04 images available in Microsoft Azure, so I chose to use the 18.04 images and show how to upgrade them to 20.04, as this is an easy step. I usually just edit the file /etc/apt/sources.list and replace the old Ubuntu release name (bionic) with the new one (focal).

Having something like:

deb http://azure.archive.ubuntu.com/ubuntu/ focal main restricted universe multiverse
#deb-src http://azure.archive.ubuntu.com/ubuntu/ focal main restricted universe multiverse
deb http://azure.archive.ubuntu.com/ubuntu/ focal-updates main restricted universe multiverse
#deb-src http://azure.archive.ubuntu.com/ubuntu/ focal-updates main restricted universe multiverse
#deb http://azure.archive.ubuntu.com/ubuntu/ focal-backports main restricted universe multiverse
#deb-src http://azure.archive.ubuntu.com/ubuntu/ focal-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu focal-security main restricted universe multiverse
#deb-src http://security.ubuntu.com/ubuntu focal-security main restricted universe multiverse

is good enough. You can simply:

$ sudo apt-get update && sudo apt-get dist-upgrade

and reboot in order to have an upgraded Ubuntu.
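Renaming the release in sources.list can be done with sed. The sketch below is an assumption about how you might do it, not the exact command I ran: it works on a scratch copy, keeps a backup with a .bionic suffix, and the function name rename_release is made up.

```shell
# Hypothetical helper: rewrite every "bionic" suite name to "focal" in the
# sources.list-style file given as $1, keeping a backup named <file>.bionic.
rename_release() {
    sed -i.bionic 's/bionic/focal/g' "$1"
}

# Demonstration on a scratch file, not on /etc/apt/sources.list itself.
demo=$(mktemp)
cat > "$demo" <<'EOF'
deb http://azure.archive.ubuntu.com/ubuntu/ bionic main restricted universe multiverse
deb http://azure.archive.ubuntu.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu bionic-security main restricted universe multiverse
EOF
rename_release "$demo"
cat "$demo"
```

On a real node you would point it at /etc/apt/sources.list (with sudo) and then run the apt-get commands shown above.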

Note: I had to run “dpkg --purge linux-azure-5.4-tools-5.4.0-1023” before doing an “apt-get -f install” to continue the upgrade. That is because this package is a backport from Focal to Bionic and no upgrade path exists for it, so dpkg complains about a new package trying to overwrite files from an older one.

Another note: After upgrading, and rebooting, I have a small alias in my virtual machine to clean up all the cached packages so they don’t consume OS disk space:

alias cache="sudo mv /etc/apt/sources.list /tmp/sources.list ; sudo touch /etc/apt/sources.list ; sudo apt-get --purge autoremove -y ; sudo apt-get autoclean ; sudo apt-get clean ; sudo mv /tmp/sources.list /etc/apt/sources.list ; sudo apt-get update"

Tip: I usually remove all the /etc/default/grub.d/* files and create a single /etc/default/grub file with all things I want:

GRUB_DEFAULT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX=""
GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 earlyprintk=ttyS0 pti=off kpti=off nopcid noibrs noibpb spectre_v2=off nospec_store_bypass_disable l1tf=off elevator=noop apparmor=0"
GRUB_TERMINAL=serial
GRUB_SERIAL_COMMAND="serial --speed=9600 --unit=0 --word=8 --parity=no --stop=1"
GRUB_RECORDFAIL_TIMEOUT=30
GRUB_TIMEOUT_STYLE=countdown
GRUB_TIMEOUT=1

You can see that I’m disabling the mitigations for the Spectre/Meltdown/MDS/… hardware issues. My environment is just a lab, nobody else will be accessing it, and I’m paying for it, so let’s say I don’t want more overhead than needed. I don’t recommend disabling mitigations in a production environment.

$ sudo update-grub
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.4.0-1023-azure
Found initrd image: /boot/initrd.img-5.4.0-1023-azure
Adding boot menu entry for UEFI Firmware Settings
done

and after a reboot I have my virtual machine fully updated to Focal:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.1 LTS
Release:	20.04
Codename:	focal

$ cache
…
<wait package removal and cache cleanup>

Observation: Don’t forget to do this in all 3 virtual machines (and the gateway, if you want).

Installing The Messaging Layer: Corosync

As many of you know, pacemaker is built on top of corosync messaging. This means that, before setting our resources and fence agents, we first have to make sure we have a messaging layer up and running.

Let’s install corosync in all cluster nodes:

[rafaeldtinoco@vm01 ~]$ sudo apt-get install -y corosync
…
[rafaeldtinoco@vm01 ~]$ sudo systemctl is-active corosync.service
active
[rafaeldtinoco@vm01 ~]$ sudo corosync-quorumtool 
Quorum information
------------------
Date:             Mon Sep  7 18:18:04 2020
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          1
Ring ID:          1.5
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1 (local)

[rafaeldtinoco@vm02 ~]$ sudo apt-get install -y corosync
…
[rafaeldtinoco@vm02 ~]$ sudo systemctl is-active corosync.service
active
[rafaeldtinoco@vm02 ~]$ sudo corosync-quorumtool 
…

[rafaeldtinoco@vm03 ~]$ sudo apt-get install -y corosync
…
[rafaeldtinoco@vm03 ~]$ sudo systemctl is-active corosync.service
active
[rafaeldtinoco@vm03 ~]$ sudo corosync-quorumtool 
…

In Ubuntu, corosync starts as a standalone 1-node cluster, so the output of “corosync-quorumtool” should be similar to the one shown above.

Before moving on and trying to setup anything else, let’s first make sure we can establish a fully operational corosync cluster with the nodes “vm01, vm02 and vm03”.

In order to achieve that, we first have to generate a corosync key in just one node:

[rafaeldtinoco@vm01 ~]$ sudo corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 2048 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.

and copy to the other nodes:

[rafaeldtinoco@vm01 ~]$ sudo scp /etc/corosync/authkey root@vm02:/etc/corosync/authkey
[rafaeldtinoco@vm01 ~]$ sudo scp /etc/corosync/authkey root@vm03:/etc/corosync/authkey

All nodes should also have the same /etc/corosync/corosync.conf file, with the following contents:

totem {
    version: 2
    secauth: off
    cluster_name: clufocal
    # transport: udpu
    transport: knet
    interface {
        linknumber: 0
        knet_transport: udp
        knet_link_priority: 0
    }
}

nodelist {
    node {
        ring0_addr: 10.250.96.11
        name: vm01
        nodeid: 1
    }
    node {
        ring0_addr: 10.250.96.12
        name: vm02
        nodeid: 2
    }
    node {
        ring0_addr: 10.250.96.13
        name: vm03
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 0
}

qb {
    ipc_type: native
}

logging {
    fileline: on
    to_stderr: on
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: no
    debug: off
}

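Things to notice in this corosync.conf file: our cluster name will be “clufocal”; we are not going to use the old “udpu” transport mode but the new kronosnet (knet) one; and we describe each node by its PRIVATE interface address, which is the one used by the Totem protocol (the messaging protocol). Also note that, with “secauth: off”, the cluster traffic is neither authenticated nor encrypted, so the authkey generated above only comes into play once you turn authentication on.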
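Since the nodelist section is the part that grows with the cluster, here is a minimal sketch that generates it from a list of private addresses. It assumes the naming used in this tutorial (vm01, vm02, … with node IDs counting from 1); the function name print_nodelist is hypothetical.

```shell
# Hypothetical helper: print a corosync "nodelist" block, one node per
# address given on the command line. Node N is named vmNN with nodeid N.
print_nodelist() {
    echo "nodelist {"
    id=1
    for addr in "$@"; do
        printf '    node {\n        ring0_addr: %s\n        name: vm%02d\n        nodeid: %d\n    }\n' "$addr" "$id" "$id"
        id=$((id + 1))
    done
    echo "}"
}

print_nodelist 10.250.96.11 10.250.96.12 10.250.96.13
```

The output can be pasted into /etc/corosync/corosync.conf in place of the hand-written nodelist; the file must still be identical on all nodes.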

Now we can restart corosync by issuing at each node:

$ sudo systemctl restart corosync.service

and then verify that we were able to put all 3 nodes together in a corosync cluster AND that the newly formed cluster is “quorate”:

[rafaeldtinoco@vm01 ~]$ sudo corosync-quorumtool -ai
Quorum information
------------------
Date:             Tue Sep  8 00:27:55 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          1.1f
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
         1          1 10.250.96.11 (local)
         2          1 10.250.96.12
         3          1 10.250.96.13

As you may know, in a cluster we should always have an odd number of votes. Usually votes are given by a node in the cluster, like in this case - 3 nodes, 3 votes - but, sometimes, a vote can also be given by a remote daemon, or by a shared storage, allowing a cluster to have an even number of nodes.

The votes serve as a mechanism to allow the cluster to correctly identify what to do in the case of a split. Now things start to make sense, right? Going back to the “Create an Availability Set” section, you will remember that we asked Azure for 3 different fault domains and 3 different upgrade domains.

As our cluster will be a 3 node cluster, each of our virtual machines is placed in a different fault/upgrade domain within the Azure Availability Set (focal-ha-avail). This prevents a case where 1 upgrade (or fault) domain is taken out, for example, and takes 2 of our nodes with it: that would make our cluster fail entirely, as we need at least 2 votes in order to have a functional cluster.
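The vote arithmetic can be made concrete with a tiny sketch. It assumes the usual majority rule (half of the expected votes, rounded down, plus one), which matches the corosync-quorumtool outputs above (1 vote gives a quorum of 1, 3 votes give a quorum of 2). The function names are made up for illustration.

```shell
# Majority rule: quorum is half of the expected votes (rounded down) plus one.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# How many votes can be lost while the remaining partition stays quorate.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for votes in 1 2 3 4 5; do
    echo "votes=$votes quorum=$(quorum_for "$votes") can_lose=$(tolerated_failures "$votes")"
done
```

With 3 votes and with 4 votes the cluster can only lose one, which is why an even number of votes buys nothing extra.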

We’re now ready to move on and configure the resource manager.

Installing the Resource Manager: Pacemaker

Let’s install everything together: the resource manager, its command line interfaces, the crmsh interface, and the resource and fencing agents:

$ sudo apt-get install -y pacemaker pacemaker-cli-utils crmsh resource-agents fence-agents
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  cluster-glue docutils-common libcib27 libcrmcluster29 libcrmcommon34 libcrmservice28
  libdbus-glib-1-2 libimagequant0 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2 liblrm2 liblrmd28
  libmysqlclient21 libnet-telnet-perl libnet1 libopenhpi3 libopenipmi0 libpacemaker1 libpaper-utils
  libpaper1 libpe-rules26 libpe-status28 libpils2 libplumb2 libplumbgpl2 libsensors-config
  libsensors5 libsnmp-base libsnmp35 libstonith1 libstonithd26 libtiff5 libtimedate-perl libwebp6
  libwebpdemux2 libwebpmux3 libxml2-utils mysql-common net-tools openhpid pacemaker-common
  pacemaker-resource-agents python3-boto3 python3-botocore python3-bs4 python3-cachetools
  python3-dateutil python3-docutils python3-fasteners python3-google-auth
  python3-google-auth-httplib2 python3-googleapi python3-html5lib python3-jmespath python3-lxml
  python3-monotonic python3-oauth2client python3-olefile python3-parallax python3-pexpect
  python3-pil python3-ptyprocess python3-pycurl python3-pygments python3-roman python3-rsa
  python3-s3transfer python3-soupsieve python3-sqlalchemy python3-sqlalchemy-ext python3-suds
  python3-uritemplate python3-webencodings sgml-base snmp xml-core
…

Make sure to install those on all cluster nodes. In Ubuntu, pacemaker will be “ready” (enabled and started) right after the installation, and you can check that by issuing the following command on all servers:

[rafaeldtinoco@vm01 ~]$ systemctl is-active pacemaker.service 
active
[rafaeldtinoco@vm02 ~]$ systemctl is-active pacemaker.service 
active
[rafaeldtinoco@vm03 ~]$ systemctl is-active pacemaker.service 
active

or by starting to use the crmsh tool, the preferred tool to manage a pacemaker cluster in Ubuntu:

$ sudo crm status
Cluster Summary:

  • Stack: corosync
  • Current DC: vm02 (version 2.0.3-4b1f869f0f) - partition with quorum
  • Last updated: Mon Sep 7 18:44:10 2020
  • Last change: Mon Sep 7 18:42:55 2020 by hacluster via crmd on vm02
  • 3 nodes configured
  • 0 resource instances configured

Node List:

  • Online: [ vm01 vm02 vm03 ]

Full List of Resources:

  • No resources

As you can see here, pacemaker is “online”, with all nodes active, and that was expected since we already had corosync, pacemaker’s messaging layer, up and running. Now it is time for us to configure the basic pacemaker settings AND the fencing agent.

Pacemaker: Configuring Fencing Agent Requirements

As most of you who are already used to HA clusters might be aware, a cluster without a fencing mechanism is not a cluster you should put much faith in. Pacemaker has a mechanism called STONITH (Shoot The Other Node In The Head), which takes care of isolating faulty nodes so they don’t access the same resources as the quorate part of the cluster when, for example, the cluster is split because of a network problem.

There are 2 fencing mechanisms supported at Microsoft Azure:

1. Azure Resource Manager (ARM)
2. Shared SCSI Disk Fence

and, as I have already described how to configure (2) Shared SCSI Disk Fence in another post, I’ll concentrate on describing the (1) Azure Resource Manager (ARM) fence mechanism in this article. Please note that, independently of the fencing mechanism, I’ll still use the Shared SCSI Disk feature that Microsoft Azure provides in order to migrate data from one node to the other.

The ARM fencing agent requires some python modules not yet available in Ubuntu as packages, so, in order to acquire those modules, we have to use the Python PIP tool. Let’s install it on all nodes:

$ sudo apt-get install -y python3-pip
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu build-essential cpp cpp-9 dpkg-dev fakeroot
  g++ g++-9 gcc gcc-9 gcc-9-base libalgorithm-diff-perl libalgorithm-diff-xs-perl
  libalgorithm-merge-perl libasan5 libatomic1 libbinutils libc-dev-bin libc6-dev libcc1-0
  libcrypt-dev libctf-nobfd0 libctf0 libdpkg-perl libexpat1-dev libfakeroot libfile-fcntllock-perl
  libgcc-9-dev libgomp1 libisl22 libitm1 liblsan0 libmpc3 libpython3-dev libpython3.8-dev
  libquadmath0 libstdc++-9-dev libtsan0 libubsan1 linux-libc-dev make manpages-dev python-pip-whl
  python3-dev python3-wheel python3.8-dev zlib1g-dev
…

With PIP installed, we can finally install the Azure python modules on all nodes:

$ sudo pip3 install azure-storage-blob
$ sudo pip3 install azure-mgmt-storage
$ sudo pip3 install azure-mgmt-compute

and make sure your python modules are installed:

$ for module in azure-storage-blob azure-mgmt-storage azure-mgmt-compute; do sudo pip3 show $module; done

Name: azure-storage-blob
Version: 12.4.0
Summary: Microsoft Azure Blob Storage Client Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob
Author: Microsoft Corporation
Author-email: ascl@microsoft.com
License: MIT License
Location: /usr/local/lib/python3.8/dist-packages
Requires: cryptography, azure-core, msrest
Required-by: 

Name: azure-mgmt-storage
Version: 11.2.0
Summary: Microsoft Azure Storage Management Client Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python
Author: Microsoft Corporation
Author-email: azpysdkhelp@microsoft.com
License: MIT License
Location: /usr/local/lib/python3.8/dist-packages
Requires: msrest, msrestazure, azure-common
Required-by: 

Name: azure-mgmt-compute
Version: 13.0.0
Summary: Microsoft Azure Compute Management Client Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python
Author: Microsoft Corporation
Author-email: azpysdkhelp@microsoft.com
License: MIT License
Location: /usr/local/lib/python3.8/dist-packages
Requires: msrestazure, azure-common, msrest
Required-by:

Note: From Ubuntu Focal onwards, Python 3 is the default Python installed but, in order to differentiate it from Python 2, we always have to refer to it as “python3”.

Pacemaker: Configuring Fencing Agent Roles

Without getting into what a custom role is and how it works in Microsoft Azure Cloud, let’s stick to only what we need in order to get the fence agent working.

Create a file like this:

$ cat fence-agent-role.json 
{
  "Name": "fence-ha-role-fencing",
  "Id": null,
  "IsCustom": true,
  "Description": "fence-ha-role-fencing",
  "Actions": [
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Compute/virtualMachines/start/action"
  ],
  "NotActions": [
  ],
  "AssignableScopes": [
    "/subscriptions/<subscriptionId>"
  ]
}

In order to easily obtain your subscriptionId you can execute:

$ az account list --output table
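To avoid editing the subscription placeholder by hand, the role file can be generated. This is a minimal sketch: the JSON is the same as above, the subscription ID is passed as the first argument, and the function name print_fence_role is made up for illustration.

```shell
# Hypothetical helper: print the custom role definition with the
# subscription ID ($1) filled into AssignableScopes.
print_fence_role() {
    cat <<EOF
{
  "Name": "fence-ha-role-fencing",
  "Id": null,
  "IsCustom": true,
  "Description": "fence-ha-role-fencing",
  "Actions": [
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Compute/virtualMachines/start/action"
  ],
  "NotActions": [
  ],
  "AssignableScopes": [
    "/subscriptions/$1"
  ]
}
EOF
}

# Placeholder ID for the demonstration only.
print_fence_role 00000000-0000-0000-0000-000000000000
```

In Azure Shell, something like print_fence_role "$(az account show --query id --output tsv)" > fence-agent-role.json should produce the file for the currently selected subscription.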

Create the Custom Role Definition by issuing:

$ az role definition create --role-definition ./fence-agent-role.json 
{
  "assignableScopes": [
    "/subscriptions/6e9c2b6f-0262-4e54-9bd9-8b7458eec86b"
  ],
  "description": "fence-ha-role-fencing",
  "id": "/xxxx/0436c3ca-9c09-4b39-b657-c22713e366f8",
  "name": "xxxx-9c09-4b39-b657-c22713e366f8",
  "permissions": [
    {
      "actions": [
        "Microsoft.Compute/*/read",
        "Microsoft.Compute/virtualMachines/powerOff/action",
        "Microsoft.Compute/virtualMachines/start/action"
      ],
      "dataActions": [],
      "notActions": [],
      "notDataActions": []
    }
  ],
  "roleName": "fence-ha-role-fencing",
  "roleType": "CustomRole",
  "type": "Microsoft.Authorization/roleDefinitions"
}

Register a new application in Azure Active Directory:

  1. Go to https://portal.azure.com
  2. Open the Azure Active Directory Blade
  3. Go to Properties and write down the Directory ID (this is the tenant ID)
  4. Click App registrations
  5. Click New registration
  6. Enter a Name like Focal-HA-fence-app
  7. Select “Accounts in this organization directory only”
  8. Select Application Type Web and enter “http://localhost” and click Add.
  9. Click Register.
  10. Select Certificates and secrets and click New client secret
  11. Describe the new key (client secret). Select Never expires and click Add.
  12. Write down the value of the secret, it’s the password for the fence agent.
  13. Select Overview. Write down the Application ID, it is the username for the fence agent.

If you did everything right, you should have a list somewhere with something like this:

Variable                       Content
Subscription ID                ffffffff-0262-4e54-9bd9-8b7458eec86b
Tenant ID                      ffffffff-e8f6-40ae-8875-da47c934f1c1
Password (App secret Value)    4ffff.F~ffffrc8-CZ_Wu~o6wffffffff_
Username (App ID)              ffffffff-2b92-4ca1-87f2-0fb865c64635

(without all the 'f’s used to obscure mine). You will need all this information to configure the fence agent in pacemaker (so it has proper rights to power off the virtual machines when needed).

After having the Custom Role created, you need to:

  1. Go to https://portal.azure.com
  2. Open the All Resources Blade
  3. Select vm01
  4. Click Access control (IAM) on the left menu
  5. Click Add a role assignment
  6. Select the role “fence-ha-role-fencing” in the Role dropdown
  7. Start writing “Focal-HA” in the Select dropdown and you will see the app created Focal-HA-fence-app.
  8. Save

Repeat the same step for vm02 and vm03.
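As an alternative to clicking through the portal three times, the assignment can probably be scripted with “az role assignment create”. I have not run this against the setup above, so treat it as a sketch under assumptions: it only prints the commands, the scope path follows the VM “id” format shown in the “az vm create” output, and the function name print_role_assignment_cmds is hypothetical.

```shell
# Hypothetical helper: print one role assignment command per cluster VM.
#   $1 = subscription ID, $2 = application (client) ID of the fence app
print_role_assignment_cmds() {
    sub=$1
    app=$2
    for vm in vm01 vm02 vm03; do
        echo "az role assignment create --assignee $app --role fence-ha-role-fencing --scope /subscriptions/$sub/resourceGroups/focal-ha/providers/Microsoft.Compute/virtualMachines/$vm"
    done
}

# Placeholder values for the demonstration only.
print_role_assignment_cmds "<subscription-id>" "<app-id>"
```

Scoping each assignment to a single VM keeps the fence application limited to the three cluster nodes, just like the portal steps do.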

Pacemaker: Configuring the Fencing Agent

Okay so, until now, we have:

- Installed needed python modules for the fence_azure_arm agent to work
- Created a custom role with proper VM start/shutdown permissions
- Created an App able to start/shutdown the VMs using the custom role

Now it is time to configure pacemaker initial variables. Whenever configuring pacemaker, Ubuntu recommends you to use the “crmsh” utility for the configuration and administration of your pacemaker cluster. There is also another tool called pcs, mostly used in other OSes, that could also be used but it is out of the scope of this tutorial (and crmsh is the “default recommended tool” for Ubuntu HA anyways).

Let’s begin with some initial commands in “crm configure” in a single node (pacemaker will replicate the cluster configuration to all nodes, so you just have to configure it in a single node):

[rafaeldtinoco@vm01 ~]$ crm configure
crm(live/vm01)configure# property stonith-enabled=on
crm(live/vm01)configure# property stonith-action=reboot
crm(live/vm01)configure# property no-quorum-policy=suicide
crm(live/vm01)configure# property have-watchdog=false
crm(live/vm01)configure# commit
crm(live/vm01)configure# quit
bye

[rafaeldtinoco@vm01 ~]$ crm configure show
node 1: vm01
node 2: vm02
node 3: vm03
property cib-bootstrap-options: \
	have-watchdog=false \
	dc-version=2.0.3-4b1f869f0f \
	cluster-infrastructure=corosync \
	cluster-name=clufocal \
	stonith-enabled=on \
	stonith-action=reboot \
	no-quorum-policy=suicide

We have enabled stonith and set its default action to reboot the node being fenced, provided the fence agent we configure supports it. We have also said that, in case the cluster loses quorum, all nodes in the affected partition (the one without quorum) should suicide.

Next step is to configure the agent. Pacemaker has a bunch of fence agents available and you can list them by issuing:

[rafaeldtinoco@vm01 ~]$ stonith_admin -I | grep arm
fence_azure_arm

First let’s configure stonith-timeout parameter:

[rafaeldtinoco@vm01 ~]$ crm configure property stonith-timeout=900

We are going to configure 3 “resource agents” in the cluster now. They are actually “fence” agents but, apart from the arguments given to the agent, there is no big difference between the two when configuring them.

I prefer using the command “crm configure edit” and editing my resource creation command directly there, I find it easier. But you can use “crm configure” and give a single command also, it works the same.

node 1: vm01
node 2: vm02
node 3: vm03
primitive fence-vm01 stonith:fence_azure_arm \
    params \
        action=reboot \
        plug=vm01 \
        resourceGroup="focal-ha" \
        username="ffffffff-2b92-4ca1-87f2-0fb865c64635" \
        login="ffffffff-2b92-4ca1-87f2-0fb865c64635" \
        passwd="4ffff.F~ffffrc8-CZ_Wu~o6wffffffff_" \
        tenantId="ffffffff-e8f6-40ae-8875-da47c934f1c1" \
        subscriptionId="ffffffff-0262-4e54-9bd9-8b7458eec86b" \
        pcmk_reboot_timeout=900 \
    \
    op monitor \
        interval=3600 \
        timeout=120
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=2.0.3-4b1f869f0f \
    cluster-infrastructure=corosync \
    cluster-name=clufocal \
    stonith-enabled=on \
    stonith-action=reboot \
    no-quorum-policy=suicide \
    stonith-timeout=900

Right after crmsh exits, you will be able to see the fence agent for vm01 online:

[rafaeldtinoco@vm01 ~]$ crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: vm02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Mon Sep  7 23:08:46 2020
  * Last change:  Mon Sep  7 23:08:44 2020 by root via cibadmin on vm01
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ vm01 vm02 vm03 ]

Full List of Resources:
  * fence-vm01	(stonith:fence_azure_arm):	 Started vm01

But obviously we need more settings. We can’t run the fence agent on the same node that it is supposed to fence; it makes no sense, right? If my cluster is split, I want the remaining nodes with quorum (enough votes) to “SHOOT THE OTHER NODE IN THE HEAD”, so the agent has to be running on this side of the fence, not the other.

[rafaeldtinoco@vm01 ~]$ crm configure edit
 
"""
node 1: vm01
node 2: vm02
node 3: vm03
primitive fence-vm01 stonith:fence_azure_arm \
    params \
        action=reboot \
        plug=vm01 \
        resourceGroup="focal-ha" \
        username="ffffffff-2b92-4ca1-87f2-0fb865c64635" \
        login="ffffffff-2b92-4ca1-87f2-0fb865c64635" \
        passwd="4ffff.F~ffffrc8-CZ_Wu~o6wffffffff_" \
        tenantId="ffffffff-e8f6-40ae-8875-da47c934f1c1" \
        subscriptionId="ffffffff-0262-4e54-9bd9-8b7458eec86b" \
        pcmk_reboot_timeout=900 \
    \
    op monitor \
        interval=3600 \
        timeout=120
primitive fence-vm02 stonith:fence_azure_arm \
    params \
        action=reboot \
        plug=vm02 \
        resourceGroup="focal-ha" \
        username="ffffffff-2b92-4ca1-87f2-0fb865c64635" \
        login="ffffffff-2b92-4ca1-87f2-0fb865c64635" \
        passwd="4ffff.F~ffffrc8-CZ_Wu~o6wffffffff_" \
        tenantId="ffffffff-e8f6-40ae-8875-da47c934f1c1" \
        subscriptionId="ffffffff-0262-4e54-9bd9-8b7458eec86b" \
        pcmk_reboot_timeout=900 \
    \
    op monitor \
        interval=3600 \
        timeout=120
primitive fence-vm03 stonith:fence_azure_arm \
    params \
        action=reboot \
        plug=vm03 \
        resourceGroup="focal-ha" \
        username="ffffffff-2b92-4ca1-87f2-0fb865c64635" \
        login="ffffffff-2b92-4ca1-87f2-0fb865c64635" \
        passwd="4ffff.F~ffffrc8-CZ_Wu~o6wffffffff_" \
        tenantId="ffffffff-e8f6-40ae-8875-da47c934f1c1" \
        subscriptionId="ffffffff-0262-4e54-9bd9-8b7458eec86b" \
        pcmk_reboot_timeout=900 \
    \
    op monitor \
        interval=3600 \
        timeout=120
location l-fence-vm01 fence-vm01 -inf: vm01
location l-fence-vm02 fence-vm02 -inf: vm02
location l-fence-vm03 fence-vm03 -inf: vm03
property cib-bootstrap-options: \
	have-watchdog=false \
	dc-version=2.0.3-4b1f869f0f \
	cluster-infrastructure=corosync \
	cluster-name=clufocal \
	stonith-enabled=on \
	stonith-action=reboot \
	no-quorum-policy=suicide \
	stonith-timeout=900
"""

Right after adding fence-vm02 and fence-vm03, we also have to create locations (l-fence-vm01, l-fence-vm02, l-fence-vm03) so the resources are placed where we would like them to be (in our case excluding nodes to be fenced by the resource/fence agent).

[rafaeldtinoco@vm01 ~]$ crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: vm02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Tue Sep  8 00:17:09 2020
  * Last change:  Tue Sep  8 00:15:57 2020 by root via cibadmin on vm01
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ vm01 vm02 vm03 ]

Full List of Resources:
  * fence-vm01	(stonith:fence_azure_arm):	 Started vm02
  * fence-vm02	(stonith:fence_azure_arm):	 Started vm01
  * fence-vm03	(stonith:fence_azure_arm):	 Started vm01
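The three fence primitives and their location constraints above follow one pattern, so they can be generated instead of typed. This is a minimal sketch: the function name print_fence_config and the four credential variables are made up for illustration, and the values shown are placeholders you would replace with the ones from your own list.

```shell
# Hypothetical helper: print one fence primitive per node, then the
# location constraints that keep each agent off the node it fences.
# APP_ID, APP_SECRET, TENANT_ID and SUBSCRIPTION_ID must be set beforehand.
print_fence_config() {
    for vm in "$@"; do
        cat <<EOF
primitive fence-$vm stonith:fence_azure_arm \\
    params \\
        action=reboot \\
        plug=$vm \\
        resourceGroup="focal-ha" \\
        username="$APP_ID" \\
        login="$APP_ID" \\
        passwd="$APP_SECRET" \\
        tenantId="$TENANT_ID" \\
        subscriptionId="$SUBSCRIPTION_ID" \\
        pcmk_reboot_timeout=900 \\
    \\
    op monitor \\
        interval=3600 \\
        timeout=120
EOF
    done
    for vm in "$@"; do
        echo "location l-fence-$vm fence-$vm -inf: $vm"
    done
}

# Placeholder credentials for the demonstration only.
APP_ID="<app-id>"
APP_SECRET="<app-secret>"
TENANT_ID="<tenant-id>"
SUBSCRIPTION_ID="<subscription-id>"
print_fence_config vm01 vm02 vm03
```

The output can be pasted into “crm configure edit”, or saved to a file and loaded with “crm configure load update <file>”.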

Pacemaker: Testing the fence-agent

One of the most important parts of the cluster setup is making sure the fencing mechanism works. Whenever a failure occurs, we have to be 100% sure that the cluster policy engine will be able to take correct decisions, and all of them depend on fencing working (in a typical active/passive scenario).

There are different ways to test the fence agents. One of them is to call the fence_XXXX binary/script by hand, passing the correct arguments (the same ones declared in the pacemaker configuration). I prefer to test the fence agent after it is already configured in pacemaker, making the cluster take a fence action just like it would in a real scenario. To achieve this I simply block all network traffic on the private network (the cluster interconnect) and watch pacemaker take the needed actions.

Let’s try:

[rafaeldtinoco@vm01 ~]$ crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: vm02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Tue Sep  8 01:30:58 2020
  * Last change:  Tue Sep  8 01:30:23 2020 by root via cibadmin on vm01
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ vm01 vm02 vm03 ]

Full List of Resources:
  * fence-vm01	(stonith:fence_azure_arm):	 Started vm02
  * fence-vm02	(stonith:fence_azure_arm):	 Started vm01
  * fence-vm03	(stonith:fence_azure_arm):	 Started vm01

The cluster is working fine, so let’s stop the heartbeat on vm03:

[rafaeldtinoco@vm03 ~]$ sudo iptables -A INPUT -i eth1 -j DROP

The cluster realizes it right away and starts the fencing action from vm02 (using fence-vm03 in order to fence the vm03 virtual machine). It has not fenced vm03 yet, but the reboot of vm03 is already scheduled:

[rafaeldtinoco@vm01 ~]$ crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: vm02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Tue Sep  8 01:31:01 2020
  * Last change:  Tue Sep  8 01:30:23 2020 by root via cibadmin on vm01
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Node vm03: UNCLEAN (offline)
  * Online: [ vm01 vm02 ]

Full List of Resources:
  * fence-vm01	(stonith:fence_azure_arm):	 Started vm02
  * fence-vm02	(stonith:fence_azure_arm):	 Started vm01
  * fence-vm03	(stonith:fence_azure_arm):	 Started vm01

Pending Fencing Actions:
  * reboot of vm03 pending: client=pacemaker-controld.3921, origin=vm02
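When repeating this test it is handy to pull only the pending fencing lines out of the status output. The following is a minimal sketch that assumes the “crm status” layout shown above (a “Pending Fencing Actions:” header followed by bulleted lines and ended by a blank line or the end of the output); the function name pending_fencing is made up.

```shell
# Hypothetical helper: read "crm status" text on stdin and print only the
# entries listed under "Pending Fencing Actions:", without the bullet.
pending_fencing() {
    awk '/^Pending Fencing Actions:/ { show = 1; next }
         /^$/ { show = 0 }
         show { sub(/^ *\* /, ""); print }'
}

# Demonstration with a fragment of the output shown above.
pending_fencing <<'EOF'
Node List:
  * Node vm03: UNCLEAN (offline)
  * Online: [ vm01 vm02 ]

Pending Fencing Actions:
  * reboot of vm03 pending: client=pacemaker-controld.3921, origin=vm02
EOF
```

On a live node this would be used as sudo crm status | pending_fencing; empty output means nothing is pending any more.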

With no pending actions within the resource manager, we can check that vm03 is gone:

[rafaeldtinoco@vm01 ~]$ crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: vm02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Tue Sep  8 01:31:04 2020
  * Last change:  Tue Sep  8 01:30:23 2020 by root via cibadmin on vm01
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Node vm03: UNCLEAN (offline)
  * Online: [ vm01 vm02 ]

Full List of Resources:
  * fence-vm01	(stonith:fence_azure_arm):	 Started vm02
  * fence-vm02	(stonith:fence_azure_arm):	 Started vm01
  * fence-vm03	(stonith:fence_azure_arm):	 Started vm01

Finally vm03 gets back online:

[rafaeldtinoco@vm03 ~]$ uptime
 01:32:09 up 0 min,  1 user,  load average: 1.79, 0.57, 0.20

And cluster is back on track:

[rafaeldtinoco@vm01 ~]$ crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: vm02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Tue Sep  8 01:32:03 2020
  * Last change:  Tue Sep  8 01:30:23 2020 by root via cibadmin on vm01
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ vm01 vm02 vm03 ]

Full List of Resources:
  * fence-vm01	(stonith:fence_azure_arm):	 Started vm02
  * fence-vm02	(stonith:fence_azure_arm):	 Started vm01
  * fence-vm03	(stonith:fence_azure_arm):	 Started vm01

TO BE CONTINUED…

TODO: I still need to finish configuration of the shared disk and the cluster resources. To be finished soon.