Summary
This tutorial demonstrates how to set up MS SQL Server on Ubuntu instances running on Azure in High Availability mode.
Warning: MS SQL Server is not yet generally available for 24.04. These instructions have been updated in advance to include 24.04, but will not work until that is the case. For now, you can follow the steps for the older releases instead.
Prerequisites
It is assumed that:
- The user is running bash locally (ie. a Linux shell).
- The Azure CLI client is available locally and is already authenticated. If you need help with this, see Appendix A for a quick start guide on setting this up.
- Three VMs running Ubuntu (18.04, 20.04, 22.04 or 24.04) have been instantiated in Azure with internal IPs 10.0.0.11, 10.0.0.12 and 10.0.0.13, with ssh and scp to them working as both the root and ubuntu users, and accessible via the names vm1, vm2 and vm3 respectively. If you need help with this, see Appendix B for a quick start guide on setting this up. A quick connectivity check is sketched after this list.
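As a quick sanity check before starting, you can confirm that ssh works to all three VMs as both users. This is a minimal sketch, assuming the vm1, vm2 and vm3 host aliases and the root access described in Appendix B:
# Sketch: confirm ssh access to each VM as both the ubuntu and root users.
for vm in vm1 vm2 vm3; do
  ssh "$vm" true && ssh "root@$vm" true && echo "$vm: ok"
done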
Step 1: Install HA software
ssh into vm1 with the command ssh vm1.
On Ubuntu 24.04 “Noble Numbat”:
Type the following commands to install the required software:
sudo apt-get update
sudo apt-get install -y pacemaker fence-agents-azure-arm
On Ubuntu 20.04 “Focal Fossa” or Ubuntu 22.04 “Jammy Jellyfish”:
Type the following commands to install the required software:
sudo apt-get update
sudo apt-get install -y corosync pacemaker pacemaker-cli-utils crmsh resource-agents fence-agents python3-azure
On Ubuntu 18.04 “Bionic Beaver”:
On 18.04, use of a PPA is required. Enable it and then install packages using the following commands:
sudo add-apt-repository -y ppa:canonical-server/azure-sql-ha
sudo apt-get update
sudo apt-get install -y corosync pacemaker pacemaker-cli-utils crmsh resource-agents fence-agents python3-azure python-adal
Type exit to leave ssh, and repeat the above steps for the other two VMs.
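Alternatively, the installation can be scripted from your host machine with a loop over the three VMs. This is a minimal sketch for the 20.04/22.04 package set, assuming the vm1, vm2 and vm3 ssh aliases from Appendix B; adjust the package list to match your release:
# Sketch: install the HA packages on all three VMs from the host machine.
for vm in vm1 vm2 vm3; do
  ssh "$vm" "sudo apt-get update && sudo apt-get install -y \
    corosync pacemaker pacemaker-cli-utils crmsh resource-agents fence-agents python3-azure"
done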
Step 2a: Configure the cluster (for 24.04 or later only)
Delete the default cluster from each VM:
ssh vm1 sudo pcs cluster destroy
ssh vm2 sudo pcs cluster destroy
ssh vm3 sudo pcs cluster destroy
Set a cluster password on each VM:
echo hacluster:demopassword|ssh vm1 sudo chpasswd
echo hacluster:demopassword|ssh vm2 sudo chpasswd
echo hacluster:demopassword|ssh vm3 sudo chpasswd
Refer to Appendix C on the creation of an Azure service principal and on granting a suitable role assignment to that service principal so that it has permission to fence the cluster nodes. Once done, you should have expansions for the six variables in the command below that are required to be supplied to the Azure fence agent.
ssh into vm1 with the command ssh vm1. Then type the following commands, expanding the six variables as needed. If you have set shell variables according to Appendix C, then typing these commands as-is will expand them automatically.
sudo pcs host auth -u hacluster -p demopassword vm{1,2,3}
sudo pcs cluster setup demo vm{1,2,3} transport udpu --enable --start --wait=30
sudo pcs property set stonith-enabled=on
sudo pcs property set stonith-action=reboot
sudo pcs property set no-quorum-policy=stop
sudo pcs property set stonith-timeout=900
sudo pcs stonith create fence-vm fence_azure_arm \
resourceGroup="$resource_group" \
username="$secret_id" \
login="$application_id" \
passwd="$password" \
tenantId="$tenant_id" \
subscriptionId="$subscription_id" \
pcmk_reboot_timeout=900 \
power_timeout=60 \
op monitor \
interval=3600 \
timeout=120
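Before leaving the ssh session, you may want to confirm that all six variables are actually set in the shell where you ran the command above, since an empty value will leave the fence agent misconfigured. This is a minimal bash sketch using the variable names from Appendix C:
# Sketch: warn about any unset fence-agent variable.
for v in resourceGroup secretId applicationId password tenantId subscriptionId; do
  if [ -z "${!v}" ]; then
    echo "warning: \$$v is not set"
  fi
done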
Type exit to leave ssh.
Verify the status of the cluster:
ssh vm1 sudo pcs status
You should see output like the following:
Cluster name: demo
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: vm3 (version 2.1.6-6fdc9deea29) - partition with quorum
* Last updated: Tue Apr 30 18:53:13 2024 on vm1
* Last change: Tue Apr 30 18:51:43 2024 by root via cibadmin on vm1
* 3 nodes configured
* 1 resource instance configured
Node List:
* Online: [ vm1 vm2 vm3 ]
Full List of Resources:
* fence-vm (stonith:fence_azure_arm): Started vm1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Step 2b: Configure the cluster (for 22.04 and earlier only)
Corosync
Create a temporary local file called corosync.conf using nano. Run nano corosync.conf and then enter the following contents. When done, press Ctrl-X, agree to save by pressing y, then press Return to accept the default corosync.conf filename as specified earlier. Your corosync.conf file should now look like this, which you can verify with the command cat corosync.conf:
totem {
version: 2
secauth: off
cluster_name: demo
transport: udpu
}
nodelist {
node {
ring0_addr: 10.0.0.11
name: vm1
nodeid: 1
}
node {
ring0_addr: 10.0.0.12
name: vm2
nodeid: 2
}
node {
ring0_addr: 10.0.0.13
name: vm3
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
two_node: 0
}
qb {
ipc_type: native
}
logging {
fileline: on
to_stderr: on
to_logfile: yes
logfile: /var/log/corosync/corosync.log
to_syslog: no
debug: off
}
Copy the local file corosync.conf to the three VMs:
scp corosync.conf root@vm1:/etc/corosync/
scp corosync.conf root@vm2:/etc/corosync/
scp corosync.conf root@vm3:/etc/corosync/
Generate /etc/corosync/authkey on the first VM:
ssh vm1 sudo corosync-keygen
Copy the generated authkey file temporarily to the local system, and then from there to the other two VMs. We use -p to preserve the permissions that make the file available only to the root user:
scp -p root@vm1:/etc/corosync/authkey ./
scp -p authkey root@vm2:/etc/corosync/
scp -p authkey root@vm3:/etc/corosync/
The local copies of the files corosync.conf and authkey are no longer required and you can delete them:
rm -f corosync.conf authkey
Restart corosync on all three VMs:
ssh vm1 sudo systemctl restart corosync
ssh vm2 sudo systemctl restart corosync
ssh vm3 sudo systemctl restart corosync
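Before checking the full cluster status, you can optionally confirm that corosync came back up cleanly on each node. This is a sketch using corosync-cfgtool, which ships with the corosync package:
# Sketch: show corosync ring/link status on each node after the restart.
for vm in vm1 vm2 vm3; do
  ssh "$vm" sudo corosync-cfgtool -s
done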
Verify the status of the cluster:
ssh vm1 sudo crm status
You should see output like the following:
Cluster Summary:
* Stack: corosync
* Current DC: vm1 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Tue Feb 22 15:38:36 2022
* Last change: Tue Feb 22 15:32:45 2022 by root via cibadmin on vm1
* 3 nodes configured
* 0 resource instances configured
Node List:
* Online: [ vm1 vm2 vm3 ]
Full List of Resources:
* No resources
Pacemaker
ssh into vm1 with the command ssh vm1. Then type the following commands:
sudo crm configure property stonith-enabled=on
sudo crm configure property stonith-action=reboot
sudo crm configure property no-quorum-policy=stop
sudo crm configure property stonith-timeout=900
sudo crm configure property have-watchdog=false
Refer to Appendix C on the creation of an Azure service principal and on granting a suitable role assignment to that service principal so that it has permission to fence the cluster nodes. Once done, you should have expansions for the six variables in the command below that are required to be supplied to the Azure fence agent.
Now run the following command on the first node, expanding the six variables as needed. If you have set shell variables according to Appendix C, then typing this command as-is will expand them automatically.
sudo crm configure primitive fence-vm \
stonith:fence_azure_arm \
params \
action=reboot \
resourceGroup="$resourceGroup" \
username="$secretId" \
login="$applicationId" \
passwd="$password" \
tenantId="$tenantId" \
subscriptionId="$subscriptionId" \
pcmk_reboot_timeout=900 \
power_timeout=60 \
op monitor \
interval=3600 \
timeout=120
Type exit to leave ssh.
Verify the status of the cluster:
ssh vm1 sudo crm status
You should see output like the following:
Cluster Summary:
* Stack: corosync
* Current DC: vm1 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Wed Feb 23 13:04:51 2022
* Last change: Wed Feb 23 12:59:43 2022 by root via cibadmin on vm1
* 3 nodes configured
* 1 resource instance configured
Node List:
* Online: [ vm1 vm2 vm3 ]
Full List of Resources:
* fence-vm (stonith:fence_azure_arm): Started vm1
Step 3: Install and Configure SQL Server
Follow instructions in Appendix D to install and configure SQL Server itself and to create a suitable availability group.
On 24.04 and later only
Connect Pacemaker to manage SQL Server’s availability group ag1 by running ssh vm1 and then the following commands:
sudo pcs resource create ag1_cluster ocf:mssql:ag \
ag_name="ag1" meta failure-timeout=60s \
op start timeout=60s \
op stop timeout=60s \
op promote timeout=60s \
op demote timeout=10s \
op monitor timeout=60s interval=10s \
op monitor timeout=60s interval=11s role="Master" \
op monitor timeout=60s interval=12s role="Slave" \
op notify timeout=60s
sudo pcs resource promotable ag1_cluster \
promoted-max=1 \
promoted-node-max=1 \
clone-max=3 \
clone-node-max=1 \
notify=true
Type exit to leave ssh.
On 22.04 or earlier only
Connect Pacemaker to manage SQL Server’s availability group ag1 by running ssh vm1 and then the following command:
sudo crm configure <<EOT
primitive ag1_cluster ocf:mssql:ag \
params ag_name="ag1" meta failure-timeout=60s \
op start timeout=60s \
op stop timeout=60s \
op promote timeout=60s \
op demote timeout=10s \
op monitor timeout=60s interval=10s \
op monitor timeout=60s interval=11s role="Master" \
op monitor timeout=60s interval=12s role="Slave" \
op notify timeout=60s
ms ms-ag1_cluster ag1_cluster \
meta master-max="1" master-node-max="1" clone-max="3" \
clone-node-max="1" notify="true"
commit
EOT
Type exit to leave ssh.
Step 4: Verification
Verify the status of the cluster. If using 24.04 or later:
ssh vm1 sudo pcs status
or if using 22.04 or earlier:
ssh vm1 sudo crm status
You should see output like the following:
Cluster Summary:
* Stack: corosync
* Current DC: vm1 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Wed Feb 23 13:22:10 2022
* Last change: Wed Feb 23 13:19:00 2022 by root via cibadmin on vm1
* 3 nodes configured
* 4 resource instances configured
Node List:
* Online: [ vm1 vm2 vm3 ]
Full List of Resources:
* fence-vm (stonith:fence_azure_arm): Started vm1
* Clone Set: ms-ag1_cluster [ag1_cluster] (promotable):
* Masters: [ vm1 ]
* Slaves: [ vm2 vm3 ]
Type ssh vm1 and then /opt/mssql-tools/bin/sqlcmd -S localhost -U SA to enter the SQL monitor. Then type the following SQL command:
SELECT synchronization_state_desc FROM sys.dm_hadr_database_replica_states;
GO
You should see the following output:
synchronization_state_desc
------------------------------------------------------------
SYNCHRONIZED
SYNCHRONIZED
SYNCHRONIZED
(3 rows affected)
Type exit to leave the SQL monitor and exit again to leave ssh.
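If you want to repeat this check later without an interactive session, the same query can be run from your host machine in one command. This is a sketch; <Your_SA_Password> is a placeholder for the SA password you chose during installation:
# Sketch: run the synchronization check non-interactively from the host machine.
ssh vm1 \
  "/opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P '<Your_SA_Password>' -Q 'SELECT synchronization_state_desc FROM sys.dm_hadr_database_replica_states;'"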
Conclusion
You now have SQL Server set up with an availability group that is fully synchronized, with status and fencing managed by Corosync and Pacemaker.
Appendix A: Prepare the Azure CLI
Follow Microsoft’s instructions to install the Azure CLI. Once done, the az command should be available to use.
The Azure CLI needs to be signed in to access Azure resources. The simplest way is to run az login. For more details, see Microsoft’s instructions here.
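To confirm that the CLI is signed in and pointing at the subscription you expect, you can run a standard account query:
# Show the signed-in account and default subscription.
az account show --output table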
Appendix B: Create and configure virtual machines
This tutorial requires three VMs named vm1, vm2 and vm3, available to each other internally on the IP addresses 10.0.0.11, 10.0.0.12 and 10.0.0.13, with ssh and scp to them available from the outside using both the ubuntu and root users. A quick way to achieve this is with the following steps. See Microsoft’s tutorial on creating a virtual network and deploying virtual machines for more details.
Create the virtual machines
Run the following commands:
resourceGroup=tutorial-rg
az group create --name $resourceGroup --location eastus
az network vnet create \
--name myVNet \
--resource-group "$resourceGroup" \
--address-prefixes 10.0.0.0/16 \
--subnet-name myBackendSubnet \
--subnet-prefixes 10.0.0.0/24
az vm create \
--resource-group "$resourceGroup" \
--name vm1 \
--image UbuntuLTS \
--vnet-name myVNet \
--subnet myBackendSubnet \
--generate-ssh-keys \
--private-ip-address 10.0.0.11
az vm create \
--resource-group "$resourceGroup" \
--name vm2 \
--image UbuntuLTS \
--vnet-name myVNet \
--subnet myBackendSubnet \
--generate-ssh-keys \
--private-ip-address 10.0.0.12
az vm create \
--resource-group "$resourceGroup" \
--name vm3 \
--image UbuntuLTS \
--vnet-name myVNet \
--subnet myBackendSubnet \
--generate-ssh-keys \
--private-ip-address 10.0.0.13
As you create the VMs, make a note of each publicIpAddress given in the output. To reach a particular VM over ssh, you can then use ssh ubuntu@publicIPAddress. To leave an ssh session, use the command exit.
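If you did not note the public IP addresses at creation time, they can be retrieved afterwards. This is a sketch using az vm show with the --show-details flag and the $resourceGroup variable set earlier:
# Sketch: look up each VM's public IP address after creation.
for vm in vm1 vm2 vm3; do
  echo "$vm: $(az vm show --show-details --resource-group "$resourceGroup" --name "$vm" --query publicIps --output tsv)"
done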
Configure virtual machines
Permitting direct root login is the easiest way to transfer configuration files for the purposes of this tutorial. In production, configuration management tools should be used instead. Enable direct root login by ssh-ing into each VM in turn and running the following command:
sudo cp ~ubuntu/.ssh/authorized_keys ~root/.ssh/authorized_keys
Now you can also use ssh root@publicIPAddress to reach a particular VM.
Configure ssh for easy access
Run nano ~/.ssh/config and then ensure the following lines exist in that file, replacing each instance of publicIPAddress with the corresponding public IP address of the VM from above. When done, press Ctrl-X, agree to save by pressing y, then press Return to accept the default config filename as specified earlier.
Host vm1
Hostname publicIPAddress
User ubuntu
Host vm2
Hostname publicIPAddress
User ubuntu
Host vm3
Hostname publicIPAddress
User ubuntu
Now you can use ssh vm1 to reach vm1 as the ubuntu user, ssh root@vm1 to reach vm1 as the root user, and equivalently for the other two VMs. scp to and from each VM should also work equivalently.
ssh and scp to the VMs are now configured as required for the rest of this tutorial. The VMs are running Ubuntu 18.04 “Bionic Beaver”, as is the default when requesting the UbuntuLTS image on Azure.
Appendix C: Managing authentication for the fence agent
Create the custom fencing role
Create a temporary local file called fence-agent-role.json using nano. Run nano fence-agent-role.json and then enter the following contents, but change subscriptionId to your actual Azure subscription ID, which is available from Azure Portal → Subscriptions. When done, press Ctrl-X, agree to save by pressing y, then press Return to accept the default fence-agent-role.json filename as specified earlier. Your fence-agent-role.json file should now look like this, which you can verify with the command cat fence-agent-role.json:
{
"Name": "fence-ha-role-fencing",
"Id": null,
"IsCustom": true,
"Description": "fence-ha-role-fencing",
"Actions": [
"Microsoft.Compute/*/read",
"Microsoft.Compute/virtualMachines/powerOff/action",
"Microsoft.Compute/virtualMachines/start/action"
],
"NotActions": [
],
"AssignableScopes": [
"/subscriptions/subscriptionId"
]
}
Now type the following command to define the custom role inside Azure:
az role definition create --role-definition ./fence-agent-role.json
The local file fence-agent-role.json is no longer required and you can delete it with:
rm -f fence-agent-role.json
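To confirm that the custom role now exists in your subscription, you can list it by name with a standard az query:
# List the custom fencing role to confirm it was created.
az role definition list --name fence-ha-role-fencing --output table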
Create a service principal
The service principal is the identity that the fence agent will use to authenticate with Azure. Once set up, there are five variables specific to your credentials that you will need to set or expand when configuring the fence agent later, as well as the specific resource group name you are using for this tutorial. These are:
- Subscription ID: $subscriptionId
- Directory (tenant) ID: $tenantId
- Application (client) ID: $applicationId
- Secret ID: $secretId
- Value: $password
Keep a note of these separately, but also assign them to shell variables so that they are automatically expanded by the shell in later steps of this tutorial. For example, to assign the value abcd to the shell variable $subscriptionId, type the command:
subscriptionId=abcd
Note that $ is used to refer to a variable and to request that it be expanded, but is not used in the variable assignment itself.
The actual subscription ID is available from Azure Portal → Subscriptions. Assign the variable as above but using your actual subscription ID.
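If you prefer the CLI to the Portal, the subscription ID of the signed-in account can be captured directly into the shell variable. This is a sketch using a standard az query:
# Capture the current subscription ID into the shell variable used later.
subscriptionId=$(az account show --query id --output tsv)
echo "$subscriptionId"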
Make a note of the values of the other variables and assign them to shell variables as they are generated in the following steps:
- From the Azure Portal, go to Azure Active Directory → App registrations → New registration.
- Specify a name for your service principal. Any name will do, for example “TutorialFenceAgent”. All other options can be left at defaults.
- Click Register.
- Save the Application (client) ID ($applicationId) and Directory (tenant) ID ($tenantId).
- Go to Certificates & secrets → New client secret.
- All options can be left at defaults. If you wish, you may change the expiry to match your expectations for this tutorial.
- Click Add.
- Save the Secret ID ($secretId) and the Value ($password).
Assign service principal to fence role scoped to VMs
To allow the fence agent to use the service principal to fence specific VMs, use the following command to assign our custom fence agent role to the service principal, scoped to a specific VM:
az role assignment create \
--assignee "$applicationId" \
--role fence-ha-role-fencing \
--scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroup/providers/Microsoft.Compute/virtualMachines/vm1"
Repeat the command twice more, adjusting the specified scope to target the VMs vm2 and vm3.
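Equivalently, all three role assignments can be created in one loop. This is a sketch using the variables defined earlier in this appendix:
# Sketch: assign the custom fencing role for each of the three VMs.
for vm in vm1 vm2 vm3; do
  az role assignment create \
    --assignee "$applicationId" \
    --role fence-ha-role-fencing \
    --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroup/providers/Microsoft.Compute/virtualMachines/$vm"
done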
Appendix D: Install and configure SQL Server
Install SQL Server
Follow Microsoft’s instructions to install SQL Server. Ensure that you install both their mssql-server
and mssql-server-ha
packages.
Configure SQL Server
To enter the SQL monitor on a particular VM, first ssh to the VM (eg. ssh vm1) and then run the following command. You will need the database password you configured in the “Install SQL Server” step previously.
/opt/mssql-tools/bin/sqlcmd -S localhost -U SA
To leave the SQL monitor, type exit. Leave the ssh session to return to your host machine as needed with a further exit.
Create a certificate with the following SQL commands using the SQL monitor on vm1:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
BACKUP CERTIFICATE dbm_certificate
TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
ENCRYPTION BY PASSWORD = '**<Private_Key_Password>**'
);
Copy the certificate to the second VM from your host machine:
scp -3 root@vm1:/var/opt/mssql/data/dbm_certificate.{cer,pvk} root@vm2:/var/opt/mssql/data/
Set ownership on the second VM from your host machine:
ssh root@vm2 chown mssql.mssql /var/opt/mssql/data/dbm_certificate.{cer,pvk}
Repeat the above copy and ownership steps but for vm3 instead of vm2.
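The copy and ownership steps for both secondaries can also be combined into one loop run from your host machine. This is a sketch assuming the root ssh access set up in Appendix B:
# Sketch: distribute the certificate and key to vm2 and vm3 and fix ownership.
for vm in vm2 vm3; do
  scp -3 root@vm1:/var/opt/mssql/data/dbm_certificate.{cer,pvk} root@"$vm":/var/opt/mssql/data/
  ssh root@"$vm" chown mssql:mssql /var/opt/mssql/data/dbm_certificate.{cer,pvk}
done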
Run the following commands on the SQL monitor of each of the VMs in turn:
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
ROLE = ALL,
AUTHENTICATION = CERTIFICATE dbm_certificate,
ENCRYPTION = REQUIRED ALGORITHM AES
);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
USE [master];
CREATE LOGIN [pacemakerLogin] WITH PASSWORD = N'StrongPassword123!';
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin];
On your host machine, create a temporary local file called passwd using nano. Run nano passwd and then enter the following contents. When done, press Ctrl-X, agree to save by pressing y, then press Return to accept the default passwd filename as specified earlier. Your passwd file should now look like this, which you can verify with the command cat passwd:
pacemakerLogin
StrongPassword123!
On your host machine, run the following two commands to copy the file to vm1 and set its permissions so that only root can read it:
scp passwd root@vm1:/var/opt/mssql/secrets/
ssh root@vm1 chmod 600 /var/opt/mssql/secrets/passwd
Repeat the above two commands but with vm2 instead of vm1, and then again with vm3 instead of vm1.
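As with the certificate files, the copy and permission steps for the remaining VMs can be combined into a loop run from your host machine. A minimal sketch:
# Sketch: distribute the passwd file to vm2 and vm3 and restrict its permissions.
for vm in vm2 vm3; do
  scp passwd root@"$vm":/var/opt/mssql/secrets/
  ssh root@"$vm" chmod 600 /var/opt/mssql/secrets/passwd
done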
Run the following commands on the SQL monitor of vm1 only:
CREATE DATABASE [db1];
ALTER DATABASE [db1] SET RECOVERY FULL;
BACKUP DATABASE [db1] TO DISK = N'/var/opt/mssql/data/db1.bak' with compression;
CREATE AVAILABILITY GROUP [ag1]
WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
FOR DATABASE [db1] REPLICA ON
N'vm1'
WITH (
ENDPOINT_URL = N'tcp://vm1:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = EXTERNAL,
SEEDING_MODE = AUTOMATIC,
SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL, READ_ONLY_ROUTING_URL = 'TCP://10.0.0.11:1433'),
PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE, READ_ONLY_ROUTING_LIST = ('vm2', 'vm3'), READ_WRITE_ROUTING_URL = 'TCP://10.0.0.11:1433'),
SESSION_TIMEOUT = 10
),
N'vm2'
WITH (
ENDPOINT_URL = N'tcp://vm2:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = EXTERNAL,
SEEDING_MODE = AUTOMATIC,
SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL, READ_ONLY_ROUTING_URL = 'TCP://10.0.0.12:1433'),
PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE, READ_ONLY_ROUTING_LIST = ('vm1', 'vm3'), READ_WRITE_ROUTING_URL = 'TCP://10.0.0.12:1433'),
SESSION_TIMEOUT = 10
),
N'vm3'
WITH(
ENDPOINT_URL = N'tcp://vm3:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = EXTERNAL,
SEEDING_MODE = AUTOMATIC,
SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL, READ_ONLY_ROUTING_URL = 'TCP://10.0.0.13:1433'),
PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE, READ_ONLY_ROUTING_LIST = ('vm1', 'vm2'), READ_WRITE_ROUTING_URL = 'TCP://10.0.0.13:1433'),
SESSION_TIMEOUT = 10
);
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
Run the following commands on the SQL monitors of vm2 and vm3 only:
ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
Now SQL Server should be set up with an availability group called ag1 on all three VMs.