Two-Node High Availability Cluster


Revision as of 15:48, 18 January 2012


Note: This is a work in progress, stay tuned!

Note: Article about Proxmox VE 2.0 beta

Introduction

This article explores how to build a two-node cluster with HA enabled under Proxmox. HA is generally recommended to be deployed on at least three nodes to prevent strange behaviour and potentially lethal data incoherence (for further information, look up "quorum"). Nevertheless, with some tweaking, it is also possible to run Proxmox successfully on a two-node cluster.

Although a third, shared quorum disk partition is recommended for two-node clusters, Proxmox allows building the cluster without it. Let's see how.

System requirements

If you run HA, only high-end server hardware with no single point of failure should be used. This includes redundant disks (hardware RAID), redundant power supplies, UPS systems, and network bonding.

  • Fully configured Proxmox_VE_2.0_Cluster with two nodes.
  • Shared storage (a SAN as the virtual disk image store for HA KVM). In this case, no external storage was used; instead, a cheaper alternative (DRBD) was tested.
  • Reliable, suitably configured network.
  • Fencing device(s), reliable and TESTED! We will use HP's iLO for this example.

What is DRBD used for?

For this testing configuration, two DRBD resources were created: one for VM images and another for VM user data. Thanks to DRBD (if properly configured), a RAID-1 mirror is created over the network (be aware that, although possible, using WAN links would mean high latencies). As VMs and data are replicated synchronously on both nodes, if one of them fails, it is possible to restart the "dead" machines on the other node without data loss.
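The DRBD setup itself is beyond the scope of this article, but a minimal resource definition for such a Primary/Primary setup might look like the following sketch. All names, devices, and addresses here are examples, not the tested configuration:

```
resource r0 {
        protocol C;                          # synchronous replication
        startup {
                become-primary-on both;      # Primary/Primary operation
        }
        net {
                allow-two-primaries;
                # Automatic recovery policies for short split-brain windows:
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;    # two primaries: require manual recovery
        }
        on nodeA {
                device    /dev/drbd0;
                disk      /dev/sdb1;
                address   192.168.0.1:7788;
                meta-disk internal;
        }
        on nodeB {
                device    /dev/drbd0;
                disk      /dev/sdb1;
                address   192.168.0.2:7788;
                meta-disk internal;
        }
}
```

With both nodes Primary, the `after-sb-2pri disconnect` policy means split-brain always needs the manual procedure described later in this article.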

Configuring Fencing

Fencing is vital for Proxmox to manage a node loss and thus provide effective HA. Fencing is the mechanism used to prevent data inconsistencies between nodes in a cluster by ensuring that a node reported as "dead" really is down. If it isn't, a reboot or power-off signal is sent to force it into a safe state.

Many different methods can be used for fencing a node, so just a few changes would be needed to make this work under different scenarios.

First, log in to the CLI on any of your properly configured cluster machines.

Then, create a copy of cluster.conf (appending .new):

cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new

Then edit cluster.conf.new. Near the top you will see a line like this:

<cluster alias="hpiloclust" config_version="12" name="hpiloclust">

Be sure to increase "config_version" each time you apply a new configuration, as this is the internal mechanism the cluster configuration tools use to detect changes.
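Forgetting the bump is a common mistake, so the increment can also be scripted. A small sketch follows; the sample line mirrors the one shown above, while on a real node you would operate on /etc/pve/cluster.conf.new instead of creating the file:

```shell
conf=cluster.conf.new

# Demonstration only: create a sample file containing the line shown above.
printf '<cluster alias="hpiloclust" config_version="12" name="hpiloclust">\n' > "$conf"

# Extract the current version number and write it back incremented by one.
cur=$(sed -n 's/.*config_version="\([0-9]*\)".*/\1/p' "$conf")
sed -i "s/config_version=\"$cur\"/config_version=\"$((cur + 1))\"/" "$conf"

grep -o 'config_version="[0-9]*"' "$conf"   # config_version="13"
```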

Now, add the available fencing devices to the config file by adding these lines (right after </clusternodes> is fine):

<fencedevices>
        <fencedevice agent="fence_ilo" hostname="hpilohost1" login="hpilologin" name="hpilofence1" passwd="hpilopword"/>
        <fencedevice agent="fence_ilo" hostname="hpilohost2" login="hpilologin" name="hpilofence2" passwd="hpilopword"/>
        <fencedevice agent="fence_ilo" hostname="hpilohost3" login="hpilologin" name="hpilofence3" passwd="hpilopword"/>
</fencedevices>
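Defining the devices alone is not enough: each <clusternode> entry must also reference the device that can fence it. That part of the file is not shown in this excerpt, but for the first node it would look roughly like this (the element names follow the standard cluster.conf schema; nodeid and votes values are examples):

```xml
<clusternode name="hpilohost1" nodeid="1" votes="1">
        <fence>
                <method name="1">
                        <device name="hpilofence1"/>
                </method>
        </fence>
</clusternode>
```

Before activating the new file it is worth sanity-checking both the agent and the syntax: something like `fence_ilo -a hpilohost1 -l hpilologin -p hpilopword -o status` queries the node's power state through its iLO board, and `ccs_config_validate -f /etc/pve/cluster.conf.new` validates the file against the schema (both commands assume the placeholder names used above).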

Problems and workarounds

DRBD split-brain

Under some circumstances, data consistency between the two nodes' DRBD partitions can be lost. In this case, the only effective remedy is manual intervention: the cluster administrator must decide which node's data is preserved and which is discarded to regain coherence. With DRBD, split-brain situations will typically occur when the data connection is lost for longer than a few seconds.
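A split-brain is usually visible in the resource's connection state and in the kernel log. A quick check on either node (the resource name r0 is an example):

```shell
# After a split-brain the peers drop the replication link, so the connection
# state shows StandAlone (or WFConnection) instead of Connected:
drbdadm cstate r0

# DRBD also records the detection in the kernel log:
dmesg | grep -i split-brain
```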

Let's consider a failure scenario where node A is the node running most of the machines and thus the one whose data we want to preserve. Therefore, the changes node B made to its DRBD partition while the split-brain lasted must be discarded. We assume a Primary/Primary DRBD configuration. The procedure is as follows:

  • Go to a terminal on node B (repeat for each DRBD resource in split-brain):
drbdadm secondary [resource name]
drbdadm disconnect [resource name]
drbdadm -- --discard-my-data connect [resource name]
  • Go to a terminal on node A (repeat for each DRBD resource in split-brain):
drbdadm connect [resource name]
  • Go back to the node B terminal (repeat for each DRBD resource in split-brain):
drbdadm primary [resource name]