High Availability Cluster : Simple version

From Proxmox VE
Note: This article covers the old stable Proxmox VE 3.x releases. See High Availability Cluster 4.x for the new HA stack.

Introduction

Proxmox VE High Availability Cluster (Proxmox VE HA Cluster) enables the definition of highly available virtual machines. 
In simple terms, if a virtual machine (VM) is configured as HA and its physical host fails,
the VM is automatically restarted on one of the remaining Proxmox VE cluster nodes.

The Proxmox VE HA Cluster is based on proven Linux HA technologies, providing a stable and reliable HA service. 

To offer a simpler way to build an HA cluster, and a better understanding of its basics, we have created this guide.

System Requirements

For testing this guide, we need:

  • at least three machines with virtualization support
  • at least one shared storage (e.g. iSCSI or NFS)

Cluster Creation

We have N nodes ready; for example, this list of hostnames:

  • node1 (example IP: 192.168.7.1)
  • node2 (example IP: 192.168.7.2)
  • node3 (example IP: 192.168.7.3)
  • node4 (example IP: 192.168.7.4)
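
Before creating the cluster, every node should be able to resolve the others by name. A minimal sketch of matching /etc/hosts entries, written to a temporary file here so it can be reviewed before appending it to the real /etc/hosts on each node (hostnames and addresses are the examples above; adjust to your network):

```shell
# Hypothetical /etc/hosts entries matching the example IPs above;
# adjust names and addresses to your own network.
cat > /tmp/cluster-hosts <<'EOF'
192.168.7.1 node1
192.168.7.2 node2
192.168.7.3 node3
192.168.7.4 node4
EOF
# Review, then append to /etc/hosts on every node.
cat /tmp/cluster-hosts
```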

Now we can create the cluster Extensys:

On node1 :
root@node1:~$ pvecm create Extensys

*NOTE: we can create the cluster on any node.

And then, we add all the other nodes:


root@node2:~$ pvecm add 192.168.7.1  <-- this is the IP of the node where the cluster was created
*NOTE: follow all the instructions, then wait until it's done.

root@node3:~$ pvecm add 192.168.7.1

root@node4:~$ pvecm add 192.168.7.2  <-- new nodes can be added through any existing cluster node

Display the cluster status:


root@node1:~$ pvecm nodes
   1   M    536   2014-05-17 11:33:43  node1
   2   M    536   2014-05-20 08:12:19  node2
   3   M    536   2014-05-17 11:33:43  node3
   4   M    528   2014-05-17 11:33:18  node4
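
The columns are, roughly: node ID, membership status (M = member), incarnation number, join time and node name. If you want a scripted membership check, the output can be parsed like this (a sketch using the sample output above as a stand-in for a live cluster; on a real node, pipe `pvecm nodes` instead):

```shell
# Count online members in `pvecm nodes`-style output.
# The sample is the output captured above, standing in for a live cluster.
sample='   1   M    536   2014-05-17 11:33:43  node1
   2   M    536   2014-05-20 08:12:19  node2
   3   M    536   2014-05-17 11:33:43  node3
   4   M    528   2014-05-17 11:33:18  node4'
members=$(printf '%s\n' "$sample" | awk '$2 == "M" { n++ } END { print n }')
echo "members online: $members"
```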

Basic HA: Fencing Between Nodes

The first step to implement High Availability is to activate fencing on every node (for more information on fencing, see the dedicated article).

Uncomment this line in /etc/default/redhat-cluster-pve:

FENCE_JOIN="yes"
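
The edit can also be scripted with sed. A sketch against a stand-in copy of the file (the single commented line below is an assumption about the file's shipped state; on a real node, run the same sed against the actual /etc/default/redhat-cluster-pve):

```shell
# Stand-in copy of /etc/default/redhat-cluster-pve with the line still commented.
printf '# FENCE_JOIN="yes"\n' > /tmp/redhat-cluster-pve
# Uncomment FENCE_JOIN to let the node join the fence domain.
sed -i 's/^#[[:space:]]*FENCE_JOIN="yes"/FENCE_JOIN="yes"/' /tmp/redhat-cluster-pve
grep FENCE_JOIN /tmp/redhat-cluster-pve
```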

Then:


root@node3:~$ service cman reload
*NOTE: we must check that the rgmanager service is up and running after the cman reload

root@node3:~$ service rgmanager status
rgmanager (pid 2961 2959 2956) is running...

If you encounter any problem with rgmanager, try rebooting all nodes with "reboot" (do not use "shutdown -r now").

When all the nodes have joined the fence domain, we can use this command to check that everything is fine:


root@node3:~$  clustat

Cluster Status for extensys @ Mon May 26 11:35:43 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                            1 Online, rgmanager
 node2                                                            2 Online, rgmanager
 node3                                                            3 Online, Local, rgmanager
 node4                                                            4 Online, rgmanager

If you have only two nodes, you must set the expected votes to 1 on every node with this command:

pvecm expected 1
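
The reason: a cluster is quorate when more than half of the expected votes are present, i.e. floor(expected/2) + 1 votes. With two nodes and default votes, losing one node loses quorum; forcing the expected votes to 1 lets the surviving node stay quorate. A small sketch of the arithmetic:

```shell
# Votes needed for quorum, given the expected votes: floor(expected/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }
quorum 2   # two-node cluster with defaults: both nodes must be up
quorum 1   # after `pvecm expected 1`: a single node is enough
```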

Activating HA on a VM

Now we have four nodes in the cluster, with fencing activated. Next we need to add the iSCSI/NFS storage under Datacenter -> Storage. Once done, create one or two VMs on it, then shut them down. Now go to Datacenter -> HA, click Add -> HA managed VM/CT, enter the ID of the VM, enable autostart and click Activate. Finally, in the VM's options, enable "Start at boot".
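
Behind the GUI, each HA-managed VM ends up as a pvevm resource entry in /etc/pve/cluster.conf. On the 3.x stack the resource-manager section looks roughly like the fragment below (VM IDs 101 and 102 as examples; treat the exact attributes as an assumption and compare with the cluster.conf generated on your own cluster):

```xml
<rm>
  <pvevm autostart="1" vmid="101"/>
  <pvevm autostart="1" vmid="102"/>
</rm>
```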

At this point, we need to check if all we have done is right:


root@node3:~$  clustat

Cluster Status for extensys @ Mon May 26 12:01:22 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                            1 Online, rgmanager
 node2                                                            2 Online, rgmanager
 node3                                                            3 Online, Local, rgmanager
 node4                                                            4 Online, rgmanager


 Service Name                                                     Owner (Last)                                                     State         
 ------- ----                                                     ----- ------                                                     -----         
 pvevm:101                                                        node2                                                            started       
 pvevm:102                                                        node4                                                            started 
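
The service lines can be parsed the same way for a quick scripted health check. A sketch that extracts service/owner pairs from clustat-style output (the sample is the service section captured above, standing in for a live cluster):

```shell
# Extract service -> owner pairs from `clustat`-style output.
sample=' pvevm:101    node2    started
 pvevm:102    node4    started'
owners=$(printf '%s\n' "$sample" | awk '$1 ~ /^pvevm:/ { print $1, "on", $2, "("$3")" }')
echo "$owners"
```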

If you wish to test HA:

  • 1. start a VM on a node (e.g. node1)
  • 2. reboot or gently power off that node (*)

(*) This method of testing HA doesn't cover power loss or a hard reset, so it is only for testing purposes (or production, with some additional work on it). For more information :