High Availability Cluster

From Proxmox VE

Revision as of 16:00, 19 January 2012

Note: Article about Proxmox VE 2.0 beta

Introduction

Proxmox VE High Availability Cluster (Proxmox VE HA Cluster) enables the definition of highly available virtual machines. In simple words, if a virtual machine (VM) is configured as HA and its physical host fails, the VM is automatically restarted on one of the remaining Proxmox VE Cluster nodes.

The Proxmox VE HA Cluster is based on proven Linux HA technologies, providing stable and reliable HA service.

Update to the latest version

Before you start, make sure you have installed the latest packages. Run this on all nodes:

aptitude update && aptitude full-upgrade && aptitude install resource-agents-pve
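With several nodes, the same command can be run from one machine over SSH. This is a minimal sketch, assuming passwordless root SSH to every node; the node names are placeholders you pass as arguments. ssh -t allocates a terminal so aptitude can still ask for confirmation, and DRY_RUN=1 only prints what would be run.

```shell
#!/bin/bash
# Sketch: run the package update on every cluster node, one after another.
# Assumptions (not from the article): passwordless root SSH to each node,
# node names passed as arguments. DRY_RUN=1 prints the commands only.

UPDATE_CMD='aptitude update && aptitude full-upgrade && aptitude install resource-agents-pve'

update_all_nodes() {
    local node
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh -t root@$node '$UPDATE_CMD'"
        else
            echo "== updating $node =="
            # -t gives aptitude a terminal for its confirmation prompts.
            ssh -t "root@$node" "$UPDATE_CMD" || { echo "update failed on $node" >&2; return 1; }
        fi
    done
}
```

For example, DRY_RUN=1 update_all_nodes node1 node2 node3 previews the three SSH commands; drop DRY_RUN to execute them. Updating nodes one at a time, and stopping at the first failure, avoids leaving several nodes half-upgraded at once.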

Package list (as of 23.12.2011):

pveversion -v

pve-manager: 2.0-18 (pve-manager/2.0/16283a5a)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 2.0-55
pve-kernel-2.6.32-6-pve: 2.6.32-55
lvm2: 2.02.88-2pve1
clvm: 2.02.88-2pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-1
libqb: 0.6.0-1
redhat-cluster-pve: 3.1.8-3
pve-cluster: 1.0-17
qemu-server: 2.0-13
pve-firmware: 1.0-14
libpve-common-perl: 1.0-11
libpve-access-control: 1.0-5
libpve-storage-perl: 2.0-9
vncterm: 1.0-2
vzctl: 3.0.29-3pve8
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-1
ksm-control-daemon: 1.1-1
uname -a
Linux proxmox-7-62 2.6.32-6-pve #1 SMP Mon Dec 19 10:15:23 CET 2011 x86_64 GNU/Linux
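To check a node against this list without reading every line, a small filter can verify that the cluster-related packages appear in the pveversion -v output at all. This is a sketch: the package selection in REQUIRED_PKGS is an assumption picked from the list above, and version numbers are not compared.

```shell
#!/bin/bash
# Sketch: check that HA-related packages appear in `pveversion -v` output.
# Reads the output on stdin. REQUIRED_PKGS is an assumed selection taken
# from the reference list; versions are NOT compared.

REQUIRED_PKGS="pve-cluster redhat-cluster-pve corosync-pve openais-pve clvm"

check_ha_packages() {
    local versions pkg missing=0
    versions=$(cat)
    for pkg in $REQUIRED_PKGS; do
        # Each line of pveversion -v looks like "<package>: <version>".
        if ! printf '%s\n' "$versions" | grep -q "^$pkg: "; then
            echo "missing: $pkg"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage on a node: pveversion -v | check_ha_packages. It prints one "missing:" line per absent package and returns non-zero if any are missing.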

System requirements

If you run HA, use only high-end server hardware with no single point of failure. This includes redundant disks (hardware RAID), redundant power supplies, UPS systems and network bonding.

  • Fully configured Proxmox_VE_2.0_Cluster, with at least 3 nodes (maximum supported configuration: currently 16 nodes per cluster). Note that, with certain limitations, a 2-node configuration is also possible (Two-Node_High_Availability_Cluster).
  • Shared storage (SAN for Virtual Disk Image Store for HA KVM)
  • Reliable, suitably configured network
  • NFS for Containers
  • Fencing device(s) - reliable and TESTED!
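Network bonding is one of the redundancy measures named above. On a node that uses the Linux bonding driver, the driver's status file shows whether a bond really has more than one working link. A sketch, assuming the bond is called bond0 (a placeholder) and the usual /proc/net/bonding layout with one "Slave Interface:" section per NIC:

```shell
#!/bin/bash
# Sketch: check that a bond has at least two slave NICs with link up.
# Assumption: Linux bonding driver status file, e.g. /proc/net/bonding/bond0
# (bond0 is a placeholder name).

count_up_slaves() {
    # Count "Slave Interface" sections whose "MII Status" is "up".
    # The bond-level MII Status line (before any slave section) is ignored.
    awk '
        /^Slave Interface:/ { in_slave = 1; next }
        in_slave && /^MII Status:/ {
            if ($3 == "up") up++
            in_slave = 0
        }
        END { print up + 0 }
    ' "$1"
}

check_bond_redundancy() {
    local file=${1:-/proc/net/bonding/bond0} up
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    up=$(count_up_slaves "$file")
    if [ "$up" -lt 2 ]; then
        echo "only $up slave(s) up in $file: no network redundancy" >&2
        return 1
    fi
    echo "$up slaves up in $file"
}
```

A link that is configured but down still counts as a single point of failure, which is why only slaves reporting "up" are counted.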

HA Configuration

Fence devices can only be configured on the command line. Adding and managing VMs and containers for HA should be done via the GUI.

Fencing is a required part of Proxmox VE 2.0 HA; without fencing, HA will not start.
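Because HA does not start without fencing, it can help to check that the cluster configuration defines at least one fence device before enabling HA for any guest. The sketch below only counts fencedevice elements. It assumes the configuration is at /etc/pve/cluster.conf (pass another path, such as a copy you are editing, as the argument) and does not replace ccs_config_validate or a real fencing test.

```shell
#!/bin/bash
# Sketch: presence check for fence devices in the cluster configuration.
# Assumption: config at /etc/pve/cluster.conf unless a path is given.
# This is NOT a schema validation; use ccs_config_validate for that.

count_fence_devices() {
    # Count <fencedevice ...> elements, not the <fencedevices> wrapper.
    grep -o '<fencedevice[[:space:]]' "$1" | wc -l
}

require_fencing() {
    local conf=${1:-/etc/pve/cluster.conf} n
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    n=$(( $(count_fence_devices "$conf") ))   # arithmetic strips wc padding
    if [ "$n" -eq 0 ]; then
        echo "no fence device defined in $conf: HA will not work" >&2
        return 1
    fi
    echo "$n fence device(s) defined in $conf"
}
```

A defined device is necessary but not sufficient: each node must also reference a device in its own fence section, and the device must actually work, so test it before relying on it.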

Fencing

Configure VMs or Containers for HA

Check again that you have everything you need and that all systems are running reliably. It makes no sense to set up an HA cluster on unreliable hardware.

Enable a KVM VM or a Container for HA

See also the video tutorial on the Proxmox VE YouTube channel.

HA Cluster maintenance (node reboots)

If you need to reboot a node, e.g. because of a kernel update, you first need to stop rgmanager on it. This stops all resources on that node and moves them to other nodes. All KVM guests receive an ACPI shutdown request; if a guest does not react to it (e.g. because of a setting inside the VM), it is simply stopped.

You can stop the rgmanager service via the GUI or just run:

/etc/init.d/rgmanager stop

The command takes a while; monitor the "tasks" and the VMs and CTs in the GUI. As soon as rgmanager is stopped, you can reboot the node. Once the node is up again, continue with the next node, and so on.
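The node-by-node procedure can be scripted from an admin machine. This is a sketch under several assumptions not stated in the article: passwordless root SSH to every node, node names passed as arguments, and the stop command returning only once rgmanager has stopped (which the "takes a while" above suggests). "Up again" is approximated by the node accepting SSH; a more careful version would also wait until clustat shows the node online with rgmanager running. DRY_RUN=1 only prints the steps.

```shell
#!/bin/bash
# Sketch: rolling reboot of HA nodes, one at a time.
# Assumptions (not from the article): root SSH access, node names as
# arguments, "node is back" == "node accepts SSH".
# DRY_RUN=1 prints the remote commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

wait_for_ssh() {
    # Poll until the node accepts SSH again after the reboot.
    local node=$1
    [ "${DRY_RUN:-0}" = "1" ] && return 0
    sleep 60    # give the node time to actually go down first
    until ssh -o ConnectTimeout=5 "root@$node" true 2>/dev/null; do
        sleep 10
    done
}

rolling_reboot() {
    local node
    for node in "$@"; do
        # Moves this node's HA resources away; abort the whole run on failure.
        run ssh "root@$node" /etc/init.d/rgmanager stop || return 1
        # The SSH session often drops while the node goes down; ignore that.
        run ssh "root@$node" reboot || true
        wait_for_ssh "$node"
    done
}
```

For example, DRY_RUN=1 rolling_reboot node1 node2 node3 lists the stop/reboot pairs in order. Aborting when rgmanager fails to stop keeps the script from rebooting a node that still runs HA guests.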

Video Tutorials

Proxmox VE YouTube channel

Certified Configurations and Examples

Testing

Before going into production, run as many tests as possible.
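One test worth repeating is moving an HA-managed guest to every node in turn. clusvcadm can relocate a service with -r <service> -m <member>. The sketch assumes HA-managed KVM guests show up in clustat as pvevm:<VMID>, so check the actual service names in your clustat output first; VMID 100 and the node names are placeholders.

```shell
#!/bin/bash
# Sketch: relocate one HA-managed guest to each listed node in turn.
# Assumptions (not from the article): service name "pvevm:<VMID>",
# VMID and node names given as arguments.
# DRY_RUN=1 prints the clusvcadm commands instead of running them.

relocate_tour() {
    local vmid=$1 node
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "clusvcadm -r pvevm:$vmid -m $node"
        else
            clusvcadm -r "pvevm:$vmid" -m "$node" || { echo "relocation to $node failed" >&2; return 1; }
            clustat    # show where the service ended up
        fi
    done
}
```

For example, relocate_tour 100 node2 node3 node1 sends VM 100 around the cluster and back. Fencing deserves its own test: fence_node against a node you can afford to have power-cycled shows whether the fence agent really works.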

Useful command line tools

Here is a list of useful CLI tools:

  • clustat - Cluster Status Utility
  • clusvcadm - Cluster User Service Administration Utility
  • ccs_config_validate - validate the cluster.conf file
  • fence_tool - a utility for the fenced daemon
  • fence_node - a utility to run fence agents
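For scripted monitoring, clustat output can be filtered with standard tools. This sketch reads clustat output on stdin and assumes its usual layout: a "Member Status:" line and a service table whose last column is the state, with service names containing a colon (e.g. pvevm:100). Different rgmanager versions may format the output differently.

```shell
#!/bin/bash
# Sketch: simple checks on `clustat` output read from stdin.
# Assumptions: "Member Status: Quorate" line present when quorate; service
# rows start with "<type>:<name>" and end with the state column.

cluster_is_quorate() {
    grep -q 'Member Status: Quorate'
}

services_not_started() {
    # Print "<service> <state>" for every service whose state is not "started".
    awk '$1 ~ /^[a-z]+:[^ ]+$/ && $NF != "started" { print $1, $NF }'
}
```

Usage: clustat | cluster_is_quorate && echo quorate, or clustat | services_not_started to list services that are stopped, failed or recovering.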