Difference between revisions of "High Availability Cluster 4.x"

From Proxmox VE
 
(34 intermediate revisions by 5 users not shown)
Line 1: Line 1:
 +
{{Note|Most information was moved to our [http://pve.proxmox.com/pve-docs/ reference documentation], see [[High Availability]].}}
 +
 
== Introduction ==  
 
== Introduction ==  
 
Proxmox VE High Availability Cluster (Proxmox VE HA Cluster) enables the definition of high available virtual machines. In simple words, if a virtual machine (VM) is configured as HA and the physical host fails, the VM is automatically restarted on one of the remaining Proxmox VE Cluster nodes.
 
Proxmox VE High Availability Cluster (Proxmox VE HA Cluster) enables the definition of high available virtual machines. In simple words, if a virtual machine (VM) is configured as HA and the physical host fails, the VM is automatically restarted on one of the remaining Proxmox VE Cluster nodes.
Line 7: Line 9:
 
In order to learn more about functionality of the new Proxmox VE HA manager, install the HA simulator.   
 
In order to learn more about functionality of the new Proxmox VE HA manager, install the HA simulator.   
  
=== Update to the latest version ===
+
For a more up to date documentation see [[High Availability]]
Before you start, make sure you have installed the latest packages, just run the following on all nodes:
 
  
apt-get update && apt-get dist-upgrade
+
== HA Simulator ==
 +
[[Image:Screen-HA-4-simulator.png|HA Simulator in Action|thumb]]
  
== System requirements ==
+
By using the HA simulator you can test and learn all functionalities of the Proxmox VE HA solutions.
  
If you run HA, high end server hardware with no single point of failure is required. This includes redundant disks, redundant power supply, UPS systems, and network bonding.  
+
The simulator allows you to watch and test the behaviour of a real-world 3 node cluster with 6 VM's.
  
*Proxmox VE 4.0 comes with Self-Fencing with hardware watchdog or Software watchdog.
+
You do not have to setup or configure a real cluster, the HA simulator runs out of the box on the current code base.
*Fully configured [[Proxmox_VE_4.x_Cluster]] (version 4.0 and later), with at least 3 nodes (maximum supported configuration: currently 32 nodes per cluster).
 
*Shared storage (SAN, NAS/NFS, Ceph, DRBD9, ... for virtual disk images)
 
*Reliable, redundant network, suitable configured which supports multicast
 
*An extra network for cluster communication, one network for VM traffic and one network for storage traffic.
 
  
It's essential that you use redundant network connections for the cluster communication (bonding). If not, a simple switch reboot (or power loss on the switch) can fence all cluster nodes if it takes longer than 60 sec.
+
Install with apt:
  
== HA Configuration ==
+
apt-get install pve-ha-simulator
  
Adding and managing VM´s or containers for HA can be done via GUI or CLI (`ha-manager add <VMID>`).
+
To start the simulator you must have a X11 redirection to your current system.
  
'''Important:''' note that before enabling HA for a service you should test it thoughtfully. See if migration works, look that '''NO''' local resources are used by it. Secure that it may run on all nodes defined by its group and better on all cluster nodes.
+
If you are on a Linux machine you can use:
 +
 +
ssh root@<IPofPVE4> -Y
  
=== Fencing ===
+
On Windows it is working with  [http://mobaxterm.mobatek.net/ mobaxterm].
Proxmox VE Cluster 4.0 or greater comes with watchdog fencing.
 
This works out of the box, no configuration is required.
 
  
How Watchdog fencing works:
+
After starting the simulator create a working directory:
  
If the node has connection with the cluster and has quorum, the watchdog will be reset. If quorum is lost, the node is not able to reset the watchdog. This will trigger a reboot after 60 seconds.
+
mkdir working
  
If your hardware has a hardware watchdog, this one will be automatically detected and used. Otherwise, ha-manager just uses the Linux softdog. Therefore testing Proxmox VE HA inside a virtual environment is possible.
+
To start the simulator type
  
=== Permissions ===
+
pve-ha-simulator working/
From version 1.0-13 of the pve-ha-manager package, the HA stack is better integrated in the permission system of Proxmox VE.
 
  
* Creation, deletion and updating a resource or group needs the 'Sys.Console' privilege for the whole cluster (i.e. on the root path '/').
+
== Hardware Watchdogs ==
* The current status and the configured resources and groups may be read (but '''not''' written) with the 'Sys.Audit' privilege on the root path.
+
if no hardware watchdog is defined, proxmox is loading the softdog module,
 +
which emulate the /dev/watchdog device.
  
=== HA Groups ===
+
To enable a hardware watchdog, you need to specify the module to load
The Proxmox VE HA Cluster is using groups for mapping vm to node.
+
<pre>
 +
/etc/default/pve-ha-manager
 +
WATCHDOG_MODULE=mywatchdogmodule
 +
</pre>
  
For example: If a "vm100" is in the group "ONE" and group "ONE" has members "pve1,pve2" and "vm100" is running on pve1.
+
also, please disable nmi watchdog, which is embedeed in cpu apic.
  
When "pve1" is crashing. "vm100" will migrated to "pve2".
+
edit: /etc/default/grub
 +
<pre>
 +
GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"
 +
</pre>
 +
the
 +
<pre>
 +
# update-grub
 +
</pre>
  
The Proxmox VE HA Groups has two option restricted and nofailback.
+
=== iTCO Watchdog (module "iTCO_wdt") ===
 +
This is an hardware watchdog, available in almost all intels motherboard (ich chipset) since 15 years.
  
*restricted: VM's bound to the group may only run on  cluster  members  which are also members of the group. If no members of the group are available, the service is placed in the stopped state.
 
  
*nofailback: Enabling this option for a group will prevent automated fail-back after a more-preferred node  rejoins  the cluster.
+
=== IPMI Watchdog (module "ipmi_watchdog") ===
  
=== Enable a VM/CT for HA ===
+
For IPMI Watchdogs you may have to set the action, else it may not do anything when it triggers.
On the CLI, you can use ha-manger to achieve this task.
 
  
<b>IMPORTANT:</b>
+
For this purpose edit the /etc/modprobe.d/ipmi_watchdog.conf (simple create the file):
  
If you enable HA it's not possible to turnoff the VM inside the VM. Also, if it is disabled the VM will be stopped.
+
  options ipmi_watchdog action=power_cycle panic_wdt_timeout=10
   
 
If you add a VM/CT, its instantly 'ha-managed'.
 
  
ha-manager add vm:100
+
'''NOTE''': reboot or reload ipmi_watchdog module to take the changes in effect.
 
 
To add a VM/CT on GUI.
 
 
 
[[Image:Screen-Add-HA-4-managed_VM-CT.png|Adding a Service to the HA manager|thumb]]
 
  
=== Disable a VM/CT for HA ===
+
=== Dell IDrac (module "ipmi_watchdog") ===
  
If you want to disable a ha-managed VM/CT (e.g. for shutdown) via CLI:
+
For Dell IDrac, please desactivate the Automated System Recovery Agent in IDrac configuration.
  
ha-manager disable vm:100
+
[[File:Idrac-asr.png|thumb]]
  
If you want to re-enable a ha-managed VM/CT:
 
  
ha-manager enable vm:100
+
If openmanage is installed, you need to disable watchdog management from openmanage
  
=== HA Cluster Maintenance (node reboots) ===
 
If you need to reboot a node, e.g. because of a kernel update, you need to migrate all VM/CT to another node or disable them. By disabling them, all resources are stopped. All VM guests will get an ACPI shutdown request (if this won't work due to VM internal settings, they'll just get a 'stop').
 
  
The command will take some time for execution, monitor the "tasks" and the VM´s and CT´s on the GUI. As soon as the VM/CT are either stopped or migrated, you can reboot your node. As soon as the node is up again, continue with the next node and so on.
 
  
'''Note:''' When you gracefully shutdown a node, it services won't get migrated by the HA stack. You have to migrate them manually before you power off your node (for example for hardware maintenance).
 
  
== HA Simulator ==
+
<pre>
[[Image:Screen-HA-4-simulator.png|HA Simulator in Action|thumb]]
+
/opt/dell/srvadmin/sbin/dcecfg command=removepopalias aliasname=dcifru
 +
</pre>
  
By using the HA simulator you can test and learn all functionalities of the Proxmox VE HA solutions.
+
and reboot server
  
The simulator allows you to watch and test the behaviour of a real-world 3 node cluster with 6 VM's.
+
After restart, check that watchdog timer is 10s, and not overrided by openmanage
 +
<pre>
 +
idracadm getsysinfo -w
  
You do not have to setup or configure a real cluster, the HA simulator runs out of the box on the current code base.
+
Watchdog Information:
 +
Recovery Action        = Power Cycle
 +
Present countdown value = 9 seconds
 +
Initial countdown value = 10 seconds
  
Install with apt:
+
</pre>
  
apt-get install pve-ha-simulator
+
or
  
To start the simulator you must have a X11 redirection to your current system.
+
<pre>
 +
# ipmitool mc watchdog get
 +
Watchdog Timer Use:    SMS/OS (0x44)
 +
Watchdog Timer Is:      Started/Running
 +
Watchdog Timer Actions: Hard Reset (0x01)
 +
Pre-timeout interval:  0 seconds
 +
Timer Expiration Flags: 0x00
 +
Initial Countdown:      10 sec
 +
Present Countdown:      9 sec
  
If you are on a Linux machine you can use:
+
</pre>
 
ssh root@<IPofPVE4> -Y
 
  
On Windows it is working with  [http://mobaxterm.mobatek.net/ mobaxterm].
+
=== HP ILO (module "hpwdt" )===
  
After starting the simulator create a working directory:
+
Users have reported crash with this module,
 +
please test and update the wiki if it's working fine
  
  mkdir working
+
also, disable HP ASR feature (Automatic Server Recovery).
  
To start the simulator type
+
http://h17007.www1.hp.com/docs/iss/proliant_uefi/s_asr_status.html
  
pve-ha-simulator working/
+
If you have installed hp management tools,
 +
you need to disable "hp-asrd" daemon
  
 
== Troubleshooting ==
 
== Troubleshooting ==
 
=== Error recovery ===
 
=== Error recovery ===
If a service start fails we try to recover from it with our "restart" and "relocate" policy, see the man page of the ha-manager for more information.
 
 
If after all tries the service state could not be recovered it gets placed in an error state. In this state the service won't get touched by the HA
 
stack anymore. To recover from this state you should follow these steps:
 
 
* bring the resource back into an safe and consistent state (e.g: killing its process)
 
* disable the ha resource to place it in an stopped state
 
* fix the error which led to this failures
 
* after you fixed all errors you may enable the service again
 
 
Note: when a Service fails to stop it also get's placed in the error state, you may follow the same steps to recover from it.
 
 
=== IPMI Watchdog ===
 
 
For IPMI Watchdogs you may have to set the action, else it may not do anything when it triggers.
 
 
For this purpose edit the /etc/modprobe.d/impi_watchdog.conf (simple create the file):
 
 
options ipmi_watchdog action=power_cycle
 
  
'''NOTE''': reboot or reload ipmi_watchdog module to take the changes in effect.
+
See [[High Availability#ha_manager_error_recovery|High Availability - Error Recovery]]
  
 
=== Failed watchdog-mux or Multiple Watchdogs ===
 
=== Failed watchdog-mux or Multiple Watchdogs ===
Line 153: Line 144:
 
Our watchdog multiplexer will use /dev/watchdog which maps to /dev/watchdog0.
 
Our watchdog multiplexer will use /dev/watchdog which maps to /dev/watchdog0.
  
Selecting a specific watchdog is not implemented, mainly for this quote from the linux-watchdog mailing list:
+
Selecting a specific watchdog is not implemented, mainly for this quote from the linux-watchdog mailing list:[http://www.spinics.net/lists/linux-watchdog/msg04091.html]
  
 
  The watchdog device node <-> driver mapping is fragile and
 
  The watchdog device node <-> driver mapping is fragile and
Line 163: Line 154:
 
When deleting a node from a HA cluster you have to ensure the following:
 
When deleting a node from a HA cluster you have to ensure the following:
  
* all HA services were relocate to another node! A graceful shutdown will '''NOT''' auto migrate them.
+
* all HA services were relocate to another node!
 
* Remove the node from all defined groups.
 
* Remove the node from all defined groups.
* Shutdown the node you want to remove, from now on this node '''MUST NOT''' come online in the same network again, without being reinstalled/cleared of all cluster traces.
+
* Shutdown the node you want to remove, from now on this node '''MUST NOT''' come online in the same network again, without being reinstalled/cleared of all cluster configurations.
 
* execute `pvecm delnode nodename` from an remaining node.
 
* execute `pvecm delnode nodename` from an remaining node.
 +
 
The HA stack now places the node in an 'gone' state, you still see it in the manager status.
 
The HA stack now places the node in an 'gone' state, you still see it in the manager status.
 
After an hour in this state it will be auto deleted. This ensures that if the node died ungracefully the services still will be fenced and migrated to another node.
 
After an hour in this state it will be auto deleted. This ensures that if the node died ungracefully the services still will be fenced and migrated to another node.
  
 
=== Durations ===
 
=== Durations ===
Note that some HA actions may take their time, and don't happen instantly. This avoids out of control feedback loops, an ensures that the HA stack is all the time (where it's possible) in a safe and consistent state.
+
Note that some HA actions may take their time, and don't happen instantly. This avoids out of control feedback loops, and ensures that the HA stack is in a safe and consistent state all the time.
  
 
=== Container ===
 
=== Container ===
Note that while containers may be put under HA, currently (PVE4Beta2) they don't support live migration. To migrate an container stop it, migrate it offline and start it again. If a node fails recovery works after the failed node was fenced, as long as you don't use local bound resources.
+
Note that while containers may be put under HA, currently (PVE 4.1) they don't support live migration. So all migrate actions on them will be mapped to relocate (stop, move, start).
 +
Recoveries on node failure work, as long as you don't use local resources.
  
 
== Video Tutorials ==
 
== Video Tutorials ==
Line 184: Line 177:
 
=== Useful command line tools ===
 
=== Useful command line tools ===
 
Here is a list of useful CLI tools:
 
Here is a list of useful CLI tools:
*ha-manger - to manage the ha stack of the cluster
+
*ha-manager - to manage the ha stack of the cluster
 
*pvecm - to manage the cluster-manager   
 
*pvecm - to manage the cluster-manager   
 
*corosync* - to manipulate the corosync  
 
*corosync* - to manipulate the corosync  
  
[[Category: HOWTO]][[Category: Installation]]
+
[[Category: Archive]][[Category: Installation]]

Latest revision as of 11:50, 17 August 2020

Note: Most information was moved to our reference documentation, see High Availability.

Introduction

Proxmox VE High Availability Cluster (Proxmox VE HA Cluster) enables the definition of highly available virtual machines. In simple terms, if a virtual machine (VM) is configured as HA and the physical host fails, the VM is automatically restarted on one of the remaining Proxmox VE Cluster nodes.

The Proxmox VE HA Cluster is based on the Proxmox VE HA Manager (pve-ha-manager), which uses watchdog fencing. The major benefit of the Linux softdog or a hardware watchdog is zero configuration: it works out of the box.

To learn more about the functionality of the new Proxmox VE HA manager, install the HA simulator.

For more up-to-date documentation, see High Availability.

HA Simulator

HA Simulator in Action

With the HA simulator you can test and learn all the functionality of the Proxmox VE HA solution.

The simulator allows you to watch and test the behaviour of a real-world 3-node cluster with 6 VMs.

You do not have to set up or configure a real cluster; the HA simulator runs out of the box on the current code base.

Install with apt:

apt-get install pve-ha-simulator

To start the simulator you need X11 forwarding to your current system.

If you are on a Linux machine you can use:

ssh root@<IPofPVE4> -Y

On Windows, this works with MobaXterm.

Before starting the simulator, create a working directory:

mkdir working

To start the simulator, type:

pve-ha-simulator working/
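
The steps above can be wrapped in a small shell helper. This is only a sketch: the function name `start_ha_simulator` and its return codes are made up for illustration, and it assumes you are already logged in to the node via `ssh -Y`, so that `DISPLAY` is set.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox VE): checks the preconditions
# described above, then launches the simulator in a working directory.
start_ha_simulator() {
    workdir="${1:-working}"

    # Without X11 forwarding there is no display for the simulator GUI.
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; connect with 'ssh -Y' first" >&2
        return 1
    fi

    # The simulator comes from the pve-ha-simulator package.
    if ! command -v pve-ha-simulator >/dev/null 2>&1; then
        echo "pve-ha-simulator not found; install it with apt-get" >&2
        return 2
    fi

    mkdir -p "$workdir" && pve-ha-simulator "$workdir/"
}
```

Calling `start_ha_simulator working` on the node would then be equivalent to the `mkdir` and `pve-ha-simulator` commands shown above, with an early error message if X11 forwarding is missing.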

Hardware Watchdogs

If no hardware watchdog is defined, Proxmox VE loads the softdog module, which emulates the /dev/watchdog device.

To enable a hardware watchdog, you need to specify the module to load:

/etc/default/pve-ha-manager
WATCHDOG_MODULE=mywatchdogmodule

Also, disable the NMI watchdog, which is embedded in the CPU's APIC.

Edit /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"

Then run:

# update-grub
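
To check which watchdog module a node is configured to use, something like the following sketch may help. The function name `configured_watchdog_module` is an invention for this example. It assumes the file contains a plain, unquoted `WATCHDOG_MODULE=` assignment, and it reports `softdog` when nothing is set, mirroring the fallback described above.

```shell
#!/bin/sh
# Hypothetical helper: print the watchdog module named in the HA manager
# defaults file, or "softdog" if no module is configured there.
configured_watchdog_module() {
    conf="${1:-/etc/default/pve-ha-manager}"
    module=""
    if [ -r "$conf" ]; then
        # Use the last uncommented WATCHDOG_MODULE= line; lines starting
        # with '#' do not match the pattern and are ignored.
        module=$(sed -n 's/^[[:space:]]*WATCHDOG_MODULE=//p' "$conf" | tail -n 1)
    fi
    echo "${module:-softdog}"
}
```

Running `configured_watchdog_module` without arguments reads /etc/default/pve-ha-manager. Passing another path as the first argument is useful for trying the function on a copy of the file.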

iTCO Watchdog (module "iTCO_wdt")

This is a hardware watchdog that has been available on almost all Intel motherboards (ICH chipset) for about 15 years.


IPMI Watchdog (module "ipmi_watchdog")

For IPMI watchdogs you may have to set the action; otherwise the watchdog may not do anything when it triggers.

To do so, edit /etc/modprobe.d/ipmi_watchdog.conf (simply create the file if it does not exist):

options ipmi_watchdog action=power_cycle panic_wdt_timeout=10

NOTE: Reboot or reload the ipmi_watchdog module for the changes to take effect.
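
Before rebooting, it can be worth double-checking what the options file actually says. The sketch below reads a single option from the `options ipmi_watchdog ...` line. The function name `ipmi_watchdog_option` is hypothetical, and the sketch assumes all options are on one `options` line, separated by whitespace.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one ipmi_watchdog module option
# as written in a modprobe.d file (nothing is printed if it is not set).
ipmi_watchdog_option() {
    key="$1"
    conf="${2:-/etc/modprobe.d/ipmi_watchdog.conf}"
    # Strip the "options ipmi_watchdog" prefix, put each key=value pair
    # on its own line, then keep only the value of the requested key.
    sed -n 's/^[[:space:]]*options[[:space:]]\{1,\}ipmi_watchdog[[:space:]]\{1,\}//p' "$conf" |
        tr -s '[:space:]' '\n' |
        sed -n "s/^${key}=//p" |
        tail -n 1
}
```

With the example line from above, `ipmi_watchdog_option action` would print `power_cycle`. Note that this only shows what is configured in the file, not what the currently loaded module uses.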

Dell iDRAC (module "ipmi_watchdog")

For Dell iDRAC, deactivate the Automated System Recovery Agent in the iDRAC configuration.

Idrac-asr.png


If OpenManage is installed, you need to disable its watchdog management:



/opt/dell/srvadmin/sbin/dcecfg command=removepopalias aliasname=dcifru

Then reboot the server.

After the restart, check that the watchdog timer is 10 s and has not been overridden by OpenManage:

idracadm getsysinfo -w

Watchdog Information:
Recovery Action         = Power Cycle
Present countdown value = 9 seconds
Initial countdown value = 10 seconds

or

# ipmitool mc watchdog get
Watchdog Timer Use:     SMS/OS (0x44)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x00
Initial Countdown:      10 sec
Present Countdown:      9 sec
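
The check can also be scripted. The following sketch parses the `ipmitool mc watchdog get` output shown above and compares the initial countdown against an expected value. The function names are made up for this example, and the parsing assumes the output format matches the sample above.

```shell
#!/bin/sh
# Hypothetical helpers: both read `ipmitool mc watchdog get` output on stdin.

# Print the initial countdown in seconds.
watchdog_initial_countdown() {
    awk -F: '/^Initial Countdown/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Succeed only if the initial countdown equals the expected value
# (default: 10 seconds, as in the sample output above).
check_watchdog_timeout() {
    expected="${1:-10}"
    actual=$(watchdog_initial_countdown)
    [ "$actual" = "$expected" ]
}

# Usage on a node (not executed here):
#   ipmitool mc watchdog get | check_watchdog_timeout 10 && echo "timer OK"
```

Reading from stdin keeps the parsing separate from the `ipmitool` call, so the functions can be tried on saved output first.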

HP iLO (module "hpwdt")

Users have reported crashes with this module; please test it and update the wiki if it works for you.

Also, disable the HP ASR (Automatic Server Recovery) feature:

http://h17007.www1.hp.com/docs/iss/proliant_uefi/s_asr_status.html

If you have installed the HP management tools, you need to disable the "hp-asrd" daemon.

Troubleshooting

Error recovery

See High Availability - Error Recovery

Failed watchdog-mux or Multiple Watchdogs

Disable all BIOS watchdog functionality. Those settings set up the watchdog in the expectation that the OS resets it; that is not the desired use case here and may lead to problems, e.g. a reset of the node after a fixed amount of time.

Intel AMT (OS Health Watchdog) should be disabled, and with it the mei and mei_me modules, as they may cause problems.

If your host has multiple watchdogs available, only allow the one you want to use for HA, i.e. blacklist the other modules from loading. Our watchdog multiplexer will use /dev/watchdog, which maps to /dev/watchdog0.

Selecting a specific watchdog is not implemented, mainly because of this quote from the linux-watchdog mailing list:[1]

The watchdog device node <-> driver mapping is fragile and
can change from one kernel version to the next or even across reboot, so
users shouldn't assume it to be persistent.
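
To see whether a host exposes more than one watchdog, you can list the registered devices. The sketch below assumes that the kernel exposes watchdogs under /sys/class/watchdog with an `identity` attribute, which depends on the kernel configuration. The function name `list_watchdogs` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print one line per registered watchdog device,
# in the form "watchdogN: <identity>".
list_watchdogs() {
    base="${1:-/sys/class/watchdog}"
    for dev in "$base"/watchdog*; do
        # Skip the unexpanded glob pattern if no device exists.
        [ -e "$dev" ] || continue
        if [ -r "$dev/identity" ]; then
            ident=$(cat "$dev/identity")
        else
            ident="unknown"
        fi
        echo "$(basename "$dev"): $ident"
    done
}
```

If `list_watchdogs` prints more than one line, the modules behind the unwanted devices are candidates for blacklisting as described above.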

Deleting Nodes From The Cluster

When deleting a node from an HA cluster, you have to ensure the following:

  • All HA services have been relocated to another node!
  • Remove the node from all defined groups.
  • Shut down the node you want to remove. From now on this node MUST NOT come online in the same network again without being reinstalled or cleared of all cluster configuration.
  • Execute `pvecm delnode nodename` from a remaining node.

The HA stack now places the node in a 'gone' state; you still see it in the manager status. After an hour in this state it will be deleted automatically. This ensures that if the node died ungracefully, its services will still be fenced and migrated to another node.

Durations

Note that some HA actions may take some time and do not happen instantly. This avoids out-of-control feedback loops and ensures that the HA stack is in a safe and consistent state at all times.

Container

Note that while containers may be put under HA, currently (PVE 4.1) they don't support live migration, so all migrate actions on them are mapped to relocate (stop, move, start). Recovery on node failure works as long as you don't use local resources.

Video Tutorials

Proxmox VE Youtube channel

Testing

Before going into production it is highly recommended to do as many tests as possible. Then, do some more.

Useful command line tools

Here is a list of useful CLI tools:

  • ha-manager - to manage the HA stack of the cluster
  • pvecm - to manage the cluster manager
  • corosync* - to manipulate corosync