High Availability Cluster 4.x
Latest revision as of 10:50, 17 August 2020
Note: Most information was moved to our reference documentation (http://pve.proxmox.com/pve-docs/), see High Availability.
Introduction
Proxmox VE High Availability Cluster (Proxmox VE HA Cluster) enables the definition of highly available virtual machines. In simple terms, if a virtual machine (VM) is configured as HA and the physical host fails, the VM is automatically restarted on one of the remaining Proxmox VE Cluster nodes.
The Proxmox VE HA Cluster is based on the Proxmox VE HA Manager (pve-ha-manager), which uses watchdog fencing. A major benefit of the Linux softdog or a hardware watchdog is zero configuration: it works out of the box.
To learn more about the functionality of the new Proxmox VE HA manager, install the HA simulator.
For more up-to-date documentation, see High Availability.
HA Simulator
With the HA simulator you can test and learn all the functionality of the Proxmox VE HA solution.
The simulator allows you to watch and test the behaviour of a real-world 3-node cluster with 6 VMs.
You do not have to set up or configure a real cluster; the HA simulator runs out of the box on the current code base.
Install with apt:
apt-get install pve-ha-simulator
To start the simulator you need X11 forwarding to your current system.
If you are on a Linux machine you can use:
ssh root@<IPofPVE4> -Y
On Windows, this works with MobaXterm (http://mobaxterm.mobatek.net/).
Before starting the simulator, create a working directory:
mkdir working
Then start the simulator:
pve-ha-simulator working/
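The start-up steps above can be combined into a small helper. This is only a sketch: have_x11 and start_simulator are hypothetical names, and the check assumes that a session opened with ssh -Y has the DISPLAY variable set.

```shell
# Succeeds only when an X11 display is available in this session,
# which should be the case after logging in with `ssh -Y`.
have_x11() {
    [ -n "${DISPLAY:-}" ]
}

# Hypothetical wrapper: create the working directory and start the simulator.
# $1: working directory (defaults to "working", as in the steps above)
start_simulator() {
    dir=${1:-working}
    have_x11 || { echo "no X11 display, reconnect with ssh -Y" >&2; return 1; }
    mkdir -p "$dir" && pve-ha-simulator "$dir/"
}
```

Calling start_simulator without arguments on the Proxmox VE host corresponds to the two commands shown above.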
Hardware Watchdogs
If no hardware watchdog is defined, Proxmox VE loads the softdog module, which emulates the /dev/watchdog device.
To enable a hardware watchdog, specify the module to load in /etc/default/pve-ha-manager:
WATCHDOG_MODULE=mywatchdogmodule
Also disable the NMI watchdog, which is embedded in the CPU APIC.
Edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"
Then run:
# update-grub
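To double-check both settings after a reboot, something like the following could be used. It is a minimal sketch: the function names are hypothetical, and it assumes WATCHDOG_MODULE is written unquoted, as in the example above.

```shell
# Print the watchdog module configured in the given file.
# Falls back to "softdog", which is loaded when no hardware watchdog is defined.
# $1: config file (defaults to the path used above)
get_watchdog_module() {
    file=${1:-/etc/default/pve-ha-manager}
    module=$(sed -n 's/^[[:space:]]*WATCHDOG_MODULE=//p' "$file" 2>/dev/null | tail -n 1)
    printf '%s\n' "${module:-softdog}"
}

# Succeed if the given kernel command line contains nmi_watchdog=0.
# $1: kernel command line, e.g. "$(cat /proc/cmdline)"
nmi_watchdog_disabled() {
    case " $1 " in
        *" nmi_watchdog=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a running node, `get_watchdog_module` prints the configured module and `nmi_watchdog_disabled "$(cat /proc/cmdline)"` reports whether the kernel was booted with the NMI watchdog disabled.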
iTCO Watchdog (module "iTCO_wdt")
This is a hardware watchdog that has been available on almost all Intel motherboards (ICH chipset) for about 15 years.
IPMI Watchdog (module "ipmi_watchdog")
For IPMI watchdogs you may have to set the action, otherwise the watchdog may not do anything when it triggers.
To do this, edit /etc/modprobe.d/ipmi_watchdog.conf (simply create the file if it does not exist):
options ipmi_watchdog action=power_cycle panic_wdt_timeout=10
NOTE: reboot or reload the ipmi_watchdog module for the changes to take effect.
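Creating the file can be scripted as below. This is a sketch with a hypothetical function name. The reload shown in the comment uses the standard modprobe commands and may fail while the watchdog device is in use, in which case a reboot is the simpler route.

```shell
# Write the ipmi_watchdog module options to a modprobe.d file.
# $1: target file (defaults to the path named above)
write_ipmi_watchdog_conf() {
    conf=${1:-/etc/modprobe.d/ipmi_watchdog.conf}
    printf '%s\n' 'options ipmi_watchdog action=power_cycle panic_wdt_timeout=10' > "$conf"
}

# Example (as root), followed by a module reload so the options apply:
#   write_ipmi_watchdog_conf
#   modprobe -r ipmi_watchdog && modprobe ipmi_watchdog
```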
Dell IDrac (module "ipmi_watchdog")
For Dell iDRAC, deactivate the Automated System Recovery Agent in the iDRAC configuration.
If OpenManage is installed, you need to disable its watchdog management:
/opt/dell/srvadmin/sbin/dcecfg command=removepopalias aliasname=dcifru
and reboot the server.
After the restart, check that the watchdog timer is 10s and is not overridden by OpenManage:
idracadm getsysinfo -w
Watchdog Information:
Recovery Action = Power Cycle
Present countdown value = 9 seconds
Initial countdown value = 10 seconds
or
# ipmitool mc watchdog get
Watchdog Timer Use: SMS/OS (0x44)
Watchdog Timer Is: Started/Running
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x00
Initial Countdown: 10 sec
Present Countdown: 9 sec
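The timer check can also be automated by parsing the ipmitool output shown above. The sketch below assumes the output keeps the "Initial Countdown: N sec" line format from that example. The function name is hypothetical.

```shell
# Read `ipmitool mc watchdog get` output on stdin and print the
# initial countdown in seconds (prints nothing if the line is missing).
initial_countdown() {
    sed -n 's/^[[:space:]]*Initial Countdown:[[:space:]]*\([0-9][0-9]*\) sec.*/\1/p'
}

# Example (as root):
#   [ "$(ipmitool mc watchdog get | initial_countdown)" = "10" ] \
#       || echo "watchdog timer is not 10s, check OpenManage" >&2
```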
HP ILO (module "hpwdt")
Users have reported crashes with this module. Please test it and update the wiki if it works fine.
Also disable the HP ASR (Automatic Server Recovery) feature:
http://h17007.www1.hp.com/docs/iss/proliant_uefi/s_asr_status.html
If you have installed the HP management tools, you also need to disable the "hp-asrd" daemon.
Troubleshooting
Error recovery
See High Availability - Error Recovery
Failed watchdog-mux or Multiple Watchdogs
Disable all BIOS watchdog functionality. Those settings set up the watchdog in the expectation that the OS resets it, which is not the desired use case here and may lead to problems, e.g. a reset of the node after a fixed amount of time.
Intel AMT (OS Health Watchdog) should be disabled, and with it the mei and mei_me modules, as they may cause problems.
If your host has multiple watchdogs available, only allow the one you want to use for HA, i.e. blacklist the other modules from loading. Our watchdog multiplexer will use /dev/watchdog, which maps to /dev/watchdog0.
Selecting a specific watchdog is not implemented, mainly because of this statement from the linux-watchdog mailing list (http://www.spinics.net/lists/linux-watchdog/msg04091.html):
The watchdog device node <-> driver mapping is fragile and can change from one kernel version to the next or even across reboot, so users shouldn't assume it to be persistent.
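Blacklisting the unwanted modules, as suggested above, is usually done with blacklist lines in a file under /etc/modprobe.d. The helper below is a sketch that only generates those lines. The function name and the target file name in the comment are assumptions.

```shell
# Print one modprobe "blacklist" line per module name given as argument.
blacklist_lines() {
    for module in "$@"; do
        printf 'blacklist %s\n' "$module"
    done
}

# Example (as root); the file name is an assumption:
#   blacklist_lines mei mei_me > /etc/modprobe.d/blacklist-watchdog.conf
```

After a reboot, `ls /dev/watchdog*` should then list only the device of the watchdog you want to keep.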
Deleting Nodes From The Cluster
When deleting a node from an HA cluster, ensure the following:
- All HA services have been relocated to another node.
- The node has been removed from all defined groups.
- Shut down the node you want to remove. From now on this node MUST NOT come online in the same network again without being reinstalled or cleared of all cluster configuration.
- Execute `pvecm delnode nodename` from a remaining node.
The HA stack now places the node in a 'gone' state; you will still see it in the manager status. After an hour in this state it is deleted automatically. This ensures that, if the node died ungracefully, its services are still fenced and migrated to another node.
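Before shutting the node down, the first item of the checklist above can be verified from the manager status. The sketch below assumes that `ha-manager status` prints one line per service in the form `service vm:100 (node1, started)`. Check the output on your own cluster, as this format and the function name are assumptions.

```shell
# Read `ha-manager status` output on stdin and print the IDs of all
# HA services that are still placed on the given node.
# $1: node name
services_on_node() {
    awk -v node="$1" '$1 == "service" && $3 == "(" node "," { print $2 }'
}

# Example: only continue with the removal when the list is empty.
#   [ -z "$(ha-manager status | services_on_node nodename)" ] \
#       || echo "HA services are still running on nodename" >&2
```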
Durations
Note that some HA actions take time and do not happen instantly. This avoids out-of-control feedback loops and ensures that the HA stack is in a safe and consistent state at all times.
Container
Note that while containers may be put under HA, they currently (PVE 4.1) do not support live migration, so all migrate actions on them are mapped to relocate (stop, move, start). Recovery on node failure works as long as you do not use local resources.
Video Tutorials
Proxmox VE Youtube channel: http://www.youtube.com/user/ProxmoxVE
Testing
Before going into production it is highly recommended to do as many tests as possible. Then, do some more.
Useful command line tools
Here is a list of useful CLI tools:
- ha-manager - to manage the HA stack of the cluster
- pvecm - to manage the cluster itself (cluster manager)
- corosync* - to manipulate corosync