Fencing: Difference between revisions
Eric Blevins (talk | contribs) No edit summary |
(archive) |
||
(61 intermediate revisions by 14 users not shown) | |||
Line 1: | Line 1: | ||
{{ | {{PVE3}} | ||
=Introduction= | == Introduction == | ||
To ensure data integrity, only one node is allowed to run a VM or any other cluster-service at a time. The use of power switches in the hardware configuration enables a node to power-cycle another node before restarting that node's HA services during a fail-over process. This prevents two nodes from simultaneously accessing the same data and corrupting it. Fence devices are used to guarantee data integrity under all failure conditions. | To ensure data integrity, only one node is allowed to run a VM or any other cluster-service at a time. The use of power switches in the hardware configuration enables a node to power-cycle another node before restarting that node's HA services during a fail-over process. This prevents two nodes from simultaneously accessing the same data and corrupting it. Fence devices are used to guarantee data integrity under all failure conditions. | ||
=Configure nodes to boot immediately and always after power cycle= | |||
Check your bios settings and test if | For a good easy introduction to HA fencing concepts device, see: http://www.clusterlabs.org/doc/crm_fencing.html | ||
and also http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_Fence_Devices/index.html | |||
Read only the general, introductory parts of these documents, as they describe other HA software, not PVE. | |||
== Configure nodes to boot immediately and always after power cycle == | |||
Check your BIOS settings and test that this works: unplug the power cord and verify that the server boots up after reconnecting it. | |||
If you use integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing - here are the different options: | If you use integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing - here are the different options: | ||
Line 12: | Line 18: | ||
In any case, you need to '''make sure that the node turns off immediately when fenced.''' If you have delays here, the HA resources cannot be moved. | In any case, you need to '''make sure that the node turns off immediately when fenced.''' If you have delays here, the HA resources cannot be moved. | ||
=List of supported fence devices= | == Enable fencing on all nodes == | ||
In order to get fencing active, you also need to join each node to the fencing domain. Do the following on all your cluster nodes. | |||
*Enable fencing in /etc/default/redhat-cluster-pve (Just uncomment the last line, see below): | |||
nano /etc/default/redhat-cluster-pve | |||
<pre> | |||
FENCE_JOIN="yes" | |||
</pre> | |||
*restart cman service: | |||
/etc/init.d/cman restart | |||
*join the fence domain with: | |||
fence_tool join | |||
To check the status, just run (this example shows all 3 nodes already joined): | |||
fence_tool ls | |||
<pre>fence domain | |||
member count 3 | |||
victim count 0 | |||
victim now 0 | |||
master nodeid 1 | |||
wait state none | |||
members 1 2 3</pre> | |||
'''Note''' | |||
If the cluster goes out of sync after you have completed the join and restarted the cman service on all nodes, you must also restart the pve-cluster service on all nodes: | |||
service pve-cluster restart | |||
=General HowTo for editing the cluster.conf= | |||
'''Note: ''this is no longer valid under PVE 4''''': read the page [[Editing_corosync.conf]] instead. | |||
First, create a copy of the current cluster.conf, make the needed changes, increase the config_version number, check the syntax and if everything is ready, activate the new config via GUI. | |||
Here are the steps : | |||
cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new | |||
nano /etc/pve/cluster.conf.new | |||
If you edit this file via the CLI, you must ALWAYS increase the "config_version" number. This guarantees that all nodes apply the new settings. | |||
You should validate the config with the following command ['''''Note: this is no longer valid under PVE 4''''', see [[Editing_corosync.conf]]]: | |||
ccs_config_validate -v -f /etc/pve/cluster.conf.new | |||
To apply the new config, go to the web interface (Datacenter/HA). There you can review the changes and, if the syntax is OK, commit them to all nodes via the GUI. All nodes then receive the new config and apply it automatically. | |||
== List of supported fence devices == | |||
==APC Switch Rack PDU== | === APC Switch Rack PDU === | ||
E.g. AP7921, here is a example used in our test lab. | E.g. AP7921, here is a example used in our test lab. | ||
===Create a user on the APC web interface=== | ==== Create a user on the APC web interface ==== | ||
I just configured a new user via "Outlet User Management" | I just configured a new user via "Outlet User Management" | ||
*user name: hpapc | *user name: hpapc | ||
Line 24: | Line 78: | ||
Make sure that you enable "Outlet Access" and SSH and the most important part, make sure you connected the physical servers to the right power supply. | Make sure that you enable "Outlet Access" and SSH and the most important part, make sure you connected the physical servers to the right power supply. | ||
===Example /etc/pve/cluster.conf.new with APC power fencing=== | ==== Example /etc/pve/cluster.conf.new with APC power fencing ==== | ||
This example uses the APC power switch as fencing device. Additionally, a simple "TestIP" is used for HA service and fail-over testing. | This example uses the APC power switch as fencing device (make sure you enabled SSH on your APC). Additionally, a simple "TestIP" is used for HA service and fail-over testing. | ||
cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new | cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new | ||
Line 39: | Line 93: | ||
<fencedevices> | <fencedevices> | ||
<fencedevice agent="fence_apc" ipaddr="192.168.2.30" login="hpapc" name="apc" passwd="12345678"/> | <fencedevice agent="fence_apc" ipaddr="192.168.2.30" login="hpapc" name="apc" passwd="12345678" power_wait="10"/> | ||
</fencedevices> | </fencedevices> | ||
Line 90: | Line 144: | ||
'''Note''' | '''Note''' | ||
If you edit this file via CLI, you need to increase ALWAYS the "config_version" number. This guarantees that the all nodes apply´s the new settings. | If you edit this file via CLI, you need to increase ALWAYS the "config_version" number. This guarantees that the all nodes apply´s the new settings. | ||
You should validate the config with the following command: | |||
ccs_config_validate -v -f /etc/pve/cluster.conf.new | |||
In order to apply this new config, you need to go to the web interface (Datacenter/HA). You can see the changes done and if the syntax is ok you can commit the changed via gui to all nodes. By doing this, all nodes gets the info about the new config and apply them automatically. | In order to apply this new config, you need to go to the web interface (Datacenter/HA). You can see the changes done and if the syntax is ok you can commit the changed via gui to all nodes. By doing this, all nodes gets the info about the new config and apply them automatically. | ||
The power_wait option specifies how long to wait between performing a power action. Without it the server will be turned off, then on in quick succession. Setting this ensures that the server will be turned off for a certain amount of time before being turned back on resulting in more reliable fencing. | |||
=== [[Intel Modular Server HA]] === | |||
=== [[Dell servers using iDRAC]] === | |||
Dell iDRAC cards can be used as fencing devices | |||
==== Create a user on the Dell iDRAC ==== | |||
* Create a fencing_user account on each iDRAC with 'Operator' permissions. | |||
* Although the iDRAC network is usually on a private, secure network, unique passwords for each machine can be entered in the configuration below. | |||
:* Configure your fence user under iDRAC User Authentication and add ''Operator'' status for iDRAC, LAN, and Serial Port. | |||
:* Set IPMI User Privileges to ''Operator'' and check Enable Serial Over LAN | |||
* Your proxmox hosts need to have network access, through ssh to your Dell iDRAC cards. | |||
* See [[Fencing#Testing_Dell_iDRAC|Testing Dell iDRAC]] to verify your syntax. | |||
==== Example /etc/pve/cluster.conf.new with iDRAC ==== | |||
This config was tested with DRAC V7 cards. | |||
See [[Fencing#General_HowTo_for_editing_the_cluster.conf|above]] for editing/activation steps. | |||
nano /etc/pve/cluster.conf.new | |||
<source lang="xml"> | <source lang="xml"> | ||
<?xml version="1.0"?> | <?xml version="1.0"?> | ||
<cluster name=" | <cluster name="peR620" config_version="28"> | ||
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"> | <cman keyfile="/var/lib/pve-cluster/corosync.authkey"> | ||
</cman> | </cman> | ||
<fencedevices> | <fencedevices> | ||
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login=" | <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node1-drac" passwd="XXXX" secure="1"/> | ||
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login=" | <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node2-drac" passwd="XXXX" secure="1"/> | ||
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login=" | <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node3-drac" passwd="XXXX" secure="1"/> | ||
</fencedevices> | </fencedevices> | ||
<clusternodes> | <clusternodes> | ||
Line 174: | Line 204: | ||
</clusternode> | </clusternode> | ||
</clusternodes> | </clusternodes> | ||
</cluster> | </cluster> | ||
</source> | |||
See [[Fencing#General_HowTo_for_editing_the_cluster.conf|above]] for editing/activation steps. | |||
For Dell iDRAC5 Cards you can basically use the same config as for DRAC7, but you need to change the ''fencedevice'' commands to: | |||
<source lang="xml"> | |||
<fencedevices> | |||
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="fencing_user" name="node1-drac" passwd="XXXX" secure="1"/> | |||
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="fencing_user" name="node2-drac" passwd="XXXX" secure="1"/> | |||
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="fencing_user" name="node3-drac" passwd="XXXX" secure="1"/> | |||
</fencedevices> | |||
</source> | </source> | ||
==[[Dell blade servers]]== | === [[Dell blade servers]] === | ||
PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts. You configure a single IP address on the CMC, and connect to that IP for management. Individual blade slots can be powered up or down as needed. | PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts. You configure a single IP address on the CMC, and connect to that IP for management. Individual blade slots can be powered up or down as needed. | ||
Line 194: | Line 230: | ||
Example: | Example: | ||
<source lang="xml"> | <source lang="xml"> | ||
Line 238: | Line 272: | ||
</source> | </source> | ||
[[ | === [[IPMI (generic)]] === | ||
This is a generic method for IPMI. | |||
As of 2013-07-02, the ipmitool package is needed on all nodes (see the notes at the end of this section): | |||
aptitude install ipmitool | |||
<source lang="xml"> | |||
<?xml version="1.0"?> | |||
<cluster name="clustername" config_version="6"> | |||
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"> | |||
</cman> | |||
<fencedevices> | |||
<fencedevice agent="fence_ipmilan" name="ipmi1" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/> | |||
<fencedevice agent="fence_ipmilan" name="ipmi2" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/> | |||
<fencedevice agent="fence_ipmilan" name="ipmi3" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/> | |||
</fencedevices> | |||
<clusternodes> | |||
<clusternode name="host1" votes="1" nodeid="1"> | |||
<fence> | |||
<method name="1"> | |||
<device name="ipmi1"/> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
<clusternode name="host2" votes="1" nodeid="2"> | |||
<fence> | |||
<method name="1"> | |||
<device name="ipmi2"/> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
<clusternode name="host3" votes="1" nodeid="3"> | |||
<fence> | |||
<method name="1"> | |||
<device name="ipmi3"/> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
</clusternodes> | |||
<rm> | |||
<service autostart="1" exclusive="0" name="ha_test_ip" recovery="relocate"> | |||
<ip address="192.168.7.180"/> | |||
</service> | |||
</rm> | |||
</cluster> | |||
</source> | |||
==== IPMI notes ==== | |||
After setting up IPMI in cluster.conf, I tested and got this: | |||
<pre> | |||
fbc3 ~ # fence_node fbc240 -vv | |||
fence fbc240 dev 0.0 agent fence_ipmilan result: error from agent | |||
agent args: nodename=fbc240 agent=fence_ipmilan lanplus=1 ipaddr=10.1.10.173 login=**** passwd=****** power_wait=5 | |||
fence fbc240 failed | |||
</pre> | |||
which was solved with | |||
aptitude install ipmitool | |||
then: | |||
<pre> | |||
fbc3 ~ # fence_node fbc240 -vv | |||
fence fbc240 dev 0.0 agent fence_ipmilan result: success | |||
agent args: nodename=fbc240 agent=fence_ipmilan lanplus=1 ipaddr=10.1.10.173 login=***** passwd=*** power_wait=5 | |||
fence fbc240 success | |||
</pre> | |||
The above was tested on a SuperMicro system. | |||
=== APC Master Switch === | |||
Some old APC PDUs do not support SSH and do not work with fence_apc. | |||
These older units do work with SNMP allowing the fence agent fence_apc_snmp to work. | |||
<source lang="xml"> | |||
<?xml version="1.0"?> | |||
<cluster name="hpcluster765" config_version="28"> | |||
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"> | |||
</cman> | |||
<fencedevices> | |||
<fencedevice agent="fence_apc_snmp" ipaddr="192.168.2.30" name="apc" community="12345678" power_wait="10"/> | |||
</fencedevices> | |||
<clusternodes> | |||
<clusternode name="hp4" votes="1" nodeid="1"> | |||
<fence> | |||
<method name="power"> | |||
<device name="apc" port="4" /> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
<clusternode name="hp1" votes="1" nodeid="2"> | |||
<fence> | |||
<method name="power"> | |||
<device name="apc" port="1" /> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
<clusternode name="hp3" votes="1" nodeid="3"> | |||
<fence> | |||
<method name="power"> | |||
<device name="apc" port="3" /> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
<clusternode name="hp2" votes="1" nodeid="4"> | |||
<fence> | |||
<method name="power"> | |||
<device name="apc" port="2" /> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
</clusternodes> | |||
<rm> | |||
<service autostart="1" exclusive="0" name="TestIP" recovery="relocate"> | |||
<ip address="192.168.7.180"/> | |||
</service> | |||
</rm> | |||
</cluster> | |||
</source> | |||
=== Fencing using a managed switch === | |||
'''Prerequisites''': | |||
#A managed switch supporting SNMP | |||
#Write access to the switch through SNMP | |||
<br> | |||
The idea behind this method is to isolate either the entire node or only the node's access to shared storage. The fence agent instructs the switch to disable one or more ports, which prevents the node from starting a VM or CT on the shared storage, since no route from the node to the shared storage exists any more. Restoring access to the shared storage requires operator intervention on the switch, or running the fence command with the option to enable the port(s) again. If the nodes use bonding, you need to disable the bridge aggregation on the switch rather than the individual ports that are members of the bridge aggregation. | |||
The example shown here uses SNMPv2c without a password, but with an ACL configured on the switch that only allows hosts on the cluster VLAN to access the configured fencing group. The fence agent accepts either an index number or a name for the ports. | |||
'''See list of known interfaces on the switch''': fence_ifmib -o list -c <community> -a <IP> -n switch | |||
'''Disable a specific interface on the switch''': fence_ifmib --action=off -c <community> -a <IP> -n <index|name> | |||
'''Enable a specific interface on the switch''': fence_ifmib --action=on -c <community> -a <IP> -n <index|name> | |||
<br> | |||
<br> | |||
Example: | |||
<source lang="xml"><?xml version="1.0"?> | |||
<cluster config_version="74" name="proxmox"> | |||
<cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/> | |||
<quorumd allow_kill="0" interval="3" label="proxmox1_qdisk" tko="10" votes="1"> | |||
<heuristic interval="3" program="ping $GATEWAY -c1 -w1" score="1" tko="4"/> | |||
<heuristic interval="3" program="ip addr | grep eth1 | grep -q UP" score="2" tko="3"/> | |||
</quorumd> | |||
<totem token="54000"/> | |||
<fencedevices> | |||
<fencedevice agent="fence_ifmib" community="fencing" ipaddr="172.16.3.254" name="hp1910" snmp_version="2c"/> | |||
</fencedevices> | |||
<clusternodes> | |||
<clusternode name="esx1" nodeid="1" votes="1"> | |||
<fence> | |||
<method name="fence"> | |||
<device action="off" name="hp1910" port="Bridge-Aggregation2"/> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
<clusternode name="esx2" nodeid="2" votes="1"> | |||
<fence> | |||
<method name="fence"> | |||
<device action="off" name="hp1910" port="Bridge-Aggregation3"/> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
</clusternodes> | |||
<rm> | |||
<failoverdomains> | |||
<failoverdomain name="webfailover" ordered="0" restricted="1"> | |||
<failoverdomainnode name="esx1"/> | |||
<failoverdomainnode name="esx2"/> | |||
</failoverdomain> | |||
</failoverdomains> | |||
<resources> | |||
<ip address="172.16.3.7" monitor_link="5"/> | |||
</resources> | |||
<service autostart="1" domain="webfailover" name="web" recovery="relocate"> | |||
<ip ref="172.16.3.7"/> | |||
</service> | |||
<pvevm autostart="1" vmid="109"/> | |||
</rm> | |||
</cluster></source><br> | |||
== Multiple methods for a node == | |||
Note: See also '''man fenced''' | |||
In more advanced configurations, multiple fencing methods can be defined for a node. If fencing fails using the first method, fenced will try the next method, and continue to cycle through methods until one succeeds. | |||
<source lang="xml"> | |||
<clusternode name="node1" nodeid="1"> | |||
<fence> | |||
<method name="1"> | |||
<device name="myswitch" foo="x"/> | |||
</method> | |||
<method name="2"> | |||
<device name="another" bar="123"/> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
<fencedevices> | |||
<fencedevice name="myswitch" agent="..." something="..."/> | |||
<fencedevice name="another" agent="..."/> | |||
</fencedevices> | |||
</source> | |||
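The fallback behaviour described above can be illustrated with a small simulation. This is only a sketch of the idea, not fenced's actual implementation: `fence_with_fallback`, `run_method` and `max_cycles` are hypothetical names, and the cycle limit is an assumption added so the example terminates (fenced itself keeps cycling until a method succeeds).

```python
from typing import Callable, Optional, Sequence


def fence_with_fallback(
    methods: Sequence[str],
    run_method: Callable[[str], bool],
    max_cycles: int = 3,
) -> Optional[str]:
    """Try each method in order; start over if all fail.

    Returns the name of the first method that succeeds, or None if no
    method succeeded within max_cycles passes over the list.
    """
    for _ in range(max_cycles):
        for name in methods:
            if run_method(name):
                return name
    return None


# Stand-in for a fence agent: method "1" always fails, method "2" works.
attempts = []


def fake_agent(name: str) -> bool:
    attempts.append(name)
    return name == "2"


result = fence_with_fallback(["1", "2"], fake_agent)
print(result)  # prints 2
```

The method names "1" and "2" correspond to the `<method name="...">` entries in the snippet above; their order in cluster.conf is the order in which they are tried.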
== Dual path, redundant power == | |||
Note: See also '''man fenced''' | |||
Sometimes fencing a node requires disabling two power ports or two i/o paths. This is done by specifying two or more devices within a method. fenced will run the agent for the device twice, once for each device line, and both must succeed for fencing to be considered successful. | |||
<source lang="xml"> | |||
<clusternode name="node1" nodeid="1"> | |||
<fence> | |||
<method name="1"> | |||
<device name="sanswitch1" port="11"/> | |||
<device name="sanswitch2" port="11"/> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
</source> | |||
When using power switches to fence nodes with dual power supplies, the agents must be told to turn off both power ports before restoring power to either port. The default off-on behavior of the agent could result in the power never being fully disabled to the node. | |||
<source lang="xml"> | |||
<clusternode name="node1" nodeid="1"> | |||
<fence> | |||
<method name="1"> | |||
<device name="nps1" port="11" action="off"/> | |||
<device name="nps2" port="11" action="off"/> | |||
<device name="nps1" port="11" action="on"/> | |||
<device name="nps2" port="11" action="on"/> | |||
</method> | |||
</fence> | |||
</clusternode> | |||
</source> | |||
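The ordering rule above (all "off" actions before any "on" action) is easy to get wrong when editing by hand. The following is a minimal sketch of a check for a single `<method>` element; the function name `off_before_on` is hypothetical and not part of any PVE or cluster tool.

```python
import xml.etree.ElementTree as ET


def off_before_on(method_xml: str) -> bool:
    """Return True if no action="off" device line follows an action="on" line."""
    method = ET.fromstring(method_xml)
    seen_on = False
    for device in method.iter("device"):
        action = device.get("action")
        if action == "on":
            seen_on = True
        elif action == "off" and seen_on:
            # A port is switched off only after another one was already
            # switched back on, so the node may never lose power fully.
            return False
    return True


good_method = """
<method name="1">
  <device name="nps1" port="11" action="off"/>
  <device name="nps2" port="11" action="off"/>
  <device name="nps1" port="11" action="on"/>
  <device name="nps2" port="11" action="on"/>
</method>
"""

bad_method = """
<method name="1">
  <device name="nps1" port="11" action="off"/>
  <device name="nps1" port="11" action="on"/>
  <device name="nps2" port="11" action="off"/>
  <device name="nps2" port="11" action="on"/>
</method>
"""

print(off_before_on(good_method))
print(off_before_on(bad_method))
```

The second sample interleaves off and on per switch, which is exactly the situation the text warns about.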
=Test fencing= | |||
Before you use the fencing device, make sure that it works as expected: | |||
Display internal fenced state: | |||
fence_tool ls | |||
<pre> | |||
fence domain | |||
member count 3 | |||
victim count 0 | |||
victim now 0 | |||
master nodeid 3 | |||
wait state none | |||
members 2 3 4 | |||
</pre> | |||
Test fencing with '''fence_node''': | |||
fence_node NODENAME -vv | |||
Where NODENAME is from the node definition line: | |||
<clusternode name="NODENAME" nodeid="1" votes="1"> | |||
You should get a "success" here and your machine powers off. | |||
Repeat the command and the machine powers back on. | |||
=== Testing APC Switch Rack PDU === | |||
In my example configuration, the AP7921 uses the IP 192.168.2.30. | |||
Query the status of power supply: | |||
fence_apc -x -l hpapc -p 12345678 -a 192.168.2.30 -o status -n 1 -v | |||
Reboot the server using fence_apc: | |||
fence_apc -x -l hpapc -p 12345678 -a 192.168.2.30 -o reboot -n 1 -v | |||
Test fencing with '''fence_node''': | |||
fence_node NODENAME -vv | |||
You should get a "success" here. | |||
=== Testing Dell iDRAC === | |||
iDRAC[5,6,7] use the fence_drac5 agent, as indicated in the Dell settings above. | |||
Test on the command line of another server using: | |||
fence_drac5 --ip="10.1.1.2" --username="prox-b-drac" --password="****" --ssh --verbose --debug-file="/tmp/foo" --command-prompt="admin1->" --action="off" | |||
Check the /tmp/foo file for connection logs. | |||
* Can you ssh into the iDRAC using the given username/password? | |||
* Is ssh enabled within iDRAC management? (Overview > iDRAC preferences > iDRAC Settings > Network > Services > ssh) | |||
nc -zv [ipOf_iDRAC] 22 | |||
* Are you trying to connect to the iDRAC management port or the Proxmox IP address? | |||
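The reachability check done with nc above can also be scripted. This is a minimal sketch using only the Python standard library; `tcp_port_open` is a hypothetical helper name, and the address in the usage comment is a placeholder, not a value from this article.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage (replace the placeholder with the IP of your iDRAC):
#   tcp_port_open("X.X.X.X", 22)
```

A True result only shows that something is listening on the SSH port; it does not verify the fencing user's credentials.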
[[Category: | [[Category: Archive]] | ||
[[Category: Proxmox VE 3.x]] |
Latest revision as of 16:10, 18 July 2019
Note: This article is about the previous Proxmox VE 3.x releases |
Introduction
To ensure data integrity, only one node is allowed to run a VM or any other cluster-service at a time. The use of power switches in the hardware configuration enables a node to power-cycle another node before restarting that node's HA services during a fail-over process. This prevents two nodes from simultaneously accessing the same data and corrupting it. Fence devices are used to guarantee data integrity under all failure conditions.
For a good, easy introduction to HA fencing concepts, see: http://www.clusterlabs.org/doc/crm_fencing.html and also http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_Fence_Devices/index.html
Read only the general, introductory parts of these documents, as they describe other HA software, not PVE.
Configure nodes to boot immediately and always after power cycle
Check your BIOS settings and test that this works: unplug the power cord and verify that the server boots up after reconnecting it.
If you use integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing - here are the different options:
- make sure that acpid is not installed (remove it with: aptitude remove acpid)
- disable ACPI soft-off in the BIOS
- disable ACPI completely by adding acpi=off to the kernel boot command line
In any case, you need to make sure that the node turns off immediately when fenced. If you have delays here, the HA resources cannot be moved.
Enable fencing on all nodes
In order to get fencing active, you also need to join each node to the fencing domain. Do the following on all your cluster nodes.
- Enable fencing in /etc/default/redhat-cluster-pve (Just uncomment the last line, see below):
nano /etc/default/redhat-cluster-pve
FENCE_JOIN="yes"
- restart cman service:
/etc/init.d/cman restart
- join the fence domain with:
fence_tool join
To check the status, just run (this example shows all 3 nodes already joined):
fence_tool ls
fence domain
member count  3
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 3
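If you want to check the join status from a script, the status output above can be parsed into fields. The following is a sketch only: `parse_fence_tool_ls` and `all_joined` are hypothetical helper names, and the parser assumes the output keeps the field labels shown in this example.

```python
FIELDS = (
    "member count",
    "victim count",
    "victim now",
    "master nodeid",
    "wait state",
    "members",
)


def parse_fence_tool_ls(output: str) -> dict:
    """Map each known field label to the text that follows it."""
    state = {}
    for line in output.splitlines():
        line = line.strip()
        for field in FIELDS:
            if line.startswith(field + " "):
                state[field] = line[len(field):].strip()
                break
    return state


def all_joined(state: dict, expected_nodes: int) -> bool:
    """True if the member count and the member list both match expected_nodes."""
    return (
        int(state["member count"]) == expected_nodes
        and len(state["members"].split()) == expected_nodes
    )


sample_output = """fence domain
member count  3
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 3
"""

state = parse_fence_tool_ls(sample_output)
print(state["members"])
print(all_joined(state, 3))
```

In practice the text would come from running fence_tool ls on a node; the sample string here simply repeats the three-node example.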
Note: If the cluster goes out of sync after you have completed the join and restarted the cman service on all nodes, you must also restart the pve-cluster service on all nodes:
service pve-cluster restart
General HowTo for editing the cluster.conf
Note: this is no longer valid under PVE 4. Read the page Editing_corosync.conf instead.
First, create a copy of the current cluster.conf, make the needed changes, increase the config_version number, check the syntax and if everything is ready, activate the new config via GUI.
Here are the steps :
cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
nano /etc/pve/cluster.conf.new
If you edit this file via the CLI, you must ALWAYS increase the "config_version" number. This guarantees that all nodes apply the new settings.
You should validate the config with the following command [Note: this is no longer valid under PVE 4, see Editing_corosync.conf]:
ccs_config_validate -v -f /etc/pve/cluster.conf.new
To apply the new config, go to the web interface (Datacenter/HA). There you can review the changes and, if the syntax is OK, commit them to all nodes via the GUI. All nodes then receive the new config and apply it automatically.
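Forgetting to increase config_version is an easy mistake when editing by hand. The following is a minimal sketch of a helper that increments it in the text of cluster.conf.new; `bump_config_version` is a hypothetical name, not a PVE tool, and it works on the raw text so that the rest of the file's formatting stays untouched.

```python
import re

# Matches the config_version attribute on the <cluster> element only
# (the \b after "cluster" keeps <clusternodes> and <clusternode> out).
_VERSION_RE = re.compile(r'(<cluster\b[^>]*\bconfig_version=")(\d+)(")')


def bump_config_version(xml_text: str) -> str:
    """Return xml_text with the cluster's config_version increased by one."""
    match = _VERSION_RE.search(xml_text)
    if match is None:
        raise ValueError("no config_version attribute found on <cluster>")
    new_version = int(match.group(2)) + 1
    return xml_text[: match.start(2)] + str(new_version) + xml_text[match.end(2):]


sample = (
    '<?xml version="1.0"?>\n'
    '<cluster name="hpcluster765" config_version="28">\n'
    "</cluster>\n"
)
updated = bump_config_version(sample)
print(updated)
```

The result should still be validated with ccs_config_validate and committed through the web interface as described above.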
List of supported fence devices
APC Switch Rack PDU
E.g. the AP7921; here is an example used in our test lab.
Create a user on the APC web interface
I just configured a new user via "Outlet User Management"
- user name: hpapc
- password: 12345678
Make sure that you enable "Outlet Access" and SSH. Most importantly, make sure the physical servers are connected to the right power outlets.
Example /etc/pve/cluster.conf.new with APC power fencing
This example uses the APC power switch as the fencing device (make sure you have enabled SSH on your APC). Additionally, a simple "TestIP" is used for HA service and fail-over testing.
cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
nano /etc/pve/cluster.conf.new
<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<fencedevices>
<fencedevice agent="fence_apc" ipaddr="192.168.2.30" login="hpapc" name="apc" passwd="12345678" power_wait="10"/>
</fencedevices>
<clusternodes>
<clusternode name="hp4" votes="1" nodeid="1">
<fence>
<method name="power">
<device name="apc" port="4" secure="on"/>
</method>
</fence>
</clusternode>
<clusternode name="hp1" votes="1" nodeid="2">
<fence>
<method name="power">
<device name="apc" port="1" secure="on"/>
</method>
</fence>
</clusternode>
<clusternode name="hp3" votes="1" nodeid="3">
<fence>
<method name="power">
<device name="apc" port="3" secure="on"/>
</method>
</fence>
</clusternode>
<clusternode name="hp2" votes="1" nodeid="4">
<fence>
<method name="power">
<device name="apc" port="2" secure="on"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
<ip address="192.168.7.180"/>
</service>
</rm>
</cluster>
Note
If you edit this file via the CLI, you must ALWAYS increase the "config_version" number. This guarantees that all nodes apply the new settings.
You should validate the config with the following command:
ccs_config_validate -v -f /etc/pve/cluster.conf.new
To apply the new config, go to the web interface (Datacenter/HA). There you can review the changes and, if the syntax is OK, commit them to all nodes via the GUI. All nodes then receive the new config and apply it automatically.
The power_wait option specifies how many seconds to wait after issuing a power-off or power-on command. Without it, the server is turned off and then on again in quick succession. Setting it ensures that the server stays off for a certain amount of time before being turned back on, resulting in more reliable fencing.
Intel Modular Server HA
Dell servers using iDRAC
Dell iDRAC cards can be used as fencing devices.
Create a user on the Dell iDRAC
- Create a fencing_user account on each iDRAC with 'Operator' permissions.
- Although the iDRAC network is usually on a private, secure network, unique passwords for each machine can be entered in the configuration below.
- Configure your fence user under iDRAC User Authentication and add Operator status for iDRAC, LAN, and Serial Port.
- Set IPMI User Privileges to Operator and check Enable Serial Over LAN
- Your Proxmox hosts need network access via SSH to your Dell iDRAC cards.
- See Testing Dell iDRAC to verify your syntax.
Example /etc/pve/cluster.conf.new with iDRAC
This config was tested with DRAC V7 cards.
See above for editing/activation steps.
nano /etc/pve/cluster.conf.new
<?xml version="1.0"?>
<cluster name="peR620" config_version="28">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<fencedevices>
<fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node1-drac" passwd="XXXX" secure="1"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node2-drac" passwd="XXXX" secure="1"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="fencing_user" name="node3-drac" passwd="XXXX" secure="1"/>
</fencedevices>
<clusternodes>
<clusternode name="node1" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="node1-drac"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="node2-drac"/>
</method>
</fence>
</clusternode>
<clusternode name="node3" nodeid="3" votes="1">
<fence>
<method name="1">
<device name="node3-drac"/>
</method>
</fence>
</clusternode>
</clusternodes>
</cluster>
See above for editing/activation steps.
For Dell iDRAC5 cards you can use essentially the same config as for DRAC7, but you need to change the fencedevice entries to:
<fencedevices>
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="fencing_user" name="node1-drac" passwd="XXXX" secure="1"/>
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="fencing_user" name="node2-drac" passwd="XXXX" secure="1"/>
<fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="fencing_user" name="node3-drac" passwd="XXXX" secure="1"/>
</fencedevices>
Dell blade servers
PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts. You configure a single IP address on the CMC, and connect to that IP for management. Individual blade slots can be powered up or down as needed.
NOTE: At the time of this writing, there is a bug that prevents the CMC from powering the blade back up after it is fenced. To recover from a fenced outage, manually power the blade on (or connect to the CMC and issue the command racadm serveraction -m server-# powerup). New code available for testing can correct this behavior. See Bug 466788 for beta code and further discussions on this issue.
NOTE: Using the individual iDRAC on each Dell Blade is not supported at this time. Instead use the Dell CMC as described in this section. If desired, you may configure IPMI as your secondary fencing method for individual Dell Blades. For information on support of the Dell iDRAC, see Bug 496748.
To configure your nodes for DRAC CMC fencing:
- For CMC IP Address enter the DRAC CMC IP address.
- Enter the specific blade for Module Name. For example, enter server-1 for blade 1, and server-4 for blade 4.
Example:
<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<fencedevices>
<fencedevice agent="fence_drac5" module_name="server-1" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade1" passwd="drac_password"/>
<fencedevice agent="fence_drac5" module_name="server-2" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade2" passwd="drac_password"/>
<fencedevice agent="fence_drac5" module_name="server-3" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade3" passwd="drac_password"/>
</fencedevices>
<clusternodes>
<clusternode name="node1" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="drac-cmc-blade1"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="drac-cmc-blade2"/>
</method>
</fence>
</clusternode>
<clusternode name="node3" nodeid="3" votes="1">
<fence>
<method name="1">
<device name="drac-cmc-blade3"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
<ip address="192.168.7.180"/>
</service>
</rm>
</cluster>
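Because every blade shares the same CMC address, module_name is the only attribute that tells the entries apart, so a copy-paste slip there would make two devices fence the same slot. The following is a minimal Python sketch, not part of the Proxmox tooling, that looks for such duplicates. The function name duplicate_blade_slots and the sample fragment are hypothetical; the fragment deliberately contains a duplicate, and 192.0.2.10 is a placeholder address.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, shortened fragment. blade2 and blade3 point at the same slot on purpose.
SAMPLE = """
<cluster name="example" config_version="1">
  <fencedevices>
    <fencedevice agent="fence_drac5" module_name="server-1" ipaddr="192.0.2.10" name="drac-cmc-blade1"/>
    <fencedevice agent="fence_drac5" module_name="server-2" ipaddr="192.0.2.10" name="drac-cmc-blade2"/>
    <fencedevice agent="fence_drac5" module_name="server-2" ipaddr="192.0.2.10" name="drac-cmc-blade3"/>
  </fencedevices>
</cluster>
"""


def duplicate_blade_slots(cluster_xml):
    """Return {(ipaddr, module_name): [device names]} for slots used more than once."""
    root = ET.fromstring(cluster_xml.strip())
    seen = defaultdict(list)
    for dev in root.iter("fencedevice"):
        module = dev.get("module_name")
        if module is None:
            continue  # not a per-blade CMC entry
        seen[(dev.get("ipaddr"), module)].append(dev.get("name"))
    return {slot: names for slot, names in seen.items() if len(names) > 1}


# An empty dict means every fence device targets a distinct blade slot.
print(duplicate_blade_slots(SAMPLE))
```

Running it against your own cluster.conf.new before activating the configuration would be one way to use it.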
IPMI (generic)
This is a generic method for IPMI.
As of 2013-07-02, ipmitool must be installed on all nodes (see the IPMI notes at the end of this section):
aptitude install ipmitool
<?xml version="1.0"?>
<cluster name="clustername" config_version="6">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<fencedevices>
<fencedevice agent="fence_ipmilan" name="ipmi1" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
<fencedevice agent="fence_ipmilan" name="ipmi2" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
<fencedevice agent="fence_ipmilan" name="ipmi3" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
</fencedevices>
<clusternodes>
<clusternode name="host1" votes="1" nodeid="1">
<fence>
<method name="1">
<device name="ipmi1"/>
</method>
</fence>
</clusternode>
<clusternode name="host2" votes="1" nodeid="2">
<fence>
<method name="1">
<device name="ipmi2"/>
</method>
</fence>
</clusternode>
<clusternode name="host3" votes="1" nodeid="3">
<fence>
<method name="1">
<device name="ipmi3"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<service autostart="1" exclusive="0" name="ha_test_ip" recovery="relocate">
<ip address="192.168.7.180"/>
</service>
</rm>
</cluster>
IPMI notes
After setting up IPMI in cluster.conf, I tested and got this:
fbc3 ~ # fence_node fbc240 -vv
fence fbc240 dev 0.0 agent fence_ipmilan result: error from agent
agent args: nodename=fbc240 agent=fence_ipmilan lanplus=1 ipaddr=10.1.10.173 login=**** passwd=****** power_wait=5
fence fbc240 failed
which was solved with
aptitude install ipmitool
then:
fbc3 ~ # fence_node fbc240 -vv
fence fbc240 dev 0.0 agent fence_ipmilan result: success
agent args: nodename=fbc240 agent=fence_ipmilan lanplus=1 ipaddr=10.1.10.173 login=***** passwd=*** power_wait=5
fence fbc240 success
The above was tested on a SuperMicro system.
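Since the failure above came down to a missing ipmitool binary, a quick pre-flight check on each node can save a debugging round. This is only a sketch in Python: the helper name missing_tools is hypothetical, and the tool list is an assumption based on the agent and package named in this section.

```python
import shutil

# Assumed from this section: the fence agent used in the example and ipmitool,
# which the notes above found was needed for fence_ipmilan to succeed.
REQUIRED_TOOLS = ("fence_ipmilan", "ipmitool")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# An empty list means both programs are installed on this node.
print(missing_tools())
```

The `which` parameter exists only so the lookup can be replaced in a test; in normal use the default is enough.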
APC Master Switch
Some older APC PDUs do not support SSH and therefore do not work with fence_apc. These units do support SNMP, so the fence_apc_snmp agent can be used instead.
<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<fencedevices>
<fencedevice agent="fence_apc_snmp" ipaddr="192.168.2.30" name="apc" community="12345678" power_wait="10"/>
</fencedevices>
<clusternodes>
<clusternode name="hp4" votes="1" nodeid="1">
<fence>
<method name="power">
<device name="apc" port="4" />
</method>
</fence>
</clusternode>
<clusternode name="hp1" votes="1" nodeid="2">
<fence>
<method name="power">
<device name="apc" port="1" />
</method>
</fence>
</clusternode>
<clusternode name="hp3" votes="1" nodeid="3">
<fence>
<method name="power">
<device name="apc" port="3" />
</method>
</fence>
</clusternode>
<clusternode name="hp2" votes="1" nodeid="4">
<fence>
<method name="power">
<device name="apc" port="2" />
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
<ip address="192.168.7.180"/>
</service>
</rm>
</cluster>
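In a layout like this, each device line under a clusternode has to name a fencedevice that is actually defined. The Python sketch below checks that cross-reference. It is an illustration, not a Proxmox tool: undefined_fence_devices is a hypothetical helper, and the sample fragment (with its misspelled device name "acp") is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: hp1 refers to "acp", which is not defined anywhere.
SAMPLE = """
<cluster name="example" config_version="1">
  <fencedevices>
    <fencedevice agent="fence_apc_snmp" ipaddr="192.0.2.30" name="apc"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="hp4" votes="1" nodeid="1">
      <fence><method name="power"><device name="apc" port="4"/></method></fence>
    </clusternode>
    <clusternode name="hp1" votes="1" nodeid="2">
      <fence><method name="power"><device name="acp" port="1"/></method></fence>
    </clusternode>
  </clusternodes>
</cluster>
"""


def undefined_fence_devices(cluster_xml):
    """Return (node name, device name) pairs whose device has no matching fencedevice."""
    root = ET.fromstring(cluster_xml.strip())
    defined = {dev.get("name") for dev in root.iter("fencedevice")}
    missing = []
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in defined:
                missing.append((node.get("name"), dev.get("name")))
    return missing


# An empty list means every device reference resolves.
print(undefined_fence_devices(SAMPLE))
```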
Fencing using a managed switch
Prerequisites:
- A managed switch supporting SNMP
- Write access to the switch through SNMP
The idea behind this method is to isolate either the entire node or only its path to the shared storage. The fence agent tells the switch to disable one or more ports, which prevents the node from starting a VM or CT on the shared storage because no route to that storage remains. Restoring access requires operator intervention, either on the switch itself or by running the fence command with the option to enable the port(s) again. If the nodes use bonding, disable the bridge aggregation on the switch rather than the individual ports that are members of it.
The example shown here uses SNMPv2c without a password; instead, an ACL configured on the switch only allows members of the cluster VLAN to access the configured fencing group. The fence agent accepts either the index number or the name of a port.
List the known interfaces on the switch: fence_ifmib -o list -c <community> -a <IP> -n switch
Disable a specific interface on the switch: fence_ifmib --action=off -c <community> -a <IP> -n <index|name>
Enable a specific interface on the switch: fence_ifmib --action=on -c <community> -a <IP> -n <index|name>
Example:
<?xml version="1.0"?>
<cluster config_version="74" name="proxmox">
<cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<quorumd allow_kill="0" interval="3" label="proxmox1_qdisk" tko="10" votes="1">
<heuristic interval="3" program="ping $GATEWAY -c1 -w1" score="1" tko="4"/>
<heuristic interval="3" program="ip addr | grep eth1 | grep -q UP" score="2" tko="3"/>
</quorumd>
<totem token="54000"/>
<fencedevices>
<fencedevice agent="fence_ifmib" community="fencing" ipaddr="172.16.3.254" name="hp1910" snmp_version="2c"/>
</fencedevices>
<clusternodes>
<clusternode name="esx1" nodeid="1" votes="1">
<fence>
<method name="fence">
<device action="off" name="hp1910" port="Bridge-Aggregation2"/>
</method>
</fence>
</clusternode>
<clusternode name="esx2" nodeid="2" votes="1">
<fence>
<method name="fence">
<device action="off" name="hp1910" port="Bridge-Aggregation3"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<failoverdomains>
<failoverdomain name="webfailover" ordered="0" restricted="1">
<failoverdomainnode name="esx1"/>
<failoverdomainnode name="esx2"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="172.16.3.7" monitor_link="5"/>
</resources>
<service autostart="1" domain="webfailover" name="web" recovery="relocate">
<ip ref="172.16.3.7"/>
</service>
<pvevm autostart="1" vmid="109"/>
</rm>
</cluster>
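Because re-enabling a port is a manual step, it can help to generate the matching fence_ifmib command lines for each node from one place. The Python sketch below only builds the argument list from the flags shown in the commands above (--action, -c, -a, -n) and does not run anything. The function name fence_ifmib_argv is hypothetical; the values are taken from the example configuration.

```python
import shlex


def fence_ifmib_argv(action, community, ipaddr, port):
    """Build the fence_ifmib argument list for enabling or disabling one switch port."""
    if action not in ("on", "off"):
        raise ValueError("action must be 'on' or 'off'")
    return ["fence_ifmib", f"--action={action}", "-c", community, "-a", ipaddr, "-n", port]


# Command an operator could run to restore esx1's link after it was fenced.
argv = fence_ifmib_argv("on", "fencing", "172.16.3.254", "Bridge-Aggregation2")
print(shlex.join(argv))
```

Keeping the result as a list rather than a string means it could later be passed to subprocess.run without shell quoting concerns.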
Multiple methods for a node
Note: See also man fenced
In more advanced configurations, multiple fencing methods can be defined for a node. If fencing fails using the first method, fenced will try the next method, and continue to cycle through methods until one succeeds.
<clusternode name="node1" nodeid="1">
<fence>
<method name="1">
<device name="myswitch" foo="x"/>
</method>
<method name="2">
<device name="another" bar="123"/>
</method>
</fence>
</clusternode>
<fencedevices>
<fencedevice name="myswitch" agent="..." something="..."/>
<fencedevice name="another" agent="..."/>
</fencedevices>
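The fallback order described above can be pictured with a small Python model. This is not fenced's code, only an illustration of "try each method in turn, then start over": the function fence_with_methods, the run_agent callback and the max_rounds limit are all assumptions made so that the sketch terminates; see man fenced for the daemon's real retry behaviour.

```python
def fence_with_methods(methods, run_agent, max_rounds=3):
    """Try each method in order and return the name of the first that succeeds.

    methods   -- list of (method name, [device name, ...]) in cluster.conf order
    run_agent -- callable(device name) -> bool, stands in for running a fence agent
    max_rounds -- assumed limit on full passes over the method list
    """
    for _ in range(max_rounds):
        for name, devices in methods:
            # A method counts as successful only if all of its devices succeed.
            if all(run_agent(device) for device in devices):
                return name
    return None


# Model of the example: "myswitch" is unreachable, "another" works.
methods = [("1", ["myswitch"]), ("2", ["another"])]
print(fence_with_methods(methods, lambda device: device == "another"))
```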
Dual path, redundant power
Note: See also man fenced
Sometimes fencing a node requires disabling two power ports or two i/o paths. This is done by specifying two or more devices within a method. fenced will run the agent for the device twice, once for each device line, and both must succeed for fencing to be considered successful.
<clusternode name="node1" nodeid="1">
<fence>
<method name="1">
<device name="sanswitch1" port="11"/>
<device name="sanswitch2" port="11"/>
</method>
</fence>
</clusternode>
When using power switches to fence nodes with dual power supplies, the agents must be told to turn off both power ports before restoring power to either port. The default off-on behavior of the agent could result in the power never being fully disabled to the node.
<clusternode name="node1" nodeid="1">
<fence>
<method name="1">
<device name="nps1" port="11" action="off"/>
<device name="nps2" port="11" action="off"/>
<device name="nps1" port="11" action="on"/>
<device name="nps2" port="11" action="on"/>
</method>
</fence>
</clusternode>
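The ordering requirement (all "off" actions before any "on" action) is easy to get wrong when editing by hand. As a rough sanity check, the Python sketch below parses a single method element and reports whether any "off" line follows an "on" line. The helper name off_before_on is hypothetical and the check covers only this ordering rule.

```python
import xml.etree.ElementTree as ET

# The method from the example above.
METHOD = """
<method name="1">
  <device name="nps1" port="11" action="off"/>
  <device name="nps2" port="11" action="off"/>
  <device name="nps1" port="11" action="on"/>
  <device name="nps2" port="11" action="on"/>
</method>
"""


def off_before_on(method_xml):
    """Return True if no action="off" device appears after an action="on" device."""
    seen_on = False
    for device in ET.fromstring(method_xml.strip()).iter("device"):
        action = device.get("action")
        if action == "on":
            seen_on = True
        elif action == "off" and seen_on:
            return False  # power would be restored on one feed before the other is cut
    return True


print(off_before_on(METHOD))
```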
Test fencing
Before you use the fencing device, make sure that it works as expected:
Display internal fenced state:
fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 3
wait state none
members 2 3 4
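If you want to check this state from a script, for example to confirm that no node is waiting to be fenced before you start a test, the listing can be parsed with a few lines of Python. This sketch assumes the output looks exactly like the listing above; the key names are copied from it, and parse_fence_tool_ls is a hypothetical helper, not part of the cluster tools.

```python
# Field names as they appear in the listing above.
KEYS = ("member count", "victim count", "victim now", "master nodeid", "wait state", "members")

# Sample text copied from the fence_tool ls output shown above.
SAMPLE = """fence domain
member count 3
victim count 0
victim now 0
master nodeid 3
wait state none
members 2 3 4
"""


def parse_fence_tool_ls(text):
    """Map each known field name to the list of values that follow it on its line."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        for key in KEYS:
            if line.startswith(key + " "):
                state[key] = line[len(key):].split()
                break
    return state


state = parse_fence_tool_ls(SAMPLE)
# No pending victims and no wait state suggests the fence domain is idle.
print(state["victim count"] == ["0"] and state["wait state"] == ["none"])
```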
Test fencing with fence_node:
fence_node NODENAME -vv
Where NODENAME is from the node definition line:
<clusternode name="NODENAME" nodeid="1" votes="1">
You should get a "success" here and your machine powers off.
Repeat the command and the machine powers back on.
Testing APC Switch Rack PDU
In my example configuration, the AP7921 uses the IP 192.168.2.30.
Query the status of power supply:
fence_apc -x -l hpapc -p 12345678 -a 192.168.2.30 -o status -n 1 -v
Reboot the server using fence_apc:
fence_apc -x -l hpapc -p 12345678 -a 192.168.2.30 -o reboot -n 1 -v
Test fencing with fence_node:
fence_node NODENAME -vv
You should get a "success" here.
Testing Dell iDRAC
iDRAC[5,6,7] use the fence_drac5 agent, as indicated in the Dell settings above.
Test on the command line of another server using:
fence_drac5 --ip="10.1.1.2" --username="prox-b-drac" --password="****" --ssh --verbose --debug-file="/tmp/foo" --command-prompt="admin1->" --action="off"
Check the /tmp/foo file for connection logs.
- Can you ssh into the iDRAC using the given username/password?
- Is ssh enabled within iDRAC management? (Overview > iDRAC preferences > iDRAC Settings > Network > Services > ssh)
nc -zv [ipOf_iDRAC] 22
- Are you trying to connect to the iDRAC management port or the Proxmox IP address?
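The nc check above can also be done from Python, which is convenient if you want to test several iDRAC addresses in one go. This is a minimal sketch that only tests whether a TCP connection to the port can be opened; it does not log in. The function name ssh_port_open and the 3 second timeout are assumptions.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, ssh_port_open("10.1.1.2") would check the iDRAC address used in the fence_drac5 command above.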