Introduction

To ensure data integrity, only one node is allowed to run a VM or any other cluster-service at a time. The use of power switches in the hardware configuration enables a node to power-cycle another node before restarting that node's HA services during a fail-over process. This prevents two nodes from simultaneously accessing the same data and corrupting it. Fence devices are used to guarantee data integrity under all failure conditions.

For a good, easy introduction to HA fencing concepts, see: http://www.clusterlabs.org/doc/crm_fencing.html and also http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_Fence_Devices/index.html

Read only the general, introductory parts of these documents, as they refer to other HA software, not PVE.

Configure nodes to boot immediately and always after power cycle

Check your BIOS settings and test whether this works. Just unplug the power cord and verify that the server boots up after you reconnect it.

If you use integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing - here are the different options:

make sure that acpid is not installed (remove with: aptitude remove acpid)
disable ACPI soft-off in the BIOS
disable ACPI by adding acpi=off to the kernel boot command line

In any case, you need to make sure that the node turns off immediately when fenced. If you have delays here, the HA resources cannot be moved.
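
If you chose the kernel command line option, one way to confirm it took effect is to look at the parameters the running kernel was booted with. The following is a minimal Python sketch, assuming a Linux node where /proc/cmdline is readable; the helper name acpi_disabled and the sample command line are hypothetical and not part of any PVE tool.

```python
def acpi_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains the acpi=off parameter."""
    # Kernel parameters are separated by whitespace, so compare whole tokens.
    return "acpi=off" in cmdline.split()


# Made-up sample command line for illustration. On a real node you would pass
# the contents of /proc/cmdline instead, e.g. open("/proc/cmdline").read().
sample = "BOOT_IMAGE=/vmlinuz root=/dev/mapper/pve-root ro quiet acpi=off"
print("acpi=off present:", acpi_disabled(sample))
```

This only covers the third option; the acpid and BIOS options still have to be checked by hand.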

Enable fencing on all nodes

In order to get fencing active, you also need to join each node to the fencing domain. Do the following on all your cluster nodes.

Enable fencing in /etc/default/redhat-cluster-pve (Just uncomment the last line, see below):

nano /etc/default/redhat-cluster-pve

FENCE_JOIN="yes"

Restart the cman service:

/etc/init.d/cman restart

Join the fence domain with:

fence_tool join

To check the status, just run (this example shows all 3 nodes already joined):

fence_tool ls

fence domain
member count  3
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 3
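
On larger clusters it can be handy to check the member list automatically. The following is a minimal Python sketch that parses text in the format shown above and reports node IDs that have not joined yet. It assumes the output layout is exactly as in this example; the function names parse_fence_domain and missing_members are hypothetical.

```python
def parse_fence_domain(output: str) -> dict:
    """Parse the lines printed by `fence_tool ls` into a dictionary."""
    info = {}
    for line in output.splitlines():
        parts = line.split()
        if not parts or parts == ["fence", "domain"]:
            continue  # skip blank lines and the heading
        if parts[0] == "members":
            # "members 1 2 3" -> list of integer node IDs
            info["members"] = [int(n) for n in parts[1:]]
        else:
            # All other keys consist of two words, e.g. "member count".
            info[" ".join(parts[:2])] = parts[2] if len(parts) > 2 else ""
    return info


def missing_members(output: str, expected_ids) -> set:
    """Return the expected node IDs that are absent from the fence domain."""
    return set(expected_ids) - set(parse_fence_domain(output).get("members", []))


SAMPLE = """fence domain
member count  3
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 3
"""

print(missing_members(SAMPLE, {1, 2, 3}))
```

An empty set means all expected nodes have joined. On a node, the text could come from running fence_tool ls through subprocess instead of the SAMPLE string.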

Note: If the cluster goes out of sync after you have completed the join and restarted the cman service on all nodes, you must also restart the pve-cluster service on all nodes:

service pve-cluster restart

General HowTo for editing the cluster.conf

First, create a copy of the current cluster.conf, make the needed changes, increase the config_version number, and check the syntax. Once everything is ready, activate the new config via the GUI.

Here are the steps:

cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new

nano /etc/pve/cluster.conf.new

If you edit this file via the CLI, you ALWAYS need to increase the "config_version" number. This guarantees that all nodes apply the new settings.

You should validate the config with the following command:

ccs_config_validate -v -f /etc/pve/cluster.conf.new

To apply the new config, go to the web interface (Datacenter/HA). There you can review the changes and, if the syntax is OK, commit them to all nodes via the GUI. All nodes then receive the new config and apply it automatically.
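
Forgetting to increase config_version is an easy mistake, so the bump can be scripted. The following is a minimal Python sketch, not an official PVE tool. It assumes the attribute is written with double quotes as in the examples on this page, and it uses a regular expression instead of an XML parser so that the rest of the file stays byte-for-byte unchanged. The function name bump_config_version is hypothetical.

```python
import re


def bump_config_version(conf_text: str) -> str:
    """Return conf_text with the first config_version attribute increased by one."""
    pattern = re.compile(r'(config_version=")(\d+)(")')

    def _increment(match):
        # Keep the attribute name and quotes, replace only the number.
        return f"{match.group(1)}{int(match.group(2)) + 1}{match.group(3)}"

    new_text, count = pattern.subn(_increment, conf_text, count=1)
    if count != 1:
        raise ValueError("no config_version attribute found")
    return new_text


header = '<cluster name="hpcluster765" config_version="28">'
print(bump_config_version(header))
```

To use it on a node, read /etc/pve/cluster.conf.new, pass the text through the function, write it back, and then run the validation command as described above.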

List of supported fence devices

APC Switch Rack PDU

For example, the AP7921. Here is an example used in our test lab.

Create a user on the APC web interface

I just configured a new user via "Outlet User Management"

user name: hpapc
password: 12345678

Make sure that you enable "Outlet Access" and SSH and, most importantly, that the physical servers are connected to the correct power outlets.

Example /etc/pve/cluster.conf.new with APC power fencing

This example uses the APC power switch as the fencing device (make sure you enabled SSH on your APC). Additionally, a simple "TestIP" is used for HA service and fail-over testing.

cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new

nano /etc/pve/cluster.conf.new

<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.2.30" login="hpapc" name="apc" passwd="12345678" power_wait="10"/>
  </fencedevices>

  <clusternodes>

  <clusternode name="hp4" votes="1" nodeid="1">
    <fence>
      <method name="power">
        <device name="apc" port="4" secure="on"/>
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp1" votes="1" nodeid="2">
    <fence>
      <method name="power">
        <device name="apc" port="1" secure="on"/>
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp3" votes="1" nodeid="3">
    <fence>
      <method name="power">
        <device name="apc" port="3" secure="on"/>
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp2" votes="1" nodeid="4">
    <fence>
      <method name="power">
        <device name="apc" port="2" secure="on"/>
      </method>
    </fence>
  </clusternode>

  </clusternodes>

  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="192.168.7.180"/>
    </service>
  </rm>

</cluster>

Note

If you edit this file via the CLI, you ALWAYS need to increase the "config_version" number. This guarantees that all nodes apply the new settings.

You should validate the config with the following command:

ccs_config_validate -v -f /etc/pve/cluster.conf.new

To apply the new config, go to the web interface (Datacenter/HA). There you can review the changes and, if the syntax is OK, commit them to all nodes via the GUI. All nodes then receive the new config and apply it automatically.

The power_wait option specifies how many seconds to wait after issuing a power action. Without it, the server is turned off and then on in quick succession. Setting it ensures that the server stays off for a certain amount of time before being turned back on, resulting in more reliable fencing.
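
Before committing a config like the APC example above, it can help to check that every node has a fence device and that each device name matches a declared fencedevice entry. ccs_config_validate checks the schema, but a small script makes this particular cross-reference explicit. The following is a minimal Python sketch; the function name check_fencing is hypothetical, and the shortened sample contains a deliberate typo ("apcc") to show what a finding looks like.

```python
import xml.etree.ElementTree as ET


def check_fencing(conf_text: str) -> list:
    """Return a list of human-readable problems found in the fencing setup."""
    root = ET.fromstring(conf_text)
    declared = {dev.get("name") for dev in root.iter("fencedevice")}
    problems = []
    for node in root.iter("clusternode"):
        used = [dev.get("name") for dev in node.iter("device")]
        if not used:
            problems.append(f"{node.get('name')}: no fence device configured")
        for name in used:
            if name not in declared:
                problems.append(f"{node.get('name')}: device '{name}' is not declared")
    return problems


# Shortened variant of the APC example; "apcc" is an intentional typo.
SAMPLE = """<cluster name="hpcluster765" config_version="28">
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.2.30" login="hpapc" name="apc" passwd="12345678" power_wait="10"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="hp4" votes="1" nodeid="1">
      <fence><method name="power"><device name="apc" port="4" secure="on"/></method></fence>
    </clusternode>
    <clusternode name="hp1" votes="1" nodeid="2">
      <fence><method name="power"><device name="apcc" port="1" secure="on"/></method></fence>
    </clusternode>
  </clusternodes>
</cluster>"""

for problem in check_fencing(SAMPLE):
    print(problem)
```

An empty result means every node references a declared fence device. It does not prove that the port numbers match the physical cabling.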

Intel Modular Server HA

Dell servers

Dell iDRAC cards can be used as fencing devices:

Rather than providing your root or Admin login credentials to the iDRAC, you can create a fencing account on each iDRAC with 'Operator' permissions.
Although the iDRAC network is usually a private, secure network, unique passwords for each machine can be entered in the configuration below.

Configure your fence user under iDRAC User Authentication and add Operator status for iDRAC, LAN, and Serial Port.
Set IPMI User Privileges to Operator and check Enable Serial Over LAN.

Your Proxmox hosts need network access via SSH to your Dell iDRAC cards.
See Testing Dell Servers to verify your syntax.

This config was tested with DRAC V7 cards.

<?xml version="1.0"?>
<cluster name="peR620" config_version="28">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="root" name="node1-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="root" name="node2-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="root" name="node3-drac" passwd="XXXX" secure="1"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node2-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node3-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>

For Dell iDRAC5 cards you can use basically the same config as for DRAC7, but you need to change the fencedevice entries to:

  <fencedevices>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node1-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node2-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node3-drac" passwd="XXXX" secure="1"/>
  </fencedevices>

Dell blade servers

PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts. You configure a single IP address on the CMC, and connect to that IP for management. Individual blade slots can be powered up or down as needed.

NOTE: At the time of this writing, there is a bug that prevents the CMC from powering the blade back up after it is fenced. To recover from a fenced outage, manually power the blade on (or connect to the CMC and issue the command racadm serveraction -m server-# powerup). New code available for testing can correct this behavior. See Bug 466788 for beta code and further discussions on this issue.

NOTE: Using the individual iDRAC on each Dell Blade is not supported at this time. Instead use the Dell CMC as described in this section. If desired, you may configure IPMI as your secondary fencing method for individual Dell Blades. For information on support of the Dell iDRAC, see Bug 496748.

To configure your nodes for DRAC CMC fencing:

For CMC IP Address enter the DRAC CMC IP address.
Enter the specific blade for Module Name. For example, enter server-1 for blade 1, and server-4 for blade 4.

Example:

<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <fencedevices>
       <fencedevice agent="fence_drac5" module_name="server-1" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade1" passwd="drac_password"/>
       <fencedevice agent="fence_drac5" module_name="server-2" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade2" passwd="drac_password"/>
       <fencedevice agent="fence_drac5" module_name="server-3" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade3" passwd="drac_password"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="drac-cmc-blade1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="drac-cmc-blade2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="drac-cmc-blade3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="192.168.7.180"/>
    </service>
  </rm>
</cluster>
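
Because all blades are fenced through the same CMC address, the module_name attribute is the only thing that distinguishes one blade from another. A copy-and-paste slip there would make two fence devices act on the same slot. The following is a minimal Python sketch that flags such duplicates; the function name duplicate_blade_slots is hypothetical, and the sample deliberately repeats "server-2" to show a finding.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_blade_slots(conf_text: str) -> dict:
    """Map (ipaddr, module_name) to the fencedevice names that share that slot."""
    root = ET.fromstring(conf_text)
    by_slot = defaultdict(list)
    for dev in root.iter("fencedevice"):
        module = dev.get("module_name")
        if module is not None:
            by_slot[(dev.get("ipaddr"), module)].append(dev.get("name"))
    # Only slots claimed by more than one fence device are a problem.
    return {slot: names for slot, names in by_slot.items() if len(names) > 1}


# Sample with an intentional duplicate; X.X.X.X stands for the CMC address.
SAMPLE = """<cluster name="hpcluster765" config_version="28">
  <fencedevices>
    <fencedevice agent="fence_drac5" module_name="server-1" ipaddr="X.X.X.X" name="drac-cmc-blade1"/>
    <fencedevice agent="fence_drac5" module_name="server-2" ipaddr="X.X.X.X" name="drac-cmc-blade2"/>
    <fencedevice agent="fence_drac5" module_name="server-2" ipaddr="X.X.X.X" name="drac-cmc-blade3"/>
  </fencedevices>
</cluster>"""

print(duplicate_blade_slots(SAMPLE))
```

An empty dictionary means every fence device points at its own blade slot.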

IPMI (generic)

This is a generic method for IPMI.

The ipmitool package is needed on all nodes (as of 2013-07-02); see the notes at the end of this section.

aptitude install ipmitool

<?xml version="1.0"?>
<cluster name="clustername" config_version="6">
    <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
    </cman>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" name="ipmi1" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
        <fencedevice agent="fence_ipmilan" name="ipmi2" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
        <fencedevice agent="fence_ipmilan" name="ipmi3" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
    </fencedevices>
    <clusternodes>
        <clusternode name="host1" votes="1" nodeid="1">
            <fence>
                <method name="1">
                    <device name="ipmi1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="host2" votes="1" nodeid="2">
            <fence>
                <method name="1">
                    <device name="ipmi2"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="host3" votes="1" nodeid="3">
            <fence>
                <method name="1">
                    <device name="ipmi3"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <rm>
        <service autostart="1" exclusive="0" name="ha_test_ip" recovery="relocate">
            <ip address="192.168.7.180"/>
        </service>
    </rm>
</cluster>

IPMI notes

After setting up IPMI in cluster.conf, I tested and got this:

fbc3  ~ # fence_node fbc240 -vv
fence fbc240 dev 0.0 agent fence_ipmilan result: error from agent
agent args: nodename=fbc240 agent=fence_ipmilan lanplus=1 ipaddr=10.1.10.173 login=**** passwd=****** power_wait=5 
fence fbc240 failed

which was solved with

aptitude install ipmitool

then:

fbc3  ~ # fence_node fbc240 -vv
fence fbc240 dev 0.0 agent fence_ipmilan result: success
agent args: nodename=fbc240 agent=fence_ipmilan lanplus=1 ipaddr=10.1.10.173 login=***** passwd=*** power_wait=5 
fence fbc240 success

The above was tested on a SuperMicro system.
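
Since the "error from agent" message above gives no hint that a package is missing, a quick pre-flight check on each node can save time. The following is a minimal Python sketch that reports which of the required programs cannot be found on the PATH. The function name missing_tools is hypothetical, and it assumes that both ipmitool and fence_ipmilan are installed as executables on the PATH of the user running the check.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in tools that cannot be found on the PATH."""
    # The lookup function is a parameter so the check can be tested
    # without the tools being installed.
    return [name for name in tools if which(name) is None]


print("missing:", missing_tools(["ipmitool", "fence_ipmilan"]))
```

An empty list means both programs were found; it says nothing about whether the IPMI credentials or the network path are correct.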

APC Master Switch

Some old APC PDUs do not support SSH and do not work with fence_apc. These older units do support SNMP, so the fence agent fence_apc_snmp can be used instead.

<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <fencedevices>
    <fencedevice agent="fence_apc_snmp" ipaddr="192.168.2.30" name="apc" community="12345678" power_wait="10"/>

  </fencedevices>

  <clusternodes>

  <clusternode name="hp4" votes="1" nodeid="1">
    <fence>
      <method name="power">
        <device name="apc" port="4" />
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp1" votes="1" nodeid="2">
    <fence>
      <method name="power">
        <device name="apc" port="1" />
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp3" votes="1" nodeid="3">
    <fence>
      <method name="power">
        <device name="apc" port="3" />
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp2" votes="1" nodeid="4">
    <fence>
      <method name="power">
        <device name="apc" port="2" />
      </method>
    </fence>
  </clusternode>

  </clusternodes>

  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="192.168.7.180"/>
    </service>
  </rm>

</cluster>

Fencing using a managed switch

Prerequisites:

A managed switch supporting SNMP
Write access to the switch through SNMP

The idea behind this method is to isolate either the entire node or only its path to the shared storage. The fence agent instructs the switch to disable one or more ports, which prevents the node from starting a VM or CT on the shared storage because no route to the storage exists any longer. Restoring access requires operator intervention, either on the switch itself or by running the fence command with the option to enable the port(s) again. If the nodes use bonding, you need to disable the bridge aggregation on the switch rather than the individual ports that are members of it.

The example shown here uses SNMPv2c without a password, relying instead on an ACL configured on the switch that only allows hosts on the cluster VLAN to access the configured fencing group. The fence agent accepts either an index number or a name for the ports.

See list of known interfaces on the switch: fence_ifmib -o list -c <community> -a <IP> -n switch

Disable a specific interface on the switch: fence_ifmib --action=off -c <community> -a <IP> -n <index|name>

Enable a specific interface on the switch: fence_ifmib --action=on -c <community> -a <IP> -n <index|name>
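
If you script these calls, for example to re-enable a port after maintenance, building the argument list in one place avoids typos and shell quoting problems. The following is a minimal Python sketch that only assembles the three command lines shown above and does not contact a switch. The function name ifmib_command is hypothetical; the community, address and port values are taken from the example configuration in this section.

```python
def ifmib_command(action: str, community: str, ipaddr: str, port: str) -> list:
    """Build the fence_ifmib argument list for the actions 'list', 'on' or 'off'."""
    if action not in ("list", "on", "off"):
        raise ValueError(f"unsupported action: {action}")
    if action == "list":
        # Mirrors: fence_ifmib -o list -c <community> -a <IP> -n switch
        return ["fence_ifmib", "-o", "list", "-c", community, "-a", ipaddr, "-n", port]
    # Mirrors: fence_ifmib --action=off|on -c <community> -a <IP> -n <index|name>
    return ["fence_ifmib", f"--action={action}", "-c", community, "-a", ipaddr, "-n", port]


cmd = ifmib_command("on", "fencing", "172.16.3.254", "Bridge-Aggregation2")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run as is, so port names never pass through a shell.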

Example:

<?xml version="1.0"?>
<cluster config_version="74" name="proxmox">
 <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
 <quorumd allow_kill="0" interval="3" label="proxmox1_qdisk" tko="10" votes="1">
   <heuristic interval="3" program="ping $GATEWAY -c1 -w1" score="1" tko="4"/>
   <heuristic interval="3" program="ip addr | grep eth1 | grep -q UP" score="2" tko="3"/>
 </quorumd>
 <totem token="54000"/>
 <fencedevices>
   <fencedevice agent="fence_ifmib" community="fencing" ipaddr="172.16.3.254" name="hp1910" snmp_version="2c"/>
 </fencedevices>
 <clusternodes>
   <clusternode name="esx1" nodeid="1" votes="1">
     <fence>
       <method name="fence">
         <device action="off" name="hp1910" port="Bridge-Aggregation2"/>
       </method>
     </fence>
   </clusternode>
   <clusternode name="esx2" nodeid="2" votes="1">
     <fence>
       <method name="fence">
         <device action="off" name="hp1910" port="Bridge-Aggregation3"/>
       </method>
     </fence>
   </clusternode>
 </clusternodes>
 <rm>
   <failoverdomains>
     <failoverdomain name="webfailover" ordered="0" restricted="1">
       <failoverdomainnode name="esx1"/>
       <failoverdomainnode name="esx2"/>
     </failoverdomain>
   </failoverdomains>
   <resources>
     <ip address="172.16.3.7" monitor_link="5"/>
   </resources>
   <service autostart="1" domain="webfailover" name="web" recovery="relocate">
     <ip ref="172.16.3.7"/>
   </service>
   <pvevm autostart="1" vmid="109"/>
 </rm>
</cluster>

Multiple methods for a node

Note: See also man fenced

In more advanced configurations, multiple fencing methods can be defined for a node. If fencing fails using the first method, fenced will try the next method, and continue to cycle through methods until one succeeds.

       <clusternode name="node1" nodeid="1">
               <fence>
               <method name="1">
               <device name="myswitch" foo="x"/>
               </method>
               <method name="2">
               <device name="another" bar="123"/>
               </method>
               </fence>
       </clusternode>

       <fencedevices>
               <fencedevice name="myswitch" agent="..." something="..."/>
               <fencedevice name="another" agent="..."/>
       </fencedevices>
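
The fallback order can be pictured with a small simulation. The following Python sketch is an illustration of the behaviour described above, not the fenced implementation. It makes a single pass over the methods, whereas fenced keeps cycling until one succeeds. The names fence_with_fallback and run_device are hypothetical; run_device stands in for running a fence agent.

```python
def fence_with_fallback(methods, run_device):
    """Try each method in order; return the name of the first that succeeds, else None.

    methods is a list of (method_name, [device_name, ...]) tuples.
    run_device(device_name) returns True if the agent reported success.
    """
    for name, devices in methods:
        # A method counts as successful only if every device in it succeeds.
        # all() stops at the first failure, so later devices are not run.
        if all(run_device(dev) for dev in devices):
            return name
    return None


# Same layout as the snippet above: method "1" uses myswitch, method "2" uses another.
methods = [("1", ["myswitch"]), ("2", ["another"])]

# Pretend the first device is unreachable and the second one works.
result = fence_with_fallback(methods, lambda dev: dev == "another")
print("fenced via method:", result)
```

Here the simulated "myswitch" fails, so the node ends up fenced through method "2".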

Dual path, redundant power

Note: See also man fenced

Sometimes fencing a node requires disabling two power ports or two i/o paths. This is done by specifying two or more devices within a method. fenced will run the agent for the device twice, once for each device line, and both must succeed for fencing to be considered successful.

       <clusternode name="node1" nodeid="1">
               <fence>
               <method name="1">
               <device name="sanswitch1" port="11"/>
               <device name="sanswitch2" port="11"/>
               </method>
               </fence>
       </clusternode>

When using power switches to fence nodes with dual power supplies, the agents must be told to turn off both power ports before restoring power to either port. The default off-on behavior of the agent could result in the power never being fully disabled to the node.

       <clusternode name="node1" nodeid="1">
               <fence>
               <method name="1">
               <device name="nps1" port="11" action="off"/>
               <device name="nps2" port="11" action="off"/>
               <device name="nps1" port="11" action="on"/>
               <device name="nps2" port="11" action="on"/>
               </method>
               </fence>
       </clusternode>
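
The order of the device lines matters here, and it is easy to get wrong when editing by hand. The following is a minimal Python sketch that checks whether, inside each method, all action="off" lines come before the first action="on" line. The function name power_order_problems is hypothetical; the sample is the snippet above.

```python
import xml.etree.ElementTree as ET


def power_order_problems(conf_text: str) -> list:
    """Return (node, method) pairs where an 'on' device precedes an 'off' device."""
    root = ET.fromstring(conf_text)
    problems = []
    for node in root.iter("clusternode"):
        for method in node.iter("method"):
            actions = [dev.get("action") for dev in method.findall("device")]
            if "on" in actions and "off" in actions:
                first_on = actions.index("on")
                last_off = len(actions) - 1 - actions[::-1].index("off")
                if first_on < last_off:
                    problems.append((node.get("name"), method.get("name")))
    return problems


SAMPLE = """<clusternode name="node1" nodeid="1">
  <fence>
    <method name="1">
      <device name="nps1" port="11" action="off"/>
      <device name="nps2" port="11" action="off"/>
      <device name="nps1" port="11" action="on"/>
      <device name="nps2" port="11" action="on"/>
    </method>
  </fence>
</clusternode>"""

print(power_order_problems(SAMPLE))
```

An empty list means no method restores power on one port while another port is still waiting to be switched off. Methods without explicit action attributes are ignored by this check.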

Test fencing

Before you use the fencing device, make sure that it works as expected. In my example configuration, the AP7921 uses the IP 192.168.2.30:

Display internal fenced state:

fence_tool ls

  fence domain
  member count  3
  victim count  0
  victim now    0
  master nodeid 3
  wait state    none
  members       2 3 4

Query the status of the power outlet:

fence_apc -x -l hpapc -p 12345678 -a 192.168.2.30 -o status -n 1 -v

Reboot the server using fence_apc:

fence_apc -x -l hpapc -p 12345678 -a 192.168.2.30 -o reboot -n 1 -v

Test fencing with fence_node:

fence_node NODENAME -vv

You should get a "success" here.
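
When testing several nodes in a row, the verdict can be read from the command output by a script. The following is a minimal Python sketch that looks for the final "fence NODENAME success" or "fence NODENAME failed" line. It assumes the output format matches the fence_node -vv output shown in the IPMI notes on this page; the function name fence_result is hypothetical.

```python
def fence_result(output: str, node: str):
    """Return True on success, False on failure, None if no verdict line is found."""
    for line in reversed(output.splitlines()):
        words = line.split()
        # The verdict line has exactly three words: fence <node> <result>
        if len(words) == 3 and words[0] == "fence" and words[1] == node:
            if words[2] == "success":
                return True
            if words[2] == "failed":
                return False
    return None


# Output copied from the IPMI notes earlier on this page.
SAMPLE = """fence fbc240 dev 0.0 agent fence_ipmilan result: success
agent args: nodename=fbc240 agent=fence_ipmilan lanplus=1 ipaddr=10.1.10.173 login=***** passwd=*** power_wait=5
fence fbc240 success
"""

print(fence_result(SAMPLE, "fbc240"))
```

A return value of None means the expected line was not found, which should be treated as a failed test.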

Testing Dell iDRAC