Revision as of 08:35, 4 October 2012

Note: Article about Proxmox VE 2.0

Introduction

To ensure data integrity, only one node is allowed to run a VM or any other cluster-service at a time. The use of power switches in the hardware configuration enables a node to power-cycle another node before restarting that node's HA services during a fail-over process. This prevents two nodes from simultaneously accessing the same data and corrupting it. Fence devices are used to guarantee data integrity under all failure conditions.

For a good, easy introduction to HA fencing concepts, see: http://www.clusterlabs.org/doc/crm_fencing.html and also http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_Fence_Devices/index.html

Read only the general, introductory parts of these documents, as they describe other HA software, not Proxmox VE.

Configure nodes to boot immediately and always after power cycle

Check your BIOS settings and test whether it works. Just unplug the power cord and check that the server boots up after reconnecting.

If you use integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing - here are the different options:

make sure that acpid is not installed (remove with: aptitude remove acpid)
disable ACPI soft-off in the BIOS
disable ACPI by adding acpi=off to the kernel boot command line

In any case, you need to make sure that the node turns off immediately when fenced. If you have delays here, the HA resources cannot be moved.
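The first and third options above can be checked from a shell. The following is a minimal sketch, assuming a Debian-based node where dpkg is available; the function names has_acpi_off and acpid_installed are made up for this example and are not part of Proxmox VE.

```shell
#!/bin/sh
# Sketch: succeed if the kernel command line contains acpi=off.
# Reads /proc/cmdline by default; another file can be passed for testing.
has_acpi_off() {
    cmdline_file="${1:-/proc/cmdline}"
    # Split on spaces and match acpi=off only as a whole word.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'acpi=off'
}

# Sketch: succeed if the acpid package is installed (dpkg assumed).
acpid_installed() {
    dpkg -s acpid 2>/dev/null | grep -q '^Status: install ok installed'
}
```

For example, "has_acpi_off && echo ACPI disabled" reports the state of the running kernel, and "acpid_installed && echo remove acpid" flags a node that still has the daemon installed.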

Enable fencing on all nodes

In order to get fencing active, you also need to join each node to the fencing domain. Do the following on all your cluster nodes.

Enable fencing in /etc/default/redhat-cluster-pve (Just uncomment the last line, see below):

nano /etc/default/redhat-cluster-pve

FENCE_JOIN="yes"

Join the fence domain with:

fence_tool join
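When several nodes have to be prepared, the uncomment step can be scripted instead of done in an editor. This is only a sketch: the function name enable_fence_join is hypothetical, and it assumes the setting appears as a (possibly commented) FENCE_JOIN= line, as in the file shown above.

```shell
#!/bin/sh
# Sketch: uncomment (or append) FENCE_JOIN="yes" in a defaults file.
# The path is an argument so the function can be tried on a copy first.
enable_fence_join() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    if grep -q '^[#[:space:]]*FENCE_JOIN=' "$conf"; then
        # Replace the existing, possibly commented, line.
        sed 's/^[#[:space:]]*FENCE_JOIN=.*/FENCE_JOIN="yes"/' "$conf" > "$tmp"
    else
        # No such line yet: keep the content and append the setting.
        cat "$conf" > "$tmp"
        echo 'FENCE_JOIN="yes"' >> "$tmp"
    fi
    # Write back through cat so the file keeps its owner and mode.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

On a node this would be called as: enable_fence_join /etc/default/redhat-cluster-pve, followed by fence_tool join.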

To check the status, just run (this example shows all 3 nodes already joined):

fence_tool ls

fence domain
member count  3
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 3
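The status output above can also be checked by a script, for example in a monitoring job. The sketch below assumes the output layout shown in this example ("member count" and "victim count" lines); the function name fence_domain_ok is made up for illustration.

```shell
#!/bin/sh
# Sketch: read "fence_tool ls" output on stdin and succeed only if the
# expected number of members has joined and no node is waiting to be fenced.
fence_domain_ok() {
    expected="$1"
    awk -v want="$expected" '
        BEGIN { members = -1; victims = -1 }
        $1 == "member" && $2 == "count" { members = $3 }
        $1 == "victim" && $2 == "count" { victims = $3 }
        END { exit !(members == want && victims == 0) }
    '
}
```

On a three-node cluster it could be used as: fence_tool ls | fence_domain_ok 3 || echo "fence domain incomplete".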

List of supported fence devices

APC Switch Rack PDU

For example, the AP7921. Here is an example used in our test lab.

Create a user on the APC web interface

I just configured a new user via "Outlet User Management"

user name: hpapc
password: 12345678

Make sure that you enable "Outlet Access" and SSH. Most importantly, make sure that each physical server is connected to the outlet that its fence configuration refers to.

Example /etc/pve/cluster.conf.new with APC power fencing

This example uses the APC power switch as fencing device (make sure you enabled SSH on your APC). Additionally, a simple "TestIP" is used for HA service and fail-over testing.

cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new

nano /etc/pve/cluster.conf.new

<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.2.30" login="hpapc" name="apc" passwd="12345678" power_wait="10"/>
  </fencedevices>

  <clusternodes>

  <clusternode name="hp4" votes="1" nodeid="1">
    <fence>
      <method name="power">
        <device name="apc" port="4" secure="on"/>
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp1" votes="1" nodeid="2">
    <fence>
      <method name="power">
        <device name="apc" port="1" secure="on"/>
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp3" votes="1" nodeid="3">
    <fence>
      <method name="power">
        <device name="apc" port="3" secure="on"/>
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp2" votes="1" nodeid="4">
    <fence>
      <method name="power">
        <device name="apc" port="2" secure="on"/>
      </method>
    </fence>
  </clusternode>

  </clusternodes>

  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="192.168.7.180"/>
    </service>
  </rm>

</cluster>

Note

If you edit this file via the CLI, you ALWAYS need to increase the "config_version" number. This guarantees that all nodes apply the new settings.
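Forgetting to increase the version number is an easy mistake, so the step can be scripted. The following sketch prints a copy of the config with config_version increased by one; the function name bump_config_version is hypothetical, and it assumes that the cluster element and its config_version attribute are on a single line, as in the examples on this page.

```shell
#!/bin/sh
# Sketch: print a cluster.conf with config_version increased by one.
# The input file is not modified; redirect the output to the .new file.
bump_config_version() {
    conf="$1"
    cur="$(sed -n 's/.*<cluster [^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)"
    [ -n "$cur" ] || { echo "config_version not found in $conf" >&2; return 1; }
    next=$((cur + 1))
    # Rewrite only the attribute on the <cluster ...> element.
    sed "s/\(<cluster [^>]*config_version=\"\)$cur\"/\1$next\"/" "$conf"
}
```

This replaces the plain cp step: bump_config_version /etc/pve/cluster.conf > /etc/pve/cluster.conf.new, after which the new file is edited as usual.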

You should validate the config with the following command:

ccs_config_validate -v -f /etc/pve/cluster.conf.new
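In addition to the validation command, a quick cross-check can catch a mistyped device name, that is, a device entry under a node that refers to a fencedevice that is not defined. This is a rough sketch and not a replacement for ccs_config_validate: the function name undefined_fence_devices is made up, and it assumes that every element's attributes sit on one line, as in the examples here.

```shell
#!/bin/sh
# Sketch: print every <device name="..."> that has no matching
# <fencedevice name="..."> in the given cluster.conf.
undefined_fence_devices() {
    conf="$1"
    # Names of all defined fence devices, one per line.
    defined="$(grep -o '<fencedevice [^>]*' "$conf" |
        sed -n 's/.*[[:space:]]name="\([^"]*\)".*/\1/p' | sort -u)"
    # Names referenced from the nodes; report those not defined.
    grep -o '<device [^>]*' "$conf" |
        sed -n 's/.*[[:space:]]name="\([^"]*\)".*/\1/p' | sort -u |
        while read -r name; do
            printf '%s\n' "$defined" | grep -qxF -- "$name" || echo "$name"
        done
}
```

Running undefined_fence_devices /etc/pve/cluster.conf.new should print nothing for a consistent config.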

To apply the new config, go to the web interface (Datacenter/HA). There you can review the changes and, if the syntax is OK, commit them to all nodes via the GUI. All nodes then receive the new config and apply it automatically.

The power_wait option specifies how many seconds to wait after issuing a power-off or power-on command. Without it, the server is turned off and then on again in quick succession. Setting it ensures that the server stays off for a certain amount of time before being turned back on, which results in more reliable fencing.

Intel Modular Server HA

Dell servers

You can use Dell DRAC cards as fencing devices.

Your Proxmox hosts need network access to your Dell DRAC cards via SSH.

This config was tested with DRAC V5 cards.

<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <fencedevices>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node1-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node2-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node3-drac" passwd="XXXX" secure="1"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node1-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node2-drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node3-drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="192.168.7.180"/>
    </service>
  </rm>
</cluster>

For Dell iDRAC6 cards, you can use essentially the same config as for DRAC5, but you need to change the lines

  <fencedevices>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node1-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node2-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" ipaddr="X.X.X.X" login="root" name="node3-drac" passwd="XXXX" secure="1"/>
  </fencedevices>

to

  <fencedevices>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="root" name="node1-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="root" name="node2-drac" passwd="XXXX" secure="1"/>
    <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="X.X.X.X" login="root" name="node3-drac" passwd="XXXX" secure="1"/>
  </fencedevices>

Dell blade servers

PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts. You configure a single IP address on the CMC, and connect to that IP for management. Individual blade slots can be powered up or down as needed.

NOTE: At the time of this writing, there is a bug that prevents the CMC from powering the blade back up after it is fenced. To recover from a fenced outage, manually power the blade on (or connect to the CMC and issue the command racadm serveraction -m server-# powerup). New code available for testing can correct this behavior. See Bug 466788 for beta code and further discussions on this issue.

NOTE: Using the individual iDRAC on each Dell Blade is not supported at this time. Instead use the Dell CMC as described in this section. If desired, you may configure IPMI as your secondary fencing method for individual Dell Blades. For information on support of the Dell iDRAC, see Bug 496748.

To configure your nodes for DRAC CMC fencing:

For CMC IP Address enter the DRAC CMC IP address.
Enter the specific blade for Module Name. For example, enter server-1 for blade 1, and server-4 for blade 4.

Example:

<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>
  <fencedevices>
    <fencedevice agent="fence_drac5" module_name="server-1" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade1" passwd="drac_password"/>
    <fencedevice agent="fence_drac5" module_name="server-2" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade2" passwd="drac_password"/>
    <fencedevice agent="fence_drac5" module_name="server-3" ipaddr="CMC IP Address (X.X.X.X)" login="root" secure="1" name="drac-cmc-blade3" passwd="drac_password"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="drac-cmc-blade1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="drac-cmc-blade2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="drac-cmc-blade3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="192.168.7.180"/>
    </service>
  </rm>
</cluster>

IPMI (generic)

This is a generic method for fencing via IPMI.

<?xml version="1.0"?>
<cluster name="clustername" config_version="6">
    <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
    </cman>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" name="ipmi1" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
        <fencedevice agent="fence_ipmilan" name="ipmi2" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
        <fencedevice agent="fence_ipmilan" name="ipmi3" lanplus="1" ipaddr="X.X.X.X" login="ipmiusername" passwd="ipmipassword" power_wait="5"/>
    </fencedevices>
    <clusternodes>
        <clusternode name="host1" votes="1" nodeid="1">
            <fence>
                <method name="1">
                    <device name="ipmi1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="host2" votes="1" nodeid="2">
            <fence>
                <method name="1">
                    <device name="ipmi2"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="host3" votes="1" nodeid="3">
            <fence>
                <method name="1">
                    <device name="ipmi3"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <rm>
        <service autostart="1" exclusive="0" name="ha_test_ip" recovery="relocate">
            <ip address="192.168.7.180"/>
        </service>
    </rm>
</cluster>

APC Master Switch

Some old APC PDUs do not support SSH and do not work with fence_apc. These older units do support SNMP, so the fence agent fence_apc_snmp can be used instead.

<?xml version="1.0"?>
<cluster name="hpcluster765" config_version="28">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <fencedevices>
    <fencedevice agent="fence_apc_snmp" ipaddr="192.168.2.30" name="apc" community="12345678" power_wait="10"/>

  </fencedevices>

  <clusternodes>

  <clusternode name="hp4" votes="1" nodeid="1">
    <fence>
      <method name="power">
        <device name="apc" port="4" />
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp1" votes="1" nodeid="2">
    <fence>
      <method name="power">
        <device name="apc" port="1" />
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp3" votes="1" nodeid="3">
    <fence>
      <method name="power">
        <device name="apc" port="3" />
      </method>
    </fence>
  </clusternode>

  <clusternode name="hp2" votes="1" nodeid="4">
    <fence>
      <method name="power">
        <device name="apc" port="2" />
      </method>
    </fence>
  </clusternode>

  </clusternodes>

  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="192.168.7.180"/>
    </service>
  </rm>

</cluster>

to be extended

tbd.

Multiple methods for a node

Note: See also man fenced

In more advanced configurations, multiple fencing methods can be defined for a node. If fencing fails using the first method, fenced will try the next method, and continue to cycle through methods until one succeeds.

       <clusternode name="node1" nodeid="1">
         <fence>
           <method name="1">
             <device name="myswitch" foo="x"/>
           </method>
           <method name="2">
             <device name="another" bar="123"/>
           </method>
         </fence>
       </clusternode>

       <fencedevices>
               <fencedevice name="myswitch" agent="..." something="..."/>
               <fencedevice name="another" agent="..."/>
       </fencedevices>

Dual path, redundant power

Note: See also man fenced

Sometimes fencing a node requires disabling two power ports or two i/o paths. This is done by specifying two or more devices within a method. fenced will run the agent for the device twice, once for each device line, and both must succeed for fencing to be considered successful.

       <clusternode name="node1" nodeid="1">
         <fence>
           <method name="1">
             <device name="sanswitch1" port="11"/>
             <device name="sanswitch2" port="11"/>
           </method>
         </fence>
       </clusternode>

When using power switches to fence nodes with dual power supplies, the agents must be told to turn off both power ports before restoring power to either port. The default off-on behavior of the agent could result in the power never being fully disabled to the node.

       <clusternode name="node1" nodeid="1">
         <fence>
           <method name="1">
             <device name="nps1" port="11" action="off"/>
             <device name="nps2" port="11" action="off"/>
             <device name="nps1" port="11" action="on"/>
             <device name="nps2" port="11" action="on"/>
           </method>
         </fence>
       </clusternode>

Test fencing

Before you use the fencing device, make sure that it works as expected. In my example configuration, the AP7921 uses the IP 192.168.2.30:

Query the status of a power outlet:

fence_apc -x -l hpapc -p 12345678 -a 192.168.2.30 -o status -n 1 -v

Reboot the server using fence_apc:

fence_apc -x -l hpapc -p 12345678 -a 192.168.2.30 -o reboot -n 1 -v

Test fencing with fence_node:

fence_node NODENAME -vv

You should get a "success" here.
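To run the status query against every outlet in one go, the fence_apc call from above can be wrapped in a small loop. This is only a sketch: the variable names APC_IP, APC_USER, APC_PASS and FENCE_CMD and the function name check_outlets are invented for this example. FENCE_CMD defaults to fence_apc and can be pointed at a stub for a dry run. How the agent's exit code distinguishes an outlet that is off from a failed query depends on the agent version and is not assumed here, so the code is only printed.

```shell
#!/bin/sh
# Sketch: run the status action for each outlet number given as argument
# and print the agent's exit code. Returns nonzero if any call did.
check_outlets() {
    failed=0
    for port in "$@"; do
        ${FENCE_CMD:-fence_apc} -x -l "$APC_USER" -p "$APC_PASS" \
            -a "$APC_IP" -o status -n "$port" >/dev/null 2>&1
        rc=$?
        echo "outlet $port: exit code $rc"
        [ "$rc" -eq 0 ] || failed=1
    done
    return "$failed"
}
```

With the test lab values from this page, it would be called as: APC_IP=192.168.2.30 APC_USER=hpapc APC_PASS=12345678 check_outlets 1 2 3 4.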
