Multicast notes

From Proxmox VE
Revision as of 10:03, 26 February 2017 by Bread-baker (talk | contribs)

Introduction

Multicast allows a single transmission to be delivered to multiple servers at the same time.

This is the basis for cluster communications in Proxmox VE 2.0 and higher, which use corosync and cman, and it applies to any other solution built on those clustering tools.

If multicast does not work in your network infrastructure, you should fix it so that it does. If all else fails, use unicast instead, but beware of the node count limitations with unicast.

IGMP snooping

IGMP snooping prevents multicast traffic from being flooded to all ports in the broadcast domain: the switch forwards it only to ports which have solicited such traffic. IGMP snooping is a feature offered by most major switch manufacturers and is often enabled by default on switches. In order for a switch to properly snoop the IGMP traffic, there must be an IGMP querier on the network. If no querier is present, IGMP snooping will actively prevent ALL IGMP/multicast traffic from being delivered!

If IGMP snooping is disabled, all multicast traffic is delivered to all ports. This can add unnecessary load and potentially allows a denial-of-service attack.

IGMP querier

An IGMP querier is a multicast router that generates IGMP queries. IGMP snooping relies on these queries, which are unconditionally forwarded to all ports: the replies from the destination ports are what build the internal tables in the switch, which tell it which traffic to forward where.

An IGMP querier can be enabled on your router, on your switch, or even on a Linux bridge.

Configuring IGMP/Multicast

Ensuring IGMP Snooping and Querier are enabled on your network (recommended)

Juniper - JunOS

Juniper EX switches, by default, enable IGMP snooping on all vlans as can be seen by this config snippet:

[edit protocols]
user@switch# show igmp-snooping
vlan all;

However, the IGMP querier is not enabled by default. If you are already using RVIs (Routed Virtual Interfaces) on your switch, you can enable IGMP v2 on the interface, which enables the querier. Most administrators, however, do not use RVIs in all vlans on their switches; in that case the querier should be configured on the router instead. The config setting below is the same on Juniper EX switches using RVIs as it is on Juniper SRX service gateways/routers, and effectively enables the IGMP querier on the specified interface/vlan. Note that you must set this on all vlans which require multicast!

set protocols igmp interface $iface version 2

Cisco

On Cisco switches, IGMP snooping is enabled by default. You do have to enable an IGMP snooping querier though:

ip igmp snooping querier

This will enable it for all vlans. You can verify that it is enabled:

show ip igmp snooping querier 
Vlan      IP Address               IGMP Version   Port             
-------------------------------------------------------------
1         172.16.34.4              v2            Switch                   
2         172.16.34.4              v2            Switch                   
3         172.16.34.4              v2            Switch                   

HP - ProCurve

HP ProCurve switches, by default, have IGMP disabled on all vlans. The current state can be checked with this command:

# show ip igmp

Likewise, the IGMP querier is not enabled by default. When IGMP is enabled on a vlan, the ProCurve negotiates with the other devices over which one becomes the querier; according to the RFC, the device with the lowest IP address wins. Note that you must set this on all vlans which require multicast! (vlan 30 is used here as an example):

# conf t
(config)# vlan 30
(vlan-30)# ip igmp high-priority-forward

To verify:

# sh ip igmp 30        

 Status and Counters - IP Multicast (IGMP) Status

 VLAN ID : 30
 VLAN Name : Proxmox
 Querier Address : This switch is Querier

  Active Group Addresses Reports Queries Querier Access Port
  ---------------------- ------- ------- -------------------
  239.192.105.237        214020  0                          

Brocade

Linux: Enabling Multicast querier on bridges

If your router or switch does not support enabling a multicast querier, and you are using a classic Linux bridge (not Open vSwitch), you can enable the multicast querier on the Linux bridge by adding this statement to the bridge configuration in /etc/network/interfaces:

  post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
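To confirm that the setting took effect, the bridge's multicast flags can be read back from sysfs. The following is only a minimal sketch, not part of the original instructions: the function name bridge_mcast_status and the bridge name vmbr0 are illustrative assumptions, and the optional second argument exists only so the function can be pointed at a different directory tree.

```shell
# bridge_mcast_status BRIDGE [SYSFS_NET_DIR]
# Print the multicast snooping and querier flags of a classic Linux bridge.
# The function name and the optional directory argument are illustrative.
bridge_mcast_status() {
    bridge="$1"
    net_dir="${2:-/sys/devices/virtual/net}"
    bridge_dir="$net_dir/$bridge/bridge"

    if [ ! -d "$bridge_dir" ]; then
        echo "$bridge: no bridge directory under $net_dir" >&2
        return 1
    fi

    # 1 means enabled, 0 means disabled.
    for flag in multicast_snooping multicast_querier; do
        echo "$flag=$(cat "$bridge_dir/$flag")"
    done
}

# Example (vmbr0 is an assumed bridge name):
#   bridge_mcast_status vmbr0
```

If multicast_querier still reads 0 after the interface has been brought up, the post-up line was probably added to the wrong stanza or the interface was not restarted.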

Disabling IGMP Snooping (not recommended)

Juniper - JunOS

set protocols igmp-snooping vlan all disable

Cisco Managed Switches

# conf t
(config)# no ip igmp snooping

HP - ProCurve

Disabling IGMP must be done on every vlan where it is enabled.

# conf t
(config)# vlan 30
(vlan-30)# no ip igmp

Netgear Managed Switches

The following screenshots show the settings used to get multicast working on our Netgear 7300 series switches. For more information see http://documentation.netgear.com/gs700at/enu/202-10360-01/GS700AT%20Series%20UG-06-18.html

Multicast-netgear-1.png

Multicast-netgear-2.png

Multicast-netgear-3.png

NetGear-multicast-save-and-apply.png

Multicast with Infiniband

IP over Infiniband (IPoIB) supports multicast, but multicast traffic is limited to 2043 bytes when using connected mode, even if you set a larger MTU on the IPoIB interface.

Corosync has a setting, netmtu, that defaults to 1500, which makes it compatible with connected mode Infiniband.

Using omping to test multicast

Start omping on all nodes with one of the following commands and check the output. This is the precise version, which sends 10000 packets at an interval of 1 ms:

omping -c 10000 -i 0.001 -F -q node1 node2 node3

This is the cruder version, which prints much more detail:

omping node1 node2 node3

To find the multicast address on Proxmox VE 4.x, run this:

corosync-cmapctl -g totem.interface.0.mcastaddr

Then pass that multicast address to omping:

omping -m <multicast-address> node1 node2 node3
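When many nodes are involved, it can help to reduce omping's output to a pass/fail verdict per row. The function below is a hedged sketch: the name omping_loss_report is made up for illustration, and it assumes summary rows of the form "node2 : multicast, xmt/rcv/%loss = 10000/10000/0%, ...", which may differ between omping versions. It uses the < 1% loss threshold from the troubleshooting steps on this page.

```shell
# omping_loss_report: read omping output on stdin and print one line per
# summary row: HOST KIND LOSS% ok|FAIL. Exit status is 1 if any row
# reports 1% loss or more. The summary row layout is an assumption:
#   node2 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = ...
omping_loss_report() {
    awk '
        /xmt\/rcv\/%loss/ {
            kind = $3
            sub(/,$/, "", kind)          # "multicast," -> "multicast"
            split($6, counts, "/")       # sent / received / loss
            loss = counts[3]
            sub(/%.*/, "", loss)         # "0%," -> "0"
            verdict = (loss + 0 < 1) ? "ok" : "FAIL"
            if (verdict == "FAIL") bad = 1
            printf "%s %s %s%% %s\n", $1, kind, loss, verdict
        }
        END { exit bad + 0 }
    '
}

# Example:
#   omping -c 10000 -i 0.001 -F -q node1 node2 node3 | omping_loss_report
```

Lines that are not summary rows are ignored, so the function can be fed the complete omping output.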

Troubleshooting

Diagnosis from first principles

These instructions assume you aren't using unicast UDP (transport="udpu"); I've tried to note where that will make a difference.

If you are already experiencing issues, the steps taken to diagnose the problem may make the problem worse in the short term.

If you have poor-quality (or even just misconfigured) ethernet switches, some of these tests may crash your entire network, but at least then you'll know where the source of the problem is...

  1. Ensure all the nodes are in the same subnet.
    1. If you aren't clear on networking, this boils down to: do all your nodes use the same IP address for their default gateway?
    2. If you are deliberately using UDPU transport, this is not a hard requirement, but even in that case, having your hosts in the same subnet will make your task significantly easier.
  2. Ensure all the nodes can (unicast) ping each other without any packet loss at moderately-high packet rates.
    1. Test using "ping -f".
    2. Your network needs to be robust enough to have all your nodes flood-pinging each other simultaneously with < 1% packet loss.
  3. Ensure all the nodes can resolve each other's hostnames.
    1. The previous test should have taken care of this if you used hostnames instead of IP addresses.
    2. Otherwise use nslookup(1) or dig(1) to test DNS, or host(1) or ping(1) to test if you're relying on /etc/hosts.
    3. Theoretically, this shouldn't matter if you're using multicast, but not having this right will likely cause hard-to-diagnose issues later.
  4. Ensure multicast works at high packet rates. This does not apply if you are deliberately using UDPU.
    1. Run omping as described above in "Using omping to test multicast".
    2. You may want to use a parallel-SSH client of some sort to ensure omping starts up almost simultaneously on every node. This will cause each host to send a multicast packet once per millisecond.
    3. If this causes your ethernet switch to fail, consider upgrading your switch.
    4. The final "%loss" number should be < 1%.
  5. Ensure multicast works for > 5 minutes at a time. This does not apply if you are deliberately using UDPU.
    1. Run "omping -c 600 -i 1 -q <list of all nodes>" on every node simultaneously (see above).
    2. This test should take ten (10) minutes to run, which is twice as long as the default IGMPv2 leave timer, thus proving that IGMP snooping isn't the source of any problem.
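
If no parallel-SSH client is at hand for step 4, plain ssh with background jobs gets close to a simultaneous start. This is a rough sketch under assumptions: the function name start_omping_everywhere and the RUNNER and LOGDIR variables are invented for this example, and it presumes working key-based SSH logins to every node. The omping flags are those of the precise test; for step 5 they would need to be changed to "-c 600 -i 1 -q".

```shell
# start_omping_everywhere NODE...
# Launch the same omping run on every node in parallel and wait for all
# of them to finish. Output goes to $LOGDIR/omping-NODE.log.
# RUNNER (default: ssh) and LOGDIR (default: current directory) are
# illustrative knobs; RUNNER=echo gives a dry run that only records
# the command lines.
start_omping_everywhere() {
    runner="${RUNNER:-ssh}"
    logdir="${LOGDIR:-.}"
    for node in "$@"; do
        # Every node pings the full node list, itself included.
        "$runner" "$node" omping -c 10000 -i 0.001 -F -q "$@" \
            < /dev/null > "$logdir/omping-$node.log" 2>&1 &
    done
    wait
}

# Example:
#   start_omping_everywhere node1 node2 node3
```

Writing one log file per node keeps the output of the parallel runs from interleaving, and the final "%loss" figures can then be read from each file.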

If all of these tests have succeeded, and you are starting with freshly-installed Proxmox VE nodes, you should be able to form a multicast cluster without any issues. See below for further notes on UDPU.

Use unicast (UDPU) instead of multicast, if all else fails

Unicast is a technology for sending messages to a single network destination. In corosync, unicast is implemented as UDP-unicast (UDPU). Because it generates more network traffic than multicast, the number of supported nodes is limited: do not use it with more than 4 cluster nodes.

FYI: OVH is a good example of a hosting company where you may need to use UDPU instead of multicast, as your hosts will generally not be able to send or receive multicast traffic to/from each other. This author wishes he knew what their network engineers were smoking, but since their network works well despite its strangeness, it must have been something really good.

  • Carefully read the entire corosync.conf(5) and votequorum(5) manpages.
  • Create the cluster as usual.
  • If needed, bring the initial node into a quorate state with "pvecm e 1" (short for "pvecm expected 1").
  • If needed, edit /etc/pve/corosync.conf (remember to increase the version number!); one of the PVE services will later copy it automatically into the local /etc/corosync/corosync.conf on each node [1].
  • In the totem{} stanza, add "transport: udpu".
  • Pre-add the nodes to the nodelist{} stanza.
  • Reboot the node (there's probably an easier way, feel free to update this page if you know how).
  • Join the other nodes to the cluster.
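
For orientation, the edited parts of /etc/pve/corosync.conf might look roughly like the fragment below. It is a sketch in corosync 2.x syntax, not a copy of a working configuration: the cluster name, node names, node IDs and the config_version value are placeholders, and the interface{} and quorum{} stanzas are left out.

```text
totem {
  version: 2
  cluster_name: examplecluster
  # increase this number on every edit
  config_version: 2
  transport: udpu
}

nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: node1
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: node2
  }
}
```

Each ring0_addr must be a hostname or address that every other node can resolve and reach, which is why the hostname resolution test in the diagnosis steps matters for UDPU as well.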

Important Note: if the nodes are not in the same subnet, you will probably also have to edit bindnetaddr in the totem stanza and change it to "0.0.0.0" for the cluster to initialize. It defaults to the IP of the first cluster member, and any other members in the same subnet will be able to initialize, but members in a different subnet will see corosync unable to initialize because it can't figure out an IP address to bind to. There may be security implications to allowing corosync to bind to the wildcard address. Simply commenting out the bindnetaddr line may work equally well; corosync will then figure it out dynamically on each node.
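
Before rebooting, it may be worth checking which values actually ended up in the configuration file. The helper below is a small sketch, with a made-up function name (corosync_conf_get); it matches "key: value" lines textually and does not understand the stanza structure, so it simply reports the first uncommented occurrence of a key.

```shell
# corosync_conf_get KEY [FILE]
# Print the value of the first uncommented "KEY: value" line in FILE.
# Purely textual: nesting (totem{}, interface{}, node{}) is ignored.
corosync_conf_get() {
    key="$1"
    conf="${2:-/etc/pve/corosync.conf}"
    sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" "$conf" | head -n 1
}

# Examples:
#   corosync_conf_get transport        # expect "udpu" after the edit
#   corosync_conf_get bindnetaddr      # empty if the line is commented out
#   corosync_conf_get config_version /etc/corosync/corosync.conf
```

Comparing config_version between /etc/pve/corosync.conf and the local /etc/corosync/corosync.conf shows whether the edited file has already been copied to the node.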