Multicast notes

Note: Proxmox VE 6 and newer uses corosync with kronosnet as the communication layer, which only supports unicast. This article is only relevant for Proxmox VE 5 and older.

Introduction

Multicast allows a single transmission to be delivered to multiple servers at the same time.

This is the basis for cluster communications from Proxmox VE 2.0 to Proxmox VE 5.4, which use corosync and cman, and it applies to any other solution which utilizes those clustering tools.

Note: Proxmox VE 6.0 uses corosync 3, which switched the underlying transport stack to Kronosnet (knet). Kronosnet currently only supports unicast.

If multicast does not work in your network infrastructure, you should fix it so that it does. If all else fails, use unicast instead, but beware of the node count limitations with unicast.

IGMP snooping

IGMP snooping prevents flooding multicast traffic to all ports in the broadcast domain by only allowing traffic destined for ports which have solicited such traffic. IGMP snooping is a feature offered by most major switch manufacturers and is often enabled by default on switches. In order for a switch to properly snoop the IGMP traffic, there must be an IGMP querier on the network. If no querier is present, IGMP snooping will actively prevent ALL IGMP/Multicast traffic from being delivered!

If IGMP snooping is disabled, all multicast traffic is delivered to all ports, which may add unnecessary load and potentially allow a denial-of-service attack.

IGMP querier

An IGMP querier is a multicast router that generates IGMP queries. IGMP snooping relies on these queries, which are unconditionally forwarded to all ports; the replies coming back from the ports are what build the internal tables in the switch, letting it know which traffic to forward where.
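
On a Proxmox host that uses a Linux bridge, you can inspect the table that snooping builds for the bridge itself. A quick check, assuming iproute2 is available and the bridge is named vmbr0:

  # list the multicast group memberships the bridge has learned via IGMP
  bridge mdb show dev vmbr0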

An IGMP querier can be enabled on your router, on your switch, or even on a Linux bridge.

Configuring IGMP/Multicast

Ensuring IGMP Snooping and Querier are enabled on your network (recommended)

Juniper - JunOS

Juniper EX switches enable IGMP snooping on all VLANs by default, as can be seen in this config snippet:

[edit protocols]
user@switch# show igmp-snooping
vlan all;

However, the IGMP querier is not enabled by default. If you are already using RVIs (Routed Virtual Interfaces) on your switch, you can enable IGMP v2 on the interface, which enables the querier. However, most administrators do not use RVIs in all VLANs on their switches; in that case the querier should instead be configured on the router. The config setting below is the same on Juniper EX switches using RVIs as it is on Juniper SRX service gateways/routers, and effectively enables the IGMP querier on the specified interface/VLAN. Note that you must set this on every VLAN which requires multicast:

set protocols igmp $iface version 2

Cisco

On Cisco switches, IGMP snooping is enabled by default. You do have to enable an IGMP snooping querier though:

ip igmp snooping querier

This enables it for all VLANs. You can verify that it is enabled:

show ip igmp snooping querier 
Vlan      IP Address               IGMP Version   Port             
-------------------------------------------------------------
1         172.16.34.4              v2            Switch                   
2         172.16.34.4              v2            Switch                   
3         172.16.34.4              v2            Switch                   

HP - ProCurve

HP ProCurve switches have IGMP disabled on all VLANs by default, as can be verified with this command:

# show ip igmp

Likewise, the IGMP querier is not enabled by default. When IGMP is enabled on a VLAN, ProCurve switches negotiate with the other devices to determine which becomes the querier; according to the RFC, the device with the lowest IP address wins. Note that you must set this on every VLAN which requires multicast (VLAN 30 is used for this demo):

# conf t
(config)# vlan 30
(vlan-30)# ip igmp high-priority-forward

To verify:

# sh ip igmp 30        

 Status and Counters - IP Multicast (IGMP) Status

 VLAN ID : 30
 VLAN Name : Proxmox
 Querier Address : This switch is Querier

  Active Group Addresses Reports Queries Querier Access Port
  ---------------------- ------- ------- -------------------
  239.192.105.237        214020  0                          

Netgear

Using the web GUI:

Per VLAN

Enable IGMP snooping.

Enable IGMP snooping on your VLANs under the IGMP VLAN configuration.

Enable multicast router mode on the ports that uplink to the other switches.

Enable the IGMP querier.

Leave the global address at 0.0.0.0.

Instead, set a querier IP address per VLAN under the querier VLAN configuration (for example, on the first switch VLAN10=1.1.1.10 and VLAN15=1.1.1.15; on the next switch VLAN10=2.2.2.10 and VLAN15=2.2.2.15, and so on).

Make sure “Querier Election Participation Mode” is enabled for each VLAN.

OR

Global

Enable IGMP snooping.

Enable IGMP snooping on your ports under the IGMP interface configuration.

Enable multicast router mode on the ports that uplink to the other switches.

Enable the IGMP querier.

Set a global querier IP address (for example 1.1.1.1 on the first switch, 2.2.2.2 on the next, and so on).

Brocade

Linux: Enabling Multicast querier on bridges

If your router or switch does not support enabling a multicast querier, and you are using a classic Linux bridge (not Open vSwitch), you can enable the multicast querier on the Linux bridge by adding this statement to your /etc/network/interfaces bridge configuration:

  post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
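
For context, here is a minimal sketch of how that statement fits into a typical /etc/network/interfaces bridge stanza. The bridge name vmbr0, the addresses, and the eth0 bridge port are placeholders; only the post-up line is the relevant part:

  auto vmbr0
  iface vmbr0 inet static
      address 192.168.1.10
      netmask 255.255.255.0
      gateway 192.168.1.1
      bridge_ports eth0
      bridge_stp off
      bridge_fd 0
      # let the bridge itself act as IGMP querier once it comes up
      post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )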

Disabling IGMP Snooping (not recommended)

Juniper - JunOS

set protocols igmp-snooping vlan all disable

Cisco Managed Switches

# conf t
# no ip igmp snooping

HP - ProCurve

Disabling IGMP must be done on every VLAN where it is enabled.

# conf t
(config)# vlan 30
(vlan-30)# no ip igmp

Linux: Disabling Multicast snooping on bridges

Snooping should be enabled on either the router/switch or the Linux bridge; it may not work if it is enabled on both. If your hosting provider has IGMP snooping enabled on the switch carrying the multicast traffic, it may be necessary to disable snooping on the Linux bridge. In that case use:

  post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
  post-up ( echo 0 > /sys/class/net/$IFACE/bridge/multicast_snooping )
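
Either way, you can check the current state of both settings at runtime; a quick sketch assuming the bridge is named vmbr0 (1 means enabled, 0 disabled):

  # querier should read 1 and snooping 0 if you applied the lines above
  cat /sys/class/net/vmbr0/bridge/multicast_querier
  cat /sys/class/net/vmbr0/bridge/multicast_snooping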

Multicast with Infiniband

IP over Infiniband (IPoIB) supports multicast, but multicast traffic is limited to 2043 bytes when using connected mode, even if you set a larger MTU on the IPoIB interface.

Corosync has a setting, netmtu, which defaults to 1500, making it compatible with connected-mode Infiniband.
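
On Proxmox VE 4.x/5.x, where the cluster is configured through /etc/pve/corosync.conf, netmtu is an option of the totem section. A minimal sketch of where it would go; the cluster name and network are placeholders, and raising netmtu above the default is an untested assumption (the default of 1500 already works with connected mode):

  totem {
    version: 2
    cluster_name: mycluster    # placeholder
    # defaults to 1500; 2043 matches the IPoIB connected-mode multicast limit above
    netmtu: 2043
    interface {
      ringnumber: 0
      bindnetaddr: 10.10.10.0  # placeholder cluster network
    }
  }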

Using omping to test multicast

Start omping on all nodes with the following command and check the output. The precise version below sends 10000 packets at an interval of 1 ms:

omping -c 10000 -i 0.001 -F -q node1 node2 node3

A cruder run that prints lots of per-packet detail:

omping node1 node2 node3

To find the multicast address used by the cluster on Proxmox VE 4.x, run:

corosync-cmapctl -g totem.interface.0.mcastaddr

then use the multicast address:

 omping -m yourmulticastaddress node1 node2 node3

Troubleshooting

Diagnosis from first principles

These instructions assume you aren't using unicast UDP (transport="udpu"); I've tried to note where that will make a difference.

If you are already experiencing issues, the steps taken to diagnose the problem may make the problem worse in the short term.

If you have poor-quality (or even just misconfigured) ethernet switches, some of these tests may crash your entire network, but at least then you'll know where the source of the problem is...

  1. Ensure all the nodes are in the same subnet.
    1. If you aren't clear on networking, this boils down to: do all your nodes use the same IP address for their default gateway?
    2. If you are deliberately using UDPU transport, this is not a hard requirement, but even in that case, having your hosts in the same subnet will make your task significantly easier.
  2. Ensure all the nodes can (unicast) ping each other without any packet loss at moderately-high packet rates.
    1. Test using "ping -f".
    2. Your network needs to be robust enough to have all your nodes flood-pinging each other simultaneously with < 1% packet loss.
  3. Ensure all the nodes can resolve each other's hostnames.
    1. The previous test should have taken care of this if you used hostnames instead of IP addresses.
    2. Otherwise use nslookup(1) or dig(1) to test DNS, or host(1) or ping(1) to test if you're relying on /etc/hosts.
    3. Theoretically, this shouldn't matter if you're using multicast, but not having this right will likely cause hard-to-diagnose issues later.
  4. Ensure multicast works at high packet rates. This does not apply if you are deliberately using UDPU.
    1. Run omping (see below).
    2. You may want to use a parallel-SSH client of some sort to ensure omping starts up almost simultaneously on every node (see the sketch after this list). This will cause each host to send a multicast packet once per millisecond.
    3. If this causes your ethernet switch to fail, consider upgrading your switch.
    4. The final "%loss" number should be < 1%.
  5. Ensure multicast works for > 5 minutes at a time. This does not apply if you are deliberately using UDPU.
    1. Run "omping -c 600 -i 1 -q <list of all nodes>" on every node simultaneously (see above).
    2. This test should take ten (10) minutes to run, which is twice as long as the default IGMPv2 leave timer, thus proving that IGMP snooping isn't the source of any problem.
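
As mentioned in step 4, the simplest way to start the test nearly simultaneously on every node is a parallel-SSH client or a plain background ssh loop. A sketch assuming passwordless root SSH between the nodes and three nodes named node1, node2 and node3:

  # run the same omping test on every node at (almost) the same time
  NODES="node1 node2 node3"
  for n in $NODES; do
      ssh root@$n "omping -c 10000 -i 0.001 -F -q $NODES" &
  done
  wait   # collect the per-node results before moving on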

If all of these tests have succeeded, and you are starting with freshly-installed Proxmox VE nodes, you should be able to form a multicast cluster without any issues. See below for further notes on UDPU.

Use unicast (UDPU) instead of multicast, if all else fails

Unicast is a technology for sending messages to a single network destination. In corosync, unicast is implemented as UDP-unicast (UDPU). Due to the increased network traffic (compared to multicast) the number of supported nodes is limited; do not use it with more than 4 cluster nodes.

FYI: OVH is a good example of a hosting company where you may need to use UDPU instead of multicast, as your hosts will generally not be able to send or receive multicast traffic to/from each other. This author wishes he knew what their network engineers were smoking, but since their network works well despite its strangeness, it must have been something really good.

  • Carefully read the entire corosync.conf(5) and votequorum(5) manpages.
  • create the cluster as usual
  • if needed, bring the initial node into quorate state with "pvecm e 1"
  • if needed, edit /etc/pve/corosync.conf (remember to increase the version number!); one of the PVE services will then automatically copy it to the local /etc/corosync/corosync.conf on each node (see https://forum.proxmox.com/threads/roles-of-the-different-corosync-conf-files-in-a-cluster.26894/).
  • in the totem{} stanza, add "transport: udpu"
  • pre-add the nodes to the nodelist{} stanza.
  • on each node: systemctl restart corosync (if this command does not work, use killall -9 corosync)
  • then, on each node: /etc/init.d/pve-cluster restart

Important Note: if the nodes are not in the same subnet, you will probably also have to edit bindnetaddr in the totem stanza and change it to "0.0.0.0" for the cluster to initialize. It defaults to the IP of the first cluster member; any other members in the same subnet will be able to initialize, but members in a different subnet will see corosync unable to initialize because it cannot figure out an IP address to bind to. There may be security implications to allowing corosync to bind to the wildcard address. Simply commenting out the bindnetaddr line may also work equally well; corosync will then figure it out dynamically on each node.
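
Putting the steps above together, here is a sketch of what the relevant parts of /etc/pve/corosync.conf might look like with UDPU. Cluster name, node names, node IDs and the config_version are placeholders for your own values:

  totem {
    version: 2
    config_version: 3          # remember to bump this on every edit
    cluster_name: mycluster    # placeholder
    transport: udpu
    interface {
      ringnumber: 0
      bindnetaddr: 0.0.0.0     # see the note above about nodes in different subnets
    }
  }

  nodelist {
    node {
      ring0_addr: node1
      nodeid: 1
      quorum_votes: 1
    }
    node {
      ring0_addr: node2
      nodeid: 2
      quorum_votes: 1
    }
  }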