[PVE-User] Multicast problems with Intel X540 - 10Gtek network card?

Stefan M. Radman smr at kmi.com
Tue Dec 4 23:50:41 CET 2018


Don't put your corosync traffic on bridges.
Dedicate an untagged interface on each node for corosync.

All you need for your cluster network is this:

auto eth3
iface eth3 inet static
    address  192.168.10.201
    netmask  255.255.255.0
#corosync ring0

Put that interface into an isolated VLAN with IGMP snooping enabled.
Prune that VLAN from all trunks to limit its extent and your troubles.
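
For completeness, a minimal sketch of the matching totem interface section in corosync.conf (corosync 2.x as shipped with PVE 5; the ring0_addr entries in the nodelist would have to point at the 192.168.10.x addresses as well):

totem {
  version: 2
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.10.0
  }
}

On PVE the file lives at /etc/pve/corosync.conf; remember to bump config_version when editing it.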

Stefan


On Dec 4, 2018, at 8:03 PM, Ronny Aasen <ronny+pve-user at aasen.cx> wrote:

vmbr10 is a bridge (or a switch by another name).
If you want the switch to work reliably with multicast, you probably need to enable the multicast querier:

echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier

Or you can disable snooping, so that the bridge treats multicast like broadcast:

echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping

This problem with multicast traffic may also lead to unreliable IPv6 neighbor discovery (ND) and router advertisements (RA).
https://pve.proxmox.com/wiki/Multicast_notes has some more notes and examples around multicast_querier.
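
Note that these sysfs settings do not survive a reboot. A common way to make the querier setting persistent (a sketch, reusing the vmbr0 stanza quoted further down in this thread) is a post-up hook in /etc/network/interfaces:

auto vmbr0
iface vmbr0 inet static
    address  192.168.0.201
    netmask  255.255.255.0
    gateway  192.168.0.100
    bridge_ports eth4
    bridge_stp off
    bridge_fd 0
    post-up echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier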

kind regards
Ronny Aasen



On 04.12.2018 17:54, Eneko Lacunza wrote:
Hi all,

It seems I found the solution.

eth3 on proxmox1 is a Broadcom 1 Gbit card connected to the HPE switch; it is untagged in VLAN 10 on the switch end.

I changed the vmbr10 bridge to use eth4.10 on the X540 card, and after an ifdown/ifup and a restart of corosync and pve-cluster, everything seems good; the cluster is stable and omping is still happy after 10 minutes :)
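
Presumably the changed stanza now looks roughly like this (a sketch based on the /etc/network/interfaces quoted further down, with eth4.10 as the bridge port):

auto vmbr10
iface vmbr10 inet static
    address  192.168.10.201
    netmask  255.255.255.0
    bridge_ports eth4.10
    bridge_stp off
    bridge_fd 0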

It is strange, because the multicast traffic is on the VLAN 1 network...

Cheers and thanks a lot
Eneko

On 4/12/18 at 16:18, Eneko Lacunza wrote:

Hi Marcus,

On 4/12/18 at 16:09, Marcus Haarmann wrote:
Hi,

you did not provide details about your configuration.
How is the network card set up? Bonding?
Send your /etc/network/interfaces details.
If bonding is active, check whether the mode shown in /proc/net/bonding is correct.
We have encountered differences between the /etc/network/interfaces setup and the resulting mode.
Also check your switch configuration, VLAN setup, MTU, etc.
Yes, sorry about that. I have double-checked the switch and the SFP+ ports of all 3 nodes have the same configuration.

/etc/network/interfaces on the proxmox1 node:
auto lo
iface lo inet loopback
iface eth0 inet manual
iface eth1 inet manual
iface eth2 inet manual
iface eth3 inet manual
iface eth4 inet manual
iface eth5 inet manual

auto vmbr10
iface vmbr10 inet static
    address  192.168.10.201
    netmask  255.255.255.0
    bridge_ports eth3
    bridge_stp off
    bridge_fd 0

auto vmbr0
iface vmbr0 inet static
    address  192.168.0.201
    netmask  255.255.255.0
    gateway  192.168.0.100
    bridge_ports eth4
    bridge_stp off
    bridge_fd 0

auto eth4.100
iface eth4.100 inet static
    address 10.0.2.1
    netmask 255.255.255.0
    up ip addr add 10.0.3.1/24 dev eth4.100

The cluster is running on the vmbr0 network (192.168.0.0/24).
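
To double-check which address corosync is actually bound to, these standard tools should help (assuming a stock PVE 5.x install):

# ring status and the address corosync is using for ring 0
corosync-cfgtool -s
# cluster membership and quorum as seen by Proxmox
pvecm status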

Cheers


Marcus Haarmann


Von: "Eneko Lacunza" <elacunza at binovo.es<mailto:elacunza at binovo.es>>
An: "pve-user" <pve-user at pve.proxmox.com<mailto:pve-user at pve.proxmox.com>>
Gesendet: Dienstag, 4. Dezember 2018 15:57:10
Betreff: [PVE-User] Multicast problems with Intel X540 - 10Gtek network card?

Hi all,

We have just updated a 3-node Proxmox cluster from 3.4 to 5.2, Ceph
Hammer to Luminous, and the network from 1 Gbit to 10 Gbit... one of the
three Proxmox nodes is new, too :)

Generally everything went well and the VMs are working fine. :-)

BUT we have some problems with the cluster: the proxmox1 node joins and
then drops from the cluster after about 4 minutes.

All multicast tests from
https://pve.proxmox.com/wiki/Multicast_notes#Using_omping_to_test_multicast
run fine except the last one:

*** proxmox1:

root@proxmox1:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox3 : waiting for response msg
proxmox4 : waiting for response msg
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox3 : given amount of query messages was sent
proxmox4 : given amount of query messages was sent
proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.073/0.184/0.390/0.061
proxmox3 : multicast, xmt/rcv/%loss = 600/262/56%, min/avg/max/std-dev = 0.092/0.207/0.421/0.068
proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.049/0.167/0.369/0.059
proxmox4 : multicast, xmt/rcv/%loss = 600/262/56%, min/avg/max/std-dev = 0.063/0.185/0.386/0.064


*** proxmox3:

root@proxmox3:/etc# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox1 : waiting for response msg
proxmox4 : waiting for response msg
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : waiting for response msg
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox4 : given amount of query messages was sent
proxmox1 : given amount of query messages was sent
proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.083/0.193/1.030/0.055
proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.102/0.209/1.050/0.054
proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.041/0.108/0.172/0.026
proxmox4 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.048/0.123/0.190/0.030


*** proxmox4:

root@proxmox4:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox1 : waiting for response msg
proxmox3 : waiting for response msg
proxmox1 : waiting for response msg
proxmox3 : waiting for response msg
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : given amount of query messages was sent
proxmox3 : given amount of query messages was sent
proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.085/0.188/0.356/0.040
proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.114/0.208/0.377/0.041
proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.048/0.117/0.289/0.023
proxmox3 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.064/0.134/0.290/0.026


OK, so it seems we have a network problem on the proxmox1 node. The
network cards are as follows:

- proxmox1: Intel X540 (10Gtek)
- proxmox3: Intel X710 (Intel)
- proxmox4: Intel X710 (Intel)

Switch is Dell N1224T-ON.

Does anyone have experience with Intel X540-based network cards, the
Linux ixgbe driver, or 10Gtek as a manufacturer?
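
In case it helps with debugging on the proxmox1 side, a few generic checks (a sketch; assuming eth4 is the X540 port, adjust names as needed):

# driver and firmware version of the suspect port
ethtool -i eth4
# NIC statistics, e.g. to look for drops or errors
ethtool -S eth4
# multicast group memberships; check both the physical port and the bridge
ip maddr show dev eth4
ip maddr show dev vmbr0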

If we move corosync communication to the 1 Gbit network cards (Broadcom)
connected to an old HPE 1800-24G switch, the cluster is stable...

We also have a cluster running without issues on a Dell N1224T-ON switch
with X710 network cards.

Thanks a lot
Eneko









