[PVE-User] Multicast problems with Intel X540 - 10Gtek network card?

Eneko Lacunza elacunza at binovo.es
Tue Dec 4 17:54:04 CET 2018


Hi all,

It seems I found the solution.

eth3 on proxmox1 is a Broadcom 1 Gbit card connected to the HPE switch; it is 
VLAN 10 untagged on the switch end.

I changed the vmbr10 bridge to use eth4.10 on the X540 card, and after an 
ifdown/ifup and a restart of corosync and pve-cluster, everything now looks 
good; the cluster is stable and omping is happy too after 10 minutes :)
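
For reference, a minimal sketch of the changed stanza and the restart sequence 
on proxmox1 (the stanza is adapted from the interfaces file quoted below; the 
exact restart commands and their order are from memory, so take them as 
approximate):

    auto vmbr10
    iface vmbr10 inet static
        address  192.168.10.201
        netmask  255.255.255.0
        bridge_ports eth4.10
        bridge_stp off
        bridge_fd 0

    # reload the bridge, then restart the cluster stack
    ifdown vmbr10 && ifup vmbr10
    systemctl restart corosync
    systemctl restart pve-cluster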

It is strange, because the multicast (cluster) traffic is on the VLAN 1 network...
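
In case it helps anyone hitting a similar "drops after a few minutes" symptom: 
apart from the switch-side IGMP settings, the Linux bridge itself can do IGMP 
snooping. A quick check on the cluster bridge (vmbr0 here; the sysfs path 
assumes a reasonably recent kernel) would be something like:

    # 1 = snooping enabled, 0 = disabled; snooping without an IGMP querier
    # on the segment can make multicast die after a few minutes
    cat /sys/class/net/vmbr0/bridge/multicast_snooping
    echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping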

Cheers and thanks a lot
Eneko

On 04/12/18 at 16:18, Eneko Lacunza wrote:
>
> Hi Marcus,
>
> On 04/12/18 at 16:09, Marcus Haarmann wrote:
>> Hi,
>>
>> you did not provide details about your configuration.
>> How is the network card set up? Bonding?
>> Send your /etc/network/interfaces details.
>> If bonding is active, check whether the mode is correct in /proc/net/bonding.
>> We encountered differences between the /etc/network/interfaces setup and
>> the resulting mode.
>> Also, check your switch configuration, VLAN setup, MTU, etc.
> Yes, sorry about that. I have double-checked the switch and all 3 nodes' 
> SFP+ ports have the same configuration.
>
> /etc/network/interfaces on the proxmox1 node:
> auto lo
> iface lo inet loopback
> iface eth0 inet manual
> iface eth1 inet manual
> iface eth2 inet manual
> iface eth3 inet manual
> iface eth4 inet manual
> iface eth5 inet manual
>
> auto vmbr10
> iface vmbr10 inet static
>     address  192.168.10.201
>     netmask  255.255.255.0
>     bridge_ports eth3
>     bridge_stp off
>     bridge_fd 0
>
> auto vmbr0
> iface vmbr0 inet static
>     address  192.168.0.201
>     netmask  255.255.255.0
>     gateway  192.168.0.100
>     bridge_ports eth4
>     bridge_stp off
>     bridge_fd 0
>
> auto eth4.100
> iface eth4.100 inet static
>     address 10.0.2.1
>     netmask 255.255.255.0
>     up ip addr add 10.0.3.1/24 dev eth4.100
>
> The cluster is running on the vmbr0 network (192.168.0.0/24).
>
> Cheers
>
>>
>> Marcus Haarmann
>>
>>
>> From: "Eneko Lacunza" <elacunza at binovo.es>
>> To: "pve-user" <pve-user at pve.proxmox.com>
>> Sent: Tuesday, 4 December 2018 15:57:10
>> Subject: [PVE-User] Multicast problems with Intel X540 - 10Gtek 
>> network card?
>>
>> Hi all,
>>
>> We have just updated a 3-node Proxmox cluster from 3.4 to 5.2, Ceph
>> Hammer to Luminous and the network from 1 Gbit to 10 Gbit... one of the
>> three Proxmox nodes is new too :)
>>
>> Generally all was good and VMs are working well. :-)
>>
>> BUT, we have some problems with the cluster; the proxmox1 node joins and
>> then drops from the cluster after about 4 minutes.
>>
>> All multicast tests from
>> https://pve.proxmox.com/wiki/Multicast_notes#Using_omping_to_test_multicast
>> run fine except the last one:
>>
>> *** proxmox1:
>>
>> root@proxmox1:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
>>
>> proxmox3 : waiting for response msg
>>
>> proxmox4 : waiting for response msg
>>
>> proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
>>
>> proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
>>
>> proxmox3 : given amount of query messages was sent
>>
>> proxmox4 : given amount of query messages was sent
>>
>> proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 
>> 0.073/0.184/0.390/0.061
>>
>> proxmox3 : multicast, xmt/rcv/%loss = 600/262/56%, 
>> min/avg/max/std-dev = 0.092/0.207/0.421/0.068
>>
>> proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 
>> 0.049/0.167/0.369/0.059
>>
>> proxmox4 : multicast, xmt/rcv/%loss = 600/262/56%, 
>> min/avg/max/std-dev = 0.063/0.185/0.386/0.064
>>
>>
>> *** proxmox3:
>>
>> root@proxmox3:/etc# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
>>
>> proxmox1 : waiting for response msg
>>
>> proxmox4 : waiting for response msg
>>
>> proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
>>
>> proxmox1 : waiting for response msg
>>
>> proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
>>
>> proxmox4 : given amount of query messages was sent
>>
>> proxmox1 : given amount of query messages was sent
>>
>> proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 
>> 0.083/0.193/1.030/0.055
>>
>> proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev 
>> = 0.102/0.209/1.050/0.054
>>
>> proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 
>> 0.041/0.108/0.172/0.026
>>
>> proxmox4 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev 
>> = 0.048/0.123/0.190/0.030
>>
>>
>> *** root@proxmox4:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
>>
>> proxmox1 : waiting for response msg
>>
>> proxmox3 : waiting for response msg
>>
>> proxmox1 : waiting for response msg
>>
>> proxmox3 : waiting for response msg
>>
>> proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
>>
>> proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
>>
>> proxmox1 : given amount of query messages was sent
>>
>> proxmox3 : given amount of query messages was sent
>>
>> proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 
>> 0.085/0.188/0.356/0.040
>>
>> proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev 
>> = 0.114/0.208/0.377/0.041
>>
>> proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 
>> 0.048/0.117/0.289/0.023
>>
>> proxmox3 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev 
>> = 0.064/0.134/0.290/0.026
>>
>>
>> OK, so it seems we have a network problem on the proxmox1 node. The network
>> cards are as follows:
>>
>> - proxmox1: Intel X540 (10Gtek)
>> - proxmox3: Intel X710 (Intel)
>> - proxmox4: Intel X710 (Intel)
>>
>> Switch is Dell N1224T-ON.
>>
>> Does anyone have experience with Intel X540-based network cards, the Linux
>> ixgbe driver, or the manufacturer 10Gtek?
>>
>> If we move corosync communication to the 1 Gbit network cards (Broadcom)
>> connected to an old HPE 1800-24G switch, the cluster is stable...
>>
>> We also have a running cluster with a Dell N1224T-ON switch and X710
>> network cards, without issues.
>>
>> Thanks a lot
>> Eneko
>>
>>
>
>


-- 
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



