[PVE-User] Multicast problems with Intel X540 - 10Gtek network card?

Marcus Haarmann marcus.haarmann at midoco.de
Tue Dec 4 16:09:35 CET 2018


Hi, 

you did not provide details about your configuration. 
How is the network card set up? Is bonding in use? 
Please send your /etc/network/interfaces details. 
If bonding is active, check in /proc/net/bonding whether the mode is correct. 
We have seen cases where the mode configured in /etc/network/interfaces differed from the mode actually in effect. 
Also check your switch configuration, VLAN setup, MTU, etc. 
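
As a minimal sketch of those checks (assuming the bond is named bond0; adjust the names to your actual setup): 

cat /etc/network/interfaces          # bonding/bridge configuration as written 
cat /proc/net/bonding/bond0          # bonding mode, slaves and link state actually in effect 
ip -d link show bond0                # kernel view of the bond, including MTU 

Comparing the configured mode with the one reported in /proc/net/bonding is usually the quickest way to spot the mismatch described above. 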

Marcus Haarmann 


Von: "Eneko Lacunza" <elacunza at binovo.es> 
An: "pve-user" <pve-user at pve.proxmox.com> 
Gesendet: Dienstag, 4. Dezember 2018 15:57:10 
Betreff: [PVE-User] Multicast problems with Intel X540 - 10Gtek network card? 

Hi all, 

We have just updated a 3-node Proxmox cluster from 3.4 to 5.2, Ceph 
Hammer to Luminous, and the network from 1 Gbit to 10 Gbit... one of the 
three Proxmox nodes is new too :) 

Generally all was good and VMs are working well. :-) 

BUT, we have some problems with the cluster; the proxmox1 node joins and 
then after about 4 minutes drops from the cluster. 

All the multicast tests in 
https://pve.proxmox.com/wiki/Multicast_notes#Using_omping_to_test_multicast 
run fine except the last one: 

*** proxmox1: 

root@proxmox1:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4 
proxmox3 : waiting for response msg 
proxmox4 : waiting for response msg 
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging 
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging 
proxmox3 : given amount of query messages was sent 
proxmox4 : given amount of query messages was sent 
proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.073/0.184/0.390/0.061 
proxmox3 : multicast, xmt/rcv/%loss = 600/262/56%, min/avg/max/std-dev = 0.092/0.207/0.421/0.068 
proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.049/0.167/0.369/0.059 
proxmox4 : multicast, xmt/rcv/%loss = 600/262/56%, min/avg/max/std-dev = 0.063/0.185/0.386/0.064 


*** proxmox3: 

root@proxmox3:/etc# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4 
proxmox1 : waiting for response msg 
proxmox4 : waiting for response msg 
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging 
proxmox1 : waiting for response msg 
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging 
proxmox4 : given amount of query messages was sent 
proxmox1 : given amount of query messages was sent 
proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.083/0.193/1.030/0.055 
proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.102/0.209/1.050/0.054 
proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.041/0.108/0.172/0.026 
proxmox4 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.048/0.123/0.190/0.030 


*** proxmox4: 

root@proxmox4:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4 
proxmox1 : waiting for response msg 
proxmox3 : waiting for response msg 
proxmox1 : waiting for response msg 
proxmox3 : waiting for response msg 
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging 
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging 
proxmox1 : given amount of query messages was sent 
proxmox3 : given amount of query messages was sent 
proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.085/0.188/0.356/0.040 
proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.114/0.208/0.377/0.041 
proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.048/0.117/0.289/0.023 
proxmox3 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.064/0.134/0.290/0.026 


OK, so it seems we have a network problem on the proxmox1 node. The network 
cards are as follows: 

- proxmox1: Intel X540 (10Gtek) 
- proxmox3: Intel X710 (Intel) 
- proxmox4: Intel X710 (Intel) 

Switch is Dell N1224T-ON. 

Does anyone have experience with Intel X540-based network cards, the Linux 
ixgbe driver, or the 10Gtek manufacturer? 
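
In case it helps to narrow this down, driver and firmware details for the X540 can be pulled as follows (enp3s0f0 is just a placeholder for whatever the 10G port is called on proxmox1): 

lspci -nnk | grep -iA3 ethernet      # PCI IDs and the kernel driver bound to each NIC 
ethtool -i enp3s0f0                  # driver, driver version and NIC firmware version 
dmesg | grep -i ixgbe                # ixgbe messages, e.g. link flaps or init errors 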

If we change corosync communication to the 1 Gbit network cards (Broadcom) 
connected to an old HPE 1800-24G switch, the cluster is stable... 

We also have another cluster running with a Dell N1224T-ON switch and X710 
network cards without issues. 
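
The failure pattern on proxmox1 (multicast delivered at first, then ending up at ~56% loss over the 600-packet run, while unicast stays at 0%) looks similar to what the multicast notes page linked above describes when IGMP snooping is active without a querier, so it may be worth comparing that setting on both switches and on the nodes. A rough check, assuming the Proxmox bridge is called vmbr0: 

cat /sys/class/net/vmbr0/bridge/multicast_snooping   # 1 = IGMP snooping enabled on the Linux bridge 
cat /sys/class/net/vmbr0/bridge/multicast_querier    # 1 = the bridge acts as IGMP querier 

Per those notes, either an IGMP querier should be active on the segment or snooping should be disabled; the Dell switch configuration would be the other place to verify this. 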

Thanks a lot 
Eneko 


-- 
Zuzendari Teknikoa / Director Técnico 
Binovo IT Human Project, S.L. 
Telf. 943569206 
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa) 
www.binovo.es 
