[pve-devel] Quorum problems with NICs Intel of 10 Gb/s and VMs turns off

Alexandre DERUMIER aderumier at odiso.com
Wed Dec 24 12:49:39 CET 2014


>> I'm interested to know what this option is ;) 
>>Memory Mapped I/O Above 4 GB : Disable 

So you need to disable it to avoid the problem?

Maybe this is related:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2050443

>>Yes, I can write to /etc/pve. 
>>And talking about the red lights: 
>>After some hours, the problem mysteriously disappeared. 

That means that the pvestatd daemon is hanging.

pvestatd sequentially checks the status of the host, then the VMs, then the storages.

And sometimes a slow, overloaded or hanging storage can block pvestatd.

You can restart the pvestatd daemon; it should fix the problem.
Also check the logs to see whether the daemon is hanging on a specific storage.
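
For example, something like this (a minimal sketch; on PVE 3.x pvestatd logs to syslog, so the exact messages depend on your storages):

/etc/init.d/pvestatd restart
grep pvestatd /var/log/syslog | tail -n 20   # look for timeouts on a particular storage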


>>Moreover, I have doubts about these 3 options (hardware BIOS): 
>>- OS Watchdog timer (option available in all my servers) 

You can use it if you don't use fencing from Proxmox. It'll restart the server in case of a kernel panic, for example.


>>- I/OAT DMA Engine (I am testing with two Dell R320 servers, each server 
>>with 2 Intel 1 Gb/s NICs, 4 ports each) 
I don't know too much about this one.


>>- Dell turbo (I don't remember the exact text), 
>>but the Dell recommendation is to enable it only in the performance profile. 
>>This option only appears on Dell R720 servers. 

Maybe it's related to Turbo Boost on some Intel processors.
Generally, I recommend turning this off, because it dynamically shuts down some cores to speed up other cores.
And virtualization doesn't like this very much, because of the changing clock frequency (BSOD under Windows).
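
If you want to see whether the clock actually jumps around on a node, a quick check (nothing Dell-specific, just standard Linux) is:

# show per-core frequencies once per second; with turbo disabled they
# should no longer jump above the base clock (cpufreq can still lower them)
watch -n1 "grep MHz /proc/cpuinfo"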


----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, December 24, 2014 08:38:28
Subject: Re: [pve-devel] Quorum problems with NICs Intel of 10 Gb/s and VMs turns off

Hi Alexandre 

Thanks for your reply; here are my answers: 

> I'm interested to know what this option is ;) 
Memory Mapped I/O Above 4 GB : Disable 

>Can you check that you can write to /etc/pve? 
Yes, I can write to /etc/pve. 
And talking about the red lights: 
After some hours, the problem mysteriously disappeared. 

Moreover, I have doubts about these 3 options (hardware BIOS): 
- OS Watchdog timer (option available in all my servers) 
- I/OAT DMA Engine (I am testing with two Dell R320 servers, each server 
with 2 Intel 1 Gb/s NICs, 4 ports each) 
- Dell turbo (I don't remember the exact text), 
but the Dell recommendation is to enable it only in the performance profile. 
This option only appears on Dell R720 servers. 


----- Original Message ----- 
From: "Alexandre DERUMIER" <aderumier at odiso.com> 
To: "Cesar Peschiera" <brain at click.com.py> 
Cc: "pve-devel" <pve-devel at pve.proxmox.com> 
Sent: Monday, December 22, 2014 2:58 PM 
Subject: Re: [pve-devel] Quorum problems with NICs Intel of 10 Gb/s and 
VMs turns off 


>>After several checks, I found the problem in these two servers: a 
>>configuration in the hardware BIOS that isn't compatible with 
>>pve-kernel-3.10.0-5; my NICs were bringing the link down and then up 
>>again. 
>>(I guess that soon I will communicate my BIOS setup for the Dell R720.) 
>>... :-) 

I'm interested to know what this option is ;) 



>>The strange behaviour is that when I run "pvecm status", I get this 
>>output: 
>>Version: 6.2.0 
>>Config Version: 41 
>>Cluster Name: ptrading 
>>Cluster Id: 28503 
>>Cluster Member: Yes 
>>Cluster Generation: 8360 
>>Membership state: Cluster-Member 
>>Nodes: 8 
>>Expected votes: 8 
>>Total votes: 8 
>>Node votes: 1 
>>Quorum: 5 
>>Active subsystems: 6 
>>Flags: 
>>Ports Bound: 0 177 
>>Node name: pve5 
>>Node ID: 5 
>>Multicast addresses: 239.192.111.198 
>>Node addresses: 192.100.100.50 

So, you have quorum here. All nodes are OK; I don't see any problem. 
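
If the quorum drops again, it is also worth verifying that multicast really works between all the nodes, since cman/corosync relies on it. A rough check with omping (the omping package must be installed on every node; the node names below are placeholders), run at the same time on each node:

omping -c 600 -i 1 -q pve1 pve2 pve3   # list all cluster nodes; loss should stay near 0%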


>>And in the PVE GUI I see the red light on all the other nodes. 

That means that the pvestatd daemon is hanging or has crashed. 


Can you check that you can write to /etc/pve? 
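
For example (a quick sketch; /etc/pve is the clustered pmxcfs filesystem and becomes read-only when the node has no quorum):

# arbitrary test file name; this typically fails with a permission
# error if the node has lost quorum
touch /etc/pve/writetest && rm /etc/pve/writetest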

If not, try to restart: 

/etc/init.d/pve-cluster restart 

then 

/etc/init.d/pvedaemon restart 
/etc/init.d/pvestatd restart 



----- Original Message ----- 
From: "Cesar Peschiera" <brain at click.com.py> 
To: "aderumier" <aderumier at odiso.com>, "pve-devel" 
<pve-devel at pve.proxmox.com> 
Sent: Monday, December 22, 2014 04:01:31 
Subject: Re: [pve-devel] Quorum problems with NICs Intel of 10 Gb/s and 
VMs turns off 

After several checks, I found the problem in these two servers: a 
configuration in the hardware BIOS that isn't compatible with 
pve-kernel-3.10.0-5; my NICs were bringing the link down and then up again. 
(I guess that soon I will communicate my BIOS setup for the Dell R720.) 
... :-) 

But now I have another problem, with the mix of PVE-manager 3.3-5 and 2.3-13 
versions in a PVE cluster of 8 nodes: I am losing quorum on several nodes 
very often. 

Moreover, for now I cannot upgrade my old PVE nodes, so for the moment I 
would like to know whether it is possible to make a quick configuration so 
that all my nodes always have quorum. 

The strange behaviour is that when I run "pvecm status", I get this output: 
Version: 6.2.0 
Config Version: 41 
Cluster Name: ptrading 
Cluster Id: 28503 
Cluster Member: Yes 
Cluster Generation: 8360 
Membership state: Cluster-Member 
Nodes: 8 
Expected votes: 8 
Total votes: 8 
Node votes: 1 
Quorum: 5 
Active subsystems: 6 
Flags: 
Ports Bound: 0 177 
Node name: pve5 
Node ID: 5 
Multicast addresses: 239.192.111.198 
Node addresses: 192.100.100.50 

And in the PVE GUI I see the red light on all the other nodes. 

Can I apply some kind of temporary solution, such as "Quorum: 1", so that my 
nodes can work well and not show this strange behaviour? (Only until I 
perform the updates.) 
Or, what would be the simplest and quickest temporary solution to avoid 
upgrading my nodes? 
(Something like, for example, adding to the rc.local file a line that says 
"pvecm expected 1", as sketched below.) 

Note about the quorum: I don't have any hardware fence device enabled, so I 
do not mind whether each node always has quorum (I can always turn off a 
server manually and brutally if necessary). 

----- Original Message ----- 
From: "Cesar Peschiera" <brain at click.com.py> 
To: "Alexandre DERUMIER" <aderumier at odiso.com> 
Cc: "pve-devel" <pve-devel at pve.proxmox.com> 
Sent: Saturday, December 20, 2014 9:30 AM 
Subject: Re: [pve-devel] Quorum problems with NICs Intel of 10 Gb/s and 
VMs turns off 


> Hi Alexandre 
> 
> I put the 192.100.100.51 IP address directly on bond0, and I have no 
> working network 
> (as if the node were totally isolated) 
> 
> This was my setup: 
> ------------------- 
> auto bond0 
> iface bond0 inet static 
> address 192.100.100.51 
> netmask 255.255.255.0 
> gateway 192.100.100.4 
> slaves eth0 eth2 
> bond_miimon 100 
> bond_mode 802.3ad 
> bond_xmit_hash_policy layer2 
> 
> auto vmbr0 
> iface vmbr0 inet manual 
> bridge_ports bond0 
> bridge_stp off 
> bridge_fd 0 
> post-up echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping 
> post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier 
> 
> ...... :-( 
> 
> Any other suggestions? 
> 
> ----- Original Message ----- 
> From: "Alexandre DERUMIER" <aderumier at odiso.com> 
> To: "Cesar Peschiera" <brain at click.com.py> 
> Cc: "pve-devel" <pve-devel at pve.proxmox.com> 
> Sent: Friday, December 19, 2014 7:59 AM 
> Subject: Re: [pve-devel] Quorum problems with NICs Intel of 10 Gb/s and 
> VMs turns off 
> 
> 
> Maybe you can try to put the 192.100.100.51 IP address directly on bond0, 
> 
> to avoid corosync traffic going through vmbr0. 
> 
> (I remember some old offloading bugs with 10 GbE NICs and the Linux bridge.) 
> 
> 
> ----- Original Message ----- 
> From: "Cesar Peschiera" <brain at click.com.py> 
> To: "aderumier" <aderumier at odiso.com> 
> Cc: "pve-devel" <pve-devel at pve.proxmox.com> 
> Sent: Friday, December 19, 2014 11:08:33 
> Subject: Re: [pve-devel] Quorum problems with NICs Intel of 10 Gb/s and 
> VMs turns off 
> 
>>Can you post the /etc/network/interfaces of these 10 Gb/s nodes? 
> 
> This is my configuration: 
> Note: the LAN uses 192.100.100.0/24 
> 
> #Network interfaces 
> auto lo 
> iface lo inet loopback 
> 
> iface eth0 inet manual 
> iface eth1 inet manual 
> iface eth2 inet manual 
> iface eth3 inet manual 
> iface eth4 inet manual 
> iface eth5 inet manual 
> iface eth6 inet manual 
> iface eth7 inet manual 
> iface eth8 inet manual 
> iface eth9 inet manual 
> iface eth10 inet manual 
> iface eth11 inet manual 
> 
> #PVE Cluster and VMs (NICs are of 10 Gb/s): 
> auto bond0 
> iface bond0 inet manual 
> slaves eth0 eth2 
> bond_miimon 100 
> bond_mode 802.3ad 
> bond_xmit_hash_policy layer2 
> 
> #PVE Cluster and VMs: 
> auto vmbr0 
> iface vmbr0 inet static 
> address 192.100.100.51 
> netmask 255.255.255.0 
> gateway 192.100.100.4 
> bridge_ports bond0 
> bridge_stp off 
> bridge_fd 0 
> post-up echo 0 > 
> /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping 
> post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier 
> 
> #A link for DRBD (NICs are of 10 Gb/s): 
> auto bond401 
> iface bond401 inet static 
> address 10.1.1.51 
> netmask 255.255.255.0 
> slaves eth1 eth3 
> bond_miimon 100 
> bond_mode balance-rr 
> mtu 9000 
> 
> #Other link for DRBD (NICs are of 10 Gb/s): 
> auto bond402 
> iface bond402 inet static 
> address 10.2.2.51 
> netmask 255.255.255.0 
> slaves eth4 eth6 
> bond_miimon 100 
> bond_mode balance-rr 
> mtu 9000 
> 
> #Other link for DRBD (NICs are of 10 Gb/s): 
> auto bond403 
> iface bond403 inet static 
> address 10.3.3.51 
> netmask 255.255.255.0 
> slaves eth5 eth7 
> bond_miimon 100 
> bond_mode balance-rr 
> mtu 9000 
> 
> #A link for the NFS-Backups (NICs are of 1 Gb/s): 
> auto bond10 
> iface bond10 inet static 
> address 10.100.100.51 
> netmask 255.255.255.0 
> slaves eth8 eth10 
> bond_miimon 100 
> bond_mode balance-rr 
> #bond_mode active-backup 
> mtu 9000 
> 


