[PVE-User] Strange cluster/graphics problem in 3-node cluster

Eneko Lacunza elacunza at binovo.es
Fri May 17 09:56:32 CEST 2019


Hi Gianni,

On 16/5/19 at 19:38, Gianni Milo wrote:
> Are you using LACP or Linux bonding on node2 and node3 for the VM + cluster
> traffic?
It's LACP.
> Are you using VLANs to separate VM/cluster traffic ?
Not currently. There are multiple VLANs on that LACP bond, but the cluster
VLAN is shared with one of the VM networks.

Our next try will be to undo the LACP and just use one link for VMs and the
other only for the cluster VLAN... and also isolate the cluster from the
current VLAN.
> Have you checked the multicast notes in the PVE wiki? Have you tried UDPU
> instead of multicast as a last option?
Yes, but we haven't tried UDPU yet.
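
For reference, switching corosync 2.x from multicast to UDPU is a small edit
to the totem section of /etc/pve/corosync.conf. A minimal sketch, assuming the
existing nodelist already has a ring0_addr entry for each of the three nodes
(UDPU relies on the nodelist to find its peers); the rest of the file is left
untouched and is not shown here:

```
totem {
  # Keep the existing settings (cluster_name, interface, version, ...) as they are.
  # config_version must be incremented on every edit of this file.

  # Use UDP unicast instead of the default multicast transport.
  transport: udpu
}
```

A transport change only takes effect after corosync is restarted on every
node, so it would still need a maintenance window.
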
> No idea about missing rrd graphs...
This is the strange part, and the reason for my mail. Otherwise I'd be
preparing maintenance windows to change the nodes' network config right
away... :)

Thanks a lot
Eneko


>
>
> On Thu, 16 May 2019 at 16:41, Eneko Lacunza <elacunza at binovo.es> wrote:
>
>> Hi all,
>>
>> In a 3-node cluster, we're experiencing a strange clustering problem.
>>
>> Sometimes, the first node drops out of quorum, usually for some hours,
>> only to rejoin quorum later.
>>
>> During the last 2 weeks, this has happened 7 times.
>>
>> Additionally, on one occasion the second and third nodes dropped out of
>> quorum, and soon after the first and third nodes reached quorum. The second
>> node rejoined after a manual restart of pve-cluster.
>>
>> The strange thing (at least for me) is that the 2nd and 3rd nodes have lost
>> rrd data around the times the 1st node was out (no graphs in the GUI for
>> those hours). The 1st node has all its rrd data, and its graphs are complete.
>>
>> I understand that we could have a network problem (we're trying to catch
>> the problem live again for additional tests...), but why is rrd data
>> missing on cluster-joined nodes? Any idea?
>>
>>
>> Servers:
>> node1 - 1xE3-1240v6 4c8t - 64GB RAM - 1x10G for VM + cluster, 2x1G for storage
>> node2 - 2xE5507 4c       - 96GB RAM - 2x1G for VM + cluster,  2x1G for storage
>> node3 - 2xE5507 4c       - 96GB RAM - 2x1G for VM + cluster,  2x1G for storage
>>
>> VM storage is EMC VNXe3200
>> Switch is HP 5406zl with 5 switch-modules.
>> - Node1 is connected to module E (8x10G),
>> - node2 and node3 are connected to module A (24x1G).
>> Storage switches(2) are Cisco Catalyst 2960G
>>
>> Nodes have plenty of free RAM (usage below 50%), network use peaks at less
>> than 10-20%, and mean CPU use is below 20%.
>>
>> (for all three nodes)
>> # pveversion -v
>> proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
>> pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
>> pve-kernel-4.15: 5.2-12
>> pve-kernel-4.15.18-9-pve: 4.15.18-30
>> corosync: 2.4.4-pve1
>> criu: 2.11.1-1~bpo90
>> glusterfs-client: 3.8.8-1
>> ksm-control-daemon: 1.2-2
>> libjs-extjs: 6.0.1-2
>> libpve-access-control: 5.1-3
>> libpve-apiclient-perl: 2.0-5
>> libpve-common-perl: 5.0-43
>> libpve-guest-common-perl: 2.0-18
>> libpve-http-server-perl: 2.0-11
>> libpve-storage-perl: 5.0-33
>> libqb0: 1.0.3-1~bpo9
>> lvm2: 2.02.168-pve6
>> lxc-pve: 3.0.2+pve1-5
>> lxcfs: 3.0.2-2
>> novnc-pve: 1.0.0-2
>> proxmox-widget-toolkit: 1.0-22
>> pve-cluster: 5.0-31
>> pve-container: 2.0-31
>> pve-docs: 5.3-1
>> pve-edk2-firmware: 1.20181023-1
>> pve-firewall: 3.0-16
>> pve-firmware: 2.0-6
>> pve-ha-manager: 2.0-5
>> pve-i18n: 1.0-9
>> pve-libspice-server1: 0.14.1-1
>> pve-qemu-kvm: 2.12.1-1
>> pve-xtermjs: 1.0-5
>> qemu-server: 5.0-43
>> smartmontools: 6.5+svn4324-1
>> spiceterm: 3.0-5
>> vncterm: 1.5-3
>> zfsutils-linux: 0.7.12-pve1~bpo1
>>
>>
>> Thanks a lot
>> Eneko
>>
>> --
>> Zuzendari Teknikoa / Director Técnico
>> Binovo IT Human Project, S.L.
>> Telf. 943569206
>> Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
>> www.binovo.es
>>
>> _______________________________________________
>> pve-user mailing list
>> pve-user at pve.proxmox.com
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>


-- 
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



