[PVE-User] Proxmox 4.1 cluster issue

Meyer, Kevin kevin.meyer at treml-sturm.de
Wed Feb 17 12:51:17 CET 2016


Hi,

Check your switch configuration and make sure IGMP snooping is configured correctly. I had this problem a few months ago.


Best regards

Kevin Meyer
Projects


Treml & Sturm Datentechnik GmbH
Mühlheimer Straße 209
D-63075 Offenbach am Main
Deutschland/Germany

Phone: +49 (0) 69 - 8990820
Fax: +49 (0) 69 - 89908233

E-Mail: info at treml-sturm.de
Internet: http://www.treml-sturm.de

Managing partners:
Johannes Treml and Roland Sturm
Registered office: Offenbach am Main
Register court: Amtsgericht Offenbach am Main
Register number: 5 HRB 10140
VAT ID: DE 182038999

From: pve-user [mailto:pve-user-bounces at pve.proxmox.com] On behalf of Guy Plunkett
Sent: Wednesday, 17 February 2016 12:47
To: PVE User List <pve-user at pve.proxmox.com>
Subject: Re: [PVE-User] Proxmox 4.1 cluster issue

I've just rebuilt all my Proxmox heads and created a new cluster (no HA).

This was working fine before upgrading to Proxmox 4.1.

Within 5 minutes of adding all 4 systems to the cluster, proxmox03 and proxmox01 dropped out of the cluster group.

I'm seeing the following filling up the logs:

Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e



Feb 17 11:44:48 proxmox01 corosync[3195]: [MAIN  ] Completed service synchronization, ready to provide servi
Feb 17 11:44:54 proxmox01 corosync[3195]: [TOTEM ] A new membership (10.240.0.100:220) was formed. Members l
Feb 17 11:44:54 proxmox01 corosync[3195]: [TOTEM ] Failed to receive the leave message. failed: 3 1
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [dcdb] notice: members: 4/3172
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [status] notice: members: 4/3172
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [status] notice: node lost quorum
Feb 17 11:44:54 proxmox01 corosync[3195]: [QUORUM] This node is within the non-primary component and will NO
Feb 17 11:44:54 proxmox01 corosync[3195]: [QUORUM] Members[1]: 4
Feb 17 11:44:54 proxmox01 corosync[3195]: [MAIN  ] Completed service synchronization, ready to provide servi
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [dcdb] crit: received write while not quorate - trigger resync
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [dcdb] crit: leaving CPG group
Feb 17 11:44:55 proxmox01 pmxcfs[3172]: [dcdb] notice: start cluster connection
Feb 17 11:44:55 proxmox01 pmxcfs[3172]: [dcdb] notice: members: 4/3172
Feb 17 11:44:55 proxmox01 pmxcfs[3172]: [dcdb] notice: all data is up to date



----
Guy



On 17 Feb 2016, at 07:23, Thomas Lamprecht <t.lamprecht at proxmox.com> wrote:

Note that /etc/cluster/cluster.conf isn't needed anymore; all cluster-relevant settings are read from /etc/pve/corosync.conf (which looks good as far as I can see).

You said you upgraded; are you really _really_ sure you did not miss a step (no offense)?

I assume you rebuilt the cluster cleanly with pvecm addnode <...>?

Can you also post your /etc/hostname and /etc/network/interfaces?
The nodes seem to be able to connect initially, though, so those files should be fine...


proxmox04 seems to be the problem, as the others can connect just fine.

Can you post what's happening there with:
$ journalctl -u corosync.service -u pve-cluster.service -b

This filters out other, possibly irrelevant, logging.

cheers,
Thomas
On 02/16/2016 07:46 PM, Guy Plunkett wrote:
Hello,

I've upgraded my Dell M1000 blade centre to Proxmox 4.1. The upgrade seemed to go fine; however, I can't get all 4 nodes connected at once. It works for a short time, then one node disappears. I can still SSH to it just fine, and after restarting corosync and pve-cluster it joins again, but shortly afterwards another node disappears.

Eventually a node crashes and restarts. There is nothing in the syslogs to indicate why this node crashed.

I've spent 2 days fighting with this, trying to resolve it. This was working fine on 3.x.

Can someone please help? I'm pulling my hair out trying to get this working, and I don't have much left!

Cheers,
-Guy

Feb 16 16:32:50 proxmox01 corosync[5747]:  [TOTEM ] A new membership (10.240.0.100:35536) was formed. Members
Feb 16 16:32:50 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2
Feb 16 16:32:50 proxmox01 corosync[5747]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 16 16:32:53 proxmox01 corosync[5747]:  [TOTEM ] A new membership (10.240.0.100:35540) was formed. Members
Feb 16 16:32:53 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2
Feb 16 16:32:53 proxmox01 corosync[5747]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 16 16:32:56 proxmox01 corosync[5747]:  [TOTEM ] A new membership (10.240.0.100:35544) was formed. Members
Feb 16 16:32:56 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2
Feb 16 16:32:56 proxmox01 corosync[5747]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 16 16:32:59 proxmox01 corosync[5747]:  [TOTEM ] A new membership (10.240.0.100:35548) was formed. Members
Feb 16 16:32:59 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2
Feb 16 16:32:59 proxmox01 corosync[5747]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 16 16:33:02 proxmox01 corosync[5747]:  [TOTEM ] A new membership (10.240.0.100:35552) was formed. Members
Feb 16 16:33:02 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2
Feb 16 16:33:02 proxmox01 corosync[5747]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 16 16:33:05 proxmox01 corosync[5747]:  [TOTEM ] A new membership (10.240.0.100:35556) was formed. Members
Feb 16 16:33:05 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2
Feb 16 16:33:05 proxmox01 corosync[5747]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 16 16:33:08 proxmox01 corosync[5747]:  [TOTEM ] A new membership (10.240.0.100:35560) was formed. Members
Feb 16 16:33:08 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2
Feb 16 16:33:08 proxmox01 corosync[5747]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 16 16:33:11 proxmox01 corosync[5747]:  [TOTEM ] A new membership (10.240.0.100:35564) was formed. Members
Feb 16 16:33:11 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2
Feb 16 16:33:11 proxmox01 corosync[5747]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 16 16:36:45 proxmox01 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2723" x-info="http://www.rsyslog.com"] start
Feb 16 16:36:45 proxmox01 systemd-modules-load[999]: Module 'fuse' is builtin
Feb 16 16:36:45 proxmox01 systemd-modules-load[999]: Inserted module 'vhost_net'
Feb 16 16:36:45 proxmox01 hdparm[1031]: Setting parameters of disc: (none).
Feb 16 16:36:45 proxmox01 lvm[1280]: 3 logical volume(s) in volume group "pve" now active





# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="Cork-Training" config_version="6">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <clusternodes>
  <clusternode name="proxmox01" votes="1" nodeid="1"/>
  <clusternode name="proxmox02" votes="1" nodeid="2"/>
  <clusternode name="proxmox03" votes="1" nodeid="3"/>
  <clusternode name="proxmox04" votes="1" nodeid="4"/>
  </clusternodes>

</cluster>


# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox04
    nodeid: 1
    quorum_votes: 1
    ring0_addr: proxmox04
  }

  node {
    name: proxmox03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: proxmox03
  }

  node {
    name: proxmox02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: proxmox02
  }

  node {
    name: proxmox01
    nodeid: 4
    quorum_votes: 1
    ring0_addr: proxmox01
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Cork-Training
  config_version: 6
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.240.0.100
    ringnumber: 0
  }

}
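The `Members[3]: 4 3 2` lines in the proxmox01 log above list corosync node IDs, not host names, and this corosync.conf numbers the nodes in the reverse order of the old cluster.conf (proxmox04 is nodeid 1 here). The following is a minimal Python sketch, not a Proxmox tool: the helper names `parse_nodelist` and `missing_members` are hypothetical, and the parser assumes only the simple `node { name: ... nodeid: ... }` layout posted above rather than general corosync.conf syntax. It maps the IDs back to names and reports which configured node is absent from a membership line.

```python
from __future__ import annotations

import re

# nodelist section copied from the /etc/pve/corosync.conf posted above
COROSYNC_CONF = """
nodelist {
  node {
    name: proxmox04
    nodeid: 1
    quorum_votes: 1
    ring0_addr: proxmox04
  }
  node {
    name: proxmox03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: proxmox03
  }
  node {
    name: proxmox02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: proxmox02
  }
  node {
    name: proxmox01
    nodeid: 4
    quorum_votes: 1
    ring0_addr: proxmox01
  }
}
"""


def parse_nodelist(conf: str) -> dict[int, str]:
    """Hypothetical helper: return {nodeid: name} from simple 'node { ... }' blocks."""
    nodes = {}
    # 'node\s*\{' does not match 'nodelist {', so only the inner blocks are captured
    for block in re.findall(r"node\s*\{(.*?)\}", conf, re.S):
        name = re.search(r"^\s*name:\s*(\S+)", block, re.M)
        nodeid = re.search(r"^\s*nodeid:\s*(\d+)", block, re.M)
        if name and nodeid:
            nodes[int(nodeid.group(1))] = name.group(1)
    return nodes


def missing_members(log_line: str, nodes: dict[int, str]) -> list[str]:
    """Hypothetical helper: configured nodes absent from a 'Members[n]: ...' log line."""
    match = re.search(r"Members\[\d+\]:\s*([\d ]+)", log_line)
    if match is None:
        raise ValueError("not a corosync Members[] line")
    present = {int(x) for x in match.group(1).split()}
    # report missing nodes in ascending nodeid order
    return [nodes[i] for i in sorted(nodes) if i not in present]


nodes = parse_nodelist(COROSYNC_CONF)
line = "Feb 16 16:32:50 proxmox01 corosync[5747]:  [QUORUM] Members[3]: 4 3 2"
print(missing_members(line, nodes))  # ['proxmox04']
```

Read this way, the repeated `Members[3]: 4 3 2` lines mean proxmox01, proxmox02 and proxmox03 keep re-forming a membership without proxmox04, which matches the conclusion that proxmox04 is the node to look at first.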




----
Guy







_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user



