[pve-devel] need help, lost quorum on all nodes

Stefan Priebe s.priebe at profihost.ag
Wed Jan 16 12:41:21 CET 2013


Any news on this?

Stefan

On 14.01.2013 20:48, Stefan Priebe wrote:
> Hi Alexandre,
>
> I can't help with your corosync problem, but I'm running 3.6.11 on two
> nodes and 3.7.1 on another one without a problem.
>
> Stefan
>
> On 14.01.2013 20:47, Alexandre DERUMIER wrote:
>> Ok, I found the problem.
>>
>> I had installed a custom 3.7 kernel on the upgraded node, and it seems
>> to cause problems for the corosync cluster (I don't know why; I'll
>> investigate tomorrow).
>>
>> Maybe it's related to dlm?
>>
>> ----- Original Message -----
>>
>> From: "Alexandre DERUMIER" <aderumier at odiso.com>
>> To: pve-devel at pve.proxmox.com
>> Sent: Monday, 14 January 2013 18:10:35
>> Subject: [pve-devel] need help, lost quorum on all nodes
>>
>> Hi,
>>
>> I have lost quorum on my 8-node cluster while trying to upgrade one
>> node to the latest stable version.
>>
>> Log from when the problem occurred:
>>
>> Jan 14 17:25:34 corosync [CLM ] CLM CONFIGURATION CHANGE
>> Jan 14 17:25:34 corosync [CLM ] New Configuration:
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.38)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.40)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.49)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.50)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.51)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.52)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.53)
>> Jan 14 17:25:34 corosync [CLM ] Members Left:
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.39)
>> Jan 14 17:25:34 corosync [CLM ] Members Joined:
>> Jan 14 17:25:34 corosync [QUORUM] Members[7]: 1 2 3 4 5 6 8
>> Jan 14 17:25:34 corosync [CLM ] CLM CONFIGURATION CHANGE
>> Jan 14 17:25:34 corosync [CLM ] New Configuration:
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.38)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.40)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.49)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.50)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.51)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.52)
>> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.53)
>> Jan 14 17:25:34 corosync [CLM ] Members Left:
>> Jan 14 17:25:34 corosync [CLM ] Members Joined:
>> Jan 14 17:25:34 corosync [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> Jan 14 17:25:35 corosync [CPG ] chosen downlist: sender r(0)
>> ip(10.3.94.53) ; members(old:8 left:1)
>> Jan 14 17:25:35 corosync [MAIN ] Completed service synchronization,
>> ready to provide service.
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7ca 7cb 7cc 7cd 7ce
>> 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf
>> 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf
>> 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf
>> 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf
>> 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf
>> 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd
>> 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
>> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf
>> 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
>> ....
>> Jan 14 17:29:36 corosync [CLM ] CLM CONFIGURATION CHANGE
>> Jan 14 17:29:36 corosync [CLM ] New Configuration:
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.40)
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.50)
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.51)
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.53)
>> Jan 14 17:29:36 corosync [CLM ] Members Left:
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.38)
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.49)
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.52)
>> Jan 14 17:29:36 corosync [CLM ] Members Joined:
>> Jan 14 17:29:36 corosync [QUORUM] Members[6]: 1 2 4 5 6 8
>> Jan 14 17:29:36 corosync [QUORUM] Members[5]: 1 2 4 5 8
>> Jan 14 17:29:36 corosync [CMAN ] quorum lost, blocking activity
>> Jan 14 17:29:36 corosync [QUORUM] This node is within the non-primary
>> component and will NOT provide any services.
>> Jan 14 17:29:36 corosync [QUORUM] Members[4]: 1 2 4 8
>> Jan 14 17:29:36 corosync [CLM ] CLM CONFIGURATION CHANGE
>> Jan 14 17:29:36 corosync [CLM ] New Configuration:
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.40)
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.50)
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.51)
>> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.53)
>> Jan 14 17:29:36 corosync [CLM ] Members Left:
>> Jan 14 17:29:36 corosync [CLM ] Members Joined:
>> Jan 14 17:29:36 corosync [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> Jan 14 17:29:36 corosync [CPG ] chosen downlist: sender r(0)
>> ip(10.3.94.53) ; members(old:7 left:3)
>> Jan 14 17:29:36 corosync [MAIN ] Completed service synchronization,
>> ready to provide service.
>>
>>
>> But I can't get it back up.
>>
>> I'm trying the following on each node:
>>
>> /etc/init.d/cman restart
>> Starting cluster:
>> Checking if cluster has been disabled at boot... [ OK ]
>> Checking Network Manager... [ OK ]
>> Global setup... [ OK ]
>> Loading kernel modules... [ OK ]
>> Mounting configfs... [ OK ]
>> Starting cman... [ OK ]
>> Waiting for quorum... Timed-out waiting for cluster
>>
>>
>>
>> corosync log of node1 when restarting cman:
>>
>> Jan 14 18:04:10 corosync [SERV ] Unloading all Corosync service engines.
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync
>> extended virtual synchrony service
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync
>> configuration service
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync
>> cluster closed process group service v1.01
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync
>> cluster config database access v1.01
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync
>> profile loading service
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais
>> cluster membership service B.01.01
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais
>> checkpoint service B.01.01
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais
>> event service B.01.01
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais
>> distributed locking service B.03.01
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais
>> message service B.03.01
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync
>> CMAN membership service 2.90
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync
>> cluster quorum service v0.1
>> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais
>> timer service A.01.01
>> Jan 14 18:04:10 corosync [MAIN ] Corosync Cluster Engine exiting with
>> status 0 at main.c:1856.
>> Jan 14 18:04:11 corosync [MAIN ] Corosync Cluster Engine ('1.4.4'):
>> started and ready to provide service.
>> Jan 14 18:04:11 corosync [MAIN ] Corosync built-in features: nss
>> Jan 14 18:04:11 corosync [MAIN ] Successfully read config from
>> /etc/cluster/cluster.conf
>> Jan 14 18:04:11 corosync [MAIN ] Successfully parsed cman config
>> Jan 14 18:04:11 corosync [MAIN ] Successfully configured openais
>> services to load
>> Jan 14 18:04:11 corosync [TOTEM ] Initializing transport (UDP/IP
>> Multicast).
>> Jan 14 18:04:11 corosync [TOTEM ] Initializing transmit/receive
>> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Jan 14 18:04:11 corosync [TOTEM ] The network interface [10.3.94.49]
>> is now up.
>> Jan 14 18:04:11 corosync [QUORUM] Using quorum provider quorum_cman
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync
>> cluster quorum service v0.1
>> Jan 14 18:04:11 corosync [CMAN ] CMAN 1352871249 (built Nov 14 2012
>> 06:34:12) started
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync CMAN
>> membership service 2.90
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais
>> cluster membership service B.01.01
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais event
>> service B.01.01
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais
>> checkpoint service B.01.01
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais
>> message service B.03.01
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais
>> distributed locking service B.03.01
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais timer
>> service A.01.01
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync
>> extended virtual synchrony service
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync
>> configuration service
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync
>> cluster closed process group service v1.01
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync
>> cluster config database access v1.01
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync
>> profile loading service
>> Jan 14 18:04:11 corosync [QUORUM] Using quorum provider quorum_cman
>> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync
>> cluster quorum service v0.1
>> Jan 14 18:04:11 corosync [MAIN ] Compatibility mode set to whitetank.
>> Using V1 and V2 of the synchronization engine.
>> Jan 14 18:04:11 corosync [CLM ] CLM CONFIGURATION CHANGE
>> Jan 14 18:04:11 corosync [CLM ] New Configuration:
>> Jan 14 18:04:11 corosync [CLM ] Members Left:
>> Jan 14 18:04:11 corosync [CLM ] Members Joined:
>> Jan 14 18:04:11 corosync [CLM ] CLM CONFIGURATION CHANGE
>> Jan 14 18:04:11 corosync [CLM ] New Configuration:
>> Jan 14 18:04:11 corosync [CLM ] r(0) ip(10.3.94.49)
>> Jan 14 18:04:11 corosync [CLM ] Members Left:
>> Jan 14 18:04:11 corosync [CLM ] Members Joined:
>> Jan 14 18:04:11 corosync [CLM ] r(0) ip(10.3.94.49)
>> Jan 14 18:04:11 corosync [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> Jan 14 18:04:11 corosync [QUORUM] Members[1]: 6
>> Jan 14 18:04:11 corosync [QUORUM] Members[1]: 6
>> Jan 14 18:04:11 corosync [CPG ] chosen downlist: sender r(0)
>> ip(10.3.94.49) ; members(old:0 left:0)
>> Jan 14 18:04:11 corosync [MAIN ] Completed service synchronization,
>> ready to provide service.
>>
>>
>> corosync log of node2 when restarting cman:
>>
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync
>> extended virtual synchrony service
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync
>> configuration service
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync
>> cluster closed process group service v1.01
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync
>> cluster config database access v1.01
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync
>> profile loading service
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais
>> cluster membership service B.01.01
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais
>> checkpoint service B.01.01
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais
>> event service B.01.01
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais
>> distributed locking service B.03.01
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais
>> message service B.03.01
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync
>> CMAN membership service 2.90
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync
>> cluster quorum service v0.1
>> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais
>> timer service A.01.01
>> Jan 14 18:05:30 corosync [MAIN ] Corosync Cluster Engine exiting with
>> status 0 at main.c:1856.
>> Jan 14 18:05:31 corosync [MAIN ] Corosync Cluster Engine ('1.4.4'):
>> started and ready to provide service.
>> Jan 14 18:05:31 corosync [MAIN ] Corosync built-in features: nss
>> Jan 14 18:05:31 corosync [MAIN ] Successfully read config from
>> /etc/cluster/cluster.conf
>> Jan 14 18:05:31 corosync [MAIN ] Successfully parsed cman config
>> Jan 14 18:05:31 corosync [MAIN ] Successfully configured openais
>> services to load
>> Jan 14 18:05:31 corosync [TOTEM ] Initializing transport (UDP/IP
>> Multicast).
>> Jan 14 18:05:31 corosync [TOTEM ] Initializing transmit/receive
>> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Jan 14 18:05:31 corosync [TOTEM ] The network interface [10.3.94.50]
>> is now up.
>> Jan 14 18:05:31 corosync [QUORUM] Using quorum provider quorum_cman
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync
>> cluster quorum service v0.1
>> Jan 14 18:05:31 corosync [CMAN ] CMAN 1352871249 (built Nov 14 2012
>> 06:34:12) started
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync CMAN
>> membership service 2.90
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais
>> cluster membership service B.01.01
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais event
>> service B.01.01
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais
>> checkpoint service B.01.01
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais
>> message service B.03.01
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais
>> distributed locking service B.03.01
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais timer
>> service A.01.01
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync
>> extended virtual synchrony service
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync
>> configuration service
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync
>> cluster closed process group service v1.01
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync
>> cluster config database access v1.01
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync
>> profile loading service
>> Jan 14 18:05:31 corosync [QUORUM] Using quorum provider quorum_cman
>> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync
>> cluster quorum service v0.1
>> Jan 14 18:05:31 corosync [MAIN ] Compatibility mode set to whitetank.
>> Using V1 and V2 of the synchronization engine.
>> Jan 14 18:05:31 corosync [CLM ] CLM CONFIGURATION CHANGE
>> Jan 14 18:05:31 corosync [CLM ] New Configuration:
>> Jan 14 18:05:31 corosync [CLM ] Members Left:
>> Jan 14 18:05:31 corosync [CLM ] Members Joined:
>> Jan 14 18:05:31 corosync [CLM ] CLM CONFIGURATION CHANGE
>> Jan 14 18:05:31 corosync [CLM ] New Configuration:
>> Jan 14 18:05:31 corosync [CLM ] r(0) ip(10.3.94.50)
>> Jan 14 18:05:31 corosync [CLM ] Members Left:
>> Jan 14 18:05:31 corosync [CLM ] Members Joined:
>> Jan 14 18:05:31 corosync [CLM ] r(0) ip(10.3.94.50)
>> Jan 14 18:05:31 corosync [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> Jan 14 18:05:31 corosync [QUORUM] Members[1]: 4
>> Jan 14 18:05:31 corosync [QUORUM] Members[1]: 4
>> Jan 14 18:05:31 corosync [CPG ] chosen downlist: sender r(0)
>> ip(10.3.94.50) ; members(old:0 left:0)
>> Jan 14 18:05:31 corosync [MAIN ] Completed service synchronization,
>> ready to provide service.
>>
>>
>> Any ideas?
>>
>>
>>
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at pve.proxmox.com
>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>
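
A note on the quoted logs: the quorum loss is consistent with plain majority arithmetic. The sketch below assumes one vote per node and a quorum of floor(expected_votes / 2) + 1; the cluster.conf is not shown in this thread, so both are assumptions. The helper names `quorum_threshold` and `member_counts` are hypothetical, and the sample lines are copied from the quoted corosync log.

```python
import re

# Matches corosync lines such as "[QUORUM] Members[7]: 1 2 3 4 5 6 8"
MEMBERS_RE = re.compile(r"\[QUORUM\]\s+Members\[(\d+)\]:")


def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority (assumes 1 vote per node)."""
    return expected_votes // 2 + 1


def member_counts(log_text: str) -> list[int]:
    """Return the member count of every [QUORUM] Members[...] line, in order."""
    return [int(m.group(1)) for m in MEMBERS_RE.finditer(log_text)]


# Sample lines taken from the log quoted in this thread.
LOG = """\
Jan 14 17:25:34 corosync [QUORUM] Members[7]: 1 2 3 4 5 6 8
Jan 14 17:29:36 corosync [QUORUM] Members[6]: 1 2 4 5 6 8
Jan 14 17:29:36 corosync [QUORUM] Members[5]: 1 2 4 5 8
Jan 14 17:29:36 corosync [QUORUM] Members[4]: 1 2 4 8
Jan 14 18:04:11 corosync [QUORUM] Members[1]: 6
"""

EXPECTED_VOTES = 8  # assumption: 8 nodes, one vote each
needed = quorum_threshold(EXPECTED_VOTES)

for count in member_counts(LOG):
    state = "quorate" if count >= needed else "NOT quorate"
    print(f"{count} of {EXPECTED_VOTES} members, need {needed}: {state}")
```

Under these assumptions 5 of 8 members is the minimum. The partition that ends with 4 members falls below it, and so does each restarted node that sees only itself (`Members[1]`), which would explain why "Waiting for quorum..." times out.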